[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
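The description above says guest physical memory is also reachable from user mode. The following is a minimal sketch of one way such a mapping can be looked up, assuming a simplified table of memory slots. struct mem_slot and gpa_to_hva() are hypothetical names used for illustration and are not this driver's interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT 12
#define GUEST_PAGE_SIZE  (1ULL << GUEST_PAGE_SHIFT)

/* Hypothetical, simplified memory slot: a run of guest page frames
 * backed by one contiguous host buffer that user mode can also access. */
struct mem_slot {
	uint64_t base_gfn;   /* first guest page frame number in the slot */
	uint64_t npages;     /* number of guest pages in the slot */
	uint8_t *host_base;  /* host address backing base_gfn */
};

/*
 * Translate a guest-physical address to a host pointer. Returns NULL if
 * no slot covers the address. In this sketch, NULL stands for an access
 * that would be handed to user mode for emulation.
 */
static uint8_t *gpa_to_hva(const struct mem_slot *slots, size_t nslots,
			   uint64_t gpa)
{
	uint64_t gfn = gpa >> GUEST_PAGE_SHIFT;

	assert(slots != NULL || nslots == 0);
	for (size_t i = 0; i < nslots; i++) {
		const struct mem_slot *s = &slots[i];

		if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
			return s->host_base +
			       ((gfn - s->base_gfn) << GUEST_PAGE_SHIFT) +
			       (gpa & (GUEST_PAGE_SIZE - 1));
	}
	return NULL;
}
```

For example, with ram pointing at a 16-page buffer and a single slot { 0, 16, ram }, gpa_to_hva(slots, 1, 0x2004) returns ram + 0x2004, and any address at or past 0x10000 returns NULL.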
/*
 * Kernel-based Virtual Machine driver for Linux
 *
 * This module enables machines with Intel VT-x extensions to run virtual
 * machines without emulation or binary translation.
 *
 * MMU support
 *
 * Copyright (C) 2006 Qumranet, Inc.
 * Copyright 2010 Red Hat, Inc. and/or its affiliates.
 *
 * Authors:
 *   Yaniv Kamay  <yaniv@qumranet.com>
 *   Avi Kivity   <avi@qumranet.com>
 *
 * This work is licensed under the terms of the GNU GPL, version 2.  See
 * the COPYING file in the top-level directory.
 *
 */

#include "irq.h"
#include "mmu.h"
#include "x86.h"
#include "kvm_cache_regs.h"

#include <linux/kvm_host.h>
#include <linux/types.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/module.h>
#include <linux/swap.h>
#include <linux/hugetlb.h>
#include <linux/compiler.h>
#include <linux/srcu.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

#include <asm/page.h>
#include <asm/cmpxchg.h>
#include <asm/io.h>
#include <asm/vmx.h>

/*
 * When set to true, this variable enables Two-Dimensional Paging (TDP),
 * where the hardware walks two page tables:
 * 1. the guest-virtual to guest-physical table, and
 * 2. while doing 1., the guest-physical to host-physical table.
 * If the hardware supports this, we don't need to do shadow paging.
 */
bool tdp_enabled = false;

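The two walks described in the comment above can be illustrated with a toy model. This sketch assumes one-level, 16-entry page tables with 4 KiB pages and 64 KiB of host memory, which is far simpler than the real x86 multi-level formats. All names are hypothetical. What it shows is that every read of a guest page-table entry is itself translated through the nested (guest-physical to host-physical) table:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PG_SHIFT    12
#define PG_SIZE     (1u << PG_SHIFT)
#define NR_PTES     16u            /* toy one-level table: 16 entries */
#define PTE_PRESENT 1u
#define BAD_ADDR    UINT32_MAX     /* "no translation" marker */

/* Toy host-physical memory: 16 host pages (64 KiB). */
static uint8_t host_mem[NR_PTES * PG_SIZE];

/* Toy nested table owned by the hypervisor: entry i holds the
 * page-aligned host-physical address backing guest frame i, ORed
 * with PTE_PRESENT. */
static uint32_t nested_pt[NR_PTES];

static uint32_t host_read_u32(uint32_t hpa)
{
	uint32_t v;

	assert(hpa <= sizeof(host_mem) - sizeof(v));
	memcpy(&v, &host_mem[hpa], sizeof(v));
	return v;
}

/* Walk 2: guest-physical -> host-physical through the nested table. */
static uint32_t gpa_to_hpa(uint32_t gpa)
{
	uint32_t idx = gpa >> PG_SHIFT;

	if (idx >= NR_PTES || !(nested_pt[idx] & PTE_PRESENT))
		return BAD_ADDR;
	return (nested_pt[idx] & ~(PG_SIZE - 1)) | (gpa & (PG_SIZE - 1));
}

/*
 * Walk 1: guest-virtual -> guest-physical through the guest's own table,
 * which lives in guest-physical memory at guest_cr3 (page aligned).
 * Reading the guest PTE needs walk 2 first, and so does the final
 * guest-physical result.
 */
static uint32_t gva_to_hpa(uint32_t guest_cr3, uint32_t gva)
{
	uint32_t idx = gva >> PG_SHIFT;
	uint32_t pte_hpa, gpte;

	if (idx >= NR_PTES)
		return BAD_ADDR;
	pte_hpa = gpa_to_hpa(guest_cr3 + idx * (uint32_t)sizeof(uint32_t));
	if (pte_hpa == BAD_ADDR)
		return BAD_ADDR;
	gpte = host_read_u32(pte_hpa);
	if (!(gpte & PTE_PRESENT))
		return BAD_ADDR;
	return gpa_to_hpa((gpte & ~(PG_SIZE - 1)) | (gva & (PG_SIZE - 1)));
}
```

In this model the hypervisor only maintains nested_pt, and the guest edits its own table freely. With shadow paging, the hypervisor would instead have to build a combined guest-virtual to host-physical table and keep it in sync with every guest page-table change.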
enum {
	AUDIT_PRE_PAGE_FAULT,
	AUDIT_POST_PAGE_FAULT,
	AUDIT_PRE_PTE_WRITE,
	AUDIT_POST_PTE_WRITE,
	AUDIT_PRE_SYNC,
	AUDIT_POST_SYNC
};

char *audit_point_name[] = {
	"pre page fault",
	"post page fault",
	"pre pte write",
	"post pte write",
	"pre sync",
	"post sync"
};

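The audit_point_name[] strings above are matched to the audit enum purely by position. The following standalone sketch assumes C11. It uses a hypothetical AUDIT_NR_POINTS sentinel and a hypothetical audit_point_str() helper to show how designated initializers and a static assertion can keep the two in step:

```c
#include <assert.h>
#include <stddef.h>

/* Standalone copy of the audit points; AUDIT_NR_POINTS is a
 * hypothetical sentinel added for this sketch. */
enum audit_point {
	AUDIT_PRE_PAGE_FAULT,
	AUDIT_POST_PAGE_FAULT,
	AUDIT_PRE_PTE_WRITE,
	AUDIT_POST_PTE_WRITE,
	AUDIT_PRE_SYNC,
	AUDIT_POST_SYNC,
	AUDIT_NR_POINTS
};

/* Designated initializers bind each string to its enumerator, so
 * reordering the enum cannot silently shift the names. */
static const char *const audit_names[] = {
	[AUDIT_PRE_PAGE_FAULT]  = "pre page fault",
	[AUDIT_POST_PAGE_FAULT] = "post page fault",
	[AUDIT_PRE_PTE_WRITE]   = "pre pte write",
	[AUDIT_POST_PTE_WRITE]  = "post pte write",
	[AUDIT_PRE_SYNC]        = "pre sync",
	[AUDIT_POST_SYNC]       = "post sync",
};

/* Fails to compile if an audit point is added without a name slot. */
static_assert(sizeof(audit_names) / sizeof(audit_names[0]) ==
	      (size_t)AUDIT_NR_POINTS, "one name per audit point");

/* Hypothetical helper: bounds-checked lookup for log messages. */
static const char *audit_point_str(int point)
{
	if (point < 0 || point >= AUDIT_NR_POINTS || !audit_names[point])
		return "unknown";
	return audit_names[point];
}
```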
#undef MMU_DEBUG

#ifdef MMU_DEBUG

#define pgprintk(x...) do { if (dbg) printk(x); } while (0)
#define rmap_printk(x...) do { if (dbg) printk(x); } while (0)

#else

#define pgprintk(x...) do { } while (0)
#define rmap_printk(x...) do { } while (0)

#endif

#ifdef MMU_DEBUG
static bool dbg = false;
module_param(dbg, bool, 0644);
#endif

static bool oos_shadow = true;
module_param(oos_shadow, bool, 0644);

#ifndef MMU_DEBUG
#define ASSERT(x) do { } while (0)
#else
#define ASSERT(x)							\
	do {								\
		if (!(x))						\
			printk(KERN_WARNING "assertion failed %s:%d: %s\n", \
			       __FILE__, __LINE__, #x);			\
	} while (0)
#endif
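The non-debug ASSERT() above expands to do { } while (0), so ASSERT(...); is always exactly one statement. The following userspace sketch uses a hypothetical SKETCH_ASSERT() that counts failures and prints with fprintf() instead of printk(). It shows why that wrapper matters when the macro is used inside an unbraced if/else:

```c
#include <stdio.h>

/* Number of failed checks seen so far (sketch only). */
static unsigned int assert_failures;

/* Hypothetical userspace stand-in for ASSERT(): fprintf() replaces
 * printk(), and failures are counted as well as logged. */
#define SKETCH_ASSERT(x)						\
	do {								\
		if (!(x)) {						\
			assert_failures++;				\
			fprintf(stderr, "assertion failed %s:%d: %s\n",	\
				__FILE__, __LINE__, #x);		\
		}							\
	} while (0)

/*
 * SKETCH_ASSERT() sits in an unbraced if/else here. If the macro body
 * were only "if (!(x)) { ... }", the ';' after SKETCH_ASSERT(...) would
 * end the outer if statement and the else below would not compile.
 */
static int checked_value(int check, int value)
{
	if (check)
		SKETCH_ASSERT(value > 0);
	else
		return -1;
	return value;
}
```

A failed check only logs and counts, so checked_value(1, -3) still returns -3, mirroring how the debug ASSERT() above warns without stopping execution.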
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
2010-08-22 19:12:48 +08:00
#define PTE_PREFETCH_NUM 8
2006-12-10 18:21:36 +08:00
#define PT_FIRST_AVAIL_BITS_SHIFT 9
#define PT64_SECOND_AVAIL_BITS_SHIFT 52

#define PT64_LEVEL_BITS 9

#define PT64_LEVEL_SHIFT(level) \
2007-10-08 21:02:08 +08:00
(PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)
2006-12-10 18:21:36 +08:00
#define PT64_INDEX(address, level)\
	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
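PT64_LEVEL_SHIFT() and PT64_INDEX() slice an address into 9-bit table indices above the 12-bit page offset, so four levels cover 12 + 4 * 9 = 48 bits. The following small userspace sketch assumes 4 KiB pages (PAGE_SHIFT = 12). The helpers pt64_index() and page_offset() are hypothetical, and the macro argument `level` is parenthesized here for hygiene.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT 12
#define PT64_LEVEL_BITS 9

#define PT64_LEVEL_SHIFT(level) \
	(PAGE_SHIFT + ((level) - 1) * PT64_LEVEL_BITS)
#define PT64_INDEX(address, level) \
	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))

/* Index into the level-@level table (4 = top, 1 = last) for @addr. */
static unsigned pt64_index(uint64_t addr, int level)
{
	return (unsigned)PT64_INDEX(addr, level);
}

/* Byte offset of @addr within its 4 KiB page. */
static unsigned page_offset(uint64_t addr)
{
	return (unsigned)(addr & ((1u << PAGE_SHIFT) - 1));
}
```

For 0x7f1234567abc this yields indices 0xfe, 0x48, 0x1a2 and 0x167 for levels 4 down to 1, and page offset 0xabc.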
#define PT32_LEVEL_BITS 10

#define PT32_LEVEL_SHIFT(level) \
2007-10-08 21:02:08 +08:00
(PAGE_SHIFT + (level - 1) * PT32_LEVEL_BITS)
2006-12-10 18:21:36 +08:00
2009-07-27 22:30:45 +08:00
#define PT32_LVL_OFFSET_MASK(level) \
	(PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
						* PT32_LEVEL_BITS))) - 1))
2006-12-10 18:21:36 +08:00
#define PT32_INDEX(address, level)\
	(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
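32-bit (non-PAE) paging uses two levels of 10-bit indices, so 12 + 2 * 10 = 32 bits are covered. This sketch makes the same assumption of 4 KiB pages (PAGE_SHIFT = 12). The helper pt32_index() is hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, 32-bit non-PAE paging. */
#define PAGE_SHIFT 12
#define PT32_LEVEL_BITS 10

#define PT32_LEVEL_SHIFT(level) \
	(PAGE_SHIFT + ((level) - 1) * PT32_LEVEL_BITS)
#define PT32_INDEX(address, level) \
	(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))

/* Level 2 is the page directory, level 1 the page table. */
static unsigned pt32_index(uint32_t addr, int level)
{
	return (unsigned)PT32_INDEX(addr, level);
}
```

For example, 0xc0123abc splits into directory index 0x300, table index 0x123 and page offset 0xabc. PT32_LEVEL_SHIFT(2) is 22, so each directory entry spans 4 MiB.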
2007-03-09 19:04:31 +08:00
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
2006-12-10 18:21:36 +08:00
#define PT64_DIR_BASE_ADDR_MASK \
	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
2009-07-27 22:30:45 +08:00
#define PT64_LVL_ADDR_MASK(level) \
	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
						* PT64_LEVEL_BITS))) - 1))
#define PT64_LVL_OFFSET_MASK(level) \
	(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
						* PT64_LEVEL_BITS))) - 1))
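PT64_BASE_ADDR_MASK keeps bits 12-51 of an entry, which hold the frame address up to the 52-bit x86-64 physical limit. The level variants split that range into the part that a level-`level` entry supplies and the part that still comes from the address. The sketch below assumes 4 KiB pages and shows how the two could combine to find the guest frame behind a 2 MiB (level-2) large-page entry. large_page_gfn() is a hypothetical helper, not the driver's code.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1ULL << PAGE_SHIFT)
#define PT64_LEVEL_BITS 9

#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
#define PT64_DIR_BASE_ADDR_MASK \
	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
#define PT64_LVL_ADDR_MASK(level) \
	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
						* PT64_LEVEL_BITS))) - 1))
#define PT64_LVL_OFFSET_MASK(level) \
	(PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
						* PT64_LEVEL_BITS))) - 1))

/*
 * Hypothetical helper: frame number for @addr under a large-page entry
 * at @level.  The entry supplies the high part of the frame; the address
 * supplies which 4 KiB page inside the large page is meant.
 */
static u64 large_page_gfn(u64 pte, u64 addr, int level)
{
	u64 gfn = (pte & PT64_LVL_ADDR_MASK(level)) >> PAGE_SHIFT;

	return gfn + ((addr & PT64_LVL_OFFSET_MASK(level)) >> PAGE_SHIFT);
}
```

The flag bits and the NX bit (63) fall outside PT64_BASE_ADDR_MASK. Both entries in the test therefore yield frame 0x40167.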
2006-12-10 18:21:36 +08:00
#define PT32_BASE_ADDR_MASK PAGE_MASK
#define PT32_DIR_BASE_ADDR_MASK \
	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
2009-07-27 22:30:45 +08:00
#define PT32_LVL_ADDR_MASK(level) \
	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
					    * PT32_LEVEL_BITS))) - 1))
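The 32-bit masks follow the same pattern, using PAGE_MASK as the base. The sketch below handles a 4 MiB PSE directory entry. It assumes 4 KiB pages and a 64-bit PAGE_MASK, as on an x86_64 host, and it ignores the PSE-36 high address bits. pse_gfn() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumptions: 4 KiB pages and a 64-bit PAGE_MASK, as on an x86_64 host. */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1ULL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define PT32_LEVEL_BITS 10

#define PT32_BASE_ADDR_MASK PAGE_MASK
#define PT32_DIR_BASE_ADDR_MASK \
	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
#define PT32_LVL_ADDR_MASK(level) \
	(PAGE_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
					    * PT32_LEVEL_BITS))) - 1))
#define PT32_LVL_OFFSET_MASK(level) \
	(PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
						* PT32_LEVEL_BITS))) - 1))

/*
 * Guest frame behind @addr when @pde is a 4 MiB PSE directory entry.
 * Hypothetical helper; PSE-36 high address bits are ignored.
 */
static uint64_t pse_gfn(uint32_t pde, uint32_t addr)
{
	uint64_t gfn = (pde & PT32_LVL_ADDR_MASK(2)) >> PAGE_SHIFT;

	return gfn + ((addr & PT32_LVL_OFFSET_MASK(2)) >> PAGE_SHIFT);
}
```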
2006-12-10 18:21:36 +08:00
2007-11-21 08:06:21 +08:00
#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
			| PT64_NX_MASK)
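In PT64_PERM_MASK, every bit except NX grants access when set, while NX takes execute access away. XOR-ing NX first makes all four bits point the same way, so "did this update remove a permission?" becomes a single AND. The sketch below assumes the usual x86 bit positions for the four masks, whose definitions are not part of this excerpt. perms_reduced() is a hypothetical helper, showing the kind of check a shadow MMU could use when deciding whether a changed entry needs a TLB flush.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Assumed x86 PTE bit positions (defined outside this excerpt). */
#define PT_PRESENT_MASK  (1ULL << 0)
#define PT_WRITABLE_MASK (1ULL << 1)
#define PT_USER_MASK     (1ULL << 2)
#define PT64_NX_MASK     (1ULL << 63)

#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
			| PT64_NX_MASK)

/*
 * Does going from @old to @new_pte take away any permission?  NX is an
 * inverted permission (set = cannot execute), so flip it first; after
 * that, every bit in PT64_PERM_MASK means "more access when set".
 */
static bool perms_reduced(u64 old, u64 new_pte)
{
	old ^= PT64_NX_MASK;
	new_pte ^= PT64_NX_MASK;
	return (old & ~new_pte & PT64_PERM_MASK) != 0;
}
```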
2006-12-10 18:21:36 +08:00
2007-01-06 08:36:38 +08:00
#define RMAP_EXT 4
2007-12-09 22:15:46 +08:00
|
|
|
#define ACC_EXEC_MASK 1
|
|
|
|
#define ACC_WRITE_MASK PT_WRITABLE_MASK
|
|
|
|
#define ACC_USER_MASK PT_USER_MASK
|
|
|
|
#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
|
|
|
|
|
2009-12-31 18:10:16 +08:00
|
|
|
#include <trace/events/kvm.h>
|
|
|
|
|
2009-07-06 17:21:32 +08:00
|
|
|
#define CREATE_TRACE_POINTS
|
|
|
|
#include "mmutrace.h"
|
|
|
|
|
2009-09-24 02:47:17 +08:00
|
|
|
#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
|
|
|
|
|
2008-08-21 22:49:56 +08:00
|
|
|
#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
|
|
|
|
|
2007-01-06 08:36:38 +08:00
|
|
|
struct kvm_rmap_desc {
|
2009-06-10 19:24:23 +08:00
|
|
|
u64 *sptes[RMAP_EXT];
|
2007-01-06 08:36:38 +08:00
|
|
|
struct kvm_rmap_desc *more;
|
|
|
|
};
|
|
|
|
|
2008-12-25 20:39:47 +08:00
|
|
|
struct kvm_shadow_walk_iterator {
|
|
|
|
u64 addr;
|
|
|
|
hpa_t shadow_addr;
|
|
|
|
int level;
|
|
|
|
u64 *sptep;
|
|
|
|
unsigned index;
|
|
|
|
};
|
|
|
|
|
|
|
|
#define for_each_shadow_entry(_vcpu, _addr, _walker) \
|
|
|
|
for (shadow_walk_init(&(_walker), _vcpu, _addr); \
|
|
|
|
shadow_walk_okay(&(_walker)); \
|
|
|
|
shadow_walk_next(&(_walker)))
|
|
|
|
|
2010-06-11 21:35:15 +08:00
|
|
|
typedef void (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp, u64 *spte);
|
2008-09-24 00:18:36 +08:00
|
|
|
|
2007-04-15 21:31:09 +08:00
|
|
|
static struct kmem_cache *pte_chain_cache;
|
|
|
|
static struct kmem_cache *rmap_desc_cache;
|
2007-05-30 17:34:53 +08:00
|
|
|
static struct kmem_cache *mmu_page_header_cache;
|
KVM: create aggregate kvm_total_used_mmu_pages value
Of slab shrinkers, the VM code says:
* Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
* querying the cache size, so a fastpath for that case is appropriate.
and it *means* it. Look at how it calls the shrinkers:
nr_before = (*shrinker->shrink)(0, gfp_mask);
shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
So, if you do anything stupid in your shrinker, the VM will doubly
punish you.
The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then
we're going to take 101 locks. We do it twice, so each call takes
202 locks. If we're under memory pressure, we can have each cpu
trying to do this. It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.
This is guaranteed to optimize at least half of those lock
acquisitions away. It removes the need to take any of the locks
when simply trying to count objects.
A 'percpu_counter' can be a large object, but we only have one
of these for the entire system. There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-20 09:11:37 +08:00
static struct percpu_counter kvm_total_used_mmu_pages;
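The shrinker commit message above argues that one system-wide counter lets the count-only query (nr_to_scan == 0) skip every lock. A minimal userspace sketch of that split, assuming a C11 atomic as a stand-in for the kernel's percpu_counter; demo_total_used_mmu_pages, demo_account_mmu_pages, demo_mmu_shrink and the scan_fn hook are hypothetical names, not kernel API:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's percpu_counter: one global
 * aggregate that every VM updates when it allocates or frees MMU pages. */
static atomic_long demo_total_used_mmu_pages;

static void demo_account_mmu_pages(long nr)
{
	atomic_fetch_add(&demo_total_used_mmu_pages, nr);
}

/* Shrinker-style callback. nr_to_scan == 0 means the VM is only asking
 * for the cache size, so answer from the aggregate without any lock.
 * Only a real scan calls scan_fn, a hypothetical hook that would take
 * kvm_lock and each kvm->mmu_lock before freeing pages. */
static long demo_mmu_shrink(int nr_to_scan, void (*scan_fn)(int))
{
	if (nr_to_scan > 0 && scan_fn)
		scan_fn(nr_to_scan);
	return atomic_load(&demo_total_used_mmu_pages);
}
```

A single atomic is the simplest stand-in, but every update then contends on one cache line; batching per-CPU deltas to avoid that contention is what a percpu_counter adds.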
2007-04-15 21:31:09 +08:00
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
static u64 __read_mostly shadow_trap_nonpresent_pte;
static u64 __read_mostly shadow_notrap_nonpresent_pte;
2008-04-25 21:13:50 +08:00
static u64 __read_mostly shadow_nx_mask;
static u64 __read_mostly shadow_x_mask;	/* mutually exclusive with nx_mask */
static u64 __read_mostly shadow_user_mask;
static u64 __read_mostly shadow_accessed_mask;
static u64 __read_mostly shadow_dirty_mask;
2007-09-17 00:58:32 +08:00
2009-03-30 16:21:08 +08:00
static inline u64 rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}
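rsvd_bits(s, e) builds a mask with bits s through e set, inclusive; for example rsvd_bits(13, 16) = 0xF << 13 = 0x1E000. A standalone copy with a few worked values; the u64 typedef and the pte_hits_reserved helper are assumptions added for illustration:

```c
#include <stdint.h>

typedef uint64_t u64;	/* stand-in for the kernel type */

/* Same formula as rsvd_bits(): a mask with bits s..e (inclusive) set.
 * Only valid while e - s + 1 < 64, because shifting 1ULL by 64 is
 * undefined behaviour in C. */
static inline u64 rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* Hypothetical helper: does a pte set any bit the mask marks reserved? */
static int pte_hits_reserved(u64 pte, u64 rsvd_mask)
{
	return (pte & rsvd_mask) != 0;
}
```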
2007-09-17 00:58:32 +08:00
void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
{
	shadow_trap_nonpresent_pte = trap_pte;
	shadow_notrap_nonpresent_pte = notrap_pte;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);
2008-04-25 21:13:50 +08:00
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
2009-04-27 20:35:42 +08:00
		u64 dirty_mask, u64 nx_mask, u64 x_mask)
2008-04-25 21:13:50 +08:00
{
	shadow_user_mask = user_mask;
	shadow_accessed_mask = accessed_mask;
	shadow_dirty_mask = dirty_mask;
	shadow_nx_mask = nx_mask;
	shadow_x_mask = x_mask;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
2010-05-12 16:48:18 +08:00
static bool is_write_protection(struct kvm_vcpu *vcpu)
2006-12-10 18:21:36 +08:00
{
2009-12-30 00:07:30 +08:00
	return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
2006-12-10 18:21:36 +08:00
}

static int is_cpuid_PSE36(void)
{
	return 1;
}
2007-01-26 16:56:41 +08:00
static int is_nx(struct kvm_vcpu *vcpu)
{
2010-01-21 21:31:50 +08:00
	return vcpu->arch.efer & EFER_NX;
2007-01-26 16:56:41 +08:00
}
2007-09-17 00:58:32 +08:00
static int is_shadow_present_pte(u64 pte)
{
	return pte != shadow_trap_nonpresent_pte
		&& pte != shadow_notrap_nonpresent_pte;
}
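is_shadow_present_pte() treats two sentinel values as "not present", matching the trick in the "Allow not-present guest page faults to bypass kvm" commit message: a trapping sentinel with reserved bits set for guest ptes that are present but not yet shadowed, and a non-trapping one for guest ptes that are genuinely not present. A sketch with assumed sentinel values (not taken from this file); nonpresent_spte_for is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Illustrative sentinels (assumed values):
 * - "notrap": all zero, so the present bit is clear and a guest fault
 *   on it can be delivered to the guest without exiting.
 * - "trap": present bit (bit 0) set plus high reserved bits, so any
 *   access raises a reserved-bit fault that exits to the hypervisor. */
static u64 shadow_trap_nonpresent_pte = ~0xffeULL;
static u64 shadow_notrap_nonpresent_pte = 0ULL;

static bool is_shadow_present_pte(u64 pte)
{
	return pte != shadow_trap_nonpresent_pte
		&& pte != shadow_notrap_nonpresent_pte;
}

/* Hypothetical helper: the sentinel to install for a guest pte that has
 * not been shadowed yet. */
static u64 nonpresent_spte_for(bool guest_pte_present)
{
	return guest_pte_present ? shadow_trap_nonpresent_pte
				 : shadow_notrap_nonpresent_pte;
}
```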
2008-02-23 22:44:30 +08:00
static int is_large_pte(u64 pte)
{
	return pte & PT_PAGE_SIZE_MASK;
}

2010-01-18 17:45:10 +08:00
static int is_writable_pte(unsigned long pte)
2006-12-10 18:21:36 +08:00
{
	return pte & PT_WRITABLE_MASK;
}

2009-06-10 19:12:05 +08:00
static int is_dirty_gpte(unsigned long pte)
2007-10-11 18:32:30 +08:00
{
2009-06-10 17:56:54 +08:00
	return pte & PT_DIRTY_MASK;
2007-10-11 18:32:30 +08:00
}

2009-06-10 19:12:05 +08:00
static int is_rmap_spte(u64 pte)
2007-01-06 08:36:38 +08:00
{
2008-03-23 18:18:19 +08:00
	return is_shadow_present_pte(pte);
2007-01-06 08:36:38 +08:00
}

2009-06-10 23:27:03 +08:00
static int is_last_spte(u64 pte, int level)
{
	if (level == PT_PAGE_TABLE_LEVEL)
		return 1;
2009-07-27 22:30:44 +08:00
	if (is_large_pte(pte))
2009-06-10 23:27:03 +08:00
		return 1;
	return 0;
}

2008-04-03 03:46:56 +08:00
static pfn_t spte_to_pfn(u64 pte)
2008-03-23 21:06:23 +08:00
{
2008-04-03 03:46:56 +08:00
	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
2008-03-23 21:06:23 +08:00
}

2007-11-21 19:54:47 +08:00
static gfn_t pse36_gfn_delta(u32 gpte)
{
	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;

	return (gpte & PT32_DIR_PSE36_MASK) << shift;
}
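With PSE-36, a 32-bit page directory entry for a 4 MiB page carries physical-address bits 32..35 in PDE bits 13..16. pse36_gfn_delta() moves those bits into frame-number position: the shift is 32 - 13 - 12 = 7, so PDE bit 13 lands on gfn bit 20, which is physical-address bit 32. A sketch with the macro values assumed from that layout rather than copied from this file:

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t gfn_t;

/* Assumed values matching the x86 PSE-36 layout described above. */
#define PAGE_SHIFT 12
#define PT32_DIR_PSE36_SIZE 4
#define PT32_DIR_PSE36_SHIFT 13
#define PT32_DIR_PSE36_MASK \
	(((1ULL << PT32_DIR_PSE36_SIZE) - 1) << PT32_DIR_PSE36_SHIFT)

static gfn_t pse36_gfn_delta(u32 gpte)
{
	/* 32 - 13 - 12 = 7: PDE bits 13..16 become gfn bits 20..23. */
	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;

	return (gpte & PT32_DIR_PSE36_MASK) << shift;
}
```

All other PDE bits are masked off, so only the four high address bits contribute to the delta.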
2009-06-10 19:24:23 +08:00
static void __set_spte(u64 *sptep, u64 spte)
2007-05-31 20:46:04 +08:00
{
2010-08-07 03:18:11 +08:00
	set_64bit(sptep, spte);
2007-05-31 20:46:04 +08:00
}
2010-06-06 19:48:06 +08:00
static u64 __xchg_spte(u64 *sptep, u64 new_spte)
{
#ifdef CONFIG_X86_64
	return xchg(sptep, new_spte);
#else
	u64 old_spte;

	do {
		old_spte = *sptep;
	} while (cmpxchg64(sptep, old_spte, new_spte) != old_spte);

	return old_spte;
#endif
}
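On 32-bit hosts __xchg_spte() cannot use a native 64-bit xchg, so it retries cmpxchg64() until no other writer has changed the spte in between. The same retry pattern in portable C11, as a sketch; xchg_via_cas is a hypothetical name and C11 atomics stand in for the kernel primitives:

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t u64;

/* Emulate an atomic exchange with compare-and-swap and return the value
 * that was replaced. */
static u64 xchg_via_cas(_Atomic u64 *p, u64 new_val)
{
	u64 old = atomic_load(p);

	/* On failure atomic_compare_exchange_weak() stores the current value
	 * into 'old', so the loop simply retries with fresh data. */
	while (!atomic_compare_exchange_weak(p, &old, new_val))
		;
	return old;
}
```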
2010-08-02 16:14:04 +08:00
static bool spte_has_volatile_bits(u64 spte)
{
	if (!shadow_accessed_mask)
		return false;

	if (!is_shadow_present_pte(spte))
		return false;

2010-08-02 16:15:08 +08:00
	if ((spte & shadow_accessed_mask) &&
	      (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
2010-08-02 16:14:04 +08:00
		return false;

	return true;
}
2010-08-02 16:15:08 +08:00
static bool spte_is_bit_cleared(u64 old_spte, u64 new_spte, u64 bit_mask)
{
	return (old_spte & bit_mask) && !(new_spte & bit_mask);
}

2010-06-06 20:46:44 +08:00
static void update_spte(u64 *sptep, u64 new_spte)
{
2010-08-02 16:15:08 +08:00
	u64 mask, old_spte = *sptep;

	WARN_ON(!is_rmap_spte(new_spte));
2010-06-06 20:46:44 +08:00

2010-08-02 16:15:08 +08:00
	new_spte |= old_spte & shadow_dirty_mask;

	mask = shadow_accessed_mask;
	if (is_writable_pte(old_spte))
		mask |= shadow_dirty_mask;

	if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
2010-06-06 20:46:44 +08:00
		__set_spte(sptep, new_spte);
2010-08-02 16:15:08 +08:00
	else
2010-06-06 20:46:44 +08:00
		old_spte = __xchg_spte(sptep, new_spte);
2010-08-02 16:15:08 +08:00

	if (!shadow_accessed_mask)
		return;

	if (spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask))
		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
	if (spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask))
		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
2010-06-06 20:46:44 +08:00
}
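update_spte() uses a plain store unless the CPU could still set the accessed bit (or the dirty bit, for a writable spte) concurrently and the new value would not already carry those bits; only then does it pay for an atomic exchange so the old A/D state is not lost. A sketch of just that decision, with hypothetical bit positions and the present-pte check omitted for brevity:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical bit positions chosen for illustration only. */
#define DEMO_WRITABLE (1ULL << 1)
#define DEMO_ACCESSED (1ULL << 5)
#define DEMO_DIRTY    (1ULL << 6)

/* Mirrors spte_has_volatile_bits(): nothing can change under us once A is
 * set and either the spte is read-only or D is already set. */
static bool has_volatile_bits(u64 spte)
{
	if ((spte & DEMO_ACCESSED) &&
	    (!(spte & DEMO_WRITABLE) || (spte & DEMO_DIRTY)))
		return false;
	return true;
}

/* True when update_spte() would need __xchg_spte() instead of a store. */
static bool needs_atomic_xchg(u64 old_spte, u64 new_spte)
{
	u64 mask = DEMO_ACCESSED;

	new_spte |= old_spte & DEMO_DIRTY;	/* dirty bit is carried over */
	if (old_spte & DEMO_WRITABLE)
		mask |= DEMO_DIRTY;
	return has_volatile_bits(old_spte) && (new_spte & mask) != mask;
}
```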
2007-01-06 08:36:54 +08:00
static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
2007-09-10 16:28:17 +08:00
				  struct kmem_cache *base_cache, int min)
2007-01-06 08:36:53 +08:00
{
	void *obj;

	if (cache->nobjs >= min)
2007-01-06 08:36:54 +08:00
		return 0;
2007-01-06 08:36:53 +08:00
	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
2007-09-10 16:28:17 +08:00
		obj = kmem_cache_zalloc(base_cache, GFP_KERNEL);
2007-01-06 08:36:53 +08:00
		if (!obj)
2007-01-06 08:36:54 +08:00
			return -ENOMEM;
2007-01-06 08:36:53 +08:00
		cache->objects[cache->nobjs++] = obj;
	}
2007-01-06 08:36:54 +08:00
	return 0;
2007-01-06 08:36:53 +08:00
}
2010-05-13 10:06:02 +08:00
static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
				  struct kmem_cache *cache)
2007-01-06 08:36:53 +08:00
{
	while (mc->nobjs)
2010-05-13 10:06:02 +08:00
		kmem_cache_free(cache, mc->objects[--mc->nobjs]);
2007-01-06 08:36:53 +08:00
}

2007-07-20 13:18:27 +08:00
static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
2007-09-10 16:28:17 +08:00
				       int min)
2007-07-20 13:18:27 +08:00
{
2011-03-04 19:01:10 +08:00
	void *page;
2007-07-20 13:18:27 +08:00

	if (cache->nobjs >= min)
		return 0;
	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
2011-03-04 19:01:10 +08:00
		page = (void *)__get_free_page(GFP_KERNEL);
2007-07-20 13:18:27 +08:00
		if (!page)
			return -ENOMEM;
2011-03-04 19:01:10 +08:00
		cache->objects[cache->nobjs++] = page;
2007-07-20 13:18:27 +08:00
	}
	return 0;
}

static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc)
{
	while (mc->nobjs)
2007-07-21 14:06:46 +08:00
		free_page((unsigned long)mc->objects[--mc->nobjs]);
2007-07-20 13:18:27 +08:00
}
2007-09-10 16:28:17 +08:00
static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
2007-01-06 08:36:53 +08:00
{
2007-01-06 08:36:54 +08:00
	int r;

2007-12-13 23:50:52 +08:00
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_chain_cache,
2007-09-10 16:28:17 +08:00
				   pte_chain_cache, 4);
2007-01-06 08:36:54 +08:00
	if (r)
		goto out;
2007-12-13 23:50:52 +08:00
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_rmap_desc_cache,
2010-08-22 19:12:48 +08:00
				   rmap_desc_cache, 4 + PTE_PREFETCH_NUM);
2007-05-30 17:34:53 +08:00
	if (r)
		goto out;
2007-12-13 23:50:52 +08:00
	r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
2007-05-30 17:34:53 +08:00
	if (r)
		goto out;
2007-12-13 23:50:52 +08:00
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
2007-09-10 16:28:17 +08:00
				   mmu_page_header_cache, 4);
2007-01-06 08:36:54 +08:00
out:
	return r;
2007-01-06 08:36:53 +08:00
}

static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
2010-05-13 10:06:02 +08:00
	mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache, pte_chain_cache);
	mmu_free_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, rmap_desc_cache);
2007-12-13 23:50:52 +08:00
	mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
2010-05-13 10:06:02 +08:00
	mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache,
				mmu_page_header_cache);
2007-01-06 08:36:53 +08:00
}

static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
				    size_t size)
{
	void *p;

	BUG_ON(!mc->nobjs);
	p = mc->objects[--mc->nobjs];
	return p;
}
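The topup and alloc functions follow a preallocate-then-consume pattern: mmu_topup_memory_caches() fills the per-vcpu caches with GFP_KERNEL allocations that may sleep or fail, and mmu_memory_cache_alloc() later only pops a preallocated object, so the lock-held path never has to allocate. A userspace sketch of the same idea, assuming calloc/free in place of kmem_cache_zalloc/kmem_cache_free and an arbitrary capacity; all demo_* names are hypothetical:

```c
#include <errno.h>
#include <stdlib.h>

#define DEMO_CACHE_CAP 40	/* assumed capacity, for illustration */

struct demo_cache {
	int nobjs;
	void *objects[DEMO_CACHE_CAP];
};

/* Refill to capacity, but only when fewer than 'min' objects remain.
 * Called where allocation is allowed to block or fail. */
static int demo_topup(struct demo_cache *c, size_t obj_size, int min)
{
	if (c->nobjs >= min)
		return 0;
	while (c->nobjs < DEMO_CACHE_CAP) {
		void *obj = calloc(1, obj_size);	/* zeroed, like kmem_cache_zalloc */

		if (!obj)
			return -ENOMEM;
		c->objects[c->nobjs++] = obj;
	}
	return 0;
}

/* Later, where failure is not an option: pop a preallocated object. */
static void *demo_alloc(struct demo_cache *c)
{
	if (c->nobjs == 0)
		abort();	/* plays the role of BUG_ON(!mc->nobjs) */
	return c->objects[--c->nobjs];
}

static void demo_free_all(struct demo_cache *c)
{
	while (c->nobjs)
		free(c->objects[--c->nobjs]);
}
```

Because topup is skipped while at least `min` objects remain, the caller only needs to size `min` for the worst case of a single fault.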
static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu)
{
2007-12-13 23:50:52 +08:00
	return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_chain_cache,
2007-01-06 08:36:53 +08:00
				      sizeof(struct kvm_pte_chain));
}

2007-07-17 18:04:56 +08:00
static void mmu_free_pte_chain(struct kvm_pte_chain *pc)
2007-01-06 08:36:53 +08:00
{
2010-05-13 10:06:02 +08:00
	kmem_cache_free(pte_chain_cache, pc);
2007-01-06 08:36:53 +08:00
}

static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu)
{
2007-12-13 23:50:52 +08:00
	return mmu_memory_cache_alloc(&vcpu->arch.mmu_rmap_desc_cache,
2007-01-06 08:36:53 +08:00
				      sizeof(struct kvm_rmap_desc));
}

2007-07-17 18:04:56 +08:00
static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
2007-01-06 08:36:53 +08:00
{
2010-05-13 10:06:02 +08:00
	kmem_cache_free(rmap_desc_cache, rd);
2007-01-06 08:36:53 +08:00
}
2010-05-26 16:49:59 +08:00
static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
{
	if (!sp->role.direct)
		return sp->gfns[index];

	return sp->gfn + (index << ((sp->role.level - 1) * PT64_LEVEL_BITS));
}
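For a direct shadow page, no per-entry gfn array is needed: entry `index` of a level-`level` table covers 512^(level-1) guest frames, so its gfn is the page's base gfn plus `index` shifted by `(level - 1) * 9`. A worked sketch, assuming PT64_LEVEL_BITS is 9 as on x86-64; direct_sp_gfn is a hypothetical name, and widening `index` to gfn_t before the shift is a choice made here, not something the code above does:

```c
#include <stdint.h>

typedef uint64_t gfn_t;

#define PT64_LEVEL_BITS 9	/* assumed: 512 entries per table level */

/* gfn mapped by entry 'index' of a direct shadow page at 'level'
 * whose first entry maps base_gfn. */
static gfn_t direct_sp_gfn(gfn_t base_gfn, int level, int index)
{
	return base_gfn + ((gfn_t)index << ((level - 1) * PT64_LEVEL_BITS));
}
```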
static void kvm_mmu_page_set_gfn(struct kvm_mmu_page *sp, int index, gfn_t gfn)
{
	if (sp->role.direct)
		BUG_ON(gfn != kvm_mmu_page_get_gfn(sp, index));
	else
		sp->gfns[index] = gfn;
}

2008-02-23 22:44:30 +08:00
/*
2010-12-07 11:59:07 +08:00
 * Return the pointer to the large page information for a given gfn,
 * handling slots that are not large page aligned.
2008-02-23 22:44:30 +08:00
 */
2010-12-07 11:59:07 +08:00
static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
					      struct kvm_memory_slot *slot,
					      int level)
2008-02-23 22:44:30 +08:00
{
	unsigned long idx;

2010-07-01 22:00:11 +08:00
	idx = (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
	      (slot->base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
2010-12-07 11:59:07 +08:00
	return &slot->lpage_info[level - 2][idx];
2008-02-23 22:44:30 +08:00
}
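lpage_info_slot() rounds both the gfn and the slot's base gfn down to a large-page boundary before subtracting, so each bucket corresponds to one real large page even when the slot does not start on a large-page boundary. Computing `(gfn - base_gfn) >> shift` instead would place gfn 0x200 in the same bucket as gfn 0x1ff for a slot starting at 0x100. A sketch, assuming the x86 value of KVM_HPAGE_GFN_SHIFT(level) is `(level - 1) * 9`; DEMO_HPAGE_GFN_SHIFT and lpage_index are hypothetical names:

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumed: a level-2 page spans 512 gfns, a level-3 page 512 * 512. */
#define DEMO_HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Bucket index of gfn, counted from the bucket holding the slot's
 * first gfn. */
static unsigned long lpage_index(gfn_t gfn, gfn_t base_gfn, int level)
{
	return (gfn >> DEMO_HPAGE_GFN_SHIFT(level)) -
	       (base_gfn >> DEMO_HPAGE_GFN_SHIFT(level));
}
```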
static void account_shadowed(struct kvm *kvm, gfn_t gfn)
{
2009-07-27 22:30:43 +08:00
	struct kvm_memory_slot *slot;
2010-12-07 11:59:07 +08:00
	struct kvm_lpage_info *linfo;
2009-07-27 22:30:43 +08:00
	int i;
2008-02-23 22:44:30 +08:00

2010-06-21 16:44:20 +08:00
	slot = gfn_to_memslot(kvm, gfn);
2009-07-27 22:30:43 +08:00
	for (i = PT_DIRECTORY_LEVEL;
	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
2010-12-07 11:59:07 +08:00
		linfo = lpage_info_slot(gfn, slot, i);
		linfo->write_count += 1;
2009-07-27 22:30:43 +08:00
	}
2008-02-23 22:44:30 +08:00
}

static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
{
2009-07-27 22:30:43 +08:00
	struct kvm_memory_slot *slot;
2010-12-07 11:59:07 +08:00
	struct kvm_lpage_info *linfo;
2009-07-27 22:30:43 +08:00
	int i;
2008-02-23 22:44:30 +08:00

2010-06-21 16:44:20 +08:00
	slot = gfn_to_memslot(kvm, gfn);
2009-07-27 22:30:43 +08:00
	for (i = PT_DIRECTORY_LEVEL;
	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
2010-12-07 11:59:07 +08:00
		linfo = lpage_info_slot(gfn, slot, i);
		linfo->write_count -= 1;
		WARN_ON(linfo->write_count < 0);
2009-07-27 22:30:43 +08:00
	}
2008-02-23 22:44:30 +08:00
}
2009-07-27 22:30:43 +08:00
static int has_wrprotected_page(struct kvm *kvm,
				gfn_t gfn,
				int level)
2008-02-23 22:44:30 +08:00
{
2008-10-03 22:40:32 +08:00
	struct kvm_memory_slot *slot;
2010-12-07 11:59:07 +08:00
	struct kvm_lpage_info *linfo;
2008-02-23 22:44:30 +08:00

2010-06-21 16:44:20 +08:00
	slot = gfn_to_memslot(kvm, gfn);
2008-02-23 22:44:30 +08:00
	if (slot) {
2010-12-07 11:59:07 +08:00
		linfo = lpage_info_slot(gfn, slot, level);
		return linfo->write_count;
2008-02-23 22:44:30 +08:00
	}

	return 1;
}
|
|
|
|
|
2009-07-27 22:30:43 +08:00
|
|
|
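The three helpers above keep one write counter per large-page frame: account_shadowed() bumps the counter at every huge-page level that covers a gfn whose guest page table is being shadowed, and has_wrprotected_page() reports a nonzero count so that no large mapping is built over it. The userspace sketch below models only that bookkeeping. The model_* names, the fixed 64-entry arrays and the x86-style shift of 9 extra gfn bits per level are assumptions for illustration, not the real lpage_info_slot() layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: level 1 = 4 KiB, level 2 = 2 MiB, level 3 = 1 GiB. */
#define MODEL_PT_PAGE_TABLE_LEVEL 1
#define MODEL_PT_DIRECTORY_LEVEL  2
#define MODEL_NR_PAGE_SIZES       3
/* Assumed x86 layout: each level up covers 9 more gfn bits. */
#define MODEL_HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)
#define MODEL_MAX_FRAMES 64

typedef uint64_t model_gfn_t;

struct model_slot {
	model_gfn_t base_gfn;
	/* write_count[level - 2][frame] for levels 2 and 3 */
	int write_count[MODEL_NR_PAGE_SIZES - 1][MODEL_MAX_FRAMES];
};

/* Counter for the large frame at @level that contains @gfn. */
static int *model_write_count(struct model_slot *slot, model_gfn_t gfn, int level)
{
	unsigned int shift = MODEL_HPAGE_GFN_SHIFT(level);
	model_gfn_t idx = (gfn >> shift) - (slot->base_gfn >> shift);

	assert(idx < MODEL_MAX_FRAMES);
	return &slot->write_count[level - MODEL_PT_DIRECTORY_LEVEL][idx];
}

static void model_account_shadowed(struct model_slot *slot, model_gfn_t gfn)
{
	for (int i = MODEL_PT_DIRECTORY_LEVEL;
	     i < MODEL_PT_PAGE_TABLE_LEVEL + MODEL_NR_PAGE_SIZES; ++i)
		*model_write_count(slot, gfn, i) += 1;
}

static void model_unaccount_shadowed(struct model_slot *slot, model_gfn_t gfn)
{
	for (int i = MODEL_PT_DIRECTORY_LEVEL;
	     i < MODEL_PT_PAGE_TABLE_LEVEL + MODEL_NR_PAGE_SIZES; ++i) {
		int *count = model_write_count(slot, gfn, i);

		*count -= 1;
		assert(*count >= 0); /* mirrors WARN_ON(write_count < 0) */
	}
}

/* Nonzero means a mapping of size @level must not cover @gfn. */
static int model_has_wrprotected_page(struct model_slot *slot, model_gfn_t gfn, int level)
{
	return *model_write_count(slot, gfn, level);
}
```

Shadowing gfn 5 blocks the 2 MiB frame covering gfns 0..511 and the 1 GiB frame covering it, while the neighbouring 2 MiB frame starting at gfn 512 stays eligible.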
static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
{
	unsigned long page_size;
	int i, ret = 0;

	page_size = kvm_host_page_size(kvm, gfn);

	for (i = PT_PAGE_TABLE_LEVEL;
	     i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
		if (page_size >= KVM_HPAGE_SIZE(i))
			ret = i;
		else
			break;
	}

	return ret;
}

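host_mapping_level() returns the largest page-table level whose page size still fits inside the host page that backs the gfn. The sketch below reproduces that loop with plain numbers. It assumes x86 sizes (4 KiB base pages, 9 extra address bits per level) and takes the host page size as a parameter instead of calling kvm_host_page_size(); the model_* names are hypothetical.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT          12
#define MODEL_PT_PAGE_TABLE_LEVEL 1
#define MODEL_NR_PAGE_SIZES       3
/* Assumed x86 sizes: level 1 = 4 KiB, level 2 = 2 MiB, level 3 = 1 GiB. */
#define MODEL_HPAGE_SIZE(level) (1UL << (MODEL_PAGE_SHIFT + ((level) - 1) * 9))

/* Largest level whose page size is <= host_page_size; 0 if none fits. */
static int model_host_mapping_level(unsigned long host_page_size)
{
	int i, ret = 0;

	for (i = MODEL_PT_PAGE_TABLE_LEVEL;
	     i < MODEL_PT_PAGE_TABLE_LEVEL + MODEL_NR_PAGE_SIZES; ++i) {
		if (host_page_size >= MODEL_HPAGE_SIZE(i))
			ret = i;
		else
			break;
	}
	assert(ret >= 0 && ret <= MODEL_NR_PAGE_SIZES);
	return ret;
}
```

A 1 GiB host page yields level 3, while a 1 MiB region (bigger than 4 KiB but smaller than 2 MiB) still yields only level 1.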
static struct kvm_memory_slot *
gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
			    bool no_dirty_log)
{
	struct kvm_memory_slot *slot;

	slot = gfn_to_memslot(vcpu->kvm, gfn);
	if (!slot || slot->flags & KVM_MEMSLOT_INVALID ||
	      (no_dirty_log && slot->dirty_bitmap))
		slot = NULL;

	return slot;
}

static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
{
	return gfn_to_memslot_dirty_bitmap(vcpu, large_gfn, true);
}

static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
{
	int host_level, level, max_level;

	host_level = host_mapping_level(vcpu->kvm, large_gfn);

	if (host_level == PT_PAGE_TABLE_LEVEL)
		return host_level;

	max_level = kvm_x86_ops->get_lpage_level() < host_level ?
		kvm_x86_ops->get_lpage_level() : host_level;

	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
		if (has_wrprotected_page(vcpu->kvm, large_gfn, level))
			break;

	return level - 1;
}

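mapping_level() caps the host level at what the hardware supports, then climbs from the 2 MiB level upwards and stops one level below the first size that would cover a write-protected (shadowed) gfn. In the sketch below, hw_max_level stands in for kvm_x86_ops->get_lpage_level() and a small array stands in for has_wrprotected_page(); both substitutions and the model_* names are assumptions.

```c
#include <assert.h>

#define MODEL_PT_PAGE_TABLE_LEVEL 1
#define MODEL_PT_DIRECTORY_LEVEL  2
#define MODEL_MAX_LEVEL           3

/*
 * host_level:   result of the host_mapping_level() step
 * hw_max_level: stand-in for kvm_x86_ops->get_lpage_level()
 * wrprotected:  wrprotected[level] != 0 models has_wrprotected_page()
 */
static int model_mapping_level(int host_level, int hw_max_level,
			       const int wrprotected[MODEL_MAX_LEVEL + 1])
{
	int level, max_level;

	assert(hw_max_level >= MODEL_PT_PAGE_TABLE_LEVEL &&
	       hw_max_level <= MODEL_MAX_LEVEL);
	if (host_level == MODEL_PT_PAGE_TABLE_LEVEL)
		return host_level;

	max_level = hw_max_level < host_level ? hw_max_level : host_level;

	/* Climb until a level is blocked by a write-protected gfn. */
	for (level = MODEL_PT_DIRECTORY_LEVEL; level <= max_level; ++level)
		if (wrprotected[level])
			break;

	return level - 1;
}
```

With a 1 GiB host page and hardware that supports 1 GiB mappings, a shadowed page table anywhere in the 2 MiB frame forces a 4 KiB mapping, whereas one elsewhere in the 1 GiB frame only limits the mapping to 2 MiB.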
/*
 * Take gfn and return the reverse mapping to it.
 */
static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
{
	struct kvm_memory_slot *slot;
	struct kvm_lpage_info *linfo;

	slot = gfn_to_memslot(kvm, gfn);
	if (likely(level == PT_PAGE_TABLE_LEVEL))
		return &slot->rmap[gfn - slot->base_gfn];

	linfo = lpage_info_slot(gfn, slot, level);

	return &linfo->rmap_pde;
}

/*
 * Reverse mapping data structures:
 *
 * If rmapp bit zero is zero, then rmapp points to the shadow page table entry
 * that points to page_address(page).
 *
 * If rmapp bit zero is one, then (rmapp & ~1) points to a struct kvm_rmap_desc
 * containing more mappings.
 *
 * Returns the number of rmap entries before the spte was added or zero if
 * the spte was not added.
 *
 */
static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
{
	struct kvm_mmu_page *sp;
	struct kvm_rmap_desc *desc;
	unsigned long *rmapp;
	int i, count = 0;

	if (!is_rmap_spte(*spte))
		return count;
	sp = page_header(__pa(spte));
	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
	if (!*rmapp) {
		rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
		*rmapp = (unsigned long)spte;
	} else if (!(*rmapp & 1)) {
		rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte);
		desc = mmu_alloc_rmap_desc(vcpu);
		desc->sptes[0] = (u64 *)*rmapp;
		desc->sptes[1] = spte;
		*rmapp = (unsigned long)desc | 1;
		++count;
	} else {
		rmap_printk("rmap_add: %p %llx many->many\n", spte, *spte);
		desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
		while (desc->sptes[RMAP_EXT-1] && desc->more) {
			desc = desc->more;
			count += RMAP_EXT;
		}
		if (desc->sptes[RMAP_EXT-1]) {
			desc->more = mmu_alloc_rmap_desc(vcpu);
			desc = desc->more;
		}
		for (i = 0; desc->sptes[i]; ++i)
			++count;
		desc->sptes[i] = spte;
	}
	return count;
}

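The comment above describes a tagged-pointer encoding: a lone spte pointer is stored directly in the rmap word, and bit 0 set means the rest of the word points to a chain of descriptors. This only works because spte pointers are at least 2-byte aligned, so bit 0 is always free for the tag. The sketch below models the 0->1, 1->many and many->many transitions in userspace. The model_* names, calloc() in place of mmu_alloc_rmap_desc(), uintptr_t in place of unsigned long, and a descriptor capacity of 4 are assumptions; the count that rmap_add() returns is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_RMAP_EXT 4 /* assumed descriptor capacity */

struct model_rmap_desc {
	uint64_t *sptes[MODEL_RMAP_EXT];
	struct model_rmap_desc *more;
};

/* Append @spte to the rmap word *rmapp using the bit-0 tagging scheme. */
static void model_rmap_add(uintptr_t *rmapp, uint64_t *spte)
{
	struct model_rmap_desc *desc;
	int i;

	/* The tag lives in bit 0, so spte pointers must be at least 2-byte aligned. */
	assert(((uintptr_t)spte & 1) == 0);

	if (!*rmapp) {
		/* 0 -> 1: store the spte pointer itself. */
		*rmapp = (uintptr_t)spte;
	} else if (!(*rmapp & 1)) {
		/* 1 -> many: move the old pointer and the new one into a descriptor. */
		desc = calloc(1, sizeof(*desc));
		assert(desc != NULL);
		desc->sptes[0] = (uint64_t *)*rmapp;
		desc->sptes[1] = spte;
		*rmapp = (uintptr_t)desc | 1;
	} else {
		/* many -> many: walk to a non-full descriptor, chaining a new one if needed. */
		desc = (struct model_rmap_desc *)(*rmapp & ~(uintptr_t)1);
		while (desc->sptes[MODEL_RMAP_EXT - 1] && desc->more)
			desc = desc->more;
		if (desc->sptes[MODEL_RMAP_EXT - 1]) {
			desc->more = calloc(1, sizeof(*desc));
			assert(desc->more != NULL);
			desc = desc->more;
		}
		for (i = 0; desc->sptes[i]; ++i)
			;
		desc->sptes[i] = spte;
	}
}

/* Count the sptes reachable from the rmap word @rmap. */
static int model_rmap_count(uintptr_t rmap)
{
	struct model_rmap_desc *desc;
	int i, n = 0;

	if (!rmap)
		return 0;
	if (!(rmap & 1))
		return 1;
	for (desc = (struct model_rmap_desc *)(rmap & ~(uintptr_t)1); desc; desc = desc->more)
		for (i = 0; i < MODEL_RMAP_EXT && desc->sptes[i]; ++i)
			++n;
	return n;
}
```

The common case of a gfn mapped by a single spte therefore costs no allocation at all; a descriptor is only created once a second mapping appears.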
static void rmap_desc_remove_entry(unsigned long *rmapp,
				   struct kvm_rmap_desc *desc,
				   int i,
				   struct kvm_rmap_desc *prev_desc)
{
	int j;

	for (j = RMAP_EXT - 1; !desc->sptes[j] && j > i; --j)
		;
	desc->sptes[i] = desc->sptes[j];
	desc->sptes[j] = NULL;
	if (j != 0)
		return;
	if (!prev_desc && !desc->more)
		*rmapp = (unsigned long)desc->sptes[0];
	else
		if (prev_desc)
			prev_desc->more = desc->more;
		else
			*rmapp = (unsigned long)desc->more | 1;
	mmu_free_rmap_desc(desc);
}

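rmap_desc_remove_entry() keeps each descriptor densely packed by moving the last occupied slot into the hole instead of shifting every later entry down; the descriptor is unlinked and freed only when the slot that ends up cleared is slot 0 (j == 0). The array-only sketch below isolates that compaction step. The model_* name and the capacity of 4 are assumptions, and the unlinking of empty descriptors is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RMAP_EXT 4 /* assumed descriptor capacity */

/*
 * Remove sptes[i] by moving the last occupied slot into it. Returns how many
 * entries remain, so 0 tells the caller the descriptor is now empty.
 */
static int model_desc_remove_entry(void *sptes[MODEL_RMAP_EXT], int i)
{
	int j;

	assert(i >= 0 && i < MODEL_RMAP_EXT && sptes[i] != NULL);
	for (j = MODEL_RMAP_EXT - 1; !sptes[j] && j > i; --j)
		;
	sptes[i] = sptes[j];
	sptes[j] = NULL;
	return j;
}
```

Removal is O(RMAP_EXT) per descriptor and never leaves holes, which is what lets rmap_add() and rmap_next() stop at the first NULL slot.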
static void rmap_remove(struct kvm *kvm, u64 *spte)
{
	struct kvm_rmap_desc *desc;
	struct kvm_rmap_desc *prev_desc;
	struct kvm_mmu_page *sp;
	gfn_t gfn;
	unsigned long *rmapp;
	int i;

	sp = page_header(__pa(spte));
	gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
	rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
	if (!*rmapp) {
		printk(KERN_ERR "rmap_remove: %p 0->BUG\n", spte);
		BUG();
	} else if (!(*rmapp & 1)) {
		rmap_printk("rmap_remove: %p 1->0\n", spte);
		if ((u64 *)*rmapp != spte) {
			printk(KERN_ERR "rmap_remove: %p 1->BUG\n", spte);
			BUG();
		}
		*rmapp = 0;
	} else {
		rmap_printk("rmap_remove: %p many->many\n", spte);
		desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
		prev_desc = NULL;
		while (desc) {
			for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i)
				if (desc->sptes[i] == spte) {
					rmap_desc_remove_entry(rmapp,
							       desc, i,
							       prev_desc);
					return;
				}
			prev_desc = desc;
			desc = desc->more;
		}
		pr_err("rmap_remove: %p many->many\n", spte);
		BUG();
	}
}

static int set_spte_track_bits(u64 *sptep, u64 new_spte)
{
	pfn_t pfn;
	u64 old_spte = *sptep;

	if (!spte_has_volatile_bits(old_spte))
		__set_spte(sptep, new_spte);
	else
		old_spte = __xchg_spte(sptep, new_spte);

	if (!is_rmap_spte(old_spte))
		return 0;

	pfn = spte_to_pfn(old_spte);
	if (!shadow_accessed_mask || old_spte & shadow_accessed_mask)
		kvm_set_pfn_accessed(pfn);
	if (!shadow_dirty_mask || (old_spte & shadow_dirty_mask))
		kvm_set_pfn_dirty(pfn);
	return 1;
}

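set_spte_track_bits() must not lose accessed or dirty bits that the CPU may set between reading the old spte and storing the new one. When the old value still has such "volatile" bits, it swaps atomically and inspects the value that was actually replaced. The C11 sketch below shows that pattern. It assumes x86 bit positions (present = 0, writable = 1, accessed = 5, dirty = 6), assumes the accessed and dirty bits exist (so it omits the !shadow_accessed_mask case), treats "present" as a simplification of is_rmap_spte(), and returns flags in place of calling kvm_set_pfn_accessed() and kvm_set_pfn_dirty(); all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_PRESENT  (1ULL << 0)
#define MODEL_WRITABLE (1ULL << 1)
#define MODEL_ACCESSED (1ULL << 5)
#define MODEL_DIRTY    (1ULL << 6)

enum { MODEL_MARK_ACCESSED = 1, MODEL_MARK_DIRTY = 2, MODEL_WAS_MAPPED = 4 };

/* Could hardware still set A, or D on a writable entry, behind our back? */
static int model_has_volatile_bits(uint64_t spte)
{
	if (!(spte & MODEL_PRESENT))
		return 0;
	if ((spte & MODEL_ACCESSED) &&
	    (!(spte & MODEL_WRITABLE) || (spte & MODEL_DIRTY)))
		return 0;
	return 1;
}

/* Install @new_spte and report what the replaced entry implies for the page. */
static int model_set_spte_track_bits(_Atomic uint64_t *sptep, uint64_t new_spte)
{
	uint64_t old = atomic_load(sptep);
	int marks;

	if (!model_has_volatile_bits(old))
		atomic_store(sptep, new_spte);          /* nothing can change under us */
	else
		old = atomic_exchange(sptep, new_spte); /* capture bits set meanwhile */

	if (!(old & MODEL_PRESENT))
		return 0;                               /* caller need not touch the rmap */
	marks = MODEL_WAS_MAPPED;
	if (old & MODEL_ACCESSED)
		marks |= MODEL_MARK_ACCESSED;
	if (old & MODEL_DIRTY)
		marks |= MODEL_MARK_DIRTY;
	return marks;
}
```

The plain store is only safe once both bits the CPU could still set are already set (or cannot be set), which is why the cheaper path is taken only when model_has_volatile_bits() returns 0.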
static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
{
	if (set_spte_track_bits(sptep, new_spte))
		rmap_remove(kvm, sptep);
}

static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
{
	struct kvm_rmap_desc *desc;
	u64 *prev_spte;
	int i;

	if (!*rmapp)
		return NULL;
	else if (!(*rmapp & 1)) {
		if (!spte)
			return (u64 *)*rmapp;
		return NULL;
	}
	desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
	prev_spte = NULL;
	while (desc) {
		for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) {
			if (prev_spte == spte)
				return desc->sptes[i];
			prev_spte = desc->sptes[i];
		}
		desc = desc->more;
	}
	return NULL;
}

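rmap_next() is a stateless cursor: passing NULL yields the first spte, and passing the previous result yields the one after it, at the cost of rescanning the chain from the head on every call. This is also why kvm_unmap_rmapp(), kvm_set_pte_rmapp() and the large-page branch of rmap_write_protect() restart from NULL after dropping an entry: the dropped pointer is no longer in the chain, so using it as the cursor would end the walk early. The sketch below models the cursor over a hand-built chain; the model_* names, uintptr_t for the rmap word and the capacity of 4 are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_RMAP_EXT 4 /* assumed descriptor capacity */

struct model_rmap_desc {
	uint64_t *sptes[MODEL_RMAP_EXT];
	struct model_rmap_desc *more;
};

/* Return the spte after @spte, or the first one if @spte is NULL. */
static uint64_t *model_rmap_next(uintptr_t rmap, uint64_t *spte)
{
	struct model_rmap_desc *desc;
	uint64_t *prev_spte = NULL;
	int i;

	if (!rmap)
		return NULL;
	if (!(rmap & 1))
		return spte ? NULL : (uint64_t *)rmap; /* single untagged entry */

	for (desc = (struct model_rmap_desc *)(rmap & ~(uintptr_t)1); desc; desc = desc->more)
		for (i = 0; i < MODEL_RMAP_EXT && desc->sptes[i]; ++i) {
			if (prev_spte == spte)
				return desc->sptes[i];
			prev_spte = desc->sptes[i];
		}
	return NULL;
}
```

A full walk over n entries therefore costs O(n^2) comparisons, which is acceptable while the chains stay short and is one reason rmap_recycle() caps them.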
static int rmap_write_protect(struct kvm *kvm, u64 gfn)
{
	unsigned long *rmapp;
	u64 *spte;
	int i, write_protected = 0;

	rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL);

	spte = rmap_next(kvm, rmapp, NULL);
	while (spte) {
		BUG_ON(!spte);
		BUG_ON(!(*spte & PT_PRESENT_MASK));
		rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
		if (is_writable_pte(*spte)) {
			update_spte(spte, *spte & ~PT_WRITABLE_MASK);
			write_protected = 1;
		}
		spte = rmap_next(kvm, rmapp, spte);
	}

	/* check for huge page mappings */
	for (i = PT_DIRECTORY_LEVEL;
	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
		rmapp = gfn_to_rmap(kvm, gfn, i);
		spte = rmap_next(kvm, rmapp, NULL);
		while (spte) {
			BUG_ON(!spte);
			BUG_ON(!(*spte & PT_PRESENT_MASK));
			BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
			pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
			if (is_writable_pte(*spte)) {
				drop_spte(kvm, spte,
					  shadow_trap_nonpresent_pte);
				--kvm->stat.lpages;
				spte = NULL;
				write_protected = 1;
			}
			spte = rmap_next(kvm, rmapp, spte);
		}
	}

	return write_protected;
}

static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
			   unsigned long data)
{
	u64 *spte;
	int need_tlb_flush = 0;

	while ((spte = rmap_next(kvm, rmapp, NULL))) {
		BUG_ON(!(*spte & PT_PRESENT_MASK));
		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
		drop_spte(kvm, spte, shadow_trap_nonpresent_pte);
		need_tlb_flush = 1;
	}
	return need_tlb_flush;
}

static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
			     unsigned long data)
{
	int need_flush = 0;
	u64 *spte, new_spte;
	pte_t *ptep = (pte_t *)data;
	pfn_t new_pfn;

	WARN_ON(pte_huge(*ptep));
	new_pfn = pte_pfn(*ptep);
	spte = rmap_next(kvm, rmapp, NULL);
	while (spte) {
		BUG_ON(!is_shadow_present_pte(*spte));
		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
		need_flush = 1;
		if (pte_write(*ptep)) {
			drop_spte(kvm, spte, shadow_trap_nonpresent_pte);
			spte = rmap_next(kvm, rmapp, NULL);
		} else {
			new_spte = *spte & ~PT64_BASE_ADDR_MASK;
			new_spte |= (u64)new_pfn << PAGE_SHIFT;

			new_spte &= ~PT_WRITABLE_MASK;
			new_spte &= ~SPTE_HOST_WRITEABLE;
			new_spte &= ~shadow_accessed_mask;
			set_spte_track_bits(spte, new_spte);
			spte = rmap_next(kvm, rmapp, spte);
		}
	}
	if (need_flush)
		kvm_flush_remote_tlbs(kvm);

	return 0;
}

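When the new host pte is read-only, kvm_set_pte_rmapp() rewrites each spte in place instead of dropping it: it swaps in the new pfn and clears the writable, host-writable and accessed bits, while every other bit (including dirty) is carried over. The sketch below shows that bit arithmetic on a single value. The masks are assumptions modelled on x86-64 (frame number in bits 12..51, accessed = 5, dirty = 6) and the host-writable software bit is assumed to be bit 9; the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT     12
/* Assumed layout: physical frame number in bits 12..51. */
#define MODEL_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~((1ULL << MODEL_PAGE_SHIFT) - 1))
#define MODEL_PRESENT        (1ULL << 0)
#define MODEL_WRITABLE       (1ULL << 1)
#define MODEL_ACCESSED       (1ULL << 5)
#define MODEL_DIRTY          (1ULL << 6)
#define MODEL_HOST_WRITEABLE (1ULL << 9) /* assumed software-available bit */

/* Rebuild @spte so it maps @new_pfn read-only, mirroring the else branch above. */
static uint64_t model_remap_readonly(uint64_t spte, uint64_t new_pfn)
{
	uint64_t new_spte = spte & ~MODEL_BASE_ADDR_MASK; /* keep flags, drop old pfn */

	assert(new_pfn < (1ULL << (52 - MODEL_PAGE_SHIFT)));
	new_spte |= new_pfn << MODEL_PAGE_SHIFT;
	new_spte &= ~MODEL_WRITABLE;
	new_spte &= ~MODEL_HOST_WRITEABLE;
	new_spte &= ~MODEL_ACCESSED;
	return new_spte;
}
```

Clearing both writable bits means the next guest write faults back into KVM, which can then break the read-only sharing on the host side.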
static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
			  unsigned long data,
			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
					 unsigned long data))
{
	int i, j;
	int ret;
	int retval = 0;
	struct kvm_memslots *slots;

	slots = kvm_memslots(kvm);

	for (i = 0; i < slots->nmemslots; i++) {
		struct kvm_memory_slot *memslot = &slots->memslots[i];
		unsigned long start = memslot->userspace_addr;
		unsigned long end;

		end = start + (memslot->npages << PAGE_SHIFT);
		if (hva >= start && hva < end) {
			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
			gfn_t gfn = memslot->base_gfn + gfn_offset;

			ret = handler(kvm, &memslot->rmap[gfn_offset], data);

			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
				struct kvm_lpage_info *linfo;

				linfo = lpage_info_slot(gfn, memslot,
							PT_DIRECTORY_LEVEL + j);
				ret |= handler(kvm, &linfo->rmap_pde, data);
			}
			trace_kvm_age_page(hva, memslot, ret);
			retval |= ret;
		}
	}

	return retval;
}

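kvm_handle_hva() translates a host virtual address back to a guest frame by finding a memslot whose userspace range contains it and offsetting from the slot's base_gfn. The sketch below isolates that translation. struct model_memslot is a hypothetical stand-in for struct kvm_memory_slot with only the three fields used here, and 4 KiB pages are assumed. Unlike kvm_handle_hva(), which keeps scanning and invokes the handler for every slot containing the hva, the sketch stops at the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

typedef uint64_t model_gfn_t;

/* Hypothetical stand-in for struct kvm_memory_slot. */
struct model_memslot {
	model_gfn_t base_gfn;
	unsigned long npages;
	unsigned long userspace_addr;
};

/*
 * Find a slot whose userspace range contains @hva; on success store the
 * matching gfn in *gfn and return 1, otherwise return 0.
 */
static int model_hva_to_gfn(const struct model_memslot *slots, size_t nslots,
			    unsigned long hva, model_gfn_t *gfn)
{
	assert(gfn != NULL);
	for (size_t i = 0; i < nslots; i++) {
		unsigned long start = slots[i].userspace_addr;
		unsigned long end = start + (slots[i].npages << MODEL_PAGE_SHIFT);

		if (hva >= start && hva < end) {
			*gfn = slots[i].base_gfn + ((hva - start) >> MODEL_PAGE_SHIFT);
			return 1;
		}
	}
	return 0;
}
```

The page offset within the hva is discarded by the shift, so every byte of one host page maps to the same gfn; the range check is half-open, so the first byte past a slot belongs to no slot unless another one starts there.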
int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
{
	return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
}

void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
{
	kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
}

static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
			 unsigned long data)
{
	u64 *spte;
	int young = 0;

	/*
	 * Emulate the accessed bit for EPT, by checking if this page has
	 * an EPT mapping, and clearing it if it does. On the next access,
	 * a new EPT mapping will be established.
	 * This has some overhead, but not as much as the cost of swapping
	 * out actively used pages or breaking up actively used hugepages.
	 */
	if (!shadow_accessed_mask)
		return kvm_unmap_rmapp(kvm, rmapp, data);

	spte = rmap_next(kvm, rmapp, NULL);
	while (spte) {
		int _young;
		u64 _spte = *spte;
		BUG_ON(!(_spte & PT_PRESENT_MASK));
		_young = _spte & PT_ACCESSED_MASK;
		if (_young) {
			young = 1;
			clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
		}
		spte = rmap_next(kvm, rmapp, spte);
	}
	return young;
}

static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
			      unsigned long data)
{
	u64 *spte;
	int young = 0;

	/*
	 * If there's no access bit in the secondary pte set by the
	 * hardware it's up to gup-fast/gup to set the access bit in
	 * the primary pte or in the page structure.
	 */
	if (!shadow_accessed_mask)
		goto out;

	spte = rmap_next(kvm, rmapp, NULL);
	while (spte) {
		u64 _spte = *spte;
		BUG_ON(!(_spte & PT_PRESENT_MASK));
		young = _spte & PT_ACCESSED_MASK;
		if (young) {
			young = 1;
			break;
		}
		spte = rmap_next(kvm, rmapp, spte);
	}
out:
	return young;
}

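kvm_age_rmapp() and kvm_test_age_rmapp() differ in whether they consume the accessed bit: the former reports it and clears it on every spte, so the next scan only sees new accesses, while the latter stops at the first spte with the bit set and leaves everything untouched. The sketch below models both over a plain array instead of an rmap chain. The array, the model_* names and the x86 bit positions (present = 0, accessed = 5) are assumptions; the kernel clears the bit with the atomic clear_bit(), whereas this single-threaded sketch uses a plain store, and it does not model the no-accessed-bit (EPT) paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PRESENT  (1ULL << 0)
#define MODEL_ACCESSED (1ULL << 5)

/* Test-and-clear: report whether any spte was accessed, clearing A on all. */
static int model_age(uint64_t *sptes, size_t n)
{
	int young = 0;

	for (size_t i = 0; i < n; i++) {
		assert(sptes[i] & MODEL_PRESENT);
		if (sptes[i] & MODEL_ACCESSED) {
			young = 1;
			sptes[i] &= ~MODEL_ACCESSED;
		}
	}
	return young;
}

/* Test only: stop at the first accessed spte and leave the bits alone. */
static int model_test_age(const uint64_t *sptes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(sptes[i] & MODEL_PRESENT);
		if (sptes[i] & MODEL_ACCESSED)
			return 1;
	}
	return 0;
}
```

Because the test-only variant never clears anything, calling it repeatedly gives the same answer, whereas a second aging pass reports "old" unless the guest touched the page in between.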
#define RMAP_RECYCLE_THRESHOLD 1000

static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
{
	unsigned long *rmapp;
	struct kvm_mmu_page *sp;

	sp = page_header(__pa(spte));

	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);

	kvm_unmap_rmapp(vcpu->kvm, rmapp, 0);
	kvm_flush_remote_tlbs(vcpu->kvm);
}

int kvm_age_hva(struct kvm *kvm, unsigned long hva)
{
	return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp);
}

int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
{
	return kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp);
}

#ifdef MMU_DEBUG
static int is_empty_shadow_page(u64 *spt)
{
	u64 *pos;
	u64 *end;

	for (pos = spt, end = pos + PAGE_SIZE / sizeof(u64); pos != end; pos++)
		if (is_shadow_present_pte(*pos)) {
			printk(KERN_ERR "%s: %p %llx\n", __func__,
			       pos, *pos);
			return 0;
		}
	return 1;
}
#endif
2006-12-10 18:21:36 +08:00
KVM: create aggregate kvm_total_used_mmu_pages value
Of slab shrinkers, the VM code says:
* Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
* querying the cache size, so a fastpath for that case is appropriate.
and it *means* it. Look at how it calls the shrinkers:
nr_before = (*shrinker->shrink)(0, gfp_mask);
shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
So, if you do anything stupid in your shrinker, the VM will doubly
punish you.
The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then
we're going to take 101 locks. We do it twice, so each call takes
202 locks. If we're under memory pressure, we can have each cpu
trying to do this. It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.
This is guaranteed to optimize at least half of those lock
acquisitions away. It removes the need to take any of the locks
when simply trying to count objects.
A 'percpu_counter' can be a large object, but we only have one
of these for the entire system. There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
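The counting fast path that the commit message argues for can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: `struct vm`, `mod_used_mmu_pages()`, `count_by_walking()` and `count_fast()` are hypothetical names, and a C11 `atomic_long` stands in for the kernel's `percpu_counter`, which has no userspace equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm: only the per-VM page count matters. */
struct vm {
    long n_used_mmu_pages;   /* guarded by the VM's own mmu_lock in real code */
};

/* One system-wide aggregate. The patch uses a percpu_counter; a C11 atomic
 * is used here only because it is available in userspace. */
static atomic_long total_used_mmu_pages;

/* Every change to a VM's count also updates the aggregate, in the same way
 * as the kvm_mod_used_mmu_pages() helper added by this commit. */
static void mod_used_mmu_pages(struct vm *vm, int nr)
{
    vm->n_used_mmu_pages += nr;
    atomic_fetch_add(&total_used_mmu_pages, nr);
}

/* Old approach: visit every VM (taking each VM's lock in real code). */
static long count_by_walking(const struct vm *vms, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += vms[i].n_used_mmu_pages;
    return sum;
}

/* New approach: the nr_to_scan == 0 size query reads one value, no locks. */
static long count_fast(void)
{
    return atomic_load(&total_used_mmu_pages);
}
```

Writers pay for one extra atomic add on every shadow page allocation or free, and in exchange the size query reads a single value instead of walking every VM under its lock.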
2010-08-20 09:11:37 +08:00
/*
* This value is the sum of all of the kvm instances'
* kvm->arch.n_used_mmu_pages values. We need a global,
* aggregate version in order to make the slab shrinker
* faster
*/
static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
{
kvm->arch.n_used_mmu_pages += nr;
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
}
2007-11-21 21:28:32 +08:00
static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
2007-01-06 08:36:49 +08:00
{
2007-11-21 21:28:32 +08:00
ASSERT(is_empty_shadow_page(sp->spt));
2010-06-04 21:53:54 +08:00
hlist_del(&sp->hash_link);
2007-11-21 21:28:32 +08:00
list_del(&sp->link);
2011-03-04 19:01:10 +08:00
free_page((unsigned long)sp->spt);
2010-05-26 16:49:59 +08:00
if (!sp->role.direct)
2011-03-04 19:01:10 +08:00
free_page((unsigned long)sp->gfns);
2010-05-13 10:06:02 +08:00
kmem_cache_free(mmu_page_header_cache, sp);
2010-08-20 09:11:37 +08:00
kvm_mod_used_mmu_pages(kvm, -1);
2007-01-06 08:36:49 +08:00
}
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
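The quadrant follows directly from the spans quoted above: with 4KB pages, a 32-bit guest table at level L covers 2^L times as much address space as the shadow table that mirrors it. A minimal sketch, assuming 4KB pages, 512-entry shadow tables and 1024-entry 32-bit guest tables; `struct shadow_key`, `shadow_span_shift()`, `quadrant_for_32bit_guest()` and `same_key()` are hypothetical names and do not reproduce the kernel's own role encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache key holding the fields listed in the commit message. */
struct shadow_key {
    uint64_t gfn;        /* guest page table frame number */
    unsigned glevels;    /* paging levels in the guest (real, 32-bit, PAE, long) */
    unsigned level;      /* which table level this shadow page represents */
    unsigned quadrant;   /* which slice of the guest table this shadow covers */
    int metaphysical;    /* 1 if no guest page table backs this shadow page */
};

/* log2 of the span of one 512-entry shadow table at `level`
 * (4KB pages, 9 index bits per level): 21 -> 2MB, 30 -> 1GB. */
static unsigned shadow_span_shift(unsigned level)
{
    return 12 + 9 * level;
}

/* A 32-bit guest table (10 index bits per level) spans 2^(12 + 10*level)
 * bytes, i.e. 2^level shadow spans. The quadrant picks the slice holding gaddr. */
static unsigned quadrant_for_32bit_guest(uint32_t gaddr, unsigned level)
{
    unsigned slices = 1u << level;   /* 2 at level 1, 4 at level 2 */
    return (unsigned)(gaddr >> shadow_span_shift(level)) & (slices - 1);
}

/* A cache hit requires every field of the key to match. */
static int same_key(const struct shadow_key *a, const struct shadow_key *b)
{
    return a->gfn == b->gfn && a->glevels == b->glevels &&
           a->level == b->level && a->quadrant == b->quadrant &&
           a->metaphysical == b->metaphysical;
}
```

Two cached shadow pages for the same guest frame may differ only in `level` or `quadrant`, which is why every field of the key has to take part in the comparison.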
2007-01-06 08:36:43 +08:00
static unsigned kvm_page_table_hashfn(gfn_t gfn)
{
2008-01-07 19:20:25 +08:00
return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
2007-01-06 08:36:43 +08:00
}
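kvm_page_table_hashfn() simply keeps the low bits of the gfn, so frames that differ by a multiple of the table size share a bucket. A minimal sketch of a chained hash table built on the same mask, assuming `HASH_SHIFT` = 10 as a stand-in for `KVM_MMU_HASH_SHIFT`; `struct shadow_page`, `insert_page()` and `lookup_page()` are hypothetical, and the lookup compares only the gfn, whereas a cache keyed as described in the commit message would also compare the levels, quadrant and metaphysical bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SHIFT 10                  /* assumed stand-in for KVM_MMU_HASH_SHIFT */
#define NUM_BUCKETS (1u << HASH_SHIFT)

typedef uint64_t gfn_t;

/* Hypothetical, trimmed-down shadow page: just the key and a chain link. */
struct shadow_page {
    gfn_t gfn;
    struct shadow_page *next;
};

static struct shadow_page *buckets[NUM_BUCKETS];

/* Same mask as kvm_page_table_hashfn(): keep the low HASH_SHIFT bits. */
static unsigned page_table_hashfn(gfn_t gfn)
{
    return (unsigned)(gfn & ((1u << HASH_SHIFT) - 1));
}

static void insert_page(struct shadow_page *sp)
{
    unsigned idx = page_table_hashfn(sp->gfn);
    sp->next = buckets[idx];   /* push onto the bucket's chain */
    buckets[idx] = sp;
}

/* Different gfns can land in one bucket, so the full gfn is compared. */
static struct shadow_page *lookup_page(gfn_t gfn)
{
    for (struct shadow_page *sp = buckets[page_table_hashfn(gfn)]; sp; sp = sp->next)
        if (sp->gfn == gfn)
            return sp;
    return NULL;
}
```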
2007-01-06 08:36:42 +08:00
static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
2010-05-26 16:49:59 +08:00
u64 *parent_pte, int direct)
2006-12-10 18:21:36 +08:00
{
2007-11-21 21:28:32 +08:00
struct kvm_mmu_page *sp;
2006-12-10 18:21:36 +08:00
2007-12-13 23:50:52 +08:00
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
2010-05-26 16:49:59 +08:00
if (!direct)
sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache,
PAGE_SIZE);
2007-11-21 21:28:32 +08:00
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
2007-12-14 10:01:48 +08:00
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
2008-10-16 17:30:57 +08:00
bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
2007-11-21 21:28:32 +08:00
sp->multimapped = 0;
sp->parent_pte = parent_pte;
2010-08-20 09:11:37 +08:00
kvm_mod_used_mmu_pages(vcpu->kvm, +1);
2007-11-21 21:28:32 +08:00
return sp;
2006-12-10 18:21:36 +08:00
}
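kvm_mmu_alloc_page() and kvm_mmu_free_page() form a matched pair: the shadow table page is always allocated, the per-entry `gfns` page only when the shadow page is not `direct`, and each path adjusts the used-page count by one. A minimal userspace sketch of that pairing, assuming `calloc()`/`free()` in place of the kernel's preallocated memory caches and a plain counter in place of `kvm_mod_used_mmu_pages()`; `struct mmu_page`, `alloc_mmu_page()`, `free_mmu_page()`, `used_mmu_pages` and `SKETCH_PAGE_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096   /* hypothetical; 4KB pages assumed */

/* Hypothetical stand-in for struct kvm_mmu_page. */
struct mmu_page {
    uint64_t *spt;    /* the shadow page table entries */
    uint64_t *gfns;   /* per-entry guest frame numbers, indirect pages only */
    int direct;       /* plays the role of sp->role.direct */
};

static long used_mmu_pages;   /* stand-in for kvm->arch.n_used_mmu_pages */

static struct mmu_page *alloc_mmu_page(int direct)
{
    struct mmu_page *sp = calloc(1, sizeof *sp);
    if (!sp)
        return NULL;
    sp->direct = direct;
    sp->spt = calloc(1, SKETCH_PAGE_SIZE);
    /* A direct page has no guest page table behind it, so this sketch
     * keeps no per-entry gfn array for it. */
    sp->gfns = direct ? NULL : calloc(1, SKETCH_PAGE_SIZE);
    if (!sp->spt || (!direct && !sp->gfns)) {
        free(sp->spt);
        free(sp->gfns);
        free(sp);
        return NULL;
    }
    used_mmu_pages += 1;   /* like kvm_mod_used_mmu_pages(kvm, +1) */
    return sp;
}

static void free_mmu_page(struct mmu_page *sp)
{
    free(sp->spt);
    if (!sp->direct)       /* only indirect pages own a gfns array */
        free(sp->gfns);
    free(sp);
    used_mmu_pages -= 1;   /* like kvm_mod_used_mmu_pages(kvm, -1) */
}
```

Recording the `direct` flag in the page is what lets the free path know whether a `gfns` page exists; the kernel code reads it back from `sp->role.direct`.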
2007-01-06 08:36:53 +08:00
static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
2007-11-21 21:28:32 +08:00
struct kvm_mmu_page *sp, u64 *parent_pte)
2007-01-06 08:36:43 +08:00
{
struct kvm_pte_chain *pte_chain;
struct hlist_node *node;
int i;
if (!parent_pte)
return;
2007-11-21 21:28:32 +08:00
if (!sp->multimapped) {
u64 *old = sp->parent_pte;
2007-01-06 08:36:43 +08:00
if (!old) {
2007-11-21 21:28:32 +08:00
sp->parent_pte = parent_pte;
2007-01-06 08:36:43 +08:00
return;
}
2007-11-21 21:28:32 +08:00
sp->multimapped = 1;
2007-01-06 08:36:53 +08:00
pte_chain = mmu_alloc_pte_chain(vcpu);
2007-11-21 21:28:32 +08:00
INIT_HLIST_HEAD(&sp->parent_ptes);
hlist_add_head(&pte_chain->link, &sp->parent_ptes);
2007-01-06 08:36:43 +08:00
pte_chain->parent_ptes[0] = old;
}
2007-11-21 21:28:32 +08:00
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) {
2007-01-06 08:36:43 +08:00
if (pte_chain->parent_ptes[NR_PTE_CHAIN_ENTRIES-1])
continue;
for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i)
if (!pte_chain->parent_ptes[i]) {
pte_chain->parent_ptes[i] = parent_pte;
return;
}
}
2007-01-06 08:36:53 +08:00
pte_chain = mmu_alloc_pte_chain(vcpu);
2007-01-06 08:36:43 +08:00
BUG_ON(!pte_chain);
2007-11-21 21:28:32 +08:00
hlist_add_head(&pte_chain->link, &sp->parent_ptes);
2007-01-06 08:36:43 +08:00
pte_chain->parent_ptes[0] = parent_pte;
}
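mmu_page_add_parent_pte() keeps a single parent pointer inline until a second parent appears, then switches the page to chains of fixed-size arrays. A minimal userspace sketch of that scheme, assuming `NR_CHAIN_ENTRIES` = 4 (an arbitrary stand-in for `NR_PTE_CHAIN_ENTRIES`) and a plain singly linked list in place of the kernel's hlist; `struct pte_chain`, `struct shadow_page`, `new_chain()`, `add_parent_pte()` and `count_parents()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CHAIN_ENTRIES 4   /* assumed stand-in for NR_PTE_CHAIN_ENTRIES */

struct pte_chain {           /* hypothetical analogue of kvm_pte_chain */
    uint64_t *parent_ptes[NR_CHAIN_ENTRIES];
    struct pte_chain *next;
};

struct shadow_page {
    int multimapped;
    uint64_t *parent_pte;       /* used while !multimapped */
    struct pte_chain *chains;   /* used while multimapped */
};

static struct pte_chain *new_chain(void)
{
    struct pte_chain *c = calloc(1, sizeof *c);
    assert(c != NULL);          /* the kernel preallocates; the sketch asserts */
    return c;
}

static void add_parent_pte(struct shadow_page *sp, uint64_t *parent_pte)
{
    if (!parent_pte)
        return;
    if (!sp->multimapped) {
        uint64_t *old = sp->parent_pte;
        if (!old) {                  /* first parent: store it inline */
            sp->parent_pte = parent_pte;
            return;
        }
        sp->multimapped = 1;         /* second parent: move the first one */
        sp->chains = new_chain();    /* into a freshly allocated chain */
        sp->chains->parent_ptes[0] = old;
    }
    /* Slots fill from index 0, so an empty last slot means room remains. */
    for (struct pte_chain *c = sp->chains; c; c = c->next) {
        if (c->parent_ptes[NR_CHAIN_ENTRIES - 1])
            continue;
        for (int i = 0; i < NR_CHAIN_ENTRIES; i++)
            if (!c->parent_ptes[i]) {
                c->parent_ptes[i] = parent_pte;
                return;
            }
    }
    struct pte_chain *c = new_chain();   /* every chain full: add one at head */
    c->parent_ptes[0] = parent_pte;
    c->next = sp->chains;
    sp->chains = c;
}

static int count_parents(const struct shadow_page *sp)
{
    if (!sp->multimapped)
        return sp->parent_pte != NULL;
    int n = 0;
    for (const struct pte_chain *c = sp->chains; c; c = c->next)
        for (int i = 0; i < NR_CHAIN_ENTRIES; i++)
            n += c->parent_ptes[i] != NULL;
    return n;
}
```

The inline pointer presumably serves the common case of a shadow page with a single parent without allocating a chain; chains are only paid for once a page is shared.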
2007-11-21 21:28:32 +08:00
static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
2007-01-06 08:36:43 +08:00
				       u64 *parent_pte)
{
	struct kvm_pte_chain *pte_chain;
	struct hlist_node *node;
	int i;

2007-11-21 21:28:32 +08:00
	if (!sp->multimapped) {
		BUG_ON(sp->parent_pte != parent_pte);
		sp->parent_pte = NULL;
2007-01-06 08:36:43 +08:00
		return;
	}
2007-11-21 21:28:32 +08:00
	hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
2007-01-06 08:36:43 +08:00
		for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
			if (!pte_chain->parent_ptes[i])
				break;
			if (pte_chain->parent_ptes[i] != parent_pte)
				continue;
2007-01-06 08:36:46 +08:00
			while (i + 1 < NR_PTE_CHAIN_ENTRIES
				&& pte_chain->parent_ptes[i + 1]) {
2007-01-06 08:36:43 +08:00
				pte_chain->parent_ptes[i]
					= pte_chain->parent_ptes[i + 1];
				++i;
			}
			pte_chain->parent_ptes[i] = NULL;
2007-01-06 08:36:46 +08:00
			if (i == 0) {
				hlist_del(&pte_chain->link);
2007-07-17 18:04:56 +08:00
				mmu_free_pte_chain(pte_chain);
2007-11-21 21:28:32 +08:00
				if (hlist_empty(&sp->parent_ptes)) {
					sp->multimapped = 0;
					sp->parent_pte = NULL;
2007-01-06 08:36:46 +08:00
				}
			}
2007-01-06 08:36:43 +08:00
			return;
		}
	BUG();
}
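The removal path above keeps each pte_chain chunk densely packed. Live entries form a prefix of parent_ptes[], and the first NULL ends that prefix. Removing one entry shifts its successors down one slot, so a NULL in slot 0 means the chunk is empty and can be freed. Below is a minimal user-space sketch of that compaction. It assumes a hypothetical chunk size CHAIN_ENTRIES of 4 (the kernel uses NR_PTE_CHAIN_ENTRIES) and uses plain pointers in place of spte addresses. chunk_remove() and struct chain_chunk are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk size; the kernel's constant is NR_PTE_CHAIN_ENTRIES. */
#define CHAIN_ENTRIES 4

/* Illustrative stand-in for struct kvm_pte_chain (no hlist linkage). */
struct chain_chunk {
	void *parent_ptes[CHAIN_ENTRIES];	/* live entries form a NULL-terminated prefix */
};

/*
 * Remove @target by shifting later entries down one slot.
 * Returns 1 if the chunk is now empty (the caller would free it),
 * 0 if entries remain, and -1 if @target was not present.
 */
static int chunk_remove(struct chain_chunk *c, void *target)
{
	int i;

	assert(c != NULL && target != NULL);
	for (i = 0; i < CHAIN_ENTRIES; ++i) {
		if (!c->parent_ptes[i])
			break;			/* end of the live prefix */
		if (c->parent_ptes[i] != target)
			continue;
		while (i + 1 < CHAIN_ENTRIES && c->parent_ptes[i + 1]) {
			c->parent_ptes[i] = c->parent_ptes[i + 1];
			++i;
		}
		c->parent_ptes[i] = NULL;	/* old last slot is now a duplicate */
		return i == 0;
	}
	return -1;				/* the kernel BUG()s here instead */
}
```

The kernel calls BUG() when the entry is missing, because that indicates corrupted bookkeeping. The sketch returns -1 in that case so that it stays testable.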
2010-04-16 21:29:17 +08:00
static void mmu_parent_walk(struct kvm_mmu_page *sp, mmu_parent_walk_fn fn)
2008-09-24 00:18:36 +08:00
{
	struct kvm_pte_chain *pte_chain;
	struct hlist_node *node;
	struct kvm_mmu_page *parent_sp;
	int i;

	if (!sp->multimapped && sp->parent_pte) {
		parent_sp = page_header(__pa(sp->parent_pte));
2010-06-11 21:35:15 +08:00
		fn(parent_sp, sp->parent_pte);
2008-09-24 00:18:36 +08:00
		return;
	}
2010-06-11 21:35:15 +08:00

2008-09-24 00:18:36 +08:00
	hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
		for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
2010-06-11 21:35:15 +08:00
			u64 *spte = pte_chain->parent_ptes[i];

			if (!spte)
2008-09-24 00:18:36 +08:00
				break;
2010-06-11 21:35:15 +08:00
			parent_sp = page_header(__pa(spte));
			fn(parent_sp, spte);
2008-09-24 00:18:36 +08:00
		}
}

2010-06-11 21:35:15 +08:00
static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte);
static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
2008-09-24 00:18:40 +08:00
{
2010-06-11 21:35:15 +08:00
	mmu_parent_walk(sp, mark_unsync);
2008-09-24 00:18:40 +08:00
}

2010-06-11 21:35:15 +08:00
static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte)
2008-09-24 00:18:40 +08:00
{
2010-06-11 21:35:15 +08:00
	unsigned int index;
2008-09-24 00:18:40 +08:00

2010-06-11 21:35:15 +08:00
	index = spte - sp->spt;
	if (__test_and_set_bit(index, sp->unsync_child_bitmap))
2008-09-24 00:18:40 +08:00
		return;
2010-06-11 21:35:15 +08:00
	if (sp->unsync_children++)
2008-09-24 00:18:40 +08:00
		return;
2010-06-11 21:35:15 +08:00
	kvm_mmu_mark_parents_unsync(sp);
2008-09-24 00:18:40 +08:00
}
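mark_unsync() and kvm_mmu_mark_parents_unsync() propagate "has unsync descendants" upward. Each call sets a slot bit in unsync_child_bitmap. Only a page's first unsync child, when unsync_children goes from 0 to 1, triggers a walk to that page's own parents. The sketch below models this with a single parent pointer per page, whereas the kernel visits every parent through mmu_parent_walk(). struct sim_page, test_and_set() and sim_mark_unsync() are hypothetical names. The 512-entry table size matches the bitmap loops in this file.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define ENTRIES_PER_PAGE 512
#define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned long))
#define BITMAP_WORDS (ENTRIES_PER_PAGE / BITS_PER_WORD)

/* Hypothetical shadow page with at most one parent. */
struct sim_page {
	struct sim_page *parent;	/* NULL for a root */
	unsigned int slot_in_parent;	/* index of our spte in the parent */
	unsigned long unsync_child_bitmap[BITMAP_WORDS];
	unsigned int unsync_children;
};

/* Set bit @nr in @map; return nonzero if it was already set. */
static int test_and_set(unsigned long *map, unsigned int nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long *word = &map[nr / BITS_PER_WORD];
	int old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/* Record that entry @slot of @sp leads to an unsync page. */
static void sim_mark_unsync(struct sim_page *sp, unsigned int slot)
{
	assert(slot < ENTRIES_PER_PAGE);
	if (test_and_set(sp->unsync_child_bitmap, slot))
		return;			/* this slot was already recorded */
	if (sp->unsync_children++)
		return;			/* ancestors were told on the first child */
	if (sp->parent)
		sim_mark_unsync(sp->parent, sp->slot_in_parent);
}
```

The early returns keep marking cheap. A second unsync child of the same page, or a repeated mark of the same slot, never reaches the ancestors.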
2008-05-29 19:55:03 +08:00
static void nonpaging_prefetch_page(struct kvm_vcpu *vcpu,
				    struct kvm_mmu_page *sp)
{
	int i;

	for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
		sp->spt[i] = shadow_trap_nonpresent_pte;
}

2008-09-24 00:18:33 +08:00
static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
2010-11-19 17:04:03 +08:00
			       struct kvm_mmu_page *sp)
2008-09-24 00:18:33 +08:00
{
	return 1;
}

2008-09-24 00:18:35 +08:00
static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
{
}

2008-12-02 08:32:02 +08:00
#define KVM_PAGE_ARRAY_NR 16

struct kvm_mmu_pages {
	struct mmu_page_and_offset {
		struct kvm_mmu_page *sp;
		unsigned int idx;
	} page[KVM_PAGE_ARRAY_NR];
	unsigned int nr;
};

2008-09-24 00:18:40 +08:00
#define for_each_unsync_children(bitmap, idx) \
	for (idx = find_first_bit(bitmap, 512); \
	     idx < 512; \
	     idx = find_next_bit(bitmap, 512, idx+1))

2009-02-21 09:19:13 +08:00
static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
			 int idx)
2008-09-24 00:18:39 +08:00
{
2008-12-02 08:32:02 +08:00
	int i;
2008-09-24 00:18:39 +08:00

2008-12-02 08:32:02 +08:00
	if (sp->unsync)
		for (i=0; i < pvec->nr; i++)
			if (pvec->page[i].sp == sp)
				return 0;

	pvec->page[pvec->nr].sp = sp;
	pvec->page[pvec->nr].idx = idx;
	pvec->nr++;
	return (pvec->nr == KVM_PAGE_ARRAY_NR);
}
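mmu_pages_add() appends to a fixed array of KVM_PAGE_ARRAY_NR entries and skips an unsync page that is already queued. It reports when the array has just become full. __mmu_unsync_walk() can then stop with -ENOSPC, and the caller processes the batch before walking again. Below is a minimal sketch of that bounded batch. It assumes the hypothetical types struct sim_sp and struct page_batch in place of the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_BATCH_NR 16	/* mirrors KVM_PAGE_ARRAY_NR */

struct sim_sp {
	bool unsync;
};

struct page_batch {
	struct {
		struct sim_sp *sp;
		unsigned int idx;	/* slot of sp in its parent */
	} page[PAGE_BATCH_NR];
	unsigned int nr;
};

/* Append (sp, idx), skipping an unsync page already queued; true when full. */
static bool batch_add(struct page_batch *b, struct sim_sp *sp, unsigned int idx)
{
	unsigned int i;

	assert(b->nr < PAGE_BATCH_NR);	/* caller stops once we report full */
	if (sp->unsync)
		for (i = 0; i < b->nr; i++)
			if (b->page[i].sp == sp)
				return false;	/* already queued */
	b->page[b->nr].sp = sp;
	b->page[b->nr].idx = idx;
	b->nr++;
	return b->nr == PAGE_BATCH_NR;
}
```

The kernel function has no such assertion. Here the assert only documents the invariant that callers stop adding after a "full" result.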
static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
			   struct kvm_mmu_pages *pvec)
{
	int i, ret, nr_unsync_leaf = 0;
2008-09-24 00:18:39 +08:00

2008-09-24 00:18:40 +08:00
	for_each_unsync_children(sp->unsync_child_bitmap, i) {
2010-06-11 21:34:04 +08:00
		struct kvm_mmu_page *child;
2008-09-24 00:18:39 +08:00
		u64 ent = sp->spt[i];

2010-06-11 21:34:04 +08:00
		if (!is_shadow_present_pte(ent) || is_large_pte(ent))
			goto clear_child_bitmap;

		child = page_header(ent & PT64_BASE_ADDR_MASK);

		if (child->unsync_children) {
			if (mmu_pages_add(pvec, child, i))
				return -ENOSPC;

			ret = __mmu_unsync_walk(child, pvec);
			if (!ret)
				goto clear_child_bitmap;
			else if (ret > 0)
				nr_unsync_leaf += ret;
			else
				return ret;
		} else if (child->unsync) {
			nr_unsync_leaf++;
			if (mmu_pages_add(pvec, child, i))
				return -ENOSPC;
		} else
			goto clear_child_bitmap;

		continue;

clear_child_bitmap:
		__clear_bit(i, sp->unsync_child_bitmap);
		sp->unsync_children--;
		WARN_ON((int)sp->unsync_children < 0);
2008-09-24 00:18:39 +08:00
	}

2008-12-02 08:32:02 +08:00
	return nr_unsync_leaf;
}

static int mmu_unsync_walk(struct kvm_mmu_page *sp,
			   struct kvm_mmu_pages *pvec)
{
	if (!sp->unsync_children)
		return 0;

	mmu_pages_add(pvec, sp, 0);
	return __mmu_unsync_walk(sp, pvec);
2008-09-24 00:18:39 +08:00
}

static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	WARN_ON(!sp->unsync);
2010-04-28 11:55:06 +08:00
	trace_kvm_mmu_sync_page(sp);
2008-09-24 00:18:39 +08:00
	sp->unsync = 0;
	--kvm->stat.mmu_unsync;
}

2010-06-04 21:53:54 +08:00
static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
				    struct list_head *invalid_list);
static void kvm_mmu_commit_zap_page(struct kvm *kvm,
				    struct list_head *invalid_list);
2008-09-24 00:18:39 +08:00

2010-06-04 21:56:11 +08:00
#define for_each_gfn_sp(kvm, sp, gfn, pos) \
	hlist_for_each_entry(sp, pos, \
2010-06-04 21:53:07 +08:00
		&(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
		if ((sp)->gfn != (gfn)) {} else

2010-06-04 21:56:11 +08:00
#define for_each_gfn_indirect_valid_sp(kvm, sp, gfn, pos) \
	hlist_for_each_entry(sp, pos, \
2010-06-04 21:53:07 +08:00
		&(kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)], hash_link) \
		if ((sp)->gfn != (gfn) || (sp)->role.direct || \
(sp)->role.invalid) {} else
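Both for_each_gfn_sp() and for_each_gfn_indirect_valid_sp() filter hash-bucket entries with a trailing `if (skip) {} else`, which turns the caller's loop body into the else-branch. The hidden if already owns an else, so an else that the caller writes after the loop binds to the caller's own if rather than to the filter. The sketch below reproduces the idiom over a plain array instead of an hlist. for_each_matching(), struct item, count_indirect() and count_or_minus_one() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct item {
	int gfn;
	int direct;
};

/*
 * Hypothetical analogue of for_each_gfn_sp(): visit only entries whose
 * gfn equals @want. The caller's statement becomes the else-branch of
 * the hidden if, so that if can never capture a later else.
 */
#define for_each_matching(arr, n, it, want)			\
	for ((it) = (arr); (it) < (arr) + (n); (it)++)		\
		if ((it)->gfn != (want)) {} else

/* Count matching entries that are not direct. */
static int count_indirect(struct item *arr, size_t n, int gfn)
{
	struct item *it;
	int count = 0;

	assert(arr != NULL || n == 0);
	for_each_matching(arr, n, it, gfn)
		if (!it->direct)
			count++;
	return count;
}

/* Dangling-else check: the else below pairs with "if (enabled)". */
static int count_or_minus_one(struct item *arr, size_t n, int gfn, int enabled)
{
	struct item *it;
	int count = 0;

	if (enabled)
		for_each_matching(arr, n, it, gfn)
			count++;
	else
		count = -1;
	return count;
}
```

A naive `if ((it)->gfn == (want))` filter would let the `else count = -1;` in count_or_minus_one() attach to the hidden if. The loop would then set count to -1 on every non-matching entry.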
2010-06-11 21:30:36 +08:00
/* @sp->gfn should be write-protected at the call site */
2010-05-15 18:51:24 +08:00
static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
2010-06-04 21:55:29 +08:00
			   struct list_head *invalid_list, bool clear_unsync)
2008-09-24 00:18:39 +08:00
{
2010-04-15 00:20:03 +08:00
	if (sp->role.cr4_pae != !!is_pae(vcpu)) {
2010-06-04 21:55:29 +08:00
		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
2008-09-24 00:18:39 +08:00
		return 1;
	}

2010-06-11 21:30:36 +08:00
	if (clear_unsync)
2010-05-15 18:51:24 +08:00
		kvm_unlink_unsync_page(vcpu->kvm, sp);

2010-11-19 17:04:03 +08:00
	if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
2010-06-04 21:55:29 +08:00
		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
2008-09-24 00:18:39 +08:00
		return 1;
	}

	kvm_mmu_flush_tlb(vcpu);
	return 0;
}

2010-05-15 18:51:24 +08:00
static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
				   struct kvm_mmu_page *sp)
{
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2010-05-15 18:51:24 +08:00
	int ret;

2010-06-04 21:55:29 +08:00
	ret = __kvm_sync_page(vcpu, sp, &invalid_list, false);
2010-06-11 21:31:38 +08:00
	if (ret)
2010-06-04 21:55:29 +08:00
		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);

2010-05-15 18:51:24 +08:00
	return ret;
}

2010-06-04 21:55:29 +08:00
static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
			 struct list_head *invalid_list)
2010-05-15 18:51:24 +08:00
{
2010-06-04 21:55:29 +08:00
	return __kvm_sync_page(vcpu, sp, invalid_list, true);
2010-05-15 18:51:24 +08:00
}

2010-05-24 15:41:33 +08:00
/* @gfn should be write-protected at the call site */
static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	struct kvm_mmu_page *s;
2010-06-04 21:56:11 +08:00
	struct hlist_node *node;
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2010-05-24 15:41:33 +08:00
	bool flush = false;

2010-06-04 21:56:11 +08:00
	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
2010-06-04 21:53:07 +08:00
		if (!s->unsync)
2010-05-24 15:41:33 +08:00
			continue;

		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
2010-11-19 17:04:03 +08:00
		kvm_unlink_unsync_page(vcpu->kvm, s);
2010-05-24 15:41:33 +08:00
		if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
2010-11-19 17:04:03 +08:00
			(vcpu->arch.mmu.sync_page(vcpu, s))) {
2010-06-04 21:55:29 +08:00
			kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list);
2010-05-24 15:41:33 +08:00
			continue;
		}
		flush = true;
	}

2010-06-04 21:55:29 +08:00
	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
2010-05-24 15:41:33 +08:00
	if (flush)
		kvm_mmu_flush_tlb(vcpu);
}
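kvm_sync_pages() and the helpers above never free a shadow page in the middle of a hash-bucket walk. kvm_mmu_prepare_zap_page() queues the page on a local invalid_list, and kvm_mmu_commit_zap_page() finishes the job after the loop. The body of the commit function is not part of this excerpt. The sketch below therefore assumes its purpose is to batch the teardown behind a single remote TLB flush. struct sim_sp, prepare_zap() and commit_zap() are hypothetical, and the returned flush count only stands in for kvm_flush_remote_tlbs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shadow page with an intrusive "invalid list" link. */
struct sim_sp {
	int gfn;
	bool freed;
	struct sim_sp *next_invalid;
};

struct invalid_list {
	struct sim_sp *head;
	int count;
};

/* Phase 1: queue @sp; safe while the caller is still iterating. */
static void prepare_zap(struct sim_sp *sp, struct invalid_list *list)
{
	assert(!sp->freed);
	sp->next_invalid = list->head;
	list->head = sp;
	list->count++;
}

/*
 * Phase 2: one simulated TLB flush for the whole batch, then release
 * every queued page. Returns the number of flushes performed.
 */
static int commit_zap(struct invalid_list *list)
{
	struct sim_sp *sp;
	int flushes;

	if (!list->head)
		return 0;		/* nothing queued, no flush needed */
	flushes = 1;			/* stands in for kvm_flush_remote_tlbs() */
	for (sp = list->head; sp; sp = sp->next_invalid)
		sp->freed = true;
	list->head = NULL;
	list->count = 0;
	return flushes;
}
```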
2008-12-02 08:32:02 +08:00
struct mmu_page_path {
	struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
	unsigned int idx[PT64_ROOT_LEVEL-1];
2008-09-24 00:18:39 +08:00
};

2008-12-02 08:32:02 +08:00
#define for_each_sp(pvec, sp, parents, i) \
		for (i = mmu_pages_next(&pvec, &parents, -1), \
			sp = pvec.page[i].sp; \
			i < pvec.nr && ({ sp = pvec.page[i].sp; 1;}); \
			i = mmu_pages_next(&pvec, &parents, i))

2009-02-21 09:19:13 +08:00
static int mmu_pages_next(struct kvm_mmu_pages *pvec,
			  struct mmu_page_path *parents,
			  int i)
2008-12-02 08:32:02 +08:00
{
	int n;

	for (n = i+1; n < pvec->nr; n++) {
		struct kvm_mmu_page *sp = pvec->page[n].sp;

		if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
			parents->idx[0] = pvec->page[n].idx;
			return n;
		}

		parents->parent[sp->role.level-2] = sp;
		parents->idx[sp->role.level-1] = pvec->page[n].idx;
	}

	return n;
}

2009-02-21 09:19:13 +08:00
static void mmu_pages_clear_parents(struct mmu_page_path *parents)
2008-09-24 00:18:39 +08:00
{
2008-12-02 08:32:02 +08:00
	struct kvm_mmu_page *sp;
	unsigned int level = 0;

	do {
		unsigned int idx = parents->idx[level];
2008-09-24 00:18:39 +08:00

2008-12-02 08:32:02 +08:00
		sp = parents->parent[level];
		if (!sp)
			return;

		--sp->unsync_children;
		WARN_ON((int)sp->unsync_children < 0);
		__clear_bit(idx, sp->unsync_child_bitmap);
		level++;
	} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
2008-09-24 00:18:39 +08:00
}
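After each leaf is synced, mmu_pages_clear_parents() climbs the recorded path. At each level it clears the child's bit and decrements unsync_children. It keeps climbing only while the page it has just updated has no unsync children left. The sketch below models the same loop with a 64-slot bitmap instead of 512 slots and a hypothetical three-entry path (PT64_ROOT_LEVEL - 1 for four-level paging). struct sim_sp, struct sim_path and sim_clear_parents() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PATH_LEVELS 3	/* assumed: PT64_ROOT_LEVEL - 1 */

struct sim_sp {
	uint64_t unsync_child_bitmap;	/* 64 slots instead of 512, for brevity */
	unsigned int unsync_children;
};

struct sim_path {
	struct sim_sp *parent[PATH_LEVELS];	/* parent[0] is the lowest directory */
	unsigned int idx[PATH_LEVELS];		/* slot of the child within parent[k] */
};

/* Clear the synced child's bit upward while each parent empties out. */
static void sim_clear_parents(struct sim_path *p)
{
	unsigned int level = 0;
	struct sim_sp *sp;

	do {
		sp = p->parent[level];
		if (!sp)
			return;			/* reached the top of the recorded path */
		assert(sp->unsync_children > 0 && p->idx[level] < 64);
		--sp->unsync_children;
		sp->unsync_child_bitmap &= ~(UINT64_C(1) << p->idx[level]);
		level++;
	} while (level < PATH_LEVELS && !sp->unsync_children);
}
```

A parent that still has other unsync children stops the climb, so its own ancestors keep their bits until the last child is synced.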
2008-12-02 08:32:02 +08:00
static void kvm_mmu_pages_init(struct kvm_mmu_page *parent,
			       struct mmu_page_path *parents,
			       struct kvm_mmu_pages *pvec)
2008-09-24 00:18:39 +08:00
{
2008-12-02 08:32:02 +08:00
	parents->parent[parent->role.level-1] = NULL;
	pvec->nr = 0;
}
2008-09-24 00:18:39 +08:00

2008-12-02 08:32:02 +08:00
static void mmu_sync_children(struct kvm_vcpu *vcpu,
			      struct kvm_mmu_page *parent)
{
	int i;
	struct kvm_mmu_page *sp;
	struct mmu_page_path parents;
	struct kvm_mmu_pages pages;
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2008-12-02 08:32:02 +08:00

	kvm_mmu_pages_init(parent, &parents, &pages);
	while (mmu_unsync_walk(parent, &pages)) {
2008-12-02 08:32:03 +08:00
		int protected = 0;

		for_each_sp(pages, sp, parents, i)
			protected |= rmap_write_protect(vcpu->kvm, sp->gfn);

		if (protected)
			kvm_flush_remote_tlbs(vcpu->kvm);

2008-12-02 08:32:02 +08:00
		for_each_sp(pages, sp, parents, i) {
2010-06-04 21:55:29 +08:00
			kvm_sync_page(vcpu, sp, &invalid_list);
2008-12-02 08:32:02 +08:00
			mmu_pages_clear_parents(&parents);
		}
2010-06-04 21:55:29 +08:00
		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
2008-09-24 00:18:39 +08:00
		cond_resched_lock(&vcpu->kvm->mmu_lock);
2008-12-02 08:32:02 +08:00
		kvm_mmu_pages_init(parent, &parents, &pages);
	}
2008-09-24 00:18:39 +08:00
}
2007-01-06 08:36:43 +08:00
static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
					     gfn_t gfn,
					     gva_t gaddr,
					     unsigned level,
2009-01-11 19:02:10 +08:00
					     int direct,
2007-12-09 23:00:02 +08:00
					     unsigned access,
2008-02-27 04:12:10 +08:00
					     u64 *parent_pte)
2007-01-06 08:36:43 +08:00
{
	union kvm_mmu_page_role role;
	unsigned quadrant;
2010-05-24 15:41:33 +08:00
	struct kvm_mmu_page *sp;
2010-06-04 21:56:11 +08:00
	struct hlist_node *node;
2010-05-24 15:41:33 +08:00
	bool need_sync = false;
2007-01-06 08:36:43 +08:00

2008-12-22 01:20:09 +08:00
	role = vcpu->arch.mmu.base_role;
2007-01-06 08:36:43 +08:00
	role.level = level;
2009-01-11 19:02:10 +08:00
	role.direct = direct;
2010-03-14 16:16:40 +08:00
	if (role.direct)
2010-04-15 00:20:03 +08:00
		role.cr4_pae = 0;
2007-12-09 23:00:02 +08:00
	role.access = access;
2010-09-10 23:30:39 +08:00
	if (!vcpu->arch.mmu.direct_map
	    && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
2007-01-06 08:36:43 +08:00
		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
		role.quadrant = quadrant;
	}
2010-06-04 21:56:11 +08:00
	for_each_gfn_sp(vcpu->kvm, sp, gfn, node) {
2010-06-04 21:53:07 +08:00
		if (!need_sync && sp->unsync)
			need_sync = true;
2008-09-24 00:18:39 +08:00

2010-06-04 21:53:07 +08:00
		if (sp->role.word != role.word)
			continue;
2008-09-24 00:18:39 +08:00

2010-06-04 21:53:07 +08:00
		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
			break;
KVM: MMU: don't write-protect if have new mapping to unsync page
Two cases can occur in kvm_mmu_get_page():
- the target sp is already in the cache. If the sp is unsync, we only
need to update it to ensure the new mapping is valid; we need not mark
it sync or write-protect sp->gfn, since this does not break the unsync
rule (one shadow page per gfn)
- the target sp does not exist, so we must create a new sp for the gfn,
i.e. the gfn may already have another shadow page. To keep the unsync
rule, we should sync (mark sync and write-protect) the gfn's unsync
shadow page.
After enabling multiple unsync shadows, we sync those shadow pages
only when the new sp is not allowed to become unsync (the unsync rule
also changes; the new rule is: allow all pte pages to become unsync)
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-15 18:52:34 +08:00

2010-06-04 21:53:07 +08:00
		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
		if (sp->unsync_children) {
2010-05-10 17:34:53 +08:00
			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
2010-06-04 21:53:07 +08:00
			kvm_mmu_mark_parents_unsync(sp);
		} else if (sp->unsync)
			kvm_mmu_mark_parents_unsync(sp);
2010-05-15 18:52:34 +08:00

2010-06-04 21:53:07 +08:00
		trace_kvm_mmu_get_page(sp, false);
		return sp;
	}
2007-12-19 01:47:18 +08:00
	++vcpu->kvm->stat.mmu_cache_miss;
2010-05-26 16:49:59 +08:00
	sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
2007-11-21 21:28:32 +08:00
	if (!sp)
		return sp;
	sp->gfn = gfn;
	sp->role = role;
2010-06-04 21:53:07 +08:00
	hlist_add_head(&sp->hash_link,
		&vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
2009-01-11 19:02:10 +08:00
	if (!direct) {
2008-12-02 08:32:03 +08:00
		if (rmap_write_protect(vcpu->kvm, gfn))
			kvm_flush_remote_tlbs(vcpu->kvm);
2010-05-24 15:41:33 +08:00
		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
			kvm_sync_pages(vcpu, gfn);

2008-09-24 00:18:39 +08:00
		account_shadowed(vcpu->kvm, gfn);
	}
2008-05-29 19:56:28 +08:00
	if (shadow_trap_nonpresent_pte != shadow_notrap_nonpresent_pte)
		vcpu->arch.mmu.prefetch_page(vcpu, sp);
	else
		nonpaging_prefetch_page(vcpu, sp);
2009-07-06 20:58:14 +08:00
	trace_kvm_mmu_get_page(sp, true);
2007-11-21 21:28:32 +08:00
	return sp;
2007-01-06 08:36:43 +08:00
}
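The quadrant lines in kvm_mmu_get_page() implement a point from the shadow-page-caching commit message: a 32-bit guest table covers more address space than one 512-entry shadow table. A guest page table maps 4MB while a shadow page table maps 2MB, and a guest page directory maps 4GB while a shadow page directory maps 1GB. Below is a worked sketch of the same two lines for level 1 and level 2. It assumes the constant values PAGE_SHIFT = 12, PT64_PT_BITS = 9 and PT32_PT_BITS = 10, whose definitions are not in this excerpt. quadrant_for() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real definitions live elsewhere in the KVM MMU. */
#define PAGE_SHIFT	12
#define PT64_PT_BITS	9	/* 512 entries per shadow table */
#define PT32_PT_BITS	10	/* 1024 entries per 32-bit guest table */

/* Slice of the guest table at @level that a shadow page for @gaddr covers. */
static unsigned int quadrant_for(uint32_t gaddr, unsigned int level)
{
	unsigned int quadrant;

	assert(level == 1 || level == 2);	/* 32-bit guests have two levels */
	quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
	quadrant &= (1u << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
	return quadrant;
}
```

A 4MB guest page table therefore needs two shadow pages (quadrants 0 and 1), and a guest page directory needs four (quadrants 0 to 3). This matches the "up to 4 shadow page tables" in the commit message.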
|
|
|
|
|
2008-12-25 20:39:47 +08:00
static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
			     struct kvm_vcpu *vcpu, u64 addr)
{
	iterator->addr = addr;
	iterator->shadow_addr = vcpu->arch.mmu.root_hpa;
	iterator->level = vcpu->arch.mmu.shadow_root_level;
2010-09-10 23:31:00 +08:00

	if (iterator->level == PT64_ROOT_LEVEL &&
	    vcpu->arch.mmu.root_level < PT64_ROOT_LEVEL &&
	    !vcpu->arch.mmu.direct_map)
		--iterator->level;

2008-12-25 20:39:47 +08:00
	if (iterator->level == PT32E_ROOT_LEVEL) {
		iterator->shadow_addr
			= vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
		iterator->shadow_addr &= PT64_BASE_ADDR_MASK;
		--iterator->level;
		if (!iterator->shadow_addr)
			iterator->level = 0;
	}
}

static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
{
	if (iterator->level < PT_PAGE_TABLE_LEVEL)
		return false;
2009-06-11 23:07:41 +08:00

	if (iterator->level == PT_PAGE_TABLE_LEVEL)
		if (is_large_pte(*iterator->sptep))
			return false;

2008-12-25 20:39:47 +08:00
	iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
	iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
	return true;
}

static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
{
	iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK;
	--iterator->level;
}
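shadow_walk_init(), shadow_walk_okay() and shadow_walk_next() form an init/test/advance triple intended for a for-loop header. The userspace sketch below reproduces only the per-level index arithmetic, assuming 9 index bits per level and 4KB pages. struct walk and the helper names are hypothetical. The real iterator also follows physical addresses through __va(), picks the PAE root from pae_root[(addr >> 30) & 3], and stops early at large sptes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT        12
#define PT64_LEVEL_BITS   9
#define PT64_ENT_PER_PAGE (1u << PT64_LEVEL_BITS)

/* Index of addr's entry in a 64-bit/PAE table at 'level' (1 = page table). */
static unsigned shadow_index(uint64_t addr, int level)
{
	assert(level >= 1 && level <= 4);
	return (unsigned)(addr >> (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)) &
	       (PT64_ENT_PER_PAGE - 1);
}

/* Hypothetical toy iterator with the same init/okay/next protocol. */
struct walk {
	uint64_t addr;
	int level;
	unsigned index;
};

static void walk_init(struct walk *w, uint64_t addr, int root_level)
{
	assert(root_level >= 1 && root_level <= 4);
	w->addr = addr;
	w->level = root_level;
}

static int walk_okay(struct walk *w)
{
	if (w->level < 1)
		return 0;                 /* walked past the last level */
	w->index = shadow_index(w->addr, w->level);
	return 1;
}

static void walk_next(struct walk *w)
{
	--w->level;
}

/* Record the index visited at each level, root first; returns the count. */
static int walk_path(uint64_t addr, int root_level, unsigned out[4])
{
	struct walk w;
	int n = 0;

	for (walk_init(&w, addr, root_level); walk_okay(&w); walk_next(&w))
		out[n++] = w.index;
	return n;
}
```

For a 32-bit address walked from a 3-level root, the top index equals (addr >> 30) & 3. That is why the PAE case above can select one of four pae_root entries directly.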

2010-07-13 19:27:04 +08:00
static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
{
	u64 spte;

	spte = __pa(sp->spt)
		| PT_PRESENT_MASK | PT_ACCESSED_MASK
		| PT_WRITABLE_MASK | PT_USER_MASK;
2010-07-13 19:27:05 +08:00
	__set_spte(sptep, spte);
2010-07-13 19:27:04 +08:00
}

2010-07-13 19:27:06 +08:00
static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
{
	if (is_large_pte(*sptep)) {
		drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
		kvm_flush_remote_tlbs(vcpu->kvm);
	}
}

2010-07-13 19:27:07 +08:00
static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
				 unsigned direct_access)
{
	if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
		struct kvm_mmu_page *child;

		/*
		 * For the direct sp, if the guest pte's dirty bit
		 * changed from clean to dirty, it will corrupt the
		 * sp's access: writes would be allowed through a read-only
		 * sp, so we should update the spte at this point to get
		 * a new sp with the correct access.
		 */
		child = page_header(*sptep & PT64_BASE_ADDR_MASK);
		if (child->role.access == direct_access)
			return;

		mmu_page_remove_parent_pte(child, sptep);
		__set_spte(sptep, shadow_trap_nonpresent_pte);
		kvm_flush_remote_tlbs(vcpu->kvm);
	}
}

2007-07-17 18:04:56 +08:00
static void kvm_mmu_page_unlink_children(struct kvm *kvm,
2007-11-21 21:28:32 +08:00
					 struct kvm_mmu_page *sp)
2007-01-06 08:36:45 +08:00
{
2007-01-06 08:36:46 +08:00
	unsigned i;
	u64 *pt;
	u64 ent;

2007-11-21 21:28:32 +08:00
	pt = sp->spt;
2007-01-06 08:36:46 +08:00

	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
		ent = pt[i];

2008-02-23 22:44:30 +08:00
		if (is_shadow_present_pte(ent)) {
2009-06-10 23:27:03 +08:00
			if (!is_last_spte(ent, sp->role.level)) {
2008-02-23 22:44:30 +08:00
				ent &= PT64_BASE_ADDR_MASK;
				mmu_page_remove_parent_pte(page_header(ent),
							   &pt[i]);
			} else {
2009-06-10 23:27:03 +08:00
				if (is_large_pte(ent))
					--kvm->stat.lpages;
2010-06-06 19:31:27 +08:00
				drop_spte(kvm, &pt[i],
					  shadow_trap_nonpresent_pte);
2008-02-23 22:44:30 +08:00
			}
		}
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
		pt[i] = shadow_trap_nonpresent_pte;
2007-01-06 08:36:46 +08:00
	}
2007-01-06 08:36:45 +08:00
}

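The "evil trick" in the commit message above depends on two spte encodings, the values behind shadow_trap_nonpresent_pte and shadow_notrap_nonpresent_pte, plus the VMX page-fault error-code filter. The sketch models that decision in userspace. The concrete values are assumptions:

- a trap pattern of ~0xffe, with every bit set except bits 1 through 11;
- a not-present value of 0;
- a MAXPHYADDR of 40, so bits 40..51 count as reserved.

The filter follows the VMX rule that, with the #PF bit set in the exception bitmap, a page fault exits only when (error_code & PFEC_MASK) == PFEC_MATCH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFERR_PRESENT (1u << 0)  /* error code bit 0: P */
#define PFERR_RSVD    (1u << 3)  /* error code bit 3: reserved bit set */

/* Assumption: MAXPHYADDR = 40, so PTE bits 40..51 are reserved. */
#define SPTE_RSVD_MASK (((1ULL << 52) - 1) & ~((1ULL << 40) - 1))

/* Assumed encodings for the two kinds of non-present shadow entries. */
static const uint64_t spte_trap   = ~0xffeULL; /* P = 1 plus reserved bits */
static const uint64_t spte_notrap = 0;         /* plain not-present */

/* Simplified error code for an access through spte (only P and RSVD). */
static uint32_t fault_error_code(uint64_t spte)
{
	if (!(spte & 1))
		return 0;                            /* not-present fault */
	if (spte & SPTE_RSVD_MASK)
		return PFERR_PRESENT | PFERR_RSVD;   /* reserved-bit fault */
	return PFERR_PRESENT;                        /* e.g. protection fault */
}

/* VMX filter with the #PF bit set in the exception bitmap:
 * the fault causes a VM exit only if (error_code & mask) == match. */
static bool pf_causes_exit(uint32_t error_code, uint32_t mask, uint32_t match)
{
	assert((match & ~mask) == 0);
	return (error_code & mask) == match;
}
```

With mask = match = P, a fault through the trap pattern reports P = 1 and RSVD = 1, so it exits to kvm. A fault through the not-present value reports P = 0 and goes straight to the guest. With mask = match = 0, every page fault exits, which corresponds to running with the trick disabled.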
2007-11-21 21:28:32 +08:00
static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
2007-01-06 08:36:43 +08:00
{
2007-11-21 21:28:32 +08:00
	mmu_page_remove_parent_pte(sp, parent_pte);
2007-01-06 08:36:45 +08:00
}

2007-09-23 20:10:49 +08:00
static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm)
{
	int i;
2009-06-09 20:56:29 +08:00
	struct kvm_vcpu *vcpu;
2007-09-23 20:10:49 +08:00

2009-06-09 20:56:29 +08:00
	kvm_for_each_vcpu(i, vcpu, kvm)
		vcpu->arch.last_pte_updated = NULL;
2007-09-23 20:10:49 +08:00
}

2008-07-11 22:59:46 +08:00
static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
2007-01-06 08:36:45 +08:00
{
	u64 *parent_pte;

2007-11-21 21:28:32 +08:00
	while (sp->multimapped || sp->parent_pte) {
		if (!sp->multimapped)
			parent_pte = sp->parent_pte;
2007-01-06 08:36:45 +08:00
		else {
			struct kvm_pte_chain *chain;

2007-11-21 21:28:32 +08:00
			chain = container_of(sp->parent_ptes.first,
2007-01-06 08:36:45 +08:00
					     struct kvm_pte_chain, link);
			parent_pte = chain->parent_ptes[0];
		}
2007-01-06 08:36:46 +08:00
		BUG_ON(!parent_pte);
2007-11-21 21:28:32 +08:00
		kvm_mmu_put_page(sp, parent_pte);
2009-06-10 19:24:23 +08:00
		__set_spte(parent_pte, shadow_trap_nonpresent_pte);
2007-01-06 08:36:45 +08:00
	}
2008-07-11 22:59:46 +08:00
}

2008-12-02 08:32:02 +08:00
static int mmu_zap_unsync_children(struct kvm *kvm,
2010-06-04 21:53:54 +08:00
				   struct kvm_mmu_page *parent,
				   struct list_head *invalid_list)
2008-09-24 00:18:39 +08:00
{
2008-12-02 08:32:02 +08:00
	int i, zapped = 0;
	struct mmu_page_path parents;
	struct kvm_mmu_pages pages;
2008-09-24 00:18:39 +08:00

2008-12-02 08:32:02 +08:00
	if (parent->role.level == PT_PAGE_TABLE_LEVEL)
2008-09-24 00:18:39 +08:00
		return 0;
2008-12-02 08:32:02 +08:00

	kvm_mmu_pages_init(parent, &parents, &pages);
	while (mmu_unsync_walk(parent, &pages)) {
		struct kvm_mmu_page *sp;

		for_each_sp(pages, sp, parents, i) {
2010-06-04 21:53:54 +08:00
			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
2008-12-02 08:32:02 +08:00
			mmu_pages_clear_parents(&parents);
2010-04-16 16:34:42 +08:00
			zapped++;
2008-12-02 08:32:02 +08:00
		}
		kvm_mmu_pages_init(parent, &parents, &pages);
	}

	return zapped;
2008-09-24 00:18:39 +08:00
}

2010-06-04 21:53:54 +08:00
static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
				    struct list_head *invalid_list)
2008-07-11 22:59:46 +08:00
{
2008-09-24 00:18:39 +08:00
	int ret;
2009-07-06 20:58:14 +08:00

2010-06-04 21:53:54 +08:00
	trace_kvm_mmu_prepare_zap_page(sp);
2008-07-11 22:59:46 +08:00
	++kvm->stat.mmu_shadow_zapped;
2010-06-04 21:53:54 +08:00
	ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
2007-11-21 21:28:32 +08:00
	kvm_mmu_page_unlink_children(kvm, sp);
2008-07-11 22:59:46 +08:00
	kvm_mmu_unlink_parents(kvm, sp);
2009-01-11 19:02:10 +08:00
	if (!sp->role.invalid && !sp->role.direct)
2008-07-11 23:07:26 +08:00
		unaccount_shadowed(kvm, sp->gfn);
2008-09-24 00:18:39 +08:00
	if (sp->unsync)
		kvm_unlink_unsync_page(kvm, sp);
2007-11-21 21:28:32 +08:00
	if (!sp->root_count) {
2010-05-05 09:03:49 +08:00
		/* Count self */
		ret++;
2010-06-04 21:53:54 +08:00
		list_move(&sp->link, invalid_list);
2008-02-21 03:47:24 +08:00
	} else {
2008-07-11 23:07:26 +08:00
		list_move(&sp->link, &kvm->arch.active_mmu_pages);
2008-02-21 03:47:24 +08:00
		kvm_reload_remote_mmus(kvm);
	}
2010-06-04 21:53:54 +08:00

	sp->role.invalid = 1;
2007-09-23 20:10:49 +08:00
	kvm_mmu_reset_last_pte_updated(kvm);
2008-09-24 00:18:39 +08:00
	return ret;
2007-01-06 08:36:45 +08:00
}

2010-06-04 21:53:54 +08:00
static void kvm_mmu_commit_zap_page(struct kvm *kvm,
				    struct list_head *invalid_list)
{
	struct kvm_mmu_page *sp;

	if (list_empty(invalid_list))
		return;

	kvm_flush_remote_tlbs(kvm);

	do {
		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
		WARN_ON(!sp->role.invalid || sp->root_count);
		kvm_mmu_free_page(kvm, sp);
	} while (!list_empty(invalid_list));

}

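kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() split teardown into two phases. Pages are first unlinked and queued on invalid_list. A single kvm_flush_remote_tlbs() call then covers the whole batch before anything is freed, so no CPU can still reach a freed page through a stale TLB entry. The sketch below illustrates the batching idea only, not KVM's list handling. It uses a hypothetical singly linked list, with counters standing in for the flush and for kvm_mmu_free_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a shadow page and the per-VM state. */
struct zpage {
	struct zpage *next;
	bool invalid;
};

struct vm_state {
	int tlb_flushes;   /* simulated kvm_flush_remote_tlbs() calls */
	int pages_freed;   /* simulated kvm_mmu_free_page() calls */
};

/* Phase 1: mark the page invalid and queue it; nothing is freed yet. */
static void prepare_zap(struct zpage *p, struct zpage **invalid_list)
{
	p->invalid = true;
	p->next = *invalid_list;
	*invalid_list = p;
}

/* Phase 2: one flush covers the whole batch, then every page is freed. */
static void commit_zap(struct vm_state *vm, struct zpage **invalid_list)
{
	if (!*invalid_list)
		return;                 /* empty batch: no flush needed */

	vm->tlb_flushes++;
	while (*invalid_list) {
		struct zpage *p = *invalid_list;

		assert(p->invalid);
		*invalid_list = p->next;
		free(p);
		vm->pages_freed++;
	}
}
```

Zapping three pages this way costs one flush instead of three. Committing an empty list costs nothing, which mirrors the list_empty() early return above.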
2007-10-03 00:52:55 +08:00
/*
 * Changing the number of mmu pages allocated to the vm
2010-08-20 09:11:28 +08:00
 * Note: if goal_nr_mmu_pages is too small, you will get a deadlock
2007-10-03 00:52:55 +08:00
 */
2010-08-20 09:11:28 +08:00
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
2007-10-03 00:52:55 +08:00
{
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2007-10-03 00:52:55 +08:00
	/*
	 * If we set the number of mmu pages to be smaller than the
	 * number of active pages, we must free some mmu pages before we
	 * change the value
	 */

2010-08-20 09:11:28 +08:00
	if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) {
		while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages &&
2010-04-16 16:34:42 +08:00
			!list_empty(&kvm->arch.active_mmu_pages)) {
2007-10-03 00:52:55 +08:00
			struct kvm_mmu_page *page;

2007-12-14 10:01:48 +08:00
			page = container_of(kvm->arch.active_mmu_pages.prev,
2007-10-03 00:52:55 +08:00
					    struct kvm_mmu_page, link);
2010-08-24 10:31:07 +08:00
			kvm_mmu_prepare_zap_page(kvm, page, &invalid_list);
			kvm_mmu_commit_zap_page(kvm, &invalid_list);
2007-10-03 00:52:55 +08:00
		}
2010-08-20 09:11:28 +08:00
		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
2007-10-03 00:52:55 +08:00
	}

2010-08-20 09:11:28 +08:00
	kvm->arch.n_max_mmu_pages = goal_nr_mmu_pages;
2007-10-03 00:52:55 +08:00
}

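The shrink loop in kvm_mmu_change_mmu_pages() zaps entries from the tail of active_mmu_pages until the used count reaches the goal or the list runs out. It then records the resulting count as the new limit. The model below is a hypothetical simplification:

- the list is an array whose last element is the tail;
- each entry stores how many shadow pages its zap releases (possibly more than one, since unsync children are zapped with it);
- pinned roots (root_count != 0) are ignored. The real code moves them back onto the list, which is presumably the situation the "too small" note in the comment above warns about.

```c
#include <assert.h>

#define MAX_ENTRIES 16

/* Hypothetical model of the active list: entries[n_entries - 1] is the
 * tail (oldest page); each value is how many shadow pages zapping that
 * entry releases. */
struct mmu_model {
	unsigned n_used;
	unsigned n_max;
	unsigned entries[MAX_ENTRIES];
	unsigned n_entries;
};

static void change_mmu_pages(struct mmu_model *m, unsigned goal)
{
	if (m->n_used > goal) {
		/* Zap from the tail until at or under the goal, or out of entries. */
		while (m->n_used > goal && m->n_entries > 0) {
			unsigned released = m->entries[--m->n_entries];

			assert(released <= m->n_used);
			m->n_used -= released;
		}
		/* The new limit is whatever is actually in use now. */
		goal = m->n_used;
	}
	m->n_max = goal;
}
```

The limit can land below the request when one zap releases several pages. For example, with five entries of two pages each and 10 pages in use, a goal of 5 ends at 4. It can also land above the request when the list empties first. Growing the limit never zaps anything.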
2007-10-11 08:25:50 +08:00
static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
2007-01-06 08:36:45 +08:00
{
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp;
2010-06-04 21:56:11 +08:00
	struct hlist_node *node;
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2007-01-06 08:36:45 +08:00
	int r;

2010-08-28 19:19:42 +08:00
	pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
2007-01-06 08:36:45 +08:00
	r = 0;
2010-06-04 21:56:11 +08:00

	for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) {
2010-08-28 19:19:42 +08:00
		pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
2010-06-04 21:53:07 +08:00
			 sp->role.word);
		r = 1;
2010-06-04 21:56:11 +08:00
		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
2010-06-04 21:53:07 +08:00
	}
2010-06-04 21:55:29 +08:00
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
2007-01-06 08:36:45 +08:00
	return r;
2007-01-06 08:36:43 +08:00
}

2007-10-11 08:25:50 +08:00
static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
2007-05-31 20:08:29 +08:00
{
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp;
2010-06-04 21:56:11 +08:00
	struct hlist_node *node;
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2007-05-31 20:08:29 +08:00

2010-06-04 21:56:11 +08:00
	for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) {
2010-08-28 19:19:42 +08:00
		pgprintk("%s: zap %llx %x\n",
2010-06-04 21:53:07 +08:00
			 __func__, gfn, sp->role.word);
2010-06-04 21:56:11 +08:00
		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
2007-05-31 20:08:29 +08:00
	}
2010-06-04 21:55:29 +08:00
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
2007-05-31 20:08:29 +08:00
}

2007-11-21 20:20:22 +08:00
static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
2006-12-10 18:21:36 +08:00
{
2009-12-24 00:35:21 +08:00
	int slot = memslot_id(kvm, gfn);
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp = page_header(__pa(pte));
2006-12-10 18:21:36 +08:00

2008-10-16 17:30:57 +08:00
	__set_bit(slot, sp->slot_bitmap);
2006-12-10 18:21:36 +08:00
}

2008-09-24 00:18:38 +08:00
static void mmu_convert_notrap(struct kvm_mmu_page *sp)
{
	int i;
	u64 *pt = sp->spt;

	if (shadow_trap_nonpresent_pte == shadow_notrap_nonpresent_pte)
		return;

	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
		if (pt[i] == shadow_notrap_nonpresent_pte)
2009-06-10 19:24:23 +08:00
			__set_spte(&pt[i], shadow_trap_nonpresent_pte);
2008-09-24 00:18:38 +08:00
	}
}

2008-10-09 16:01:56 +08:00
/*
 * The function is based on mtrr_type_lookup() in
 * arch/x86/kernel/cpu/mtrr/generic.c
 */
static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
			 u64 start, u64 end)
{
	int i;
	u64 base, mask;
	u8 prev_match, curr_match;
	int num_var_ranges = KVM_NR_VAR_MTRR;

	if (!mtrr_state->enabled)
		return 0xFF;

	/* Make end inclusive instead of exclusive */
	end--;

	/* Look in fixed ranges. Just return the type as per start */
	if (mtrr_state->have_fixed && (start < 0x100000)) {
		int idx;

		if (start < 0x80000) {
			idx = 0;
			idx += (start >> 16);
			return mtrr_state->fixed_ranges[idx];
		} else if (start < 0xC0000) {
			idx = 1 * 8;
			idx += ((start - 0x80000) >> 14);
			return mtrr_state->fixed_ranges[idx];
		} else if (start < 0x100000) {
			idx = 3 * 8;
			idx += ((start - 0xC0000) >> 12);
			return mtrr_state->fixed_ranges[idx];
		}
	}

	/*
	 * Look in variable ranges
	 * Look for multiple ranges matching this address and pick type
	 * as per MTRR precedence
	 */
	if (!(mtrr_state->enabled & 2))
		return mtrr_state->def_type;

	prev_match = 0xFF;
	for (i = 0; i < num_var_ranges; ++i) {
		unsigned short start_state, end_state;

		if (!(mtrr_state->var_ranges[i].mask_lo & (1 << 11)))
			continue;

		base = (((u64)mtrr_state->var_ranges[i].base_hi) << 32) +
		       (mtrr_state->var_ranges[i].base_lo & PAGE_MASK);
		mask = (((u64)mtrr_state->var_ranges[i].mask_hi) << 32) +
		       (mtrr_state->var_ranges[i].mask_lo & PAGE_MASK);

		start_state = ((start & mask) == (base & mask));
		end_state = ((end & mask) == (base & mask));
		if (start_state != end_state)
			return 0xFE;

		if ((start & mask) != (base & mask))
			continue;

		curr_match = mtrr_state->var_ranges[i].base_lo & 0xff;
		if (prev_match == 0xFF) {
			prev_match = curr_match;
			continue;
		}

		if (prev_match == MTRR_TYPE_UNCACHABLE ||
		    curr_match == MTRR_TYPE_UNCACHABLE)
			return MTRR_TYPE_UNCACHABLE;

		if ((prev_match == MTRR_TYPE_WRBACK &&
		     curr_match == MTRR_TYPE_WRTHROUGH) ||
		    (prev_match == MTRR_TYPE_WRTHROUGH &&
		     curr_match == MTRR_TYPE_WRBACK)) {
			prev_match = MTRR_TYPE_WRTHROUGH;
			curr_match = MTRR_TYPE_WRTHROUGH;
		}

		if (prev_match != curr_match)
			return MTRR_TYPE_UNCACHABLE;
	}

	if (prev_match != 0xFF)
		return prev_match;

	return mtrr_state->def_type;
}

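get_mtrr_type() resolves a memory type in two stages. Below 1MB it looks up a fixed range. Otherwise it scans the variable ranges and applies precedence rules. The sketch isolates both pieces. mtrr_combine() and fixed_range_index() are hypothetical helper names. The enum values are the architectural MTRR type encodings (UC = 0, WC = 1, WT = 4, WP = 5, WB = 6).

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MTRR memory-type encodings. */
enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* Combine two overlapping variable-range types using the same precedence
 * as the loop in get_mtrr_type(): UC wins, WB + WT gives WT, and any
 * other mismatch falls back to UC. */
static uint8_t mtrr_combine(uint8_t a, uint8_t b)
{
	if (a == MT_UC || b == MT_UC)
		return MT_UC;
	if ((a == MT_WB && b == MT_WT) || (a == MT_WT && b == MT_WB))
		return MT_WT;
	return a == b ? a : MT_UC;
}

/* Index into the 88-entry fixed-range array for an address below 1MB:
 * 8 x 64KB entries, then 16 x 16KB, then 64 x 4KB. */
static int fixed_range_index(uint64_t addr)
{
	assert(addr < 0x100000);
	if (addr < 0x80000)
		return (int)(addr >> 16);
	if (addr < 0xC0000)
		return 8 + (int)((addr - 0x80000) >> 14);
	return 24 + (int)((addr - 0xC0000) >> 12);
}
```

The offsets 8 and 24 correspond to the `1 * 8` and `3 * 8` starting indices in the code above. They come from the 8 entries of the 64KB block and the 8 + 16 entries that precede the 4KB block.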
2009-04-27 20:35:42 +08:00
u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
2008-10-09 16:01:56 +08:00
{
	u8 mtrr;

	mtrr = get_mtrr_type(&vcpu->arch.mtrr_state, gfn << PAGE_SHIFT,
			     (gfn << PAGE_SHIFT) + PAGE_SIZE);
	if (mtrr == 0xfe || mtrr == 0xff)
		mtrr = MTRR_TYPE_WRBACK;
	return mtrr;
}
2009-04-27 20:35:42 +08:00
EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
2008-10-09 16:01:56 +08:00

2010-05-24 15:40:07 +08:00
static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
	trace_kvm_mmu_unsync_page(sp);
	++vcpu->kvm->stat.mmu_unsync;
	sp->unsync = 1;

	kvm_mmu_mark_parents_unsync(sp);
	mmu_convert_notrap(sp);
}

static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
2008-09-24 00:18:39 +08:00
{
	struct kvm_mmu_page *s;
2010-06-04 21:56:11 +08:00
	struct hlist_node *node;
2010-05-24 15:40:07 +08:00

2010-06-04 21:56:11 +08:00
	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
2010-06-04 21:53:07 +08:00
		if (s->unsync)
2008-09-24 00:18:39 +08:00
			continue;
2010-05-24 15:40:07 +08:00
		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
		__kvm_unsync_page(vcpu, s);
2008-09-24 00:18:39 +08:00
	}
}

static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
				  bool can_unsync)
{
2010-05-24 15:40:07 +08:00
	struct kvm_mmu_page *s;
2010-06-04 21:56:11 +08:00
	struct hlist_node *node;
2010-05-24 15:40:07 +08:00
	bool need_unsync = false;

2010-06-04 21:56:11 +08:00
	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
KVM: MMU: fix writable sync sp mapping
While we sync many unsync sps at one time (in mmu_sync_children()),
we may map an spte writable. This is dangerous if one unsync sp's
mapped gfn is another unsync page's gfn.
For example:
SP1.pte[0] = P
SP2.gfn's pfn = P
[SP1.pte[0] = SP2.gfn's pfn]
First, we write-protect SP1 and SP2, but SP1 and SP2 are still
unsync sps.
Then we sync SP1 first. It detects that SP1.pte[0].gfn has only one
unsync sp, SP2, so it maps the gfn writable. But we plan to sync SP2
soon, so at this point SP2->unsync is not reliable: when we later sync
SP2, SP2->gfn is already writable.
So the final result is that SP2 is a sync page but SP2.gfn is writable.
This bug corrupts the guest's page table. Fix it by making the mapping
read-only if the mapped gfn has shadow pages.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-06-30 16:02:02 +08:00
		if (!can_unsync)
			return 1;

2010-05-24 15:40:07 +08:00
		if (s->role.level != PT_PAGE_TABLE_LEVEL)
2008-09-24 00:18:39 +08:00
			return 1;
2010-05-24 15:40:07 +08:00

		if (!need_unsync && !s->unsync) {
			if (!oos_shadow)
				return 1;
			need_unsync = true;
		}
	}
	if (need_unsync)
		kvm_unsync_pages(vcpu, gfn);
	return 0;
}
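The decision made by mmu_need_write_protect() and kvm_unsync_pages() can be exercised outside the kernel. Below is a minimal userspace model. It assumes a flat array stands in for the shadow pages that for_each_gfn_indirect_valid_sp() visits for one gfn, and a plain flag stands in for the oos_shadow parameter. struct shadow_page and the model_* helpers are hypothetical names, not KVM APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PT_PAGE_TABLE_LEVEL 1 /* 4 KiB leaf level, as in the x86 MMU */

/* Hypothetical stand-in for struct kvm_mmu_page: only the fields used here. */
struct shadow_page {
	int level;   /* role.level */
	bool unsync; /* shadow page already allowed to go out of sync */
};

/* Model of kvm_unsync_pages(): mark every still-synced leaf page unsync. */
static void model_unsync_pages(struct shadow_page *sps, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (sps[i].unsync)
			continue;
		assert(sps[i].level == PT_PAGE_TABLE_LEVEL); /* WARN_ON in KVM */
		sps[i].unsync = true;
	}
}

/*
 * Model of mmu_need_write_protect(): returns 1 if the gfn must stay
 * write-protected, 0 if it may be mapped writable (after marking the
 * shadow pages that shadow it unsync).
 */
static int model_need_write_protect(struct shadow_page *sps, size_t n,
				    bool can_unsync, bool oos_shadow)
{
	bool need_unsync = false;

	for (size_t i = 0; i < n; i++) {
		if (!can_unsync)
			return 1;
		if (sps[i].level != PT_PAGE_TABLE_LEVEL)
			return 1; /* upper-level tables are never left unsync */
		if (!need_unsync && !sps[i].unsync) {
			if (!oos_shadow)
				return 1;
			need_unsync = true;
		}
	}
	if (need_unsync)
		model_unsync_pages(sps, n);
	return 0;
}
```

A gfn that no shadow page maps never needs write protection, whatever can_unsync says. This is why the can_unsync check sits inside the loop.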

static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
		    unsigned pte_access, int user_fault,
		    int write_fault, int dirty, int level,
		    gfn_t gfn, pfn_t pfn, bool speculative,
		    bool can_unsync, bool host_writable)
{
	u64 spte, entry = *sptep;
	int ret = 0;

	/*
	 * We don't set the accessed bit, since we sometimes want to see
	 * whether the guest actually used the pte (in order to detect
	 * demand paging).
	 */
	spte = PT_PRESENT_MASK;
	if (!speculative)
		spte |= shadow_accessed_mask;
	if (!dirty)
		pte_access &= ~ACC_WRITE_MASK;
	if (pte_access & ACC_EXEC_MASK)
		spte |= shadow_x_mask;
	else
		spte |= shadow_nx_mask;
	if (pte_access & ACC_USER_MASK)
		spte |= shadow_user_mask;
	if (level > PT_PAGE_TABLE_LEVEL)
		spte |= PT_PAGE_SIZE_MASK;
	if (tdp_enabled)
		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
			kvm_is_mmio_pfn(pfn));

	if (host_writable)
		spte |= SPTE_HOST_WRITEABLE;
	else
		pte_access &= ~ACC_WRITE_MASK;

	spte |= (u64)pfn << PAGE_SHIFT;

	if ((pte_access & ACC_WRITE_MASK)
	    || (!vcpu->arch.mmu.direct_map && write_fault
		&& !is_write_protection(vcpu) && !user_fault)) {

		if (level > PT_PAGE_TABLE_LEVEL &&
		    has_wrprotected_page(vcpu->kvm, gfn, level)) {
			ret = 1;
			drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
			goto done;
		}

		spte |= PT_WRITABLE_MASK;

		if (!vcpu->arch.mmu.direct_map
		    && !(pte_access & ACC_WRITE_MASK))
			spte &= ~PT_USER_MASK;

		/*
		 * Optimization: for pte sync, if spte was writable the hash
		 * lookup is unnecessary (and expensive). Write protection
		 * is the responsibility of mmu_get_page / kvm_sync_page.
		 * The same reasoning applies to dirty page accounting.
		 */
		if (!can_unsync && is_writable_pte(*sptep))
			goto set_pte;

		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
			pgprintk("%s: found shadow page for %llx, marking ro\n",
				 __func__, gfn);
			ret = 1;
			pte_access &= ~ACC_WRITE_MASK;
			if (is_writable_pte(spte))
				spte &= ~PT_WRITABLE_MASK;
		}
	}

	if (pte_access & ACC_WRITE_MASK)
		mark_page_dirty(vcpu->kvm, gfn);

set_pte:
	update_spte(sptep, spte);
	/*
	 * If we overwrite a writable spte with a read-only one we
	 * should flush remote TLBs. Otherwise rmap_write_protect
	 * will find a read-only spte, even though the writable spte
	 * might be cached on a CPU's TLB.
	 */
	if (is_writable_pte(entry) && !is_writable_pte(*sptep))
		kvm_flush_remote_tlbs(vcpu->kvm);
done:
	return ret;
}
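The flag assembly at the top of set_spte() can be checked with concrete numbers. This sketch assumes classic (non-EPT) shadow paging, where shadow_x_mask is 0 and NX is bit 63, and it uses the architectural x86 PTE bit positions. It leaves out TDP memory types, SPTE_HOST_WRITEABLE and the unsync/write-protect path. The ACC_* flag values and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE bits; non-EPT shadow mask choices are assumed. */
#define PT_PRESENT_MASK   (1ULL << 0)
#define PT_WRITABLE_MASK  (1ULL << 1)
#define PT_USER_MASK      (1ULL << 2)
#define PT_ACCESSED_MASK  (1ULL << 5)
#define PT_PAGE_SIZE_MASK (1ULL << 7)
#define PT64_NX_MASK      (1ULL << 63)
#define PAGE_SHIFT        12

/* Hypothetical access flags mirroring ACC_*_MASK. */
#define ACC_EXEC  1u
#define ACC_WRITE 2u
#define ACC_USER  4u

/* Simplified spte builder: no TDP, no host_writable, no unsync handling. */
static uint64_t model_make_spte(unsigned access, bool speculative, bool dirty,
				int level, uint64_t pfn)
{
	uint64_t spte = PT_PRESENT_MASK;

	if (!speculative)
		spte |= PT_ACCESSED_MASK; /* prefetched sptes start "unused" */
	if (!dirty)
		access &= ~ACC_WRITE;     /* read-only until the guest dirties it */
	if (!(access & ACC_EXEC))
		spte |= PT64_NX_MASK;     /* shadow_x_mask is 0 without EPT */
	if (access & ACC_USER)
		spte |= PT_USER_MASK;
	if (level > 1)
		spte |= PT_PAGE_SIZE_MASK;
	spte |= pfn << PAGE_SHIFT;
	if (access & ACC_WRITE)
		spte |= PT_WRITABLE_MASK;
	return spte;
}

/* Remote TLBs need a flush when a writable spte becomes read-only. */
static bool model_needs_remote_flush(uint64_t old_spte, uint64_t new_spte)
{
	return (old_spte & PT_WRITABLE_MASK) && !(new_spte & PT_WRITABLE_MASK);
}
```

For example, a non-speculative, dirty, user, executable and writable 4 KiB mapping of pfn 0x1234 comes out as 0x1234027 (present, writable, user and accessed bits set).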

static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
			 unsigned pt_access, unsigned pte_access,
			 int user_fault, int write_fault, int dirty,
			 int *ptwrite, int level, gfn_t gfn,
			 pfn_t pfn, bool speculative,
			 bool host_writable)
{
	int was_rmapped = 0;
	int rmap_count;

	pgprintk("%s: spte %llx access %x write_fault %d"
		 " user_fault %d gfn %llx\n",
		 __func__, *sptep, pt_access,
		 write_fault, user_fault, gfn);

	if (is_rmap_spte(*sptep)) {
		/*
		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
		 * the parent of the now unreachable PTE.
		 */
		if (level > PT_PAGE_TABLE_LEVEL &&
		    !is_large_pte(*sptep)) {
			struct kvm_mmu_page *child;
			u64 pte = *sptep;

			child = page_header(pte & PT64_BASE_ADDR_MASK);
			mmu_page_remove_parent_pte(child, sptep);
			__set_spte(sptep, shadow_trap_nonpresent_pte);
			kvm_flush_remote_tlbs(vcpu->kvm);
		} else if (pfn != spte_to_pfn(*sptep)) {
			pgprintk("hfn old %llx new %llx\n",
				 spte_to_pfn(*sptep), pfn);
			drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte);
			kvm_flush_remote_tlbs(vcpu->kvm);
		} else
			was_rmapped = 1;
	}

	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
		      dirty, level, gfn, pfn, speculative, true,
		      host_writable)) {
		if (write_fault)
			*ptwrite = 1;
		kvm_mmu_flush_tlb(vcpu);
	}

	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
	pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
		 is_large_pte(*sptep) ? "2MB" : "4kB",
		 *sptep & PT_WRITABLE_MASK ? "RW" : "R", gfn,
		 *sptep, sptep);
	if (!was_rmapped && is_large_pte(*sptep))
		++vcpu->kvm->stat.lpages;

	page_header_update_slot(vcpu->kvm, sptep, gfn);
	if (!was_rmapped) {
		rmap_count = rmap_add(vcpu, sptep, gfn);
		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
			rmap_recycle(vcpu, sptep, gfn);
	}
KVM: MMU: fix page dirty tracking lost while sync page
In sync-page path, if spte.writable is changed, it will lose page dirty
tracking, for example:
assume spte.writable = 0 in a unsync-page, when it's synced, it map spte
to writable(that is spte.writable = 1), later guest write spte.gfn, it means
spte.gfn is dirty, then guest changed this mapping to read-only, after it's
synced, spte.writable = 0
So, when host release the spte, it detect spte.writable = 0 and not mark page
dirty
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-16 11:25:17 +08:00
	kvm_release_pfn_clean(pfn);
	if (speculative) {
		vcpu->arch.last_pte_updated = sptep;
		vcpu->arch.last_pte_gfn = gfn;
	}
}
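mmu_set_spte() takes one of three branches when it meets an already-present spte. Those branches can be summarised as a pure function. The sketch below uses hypothetical enum and function names; only the branch structure is taken from the code above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_TABLE_LEVEL 1

/* Hypothetical labels for the cases mmu_set_spte() distinguishes. */
enum overwrite_action {
	OW_NEW_MAPPING,  /* no previous spte: fresh rmap entry needed */
	OW_UNLINK_CHILD, /* table pointer replaced by a large page */
	OW_DROP_OLD_PFN, /* same spte now maps a different host page */
	OW_REUSE_RMAP,   /* same host page: existing rmap entry stays valid */
};

static enum overwrite_action model_classify(bool old_present, bool old_large,
					    int new_level, uint64_t old_pfn,
					    uint64_t new_pfn)
{
	if (!old_present)
		return OW_NEW_MAPPING;
	if (new_level > PT_PAGE_TABLE_LEVEL && !old_large)
		return OW_UNLINK_CHILD; /* kernel also flushes remote TLBs */
	if (new_pfn != old_pfn)
		return OW_DROP_OLD_PFN; /* kernel also flushes remote TLBs */
	return OW_REUSE_RMAP;
}

/* Mirrors "if (!was_rmapped) rmap_add(...)" in mmu_set_spte(). */
static bool model_needs_rmap_add(enum overwrite_action a)
{
	return a != OW_REUSE_RMAP;
}
```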

static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
{
}

static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
				     bool no_dirty_log)
{
	struct kvm_memory_slot *slot;
	unsigned long hva;

	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
	if (!slot) {
		get_page(bad_page);
		return page_to_pfn(bad_page);
	}

	hva = gfn_to_hva_memslot(slot, gfn);

	return hva_to_pfn_atomic(vcpu->kvm, hva);
}
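pte_prefetch_gfn_to_pfn() depends on two steps. First, a memslot lookup refuses a slot with dirty logging enabled when the caller cannot mark pages dirty. Second, the gfn is translated linearly to an hva. The sketch below assumes gfn_to_memslot_dirty_bitmap() behaves that way and that gfn_to_hva_memslot() computes userspace_addr + (gfn - base_gfn) * PAGE_SIZE. struct memslot and the model_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Hypothetical, trimmed-down memory slot. */
struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
	uint64_t userspace_addr; /* host virtual address of base_gfn */
	bool dirty_logging;      /* stands in for slot->dirty_bitmap != NULL */
};

/* Returns the slot covering gfn, or NULL (the kernel then uses bad_page). */
static const struct memslot *model_find_slot(const struct memslot *slots,
					     size_t n, uint64_t gfn,
					     bool no_dirty_log)
{
	for (size_t i = 0; i < n; i++) {
		const struct memslot *s = &slots[i];

		if (gfn < s->base_gfn || gfn >= s->base_gfn + s->npages)
			continue;
		/* A prefetch that will not mark the page dirty skips logged slots. */
		if (no_dirty_log && s->dirty_logging)
			return NULL;
		return s;
	}
	return NULL;
}

static uint64_t model_gfn_to_hva(const struct memslot *s, uint64_t gfn)
{
	return s->userspace_addr + (gfn - s->base_gfn) * PAGE_SIZE;
}
```

The dirty-log check is why direct_pte_prefetch_many() passes access & ACC_WRITE_MASK as no_dirty_log: a writable prefetch into a logged slot would bypass dirty tracking.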

static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
				    struct kvm_mmu_page *sp,
				    u64 *start, u64 *end)
{
	struct page *pages[PTE_PREFETCH_NUM];
	unsigned access = sp->role.access;
	int i, ret;
	gfn_t gfn;

	gfn = kvm_mmu_page_get_gfn(sp, start - sp->spt);
	if (!gfn_to_memslot_dirty_bitmap(vcpu, gfn, access & ACC_WRITE_MASK))
		return -1;

	ret = gfn_to_page_many_atomic(vcpu->kvm, gfn, pages, end - start);
	if (ret <= 0)
		return -1;

	for (i = 0; i < ret; i++, gfn++, start++)
		mmu_set_spte(vcpu, start, ACC_ALL,
			     access, 0, 0, 1, NULL,
			     sp->role.level, gfn,
			     page_to_pfn(pages[i]), true, true);

	return 0;
}

static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
				  struct kvm_mmu_page *sp, u64 *sptep)
{
	u64 *spte, *start = NULL;
	int i;

	WARN_ON(!sp->role.direct);

	i = (sptep - sp->spt) & ~(PTE_PREFETCH_NUM - 1);
	spte = sp->spt + i;

	for (i = 0; i < PTE_PREFETCH_NUM; i++, spte++) {
		if (*spte != shadow_trap_nonpresent_pte || spte == sptep) {
			if (!start)
				continue;
			if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
				break;
			start = NULL;
		} else if (!start)
			start = spte;
	}
}

static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
{
	struct kvm_mmu_page *sp;

	/*
	 * EPT has no accessed bit, so there is no way to distinguish
	 * translations the guest actually accessed from prefetched
	 * ones; disable pte prefetch if EPT is enabled.
	 */
	if (!shadow_accessed_mask)
		return;

	sp = page_header(__pa(sptep));
	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
		return;

	__direct_pte_prefetch(vcpu, sp, sptep);
}
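The window logic in __direct_pte_prefetch() is easiest to see on a small array. This sketch assumes PTE_PREFETCH_NUM is 8 and that every prefetch succeeds; the kernel instead stops at the first run that fails. It records the runs of empty entries that would be passed to direct_pte_prefetch_many(). The model_* and struct run names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTE_PREFETCH_NUM 8 /* assumed window size; must be a power of two */

struct run { size_t start, len; };

/*
 * present[] marks which sptes of a table are already populated and
 * fault_idx is the entry just faulted in. Writes each run of empty
 * entries inside the aligned window to out[] and returns the count.
 */
static size_t model_prefetch_runs(const bool *present, size_t fault_idx,
				  struct run *out)
{
	size_t base = fault_idx & ~(size_t)(PTE_PREFETCH_NUM - 1);
	size_t nruns = 0, start = 0;
	bool in_run = false;

	for (size_t i = base; i < base + PTE_PREFETCH_NUM; i++) {
		if (present[i] || i == fault_idx) {
			if (!in_run)
				continue;
			out[nruns].start = start;
			out[nruns].len = i - start;
			nruns++;
			in_run = false;
		} else if (!in_run) {
			start = i;
			in_run = true;
		}
	}
	/* As in the kernel loop, a run still open at the window end is dropped. */
	return nruns;
}
```

With an empty table and a fault at index 11, the window is entries 8 to 15. Only the run 8..10 is prefetched: entries 12 to 15 form a run that is still open when the loop ends.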

static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
			int map_writable, int level, gfn_t gfn, pfn_t pfn,
			bool prefault)
{
	struct kvm_shadow_walk_iterator iterator;
	struct kvm_mmu_page *sp;
	int pt_write = 0;
	gfn_t pseudo_gfn;

	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
		if (iterator.level == level) {
			unsigned pte_access = ACC_ALL;

			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
				     0, write, 1, &pt_write,
				     level, gfn, pfn, prefault, map_writable);
			direct_pte_prefetch(vcpu, iterator.sptep);
			++vcpu->stat.pf_fixed;
			break;
		}

		if (*iterator.sptep == shadow_trap_nonpresent_pte) {
			u64 base_addr = iterator.addr;

			base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
			pseudo_gfn = base_addr >> PAGE_SHIFT;
			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
					      iterator.level - 1,
					      1, ACC_ALL, iterator.sptep);
			if (!sp) {
				pgprintk("nonpaging_map: ENOMEM\n");
				kvm_release_pfn_clean(pfn);
				return -ENOMEM;
			}

			__set_spte(iterator.sptep,
				   __pa(sp->spt)
				   | PT_PRESENT_MASK | PT_WRITABLE_MASK
				   | shadow_user_mask | shadow_x_mask
				   | shadow_accessed_mask);
		}
	}
	return pt_write;
}
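__direct_map() descends the shadow table for gfn. Wherever it meets a non-present entry it allocates an intermediate page, and it installs the leaf at the requested level. The toy radix tree below sketches that walk under three assumptions: four levels of 512 entries as in x86-64 paging, calloc() in place of kvm_mmu_get_page(), and a separate leaf array instead of encoded sptes. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LEVELS         4
#define PTES_PER_TABLE 512
#define LEVEL_BITS     9

/* Hypothetical toy table: an entry is either a child pointer or a leaf pfn. */
struct table {
	struct table *child[PTES_PER_TABLE];
	uint64_t leaf_pfn[PTES_PER_TABLE];
	int leaf_present[PTES_PER_TABLE];
};

/* Index of gfn within a table at `level` (level 1 = 4 KiB leaves). */
static unsigned index_at(uint64_t gfn, int level)
{
	return (gfn >> (LEVEL_BITS * (level - 1))) & (PTES_PER_TABLE - 1);
}

/*
 * Walk from the root toward `level`, allocating missing intermediate
 * tables, then install the leaf. Returns 0 on success, -1 on failure.
 */
static int toy_direct_map(struct table *root, uint64_t gfn, uint64_t pfn,
			  int level)
{
	struct table *t = root;

	for (int l = LEVELS; l >= 1; l--) {
		unsigned idx = index_at(gfn, l);

		if (l == level) {
			t->leaf_pfn[idx] = pfn;
			t->leaf_present[idx] = 1;
			return 0;
		}
		if (!t->child[idx]) {
			t->child[idx] = calloc(1, sizeof(struct table));
			if (!t->child[idx])
				return -1; /* kernel: release pfn, return -ENOMEM */
		}
		t = t->child[idx];
	}
	return -1; /* level outside 1..LEVELS */
}
```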

static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk)
{
	siginfo_t info;

	info.si_signo = SIGBUS;
	info.si_errno = 0;
	info.si_code = BUS_MCEERR_AR;
	info.si_addr = (void __user *)address;
	info.si_addr_lsb = PAGE_SHIFT;

	send_sig_info(SIGBUS, &info, tsk);
}

static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn)
{
	kvm_release_pfn_clean(pfn);
	if (is_hwpoison_pfn(pfn)) {
		kvm_send_hwpoison_signal(gfn_to_hva(kvm, gfn), current);
		return 0;
	} else if (is_fault_pfn(pfn))
		return -EFAULT;

	return 1;
}
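On the userspace side, kvm_send_hwpoison_signal() delivers SIGBUS with si_code BUS_MCEERR_AR and si_addr set to the host virtual address backing the poisoned gfn. A VMM can recognise this signal with an SA_SIGINFO handler. The sketch below assumes Linux with glibc, hence _GNU_SOURCE. last_poison_addr and the helper names are hypothetical.

```c
#define _GNU_SOURCE /* exposes BUS_MCEERR_AR in glibc's <signal.h> */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical VMM state: host address of the last poisoned guest page. */
static void *volatile last_poison_addr;

/* True if this SIGBUS is the "action required" memory-error notification. */
static bool is_guest_memory_poison(const siginfo_t *info)
{
	return info->si_signo == SIGBUS && info->si_code == BUS_MCEERR_AR;
}

static void sigbus_handler(int sig, siginfo_t *info, void *uctx)
{
	(void)sig;
	(void)uctx;
	if (is_guest_memory_poison(info))
		last_poison_addr = info->si_addr; /* hva backing the bad gfn */
	/* A real VMM would now inject a machine check or stop the guest. */
}

static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```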

static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
					gfn_t *gfnp, pfn_t *pfnp, int *levelp)
{
	pfn_t pfn = *pfnp;
	gfn_t gfn = *gfnp;
	int level = *levelp;

	/*
	 * Check if it's a transparent hugepage. If this were a
	 * hugetlbfs page, level wouldn't be set to
	 * PT_PAGE_TABLE_LEVEL and no adjustment would be done
	 * here.
	 */
	if (!is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
	    level == PT_PAGE_TABLE_LEVEL &&
	    PageTransCompound(pfn_to_page(pfn)) &&
	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
		unsigned long mask;
		/*
		 * mmu_notifier_retry was successful and we hold the
		 * mmu_lock here, so the pmd can't start splitting
		 * under us. In turn, __split_huge_page_refcount()
		 * can't run under us either, and we can safely
		 * transfer the refcount from PG_tail to PG_head as
		 * we switch the pfn from tail to head.
		 */
		*levelp = level = PT_DIRECTORY_LEVEL;
		mask = KVM_PAGES_PER_HPAGE(level) - 1;
		VM_BUG_ON((gfn & mask) != (pfn & mask));
		if (pfn & mask) {
			gfn &= ~mask;
			*gfnp = gfn;
			kvm_release_pfn_clean(pfn);
			pfn &= ~mask;
			if (!get_page_unless_zero(pfn_to_page(pfn)))
				BUG();
			*pfnp = pfn;
		}
	}
}
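The realignment in transparent_hugepage_adjust() is plain mask arithmetic. Assume x86, where KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL) is 512 (2 MiB / 4 KiB). The sketch below rounds a gfn/pfn pair down to the head of its huge page. It models only the arithmetic, not the refcount transfer. model_thp_align() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 512 4 KiB frames per 2 MiB page on x86. */
#define PAGES_PER_2M 512ULL

/* Round a gfn/pfn pair inside one transparent hugepage down to its head. */
static void model_thp_align(uint64_t *gfn, uint64_t *pfn)
{
	uint64_t mask = PAGES_PER_2M - 1;

	/* Both must sit at the same offset in the huge page (the VM_BUG_ON). */
	assert((*gfn & mask) == (*pfn & mask));
	*gfn &= ~mask;
	*pfn &= ~mask;
}
```

For example, gfn 0x12345 and pfn 0xabd45 share the in-page offset 0x145, so they become 0x12200 and 0xabc00.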

static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
			 gva_t gva, pfn_t *pfn, bool write, bool *writable);

static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
			 bool prefault)
{
	int r;
	int level;
	int force_pt_level;
	pfn_t pfn;
	unsigned long mmu_seq;
	bool map_writable;

	force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
	if (likely(!force_pt_level)) {
		level = mapping_level(vcpu, gfn);
		/*
		 * This path builds a PAE pagetable - so we can map
		 * 2mb pages at maximum. Therefore check if the level
		 * is larger than that.
		 */
		if (level > PT_DIRECTORY_LEVEL)
			level = PT_DIRECTORY_LEVEL;

		gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
	} else
		level = PT_PAGE_TABLE_LEVEL;

	mmu_seq = vcpu->kvm->mmu_notifier_seq;
	smp_rmb();

	if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable))
		return 0;

	/* mmio */
	if (is_error_pfn(pfn))
		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);

	spin_lock(&vcpu->kvm->mmu_lock);
	if (mmu_notifier_retry(vcpu, mmu_seq))
		goto out_unlock;
	kvm_mmu_free_some_pages(vcpu);
	if (likely(!force_pt_level))
		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
	r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
			 prefault);
	spin_unlock(&vcpu->kvm->mmu_lock);

	return r;

out_unlock:
	spin_unlock(&vcpu->kvm->mmu_lock);
	kvm_release_pfn_clean(pfn);
	return 0;
}
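nonpaging_map() takes a snapshot of mmu_notifier_seq before the lookup in try_async_pf(), which may sleep. Under mmu_lock it then asks mmu_notifier_retry() whether an invalidation raced with the lookup; if one did, the fault is dropped and the pfn released. The single-threaded model below shows that decision. It assumes the retry test of this era: retry if an invalidation is still in progress, or if the sequence number has moved since the snapshot. It omits mmu_lock and the smp_rmb()/smp_wmb() barriers, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the notifier fields kept in struct kvm. */
struct notifier_state {
	unsigned long seq; /* bumped when an invalidation completes */
	long count;        /* > 0 while an invalidation is running */
};

static void invalidate_start(struct notifier_state *s)
{
	s->count++;
}

static void invalidate_end(struct notifier_state *s)
{
	s->seq++;
	s->count--;
}

/* True if a pfn looked up after taking snapshot 'snap' may be stale. */
static bool notifier_retry(const struct notifier_state *s, unsigned long snap)
{
	if (s->count)
		return true;
	return s->seq != snap;
}
```

Because the check runs under mmu_lock and before the spte is installed, a stale pfn is never mapped. The worst case is one extra guest fault.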

static void mmu_free_roots(struct kvm_vcpu *vcpu)
{
	int i;
	struct kvm_mmu_page *sp;
	LIST_HEAD(invalid_list);

	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
		return;
	spin_lock(&vcpu->kvm->mmu_lock);
	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL &&
	    (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL ||
	     vcpu->arch.mmu.direct_map)) {
		hpa_t root = vcpu->arch.mmu.root_hpa;

		sp = page_header(root);
		--sp->root_count;
		if (!sp->root_count && sp->role.invalid) {
			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
			kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
		}
		vcpu->arch.mmu.root_hpa = INVALID_PAGE;
		spin_unlock(&vcpu->kvm->mmu_lock);
		return;
	}
	for (i = 0; i < 4; ++i) {
		hpa_t root = vcpu->arch.mmu.pae_root[i];

		if (root) {
			root &= PT64_BASE_ADDR_MASK;
			sp = page_header(root);
			--sp->root_count;
			if (!sp->root_count && sp->role.invalid)
				kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
							 &invalid_list);
		}
		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
	}
	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
	spin_unlock(&vcpu->kvm->mmu_lock);
	vcpu->arch.mmu.root_hpa = INVALID_PAGE;
}
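mmu_free_roots() uses root_count as a reference count. Dropping the last reference to a shadow page that is already marked invalid is what allows that page to be zapped. In the PAE case each pae_root[i] entry also carries flag bits, which must be masked off with PT64_BASE_ADDR_MASK before page_header() can find the page. The sketch below models both steps. The struct, the zapped flag (standing in for kvm_mmu_prepare_zap_page()) and the helper names are hypothetical. The mask assumes 4 KiB pages and a 52-bit physical address limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match PT64_BASE_ADDR_MASK with 4 KiB pages. */
#define PAGE_SIZE_4K 4096ULL
#define BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(PAGE_SIZE_4K - 1))

/* Hypothetical model of the kvm_mmu_page fields used above. */
struct shadow_page {
	int root_count;
	bool invalid;
	bool zapped;
};

/*
 * Drop one root reference; a page that is both unreferenced and
 * already marked invalid can be freed (zapped) now.
 */
static void put_root(struct shadow_page *sp)
{
	--sp->root_count;
	if (!sp->root_count && sp->invalid)
		sp->zapped = true;
}

/* Strip flag bits (e.g. the present bit) from a PAE root entry. */
static uint64_t root_entry_addr(uint64_t entry)
{
	return entry & BASE_ADDR_MASK;
}
```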

static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
{
	int ret = 0;

	if (!kvm_is_visible_gfn(vcpu->kvm, root_gfn)) {
		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
		ret = 1;
	}

	return ret;
}
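mmu_check_root() rejects a root gfn that no memory slot makes visible. In that case it requests a triple fault instead of building a shadow root. The model below shows the idea with a simplified slot table of {base_gfn, npages} ranges. The real kvm_is_visible_gfn() may apply further filtering, for example for slots being removed, and that is not modelled here. Every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory slot: a contiguous gfn range. */
struct slot {
	uint64_t base_gfn;
	uint64_t npages;
};

/* Returns 1 if some slot covers gfn. */
static int gfn_visible(const struct slot *slots, size_t n, uint64_t gfn)
{
	for (size_t i = 0; i < n; i++)
		if (gfn >= slots[i].base_gfn &&
		    gfn < slots[i].base_gfn + slots[i].npages)
			return 1;
	return 0;
}

/* Returns 1 (the caller would request a triple fault) for a bogus root. */
static int check_root(const struct slot *slots, size_t n, uint64_t root_gfn)
{
	return gfn_visible(slots, n, root_gfn) ? 0 : 1;
}
```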

static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu_page *sp;
	unsigned i;

	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
		spin_lock(&vcpu->kvm->mmu_lock);
		kvm_mmu_free_some_pages(vcpu);
		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
				      1, ACC_ALL, NULL);
		++sp->root_count;
		spin_unlock(&vcpu->kvm->mmu_lock);
		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
	} else if (vcpu->arch.mmu.shadow_root_level == PT32E_ROOT_LEVEL) {
		for (i = 0; i < 4; ++i) {
			hpa_t root = vcpu->arch.mmu.pae_root[i];

			ASSERT(!VALID_PAGE(root));
			spin_lock(&vcpu->kvm->mmu_lock);
			kvm_mmu_free_some_pages(vcpu);
			sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
					      i << 30,
					      PT32_ROOT_LEVEL, 1, ACC_ALL,
					      NULL);
			root = __pa(sp->spt);
			++sp->root_count;
			spin_unlock(&vcpu->kvm->mmu_lock);
			vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
		}
		vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
	} else
		BUG();

	return 0;
}
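In the PAE branch of mmu_alloc_direct_roots(), root i is requested with gfn i << (30 - PAGE_SHIFT) and address i << 30. Each of the four PDPT entries covers 1 GiB of the 4 GiB space. The arithmetic below checks that the two keys agree and shows which root serves a given 32-bit address. It assumes PAGE_SHIFT is 12 (4 KiB pages), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* First guest frame covered by PAE root i (each root spans 1 GiB). */
static uint64_t pae_root_base_gfn(unsigned i)
{
	return (uint64_t)i << (30 - SKETCH_PAGE_SHIFT);
}

/* First guest address covered by PAE root i. */
static uint64_t pae_root_base_addr(unsigned i)
{
	return (uint64_t)i << 30;
}

/* Which of the four roots translates a given 32-bit address. */
static unsigned pae_root_index(uint32_t addr)
{
	return addr >> 30;
}
```

Note that i is unsigned in this function, so 3 << 30 (0xC0000000) does not overflow a signed int.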

static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu_page *sp;
	u64 pdptr, pm_mask;
	gfn_t root_gfn;
	int i;

	root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;

	if (mmu_check_root(vcpu, root_gfn))
		return 1;

	/*
	 * Do we shadow a long mode page table? If so we need to
	 * write-protect the guest's page table root.
	 */
	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
		hpa_t root = vcpu->arch.mmu.root_hpa;

		ASSERT(!VALID_PAGE(root));

		spin_lock(&vcpu->kvm->mmu_lock);
		kvm_mmu_free_some_pages(vcpu);
		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
				      0, ACC_ALL, NULL);
		root = __pa(sp->spt);
		++sp->root_count;
		spin_unlock(&vcpu->kvm->mmu_lock);
		vcpu->arch.mmu.root_hpa = root;
		return 0;
	}

	/*
	 * We shadow a 32 bit page table. This may be a legacy 2-level
	 * or a PAE 3-level page table. In either case we need to be aware that
	 * the shadow page table may be a PAE or a long mode page table.
	 */
	pm_mask = PT_PRESENT_MASK;
	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL)
		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;

	for (i = 0; i < 4; ++i) {
		hpa_t root = vcpu->arch.mmu.pae_root[i];

		ASSERT(!VALID_PAGE(root));
		if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
			pdptr = kvm_pdptr_read_mmu(vcpu, &vcpu->arch.mmu, i);
			if (!is_present_gpte(pdptr)) {
				vcpu->arch.mmu.pae_root[i] = 0;
				continue;
			}
			root_gfn = pdptr >> PAGE_SHIFT;
			if (mmu_check_root(vcpu, root_gfn))
				return 1;
		}
		spin_lock(&vcpu->kvm->mmu_lock);
		kvm_mmu_free_some_pages(vcpu);
		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
				      PT32_ROOT_LEVEL, 0,
				      ACC_ALL, NULL);
		root = __pa(sp->spt);
		++sp->root_count;
		spin_unlock(&vcpu->kvm->mmu_lock);

		vcpu->arch.mmu.pae_root[i] = root | pm_mask;
	}
	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);

	/*
	 * If we shadow a 32 bit page table with a long mode page
	 * table we enter this path.
	 */
	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
		if (vcpu->arch.mmu.lm_root == NULL) {
			/*
			 * The additional page necessary for this is only
			 * allocated on demand.
			 */

			u64 *lm_root;

			lm_root = (void*)get_zeroed_page(GFP_KERNEL);
			if (lm_root == NULL)
				return 1;

			lm_root[0] = __pa(vcpu->arch.mmu.pae_root) | pm_mask;

			vcpu->arch.mmu.lm_root = lm_root;
		}

		vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.lm_root);
	}

	return 0;
}
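When a 32-bit guest table is shadowed, the flags OR-ed into each pae_root[i] entry depend on what the shadow root really is. As genuine PAE PDPTEs, the entries may set only the present bit. When the same four entries sit under a long-mode lm_root, they act as long-mode page-directory-pointer entries and also receive the accessed, writable and user bits. The sketch below computes that mask. It assumes the architectural bit positions (present = bit 0, writable = bit 1, user = bit 2, accessed = bit 5) and level numbers 3 and 4 for the PAE and long-mode roots. The SK_* names and helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 page-table entry bits (assumed to match PT_*_MASK). */
#define SK_PRESENT  (1ULL << 0)
#define SK_WRITABLE (1ULL << 1)
#define SK_USER     (1ULL << 2)
#define SK_ACCESSED (1ULL << 5)

/* Assumed level numbering: 3 = PAE root, 4 = long-mode root. */
#define SK_PT32E_ROOT_LEVEL 3
#define SK_PT64_ROOT_LEVEL  4

/* Flags OR-ed into each shadow pae_root[] entry. */
static uint64_t sketch_pm_mask(int shadow_root_level)
{
	uint64_t mask = SK_PRESENT;

	/* PAE PDPTEs treat RW/US/A as reserved; add them only in long mode. */
	if (shadow_root_level == SK_PT64_ROOT_LEVEL)
		mask |= SK_ACCESSED | SK_WRITABLE | SK_USER;
	return mask;
}

/* Compose a root entry from a page-aligned physical address. */
static uint64_t sketch_root_entry(uint64_t page_pa, int shadow_root_level)
{
	return page_pa | sketch_pm_mask(shadow_root_level);
}
```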

static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
{
	if (vcpu->arch.mmu.direct_map)
		return mmu_alloc_direct_roots(vcpu);
	else
		return mmu_alloc_shadow_roots(vcpu);
}

static void mmu_sync_roots(struct kvm_vcpu *vcpu)
{
	int i;
	struct kvm_mmu_page *sp;

	if (vcpu->arch.mmu.direct_map)
		return;

	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
		return;

	trace_kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
		hpa_t root = vcpu->arch.mmu.root_hpa;
		sp = page_header(root);
		mmu_sync_children(vcpu, sp);
		trace_kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
		return;
	}
	for (i = 0; i < 4; ++i) {
		hpa_t root = vcpu->arch.mmu.pae_root[i];

		if (root && VALID_PAGE(root)) {
			root &= PT64_BASE_ADDR_MASK;
			sp = page_header(root);
			mmu_sync_children(vcpu, sp);
		}
	}
	trace_kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
}

void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
{
	spin_lock(&vcpu->kvm->mmu_lock);
	mmu_sync_roots(vcpu);
	spin_unlock(&vcpu->kvm->mmu_lock);
}

static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
				  u32 access, struct x86_exception *exception)
{
	if (exception)
		exception->error_code = 0;
	return vaddr;
}

static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
					 u32 access,
					 struct x86_exception *exception)
{
	if (exception)
		exception->error_code = 0;
	return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
}
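With paging disabled, nonpaging_gva_to_gpa() is the identity map. The nested variant passes the same address to nested_mmu.translate_gpa(), so that an L2 "physical" address is further translated through L1's tables. The composition can be sketched with a function pointer, as below. The translate_fn type, the fixed-offset L1 translation and all names are hypothetical stand-ins for the real callbacks, which also take the vcpu and access bits.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*translate_fn)(uint64_t addr);

/* Paging off: a guest virtual address is already a guest physical one. */
static uint64_t identity_gva_to_gpa(uint64_t gva)
{
	return gva;
}

/* Hypothetical L1 translation: L2 memory lives at a fixed L1 offset. */
static uint64_t l1_translate_gpa(uint64_t gpa)
{
	return gpa + 0x100000000ULL;
}

/* Nested non-paging case: identity step, then the L1 translation. */
static uint64_t nested_gva_to_gpa(uint64_t gva, translate_fn translate_gpa)
{
	return translate_gpa(identity_gva_to_gpa(gva));
}
```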

static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
				u32 error_code, bool prefault)
{
	gfn_t gfn;
	int r;

	pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
	r = mmu_topup_memory_caches(vcpu);
	if (r)
		return r;
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
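The commit message above describes the userspace model: a VMM process opens /dev/kvm and drives guests through it. The userspace interface has changed since this 2006 patch, so the sketch below uses the present-day `KVM_GET_API_VERSION` ioctl and `KVM_API_VERSION` constant from `<linux/kvm.h>`, not the interface this patch introduced. It only shows the first step of talking to the device. `kvm_api_version()` and `kvm_api_is_supported()` are hypothetical helper names.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the KVM device node at `path` and return the
 * API version it reports, or -1 if the node cannot be opened or the ioctl
 * fails (for example on a host without /dev/kvm or without permission).
 */
int kvm_api_version(const char *path)
{
	assert(path != NULL);

	int fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	/* KVM_GET_API_VERSION takes no argument and returns the version. */
	int version = ioctl(fd, KVM_GET_API_VERSION, 0);
	close(fd);
	return version;
}

/* Returns 1 if the device speaks the API version these headers expect. */
int kvm_api_is_supported(const char *path)
{
	return kvm_api_version(path) == KVM_API_VERSION;
}
```

A VMM would normally refuse to continue unless `kvm_api_is_supported("/dev/kvm")` holds, and would then go on to create a VM and its vcpus.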
2006-12-10 18:21:36 +08:00
	ASSERT(vcpu);
2007-12-13 23:50:52 +08:00
	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
2006-12-10 18:21:36 +08:00

2007-12-10 00:43:00 +08:00
	gfn = gva >> PAGE_SHIFT;
2006-12-10 18:21:36 +08:00

2007-12-10 00:43:00 +08:00
	return nonpaging_map(vcpu, gva & PAGE_MASK,
2010-12-07 10:48:06 +08:00
			     error_code & PFERR_WRITE_MASK, gfn, prefault);
2006-12-10 18:21:36 +08:00
}

2010-10-20 21:18:02 +08:00
static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
2010-10-14 17:22:46 +08:00
{
	struct kvm_arch_async_pf arch;
2010-12-07 10:35:25 +08:00

2010-10-14 17:22:53 +08:00
	arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
2010-10-14 17:22:46 +08:00
	arch.gfn = gfn;
2010-11-12 14:49:55 +08:00
	arch.direct_map = vcpu->arch.mmu.direct_map;
2010-12-07 10:35:25 +08:00
	arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);
2010-10-14 17:22:46 +08:00

	return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
}
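`kvm_arch_setup_async_pf()` packs a per-vcpu sequence number and the vcpu id into one token, `(id << 12) | vcpu_id`. The standalone sketch below reproduces that packing and adds the matching decode. The helper names and `struct apf_state` are hypothetical; it assumes, as the shift implies, that vcpu ids fit in the low 12 bits, and it assumes a 32-bit token, so only the low 20 bits of the counter survive in the token.

```c
#include <assert.h>
#include <stdint.h>

#define APF_VCPU_BITS 12u                         /* matches the "<< 12" above */
#define APF_VCPU_MASK ((1u << APF_VCPU_BITS) - 1)

/* Hypothetical per-vcpu state: the next sequence number to hand out. */
struct apf_state {
	uint32_t id;
};

/* Pack the current sequence number with the vcpu id, then advance it. */
uint32_t apf_make_token(struct apf_state *st, uint32_t vcpu_id)
{
	assert(vcpu_id <= APF_VCPU_MASK);
	return (st->id++ << APF_VCPU_BITS) | vcpu_id;
}

/* Recover the vcpu id from the low bits of a token. */
uint32_t apf_token_vcpu(uint32_t token)
{
	return token & APF_VCPU_MASK;
}

/* Recover the (truncated) sequence number from the high bits. */
uint32_t apf_token_seq(uint32_t token)
{
	return token >> APF_VCPU_BITS;
}
```

Two consecutive faults on vcpu 5 therefore get distinct tokens 0x005 and 0x1005, which lets the guest match a later "page ready" notification to the fault that started it.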

2010-10-14 17:22:46 +08:00
static bool can_do_async_pf(struct kvm_vcpu *vcpu)
{
	if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
		     kvm_event_needs_reinjection(vcpu)))
		return false;

	return kvm_x86_ops->interrupt_allowed(vcpu);
}

2010-12-07 10:48:06 +08:00
static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
2010-10-23 00:18:18 +08:00
			 gva_t gva, pfn_t *pfn, bool write, bool *writable)
2010-10-14 17:22:46 +08:00
{
	bool async;

2010-10-23 00:18:18 +08:00
	*pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async, write, writable);
2010-10-14 17:22:46 +08:00

	if (!async)
		return false; /* *pfn has correct page already */

	put_page(pfn_to_page(*pfn));

2010-12-07 10:48:06 +08:00
	if (!prefault && can_do_async_pf(vcpu)) {
2010-11-01 16:58:43 +08:00
		trace_kvm_try_async_get_page(gva, gfn);
2010-10-14 17:22:46 +08:00
		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
			trace_kvm_async_pf_doublefault(gva, gfn);
			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
			return true;
		} else if (kvm_arch_setup_async_pf(vcpu, gva, gfn))
			return true;
	}

2010-10-23 00:18:18 +08:00
	*pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, writable);
2010-10-14 17:22:46 +08:00

	return false;
}
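The branch structure of `can_do_async_pf()` and `try_async_pf()` can be summarized as a pure decision function. The sketch below is a hypothetical model, not KVM code: each `bool` field stands in for one of the calls above (`gfn_to_pfn_async()` finding the page resident, `irqchip_in_kernel()`, `kvm_event_needs_reinjection()`, `interrupt_allowed()`, `kvm_find_async_pf_gfn()`, and `kvm_arch_setup_async_pf()` succeeding), and the enum names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of one pass through try_async_pf(). */
enum apf_outcome {
	APF_PAGE_READY,     /* page resident: returns false, *pfn valid    */
	APF_HALT,           /* fault already pending: KVM_REQ_APF_HALT     */
	APF_QUEUED,         /* async work queued: returns true             */
	APF_SYNC_FALLBACK,  /* blocks in gfn_to_pfn_prot(): returns false  */
};

struct apf_inputs {
	bool page_resident;       /* gfn_to_pfn_async() left *async false */
	bool prefault;
	bool irqchip_in_kernel;
	bool needs_reinjection;   /* kvm_event_needs_reinjection()        */
	bool interrupt_allowed;   /* kvm_x86_ops->interrupt_allowed()     */
	bool already_pending;     /* kvm_find_async_pf_gfn()              */
	bool setup_ok;            /* kvm_arch_setup_async_pf() != 0       */
};

/* Mirrors can_do_async_pf(). */
static bool model_can_do_async_pf(const struct apf_inputs *in)
{
	if (!in->irqchip_in_kernel || in->needs_reinjection)
		return false;
	return in->interrupt_allowed;
}

/* Mirrors the branch structure of try_async_pf(). */
enum apf_outcome model_try_async_pf(const struct apf_inputs *in)
{
	if (in->page_resident)
		return APF_PAGE_READY;
	if (!in->prefault && model_can_do_async_pf(in)) {
		if (in->already_pending)
			return APF_HALT;
		if (in->setup_ok)
			return APF_QUEUED;
	}
	return APF_SYNC_FALLBACK;
}

/* try_async_pf()'s bool result: true means "handled, re-enter the guest". */
bool model_returns_true(enum apf_outcome o)
{
	return o == APF_HALT || o == APF_QUEUED;
}
```

In `tdp_page_fault()`, the two outcomes for which this model returns true correspond to the early `return 0` that skips mapping the page.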

2010-10-18 00:13:42 +08:00
static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
2010-12-07 10:48:06 +08:00
			  bool prefault)
2008-02-07 20:47:44 +08:00
{
2008-04-03 03:46:56 +08:00
	pfn_t pfn;
2008-02-07 20:47:44 +08:00
	int r;
2009-07-27 22:30:44 +08:00
	int level;
2011-01-14 07:46:48 +08:00
	int force_pt_level;
2008-02-23 22:44:30 +08:00
	gfn_t gfn = gpa >> PAGE_SHIFT;
2008-07-25 22:24:52 +08:00
	unsigned long mmu_seq;
2010-10-23 00:18:18 +08:00
	int write = error_code & PFERR_WRITE_MASK;
	bool map_writable;
2008-02-07 20:47:44 +08:00

	ASSERT(vcpu);
	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));

	r = mmu_topup_memory_caches(vcpu);
	if (r)
		return r;

2011-01-14 07:46:48 +08:00
	force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
	if (likely(!force_pt_level)) {
		level = mapping_level(vcpu, gfn);
		gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
	} else
		level = PT_PAGE_TABLE_LEVEL;
2009-07-27 22:30:44 +08:00

2008-07-25 22:24:52 +08:00
	mmu_seq = vcpu->kvm->mmu_notifier_seq;
2008-09-17 07:54:47 +08:00
	smp_rmb();
2010-10-14 17:22:46 +08:00

2010-12-07 10:48:06 +08:00
	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
2010-10-14 17:22:46 +08:00
		return 0;

	/* mmio */
2010-05-31 14:28:19 +08:00
	if (is_error_pfn(pfn))
		return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
2008-02-07 20:47:44 +08:00
	spin_lock(&vcpu->kvm->mmu_lock);
2008-07-25 22:24:52 +08:00
	if (mmu_notifier_retry(vcpu, mmu_seq))
		goto out_unlock;
2008-02-07 20:47:44 +08:00
	kvm_mmu_free_some_pages(vcpu);
2011-01-14 07:46:48 +08:00
	if (likely(!force_pt_level))
		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
2010-10-23 00:18:18 +08:00
	r = __direct_map(vcpu, gpa, write, map_writable,
2010-12-07 10:34:42 +08:00
			 level, gfn, pfn, prefault);
2008-02-07 20:47:44 +08:00
	spin_unlock(&vcpu->kvm->mmu_lock);

	return r;
2008-07-25 22:24:52 +08:00

out_unlock:
	spin_unlock(&vcpu->kvm->mmu_lock);
	kvm_release_pfn_clean(pfn);
	return 0;
2008-02-07 20:47:44 +08:00
}
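Two idioms in `tdp_page_fault()` are easy to miss. First, when a large mapping level is chosen, `gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1)` rounds the frame number down to the first frame of the huge page; on x86 each level above 4 KiB multiplies the page count by 512. Second, `mmu_seq` is sampled before the possibly sleeping pfn lookup and rechecked under `mmu_lock` via `mmu_notifier_retry()`, so a concurrent invalidation makes the fault retry instead of installing a stale mapping. The userspace sketch below models both. The macros mirror the x86 `KVM_HPAGE_GFN_SHIFT`/`KVM_PAGES_PER_HPAGE` definitions as an assumption, and `struct fake_kvm` with its helpers is a hypothetical stand-in for the `mmu_notifier_seq`/`mmu_notifier_count` fields, without the locking and memory barriers (`smp_rmb()`) that the real code needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 huge-page geometry: 9 more gfn bits per level above 4 KiB. */
#define HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)
#define PAGES_PER_HPAGE(level) (1ull << HPAGE_GFN_SHIFT(level))

/* Round a guest frame number down to the first frame of its huge page. */
uint64_t align_gfn(uint64_t gfn, int level)
{
	return gfn & ~(PAGES_PER_HPAGE(level) - 1);
}

/* Hypothetical stand-in for the notifier fields of struct kvm. */
struct fake_kvm {
	unsigned long mmu_notifier_seq;   /* bumped when an invalidation ends */
	long mmu_notifier_count;          /* invalidations still in progress  */
};

void invalidate_range_start(struct fake_kvm *kvm)
{
	kvm->mmu_notifier_count++;
}

void invalidate_range_end(struct fake_kvm *kvm)
{
	kvm->mmu_notifier_seq++;
	kvm->mmu_notifier_count--;
}

/* Same test as mmu_notifier_retry(): retry if an invalidation is running
 * or if one completed after `snapshot` was taken. */
bool must_retry(const struct fake_kvm *kvm, unsigned long snapshot)
{
	return kvm->mmu_notifier_count != 0 ||
	       kvm->mmu_notifier_seq != snapshot;
}
```

Taking the snapshot before the lookup, rather than after, is what makes the check safe: any invalidation that overlaps the lookup either is still counted or has already advanced the sequence.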
2006-12-10 18:21:36 +08:00
static void nonpaging_free(struct kvm_vcpu *vcpu)
{
2007-01-06 08:36:40 +08:00
	mmu_free_roots(vcpu);
2006-12-10 18:21:36 +08:00
}

2010-09-10 23:30:44 +08:00
static int nonpaging_init_context(struct kvm_vcpu *vcpu,
				  struct kvm_mmu *context)
2006-12-10 18:21:36 +08:00
{
	context->new_cr3 = nonpaging_new_cr3;
	context->page_fault = nonpaging_page_fault;
	context->gva_to_gpa = nonpaging_gva_to_gpa;
	context->free = nonpaging_free;
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
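The trick described in this commit message can be modelled in a few lines. The sketch below is a userspace simulation, not the vmx.c code. `PTE_RSVD_TRAP` picks bit 51 as an assumed reserved bit (which bits are actually reserved depends on the CPU's physical address width), and `vmx_pf_exits()` models the VMX page-fault error-code filter: with the #PF bit set in the exception bitmap, a guest page fault causes a VM exit only when `(error_code & PFEC_MASK) == PFEC_MATCH`. Setting both to the present bit lets not-present faults (P=0) go straight to the guest, while reserved-bit faults, which report P=1, still reach kvm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFERR_PRESENT_MASK (1u << 0)   /* x86 #PF error code: P bit    */
#define PFERR_RSVD_MASK    (1u << 3)   /* x86 #PF error code: RSVD bit */

#define PTE_PRESENT   (1ull << 0)
#define PTE_RSVD_TRAP (1ull << 51)     /* assumed reserved bit         */

/* Shadow pte installed before kvm has resolved the real mapping. */
uint64_t placeholder_spte(bool guest_pte_present)
{
	/* Guest pte not present: the shadow is not present either, so the
	 * resulting fault is one the guest itself must handle. */
	if (!guest_pte_present)
		return 0;
	/* Guest pte present: mark present but poison with a reserved bit,
	 * so an access still faults, now with P=1 and RSVD=1. */
	return PTE_PRESENT | PTE_RSVD_TRAP;
}

/* Error code the CPU would report if an access through `spte` faults
 * (only the two bits relevant here are modelled). */
uint32_t fault_error_code(uint64_t spte)
{
	uint32_t ec = 0;

	if (spte & PTE_PRESENT) {
		ec |= PFERR_PRESENT_MASK;
		if (spte & PTE_RSVD_TRAP)
			ec |= PFERR_RSVD_MASK;
	}
	return ec;
}

/* VMX #PF filter, assuming the #PF bit is set in the exception bitmap. */
bool vmx_pf_exits(uint32_t ec, uint32_t pfec_mask, uint32_t pfec_match)
{
	return (ec & pfec_mask) == pfec_match;
}
```

On AMD the same placeholder encoding would still work functionally, but without an equivalent filter every fault exits, which is why the commit message limits the benefit to Intel hardware.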
2007-09-17 00:58:32 +08:00
	context->prefetch_page = nonpaging_prefetch_page;
2008-09-24 00:18:33 +08:00
	context->sync_page = nonpaging_sync_page;
2008-09-24 00:18:35 +08:00
	context->invlpg = nonpaging_invlpg;
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
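The cache key described in this commit message can be written down as a small struct plus a hash. In the sketch below, `struct sp_key`, `sp_hash()` and the bucket count are hypothetical. The quadrant arithmetic follows from the sizes quoted above: a 32-bit guest table indexes 10 bits per level and a shadow table 9, so a 32-bit page table (4MB) splits into 2 shadow quadrants of 2MB and a 32-bit page directory (4GB) into 4 of 1GB. The quadrant is only meaningful for 32-bit non-PAE guests; for other modes it would be 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PT32_PT_BITS 10   /* index bits per 32-bit guest table (1024) */
#define PT64_PT_BITS 9    /* index bits per shadow table (512)        */
#define SP_HASH_BITS 10   /* hypothetical table size: 1024 buckets    */

struct sp_key {
	uint64_t gfn;          /* guest page table frame number            */
	unsigned glevels;      /* number of paging levels in the guest     */
	unsigned level;        /* 1 = page table, 2 = page directory, ...  */
	unsigned quadrant;     /* which slice of a 32-bit guest table      */
	bool metaphysical;     /* no guest table backs this shadow page    */
};

/* Slice of a 32-bit guest table at `level` that covers address `gaddr`. */
unsigned sp_quadrant(uint32_t gaddr, unsigned level)
{
	assert(level == 1 || level == 2);   /* 32-bit guests have 2 levels */
	unsigned q = gaddr >> (PAGE_SHIFT + PT64_PT_BITS * level);

	return q & ((1u << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1);
}

/* Bucket index: low bits of the guest frame number. */
unsigned sp_hash(uint64_t gfn)
{
	return (unsigned)(gfn & ((1u << SP_HASH_BITS) - 1));
}

/* Two shadow pages are interchangeable only if every key field matches. */
bool sp_key_equal(const struct sp_key *a, const struct sp_key *b)
{
	return a->gfn == b->gfn && a->glevels == b->glevels &&
	       a->level == b->level && a->quadrant == b->quadrant &&
	       a->metaphysical == b->metaphysical;
}
```

A lookup would hash the gfn to pick a bucket and then walk it comparing full keys, so the same guest page can own several shadow pages that differ only in level or quadrant.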
2007-01-06 08:36:43 +08:00
	context->root_level = 0;
2006-12-10 18:21:36 +08:00
	context->shadow_root_level = PT32E_ROOT_LEVEL;
2007-06-04 20:58:30 +08:00
	context->root_hpa = INVALID_PAGE;
2010-09-10 23:30:39 +08:00
	context->direct_map = true;
2010-09-10 23:31:01 +08:00
	context->nx = false;
2006-12-10 18:21:36 +08:00
	return 0;
}

2007-11-21 08:57:59 +08:00
void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
2006-12-10 18:21:36 +08:00
{
2007-04-19 22:27:43 +08:00
	++vcpu->stat.tlb_flush;
2010-05-10 17:34:53 +08:00
	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
2006-12-10 18:21:36 +08:00
}
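kvm_mmu_flush_tlb() does not flush anything itself: it bumps a statistics counter and posts KVM_REQ_TLB_FLUSH, which the vcpu checks on its way back into guest mode. A minimal single-threaded sketch of that deferred-request pattern, assuming a plain 64-bit request bitmap; the struct, the request number and every helper name here are hypothetical, and KVM's own helpers use atomic bit operations because other threads may post requests concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request number; KVM defines its own KVM_REQ_* constants. */
enum { REQ_TLB_FLUSH = 0 };

struct vcpu_model {
	uint64_t requests;          /* one bit per pending request */
	unsigned long tlb_flush;    /* counter, like vcpu->stat.tlb_flush */
	unsigned long flushes_done; /* flushes actually performed */
};

static void make_request(int req, struct vcpu_model *v)
{
	assert(req >= 0 && req < 64);
	v->requests |= UINT64_C(1) << req; /* posting twice == posting once */
}

/* Test-and-clear: reports a posted request exactly once. */
static bool check_request(int req, struct vcpu_model *v)
{
	uint64_t bit = UINT64_C(1) << req;

	if (!(v->requests & bit))
		return false;
	v->requests &= ~bit;
	return true;
}

/* Same shape as kvm_mmu_flush_tlb(): count the request, then defer it. */
static void mmu_flush_tlb(struct vcpu_model *v)
{
	++v->tlb_flush;
	make_request(REQ_TLB_FLUSH, v);
}

/* Stand-in for the guest-entry path that drains pending requests. */
static void enter_guest(struct vcpu_model *v)
{
	if (check_request(REQ_TLB_FLUSH, v))
		++v->flushes_done; /* a real host would flush the TLB here */
}
```

Calling mmu_flush_tlb() three times before a single enter_guest() counts three requests but performs one flush, which is the benefit of deferring the work to guest entry.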

static void paging_new_cr3(struct kvm_vcpu *vcpu)
{
2010-12-05 23:30:00 +08:00
	pgprintk("%s: cr3 %lx\n", __func__, kvm_read_cr3(vcpu));
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00
	mmu_free_roots(vcpu);
2006-12-10 18:21:36 +08:00
}
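The shadow page table caching commit above keys each cached shadow page partly on a "quadrant", because a 32-bit guest table covers more address space than the 512-entry shadow table built from it (4MB vs 2MB for page tables, 4GB vs 1GB for page directories). A sketch of that arithmetic, assuming 4 KiB pages, 10 index bits per 32-bit guest level and 9 per 64-bit shadow level; the function and macro names are hypothetical and the code is not copied from KVM.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12
#define GUEST32_BITS  10 /* 1024 four-byte entries per 32-bit guest table */
#define SHADOW64_BITS  9 /* 512 eight-byte entries per shadow table */

/*
 * Which slice of a 32-bit guest table at `level` (1 = page table,
 * 2 = page directory) the shadow table for guest address `gaddr` covers.
 * Each level loses one index bit (10 -> 9), so level 1 has 2 quadrants
 * and level 2 has 4, matching "up to 4 shadow page tables" per level.
 */
static unsigned shadow_quadrant(uint32_t gaddr, int level)
{
	unsigned shift, nquadrants;

	assert(level == 1 || level == 2);
	shift = PAGE_SHIFT_4K + SHADOW64_BITS * level;
	nquadrants = 1u << ((GUEST32_BITS - SHADOW64_BITS) * level);
	return (gaddr >> shift) & (nquadrants - 1);
}
```

For example, guest address 0xC0000000 falls in the last 1GB slice of its page directory, so its level-2 quadrant is 3.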

2010-09-10 23:30:42 +08:00
static unsigned long get_cr3(struct kvm_vcpu *vcpu)
{
2010-12-05 23:30:00 +08:00
	return kvm_read_cr3(vcpu);
2010-09-10 23:30:42 +08:00
}

2010-11-29 22:12:30 +08:00
static void inject_page_fault(struct kvm_vcpu *vcpu,
			      struct x86_exception *fault)
2006-12-10 18:21:36 +08:00
{
2010-11-29 22:12:30 +08:00
	vcpu->arch.mmu.inject_page_fault(vcpu, fault);
2006-12-10 18:21:36 +08:00
}
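get_cr3() and inject_page_fault() are thin wrappers: the MMU context carries per-mode callbacks, and generic code reaches the right behaviour by calling through vcpu->arch.mmu rather than testing which mode is active. A minimal sketch of that dispatch style; the struct, its fields and the "alternate root" mode are hypothetical stand-ins, not KVM's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu;

/* Hypothetical per-mode hooks, loosely modelled on callbacks in struct kvm_mmu. */
struct mmu_ops {
	uint64_t (*get_cr3)(struct vcpu *v);
	void (*inject_page_fault)(struct vcpu *v, uint64_t addr);
};

struct vcpu {
	uint64_t cr3;        /* root the guest loaded into CR3 */
	uint64_t alt_root;   /* root a hypothetical second mode reads instead */
	uint64_t last_fault; /* address of the last fault handed to the guest */
	int nr_faults;
	struct mmu_ops mmu;
};

static uint64_t guest_get_cr3(struct vcpu *v) { return v->cr3; }
static uint64_t alt_get_cr3(struct vcpu *v) { return v->alt_root; }

static void deliver_fault(struct vcpu *v, uint64_t addr)
{
	v->last_fault = addr;
	v->nr_faults++;
}

/* Generic code, like inject_page_fault() above, only forwards the call. */
static void inject_fault(struct vcpu *v, uint64_t addr)
{
	assert(v->mmu.inject_page_fault != NULL);
	v->mmu.inject_page_fault(v, addr);
}
```

Swapping mmu.get_cr3 changes what every caller sees without touching the callers, which is the design choice the wrappers rely on.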

static void paging_free(struct kvm_vcpu *vcpu)
{
	nonpaging_free(vcpu);
}

2010-09-10 23:30:45 +08:00
static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
2009-03-30 16:21:08 +08:00
{
	int bit7;

	bit7 = (gpte >> 7) & 1;
2010-09-10 23:30:45 +08:00
	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
2009-03-30 16:21:08 +08:00
}
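is_rsvd_bits_set() indexes rsvd_bits_mask by two things: bit 7 of the guest entry (the PS, or large-page, bit in directory entries; PAT in 4K PTEs) selects the row, and the paging level selects the column. A self-contained sketch with a PAE-style table, assuming MAXPHYADDR = 36 and NX enabled; rsvd_bits(s, e) is re-created here as a mask of bits s..e inclusive, and the mask values follow the PT32E case of reset_rsvds_bits_mask(). All names other than rsvd_bits are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set. */
static uint64_t rsvd_bits(int s, int e)
{
	assert(s <= e && e - s + 1 < 64);
	return ((UINT64_C(1) << (e - s + 1)) - 1) << s;
}

struct mmu_masks {
	uint64_t rsvd_bits_mask[2][4]; /* [bit7][level - 1] */
};

/* PAE PDE/PTE masks with NX enabled, so bit 63 is not reserved. */
static void fill_pae_masks(struct mmu_masks *m, int maxphyaddr)
{
	m->rsvd_bits_mask[0][1] = rsvd_bits(maxphyaddr, 62); /* PDE -> page table */
	m->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 62); /* PTE */
	m->rsvd_bits_mask[1][1] = rsvd_bits(maxphyaddr, 62) |
				  rsvd_bits(13, 20);         /* 2MB PDE */
	/* In a 4K PTE bit 7 is PAT, not PS, so both rows share one mask. */
	m->rsvd_bits_mask[1][0] = m->rsvd_bits_mask[0][0];
}

static bool rsvd_set(const struct mmu_masks *m, uint64_t gpte, int level)
{
	int bit7 = (gpte >> 7) & 1;

	return (gpte & m->rsvd_bits_mask[bit7][level - 1]) != 0;
}
```

Bit 13 is reserved in a 2MB PDE but is an ordinary address bit in a PDE that points at a page table; the row selection is what tells the two apart.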
2006-12-10 18:21:36 +08:00
#define PTTYPE 64
#include "paging_tmpl.h"
#undef PTTYPE

#define PTTYPE 32
#include "paging_tmpl.h"
#undef PTTYPE
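The two passes over paging_tmpl.h above compile the same guest page-table code twice, once for 64-bit entries (PAE and long mode) and once for 32-bit entries, with PTTYPE choosing the entry type and the function-name prefix. A single snippet cannot include a second header, so this sketch swaps the double inclusion for a function-generating macro that plays the same role; the generated names and helpers are hypothetical, not the ones paging_tmpl.h defines.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Template body, instantiated once per entry width. `bits` is pasted into
 * every generated name (paging64_..., paging32_...), as PTTYPE does.
 */
#define DEFINE_PAGING(bits, pt_element_t, level_bits)                    \
	static unsigned paging##bits##_entries_per_table(void)           \
	{                                                                \
		/* a 4 KiB table holds 4096 / sizeof(entry) entries */    \
		return (unsigned)(4096u / sizeof(pt_element_t));         \
	}                                                                \
	static unsigned paging##bits##_index(uint64_t addr, int level)   \
	{                                                                \
		/* table index at `level` (1 = lowest level) */          \
		assert(level >= 1);                                      \
		return (unsigned)((addr >> (12 + (level_bits) * (level - 1))) \
				  & ((1u << (level_bits)) - 1));         \
	}

DEFINE_PAGING(64, uint64_t, 9)
DEFINE_PAGING(32, uint32_t, 10)
```

The header-inclusion form used in the source keeps the template readable as ordinary C with no trailing backslashes, at the cost of needing a separate file; the macro form trades that readability for fitting in one file.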

2010-09-10 23:30:44 +08:00
static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
				  struct kvm_mmu *context,
				  int level)
2009-03-30 16:21:08 +08:00
{
	int maxphyaddr = cpuid_maxphyaddr(vcpu);
	u64 exb_bit_rsvd = 0;

2010-09-10 23:31:01 +08:00
	if (!context->nx)
2009-03-30 16:21:08 +08:00
		exb_bit_rsvd = rsvd_bits(63, 63);
	switch (level) {
	case PT32_ROOT_LEVEL:
		/* no rsvd bits for 2 level 4K page table entries */
		context->rsvd_bits_mask[0][1] = 0;
		context->rsvd_bits_mask[0][0] = 0;
2010-03-19 17:58:53 +08:00
		context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];

		if (!is_pse(vcpu)) {
			context->rsvd_bits_mask[1][1] = 0;
			break;
		}

2009-03-30 16:21:08 +08:00
		if (is_cpuid_PSE36())
			/* 36bits PSE 4MB page */
			context->rsvd_bits_mask[1][1] = rsvd_bits(17, 21);
		else
			/* 32 bits PSE 4MB page */
			context->rsvd_bits_mask[1][1] = rsvd_bits(13, 21);
		break;
	case PT32E_ROOT_LEVEL:
2009-03-31 23:03:45 +08:00
		context->rsvd_bits_mask[0][2] =
			rsvd_bits(maxphyaddr, 63) |
			rsvd_bits(7, 8) | rsvd_bits(1, 2);	/* PDPTE */
2009-03-30 16:21:08 +08:00
		context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
2009-04-02 10:28:37 +08:00
			rsvd_bits(maxphyaddr, 62);	/* PDE */
2009-03-30 16:21:08 +08:00
		context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 62);	/* PTE */
		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 62) |
			rsvd_bits(13, 20);		/* large page */
2010-03-19 17:58:53 +08:00
		context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
2009-03-30 16:21:08 +08:00
		break;
	case PT64_ROOT_LEVEL:
		context->rsvd_bits_mask[0][3] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
		context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
		context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
2009-04-02 10:28:37 +08:00
			rsvd_bits(maxphyaddr, 51);
2009-03-30 16:21:08 +08:00
		context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 51);
		context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
2009-07-27 22:30:45 +08:00
		context->rsvd_bits_mask[1][2] = exb_bit_rsvd |
			rsvd_bits(maxphyaddr, 51) |
			rsvd_bits(13, 29);
2009-03-30 16:21:08 +08:00
		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
2009-04-02 10:28:37 +08:00
			rsvd_bits(maxphyaddr, 51) |
			rsvd_bits(13, 20);		/* large page */
2010-03-19 17:58:53 +08:00
		context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
2009-03-30 16:21:08 +08:00
		break;
	}
}
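In the PT32_ROOT_LEVEL case, which bits of a 4MB PDE count as reserved depends on PSE-36: with it, bits 13-16 carry physical-address bits 32-35, so only 17-21 are reserved; without it, all of 13-21 are. A small sketch of that difference for non-PAE 4MB PDEs (PS set), re-creating the inclusive rsvd_bits(s, e) helper that reset_rsvds_bits_mask() calls; pde4m_rsvd_set and pde4m_base are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set. */
static uint64_t rsvd_bits(int s, int e)
{
	assert(s <= e && e - s + 1 < 64);
	return ((UINT64_C(1) << (e - s + 1)) - 1) << s;
}

/* Reserved-bit check for a 32-bit (non-PAE) 4MB PDE. */
static bool pde4m_rsvd_set(uint32_t pde, bool cpuid_pse36)
{
	uint64_t mask = cpuid_pse36 ? rsvd_bits(17, 21) : rsvd_bits(13, 21);

	return (pde & mask) != 0;
}

/* Physical base: bits 22-31, plus bits 13-16 as PA bits 32-35 under PSE-36. */
static uint64_t pde4m_base(uint32_t pde, bool cpuid_pse36)
{
	uint64_t base = pde & 0xFFC00000u;

	if (cpuid_pse36)
		base |= (uint64_t)((pde >> 13) & 0xF) << 32;
	return base;
}
```

A PDE with bit 13 set is therefore rejected on a CPU without PSE-36 but, with it, simply maps a page above 4GB.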

2010-09-10 23:30:44 +08:00
static int paging64_init_context_common(struct kvm_vcpu *vcpu,
					struct kvm_mmu *context,
					int level)
2006-12-10 18:21:36 +08:00
{
2010-09-10 23:31:01 +08:00
context->nx = is_nx(vcpu);
2010-09-10 23:30:44 +08:00
reset_rsvds_bits_mask(vcpu, context, level);
2006-12-10 18:21:36 +08:00
ASSERT(is_pae(vcpu));
context->new_cr3 = paging_new_cr3;
context->page_fault = paging64_page_fault;
context->gva_to_gpa = paging64_gva_to_gpa;
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
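The filtering step of the trick above can be modelled in a few lines of C. This is a sketch based on Intel's documented rule for the VMCS page-fault error-code mask and match fields: if (error_code & mask) == match, bit 14 of the exception bitmap decides whether a #PF causes a VM exit; otherwise the meaning of that bit is inverted. The function and macro names are hypothetical and not taken from KVM. With mask = match = P and bit 14 set, only faults whose error code has P set (protection faults and the reserved-bit faults planted in shadow ptes) exit to kvm, while not-present faults go straight to the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error-code bits (hypothetical names for this sketch). */
#define PF_EC_P    (1u << 0) /* set: fault on a present page */
#define PF_EC_W    (1u << 1) /* set: write access */
#define PF_EC_U    (1u << 2) /* set: user-mode access */
#define PF_EC_RSVD (1u << 3) /* set: reserved bit found in a paging entry */

/*
 * Returns true if a guest #PF with error code `pfec` would cause a VM exit,
 * given the VMCS page-fault error-code mask/match fields and bit 14 of the
 * exception bitmap (`exc_bitmap_pf`).
 */
static bool pf_causes_vmexit(uint32_t pfec, uint32_t mask, uint32_t match,
                             bool exc_bitmap_pf)
{
    /* Sanity check for this sketch: match bits outside mask never compare equal. */
    assert((match & ~mask) == 0);

    bool equal = (pfec & mask) == match;
    /* Equal: follow bit 14 as-is. Not equal: its meaning is inverted. */
    return equal ? exc_bitmap_pf : !exc_bitmap_pf;
}
```

With mask = match = 0 every error code compares equal, so every #PF exits again; that is presumably how a module parameter can turn the trick off without touching the exception bitmap.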
2007-09-17 00:58:32 +08:00
context->prefetch_page = paging64_prefetch_page;
2008-09-24 00:18:33 +08:00
context->sync_page = paging64_sync_page;
2008-09-24 00:18:35 +08:00
context->invlpg = paging64_invlpg;
2006-12-10 18:21:36 +08:00
context->free = paging_free;
2007-01-06 08:36:40 +08:00
context->root_level = level;
context->shadow_root_level = level;
2007-06-04 20:58:30 +08:00
context->root_hpa = INVALID_PAGE;
2010-09-10 23:30:39 +08:00
context->direct_map = false;
2006-12-10 18:21:36 +08:00
return 0;
}
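Both the context->nx assignment and the reset_rsvds_bits_mask() call in paging64_init_context_common() concern reserved bits in guest page-table entries; setting nx first is presumably required because the reserved-bit masks depend on it. The C sketch below illustrates that dependency for the simplest case only: a 4 KiB leaf PTE under 4-level (IA-32e) paging, where physical-address bits from MAXPHYADDR up to bit 51 are reserved and bit 63 (XD/NX) is reserved when NX is disabled. The helper names are hypothetical, and this is not the full per-level table KVM builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; requires 0 <= s <= e <= 63. */
static uint64_t rsvd_bits_range(int s, int e)
{
    assert(0 <= s && s <= e && e <= 63);
    if (e - s + 1 == 64)
        return ~0ULL; /* avoid an undefined 64-bit shift */
    return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Simplified reserved-bit mask for a 4 KiB leaf PTE under 4-level paging.
 * maxphyaddr: guest physical-address width (at most 52).
 * nx: whether the guest has NX enabled (EFER.NXE).
 */
static uint64_t leaf_pte_rsvd_mask(int maxphyaddr, bool nx)
{
    uint64_t mask = 0;

    if (maxphyaddr < 52)
        mask |= rsvd_bits_range(maxphyaddr, 51); /* address bits above MAXPHYADDR */
    if (!nx)
        mask |= 1ULL << 63;                      /* XD bit is reserved without NX */
    return mask;
}
```

A guest PTE with any bit of this mask set would make the hardware raise a #PF with the RSVD error-code bit, which is why the mask must be recomputed whenever the NX setting changes.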
2010-09-10 23:30:44 +08:00
static int paging64_init_context(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
2007-01-06 08:36:40 +08:00
{
2010-09-10 23:30:44 +08:00
return paging64_init_context_common(vcpu, context, PT64_ROOT_LEVEL);
2007-01-06 08:36:40 +08:00
}
2010-09-10 23:30:44 +08:00
static int paging32_init_context(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
2006-12-10 18:21:36 +08:00
{
2010-09-10 23:31:01 +08:00
context->nx = false;
2010-09-10 23:30:44 +08:00
reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
2006-12-10 18:21:36 +08:00
context->new_cr3 = paging_new_cr3;
context->page_fault = paging32_page_fault;
context->gva_to_gpa = paging32_gva_to_gpa;
context->free = paging_free;
2007-09-17 00:58:32 +08:00
context->prefetch_page = paging32_prefetch_page;
2008-09-24 00:18:33 +08:00
context->sync_page = paging32_sync_page;
2008-09-24 00:18:35 +08:00
context->invlpg = paging32_invlpg;
2006-12-10 18:21:36 +08:00
context->root_level = PT32_ROOT_LEVEL;
context->shadow_root_level = PT32E_ROOT_LEVEL;
2007-06-04 20:58:30 +08:00
context->root_hpa = INVALID_PAGE;
2010-09-10 23:30:39 +08:00
context->direct_map = false;
2006-12-10 18:21:36 +08:00
return 0;
}
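The three initializers (paging64, paging32E via the shared common routine, and paging32) differ mainly in the levels and NX policy they record. The C sketch below summarizes them as plain data. The struct, enum and function are hypothetical; the level values 2, 3 and 4 are those of PT32_ROOT_LEVEL, PT32E_ROOT_LEVEL and PT64_ROOT_LEVEL; and the order of mode checks (long mode first, then PAE) is an assumption about the caller, which is not part of this listing. The assert mirrors ASSERT(is_pae(vcpu)) in paging64_init_context_common(), since long mode requires PAE.

```c
#include <assert.h>
#include <stdbool.h>

/* Root levels matching the kernel's PT*_ROOT_LEVEL values. */
enum {
    PT32_ROOT_LEVEL  = 2,
    PT32E_ROOT_LEVEL = 3,
    PT64_ROOT_LEVEL  = 4,
};

/* Hypothetical summary of what each *_init_context() stores. */
struct mmu_levels {
    int root_level;        /* depth of the guest's own page tables */
    int shadow_root_level; /* depth of the shadow tables built for it */
    bool nx;               /* whether the guest may use the NX bit */
};

/*
 * Guest paging is assumed enabled; nonpaging guests use a separate
 * context not shown here. efer_nx stands in for is_nx(vcpu).
 */
static struct mmu_levels pick_levels(bool long_mode, bool pae, bool efer_nx)
{
    struct mmu_levels l;

    assert(!long_mode || pae); /* long mode implies PAE */

    if (long_mode) {           /* like paging64_init_context() */
        l.root_level = l.shadow_root_level = PT64_ROOT_LEVEL;
        l.nx = efer_nx;
    } else if (pae) {          /* like paging32E_init_context() */
        l.root_level = l.shadow_root_level = PT32E_ROOT_LEVEL;
        l.nx = efer_nx;
    } else {                   /* like paging32_init_context() */
        l.root_level = PT32_ROOT_LEVEL;
        l.shadow_root_level = PT32E_ROOT_LEVEL; /* 2-level guest, 3-level shadow */
        l.nx = false;          /* non-PAE paging has no NX bit */
    }
    return l;
}
```

Only the non-PAE case has a shadow depth that differs from the guest depth, matching the root_level and shadow_root_level assignments in paging32_init_context().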
2010-09-10 23:30:44 +08:00
static int paging32E_init_context(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
2006-12-10 18:21:36 +08:00
{
2010-09-10 23:30:44 +08:00
	return paging64_init_context_common(vcpu, context, PT32E_ROOT_LEVEL);
2006-12-10 18:21:36 +08:00
}
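`paging32E_init_context()` passes the PAE case to the 64-bit walker with `PT32E_ROOT_LEVEL`. A plausible explanation is that PAE and long-mode entries share the same 8-byte format. Every level is therefore indexed with 9 bits, and only the number of levels differs. The sketch below computes per-level table indices to illustrate this. `enum pte_format` and `pt_index()` are hypothetical helpers, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Entry formats: 32-bit paging uses 4-byte entries (1024 per table),
 * PAE and long mode use 8-byte entries (512 per table). */
enum pte_format { PTE_32BIT, PTE_64BIT };

#define PT_PAGE_SHIFT 12  /* 4 KiB pages assumed */

/* Index into the page table at `level` (1 = lowest) for address `va`. */
static unsigned pt_index(enum pte_format fmt, uint64_t va, int level)
{
	unsigned bits = (fmt == PTE_32BIT) ? 10 : 9;

	assert(level >= 1);
	return (unsigned)((va >> (PT_PAGE_SHIFT + bits * (level - 1))) &
			  ((1u << bits) - 1));
}
```

For a 32-bit address, the PAE top-level index can only be 0 to 3, because bits above 31 are zero. Applying the 9-bit rule at level 3 therefore still yields a valid index into the 4-entry top-level table.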
2008-02-07 20:47:44 +08:00
static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
{
2010-09-10 23:30:49 +08:00
	struct kvm_mmu *context = vcpu->arch.walk_mmu;
2008-02-07 20:47:44 +08:00

2010-12-21 22:26:01 +08:00
	context->base_role.word = 0;
2008-02-07 20:47:44 +08:00
	context->new_cr3 = nonpaging_new_cr3;
	context->page_fault = tdp_page_fault;
	context->free = nonpaging_free;
	context->prefetch_page = nonpaging_prefetch_page;
2008-09-24 00:18:33 +08:00
	context->sync_page = nonpaging_sync_page;
2008-09-24 00:18:35 +08:00
	context->invlpg = nonpaging_invlpg;
2008-04-25 10:20:22 +08:00
	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
2008-02-07 20:47:44 +08:00
	context->root_hpa = INVALID_PAGE;
2010-09-10 23:30:39 +08:00
	context->direct_map = true;
2010-09-10 23:30:41 +08:00
	context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
2010-09-10 23:30:42 +08:00
	context->get_cr3 = get_cr3;
2010-09-10 23:30:43 +08:00
	context->inject_page_fault = kvm_inject_page_fault;
2010-09-10 23:31:01 +08:00
	context->nx = is_nx(vcpu);
2008-02-07 20:47:44 +08:00

	if (!is_paging(vcpu)) {
2010-09-10 23:31:01 +08:00
		context->nx = false;
2008-02-07 20:47:44 +08:00
		context->gva_to_gpa = nonpaging_gva_to_gpa;
		context->root_level = 0;
	} else if (is_long_mode(vcpu)) {
2010-09-10 23:31:01 +08:00
		context->nx = is_nx(vcpu);
2010-09-10 23:30:44 +08:00
		reset_rsvds_bits_mask(vcpu, context, PT64_ROOT_LEVEL);
2008-02-07 20:47:44 +08:00
		context->gva_to_gpa = paging64_gva_to_gpa;
		context->root_level = PT64_ROOT_LEVEL;
	} else if (is_pae(vcpu)) {
2010-09-10 23:31:01 +08:00
		context->nx = is_nx(vcpu);
2010-09-10 23:30:44 +08:00
		reset_rsvds_bits_mask(vcpu, context, PT32E_ROOT_LEVEL);
2008-02-07 20:47:44 +08:00
		context->gva_to_gpa = paging64_gva_to_gpa;
		context->root_level = PT32E_ROOT_LEVEL;
	} else {
2010-09-10 23:31:01 +08:00
		context->nx = false;
2010-09-10 23:30:44 +08:00
		reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
2008-02-07 20:47:44 +08:00
		context->gva_to_gpa = paging32_gva_to_gpa;
		context->root_level = PT32_ROOT_LEVEL;
	}

	return 0;
}
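The mode checks in `init_kvm_tdp_mmu()` reduce to a small decision table over four guest control bits. The sketch below restates that table outside the kernel. `struct guest_state`, `struct walk_cfg` and `pick_walk_cfg()` are hypothetical names. The mapping of `is_paging()`, `is_long_mode()`, `is_pae()` and `is_nx()` to CR0.PG, EFER.LMA, CR4.PAE and EFER.NX reflects how those helpers are understood to work. The root-level values are assumed to match KVM's definitions of this period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Root levels assumed to match KVM's PT*_ROOT_LEVEL definitions. */
#define PT32_ROOT_LEVEL  2
#define PT32E_ROOT_LEVEL 3
#define PT64_ROOT_LEVEL  4

/* Guest state consulted by the helpers used above. */
struct guest_state {
	bool paging;     /* CR0.PG   -> is_paging()    */
	bool long_mode;  /* EFER.LMA -> is_long_mode() */
	bool pae;        /* CR4.PAE  -> is_pae()       */
	bool efer_nx;    /* EFER.NX  -> is_nx()        */
};

struct walk_cfg {
	int root_level;  /* 0 means there are no guest page tables to walk */
	bool nx;         /* whether the NX bit is honoured during the walk */
};

/* Mirrors the if/else chain in init_kvm_tdp_mmu(). */
static struct walk_cfg pick_walk_cfg(const struct guest_state *s)
{
	struct walk_cfg c = { 0, false };

	assert(s != NULL);
	if (!s->paging) {
		/* no paging: guest virtual == guest physical */
	} else if (s->long_mode) {
		c.root_level = PT64_ROOT_LEVEL;
		c.nx = s->efer_nx;
	} else if (s->pae) {
		c.root_level = PT32E_ROOT_LEVEL;
		c.nx = s->efer_nx;
	} else {
		/* legacy 32-bit entries have no NX bit */
		c.root_level = PT32_ROOT_LEVEL;
	}
	return c;
}
```

Only PAE and long mode honour NX. This matches the `context->nx = false` assignments in the non-paging and 32-bit branches above.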
2010-09-10 23:30:44 +08:00
int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
2006-12-10 18:21:36 +08:00
{
2008-12-22 01:20:09 +08:00
	int r;
2006-12-10 18:21:36 +08:00
	ASSERT(vcpu);
2007-12-13 23:50:52 +08:00
	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
2006-12-10 18:21:36 +08:00
	if (!is_paging(vcpu))
2010-09-10 23:30:44 +08:00
		r = nonpaging_init_context(vcpu, context);
2006-12-30 08:49:37 +08:00
	else if (is_long_mode(vcpu))
2010-09-10 23:30:44 +08:00
		r = paging64_init_context(vcpu, context);
2006-12-10 18:21:36 +08:00
	else if (is_pae(vcpu))
2010-09-10 23:30:44 +08:00
		r = paging32E_init_context(vcpu, context);
2006-12-10 18:21:36 +08:00
	else
2010-09-10 23:30:44 +08:00
		r = paging32_init_context(vcpu, context);
2008-12-22 01:20:09 +08:00

2010-04-15 00:20:03 +08:00
	vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
2010-09-10 23:30:40 +08:00
	vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
2010-09-10 23:30:44 +08:00

	return r;
}
EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
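`init_kvm_tdp_mmu()` clears the whole role with `base_role.word = 0`, and `kvm_init_shadow_mmu()` records `cr4_pae` and `cr0_wp` in it. As the `.word` assignment suggests, the role is a union of bitfields and a single integer. Two roles can therefore be compared in one operation, and shadow pages built under a different CR4.PAE or CR0.WP setting will not match. The sketch uses a simplified, hypothetical `union page_role` with only a few fields. The real role structure has more fields and its own layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical page role: a few bitfields overlaid on one
 * 32-bit word so that roles can be compared with a single integer test. */
union page_role {
	uint32_t word;
	struct {
		unsigned level:4;
		unsigned cr4_pae:1;
		unsigned cr0_wp:1;
		unsigned direct:1;
	} bits;
};

_Static_assert(sizeof(union page_role) == sizeof(uint32_t),
	       "role must fit in one word");

/* Clear the whole word first (as init_kvm_tdp_mmu() does), then record
 * the two paging bits that kvm_init_shadow_mmu() sets. */
static union page_role make_base_role(bool cr4_pae, bool cr0_wp)
{
	union page_role r;

	r.word = 0;
	r.bits.cr4_pae = cr4_pae;
	r.bits.cr0_wp = cr0_wp;
	return r;
}

/* Two shadow pages are interchangeable only if every role bit matches. */
static bool role_equal(union page_role a, union page_role b)
{
	return a.word == b.word;
}
```

Clearing `word` first matters. Without it, stale bits outside the named fields could make otherwise identical roles compare unequal.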
static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
{
2010-09-10 23:30:49 +08:00
	int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
2010-09-10 23:30:44 +08:00

2010-09-10 23:30:49 +08:00
	vcpu->arch.walk_mmu->set_cr3 = kvm_x86_ops->set_cr3;
	vcpu->arch.walk_mmu->get_cr3 = get_cr3;
	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
2008-12-22 01:20:09 +08:00

	return r;
2006-12-10 18:21:36 +08:00
}

2010-09-10 23:30:54 +08:00
static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu *g_context = &vcpu->arch.nested_mmu;

	g_context->get_cr3 = get_cr3;
	g_context->inject_page_fault = kvm_inject_page_fault;

	/*
	 * Note that arch.mmu.gva_to_gpa translates l2_gva to l1_gpa. The
	 * translation of l2_gpa to l1_gpa addresses is done using the
	 * arch.nested_mmu.gva_to_gpa function. Basically the gva_to_gpa
	 * functions between mmu and nested_mmu are swapped.
	 */
	if (!is_paging(vcpu)) {
2010-09-10 23:31:01 +08:00
		g_context->nx = false;
2010-09-10 23:30:54 +08:00
		g_context->root_level = 0;
		g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
	} else if (is_long_mode(vcpu)) {
2010-09-10 23:31:01 +08:00
		g_context->nx = is_nx(vcpu);
2010-09-10 23:30:54 +08:00
		reset_rsvds_bits_mask(vcpu, g_context, PT64_ROOT_LEVEL);
		g_context->root_level = PT64_ROOT_LEVEL;
		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
	} else if (is_pae(vcpu)) {
2010-09-10 23:31:01 +08:00
		g_context->nx = is_nx(vcpu);
2010-09-10 23:30:54 +08:00
		reset_rsvds_bits_mask(vcpu, g_context, PT32E_ROOT_LEVEL);
		g_context->root_level = PT32E_ROOT_LEVEL;
		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
	} else {
2010-09-10 23:31:01 +08:00
		g_context->nx = false;
2010-09-10 23:30:54 +08:00
		reset_rsvds_bits_mask(vcpu, g_context, PT32_ROOT_LEVEL);
		g_context->root_level = PT32_ROOT_LEVEL;
		g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
	}

	return 0;
}
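The comment in `init_kvm_nested_mmu()` describes a two-stage translation. A nested (L2) guest's virtual address is first translated through L2's own page tables to an L2 physical address. That address is then translated through L1's tables to an L1 physical address. The sketch below composes two toy single-level page tables to show the order of the stages. `struct toy_pt`, `translate()` and `l2_gva_to_l1_gpa()` are hypothetical. The sketch also ignores that real walks are multi-level, check permissions, and must locate L2's page-table pages through the second stage as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGES      16          /* tiny address space: 16 pages of 4 KiB */
#define TOY_INVALID    UINT64_MAX  /* translation failed */

/* Toy single-level page table: frame[n] is the target page number of
 * source page n, or -1 if the page is not present. */
struct toy_pt {
	int frame[TOY_PAGES];
};

/* One translation stage: split into page number and offset, look up
 * the page, and rebuild the address. Returns TOY_INVALID on a miss. */
static uint64_t translate(const struct toy_pt *pt, uint64_t addr)
{
	uint64_t page = addr >> TOY_PAGE_SHIFT;
	uint64_t offset = addr & ((1u << TOY_PAGE_SHIFT) - 1);

	assert(pt != NULL);
	if (page >= TOY_PAGES || pt->frame[page] < 0)
		return TOY_INVALID;
	return ((uint64_t)pt->frame[page] << TOY_PAGE_SHIFT) | offset;
}

/* Stage 1: L2's own tables map l2_gva -> l2_gpa.
 * Stage 2: L1's tables for the nested guest map l2_gpa -> l1_gpa. */
static uint64_t l2_gva_to_l1_gpa(const struct toy_pt *l2_tables,
				 const struct toy_pt *l1_tables, uint64_t l2_gva)
{
	uint64_t l2_gpa = translate(l2_tables, l2_gva);

	if (l2_gpa == TOY_INVALID)
		return TOY_INVALID;
	return translate(l1_tables, l2_gpa);
}
```

A miss in either stage fails the whole translation. In a real hypervisor, the two kinds of miss would be reported to different parties: a stage-1 fault goes to L2, and a stage-2 fault goes to L1.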
2008-02-07 20:47:44 +08:00
static int init_kvm_mmu(struct kvm_vcpu *vcpu)
{
2008-04-03 03:46:56 +08:00
	vcpu->arch.update_pte.pfn = bad_pfn;

2010-09-10 23:30:54 +08:00
	if (mmu_is_nested(vcpu))
		return init_kvm_nested_mmu(vcpu);
	else if (tdp_enabled)
2008-02-07 20:47:44 +08:00
		return init_kvm_tdp_mmu(vcpu);
	else
		return init_kvm_softmmu(vcpu);
}
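`init_kvm_mmu()` picks one of three initializers in a fixed order. The sketch below reduces that choice to its decision logic. `enum mmu_kind` and `pick_mmu()` are hypothetical, and the two flags stand in for `mmu_is_nested(vcpu)` and `tdp_enabled`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three initializers called above. */
enum mmu_kind {
	MMU_NESTED,  /* init_kvm_nested_mmu() */
	MMU_TDP,     /* init_kvm_tdp_mmu()    */
	MMU_SHADOW,  /* init_kvm_softmmu()    */
};

/* Same precedence as init_kvm_mmu(): nested first, then hardware
 * two-dimensional paging if enabled, else shadow paging. */
static enum mmu_kind pick_mmu(bool mmu_is_nested, bool tdp_enabled)
{
	if (mmu_is_nested)
		return MMU_NESTED;
	if (tdp_enabled)
		return MMU_TDP;
	return MMU_SHADOW;
}
```

The nested check comes first. A vcpu walking a nested guest's tables therefore gets the nested walker even when TDP is enabled.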

2006-12-10 18:21:36 +08:00
static void destroy_kvm_mmu(struct kvm_vcpu *vcpu)
{
	ASSERT(vcpu);
2010-05-12 16:40:41 +08:00
	if (VALID_PAGE(vcpu->arch.mmu.root_hpa))
		/* mmu.free() should set root_hpa = INVALID_PAGE */
2007-12-13 23:50:52 +08:00
		vcpu->arch.mmu.free(vcpu);
2006-12-10 18:21:36 +08:00
}

int kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
2007-06-04 20:58:30 +08:00
{
	destroy_kvm_mmu(vcpu);
	return init_kvm_mmu(vcpu);
}
2007-10-10 14:26:45 +08:00
EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
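destroy_kvm_mmu() calls mmu.free() only while root_hpa is valid, and it relies on free() invalidating root_hpa. That contract makes kvm_mmu_reset_context() safe to call repeatedly. The sketch below models the contract with a hypothetical struct. `SKETCH_INVALID_PAGE` stands in for the kernel's INVALID_PAGE sentinel and is assumed to be all ones. In this sketch, roots only come back through a separate load step.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INVALID_PAGE (~(uint64_t)0) /* assumed INVALID_PAGE value */

/* Hypothetical stand-in for the per-vcpu MMU context. */
struct sketch_mmu {
	uint64_t root_hpa;
	int frees;  /* how many times free() actually ran */
	int inits;  /* how many times the context was re-initialised */
};

/* free() must leave root_hpa invalid, as destroy_kvm_mmu() expects. */
static void sketch_free(struct sketch_mmu *mmu)
{
	mmu->frees++;
	mmu->root_hpa = SKETCH_INVALID_PAGE;
}

static void sketch_destroy(struct sketch_mmu *mmu)
{
	if (mmu->root_hpa != SKETCH_INVALID_PAGE)
		sketch_free(mmu);
}

/* reset = tear down whatever is loaded, then initialise afresh. */
static int sketch_reset(struct sketch_mmu *mmu)
{
	sketch_destroy(mmu);
	mmu->inits++;
	return 0;
}
```

The second reset in a row finds an invalid root and skips free(). This is why free() has to reset root_hpa.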

2007-06-04 20:58:30 +08:00
int kvm_mmu_load(struct kvm_vcpu *vcpu)
2006-12-10 18:21:36 +08:00
{
2007-01-06 08:36:53 +08:00
	int r;

2007-01-06 08:36:54 +08:00
	r = mmu_topup_memory_caches(vcpu);
2007-06-04 20:58:30 +08:00
	if (r)
		goto out;
2009-05-13 05:55:45 +08:00
	r = mmu_alloc_roots(vcpu);
2010-05-04 17:58:32 +08:00
	spin_lock(&vcpu->kvm->mmu_lock);
2008-09-24 00:18:34 +08:00
	mmu_sync_roots(vcpu);
2007-12-21 08:18:26 +08:00
	spin_unlock(&vcpu->kvm->mmu_lock);
2009-05-13 05:55:45 +08:00
	if (r)
		goto out;
2009-07-09 17:00:42 +08:00
	/* set_cr3() should ensure TLB has been flushed */
2010-09-10 23:30:40 +08:00
	vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
2007-01-06 08:36:53 +08:00
out:
	return r;
2006-12-10 18:21:36 +08:00
}
2007-06-04 20:58:30 +08:00
EXPORT_SYMBOL_GPL(kvm_mmu_load);
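kvm_mmu_load() checks the error from mmu_alloc_roots() only after mmu_sync_roots() has run under mmu_lock. As a result, the sync step is attempted even when allocation failed, while set_cr3() is reached only on success. The sketch below replays that control flow in user space. Hypothetical fields stand in for the real helpers, and the spinlock is modelled as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: each field replaces one helper used by
 * kvm_mmu_load(). */
struct sketch_vcpu {
	int topup_err;   /* result of mmu_topup_memory_caches() */
	int alloc_err;   /* result of mmu_alloc_roots() */
	bool locked;     /* models mmu_lock */
	int synced;      /* times mmu_sync_roots() ran */
	int cr3_loads;   /* times mmu.set_cr3() ran */
};

static int sketch_mmu_load(struct sketch_vcpu *v)
{
	int r;

	r = v->topup_err;
	if (r)
		goto out;
	r = v->alloc_err;
	assert(!v->locked);
	v->locked = true;
	v->synced++;          /* runs whether or not allocation failed */
	v->locked = false;
	if (r)
		goto out;
	v->cr3_loads++;       /* only reached with roots in place */
out:
	return r;
}
```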

void kvm_mmu_unload(struct kvm_vcpu *vcpu)
{
	mmu_free_roots(vcpu);
}
2010-09-10 23:31:03 +08:00
EXPORT_SYMBOL_GPL(kvm_mmu_unload);
2006-12-10 18:21:36 +08:00

2007-05-01 19:16:52 +08:00
static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
2007-11-21 21:28:32 +08:00
				  struct kvm_mmu_page *sp,
2007-03-08 23:13:32 +08:00
				  u64 *spte)
{
	u64 pte;
	struct kvm_mmu_page *child;

	pte = *spte;
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
	if (is_shadow_present_pte(pte)) {
2009-06-10 23:27:03 +08:00
		if (is_last_spte(pte, sp->role.level))
2010-06-06 19:31:27 +08:00
			drop_spte(vcpu->kvm, spte, shadow_trap_nonpresent_pte);
2007-03-08 23:13:32 +08:00
		else {
			child = page_header(pte & PT64_BASE_ADDR_MASK);
2007-07-17 18:04:56 +08:00
			mmu_page_remove_parent_pte(child, spte);
2007-03-08 23:13:32 +08:00
		}
	}
2009-06-10 19:24:23 +08:00
	__set_spte(spte, shadow_trap_nonpresent_pte);
2008-02-23 22:44:30 +08:00
	if (is_large_pte(pte))
		--vcpu->kvm->stat.lpages;
2007-03-08 23:13:32 +08:00
}
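mmu_pte_write_zap_pte() distinguishes two kinds of present shadow entry:

- a leaf, which is dropped;
- a pointer to a lower-level shadow page, whose back-reference to this entry is removed.

Either way the slot ends up not present, and the large-page counter is decremented if the old entry mapped a large page. The simplified model below is hypothetical throughout: booleans replace the real spte bits, and plain counters replace the kernel's rmap and parent-pointer bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one shadow PTE slot. */
struct sketch_spte {
	bool present;
	bool last;           /* leaf: maps a page, not a lower table */
	bool large;          /* leaf that maps a large page */
	int *child_parents;  /* parent count of the child table, if any */
};

struct sketch_stats { int lpages; int leaves_dropped; };

/* Same shape as mmu_pte_write_zap_pte(). */
static void sketch_zap(struct sketch_spte *s, struct sketch_stats *st)
{
	struct sketch_spte old = *s;

	if (old.present) {
		if (old.last)
			st->leaves_dropped++;     /* drop_spte() */
		else if (old.child_parents)
			(*old.child_parents)--;   /* mmu_page_remove_parent_pte() */
	}
	/* __set_spte(spte, shadow_trap_nonpresent_pte) */
	s->present = false;
	s->last = s->large = false;
	s->child_parents = NULL;
	if (old.large)
		st->lpages--;
}
```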

2007-05-01 21:53:31 +08:00
static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
2007-11-21 21:28:32 +08:00
				  struct kvm_mmu_page *sp,
2007-05-01 21:53:31 +08:00
				  u64 *spte,
2008-01-07 17:14:20 +08:00
				  const void *new)
2007-05-01 21:53:31 +08:00
{
2008-06-12 07:32:40 +08:00
	if (sp->role.level != PT_PAGE_TABLE_LEVEL) {
2009-07-27 22:30:46 +08:00
		++vcpu->kvm->stat.mmu_pde_zapped;
		return;
2008-06-12 07:32:40 +08:00
	}
2007-05-01 21:53:31 +08:00

2007-11-18 22:37:07 +08:00
	++vcpu->kvm->stat.mmu_pte_updated;
2010-04-15 00:20:03 +08:00
	if (!sp->role.cr4_pae)
2008-01-07 17:14:20 +08:00
		paging32_update_pte(vcpu, sp, spte, new);
2007-05-01 21:53:31 +08:00
	else
2008-01-07 17:14:20 +08:00
		paging64_update_pte(vcpu, sp, spte, new);
2007-05-01 21:53:31 +08:00
}
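mmu_pte_write_new_pte() refreshes only last-level shadow entries in place. For higher levels it just counts the event, since kvm_mmu_pte_write() has already zapped the entry. The guest entry width comes from the shadow page's cr4_pae role bit, not from the vcpu's current mode. The sketch below uses hypothetical names and assumes PT_PAGE_TABLE_LEVEL is 1.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PT_PAGE_TABLE_LEVEL 1  /* assumed value of PT_PAGE_TABLE_LEVEL */

enum sketch_update { UPD_SKIPPED_PDE, UPD_PAGING32, UPD_PAGING64 };

struct sketch_counters { int pde_zapped; int pte_updated; };

/* Decide what mmu_pte_write_new_pte() would do for a shadow page at
 * `level` whose role has the given cr4_pae bit. */
static enum sketch_update sketch_new_pte(int level, bool cr4_pae,
					 struct sketch_counters *c)
{
	if (level != SKETCH_PT_PAGE_TABLE_LEVEL) {
		c->pde_zapped++;   /* stays zapped; refilled on next fault */
		return UPD_SKIPPED_PDE;
	}
	c->pte_updated++;
	return cr4_pae ? UPD_PAGING64 : UPD_PAGING32;
}
```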

2007-11-21 08:06:21 +08:00
static bool need_remote_flush(u64 old, u64 new)
{
	if (!is_shadow_present_pte(old))
		return false;
	if (!is_shadow_present_pte(new))
		return true;
	if ((old ^ new) & PT64_BASE_ADDR_MASK)
		return true;
	old ^= PT64_NX_MASK;
	new ^= PT64_NX_MASK;
	return (old & ~new & PT64_PERM_MASK) != 0;
}
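need_remote_flush() decides whether other vcpus may still cache a translation that the new spte no longer permits. The key step is the XOR with the NX bit. After flipping it, every bit in the permission mask means "allowed" when set, so `old & ~new` isolates exactly the permissions that were revoked. The sketch below reproduces the logic on plain integers. The bit positions are the architectural x86 ones. The present test (bit 0 only) is a simplifying assumption, since the kernel's is_shadow_present_pte() must also recognise its special non-present marker values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bits; 4 KiB pages and 52-bit physical addresses assumed. */
#define SK_PRESENT   (1ULL << 0)
#define SK_WRITABLE  (1ULL << 1)
#define SK_USER      (1ULL << 2)
#define SK_NX        (1ULL << 63)
#define SK_ADDR_MASK (((1ULL << 52) - 1) & ~0xfffULL)
#define SK_PERM_MASK (SK_PRESENT | SK_WRITABLE | SK_USER | SK_NX)

/* Simplification: present means bit 0 set. */
static bool sk_present(uint64_t pte) { return pte & SK_PRESENT; }

static bool sk_need_remote_flush(uint64_t old, uint64_t new)
{
	if (!sk_present(old))
		return false;     /* a non-present entry is never cached */
	if (!sk_present(new))
		return true;      /* mapping removed */
	if ((old ^ new) & SK_ADDR_MASK)
		return true;      /* now points at a different frame */
	/* Flip NX so that a set bit always grants a permission; a bit set
	 * in old but clear in new is then a revoked permission. */
	old ^= SK_NX;
	new ^= SK_NX;
	return (old & ~new & SK_PERM_MASK) != 0;
}
```

Granting a permission (read-only to writable, or clearing NX) needs no remote flush. A stale, less permissive TLB entry merely causes a fault that is resolved on retry.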

2010-06-04 21:56:59 +08:00
static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page,
				    bool remote_flush, bool local_flush)
2007-11-21 08:06:21 +08:00
{
2010-06-04 21:56:59 +08:00
	if (zap_page)
		return;

	if (remote_flush)
2007-11-21 08:06:21 +08:00
		kvm_flush_remote_tlbs(vcpu->kvm);
2010-06-04 21:56:59 +08:00
	else if (local_flush)
2007-11-21 08:06:21 +08:00
		kvm_mmu_flush_tlb(vcpu);
}
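mmu_pte_write_flush_tlb() encodes a small precedence table. The sketch below makes it explicit with a hypothetical enum. The reason it gives for skipping the flush when a page was zapped is an assumption: kvm_mmu_commit_zap_page(), which kvm_mmu_pte_write() calls right afterwards, presumably performs its own remote flush.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_flush { SK_FLUSH_NONE, SK_FLUSH_LOCAL, SK_FLUSH_REMOTE };

/* Same precedence as mmu_pte_write_flush_tlb(). */
static enum sk_flush sk_flush_decision(bool zap_page, bool remote_flush,
				       bool local_flush)
{
	if (zap_page)
		return SK_FLUSH_NONE;   /* assumed: left to the zap-commit path */
	if (remote_flush)
		return SK_FLUSH_REMOTE; /* all vcpus; subsumes the local flush */
	if (local_flush)
		return SK_FLUSH_LOCAL;  /* sptes changed, nothing was revoked */
	return SK_FLUSH_NONE;
}
```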

2007-09-23 20:10:49 +08:00
static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
{
2007-12-13 23:50:52 +08:00
	u64 *spte = vcpu->arch.last_pte_updated;
2007-09-23 20:10:49 +08:00

2008-04-25 21:13:50 +08:00
	return !!(spte && (*spte & shadow_accessed_mask));
2007-09-23 20:10:49 +08:00
}

2007-12-30 18:29:05 +08:00
static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
2010-03-15 19:59:53 +08:00
					  u64 gpte)
2007-12-30 18:29:05 +08:00
{
	gfn_t gfn;
2008-04-03 03:46:56 +08:00
	pfn_t pfn;
2007-12-30 18:29:05 +08:00

2009-06-10 19:12:05 +08:00
	if (!is_present_gpte(gpte))
2007-12-30 18:29:05 +08:00
		return;
	gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
2008-02-11 00:04:15 +08:00

2008-07-25 22:24:52 +08:00
	vcpu->arch.update_pte.mmu_seq = vcpu->kvm->mmu_notifier_seq;
2008-09-17 07:54:47 +08:00
	smp_rmb();
2008-04-03 03:46:56 +08:00
	pfn = gfn_to_pfn(vcpu->kvm, gfn);
2008-02-11 00:04:15 +08:00

2008-04-03 03:46:56 +08:00
	if (is_error_pfn(pfn)) {
		kvm_release_pfn_clean(pfn);
2008-01-24 17:44:11 +08:00
		return;
	}
2008-04-03 03:46:56 +08:00
	vcpu->arch.update_pte.pfn = pfn;
2007-12-30 18:29:05 +08:00
}
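The only arithmetic in mmu_guess_page_from_pte_write() turns a guest PTE into a guest frame number in three steps:

1. Check the present bit.
2. Mask off the flag bits, including NX in bit 63.
3. Shift by the page size.

The sketch below does this on plain integers. It assumes 4 KiB pages and the 52-bit address width used by PT64_BASE_ADDR_MASK. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12
#define SK_PRESENT    (1ULL << 0)
#define SK_ADDR_MASK  (((1ULL << 52) - 1) & ~((1ULL << SK_PAGE_SHIFT) - 1))

/* Returns false for a non-present guest PTE (nothing to prefetch);
 * otherwise stores the guest frame number the entry points at. */
static bool sk_gpte_to_gfn(uint64_t gpte, uint64_t *gfn)
{
	if (!(gpte & SK_PRESENT))
		return false;
	*gfn = (gpte & SK_ADDR_MASK) >> SK_PAGE_SHIFT;
	return true;
}
```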

2008-05-15 18:51:35 +08:00
static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	u64 *spte = vcpu->arch.last_pte_updated;

	if (spte
	    && vcpu->arch.last_pte_gfn == gfn
	    && shadow_accessed_mask
	    && !(*spte & shadow_accessed_mask)
	    && is_shadow_present_pte(*spte))
		set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
}
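kvm_mmu_access_page() feeds the write-flood heuristic. Suppose the guest writes to the gfn recorded in last_pte_gfn, the frame mapped by the most recently updated shadow PTE. That PTE is then marked accessed by hand, so last_updated_pte_accessed() returns true and kvm_mmu_pte_write() resets its write counter instead of advancing it. Below is a user-space sketch:

- the struct is hypothetical;
- bit 5 is the x86 accessed bit (PT_ACCESSED_SHIFT);
- the present test is simplified to bit 0;
- a plain OR replaces the kernel's atomic set_bit(), which is safe only because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ACCESSED_SHIFT 5          /* x86 accessed bit */
#define SK_PRESENT        (1ULL << 0) /* simplified present test */

/* Hypothetical per-vcpu record of the last shadow PTE that was updated. */
struct sk_last_update {
	uint64_t *spte;  /* NULL if none */
	uint64_t gfn;    /* frame that spte maps */
};

static void sk_access_page(struct sk_last_update *lu, uint64_t gfn,
			   uint64_t accessed_mask)
{
	uint64_t *spte = lu->spte;

	/* accessed_mask == 0 means the shadow format has no A bit. */
	if (spte && lu->gfn == gfn && accessed_mask &&
	    !(*spte & accessed_mask) && (*spte & SK_PRESENT))
		*spte |= 1ULL << SK_ACCESSED_SHIFT;
}
```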

2007-05-01 19:16:52 +08:00
void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
2008-12-02 08:32:05 +08:00
		       const u8 *new, int bytes,
		       bool guest_initiated)
2007-01-06 08:36:44 +08:00
{
2007-01-06 08:36:45 +08:00
	gfn_t gfn = gpa >> PAGE_SHIFT;
2010-07-16 11:19:51 +08:00
	union kvm_mmu_page_role mask = { .word = 0 };
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp;
2010-06-04 21:56:11 +08:00
	struct hlist_node *node;
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2008-01-07 17:14:20 +08:00
	u64 entry, gentry;
2007-01-06 08:36:45 +08:00
	u64 *spte;
	unsigned offset = offset_in_page(gpa);
2007-01-06 08:36:48 +08:00
	unsigned pte_size;
2007-01-06 08:36:45 +08:00
	unsigned page_offset;
2007-01-06 08:36:48 +08:00
	unsigned misaligned;
KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.
Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so when we look up the page we find up to three additional
aliases for the page. Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.
Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header. All we need is
to ignore shadow pages from the wrong quadrants.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-01 21:44:05 +08:00
	unsigned quadrant;
2007-01-06 08:36:45 +08:00
	int level;
2007-01-06 08:36:50 +08:00
	int flooded = 0;
2007-03-08 23:13:32 +08:00
	int npte;
2008-01-07 17:14:20 +08:00
	int r;
2010-03-15 19:59:57 +08:00
	int invlpg_counter;
2010-06-04 21:56:59 +08:00
	bool remote_flush, local_flush, zap_page;

	zap_page = remote_flush = local_flush = false;
2007-01-06 08:36:45 +08:00

2008-03-04 04:59:56 +08:00
	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
2010-03-15 19:59:53 +08:00

2010-03-15 19:59:57 +08:00
	invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
2010-03-15 19:59:53 +08:00

	/*
	 * Assume that the pte write is on a page table of the same type
2011-03-04 19:00:00 +08:00
	 * as the current vcpu paging mode, since we update the sptes only
	 * when they have the same mode.
2010-03-15 19:59:53 +08:00
	 */
2010-03-15 19:59:57 +08:00
	if ((is_pae(vcpu) && bytes == 4) || !new) {
2010-03-15 19:59:53 +08:00
		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
2010-03-15 19:59:57 +08:00
		if (is_pae(vcpu)) {
			gpa &= ~(gpa_t)7;
			bytes = 8;
		}
		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
2010-03-15 19:59:53 +08:00
		if (r)
			gentry = 0;
2010-03-15 19:59:57 +08:00
		new = (const u8 *)&gentry;
	}

	switch (bytes) {
	case 4:
		gentry = *(const u32 *)new;
		break;
	case 8:
		gentry = *(const u64 *)new;
		break;
	default:
		gentry = 0;
		break;
2010-03-15 19:59:53 +08:00
	}

	mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
2007-12-21 08:18:26 +08:00
	spin_lock(&vcpu->kvm->mmu_lock);
2010-03-15 19:59:57 +08:00
	if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
		gentry = 0;
2007-12-31 21:27:49 +08:00
	kvm_mmu_free_some_pages(vcpu);
2007-11-18 22:37:07 +08:00
	++vcpu->kvm->stat.mmu_pte_write;
2010-08-30 18:22:53 +08:00
	trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
2008-12-02 08:32:05 +08:00
	if (guest_initiated) {
2011-03-04 18:58:02 +08:00
		kvm_mmu_access_page(vcpu, gfn);
2008-12-02 08:32:05 +08:00
		if (gfn == vcpu->arch.last_pt_write_gfn
		    && !last_updated_pte_accessed(vcpu)) {
			++vcpu->arch.last_pt_write_count;
			if (vcpu->arch.last_pt_write_count >= 3)
				flooded = 1;
		} else {
			vcpu->arch.last_pt_write_gfn = gfn;
			vcpu->arch.last_pt_write_count = 1;
			vcpu->arch.last_pte_updated = NULL;
		}
2007-01-06 08:36:50 +08:00
	}
2010-04-16 16:35:54 +08:00

2010-07-16 11:19:51 +08:00
	mask.cr0_wp = mask.cr4_pae = mask.nxe = 1;
2010-06-04 21:56:11 +08:00
	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node) {
2010-04-15 00:20:03 +08:00
		pte_size = sp->role.cr4_pae ? 8 : 4;
2007-01-06 08:36:48 +08:00
		misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
2007-04-30 19:47:02 +08:00
		misaligned |= bytes < 4;
2007-01-06 08:36:50 +08:00
		if (misaligned || flooded) {
2007-01-06 08:36:48 +08:00
			/*
			 * Misaligned accesses are too much trouble to fix
			 * up; also, they usually indicate a page is not used
			 * as a page table.
2007-01-06 08:36:50 +08:00
			 *
			 * If we're seeing too many writes to a page,
			 * it may no longer be a page table, or we may be
			 * forking, in which case it is better to unmap the
			 * page.
2007-01-06 08:36:48 +08:00
			 */
			pgprintk("misaligned: gpa %llx bytes %d role %x\n",
2007-11-21 21:28:32 +08:00
				 gpa, bytes, sp->role.word);
2010-06-04 21:56:59 +08:00
			zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
2010-06-04 21:56:11 +08:00
						     &invalid_list);
2007-11-18 22:37:07 +08:00
			++vcpu->kvm->stat.mmu_flooded;
2007-01-06 08:36:48 +08:00
			continue;
		}
2007-01-06 08:36:45 +08:00
		page_offset = offset;
2007-11-21 21:28:32 +08:00
		level = sp->role.level;
2007-03-08 23:13:32 +08:00
		npte = 1;
2010-04-15 00:20:03 +08:00
		if (!sp->role.cr4_pae) {
2007-03-08 23:13:32 +08:00
			page_offset <<= 1;	/* 32->64 */
			/*
			 * A 32-bit pde maps 4MB while the shadow pdes map
			 * only 2MB. So we need to double the offset again
			 * and zap two pdes instead of one.
			 */
			if (level == PT32_ROOT_LEVEL) {
2007-04-18 16:18:18 +08:00
				page_offset &= ~7; /* kill rounding error */
2007-03-08 23:13:32 +08:00
				page_offset <<= 1;
				npte = 2;
			}
2007-05-01 21:44:05 +08:00
			quadrant = page_offset >> PAGE_SHIFT;
2007-01-06 08:36:45 +08:00
			page_offset &= ~PAGE_MASK;
2007-11-21 21:28:32 +08:00
			if (quadrant != sp->role.quadrant)
2007-05-01 21:44:05 +08:00
				continue;
2007-01-06 08:36:45 +08:00
		}
2010-06-04 21:56:59 +08:00
		local_flush = true;
2007-11-21 21:28:32 +08:00
		spte = &sp->spt[page_offset / sizeof(*spte)];
2007-03-08 23:13:32 +08:00
		while (npte--) {
2007-11-21 08:06:21 +08:00
			entry = *spte;
2007-11-21 21:28:32 +08:00
			mmu_pte_write_zap_pte(vcpu, sp, spte);
2010-07-16 11:19:51 +08:00
			if (gentry &&
			      !((sp->role.word ^ vcpu->arch.mmu.base_role.word)
			      & mask.word))
2010-03-15 19:59:53 +08:00
				mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
2010-06-04 21:56:59 +08:00
			if (!remote_flush && need_remote_flush(entry, *spte))
				remote_flush = true;
2007-03-08 23:13:32 +08:00
			++spte;
2007-01-06 08:36:45 +08:00
		}
	}
2010-06-04 21:56:59 +08:00
	mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
2010-06-04 21:55:29 +08:00
	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
2010-08-30 18:22:53 +08:00
	trace_kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
2007-12-21 08:18:26 +08:00
	spin_unlock(&vcpu->kvm->mmu_lock);
2008-04-03 03:46:56 +08:00
	if (!is_error_pfn(vcpu->arch.update_pte.pfn)) {
		kvm_release_pfn_clean(vcpu->arch.update_pte.pfn);
		vcpu->arch.update_pte.pfn = bad_pfn;
2007-12-30 18:29:05 +08:00
	}
2007-01-06 08:36:44 +08:00
}
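The quadrant selection described in the commit message above can be checked in isolation. The following is a standalone sketch of the offset arithmetic, not kernel code. It assumes 4 KiB pages, 4-byte nonpae guest entries, and 8-byte pae shadow entries. `shadow_slot`, `nonpae_shadow_slot` and `toy_spte_t` are hypothetical names. `PT32_ROOT_LEVEL` is redefined locally and assumed to be 2, the level of a 32-bit page directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins, assumed to match x86 with 4 KiB pages. */
#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE - 1))
#define PT32_ROOT_LEVEL 2          /* 32-bit page directory */

typedef uint64_t toy_spte_t;       /* pae shadow entries are 8 bytes */

/* Hypothetical result: where one guest entry lands in the shadow. */
struct shadow_slot {
	unsigned long quadrant;    /* which pae shadow page (alias) to touch */
	size_t index;              /* first spte inside that shadow page */
	int npte;                  /* number of sptes backing the guest entry */
};

/*
 * offset: byte offset of the guest write inside its 4 KiB nonpae
 *         page-table page
 * level:  1 for a page table, PT32_ROOT_LEVEL for a page directory
 */
static struct shadow_slot nonpae_shadow_slot(unsigned long offset, int level)
{
	struct shadow_slot s;
	unsigned long page_offset = offset << 1; /* 4-byte gpte -> 8-byte spte */

	assert(level == 1 || level == PT32_ROOT_LEVEL);
	s.npte = 1;
	if (level == PT32_ROOT_LEVEL) {
		page_offset &= ~7UL; /* align to an 8-byte spte boundary */
		page_offset <<= 1;   /* one 4MB pde is shadowed by two 2MB pdes */
		s.npte = 2;
	}
	s.quadrant = page_offset >> PAGE_SHIFT;  /* which alias */
	page_offset &= ~PAGE_MASK;               /* offset inside that alias */
	s.index = page_offset / sizeof(toy_spte_t);
	return s;
}
```

Consider a write at offset 0x804 (entry 513) of a guest page table. It maps to quadrant 1 at spte index 1, so shadow pages whose `role.quadrant` is 0 are skipped. The same offset in a guest page directory maps to quadrant 2 at index 2, with two sptes to update.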

2007-01-06 08:36:45 +08:00
int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
{
2007-12-21 08:18:22 +08:00
	gpa_t gpa;
	int r;
2007-01-06 08:36:45 +08:00

2010-09-10 23:30:39 +08:00
	if (vcpu->arch.mmu.direct_map)
2009-08-27 18:37:06 +08:00
		return 0;

2010-02-10 20:21:32 +08:00
	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
2007-12-21 08:18:22 +08:00

2007-12-21 08:18:26 +08:00
	spin_lock(&vcpu->kvm->mmu_lock);
2007-12-21 08:18:22 +08:00
	r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
2007-12-21 08:18:26 +08:00
	spin_unlock(&vcpu->kvm->mmu_lock);
2007-12-21 08:18:22 +08:00
	return r;
2007-01-06 08:36:45 +08:00
}
2008-07-19 13:57:05 +08:00
EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
2007-01-06 08:36:45 +08:00

2007-09-15 01:26:06 +08:00
void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
2007-01-06 08:36:47 +08:00
{
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2010-06-04 21:54:38 +08:00

2010-08-20 09:11:05 +08:00
	while (kvm_mmu_available_pages(vcpu->kvm) < KVM_REFILL_PAGES &&
2009-07-29 02:26:58 +08:00
	       !list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
2007-11-21 21:28:32 +08:00
		struct kvm_mmu_page *sp;
2007-01-06 08:36:47 +08:00

2007-12-14 10:01:48 +08:00
		sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
2007-11-21 21:28:32 +08:00
				  struct kvm_mmu_page, link);
2010-08-20 09:11:05 +08:00
		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
2010-08-24 10:31:07 +08:00
		kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
2007-11-18 22:37:07 +08:00
		++vcpu->kvm->stat.mmu_recycled;
2007-01-06 08:36:47 +08:00
	}
}
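`__kvm_mmu_free_some_pages` recycles shadow pages from the tail of `active_mmu_pages` (its `.prev` end). It stops once at least `KVM_REFILL_PAGES` pages are available or the list is empty. If new shadow pages are inserted at the head, as we assume here, the tail holds the oldest ones. The toy model below reproduces only that oldest-first loop. It uses a minimal circular doubly linked list loosely modelled on `<linux/list.h>`. Every type and function in it (`toy_sp`, `toy_mmu`, `toy_free_some_pages`, and the list helpers) is hypothetical. Unlinking a node stands in for the prepare/commit zap pair.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on <linux/list.h>. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_head(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical shadow page: only the fields this model needs. */
struct toy_sp {
	int id;
	struct list_node link;
};

struct toy_mmu {
	struct list_node active;   /* newest at head, oldest at tail */
	int available;             /* free shadow pages in the pool */
	int recycled;              /* mirrors stat.mmu_recycled */
};

/* Zap the oldest pages until `refill` are available or the list is empty. */
static void toy_free_some_pages(struct toy_mmu *mmu, int refill)
{
	assert(refill >= 0);
	while (mmu->available < refill && !list_is_empty(&mmu->active)) {
		struct toy_sp *sp = toy_container_of(mmu->active.prev,
						     struct toy_sp, link);
		list_unlink(&sp->link);  /* stands in for prepare + commit zap */
		mmu->available++;
		mmu->recycled++;
	}
}
```

Taking victims from the tail keeps eviction O(1) per page and needs no extra bookkeeping beyond list order.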

2010-12-21 18:12:07 +08:00
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
		       void *insn, int insn_len)
2007-10-29 00:48:59 +08:00
{
	int r;
	enum emulation_result er;

2010-10-18 00:13:42 +08:00
	r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
2007-10-29 00:48:59 +08:00
	if (r < 0)
		goto out;

	if (!r) {
		r = 1;
		goto out;
	}

2007-10-29 00:52:05 +08:00
	r = mmu_topup_memory_caches(vcpu);
	if (r)
		goto out;

2010-12-21 18:12:07 +08:00
	er = x86_emulate_instruction(vcpu, cr2, 0, insn, insn_len);
2007-10-29 00:48:59 +08:00

	switch (er) {
	case EMULATE_DONE:
		return 1;
	case EMULATE_DO_MMIO:
		++vcpu->stat.mmio_exits;
2010-05-10 16:16:56 +08:00
		/* fall through */
2007-10-29 00:48:59 +08:00
	case EMULATE_FAIL:
2009-06-11 20:43:28 +08:00
		return 0;
2007-10-29 00:48:59 +08:00
	default:
		BUG();
	}
out:
	return r;
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
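`kvm_mmu_page_fault` combines two results into one return value. The first is `r` from the per-mode `page_fault` handler: negative is an error, 0 means the fault was fixed, and positive means the instruction must be emulated. The second is the emulator's `er`. As we read the listing, returning 1 re-enters the guest, 0 makes the caller exit to userspace (MMIO or emulation failure), and a negative value propagates an error. The sketch below models only that decision table. The enum and function names are hypothetical, and the sketch makes no calls into KVM.

```c
#include <assert.h>

/* Hypothetical mirror of the emulator outcomes used in the listing. */
enum toy_emul_result { TOY_EMULATE_DONE, TOY_EMULATE_DO_MMIO, TOY_EMULATE_FAIL };

/*
 * fault_r:    per-mode page_fault handler result
 *             (<0 error, 0 fault fixed, >0 needs emulation)
 * topup_r:    mmu_topup_memory_caches() result (0 on success)
 * er:         emulator outcome, consulted only when emulation runs
 * mmio_exits: counter bumped on the MMIO path
 * Returns 1 to re-enter the guest, 0 to exit to userspace, <0 on error.
 */
static int toy_page_fault_result(int fault_r, int topup_r,
				 enum toy_emul_result er, unsigned *mmio_exits)
{
	if (fault_r < 0)
		return fault_r;         /* propagate the error */
	if (fault_r == 0)
		return 1;               /* shadow fixed up: resume the guest */
	if (topup_r)
		return topup_r;         /* could not refill MMU caches */

	switch (er) {
	case TOY_EMULATE_DONE:
		return 1;
	case TOY_EMULATE_DO_MMIO:
		++*mmio_exits;
		/* fall through */
	case TOY_EMULATE_FAIL:
		return 0;
	}
	assert(0 && "unreachable: unknown emulation result");
	return 0;
}
```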

2008-09-24 00:18:35 +08:00
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
{
	vcpu->arch.mmu.invlpg(vcpu, gva);
	kvm_mmu_flush_tlb(vcpu);
	++vcpu->stat.invlpg;
}
EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);

2008-02-07 20:47:41 +08:00
void kvm_enable_tdp(void)
{
	tdp_enabled = true;
}
EXPORT_SYMBOL_GPL(kvm_enable_tdp);

2008-07-15 02:36:36 +08:00
void kvm_disable_tdp(void)
{
	tdp_enabled = false;
}
EXPORT_SYMBOL_GPL(kvm_disable_tdp);

2006-12-10 18:21:36 +08:00
static void free_mmu_pages(struct kvm_vcpu *vcpu)
{
2007-12-13 23:50:52 +08:00
	free_page((unsigned long)vcpu->arch.mmu.pae_root);
2010-09-10 23:31:00 +08:00
	if (vcpu->arch.mmu.lm_root != NULL)
		free_page((unsigned long)vcpu->arch.mmu.lm_root);
2006-12-10 18:21:36 +08:00
}

static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
{
2007-01-06 08:36:40 +08:00
	struct page *page;
2006-12-10 18:21:36 +08:00
	int i;

	ASSERT(vcpu);

2007-01-06 08:36:40 +08:00
	/*
	 * When emulating 32-bit mode, cr3 is only 32 bits even on x86_64.
	 * Therefore we need to allocate shadow page tables in the first
	 * 4GB of memory, which happens to fit the DMA32 zone.
	 */
	page = alloc_page(GFP_KERNEL | __GFP_DMA32);
	if (!page)
2010-01-22 16:55:05 +08:00
		return -ENOMEM;

2007-12-13 23:50:52 +08:00
	vcpu->arch.mmu.pae_root = page_address(page);
2007-01-06 08:36:40 +08:00
	for (i = 0; i < 4; ++i)
2007-12-13 23:50:52 +08:00
		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;
2007-01-06 08:36:40 +08:00

2006-12-10 18:21:36 +08:00
	return 0;
}
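The comment in `alloc_mmu_pages` explains why `pae_root` is allocated from the DMA32 zone. While the guest runs in 32-bit mode, the cr3 that points at the shadow root holds only a 32-bit physical address. In PAE paging, cr3 bits 31:5 hold the address of the four-entry page-directory-pointer table. The table must therefore be 32-byte aligned and lie entirely below 4 GiB. The check below is a hypothetical sketch of that constraint: `pae_cr3_for_root` and `PDPT_BYTES` are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Four 8-byte PDPTEs, matching the 4 pae_root entries in the listing. */
#define PDPT_BYTES (4 * sizeof(uint64_t))

/*
 * Hypothetical check: can a root table at physical address root_phys be
 * referenced by a 32-bit PAE cr3?  On success, stores the cr3 value.
 */
static bool pae_cr3_for_root(uint64_t root_phys, uint32_t *cr3)
{
	assert(cr3 != NULL);
	if (root_phys & 31)                         /* cr3[4:0] cannot encode it */
		return false;
	if (root_phys + PDPT_BYTES - 1 > UINT32_MAX) /* must sit below 4 GiB */
		return false;
	*cr3 = (uint32_t)root_phys;
	return true;
}
```

Any page taken from the first 4 GiB is page-aligned, and therefore also 32-byte aligned. This is why restricting the allocation to `__GFP_DMA32` is sufficient.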

2006-12-30 08:50:01 +08:00
int kvm_mmu_create(struct kvm_vcpu *vcpu)
2006-12-10 18:21:36 +08:00
{
	ASSERT(vcpu);
2007-12-13 23:50:52 +08:00
	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
2006-12-10 18:21:36 +08:00

2006-12-30 08:50:01 +08:00
	return alloc_mmu_pages(vcpu);
}
2006-12-10 18:21:36 +08:00

2006-12-30 08:50:01 +08:00
int kvm_mmu_setup(struct kvm_vcpu *vcpu)
{
	ASSERT(vcpu);
2007-12-13 23:50:52 +08:00
	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
2006-12-22 17:05:28 +08:00

2006-12-30 08:50:01 +08:00
	return init_kvm_mmu(vcpu);
2006-12-10 18:21:36 +08:00
}

2007-07-17 18:04:56 +08:00
void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
2006-12-10 18:21:36 +08:00
{
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp;
2006-12-10 18:21:36 +08:00

2007-12-14 10:01:48 +08:00
	list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
2006-12-10 18:21:36 +08:00
		int i;
		u64 *pt;

2008-10-16 17:30:57 +08:00
		if (!test_bit(slot, sp->slot_bitmap))
2006-12-10 18:21:36 +08:00
			continue;

2007-11-21 21:28:32 +08:00
		pt = sp->spt;
2010-12-27 18:08:45 +08:00
		for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
2011-03-04 18:56:41 +08:00
			if (!is_shadow_present_pte(pt[i]) ||
			    !is_last_spte(pt[i], sp->role.level))
				continue;

			if (is_large_pte(pt[i])) {
2010-12-27 18:08:45 +08:00
				drop_spte(kvm, &pt[i],
					  shadow_trap_nonpresent_pte);
				--kvm->stat.lpages;
2011-03-04 18:56:41 +08:00
				continue;
2010-12-27 18:08:45 +08:00
			}
2011-03-04 18:56:41 +08:00
2006-12-10 18:21:36 +08:00
			/* avoid RMW */
2010-05-27 16:09:48 +08:00
			if (is_writable_pte(pt[i]))
2010-12-06 00:11:33 +08:00
				update_spte(&pt[i], pt[i] & ~PT_WRITABLE_MASK);
2010-12-27 18:08:45 +08:00
		}
2006-12-10 18:21:36 +08:00
	}
2008-08-27 21:40:51 +08:00
	kvm_flush_remote_tlbs(kvm);
2006-12-10 18:21:36 +08:00
}
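kvm_mmu_slot_remove_write_access() walks every shadow page that maps the given slot, drops last-level large-page entries outright, and clears the writable bit on the remaining last-level entries, storing only when the bit is actually set (the "avoid RMW" comment). The following is a minimal userspace sketch of that per-table pass, not the kernel code: it assumes a single leaf-level table (so every present entry counts as last-level), uses raw x86 PTE bits (present = bit 0, writable = bit 1, page-size = bit 7) in place of the kernel's spte helpers, and write_protect_table() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Raw x86 PTE bits (assumption: stand-ins for the kernel's spte helpers). */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_LARGE    (1ULL << 7)	/* PS bit: entry maps a large page */
#define ENT_PER_PAGE 512		/* 4 KiB table of 8-byte entries */

/*
 * Hypothetical helper: write-protect one leaf-level table.
 * Present large entries are cleared instead of write-protected;
 * returns how many of those were dropped.
 */
static int write_protect_table(uint64_t *pt)
{
	int dropped = 0;

	for (int i = 0; i < ENT_PER_PAGE; ++i) {
		if (!(pt[i] & PTE_PRESENT))
			continue;

		if (pt[i] & PTE_LARGE) {
			pt[i] = 0;	/* drop the mapping; a later fault can rebuild it */
			++dropped;
			continue;
		}

		/* Skip the store entirely when the entry is already read-only. */
		if (pt[i] & PTE_WRITABLE)
			pt[i] &= ~PTE_WRITABLE;
	}
	return dropped;
}
```

The kvm_flush_remote_tlbs() call after the loop matters in the real function: without it, vCPUs could keep writing through TLB entries cached while the mappings were still writable.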
2007-01-06 08:36:56 +08:00

2007-07-17 18:04:56 +08:00
void kvm_mmu_zap_all(struct kvm *kvm)
2007-03-30 18:06:33 +08:00
{
2007-11-21 21:28:32 +08:00
	struct kvm_mmu_page *sp, *node;
2010-06-04 21:55:29 +08:00
	LIST_HEAD(invalid_list);
2007-03-30 18:06:33 +08:00

2007-12-21 08:18:26 +08:00
	spin_lock(&kvm->mmu_lock);
2010-04-16 16:35:54 +08:00
restart:
2007-12-14 10:01:48 +08:00
	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
2010-06-04 21:55:29 +08:00
		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
2010-04-16 16:35:54 +08:00
			goto restart;

2010-06-04 21:55:29 +08:00
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
2007-12-21 08:18:26 +08:00
	spin_unlock(&kvm->mmu_lock);
2007-03-30 18:06:33 +08:00
}
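kvm_mmu_zap_all() works in two phases: kvm_mmu_prepare_zap_page() moves pages onto invalid_list while mmu_lock is held, and kvm_mmu_commit_zap_page() then releases the whole batch. The `goto restart` exists because list_for_each_entry_safe() only tolerates removal of the current entry; if preparing one page also unlinks others, the saved next pointer can go stale. A minimal userspace sketch of that pattern, under hypothetical assumptions: each page may own one child that is zapped along with it, and prepare_zap() returns only the number of *other* pages it moved (the kernel's own return-value convention may differ). struct page, prepare_zap(), commit_zap() and zap_all() are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical page; link is the first member so a cast recovers the page. */
struct page {
	struct list_head link;
	struct page *child;	/* zapped together with this page */
	int zapped;
};

static void move_to(struct page *p, struct list_head *invalid)
{
	p->zapped = 1;
	list_del(&p->link);
	list_add_tail(&p->link, invalid);
}

/* Phase 1: move p (and its child, if still active) onto @invalid.
 * Returns how many pages other than p were moved. */
static int prepare_zap(struct page *p, struct list_head *invalid)
{
	int extra = 0;

	if (p->child && !p->child->zapped) {
		move_to(p->child, invalid);
		extra = 1;
	}
	move_to(p, invalid);
	return extra;
}

/* Phase 2: free everything collected on @invalid; returns the count. */
static int commit_zap(struct list_head *invalid)
{
	int n = 0;

	while (invalid->next != invalid) {
		struct list_head *l = invalid->next;

		list_del(l);
		free((struct page *)l);
		n++;
	}
	return n;
}

static int zap_all(struct list_head *active)
{
	struct list_head invalid;

	list_init(&invalid);
restart:
	for (struct list_head *l = active->next, *next = l->next;
	     l != active; l = next, next = l->next)
		if (prepare_zap((struct page *)l, &invalid))
			goto restart;	/* saved next may now be on @invalid */
	return commit_zap(&invalid);
}
```

If pg[1] is pg[0]'s child and sits right after it, zapping pg[0] moves the saved next pointer onto the invalid list; without the restart the loop would go on walking the wrong list.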
2010-06-04 21:55:29 +08:00
static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
					       struct list_head *invalid_list)
2008-03-30 20:17:21 +08:00
{
	struct kvm_mmu_page *page;

	page = container_of(kvm->arch.active_mmu_pages.prev,
			    struct kvm_mmu_page, link);
2010-06-04 21:55:29 +08:00
	return kvm_mmu_prepare_zap_page(kvm, page, invalid_list);
2008-03-30 20:17:21 +08:00
}
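kvm_mmu_remove_some_alloc_mmu_pages() takes its victim from active_mmu_pages.prev, the tail of the list, and uses container_of() to step from the embedded `link` member back to the enclosing struct kvm_mmu_page. Assuming new pages are inserted at the head of the list, the tail holds the oldest page. A minimal userspace sketch of both pieces; the container_of macro here is a simplified re-creation (the kernel's version adds a type check), and struct mmu_page and oldest_page() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct kvm_mmu_page. */
struct mmu_page {
	int id;
	struct list_head link;	/* deliberately not the first member */
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

/* Insert right after the head, so the newest entry is head->next. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* The oldest entry sits at head->prev; the caller guarantees non-empty. */
static struct mmu_page *oldest_page(struct list_head *head)
{
	assert(head->prev != head);
	return container_of(head->prev, struct mmu_page, link);
}
```

Like the kernel function, the sketch does not handle an empty list itself; in mmu_shrink() the caller only invokes it when kvm->arch.n_used_mmu_pages > 0.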
2010-07-19 12:56:17 +08:00
static int mmu_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
2008-03-30 20:17:21 +08:00
{
	struct kvm *kvm;
	struct kvm *kvm_freed = NULL;
KVM: create aggregate kvm_total_used_mmu_pages value
Of slab shrinkers, the VM code says:
* Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
* querying the cache size, so a fastpath for that case is appropriate.
and it *means* it. Look at how it calls the shrinkers:
nr_before = (*shrinker->shrink)(0, gfp_mask);
shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
So, if you do anything stupid in your shrinker, the VM will doubly
punish you.
The mmu_shrink() function takes the global kvm_lock, then acquires
every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then
we're going to take 101 locks. We do it twice, so each call takes
202 locks. If we're under memory pressure, we can have each cpu
trying to do this. It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.
This is guaranteed to optimize at least half of those lock
aquisitions away. It removes the need to take any of the locks
when simply trying to count objects.
A 'percpu_counter' can be a large object, but we only have one
of these for the entire system. There are not any better
alternatives at the moment, especially ones that handle CPU
hotplug.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-20 09:11:37 +08:00

	if (nr_to_scan == 0)
		goto out;
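The kvm_total_used_mmu_pages commit message describes the fix: keep one system-wide count of used MMU pages so the shrinker's size query (nr_to_scan == 0) can jump straight to `out` without taking kvm_lock or any per-VM mmu_lock, presumably returning that aggregate. A minimal single-file sketch of the idea, with a C11 atomic_long standing in for the kernel's percpu_counter and a counting spinlock standing in for mmu_lock so the lock-free query path can be checked. struct vm, vm_account() and shrink() are hypothetical; unlike mmu_shrink(), which still visits every VM under its lock, this sketch stops after shrinking one VM.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-ins: total_used_pages plays kvm_total_used_mmu_pages
 * (a C11 atomic instead of a percpu_counter), vm.locked plays
 * kvm->mmu_lock, vm.used_pages plays kvm->arch.n_used_mmu_pages.
 */
static atomic_long total_used_pages;
static long locks_taken;	/* instrumentation for the example only */

struct vm {
	atomic_int locked;
	long used_pages;
};

static void vm_lock(struct vm *vm)
{
	while (atomic_exchange(&vm->locked, 1))
		;	/* spin until the holder releases it */
	locks_taken++;
}

static void vm_unlock(struct vm *vm)
{
	atomic_store(&vm->locked, 0);
}

/* Call with the VM's lock held: keep per-VM and global counts in step. */
static void vm_account(struct vm *vm, long delta)
{
	vm->used_pages += delta;
	atomic_fetch_add(&total_used_pages, delta);
}

/* nr_to_scan == 0 is a size query: answer it without any per-VM lock. */
static long shrink(struct vm *vms, int nr_vms, int nr_to_scan)
{
	if (nr_to_scan == 0)
		goto out;

	for (int i = 0; i < nr_vms; i++) {
		int freed = 0;

		vm_lock(&vms[i]);
		if (vms[i].used_pages > 0) {
			vm_account(&vms[i], -1);	/* "free" one page */
			freed = 1;
		}
		vm_unlock(&vms[i]);
		if (freed)
			break;
	}
out:
	return atomic_load(&total_used_pages);
}
```

A single atomic is the simplest stand-in, but every update then contends on one cache line; a percpu_counter trades exactness of reads for cheap, mostly CPU-local updates.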
2008-03-30 20:17:21 +08:00

2011-02-08 19:55:33 +08:00
	raw_spin_lock(&kvm_lock);
2008-03-30 20:17:21 +08:00

	list_for_each_entry(kvm, &vm_list, vm_list) {
2010-08-20 09:11:37 +08:00
		int idx, freed_pages;
2010-06-04 21:55:29 +08:00
		LIST_HEAD(invalid_list);
2008-03-30 20:17:21 +08:00

2009-12-24 00:35:25 +08:00
		idx = srcu_read_lock(&kvm->srcu);
2008-03-30 20:17:21 +08:00
		spin_lock(&kvm->mmu_lock);
		if (!kvm_freed && nr_to_scan > 0 &&
		    kvm->arch.n_used_mmu_pages > 0) {
			freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm,
							  &invalid_list);
			kvm_freed = kvm;
		}
		nr_to_scan--;

		kvm_mmu_commit_zap_page(kvm, &invalid_list);
		spin_unlock(&kvm->mmu_lock);
		srcu_read_unlock(&kvm->srcu, idx);
	}
	if (kvm_freed)
		list_move_tail(&kvm_freed->vm_list, &vm_list);

	raw_spin_unlock(&kvm_lock);

out:
	return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
}

static struct shrinker mmu_shrinker = {
	.shrink = mmu_shrink,
	.seeks = DEFAULT_SEEKS * 10,
};

static void mmu_destroy_caches(void)
{
	if (pte_chain_cache)
		kmem_cache_destroy(pte_chain_cache);
	if (rmap_desc_cache)
		kmem_cache_destroy(rmap_desc_cache);
	if (mmu_page_header_cache)
		kmem_cache_destroy(mmu_page_header_cache);
}

int kvm_mmu_module_init(void)
{
	pte_chain_cache = kmem_cache_create("kvm_pte_chain",
					    sizeof(struct kvm_pte_chain),
					    0, 0, NULL);
	if (!pte_chain_cache)
		goto nomem;
	rmap_desc_cache = kmem_cache_create("kvm_rmap_desc",
					    sizeof(struct kvm_rmap_desc),
					    0, 0, NULL);
	if (!rmap_desc_cache)
		goto nomem;

	mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header",
						  sizeof(struct kvm_mmu_page),
						  0, 0, NULL);
	if (!mmu_page_header_cache)
		goto nomem;

	if (percpu_counter_init(&kvm_total_used_mmu_pages, 0))
		goto nomem;

	register_shrinker(&mmu_shrinker);

	return 0;

nomem:
	mmu_destroy_caches();
	return -ENOMEM;
}

/*
 * Calculate mmu pages needed for kvm.
 */
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
{
	int i;
	unsigned int nr_mmu_pages;
	unsigned int nr_pages = 0;
	struct kvm_memslots *slots;

	slots = kvm_memslots(kvm);

	for (i = 0; i < slots->nmemslots; i++)
		nr_pages += slots->memslots[i].npages;

	nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
	nr_mmu_pages = max(nr_mmu_pages,
			(unsigned int) KVM_MIN_ALLOC_MMU_PAGES);

	return nr_mmu_pages;
}

static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer,
				unsigned len)
{
	if (len > buffer->len)
		return NULL;
	return buffer->ptr;
}

static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer,
				unsigned len)
{
	void *ret;

	ret = pv_mmu_peek_buffer(buffer, len);
	if (!ret)
		return ret;
	buffer->ptr += len;
	buffer->len -= len;
	buffer->processed += len;
	return ret;
}

static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu,
			    gpa_t addr, gpa_t value)
{
	int bytes = 8;
	int r;

	if (!is_long_mode(vcpu) && !is_pae(vcpu))
		bytes = 4;

	r = mmu_topup_memory_caches(vcpu);
	if (r)
		return r;

	if (!emulator_write_phys(vcpu, addr, &value, bytes))
		return -EFAULT;

	return 1;
}

static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu)
{
	(void)kvm_set_cr3(vcpu, kvm_read_cr3(vcpu));
	return 1;
}

static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr)
{
	spin_lock(&vcpu->kvm->mmu_lock);
	mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT);
	spin_unlock(&vcpu->kvm->mmu_lock);
	return 1;
}

static int kvm_pv_mmu_op_one(struct kvm_vcpu *vcpu,
			     struct kvm_pv_mmu_op_buffer *buffer)
{
	struct kvm_mmu_op_header *header;

	header = pv_mmu_peek_buffer(buffer, sizeof *header);
	if (!header)
		return 0;
	switch (header->op) {
	case KVM_MMU_OP_WRITE_PTE: {
		struct kvm_mmu_op_write_pte *wpte;

		wpte = pv_mmu_read_buffer(buffer, sizeof *wpte);
		if (!wpte)
			return 0;
		return kvm_pv_mmu_write(vcpu, wpte->pte_phys,
					wpte->pte_val);
	}
	case KVM_MMU_OP_FLUSH_TLB: {
		struct kvm_mmu_op_flush_tlb *ftlb;

		ftlb = pv_mmu_read_buffer(buffer, sizeof *ftlb);
		if (!ftlb)
			return 0;
		return kvm_pv_mmu_flush_tlb(vcpu);
	}
	case KVM_MMU_OP_RELEASE_PT: {
		struct kvm_mmu_op_release_pt *rpt;

		rpt = pv_mmu_read_buffer(buffer, sizeof *rpt);
		if (!rpt)
			return 0;
		return kvm_pv_mmu_release_pt(vcpu, rpt->pt_phys);
	}
	default: return 0;
	}
}

int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
		  gpa_t addr, unsigned long *ret)
{
	int r;
	struct kvm_pv_mmu_op_buffer *buffer = &vcpu->arch.mmu_op_buffer;

	buffer->ptr = buffer->buf;
	buffer->len = min_t(unsigned long, bytes, sizeof buffer->buf);
	buffer->processed = 0;

	r = kvm_read_guest(vcpu->kvm, addr, buffer->buf, buffer->len);
	if (r)
		goto out;

	while (buffer->len) {
		r = kvm_pv_mmu_op_one(vcpu, buffer);
		if (r < 0)
			goto out;
		if (r == 0)
			break;
	}

	r = 1;
out:
	*ret = buffer->processed;
	return r;
}

int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
{
	struct kvm_shadow_walk_iterator iterator;
	int nr_sptes = 0;

	spin_lock(&vcpu->kvm->mmu_lock);
	for_each_shadow_entry(vcpu, addr, iterator) {
		sptes[iterator.level-1] = *iterator.sptep;
		nr_sptes++;
		if (!is_shadow_present_pte(*iterator.sptep))
			break;
	}
	spin_unlock(&vcpu->kvm->mmu_lock);

	return nr_sptes;
}
EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);

void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
{
	ASSERT(vcpu);

	destroy_kvm_mmu(vcpu);
	free_mmu_pages(vcpu);
	mmu_free_memory_caches(vcpu);
}

#ifdef CONFIG_KVM_MMU_AUDIT
#include "mmu_audit.c"
#else
static void mmu_audit_disable(void) { }
#endif

void kvm_mmu_module_exit(void)
{
	mmu_destroy_caches();
	percpu_counter_destroy(&kvm_total_used_mmu_pages);
	unregister_shrinker(&mmu_shrinker);
	mmu_audit_disable();
}