[PATCH] kvm: userspace interface

web site: http://kvm.sourceforge.net

mailing list: kvm-devel@lists.sourceforge.net
  (http://lists.sourceforge.net/lists/listinfo/kvm-devel)

The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.

Using this driver, one can start multiple virtual machines on a host.

Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.

The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.

SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.

Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:

- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables

Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.

In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.

Caveats (akpm: might no longer be true):

- The Windows install currently bluescreens due to a problem with the
  virtual APIC. We are working on a fix. A temporary workaround is to
  use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
  probably a problem with the device model.

[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
#ifndef __LINUX_KVM_H
#define __LINUX_KVM_H

/*
 * Userspace interface for /dev/kvm - kernel based virtual machine
 *
2007-07-17 21:12:26 +08:00
 * Note: you must update KVM_API_VERSION if you change this interface.
2006-12-10 18:21:36 +08:00
 */

2009-01-16 05:51:26 +08:00
#include <linux/types.h>
2008-03-13 01:10:45 +08:00
#include <linux/compiler.h>
2006-12-10 18:21:36 +08:00
#include <linux/ioctl.h>
2007-11-20 07:06:31 +08:00
#include <asm/kvm.h>
2006-12-10 18:21:36 +08:00

2007-04-29 21:25:49 +08:00
#define KVM_API_VERSION 12
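The note at the top of the header ties every interface change to KVM_API_VERSION. A minimal sketch of how userspace might check that constant against the running kernel follows. It assumes the KVM_GET_API_VERSION ioctl, which is defined elsewhere in linux/kvm.h and not in this excerpt; the helper names and the `path` parameter are hypothetical and exist only so the code can be exercised without a real /dev/kvm.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns the API version reported by the kernel,
 * or -1 if the device cannot be opened or the ioctl fails.
 */
static inline int kvm_query_api_version(const char *path)
{
	int fd, version;

	assert(path != NULL);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;
	/* KVM_GET_API_VERSION is assumed from the full header. */
	version = ioctl(fd, KVM_GET_API_VERSION, 0);
	close(fd);
	return version;
}

/* Nonzero when the kernel reports the version this header was built for. */
static inline int kvm_api_matches(const char *path)
{
	return kvm_query_api_version(path) == KVM_API_VERSION;
}
```

A caller would typically pass "/dev/kvm" and refuse to continue when kvm_api_matches() returns 0.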
2006-12-22 17:06:02 +08:00

2009-11-03 00:20:28 +08:00
/* *** Deprecated interfaces *** */

#define KVM_TRC_SHIFT           16

#define KVM_TRC_ENTRYEXIT       (1 << KVM_TRC_SHIFT)
#define KVM_TRC_HANDLER         (1 << (KVM_TRC_SHIFT + 1))

#define KVM_TRC_VMENTRY         (KVM_TRC_ENTRYEXIT + 0x01)
#define KVM_TRC_VMEXIT          (KVM_TRC_ENTRYEXIT + 0x02)
#define KVM_TRC_PAGE_FAULT      (KVM_TRC_HANDLER + 0x01)

#define KVM_TRC_HEAD_SIZE       12
#define KVM_TRC_CYCLE_SIZE      8
#define KVM_TRC_EXTRA_MAX       7

#define KVM_TRC_INJ_VIRQ         (KVM_TRC_HANDLER + 0x02)
#define KVM_TRC_REDELIVER_EVT    (KVM_TRC_HANDLER + 0x03)
#define KVM_TRC_PEND_INTR        (KVM_TRC_HANDLER + 0x04)
#define KVM_TRC_IO_READ          (KVM_TRC_HANDLER + 0x05)
#define KVM_TRC_IO_WRITE         (KVM_TRC_HANDLER + 0x06)
#define KVM_TRC_CR_READ          (KVM_TRC_HANDLER + 0x07)
#define KVM_TRC_CR_WRITE         (KVM_TRC_HANDLER + 0x08)
#define KVM_TRC_DR_READ          (KVM_TRC_HANDLER + 0x09)
#define KVM_TRC_DR_WRITE         (KVM_TRC_HANDLER + 0x0A)
#define KVM_TRC_MSR_READ         (KVM_TRC_HANDLER + 0x0B)
#define KVM_TRC_MSR_WRITE        (KVM_TRC_HANDLER + 0x0C)
#define KVM_TRC_CPUID            (KVM_TRC_HANDLER + 0x0D)
#define KVM_TRC_INTR             (KVM_TRC_HANDLER + 0x0E)
#define KVM_TRC_NMI              (KVM_TRC_HANDLER + 0x0F)
#define KVM_TRC_VMMCALL          (KVM_TRC_HANDLER + 0x10)
#define KVM_TRC_HLT              (KVM_TRC_HANDLER + 0x11)
#define KVM_TRC_CLTS             (KVM_TRC_HANDLER + 0x12)
#define KVM_TRC_LMSW             (KVM_TRC_HANDLER + 0x13)
#define KVM_TRC_APIC_ACCESS      (KVM_TRC_HANDLER + 0x14)
#define KVM_TRC_TDP_FAULT        (KVM_TRC_HANDLER + 0x15)
#define KVM_TRC_GTLB_WRITE       (KVM_TRC_HANDLER + 0x16)
#define KVM_TRC_STLB_WRITE       (KVM_TRC_HANDLER + 0x17)
#define KVM_TRC_STLB_INVAL       (KVM_TRC_HANDLER + 0x18)
#define KVM_TRC_PPC_INSTR        (KVM_TRC_HANDLER + 0x19)

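The trace event IDs above appear to be built as a class bit (KVM_TRC_ENTRYEXIT or KVM_TRC_HANDLER) plus a small per-event number. Under that reading, which is an inference from the definitions and not something the header states, an ID can be split back into its two parts with a mask at KVM_TRC_SHIFT. The sketch below copies a few of the definitions so it stands alone; trc_class(), trc_index() and trc_example() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the deprecated trace definitions above. */
#define KVM_TRC_SHIFT      16
#define KVM_TRC_ENTRYEXIT  (1 << KVM_TRC_SHIFT)
#define KVM_TRC_HANDLER    (1 << (KVM_TRC_SHIFT + 1))
#define KVM_TRC_VMEXIT     (KVM_TRC_ENTRYEXIT + 0x02)
#define KVM_TRC_IO_READ    (KVM_TRC_HANDLER + 0x05)

/* Hypothetical helper: the bits at and above KVM_TRC_SHIFT (the class). */
static inline uint32_t trc_class(uint32_t event)
{
	return event & ~((UINT32_C(1) << KVM_TRC_SHIFT) - 1);
}

/* Hypothetical helper: the bits below KVM_TRC_SHIFT (the event number). */
static inline uint32_t trc_index(uint32_t event)
{
	return event & ((UINT32_C(1) << KVM_TRC_SHIFT) - 1);
}

/* Usage sketch: these hold for the values listed above. */
static inline void trc_example(void)
{
	assert(trc_class(KVM_TRC_VMEXIT) == (uint32_t)KVM_TRC_ENTRYEXIT);
	assert(trc_class(KVM_TRC_IO_READ) == (uint32_t)KVM_TRC_HANDLER);
	assert(trc_index(KVM_TRC_IO_READ) == 0x05);
}
```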
2008-04-11 03:31:10 +08:00
struct kvm_user_trace_setup {
2009-11-03 00:20:28 +08:00
	__u32 buf_size;
	__u32 buf_nr;
};

#define __KVM_DEPRECATED_MAIN_W_0x06 \
	_IOW(KVMIO, 0x06, struct kvm_user_trace_setup)
#define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07)
#define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08)

#define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq)

struct kvm_breakpoint {
	__u32 enabled;
	__u32 padding;
	__u64 address;
};

struct kvm_debug_guest {
	__u32 enabled;
	__u32 pad;
	struct kvm_breakpoint breakpoints[4];
	__u32 singlestep;
2008-04-11 03:31:10 +08:00
};

2009-11-03 00:20:28 +08:00
#define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest)

/* *** End of deprecated interfaces *** */

2006-12-10 18:21:36 +08:00
/* for KVM_CREATE_MEMORY_REGION */
struct kvm_memory_region {
	__u32 slot;
	__u32 flags;
	__u64 guest_phys_addr;
	__u64 memory_size; /* bytes */
};

2007-10-10 01:20:39 +08:00
/* for KVM_SET_USER_MEMORY_REGION */
struct kvm_userspace_memory_region {
	__u32 slot;
	__u32 flags;
	__u64 guest_phys_addr;
	__u64 memory_size; /* bytes */
	__u64 userspace_addr; /* start of the userspace allocated memory */
};

2012-08-21 10:58:45 +08:00
/*
 * Bits 0 to 15 of kvm_memory_region::flags are visible to userspace;
 * the other bits are reserved for KVM internal use and are defined in
 * include/linux/kvm_host.h.
 */
2012-08-21 11:02:51 +08:00
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)
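As an illustration of how struct kvm_userspace_memory_region and the flag bits fit together, the sketch below fills one region descriptor. The helper name fill_user_region(), the GUEST_PAGE_SIZE constant, and the page-alignment check are assumptions for this example: the header itself does not state an alignment rule, although KVM is generally understood to reject unaligned regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

#define GUEST_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper: describes `size` bytes of host memory at `host`
 * as guest physical memory starting at `gpa`.
 * Returns 0 on success, -1 if any value is not page aligned.
 */
static inline int fill_user_region(struct kvm_userspace_memory_region *r,
				   uint32_t slot, uint64_t gpa,
				   void *host, uint64_t size, int log_dirty)
{
	assert(r != NULL);
	if ((gpa | size | (uint64_t)(uintptr_t)host) & (GUEST_PAGE_SIZE - 1))
		return -1;

	memset(r, 0, sizeof(*r));
	r->slot = slot;
	/* Only the userspace-visible low bits are set here. */
	r->flags = log_dirty ? KVM_MEM_LOG_DIRTY_PAGES : 0;
	r->guest_phys_addr = gpa;
	r->memory_size = size;
	r->userspace_addr = (uint64_t)(uintptr_t)host;
	return 0;
}
```

The filled structure would then be handed to the KVM_SET_USER_MEMORY_REGION ioctl on a VM file descriptor; obtaining that descriptor is outside this excerpt.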
2006-12-10 18:21:36 +08:00

2007-09-12 15:58:04 +08:00
/* for KVM_IRQ_LINE */
2007-07-06 17:20:49 +08:00
struct kvm_irq_level {
	/*
	 * ACPI gsi notion of irq.
	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47.
	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23.
2013-01-21 07:28:08 +08:00
	 * For ARM: See Documentation/virtual/kvm/api.txt
2007-07-06 17:20:49 +08:00
	 */
2009-02-04 23:28:14 +08:00
	union {
		__u32 irq;
		__s32 status;
	};
2007-07-06 17:20:49 +08:00
	__u32 level;
};

2007-07-26 16:05:18 +08:00

struct kvm_irqchip {
	__u32 chip_id;
	__u32 pad;
	union {
		char dummy[512]; /* reserving space */
2009-01-19 20:57:52 +08:00
#ifdef __KVM_HAVE_PIT
2007-07-26 16:05:18 +08:00
		struct kvm_pic_state pic;
2007-12-17 20:27:27 +08:00
#endif
2009-01-19 20:57:52 +08:00
#ifdef __KVM_HAVE_IOAPIC
2007-08-05 15:49:16 +08:00
		struct kvm_ioapic_state ioapic;
2007-11-26 22:33:53 +08:00
#endif
2007-07-26 16:05:18 +08:00
	} chip;
};

2009-05-15 04:42:53 +08:00
/* for KVM_CREATE_PIT2 */
struct kvm_pit_config {
	__u32 flags;
	__u32 pad[15];
};

#define KVM_PIT_SPEAKER_DUMMY 1

2014-09-23 21:23:01 +08:00
struct kvm_s390_skeys {
	__u64 start_gfn;
	__u64 count;
	__u64 skeydata_addr;
	__u32 flags;
	__u32 reserved[9];
};
#define KVM_S390_GET_SKEYS_NONE 1
#define KVM_S390_SKEYS_MAX 1048576

2007-10-10 20:03:16 +08:00
#define KVM_EXIT_UNKNOWN          0
#define KVM_EXIT_EXCEPTION        1
#define KVM_EXIT_IO               2
#define KVM_EXIT_HYPERCALL        3
#define KVM_EXIT_DEBUG            4
#define KVM_EXIT_HLT              5
#define KVM_EXIT_MMIO             6
#define KVM_EXIT_IRQ_WINDOW_OPEN  7
#define KVM_EXIT_SHUTDOWN         8
#define KVM_EXIT_FAIL_ENTRY       9
#define KVM_EXIT_INTR             10
#define KVM_EXIT_SET_TPR          11
2007-10-22 22:50:39 +08:00
#define KVM_EXIT_TPR_ACCESS       12
2008-03-26 01:47:23 +08:00
#define KVM_EXIT_S390_SIEIC       13
2008-03-26 01:47:34 +08:00
#define KVM_EXIT_S390_RESET       14
2014-07-29 01:29:13 +08:00
#define KVM_EXIT_DCR              15 /* deprecated */
2008-09-26 15:30:55 +08:00
#define KVM_EXIT_NMI              16
2009-06-11 20:43:28 +08:00
#define KVM_EXIT_INTERNAL_ERROR   17
2010-03-25 04:48:30 +08:00
#define KVM_EXIT_OSI              18
KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 08:21:34 +08:00
#define KVM_EXIT_PAPR_HCALL	  19
2012-01-04 17:25:22 +08:00
#define KVM_EXIT_S390_UCONTROL	  20
2012-08-09 04:38:19 +08:00
#define KVM_EXIT_WATCHDOG         21
2012-12-20 22:32:12 +08:00
#define KVM_EXIT_S390_TSCH        22
2013-01-05 01:12:48 +08:00
#define KVM_EXIT_EPR              23
2014-04-29 13:54:19 +08:00
#define KVM_EXIT_SYSTEM_EVENT     24
2015-01-30 23:55:56 +08:00
#define KVM_EXIT_S390_STSI        25
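The KVM_EXIT_* values identify why control returned from guest mode to userspace. A small sketch of how a monitor might turn an exit reason into a readable name for logging follows. The function kvm_exit_name() is a hypothetical helper, only a subset of the codes is covered, and where the reason comes from (the exit_reason field of struct kvm_run) is assumed from the full header, not shown in this excerpt.

```c
#include <assert.h>
#include <string.h>
#include <linux/kvm.h>

/* Hypothetical helper: maps a subset of exit reasons to their names. */
static inline const char *kvm_exit_name(unsigned int reason)
{
	switch (reason) {
	case KVM_EXIT_UNKNOWN:		return "KVM_EXIT_UNKNOWN";
	case KVM_EXIT_IO:		return "KVM_EXIT_IO";
	case KVM_EXIT_HLT:		return "KVM_EXIT_HLT";
	case KVM_EXIT_MMIO:		return "KVM_EXIT_MMIO";
	case KVM_EXIT_SHUTDOWN:		return "KVM_EXIT_SHUTDOWN";
	case KVM_EXIT_FAIL_ENTRY:	return "KVM_EXIT_FAIL_ENTRY";
	case KVM_EXIT_INTERNAL_ERROR:	return "KVM_EXIT_INTERNAL_ERROR";
	case KVM_EXIT_SYSTEM_EVENT:	return "KVM_EXIT_SYSTEM_EVENT";
	default:			return "unhandled";
	}
}

/* Usage sketch: the numeric values match the definitions above. */
static inline void kvm_exit_name_example(void)
{
	assert(strcmp(kvm_exit_name(2), "KVM_EXIT_IO") == 0);
	assert(strcmp(kvm_exit_name(17), "KVM_EXIT_INTERNAL_ERROR") == 0);
}
```

A switch keeps the mapping next to the symbolic names, so a code that is renumbered or deprecated in the header shows up as a compile-time change instead of a stale table index.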
2009-06-11 20:43:28 +08:00

/* For KVM_EXIT_INTERNAL_ERROR */
2012-10-17 13:46:52 +08:00
/* Instruction emulation failed. */
#define KVM_INTERNAL_ERROR_EMULATION	1
/* Encountered unexpected simultaneous exceptions. */
#define KVM_INTERNAL_ERROR_SIMUL_EX	2
/* Encountered an unexpected VM exit due to event delivery. */
#define KVM_INTERNAL_ERROR_DELIVERY_EV	3
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-02-22 18:58:31 +08:00
|
|
|
/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
|
2006-12-10 18:21:36 +08:00
|
|
|
struct kvm_run {
|
|
|
|
/* in */
|
2007-01-06 08:36:24 +08:00
|
|
|
__u8 request_interrupt_window;
|
2008-12-11 23:54:54 +08:00
|
|
|
__u8 padding1[7];
|
2006-12-10 18:21:36 +08:00
|
|
|
|
|
|
|
/* out */
|
|
|
|
__u32 exit_reason;
|
2007-01-06 08:36:24 +08:00
|
|
|
__u8 ready_for_interrupt_injection;
|
|
|
|
__u8 if_flag;
|
2008-12-11 23:54:54 +08:00
|
|
|
__u8 padding2[2];
|
2007-02-12 16:54:39 +08:00
|
|
|
|
|
|
|
/* in (pre_kvm_run), out (post_kvm_run) */
|
2007-01-06 08:36:24 +08:00
|
|
|
__u64 cr8;
|
|
|
|
__u64 apic_base;
|
|
|
|
|
2009-11-19 21:21:16 +08:00
|
|
|
#ifdef __KVM_S390
|
|
|
|
/* the processor status word for s390 */
|
|
|
|
__u64 psw_mask; /* psw upper half */
|
|
|
|
__u64 psw_addr; /* psw lower half */
|
|
|
|
#endif
|
2006-12-10 18:21:36 +08:00
|
|
|
union {
|
|
|
|
/* KVM_EXIT_UNKNOWN */
|
|
|
|
struct {
|
2007-03-04 20:17:08 +08:00
|
|
|
__u64 hardware_exit_reason;
|
2006-12-10 18:21:36 +08:00
|
|
|
} hw;
|
2007-03-04 20:17:08 +08:00
|
|
|
/* KVM_EXIT_FAIL_ENTRY */
|
|
|
|
struct {
|
|
|
|
__u64 hardware_entry_failure_reason;
|
|
|
|
} fail_entry;
|
2006-12-10 18:21:36 +08:00
|
|
|
/* KVM_EXIT_EXCEPTION */
|
|
|
|
struct {
|
|
|
|
__u32 exception;
|
|
|
|
__u32 error_code;
|
|
|
|
} ex;
|
|
|
|
/* KVM_EXIT_IO */
|
2009-03-28 12:53:05 +08:00
|
|
|
struct {
|
2006-12-10 18:21:36 +08:00
|
|
|
#define KVM_EXIT_IO_IN 0
|
|
|
|
#define KVM_EXIT_IO_OUT 1
|
|
|
|
__u8 direction;
|
|
|
|
__u8 size; /* bytes */
|
|
|
|
__u16 port;
|
2007-03-20 18:46:50 +08:00
|
|
|
__u32 count;
|
|
|
|
__u64 data_offset; /* relative to kvm_run start */
|
2006-12-10 18:21:36 +08:00
|
|
|
} io;
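The io member above records `data_offset` relative to the start of kvm_run, so the data buffer is found by adding that offset to the base of the mapped region. The sketch below assumes the buffer holds `count` items of `size` bytes each. `struct io_exit` is a simplified local mirror of the member, and `io_data`, `io_total_bytes` and `capture_port_out` are hypothetical helper names, not part of the kernel interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KVM_EXIT_IO_IN  0
#define KVM_EXIT_IO_OUT 1

/* Simplified local mirror of the io member (hypothetical type name). */
struct io_exit {
    uint8_t  direction;
    uint8_t  size;        /* bytes */
    uint16_t port;
    uint32_t count;
    uint64_t data_offset; /* relative to kvm_run start */
};

/* run_base is the start of the mapped kvm_run region. */
static uint8_t *io_data(void *run_base, const struct io_exit *io)
{
    return (uint8_t *)run_base + io->data_offset;
}

/* Assumption: the buffer holds count items of size bytes each. */
static size_t io_total_bytes(const struct io_exit *io)
{
    return (size_t)io->size * io->count;
}

/*
 * Hypothetical OUT handler: copy what the guest wrote into out,
 * truncated to cap bytes. Returns the number of bytes copied,
 * or 0 if the exit was not a port write.
 */
static size_t capture_port_out(void *run_base, const struct io_exit *io,
                               uint8_t *out, size_t cap)
{
    size_t n;

    if (io->direction != KVM_EXIT_IO_OUT)
        return 0;
    n = io_total_bytes(io);
    if (n > cap)
        n = cap;
    memcpy(out, io_data(run_base, io), n);
    return n;
}
```

For a KVM_EXIT_IO_IN exit the same pointer would be used in the other direction: userspace fills the buffer before re-entering the guest.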
|
|
|
|
struct {
|
2008-12-15 20:52:10 +08:00
|
|
|
struct kvm_debug_exit_arch arch;
|
2006-12-10 18:21:36 +08:00
|
|
|
} debug;
|
|
|
|
/* KVM_EXIT_MMIO */
|
|
|
|
struct {
|
|
|
|
__u64 phys_addr;
|
|
|
|
__u8 data[8];
|
|
|
|
__u32 len;
|
|
|
|
__u8 is_write;
|
|
|
|
} mmio;
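A minimal sketch of how a device model might service the mmio member above: on a write it consumes `len` bytes from `data`, and on a read it fills `data` before the guest resumes. The 16-byte register file, `DEV_BASE`, `struct mmio_exit` and `handle_mmio` are all hypothetical names chosen for this example.

```c
#include <stdint.h>
#include <string.h>

/* Simplified local mirror of the mmio member (hypothetical type name). */
struct mmio_exit {
    uint64_t phys_addr;
    uint8_t  data[8];
    uint32_t len;
    uint8_t  is_write;
};

/* Hypothetical device: a 16-byte register file at a made-up address. */
#define DEV_BASE 0x10000000ULL
#define DEV_SIZE 16u

static uint8_t dev_regs[DEV_SIZE];

/* Returns 0 on success, -1 if the access does not fit the device. */
static int handle_mmio(struct mmio_exit *m)
{
    uint64_t off;

    /* data[] holds at most 8 bytes, so reject anything larger. */
    if (m->len == 0 || m->len > sizeof m->data)
        return -1;
    if (m->phys_addr < DEV_BASE)
        return -1;
    off = m->phys_addr - DEV_BASE;
    if (off > DEV_SIZE - m->len)
        return -1;

    if (m->is_write)
        memcpy(&dev_regs[off], m->data, m->len); /* guest -> device */
    else
        memcpy(m->data, &dev_regs[off], m->len); /* device -> guest */
    return 0;
}
```

The bounds check is written as `off > DEV_SIZE - len` rather than `off + len > DEV_SIZE` so that a very large guest address cannot wrap around.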
|
2007-03-04 19:59:30 +08:00
|
|
|
/* KVM_EXIT_HYPERCALL */
|
|
|
|
struct {
|
2007-07-17 03:24:47 +08:00
|
|
|
__u64 nr;
|
2007-03-04 19:59:30 +08:00
|
|
|
__u64 args[6];
|
|
|
|
__u64 ret;
|
|
|
|
__u32 longmode;
|
|
|
|
__u32 pad;
|
|
|
|
} hypercall;
|
2007-10-22 22:50:39 +08:00
		/* KVM_EXIT_TPR_ACCESS */
		struct {
			__u64 rip;
			__u32 is_write;
			__u32 pad;
		} tpr_access;
2008-03-26 01:47:23 +08:00
		/* KVM_EXIT_S390_SIEIC */
		struct {
			__u8  icptcode;
			__u16 ipa;
			__u32 ipb;
		} s390_sieic;
2008-03-26 01:47:34 +08:00
		/* KVM_EXIT_S390_RESET */
#define KVM_S390_RESET_POR       1
#define KVM_S390_RESET_CLEAR     2
#define KVM_S390_RESET_SUBSYSTEM 4
#define KVM_S390_RESET_CPU_INIT  8
#define KVM_S390_RESET_IPL       16
		__u64 s390_reset_flags;
2012-01-04 17:25:22 +08:00
		/* KVM_EXIT_S390_UCONTROL */
		struct {
			__u64 trans_exc_code;
			__u32 pgm_code;
		} s390_ucontrol;
2014-07-29 01:29:13 +08:00
		/* KVM_EXIT_DCR (deprecated) */
2008-04-17 12:28:07 +08:00
		struct {
			__u32 dcrn;
			__u32 data;
			__u8  is_write;
		} dcr;
2009-06-11 20:43:28 +08:00
		struct {
			__u32 suberror;
2009-11-04 17:54:59 +08:00
			/* Available with KVM_CAP_INTERNAL_ERROR_DATA: */
			__u32 ndata;
			__u64 data[16];
2009-06-11 20:43:28 +08:00
		} internal;
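The internal payload pairs a suberror code with an optional array of diagnostic words; per the comment, ndata and data[] are only meaningful when KVM_CAP_INTERNAL_ERROR_DATA is available. A minimal sketch of a dump routine follows. The names `internal_exit` and `dump_internal_error`, and the `has_error_data` parameter (standing in for a capability check done elsewhere), are assumptions for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local mirror of the "internal" exit payload (not the kernel definition). */
struct internal_exit {
	uint32_t suberror;
	uint32_t ndata;
	uint64_t data[16];
};

/* Prints the suberror and the valid data words; returns the number of words printed. */
static uint32_t dump_internal_error(FILE *out, const struct internal_exit *e,
				    int has_error_data)
{
	uint32_t n;

	assert(out && e);
	fprintf(out, "internal error, suberror %" PRIu32 "\n", e->suberror);
	if (!has_error_data)
		return 0;	/* ndata/data are only defined with the capability */
	/* Clamp defensively so a bogus ndata can never index past data[]. */
	n = e->ndata < 16 ? e->ndata : 16;
	for (uint32_t i = 0; i < n; i++)
		fprintf(out, "  data[%" PRIu32 "] = 0x%016" PRIx64 "\n", i, e->data[i]);
	return n;
}
```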
2010-03-25 04:48:30 +08:00
		/* KVM_EXIT_OSI */
		struct {
			__u64 gprs[32];
		} osi;
KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 08:21:34 +08:00
		struct {
			__u64 nr;
			__u64 ret;
			__u64 args[9];
		} papr_hcall;
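The commit message above says PAPR hypercalls are passed to qemu through the KVM_EXIT_PAPR_HCALL exit, and the payload carries the call number, nine argument slots, and a return slot. A minimal sketch of a userspace dispatcher follows. Everything prefixed `DEMO_` or `demo_` is hypothetical, including the hypercall number and the status values, and it is an assumption of this sketch that output values are written back into args[].

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the papr_hcall exit payload (not the kernel definition). */
struct papr_hcall_exit {
	uint64_t nr;
	uint64_t ret;
	uint64_t args[9];
};

/* Hypothetical hypercall number and status codes, for illustration only. */
#define DEMO_HCALL_ADD   0xf000ULL
#define DEMO_H_SUCCESS   ((uint64_t)0)
#define DEMO_H_FUNCTION  ((uint64_t)-2)	/* "no such hypercall" */

/* Dispatches on nr, stores the status in ret and any output in args[]. */
static void demo_handle_hcall(struct papr_hcall_exit *h)
{
	assert(h);
	switch (h->nr) {
	case DEMO_HCALL_ADD:
		h->args[0] = h->args[0] + h->args[1];
		h->ret = DEMO_H_SUCCESS;
		break;
	default:
		h->ret = DEMO_H_FUNCTION;
		break;
	}
}
```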
2012-12-20 22:32:12 +08:00
		/* KVM_EXIT_S390_TSCH */
		struct {
			__u16 subchannel_id;
			__u16 subchannel_nr;
			__u32 io_int_parm;
			__u32 io_int_word;
			__u32 ipb;
			__u8  dequeued;
		} s390_tsch;
2013-01-05 01:12:48 +08:00
		/* KVM_EXIT_EPR */
		struct {
			__u32 epr;
		} epr;
2014-04-29 13:54:19 +08:00
		/* KVM_EXIT_SYSTEM_EVENT */
		struct {
#define KVM_SYSTEM_EVENT_SHUTDOWN 1
#define KVM_SYSTEM_EVENT_RESET    2
			__u32 type;
			__u64 flags;
		} system_event;
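The system_event payload reports a guest-initiated shutdown or reset through its type field. One way userspace might map that to an action is sketched below. The `vm_action` enum and `on_system_event` are hypothetical names, the two event constants are copied from the listing, and the meaning of flags is not given there, so the sketch ignores it.

```c
#include <assert.h>
#include <stdint.h>

#define KVM_SYSTEM_EVENT_SHUTDOWN 1
#define KVM_SYSTEM_EVENT_RESET    2

/* Local mirror of the system_event exit payload (not the kernel definition). */
struct system_event_exit {
	uint32_t type;
	uint64_t flags;
};

/* Hypothetical actions a VMM main loop could take. */
enum vm_action { VM_UNKNOWN_EVENT, VM_POWER_OFF, VM_REBOOT };

static enum vm_action on_system_event(const struct system_event_exit *ev)
{
	assert(ev);
	switch (ev->type) {
	case KVM_SYSTEM_EVENT_SHUTDOWN:
		return VM_POWER_OFF;
	case KVM_SYSTEM_EVENT_RESET:
		return VM_REBOOT;
	default:
		return VM_UNKNOWN_EVENT;	/* event type added after this sketch was written */
	}
}
```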
2015-01-30 23:55:56 +08:00
		/* KVM_EXIT_S390_STSI */
		struct {
			__u64 addr;
			__u8  ar;
			__u8  reserved;
			__u8  fc;
			__u8  sel1;
			__u16 sel2;
		} s390_stsi;
2007-07-17 16:45:55 +08:00
		/* Fix the size of the union. */
		char padding[256];
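The padding member pins the size of the exit union so that new exit payloads can be added without changing the layout of whatever follows it. A cut-down illustration follows, assuming every payload stays at or below 256 bytes; `demo_exit` and `demo_exit_size` are hypothetical names and only two payloads are mirrored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the exit union: two payloads plus the padding member. */
union demo_exit {
	struct {
		uint64_t phys_addr;
		uint8_t  data[8];
		uint32_t len;
		uint8_t  is_write;
	} mmio;
	struct {
		uint32_t epr;
	} epr;
	char padding[256];	/* largest member, so it dictates sizeof */
};

/* Fails to compile if a payload ever outgrows the reserved space. */
_Static_assert(sizeof(union demo_exit) == 256, "exit union must stay 256 bytes");

static size_t demo_exit_size(void)
{
	return sizeof(union demo_exit);
}
```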
2006-12-10 18:21:36 +08:00
	};
2012-01-11 18:20:30 +08:00
	/*
	 * shared registers between kvm and userspace.
	 * kvm_valid_regs specifies the register classes set by the host
	 * kvm_dirty_regs specifies the register classes dirtied by userspace
	 * struct kvm_sync_regs is architecture specific, as well as the
	 * bits for kvm_valid_regs and kvm_dirty_regs
	 */
	__u64 kvm_valid_regs;
	__u64 kvm_dirty_regs;
	union {
		struct kvm_sync_regs regs;
2014-06-09 22:57:26 +08:00
		char padding[2048];
2012-01-11 18:20:30 +08:00
	} s;
2006-12-10 18:21:36 +08:00
};
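The shared-register comment in the structure above describes a two-way handshake: kvm_valid_regs says which register classes the host filled in, and kvm_dirty_regs says which ones userspace changed. A minimal sketch of the userspace side follows. Because struct kvm_sync_regs and the bit assignments are architecture specific, `demo_sync_regs`, `DEMO_SYNC_PC`, `demo_run`, and `demo_skip_insn` are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the architecture-specific kvm_sync_regs. */
struct demo_sync_regs {
	uint64_t pc;
};

/* Hypothetical register-class bit; real bits are defined per architecture. */
#define DEMO_SYNC_PC (1ULL << 0)

/* Mirrors only the shared-register tail of the run structure. */
struct demo_run {
	uint64_t kvm_valid_regs;
	uint64_t kvm_dirty_regs;
	union {
		struct demo_sync_regs regs;
		char padding[2048];
	} s;
};

/* Advances pc if the host marked it valid, then flags it dirty. Returns 0 or -1. */
static int demo_skip_insn(struct demo_run *run, uint64_t len)
{
	assert(run);
	if (!(run->kvm_valid_regs & DEMO_SYNC_PC))
		return -1;	/* host did not sync this class; value is stale */
	run->s.regs.pc += len;
	run->kvm_dirty_regs |= DEMO_SYNC_PC;
	return 0;
}
```

The design point is that registers travel through the shared mapping instead of separate get/set calls, which is presumably why both masks exist.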
2008-05-30 22:05:54 +08:00
/* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */

struct kvm_coalesced_mmio_zone {
	__u64 addr;
	__u32 size;
	__u32 pad;
};

struct kvm_coalesced_mmio {
	__u64 phys_addr;
	__u32 len;
	__u32 pad;
	__u8  data[8];
};

struct kvm_coalesced_mmio_ring {
	__u32 first, last;
	struct kvm_coalesced_mmio coalesced_mmio[0];
};

#define KVM_COALESCED_MMIO_MAX \
	((PAGE_SIZE - sizeof(struct kvm_coalesced_mmio_ring)) / \
	 sizeof(struct kvm_coalesced_mmio))
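The ring structure and KVM_COALESCED_MMIO_MAX above describe a one-page circular buffer of queued MMIO writes, indexed by first and last. A sketch of how a consumer might drain it follows, assuming a 4096-byte page, that first is the consumer index and last the producer index, and no concurrent producer; a real consumer would also need memory barriers. All `demo_`/`DEMO_` names and `drain_coalesced_ring` are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Local mirrors of the listed structures (not the kernel definitions). */
struct coalesced_mmio {
	uint64_t phys_addr;
	uint32_t len;
	uint32_t pad;
	uint8_t  data[8];
};

struct coalesced_mmio_ring {
	uint32_t first, last;
	struct coalesced_mmio coalesced_mmio[];
};

#define DEMO_COALESCED_MMIO_MAX \
	((DEMO_PAGE_SIZE - sizeof(struct coalesced_mmio_ring)) / \
	 sizeof(struct coalesced_mmio))

typedef void (*mmio_write_fn)(const struct coalesced_mmio *w, void *opaque);

/* Replays every queued write in order; returns how many were consumed. */
static unsigned drain_coalesced_ring(struct coalesced_mmio_ring *ring,
				     mmio_write_fn fn, void *opaque)
{
	unsigned n = 0;

	assert(ring && fn);
	while (ring->first != ring->last) {
		fn(&ring->coalesced_mmio[ring->first], opaque);
		ring->first = (ring->first + 1) % DEMO_COALESCED_MMIO_MAX;
		n++;
	}
	return n;
}

/* Demo helpers: a zeroed page-sized ring and a callback that sums lengths. */
static struct coalesced_mmio_ring *demo_ring_alloc(void)
{
	return calloc(1, DEMO_PAGE_SIZE);
}

static void demo_sum_lens(const struct coalesced_mmio *w, void *opaque)
{
	*(uint64_t *)opaque += w->len;
}
```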
2006-12-10 18:21:36 +08:00
/* for KVM_TRANSLATE */
struct kvm_translation {
	/* in */
	__u64 linear_address;

	/* out */
	__u64 physical_address;
	__u8  valid;
	__u8  writeable;
	__u8  usermode;
2007-02-12 16:54:41 +08:00
	__u8  pad[5];
2006-12-10 18:21:36 +08:00
};
2015-02-06 22:01:21 +08:00
/* for KVM_S390_MEM_OP */
struct kvm_s390_mem_op {
	/* in */
	__u64 gaddr;		/* the guest address */
	__u64 flags;		/* flags */
	__u32 size;		/* amount of bytes */
	__u32 op;		/* type of operation */
	__u64 buf;		/* buffer in userspace */
	__u8  ar;		/* the access register number */
	__u8  reserved[31];	/* should be set to 0 */
};
/* types for kvm_s390_mem_op->op */
#define KVM_S390_MEMOP_LOGICAL_READ	0
#define KVM_S390_MEMOP_LOGICAL_WRITE	1
/* flags for kvm_s390_mem_op->flags */
#define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
#define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
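The kvm_s390_mem_op layout above is a pure input structure: the caller names a guest address, a userspace buffer, a size, an operation, and optional flags, and must leave the reserved bytes zero. A minimal sketch of filling one in for a logical read follows. `s390_mem_op` is a local mirror and `make_logical_read` a hypothetical helper; issuing the ioctl itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KVM_S390_MEMOP_LOGICAL_READ		0
#define KVM_S390_MEMOP_LOGICAL_WRITE		1
#define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
#define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)

/* Local mirror of the listed structure (not the kernel definition). */
struct s390_mem_op {
	uint64_t gaddr;
	uint64_t flags;
	uint32_t size;
	uint32_t op;
	uint64_t buf;
	uint8_t  ar;
	uint8_t  reserved[31];
};

/* The fields add up to 64 bytes with no implicit padding. */
_Static_assert(sizeof(struct s390_mem_op) == 64, "unexpected mem_op layout");

/* Builds a logical-read request; check_only asks for an access check without copying. */
static struct s390_mem_op make_logical_read(uint64_t gaddr, void *dst,
					    uint32_t size, uint8_t ar,
					    int check_only)
{
	struct s390_mem_op op;

	assert(ar < 16);		/* s390 has 16 access registers */
	memset(&op, 0, sizeof(op));	/* reserved[] should be set to 0 */
	op.gaddr = gaddr;
	op.size = size;
	op.op = KVM_S390_MEMOP_LOGICAL_READ;
	op.buf = (uint64_t)(uintptr_t)dst;	/* pointer passed as a 64-bit integer */
	op.ar = ar;
	if (check_only)
		op.flags |= KVM_S390_MEMOP_F_CHECK_ONLY;
	return op;
}
```

Zeroing the whole structure first is the simplest way to keep reserved[] clear if fields are later carved out of it.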
2006-12-10 18:21:36 +08:00
/* for KVM_INTERRUPT */
struct kvm_interrupt {
	/* in */
	__u32 irq;
};

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
	__u32 slot;
2009-03-28 12:53:05 +08:00
	__u32 padding1;
2006-12-10 18:21:36 +08:00
	union {
		void __user *dirty_bitmap; /* one bit per page */
2009-03-28 12:53:05 +08:00
		__u64 padding2;
2006-12-10 18:21:36 +08:00
};
};

2007-03-06 01:46:05 +08:00
/* for KVM_SET_SIGNAL_MASK */
struct kvm_signal_mask {
	__u32 len;
	__u8 sigset[0];
};
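Because `sigset[0]` is a zero-length trailing array, the structure is a 4-byte header followed by a caller-sized payload. The following is a minimal sketch of allocating such a buffer, assuming a local mirror of the layout; `sigmask_sketch` and `sigmask_alloc` are hypothetical names, and the length the kernel actually accepts is not stated in this listing, so it is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct kvm_signal_mask as listed above (assumption:
 * real code would include <linux/kvm.h> instead). */
struct sigmask_sketch {
    uint32_t len;
    uint8_t  sigset[];  /* C99 flexible array member, same role as sigset[0] */
};

/* Hypothetical helper: allocate header plus payload and copy the raw
 * signal-set bytes behind the length field. Caller frees the result. */
static struct sigmask_sketch *sigmask_alloc(const void *set, uint32_t len)
{
    struct sigmask_sketch *m;

    assert(set != NULL);
    m = malloc(sizeof(*m) + len);
    if (m == NULL)
        return NULL;
    m->len = len;
    memcpy(m->sigset, set, len);
    return m;
}
```

The resulting pointer would be what a caller hands to the KVM_SET_SIGNAL_MASK ioctl named in the comment above.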
2007-10-22 22:50:39 +08:00
/* for KVM_TPR_ACCESS_REPORTING */
struct kvm_tpr_access_ctl {
	__u32 enabled;
	__u32 flags;
	__u32 reserved[8];
};

2007-10-25 22:52:32 +08:00
/* for KVM_SET_VAPIC_ADDR */
struct kvm_vapic_addr {
	__u64 vapic_addr;
};

2014-05-12 22:05:13 +08:00
/* for KVM_SET_MP_STATE */
2008-04-12 00:24:45 +08:00

2014-05-12 22:05:13 +08:00
/* not all states are valid on all architectures */
2008-04-12 00:24:45 +08:00
#define KVM_MP_STATE_RUNNABLE 0
#define KVM_MP_STATE_UNINITIALIZED 1
#define KVM_MP_STATE_INIT_RECEIVED 2
#define KVM_MP_STATE_HALTED 3
#define KVM_MP_STATE_SIPI_RECEIVED 4
2014-04-10 23:35:00 +08:00
#define KVM_MP_STATE_STOPPED 5
#define KVM_MP_STATE_CHECK_STOP 6
#define KVM_MP_STATE_OPERATING 7
#define KVM_MP_STATE_LOAD 8
2008-04-12 00:24:45 +08:00

struct kvm_mp_state {
	__u32 mp_state;
};
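The `mp_state` field holds one of the KVM_MP_STATE_* values defined above. As an illustration only, a debugging helper could map the number back to a name; `mp_state_name` is a hypothetical function, and the table is copied from the defines in this listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names indexed by the KVM_MP_STATE_* values listed above (0..8). */
static const char *const mp_state_names[] = {
    "RUNNABLE", "UNINITIALIZED", "INIT_RECEIVED", "HALTED",
    "SIPI_RECEIVED", "STOPPED", "CHECK_STOP", "OPERATING", "LOAD",
};

/* Hypothetical helper: return a printable name, or "UNKNOWN" for any
 * value outside the range defined in this listing. */
static const char *mp_state_name(uint32_t mp_state)
{
    size_t n = sizeof(mp_state_names) / sizeof(mp_state_names[0]);

    assert(n == 9);  /* the table must cover states 0 through 8 */
    return mp_state < n ? mp_state_names[mp_state] : "UNKNOWN";
}
```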
2008-03-26 01:47:20 +08:00
struct kvm_s390_psw {
	__u64 mask;
	__u64 addr;
};

KVM: s390: interrupt subsystem, cpu timer, waitpsw
This patch contains the s390 interrupt subsystem (similar to in kernel apic)
including timer interrupts (similar to in-kernel-pit) and enabled wait
(similar to in kernel hlt).
In order to achieve that, this patch also introduces intercept handling
for instruction intercepts, and it implements load control instructions.
This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
the vm file descriptors and the vcpu file descriptors. In case this ioctl is
issued against a vm file descriptor, the interrupt is considered floating.
Floating interrupts may be delivered to any virtual cpu in the configuration.
The following interrupts are supported:
SIGP STOP - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
(stopped) remote cpu
INT EMERGENCY - interprocessor interrupt, usually used to signal need_resched
and for smp_call_function() in the guest.
PROGRAM INT - exception during program execution such as page fault, illegal
instruction and friends
RESTART - interprocessor signal that starts a stopped cpu
INT VIRTIO - floating interrupt for virtio signalisation
INT SERVICE - floating interrupt for signalisations from the system
service processor
struct kvm_s390_interrupt, which is submitted as the ioctl parameter when
injecting an interrupt, also carries parameter data for interrupts along with
the interrupt type. Interrupts on s390 usually have a state that represents the
current operation, or identifies which device has caused the interruption.
kvm_s390_handle_wait() handles waitpsw in two flavors: in case of a
disabled wait (that is, disabled for interrupts), we exit to userspace. In case
of an enabled wait we set up a timer that equals the cpu clock comparator value
and sleep on a wait queue.
[christian: change virtio interrupt to 0x2603]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-26 01:47:26 +08:00
/* valid values for type in kvm_s390_interrupt */
#define KVM_S390_SIGP_STOP 0xfffe0000u
#define KVM_S390_PROGRAM_INT 0xfffe0001u
#define KVM_S390_SIGP_SET_PREFIX 0xfffe0002u
#define KVM_S390_RESTART 0xfffe0003u
2013-10-07 23:11:48 +08:00
#define KVM_S390_INT_PFAULT_INIT 0xfffe0004u
#define KVM_S390_INT_PFAULT_DONE 0xfffe0005u
2012-12-20 22:32:09 +08:00
#define KVM_S390_MCHK 0xfffe1000u
2014-03-26 23:11:54 +08:00
#define KVM_S390_INT_CLOCK_COMP 0xffff1004u
#define KVM_S390_INT_CPU_TIMER 0xffff1005u
2008-03-26 01:47:26 +08:00
#define KVM_S390_INT_VIRTIO 0xffff2603u
#define KVM_S390_INT_SERVICE 0xffff2401u
#define KVM_S390_INT_EMERGENCY 0xffff1201u
2011-10-18 18:27:15 +08:00
#define KVM_S390_INT_EXTERNAL_CALL 0xffff1202u
2012-12-20 22:32:08 +08:00
/* Anything below 0xfffe0000u is taken by INT_IO */
#define KVM_S390_INT_IO(ai,cssid,ssid,schid) \
	(((schid)) | \
	 ((ssid) << 16) | \
	 ((cssid) << 18) | \
	 ((ai) << 26))
#define KVM_S390_INT_IO_MIN 0x00000000u
#define KVM_S390_INT_IO_MAX 0xfffdffffu
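The KVM_S390_INT_IO() macro packs four fields into one 32-bit type value, and every value up to KVM_S390_INT_IO_MAX is treated as an I/O interrupt. A small worked sketch of that encoding follows; the macros are copied from the listing, while `io_interrupt_type` and `is_io_interrupt_type` are hypothetical helpers. The field widths checked in the assertion are inferred from the shift offsets (16, 18 and 26), not stated by the source.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the defines listed above. */
#define KVM_S390_INT_IO(ai, cssid, ssid, schid) \
    (((schid)) | ((ssid) << 16) | ((cssid) << 18) | ((ai) << 26))
#define KVM_S390_INT_IO_MIN 0x00000000u
#define KVM_S390_INT_IO_MAX 0xfffdffffu

/* Hypothetical helper: does this type value fall in the INT_IO range?
 * MIN is 0, so only the upper bound needs checking. */
static int is_io_interrupt_type(uint32_t type)
{
    return type <= KVM_S390_INT_IO_MAX;
}

/* Hypothetical helper: build a type value from unsigned fields.
 * Widths are an inference from the shifts: schid 16 bits, ssid 2, cssid 8. */
static uint32_t io_interrupt_type(uint32_t ai, uint32_t cssid,
                                  uint32_t ssid, uint32_t schid)
{
    assert(cssid <= 0xff && ssid <= 3 && schid <= 0xffff);
    return KVM_S390_INT_IO(ai, cssid, ssid, schid);
}
```

For example, `io_interrupt_type(0, 1, 2, 0x1234)` places 0x1234 in the low 16 bits, 2 at bit 16 and 1 at bit 18, giving 0x00061234, whereas KVM_S390_SIGP_STOP (0xfffe0000u) lies just above the I/O range.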
2008-03-26 01:47:26 +08:00
struct kvm_s390_interrupt {
	__u32 type;
	__u32 parm;
	__u64 parm64;
};

2013-10-07 22:13:44 +08:00
struct kvm_s390_io_info {
	__u16 subchannel_id;
	__u16 subchannel_nr;
	__u32 io_int_parm;
	__u32 io_int_word;
};

struct kvm_s390_ext_info {
	__u32 ext_params;
	__u32 pad;
	__u64 ext_params2;
};

struct kvm_s390_pgm_info {
	__u64 trans_exc_code;
	__u64 mon_code;
	__u64 per_address;
	__u32 data_exc_code;
	__u16 code;
	__u16 mon_class_nr;
	__u8 per_code;
	__u8 per_atmid;
	__u8 exc_access_id;
	__u8 per_access_id;
	__u8 op_access_id;
	__u8 pad[3];
};

struct kvm_s390_prefix_info {
	__u32 address;
};

struct kvm_s390_extcall_info {
	__u16 code;
};

struct kvm_s390_emerg_info {
	__u16 code;
};

2014-10-15 22:48:16 +08:00
#define KVM_S390_STOP_FLAG_STORE_STATUS 0x01
struct kvm_s390_stop_info {
	__u32 flags;
};

2013-10-07 22:13:44 +08:00
struct kvm_s390_mchk_info {
	__u64 cr14;
	__u64 mcic;
	__u64 failing_storage_address;
	__u32 ext_damage_code;
	__u32 pad;
	__u8 fixed_logout[16];
};

struct kvm_s390_irq {
	__u64 type;
	union {
		struct kvm_s390_io_info io;
		struct kvm_s390_ext_info ext;
		struct kvm_s390_pgm_info pgm;
		struct kvm_s390_emerg_info emerg;
		struct kvm_s390_extcall_info extcall;
		struct kvm_s390_prefix_info prefix;
2014-10-15 22:48:16 +08:00
		struct kvm_s390_stop_info stop;
2013-10-07 22:13:44 +08:00
		struct kvm_s390_mchk_info mchk;
		char reserved[64];
	} u;
};

2014-11-25 00:13:46 +08:00
struct kvm_s390_irq_state {
	__u64 buf;
	__u32 flags;
	__u32 len;
	__u32 reserved[4];
};

2008-12-15 20:52:10 +08:00
/* for KVM_SET_GUEST_DEBUG */

#define KVM_GUESTDBG_ENABLE 0x00000001
#define KVM_GUESTDBG_SINGLESTEP 0x00000002

struct kvm_guest_debug {
	__u32 control;
	__u32 pad;
	struct kvm_guest_debug_arch arch;
};

KVM: add ioeventfd support
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest. Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.
Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.
However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc). For these
patterns, the synchronous call is particularly expensive since we really
only want to get our notification transmitted asynchronously and
return as quickly as possible. All the synchronous infrastructure to ensure
proper data-dependencies are met in the normal IO case is just unnecessary
overhead for signalling. This adds additional computational load on the
system, as well as latency to the signalling path.
Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd. This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.
To test this theory, we built a test-harness called "doorbell". This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled. It supports signalling
from either an eventfd, or an ioctl().
We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl(). The other is direct via
ioeventfd.
You can download this test harness here:
ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
The measured results are as follows:
qemu-mmio: 110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio: 367300 iops, 2.72us rtt
I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy. However, for now we
can extrapolate from the NULLIO runs (+2.56us for MMIO, and -350ns for HC)
to get:
qemu-pio: 153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt
These are just for fun, for now, until I can gather more data.
Here is a graph for your convenience:
http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
The conclusion to draw is that we save about 4us by skipping the userspace
hop.
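In the measured rows, the rtt column appears to be simply the reciprocal of the iops column, and the "about 4us" figure is the gap between the qemu-mmio and ioeventfd-mmio rows. A minimal sketch of that arithmetic, assuming this reading of the table (`rtt_us` is a hypothetical helper, not part of the test harness):

```c
#include <assert.h>

/* Convert a rate in operations per second into microseconds per operation. */
static double rtt_us(double iops)
{
    assert(iops > 0.0);
    return 1e6 / iops;
}
```

With the measured rates, `rtt_us(110000)` is roughly 9.09 and `rtt_us(200100)` is roughly 5.00, so the difference is a little over 4 microseconds.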
--------------------
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-07-08 05:08:49 +08:00
enum {
	kvm_ioeventfd_flag_nr_datamatch,
	kvm_ioeventfd_flag_nr_pio,
	kvm_ioeventfd_flag_nr_deassign,
2013-02-28 19:33:20 +08:00
	kvm_ioeventfd_flag_nr_virtio_ccw_notify,
KVM: VMX: speed up wildcard MMIO EVENTFD
With KVM, MMIO is much slower than PIO, due to the need to
do page walk and emulation. But with EPT, it does not have to be: we
know the address from the VMCS so if the address is unique, we can look
up the eventfd directly, bypassing emulation.
Unfortunately, this only works if userspace does not need to match on
access length and data. The implementation adds a separate FAST_MMIO
bus internally. This serves two purposes:
- minimize overhead for old userspace that does not use eventfd with length = 0
- minimize disruption in other code (since we don't know the length,
devices on the MMIO bus only get a valid address in write, this
way we don't need to touch all devices to teach them to handle
an invalid length)
At the moment, this optimization only has effect for EPT on x86.
It will be possible to speed up MMIO for NPT and MMU using the same
idea in the future.
With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
I was unable to detect any measurable slowdown to non-eventfd MMIO.
Making MMIO faster is important for the upcoming virtio 1.0 which
includes an MMIO signalling capability.
The idea was suggested by Peter Anvin. Lots of thanks to Gleb for
pre-review and suggestions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-01 02:50:44 +08:00
	kvm_ioeventfd_flag_nr_fast_mmio,
2009-07-08 05:08:49 +08:00
	kvm_ioeventfd_flag_nr_max,
};

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
2013-02-28 19:33:20 +08:00
#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
2009-07-08 05:08:49 +08:00
#define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1)
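The flag bits are derived from the enumerator positions, so the valid mask grows automatically when an enumerator is added before `kvm_ioeventfd_flag_nr_max`. A short sketch of how a caller might pre-check a flags word against that mask; the enum and defines are copied from this listing, and `ioeventfd_flags_valid` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Enumerators and defines copied from the listing above. */
enum {
    kvm_ioeventfd_flag_nr_datamatch,
    kvm_ioeventfd_flag_nr_pio,
    kvm_ioeventfd_flag_nr_deassign,
    kvm_ioeventfd_flag_nr_virtio_ccw_notify,
    kvm_ioeventfd_flag_nr_fast_mmio,
    kvm_ioeventfd_flag_nr_max,
};

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
#define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1)

/* Hypothetical helper: true when no bit outside the valid mask is set. */
static int ioeventfd_flags_valid(uint32_t flags)
{
    assert(kvm_ioeventfd_flag_nr_max < 32);  /* mask must fit in 32 bits */
    return (flags & ~(uint32_t)KVM_IOEVENTFD_VALID_FLAG_MASK) == 0;
}
```

With the six enumerators shown in this listing, the mask evaluates to 0x1f.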
struct kvm_ioeventfd {
	__u64 datamatch;
	__u64 addr; /* legal pio/mmio address */
2014-04-01 02:50:38 +08:00
	__u32 len; /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
2009-07-08 05:08:49 +08:00
	__s32 fd;
	__u32 flags;
	__u8 pad[36];
};
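To make the field roles concrete, here is a hedged sketch of filling in a registration request. It uses a local mirror of the structure with fixed-width stand-ins for the `__u64`/`__s32` types; `ioeventfd_sketch`, `ioeventfd_make` and the `SKETCH_FLAG_*` names are hypothetical, and the flag bit positions follow the enumerator order shown in this listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_ioeventfd as listed above (assumption:
 * real code would include <linux/kvm.h> instead). */
struct ioeventfd_sketch {
    uint64_t datamatch;
    uint64_t addr;      /* legal pio/mmio address */
    uint32_t len;       /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
    int32_t  fd;
    uint32_t flags;
    uint8_t  pad[36];
};

/* Bit positions taken from the enum order: datamatch = 0, pio = 1. */
#define SKETCH_FLAG_DATAMATCH (1u << 0)
#define SKETCH_FLAG_PIO       (1u << 1)

/* Hypothetical helper: build a zero-padded request. Pass datamatch as
 * NULL to trigger on any written value. */
static struct ioeventfd_sketch ioeventfd_make(uint64_t addr, uint32_t len,
                                              int32_t fd, int pio,
                                              const uint64_t *datamatch)
{
    struct ioeventfd_sketch e;

    assert(len == 0 || len == 1 || len == 2 || len == 4 || len == 8);
    memset(&e, 0, sizeof(e));   /* keeps pad[] and unused fields zero */
    e.addr = addr;
    e.len = len;
    e.fd = fd;
    if (pio)
        e.flags |= SKETCH_FLAG_PIO;
    if (datamatch != NULL) {
        e.datamatch = *datamatch;
        e.flags |= SKETCH_FLAG_DATAMATCH;
    }
    return e;
}
```

The padding brings the mirrored structure to 64 bytes. In real use the filled structure would be passed to the ioeventfd registration ioctl on a VM file descriptor, which is not part of this excerpt.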
2010-03-25 04:48:29 +08:00
/* for KVM_ENABLE_CAP */
struct kvm_enable_cap {
	/* in */
	__u32 cap;
	__u32 flags;
	__u64 args[4];
	__u8 pad[64];
};

2010-07-29 20:48:08 +08:00
/* for KVM_PPC_GET_PVINFO */
struct kvm_ppc_pvinfo {
	/* out */
	__u32 flags;
	__u32 hcall[4];
	__u8 pad[108];
};

2012-04-27 03:43:42 +08:00
/* for KVM_PPC_GET_SMMU_INFO */
#define KVM_PPC_PAGE_SIZES_MAX_SZ 8

struct kvm_ppc_one_page_size {
	__u32 page_shift; /* Page shift (or 0) */
	__u32 pte_enc; /* Encoding in the HPTE (>>12) */
};

struct kvm_ppc_one_seg_page_size {
	__u32 page_shift; /* Base page shift of segment (or 0) */
	__u32 slb_enc; /* SLB encoding for BookS */
	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
};

#define KVM_PPC_PAGE_SIZES_REAL 0x00000001
#define KVM_PPC_1T_SEGMENTS 0x00000002

struct kvm_ppc_smmu_info {
	__u64 flags;
	__u32 slb_size;
	__u32 pad;
	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
};
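The SMMU info is a fixed 8x8 table of segment sizes and, per segment, actual page sizes. A sketch of walking it follows, under the assumption (suggested by the "(or 0)" comments, but not spelled out here) that a zero `page_shift` marks an unused slot. The `*_sketch` types mirror the listing and `count_page_sizes` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZES_MAX_SZ 8  /* mirrors KVM_PPC_PAGE_SIZES_MAX_SZ */

/* Local mirrors of the three structures listed above. */
struct one_page_size_sketch {
    uint32_t page_shift;    /* Page shift (or 0) */
    uint32_t pte_enc;
};

struct one_seg_page_size_sketch {
    uint32_t page_shift;    /* Base page shift of segment (or 0) */
    uint32_t slb_enc;
    struct one_page_size_sketch enc[SKETCH_PAGE_SIZES_MAX_SZ];
};

struct smmu_info_sketch {
    uint64_t flags;
    uint32_t slb_size;
    uint32_t pad;
    struct one_seg_page_size_sketch sps[SKETCH_PAGE_SIZES_MAX_SZ];
};

/* Count (segment size, page size) pairs, skipping slots whose
 * page_shift is 0 (assumed to mean "unused"). */
static unsigned count_page_sizes(const struct smmu_info_sketch *info)
{
    unsigned n = 0;

    assert(info != NULL);
    for (int s = 0; s < SKETCH_PAGE_SIZES_MAX_SZ; s++) {
        if (info->sps[s].page_shift == 0)
            continue;
        for (int p = 0; p < SKETCH_PAGE_SIZES_MAX_SZ; p++)
            if (info->sps[s].enc[p].page_shift != 0)
                n++;
    }
    return n;
}
```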
2012-07-03 13:48:52 +08:00
#define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0)

2006-12-10 18:21:36 +08:00
|
|
|
#define KVMIO 0xAE
|
|
|
|
|
2012-01-04 17:25:20 +08:00
|
|
|
/* machine type bits, to be used as argument to KVM_CREATE_VM */
|
|
|
|
#define KVM_VM_S390_UCONTROL 1
|
|
|
|
|
2013-10-08 00:48:01 +08:00
|
|
|
/* on ppc, 0 indicate default, 1 should force HV and 2 PR */
|
|
|
|
#define KVM_VM_PPC_HV 1
|
|
|
|
#define KVM_VM_PPC_PR 2
|
|
|
|
|
2012-01-04 17:25:23 +08:00
|
|
|
#define KVM_S390_SIE_PAGE_OFFSET 1
|
|
|
|
|
2007-02-22 01:28:04 +08:00
|
|
|
/*
|
|
|
|
* ioctls for /dev/kvm fds:
|
|
|
|
*/
|
2007-03-01 23:20:13 +08:00
|
|
|
#define KVM_GET_API_VERSION _IO(KVMIO, 0x00)
|
|
|
|
#define KVM_CREATE_VM _IO(KVMIO, 0x01) /* returns a VM fd */
|
|
|
|
#define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list)
|
2008-03-26 01:47:20 +08:00
|
|
|
|
|
|
|
#define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06)
|
2007-03-01 23:56:20 +08:00
|
|
|
/*
 * Check if a kvm extension is available. Argument is extension number,
 * return is 1 (yes) or 0 (no, sorry).
 */
#define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03)
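The comment above gives the calling convention for KVM_CHECK_EXTENSION: the argument is the extension number and the return value says whether it is available. A minimal sketch of a wrapper follows, assuming a host where `<sys/ioctl.h>` provides `_IO`. The helper name `kvm_check_extension` is hypothetical, and the macros are re-declared from this listing only so the fragment compiles without `<linux/kvm.h>`.

```c
#include <sys/ioctl.h>

/* Re-declared from the listing so the sketch is self-contained. */
#ifndef KVMIO
#define KVMIO 0xAE
#endif
#ifndef KVM_CHECK_EXTENSION
#define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03)
#endif
#ifndef KVM_CAP_IRQCHIP
#define KVM_CAP_IRQCHIP 0
#endif

/*
 * Hypothetical helper: query one capability on an open /dev/kvm fd.
 * Returns the ioctl's value (0 = absent, >0 = present; capabilities such
 * as KVM_CAP_NR_VCPUS return a count rather than 1), and 0 if the ioctl
 * itself fails, so callers can treat any failure as "not available".
 */
int kvm_check_extension(int kvm_fd, long cap)
{
	int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

	return ret < 0 ? 0 : ret;
}
```

Folding errors into 0 is a design choice of this sketch; a caller that needs to distinguish "absent" from "ioctl failed" would inspect errno instead.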
|
2007-03-07 19:05:38 +08:00
|
|
|
/*
 * Get size for mmap(vcpu_fd)
 */
#define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* in bytes */
|
2008-02-12 00:37:23 +08:00
|
|
|
#define KVM_GET_SUPPORTED_CPUID _IOWR(KVMIO, 0x05, struct kvm_cpuid2)
|
2009-11-03 00:20:28 +08:00
|
|
|
#define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06
#define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07
#define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08
|
2013-09-22 22:44:50 +08:00
|
|
|
#define KVM_GET_EMULATED_CPUID _IOWR(KVMIO, 0x09, struct kvm_cpuid2)
|
2009-11-03 00:20:28 +08:00
|
|
|
|
2007-07-06 17:20:49 +08:00
|
|
|
/*
 * Extension capability list.
 */
#define KVM_CAP_IRQCHIP 0
|
2007-07-18 17:15:21 +08:00
|
|
|
#define KVM_CAP_HLT 1
|
2007-10-03 00:52:55 +08:00
|
|
|
#define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2
|
2007-10-10 01:20:39 +08:00
|
|
|
#define KVM_CAP_USER_MEMORY 3
|
2007-10-25 06:29:55 +08:00
|
|
|
#define KVM_CAP_SET_TSS_ADDR 4
|
2007-10-22 22:50:39 +08:00
|
|
|
#define KVM_CAP_VAPIC 6
|
2008-02-12 00:37:23 +08:00
|
|
|
#define KVM_CAP_EXT_CPUID 7
|
2008-02-16 03:52:47 +08:00
|
|
|
#define KVM_CAP_CLOCKSOURCE 8
|
2011-07-18 22:17:15 +08:00
|
|
|
#define KVM_CAP_NR_VCPUS 9 /* returns recommended max vcpus per vm */
|
2008-02-20 17:59:20 +08:00
|
|
|
#define KVM_CAP_NR_MEMSLOTS 10 /* returns max memory slots per vm */
|
2008-01-28 05:10:22 +08:00
|
|
|
#define KVM_CAP_PIT 11
|
2008-02-23 01:21:36 +08:00
|
|
|
#define KVM_CAP_NOP_IO_DELAY 12
|
2008-02-23 01:21:37 +08:00
|
|
|
#define KVM_CAP_PV_MMU 13
|
2008-04-12 00:24:45 +08:00
|
|
|
#define KVM_CAP_MP_STATE 14
|
2008-05-30 22:05:54 +08:00
|
|
|
#define KVM_CAP_COALESCED_MMIO 15
|
2008-07-29 16:30:57 +08:00
|
|
|
#define KVM_CAP_SYNC_MMU 16 /* Changes to host mmap are reflected in guest */
|
2008-09-14 08:48:28 +08:00
|
|
|
#define KVM_CAP_IOMMU 18
|
2008-12-09 00:25:27 +08:00
|
|
|
/* Bug in KVM_SET_USER_MEMORY_REGION fixed: */
#define KVM_CAP_DESTROY_MEMORY_REGION_WORKS 21
|
2008-12-11 23:54:54 +08:00
|
|
|
#define KVM_CAP_USER_NMI 22
|
2009-01-19 20:57:52 +08:00
|
|
|
#ifdef __KVM_HAVE_GUEST_DEBUG
|
2008-12-15 20:52:10 +08:00
|
|
|
#define KVM_CAP_SET_GUEST_DEBUG 23
|
2008-12-20 01:13:54 +08:00
|
|
|
#endif
|
2009-01-19 20:57:52 +08:00
|
|
|
#ifdef __KVM_HAVE_PIT
|
2008-12-31 01:55:06 +08:00
|
|
|
#define KVM_CAP_REINJECT_CONTROL 24
|
2008-11-19 19:58:46 +08:00
|
|
|
#endif
|
|
|
|
#define KVM_CAP_IRQ_ROUTING 25
|
2009-02-04 23:28:14 +08:00
|
|
|
#define KVM_CAP_IRQ_INJECT_STATUS 26
|
2009-03-12 21:45:39 +08:00
|
|
|
#define KVM_CAP_ASSIGN_DEV_IRQ 29
|
2009-04-13 17:59:32 +08:00
|
|
|
/* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
#define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
|
2009-05-11 16:48:15 +08:00
|
|
|
#ifdef __KVM_HAVE_MCE
#define KVM_CAP_MCE 31
#endif
|
2009-05-20 22:30:49 +08:00
|
|
|
#define KVM_CAP_IRQFD 32
|
2009-05-15 04:42:53 +08:00
|
|
|
#ifdef __KVM_HAVE_PIT
#define KVM_CAP_PIT2 33
#endif
|
2009-06-09 20:56:28 +08:00
|
|
|
#define KVM_CAP_SET_BOOT_CPU_ID 34
|
2009-07-07 23:50:38 +08:00
|
|
|
#ifdef __KVM_HAVE_PIT_STATE2
#define KVM_CAP_PIT_STATE2 35
#endif
|
KVM: add ioeventfd support
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest. Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.
Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.
However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc). For these
patterns, the synchronous call is particularly expensive since we really
only want to get our notification transmitted asynchronously and
return as quickly as possible. All the synchronous infrastructure to ensure
proper data-dependencies are met in the normal IO case is just unnecessary
overhead for signalling. This adds additional computational load on the
system, as well as latency to the signalling path.
Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd. This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.
To test this theory, we built a test-harness called "doorbell". This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled. It supports signalling
from either an eventfd, or an ioctl().
We then wired up two paths to the doorbell: one through QEMU, via a
registered io region and the doorbell ioctl(), and the other directly via
ioeventfd.
You can download this test harness here:
ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
The measured results are as follows:
qemu-mmio: 110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio: 367300 iops, 2.72us rtt
I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy. However, for now we
can extrapolate from the NULLIO runs, which showed +2.56us for MMIO and
-350ns for HC, to get:
qemu-pio: 153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt
These are just for fun, for now, until I can gather more data.
Here is a graph for your convenience:
http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
The conclusion to draw is that we save about 4us by skipping the userspace
hop.
--------------------
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-07-08 05:08:49 +08:00
|
|
|
#define KVM_CAP_IOEVENTFD 36
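The ioeventfd commit message above reports each measurement both as a rate (iops) and as a round-trip time (rtt), and concludes that skipping the userspace hop saves about 4us. The two columns are reciprocals of each other, which the following sketch makes explicit. The function names are hypothetical and the code only restates the arithmetic; it does not reproduce the doorbell test harness.

```c
/* Round-trip time in microseconds for a given operations-per-second rate. */
double rtt_us_from_iops(double iops)
{
	return 1e6 / iops;
}

/* Microseconds saved per signal when moving from a slow path to a fast one. */
double rtt_saving_us(double slow_iops, double fast_iops)
{
	return rtt_us_from_iops(slow_iops) - rtt_us_from_iops(fast_iops);
}
```

With the measured MMIO figures, `rtt_saving_us(110000, 200100)` comes out at roughly 4.09us, which matches the "about 4us" stated in the message.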
|
2009-07-21 10:42:48 +08:00
|
|
|
#define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
|
2009-10-16 06:21:43 +08:00
|
|
|
#ifdef __KVM_HAVE_XEN_HVM
#define KVM_CAP_XEN_HVM 38
#endif
|
2009-10-17 03:28:36 +08:00
|
|
|
#define KVM_CAP_ADJUST_CLOCK 39
|
2009-11-04 17:54:59 +08:00
|
|
|
#define KVM_CAP_INTERNAL_ERROR_DATA 40
|
2009-11-12 08:04:25 +08:00
|
|
|
#ifdef __KVM_HAVE_VCPU_EVENTS
#define KVM_CAP_VCPU_EVENTS 41
#endif
|
2009-11-19 21:21:16 +08:00
|
|
|
#define KVM_CAP_S390_PSW 42
|
2009-11-30 11:02:02 +08:00
|
|
|
#define KVM_CAP_PPC_SEGSTATE 43
|
2010-01-17 21:51:22 +08:00
|
|
|
#define KVM_CAP_HYPERV 44
|
2010-01-17 21:51:23 +08:00
|
|
|
#define KVM_CAP_HYPERV_VAPIC 45
|
2010-01-17 21:51:24 +08:00
|
|
|
#define KVM_CAP_HYPERV_SPIN 46
|
2010-01-29 14:38:44 +08:00
|
|
|
#define KVM_CAP_PCI_SEGMENT 47
|
2010-02-19 18:00:45 +08:00
|
|
|
#define KVM_CAP_PPC_PAIRED_SINGLES 48
|
2010-02-20 02:38:07 +08:00
|
|
|
#define KVM_CAP_INTR_SHADOW 49
|
2010-02-15 17:45:43 +08:00
|
|
|
#ifdef __KVM_HAVE_DEBUGREGS
#define KVM_CAP_DEBUGREGS 50
#endif
|
2010-02-24 00:47:57 +08:00
|
|
|
#define KVM_CAP_X86_ROBUST_SINGLESTEP 51
|
2010-03-25 04:48:30 +08:00
|
|
|
#define KVM_CAP_PPC_OSI 52
|
2010-03-25 04:48:18 +08:00
|
|
|
#define KVM_CAP_PPC_UNSET_IRQ 53
|
2010-03-25 04:48:29 +08:00
|
|
|
#define KVM_CAP_ENABLE_CAP 54
|
2010-06-13 17:29:39 +08:00
|
|
|
#ifdef __KVM_HAVE_XSAVE
#define KVM_CAP_XSAVE 55
#endif
#ifdef __KVM_HAVE_XCRS
#define KVM_CAP_XCRS 56
#endif
|
2010-07-29 20:48:08 +08:00
|
|
|
#define KVM_CAP_PPC_GET_PVINFO 57
|
2010-08-30 19:50:45 +08:00
|
|
|
#define KVM_CAP_PPC_IRQ_LEVEL 58
|
2010-10-14 17:22:50 +08:00
|
|
|
#define KVM_CAP_ASYNC_PF 59
|
2011-03-25 16:44:51 +08:00
|
|
|
#define KVM_CAP_TSC_CONTROL 60
#define KVM_CAP_GET_TSC_KHZ 61
|
2011-04-28 06:24:21 +08:00
|
|
|
#define KVM_CAP_PPC_BOOKE_SREGS 62
|
2011-06-29 08:22:41 +08:00
|
|
|
#define KVM_CAP_SPAPR_TCE 63
|
KVM: PPC: Allow book3s_hv guests to use SMT processor modes
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7. The host still has to run single-threaded.
This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability. The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.
To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode. KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline). To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it. Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.
When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host. This number is exported
to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host. We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked. This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.
When a vcore starts to run, it executes in the context of one of the
vcpu threads. The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).
It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running. In that case it can start to run immediately as long as
none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest. It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.
Note that there is no fixed relationship between the hardware thread
number and the vcpu number. Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 08:23:08 +08:00
|
|
|
#define KVM_CAP_PPC_SMT 64
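The commit message above states that the vcore number is the vcpu number divided by the number of threads per core, and that userspace wanting single-threaded guests should make every vcpu number a multiple of that count. A small sketch of both rules follows. The function names are hypothetical; `threads_per_core` stands for the value the message says is returned when querying KVM_CAP_PPC_SMT (currently 4).

```c
/* vcore a vcpu is assigned to: vcpu number divided by threads per core. */
unsigned int vcore_of_vcpu(unsigned int vcpu_id, unsigned int threads_per_core)
{
	return vcpu_id / threads_per_core;
}

/*
 * vcpu number to give the n-th guest CPU when the guest should run
 * single-threaded: multiples of threads_per_core, so that every vcpu
 * lands alone in its own vcore.
 */
unsigned int single_threaded_vcpu_id(unsigned int n, unsigned int threads_per_core)
{
	return n * threads_per_core;
}
```

For example, with 4 threads per core, vcpus 4 to 7 share vcore 1, while a single-threaded layout would number its vcpus 0, 4, 8, 12 and so on.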
|
KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests
This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility. These processors require a physically
contiguous, aligned area of memory for each guest. When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access. The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.
Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator. The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.
KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs. The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.
This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA. It
also returns the size of the RMA in the argument structure.
Having an RMA means we will get multiple KVM_SET_USER_MEMORY_REGION
ioctl calls from userspace. To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory. Subsequently we will get rid of this
array and use memory associated with each memslot instead.
This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region. Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB. However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.
Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest. This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-06-29 08:25:44 +08:00
|
|
|
#define KVM_CAP_PPC_RMA 65
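The RMA commit message above describes how a real-mode access is handled with the Real Mode Offset facility: the address is compared against a limit and, if lower, ORed with the offset from the RMOR. The following is a simplified software model of that description, not the hardware or kernel implementation; the function name and the boolean return convention are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the real-mode translation described in the message.
 * Returns false when the guest address is not below the limit (the access
 * falls outside the RMA); otherwise stores the resulting real address.
 */
bool rmo_translate(uint64_t guest_addr, uint64_t limit, uint64_t rmor,
		   uint64_t *real_addr)
{
	if (guest_addr >= limit)
		return false;

	/* OR acts like addition here because the RMA is size-aligned. */
	*real_addr = guest_addr | rmor;
	return true;
}
```

The OR only behaves like an add because the message requires the area to be physically contiguous and aligned, so the offset has no bits in common with any address below the limit.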
|
2011-07-18 22:17:15 +08:00
|
|
|
#define KVM_CAP_MAX_VCPUS 66 /* returns max vcpus per vm */
|
2011-09-15 03:45:23 +08:00
|
|
|
#define KVM_CAP_PPC_HIOR 67
|
2011-08-08 23:29:42 +08:00
|
|
|
#define KVM_CAP_PPC_PAPR 68
|
2011-08-19 04:25:21 +08:00
|
|
|
#define KVM_CAP_SW_TLB 69
|
2011-09-14 16:02:41 +08:00
|
|
|
#define KVM_CAP_ONE_REG 70
|
2011-09-20 23:07:29 +08:00
|
|
|
#define KVM_CAP_S390_GMAP 71
|
2011-12-21 19:28:29 +08:00
|
|
|
#define KVM_CAP_TSC_DEADLINE_TIMER 72
|
2012-01-04 17:25:29 +08:00
|
|
|
#define KVM_CAP_S390_UCONTROL 73
|
2012-01-11 18:20:30 +08:00
|
|
|
#define KVM_CAP_SYNC_REGS 74
|
2012-02-28 21:19:54 +08:00
|
|
|
#define KVM_CAP_PCI_2_3 75
|
2012-03-11 03:37:27 +08:00
|
|
|
#define KVM_CAP_KVMCLOCK_CTRL 76
|
2012-03-30 03:14:12 +08:00
|
|
|
#define KVM_CAP_SIGNAL_MSI 77
|
2012-04-27 03:43:42 +08:00
|
|
|
#define KVM_CAP_PPC_GET_SMMU_INFO 78
|
2012-05-15 20:15:25 +08:00
|
|
|
#define KVM_CAP_S390_COW 79
|
KVM: PPC: Book3S HV: Make the guest hash table size configurable
This adds a new ioctl to enable userspace to control the size of the guest
hashed page table (HPT) and to clear it out when resetting the guest.
The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
a pointer to a u32 containing the desired order of the HPT (log base 2
of the size in bytes), which is updated on successful return to the
actual order of the HPT which was allocated.
There must be no vcpus running at the time of this ioctl. To enforce
this, we now keep a count of the number of vcpus running in
kvm->arch.vcpus_running.
If the ioctl is called when a HPT has already been allocated, we don't
reallocate the HPT but just clear it out. We first clear the
kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
the kvm->lock mutex, it will prevent any vcpus from starting to run until
we're done, and (b) it means that the first vcpu to run after we're done
will re-establish the VRMA if necessary.
If userspace doesn't call this ioctl before running the first vcpu, the
kernel will allocate a default-sized HPT at that point. We do it then
rather than when creating the VM, as the code did previously, so that
userspace has a chance to do the ioctl if it wants.
When allocating the HPT, we can allocate either from the kernel page
allocator, or from the preallocated pool. If userspace is asking for
a different size from the preallocated HPTs, we first try to allocate
using the kernel page allocator. Then we try to allocate from the
preallocated pool, and then if that fails, we try allocating decreasing
sizes from the kernel page allocator, down to the minimum size allowed
(256kB). Note that the kernel page allocator limits allocations to
1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
16MB (on 64-bit powerpc, at least).
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix module compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-04 10:32:53 +08:00
|
|
|
#define KVM_CAP_PPC_ALLOC_HTAB 80
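The commit message above says KVM_PPC_ALLOCATE_HTAB takes the desired HPT order, defined as log base 2 of the size in bytes, with a minimum size of 256kB. The sketch below converts a byte size into such an order. The function name and the choice to round up and clamp to the minimum are assumptions of this example, not behaviour taken from the kernel.

```c
#include <stdint.h>

#define HPT_MIN_ORDER 18	/* 256kB, the minimum size named in the message */

/*
 * Smallest order whose size (1 << order bytes) covers 'bytes',
 * clamped to the minimum order.
 */
uint32_t hpt_order_for_size(uint64_t bytes)
{
	uint32_t order = HPT_MIN_ORDER;

	while (order < 63 && (UINT64_C(1) << order) < bytes)
		order++;
	return order;
}
```

A 16MB table, the default page allocator limit mentioned in the message, corresponds to order 24. Since the ioctl updates the value to the order actually allocated, a caller would re-read it after the call rather than rely on the request.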
|
2012-08-21 11:02:51 +08:00
|
|
|
#define KVM_CAP_READONLY_MEM 81
|
2012-09-22 01:58:03 +08:00
|
|
|
#define KVM_CAP_IRQFD_RESAMPLE 82
|
2012-08-09 04:38:19 +08:00
|
|
|
#define KVM_CAP_PPC_BOOKE_WATCHDOG 83
|
KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT. There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags. The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the "bolted" entries (those with the bolted bit, 0x10, set in
the first doubleword).
This is intended for use in implementing qemu's savevm/loadvm and for
live migration. Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs). When the first pass reaches the
end of the HPT, it returns from the read. Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.
The format of the data provides a simple run-length compression of the
invalid entries. Each block of data starts with a header that indicates
the index (position in the HPT, which is just an array), the number of
valid entries starting at that index (may be zero), and the number of
invalid entries following those valid entries. The valid entries, 16
bytes each, follow the header. The invalid entries are not explicitly
represented.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-20 06:57:20 +08:00
|
|
|
#define KVM_CAP_PPC_HTAB_FD 84
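The commit message above describes the stream format read from the HPT fd: each block starts with a header holding an index, a count of valid entries and a count of invalid entries, followed by the valid entries at 16 bytes each; invalid entries are not represented. A minimal sketch of walking such a buffer follows. The struct name `htab_header` and its field widths are assumptions for illustration (the message does not give them), as is the function name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HPTE_SIZE 16	/* bytes per valid entry, per the message */

/* Assumed header layout; field widths are not stated in the message. */
struct htab_header {
	uint32_t index;		/* position of the first entry in the HPT */
	uint16_t n_valid;	/* valid entries that follow this header */
	uint16_t n_invalid;	/* invalid entries after those, not stored */
};

/* Count valid HPTEs in a buffer; returns -1 if the data is truncated. */
long count_valid_hptes(const unsigned char *buf, size_t len)
{
	size_t pos = 0;
	long valid = 0;

	while (pos < len) {
		struct htab_header hdr;

		if (len - pos < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + pos, sizeof(hdr));
		pos += sizeof(hdr);

		/* Only the valid entries occupy space in the stream. */
		if (len - pos < (size_t)hdr.n_valid * HPTE_SIZE)
			return -1;
		pos += (size_t)hdr.n_valid * HPTE_SIZE;
		valid += hdr.n_valid;
	}
	return valid;
}
```

Copying the header out with memcpy avoids unaligned access when blocks of differing lengths follow each other in the buffer.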
|
2012-12-20 22:32:12 +08:00
|
|
|
#define KVM_CAP_S390_CSS_SUPPORT 85
|
2013-01-05 01:12:48 +08:00
|
|
|
#define KVM_CAP_PPC_EPR 86
|
2013-01-21 07:28:13 +08:00
|
|
|
#define KVM_CAP_ARM_PSCI 87
|
2013-01-24 02:18:04 +08:00
|
|
|
#define KVM_CAP_ARM_SET_DEVICE_ADDR 88
|
2013-04-12 22:08:42 +08:00
|
|
|
#define KVM_CAP_DEVICE_CTRL 89
|
2013-04-12 22:08:47 +08:00
|
|
|
#define KVM_CAP_IRQ_MPIC 90
|
2013-04-18 04:30:00 +08:00
|
|
|
#define KVM_CAP_PPC_RTAS 91
|
2013-04-27 08:28:37 +08:00
|
|
|
#define KVM_CAP_IRQ_XICS 92
|
2013-02-07 18:46:46 +08:00
|
|
|
#define KVM_CAP_ARM_EL1_32BIT 93
|
2013-08-01 12:44:24 +08:00
|
|
|
#define KVM_CAP_SPAPR_MULTITCE 94
|
2013-09-22 22:44:50 +08:00
|
|
|
#define KVM_CAP_EXT_EMUL_CPUID 95
|
2014-01-16 17:18:37 +08:00
|
|
|
#define KVM_CAP_HYPERV_TIME 96
|
2014-02-28 12:06:17 +08:00
|
|
|
#define KVM_CAP_IOAPIC_POLARITY_IGNORED 97
|
2013-10-24 00:26:34 +08:00
|
|
|
#define KVM_CAP_ENABLE_CAP_VM 98
|
2013-07-15 19:36:01 +08:00
|
|
|
#define KVM_CAP_S390_IRQCHIP 99
|
2014-04-01 02:50:38 +08:00
|
|
|
#define KVM_CAP_IOEVENTFD_NO_LENGTH 100
|
Lazy storage key handling
-------------------------
Linux does not use the ACC and F bits of the storage key. Newer Linux
versions also do not use the storage keys for dirty and reference
tracking. We can optimize the guest handling for those guests for faults
as well as page-in and page-out by simply not caring about the guest
visible storage key. We trap guest storage key instruction to enable
those keys only on demand.
Migration bitmap
----------------
Until now s390 never provided a proper dirty bitmap. Let's provide a
proper migration bitmap for s390. We also change the user dirty tracking
to a fault based mechanism. This makes the host completely independent
from the storage keys. Long term this will allow us to back guest memory
with large pages.
per-VM device attributes
------------------------
To avoid the introduction of new ioctls, let's provide the
attribute semantics also on the VM-"device".
Userspace controlled CMMA
-------------------------
The CMMA assist is changed from "always on" to "on if requested" via
per-VM device attributes. In addition a callback to reset all usage
states is provided.
Proper guest DAT handling for intercepts
----------------------------------------
While instructions handled by SIE take care of all addressing aspects,
KVM/s390 currently does not care about guest address translation of
intercepts. This worked out fine, because
- the s390 Linux kernel has a 1:1 mapping between kernel virtual<->real
for all pages up to memory size
- intercepts happen only for a small number of cases
- all of these intercepts happen to be in the kernel text for current
distros
Of course we need to be better for other intercepts, kernel modules etc.
We provide the infrastructure and rework all in-kernel intercepts to work
on logical addresses (paging etc) instead of real ones. The code has
been running internally for several months now, so it is time for going
public.
GDB support
-----------
We provide breakpoints, single stepping and watchpoints.
Fixes/Cleanups
--------------
- Improve program check delivery
- Factor out the handling of transactional memory on program checks
- Use the existing define __LC_PGM_TDB
- Several cleanups in the lowcore structure
- Documentation
NOTES
-----
- All patches touching base s390 are either ACKed or written by the s390
maintainers
- One base KVM patch "KVM: add kvm_is_error_gpa() helper"
- One patch introduces the notion of VM device attributes
Merge tag 'kvm-s390-20140422' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into queue
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Conflicts:
include/uapi/linux/kvm.h
2014-04-22 21:51:06 +08:00
|
|
|
#define KVM_CAP_VM_ATTRIBUTES 101
|
2014-04-29 13:54:14 +08:00
|
|
|
#define KVM_CAP_ARM_PSCI_0_2 102
|
2014-05-30 20:51:40 +08:00
|
|
|
#define KVM_CAP_PPC_FIXUP_HCALL 103
|
2014-06-02 09:02:59 +08:00
|
|
|
#define KVM_CAP_PPC_ENABLE_HCALL 104
|
2014-07-15 00:33:08 +08:00
|
|
|
#define KVM_CAP_CHECK_EXTENSION_VM 105
|
2014-10-09 20:10:13 +08:00
|
|
|
#define KVM_CAP_S390_USER_SIGP 106
|
2014-06-09 22:57:26 +08:00
|
|
|
#define KVM_CAP_S390_VECTOR_REGISTERS 107
|
2015-02-06 22:01:21 +08:00
|
|
|
#define KVM_CAP_S390_MEM_OP 108
|
2015-01-30 23:55:56 +08:00
|
|
|
#define KVM_CAP_S390_USER_STSI 109
|
2014-09-23 21:23:01 +08:00
|
|
|
#define KVM_CAP_S390_SKEYS 110
|
2014-12-09 07:07:56 +08:00
|
|
|
#define KVM_CAP_MIPS_FPU 111
|
2014-12-09 07:07:56 +08:00
|
|
|
#define KVM_CAP_MIPS_MSA 112
|
2014-11-12 03:57:06 +08:00
|
|
|
#define KVM_CAP_S390_INJECT_IRQ 113
|
2014-11-25 00:13:46 +08:00
|
|
|
#define KVM_CAP_S390_IRQ_STATE 114
|
2015-03-20 17:39:41 +08:00
|
|
|
#define KVM_CAP_PPC_HWRNG 115
|
2008-11-19 19:58:46 +08:00
|
|
|
|
|
|
|
#ifdef KVM_CAP_IRQ_ROUTING
|
|
|
|
|
|
|
|
struct kvm_irq_routing_irqchip {
	__u32 irqchip;
	__u32 pin;
};
|
|
|
|
|
2009-02-10 13:57:06 +08:00
|
|
|
struct kvm_irq_routing_msi {
	__u32 address_lo;
	__u32 address_hi;
	__u32 data;
	__u32 pad;
};
|
|
|
|
|
2013-07-15 19:36:01 +08:00
|
|
|
struct kvm_irq_routing_s390_adapter {
	__u64 ind_addr;
	__u64 summary_addr;
	__u64 ind_offset;
	__u32 summary_offset;
	__u32 adapter_id;
};
|
|
|
|
|
2008-11-19 19:58:46 +08:00
|
|
|
/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
|
2009-02-10 13:57:06 +08:00
|
|
|
#define KVM_IRQ_ROUTING_MSI 2
|
2013-07-15 19:36:01 +08:00
|
|
|
#define KVM_IRQ_ROUTING_S390_ADAPTER 3
|
2008-11-19 19:58:46 +08:00
|
|
|
|
|
|
|
struct kvm_irq_routing_entry {
|
|
|
|
__u32 gsi;
|
|
|
|
__u32 type;
|
|
|
|
__u32 flags;
|
|
|
|
__u32 pad;
|
|
|
|
union {
|
|
|
|
struct kvm_irq_routing_irqchip irqchip;
|
2009-02-10 13:57:06 +08:00
|
|
|
struct kvm_irq_routing_msi msi;
|
2013-07-15 19:36:01 +08:00
|
|
|
struct kvm_irq_routing_s390_adapter adapter;
|
2008-11-19 19:58:46 +08:00
|
|
|
__u32 pad[8];
|
|
|
|
} u;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct kvm_irq_routing {
	__u32 nr;
	__u32 flags;
	struct kvm_irq_routing_entry entries[0];
};
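struct kvm_irq_routing ends in a zero-length `entries` array, so a table of `nr` routes has to be allocated as one block: the header followed by `nr` entries. The sketch below shows that allocation and fills one MSI route. It uses local mirror structs with an `x_` prefix (hypothetical names, reduced to the MSI member and the padding) so that it compiles without `<linux/kvm.h>`; the helper functions are illustrative and not part of the KVM API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the listing's layouts (hypothetical x_ names). */
struct x_routing_msi {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	uint32_t pad;
};

struct x_routing_entry {
	uint32_t gsi;
	uint32_t type;
	uint32_t flags;
	uint32_t pad;
	union {
		struct x_routing_msi msi;
		uint32_t pad[8];	/* fixes the union at 32 bytes */
	} u;
};

struct x_routing {
	uint32_t nr;
	uint32_t flags;
	struct x_routing_entry entries[];
};

#define X_ROUTING_MSI 2	/* KVM_IRQ_ROUTING_MSI in the listing */

/* Allocate a zeroed table with room for nr entries after the header. */
struct x_routing *x_routing_alloc(uint32_t nr)
{
	struct x_routing *t;

	t = calloc(1, sizeof(*t) + (size_t)nr * sizeof(t->entries[0]));
	if (t)
		t->nr = nr;
	return t;
}

/* Fill entry idx as an MSI route for the given GSI. */
void x_routing_set_msi(struct x_routing *t, uint32_t idx, uint32_t gsi,
		       uint64_t addr, uint32_t data)
{
	struct x_routing_entry *e = &t->entries[idx];

	e->gsi = gsi;
	e->type = X_ROUTING_MSI;
	e->u.msi.address_lo = (uint32_t)addr;
	e->u.msi.address_hi = (uint32_t)(addr >> 32);
	e->u.msi.data = data;
}
```

Zeroing the allocation keeps the `pad` and `flags` fields at 0, which is the conservative choice for reserved fields in an ioctl argument.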
|
|
|
|
|
2008-12-31 01:55:06 +08:00
|
|
|
#endif
|
2007-07-06 17:20:49 +08:00
|
|
|
|
2009-05-11 16:48:15 +08:00
|
|
|
#ifdef KVM_CAP_MCE
/* x86 MCE */
struct kvm_x86_mce {
	__u64 status;
	__u64 addr;
	__u64 misc;
	__u64 mcg_status;
	__u8 bank;
	__u8 pad1[7];
	__u64 pad2[3];
};
#endif
|
|
|
|
|
2009-10-16 06:21:43 +08:00
|
|
|
#ifdef KVM_CAP_XEN_HVM
struct kvm_xen_hvm_config {
	__u32 flags;
	__u32 msr;
	__u64 blob_addr_32;
	__u64 blob_addr_64;
	__u8 blob_size_32;
	__u8 blob_size_64;
	__u8 pad2[30];
};
#endif
|
|
|
|
|
2009-05-20 22:30:49 +08:00
|
|
|
#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
|
2012-09-22 01:58:03 +08:00
|
|
|
/*
 * Available with KVM_CAP_IRQFD_RESAMPLE
 *
 * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
 * the irqfd to operate in resampling mode for level triggered interrupt
 * emulation. See Documentation/virtual/kvm/api.txt.
 */
#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
|
2009-05-20 22:30:49 +08:00
|
|
|
|
|
|
|
struct kvm_irqfd {
|
|
|
|
__u32 fd;
|
|
|
|
__u32 gsi;
|
|
|
|
__u32 flags;
|
2012-09-22 01:58:03 +08:00
|
|
|
__u32 resamplefd;
|
|
|
|
__u8 pad[16];
|
2009-05-20 22:30:49 +08:00
|
|
|
};
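The flag comments above say that `resamplefd` is only meaningful when KVM_IRQFD_FLAG_RESAMPLE is set. A minimal sketch of building the request follows, under the assumption that a caller passes a negative descriptor to mean "no resample fd". The mirror struct and the `X_`/`x_` names are hypothetical stand-ins for the listing's definitions, used so the fragment compiles on its own.

```c
#include <stdint.h>
#include <string.h>

#define X_IRQFD_FLAG_DEASSIGN (1 << 0)	/* KVM_IRQFD_FLAG_DEASSIGN */
#define X_IRQFD_FLAG_RESAMPLE (1 << 1)	/* KVM_IRQFD_FLAG_RESAMPLE */

/* Local mirror of struct kvm_irqfd from the listing. */
struct x_irqfd {
	uint32_t fd;
	uint32_t gsi;
	uint32_t flags;
	uint32_t resamplefd;
	uint8_t pad[16];
};

/*
 * Build an "assign" request. resamplefd < 0 means plain (edge-style)
 * operation; otherwise the RESAMPLE flag is set and the fd recorded.
 */
struct x_irqfd x_irqfd_assign(int eventfd, uint32_t gsi, int resamplefd)
{
	struct x_irqfd req;

	memset(&req, 0, sizeof(req));	/* keeps pad and unused fields zero */
	req.fd = (uint32_t)eventfd;
	req.gsi = gsi;
	if (resamplefd >= 0) {
		req.flags |= X_IRQFD_FLAG_RESAMPLE;
		req.resamplefd = (uint32_t)resamplefd;
	}
	return req;
}
```

Setting the flag and the field together in one place makes it hard to submit a request where the two disagree.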
|
|
|
|
|
2009-10-17 03:28:36 +08:00
|
|
|
struct kvm_clock_data {
	__u64 clock;
	__u32 flags;
	__u32 pad[9];
};
|
|
|
|
|
2011-08-19 04:25:21 +08:00
|
|
|
#define KVM_MMU_FSL_BOOKE_NOHV 0
#define KVM_MMU_FSL_BOOKE_HV 1
|
|
|
|
|
|
|
|
struct kvm_config_tlb {
	__u64 params;
	__u64 array;
	__u32 mmu_type;
	__u32 array_len;
};
|
|
|
|
|
|
|
|
struct kvm_dirty_tlb {
	__u64 bitmap;
	__u32 num_dirty;
};
|
|
|
|
|
2011-09-14 16:02:41 +08:00
|
|
|
/* Available with KVM_CAP_ONE_REG */

#define KVM_REG_ARCH_MASK 0xff00000000000000ULL
#define KVM_REG_GENERIC 0x0000000000000000ULL
|
|
|
|
|
|
|
|
/*
 * Architecture specific registers are to be defined in arch headers and
 * ORed with the arch identifier.
 */
#define KVM_REG_PPC 0x1000000000000000ULL
#define KVM_REG_X86 0x2000000000000000ULL
#define KVM_REG_IA64 0x3000000000000000ULL
#define KVM_REG_ARM 0x4000000000000000ULL
#define KVM_REG_S390 0x5000000000000000ULL
|
2012-12-11 00:15:34 +08:00
|
|
|
#define KVM_REG_ARM64 0x6000000000000000ULL
|
2013-06-11 03:33:47 +08:00
|
|
|
#define KVM_REG_MIPS 0x7000000000000000ULL
|
2011-09-14 16:02:41 +08:00
|
|
|
|
|
|
|
#define KVM_REG_SIZE_SHIFT 52
|
|
|
|
#define KVM_REG_SIZE_MASK 0x00f0000000000000ULL
|
|
|
|
#define KVM_REG_SIZE_U8 0x0000000000000000ULL
|
|
|
|
#define KVM_REG_SIZE_U16 0x0010000000000000ULL
|
|
|
|
#define KVM_REG_SIZE_U32 0x0020000000000000ULL
|
|
|
|
#define KVM_REG_SIZE_U64 0x0030000000000000ULL
|
|
|
|
#define KVM_REG_SIZE_U128 0x0040000000000000ULL
|
|
|
|
#define KVM_REG_SIZE_U256 0x0050000000000000ULL
|
|
|
|
#define KVM_REG_SIZE_U512 0x0060000000000000ULL
|
|
|
|
#define KVM_REG_SIZE_U1024 0x0070000000000000ULL
|
|
|
|
|
2013-01-21 07:28:06 +08:00
|
|
|
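/*
 * Illustrative sketch, not part of the ABI: the size constants above read
 * as a log2 byte count stored in bits 52..55 of a register id, so the
 * register width can be recovered with a mask and a shift. The ex_ names
 * are hypothetical local mirrors so the example builds on its own.
 */

```c
#include <stdint.h>

/* Local mirrors of the header constants. */
#define EX_KVM_REG_ARCH_MASK  0xff00000000000000ULL
#define EX_KVM_REG_SIZE_SHIFT 52
#define EX_KVM_REG_SIZE_MASK  0x00f0000000000000ULL

/* Architecture identifier part of a register id (e.g. KVM_REG_ARM64). */
static inline uint64_t ex_reg_arch(uint64_t id)
{
	return id & EX_KVM_REG_ARCH_MASK;
}

/* Register width in bytes: 1 << (size field). U8 -> 1, U64 -> 8, ... */
static inline unsigned int ex_reg_size_bytes(uint64_t id)
{
	return 1u << ((id & EX_KVM_REG_SIZE_MASK) >> EX_KVM_REG_SIZE_SHIFT);
}
```

/*
 * A caller of KVM_GET_ONE_REG could use ex_reg_size_bytes(id) to size the
 * buffer that kvm_one_reg.addr points to.
 */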
struct kvm_reg_list {
	__u64 n; /* number of regs */
	__u64 reg[0];
};

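/*
 * Illustrative sketch, not part of the ABI: struct kvm_reg_list ends in a
 * variable-length array, so userspace has to allocate the header plus n
 * trailing ids in one block. The ex_ names are hypothetical, and the
 * mirror uses a C99 flexible array member in place of reg[0].
 */

```c
#include <stdint.h>
#include <stdlib.h>

struct ex_kvm_reg_list {
	uint64_t n;      /* number of regs */
	uint64_t reg[];  /* stands in for reg[0] in the header */
};

/* Bytes needed for a list that can hold n register ids. */
static inline size_t ex_reg_list_bytes(uint64_t n)
{
	return sizeof(struct ex_kvm_reg_list) + (size_t)n * sizeof(uint64_t);
}

/* Allocate a zeroed list with room for n ids; caller frees with free(). */
static inline struct ex_kvm_reg_list *ex_reg_list_alloc(uint64_t n)
{
	struct ex_kvm_reg_list *list = calloc(1, ex_reg_list_bytes(n));

	if (list)
		list->n = n;
	return list;
}
```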
struct kvm_one_reg {
	__u64 id;
	__u64 addr;
};

struct kvm_msi {
	__u32 address_lo;
	__u32 address_hi;
	__u32 data;
	__u32 flags;
	__u8 pad[16];
};

struct kvm_arm_device_addr {
	__u64 id;
	__u64 addr;
};

/*
 * Device control API, available with KVM_CAP_DEVICE_CTRL
 */
#define KVM_CREATE_DEVICE_TEST 1

struct kvm_create_device {
	__u32 type; /* in: KVM_DEV_TYPE_xxx */
	__u32 fd; /* out: device handle */
	__u32 flags; /* in: KVM_CREATE_DEVICE_xxx */
};

struct kvm_device_attr {
	__u32 flags; /* no flags currently defined */
	__u32 group; /* device-defined */
	__u64 attr; /* group-defined */
	__u64 addr; /* userspace address of attr data */
};

#define KVM_DEV_VFIO_GROUP 1
#define KVM_DEV_VFIO_GROUP_ADD 1
#define KVM_DEV_VFIO_GROUP_DEL 2

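/*
 * Illustrative sketch, not part of the ABI: filling in a struct
 * kvm_device_attr for the VFIO group attributes defined above. The ex_
 * names are hypothetical local mirrors. That addr points at an int32_t
 * holding the VFIO group fd is this example's reading of the KVM VFIO
 * device documentation and should be treated as an assumption.
 */

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of the header definitions. */
#define EX_KVM_DEV_VFIO_GROUP     1
#define EX_KVM_DEV_VFIO_GROUP_ADD 1
#define EX_KVM_DEV_VFIO_GROUP_DEL 2

struct ex_kvm_device_attr {
	uint32_t flags; /* no flags currently defined */
	uint32_t group; /* device-defined */
	uint64_t attr;  /* group-defined */
	uint64_t addr;  /* userspace address of attr data */
};

/*
 * Hypothetical helper: build the attribute for adding or removing a VFIO
 * group. `op` is EX_KVM_DEV_VFIO_GROUP_ADD or EX_KVM_DEV_VFIO_GROUP_DEL.
 */
static inline struct ex_kvm_device_attr
ex_vfio_group_attr(uint64_t op, const int32_t *group_fd)
{
	struct ex_kvm_device_attr a;

	memset(&a, 0, sizeof(a));
	a.group = EX_KVM_DEV_VFIO_GROUP;
	a.attr = op;
	a.addr = (uint64_t)(uintptr_t)group_fd;
	return a;
}
```

/*
 * The attribute would then be handed to KVM_SET_DEVICE_ATTR on the fd
 * returned by KVM_CREATE_DEVICE.
 */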
enum kvm_device_type {
	KVM_DEV_TYPE_FSL_MPIC_20 = 1,
#define KVM_DEV_TYPE_FSL_MPIC_20 KVM_DEV_TYPE_FSL_MPIC_20
	KVM_DEV_TYPE_FSL_MPIC_42,
#define KVM_DEV_TYPE_FSL_MPIC_42 KVM_DEV_TYPE_FSL_MPIC_42
	KVM_DEV_TYPE_XICS,
#define KVM_DEV_TYPE_XICS KVM_DEV_TYPE_XICS
	KVM_DEV_TYPE_VFIO,
#define KVM_DEV_TYPE_VFIO KVM_DEV_TYPE_VFIO
	KVM_DEV_TYPE_ARM_VGIC_V2,
#define KVM_DEV_TYPE_ARM_VGIC_V2 KVM_DEV_TYPE_ARM_VGIC_V2
	KVM_DEV_TYPE_FLIC,
#define KVM_DEV_TYPE_FLIC KVM_DEV_TYPE_FLIC
	KVM_DEV_TYPE_ARM_VGIC_V3,
#define KVM_DEV_TYPE_ARM_VGIC_V3 KVM_DEV_TYPE_ARM_VGIC_V3
	KVM_DEV_TYPE_MAX,
};

/*
 * ioctls for VM fds
 */
#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region)
/*
 * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
 * a vcpu fd.
 */
#define KVM_CREATE_VCPU _IO(KVMIO, 0x41)
#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log)
/* KVM_SET_MEMORY_ALIAS is obsolete: */
#define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias)
#define KVM_SET_NR_MMU_PAGES _IO(KVMIO, 0x44)
#define KVM_GET_NR_MMU_PAGES _IO(KVMIO, 0x45)
#define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \
	struct kvm_userspace_memory_region)
#define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47)
#define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)

/* enable ucontrol for s390 */
struct kvm_s390_ucas_mapping {
	__u64 user_addr;
	__u64 vcpu_addr;
	__u64 length;
};
#define KVM_S390_UCAS_MAP _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
#define KVM_S390_UCAS_UNMAP _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
#define KVM_S390_VCPU_FAULT _IOW(KVMIO, 0x52, unsigned long)

/* Device model IOC */
#define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60)
#define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level)
#define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip)
#define KVM_SET_IRQCHIP _IOR(KVMIO, 0x63, struct kvm_irqchip)
#define KVM_CREATE_PIT _IO(KVMIO, 0x64)
#define KVM_GET_PIT _IOWR(KVMIO, 0x65, struct kvm_pit_state)
#define KVM_SET_PIT _IOR(KVMIO, 0x66, struct kvm_pit_state)
#define KVM_IRQ_LINE_STATUS _IOWR(KVMIO, 0x67, struct kvm_irq_level)
#define KVM_REGISTER_COALESCED_MMIO \
	_IOW(KVMIO, 0x67, struct kvm_coalesced_mmio_zone)
#define KVM_UNREGISTER_COALESCED_MMIO \
	_IOW(KVMIO, 0x68, struct kvm_coalesced_mmio_zone)
#define KVM_ASSIGN_PCI_DEVICE _IOR(KVMIO, 0x69, \
	struct kvm_assigned_pci_dev)
#define KVM_SET_GSI_ROUTING _IOW(KVMIO, 0x6a, struct kvm_irq_routing)
/* deprecated, replaced by KVM_ASSIGN_DEV_IRQ */
#define KVM_ASSIGN_IRQ __KVM_DEPRECATED_VM_R_0x70
#define KVM_ASSIGN_DEV_IRQ _IOW(KVMIO, 0x70, struct kvm_assigned_irq)
#define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71)
#define KVM_DEASSIGN_PCI_DEVICE _IOW(KVMIO, 0x72, \
	struct kvm_assigned_pci_dev)
#define KVM_ASSIGN_SET_MSIX_NR _IOW(KVMIO, 0x73, \
	struct kvm_assigned_msix_nr)
#define KVM_ASSIGN_SET_MSIX_ENTRY _IOW(KVMIO, 0x74, \
	struct kvm_assigned_msix_entry)
#define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
#define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd)
#define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
#define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
#define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
#define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config)
#define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
#define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
/* Available with KVM_CAP_PIT_STATE2 */
#define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2)
#define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
/* Available with KVM_CAP_PPC_GET_PVINFO */
#define KVM_PPC_GET_PVINFO _IOW(KVMIO, 0xa1, struct kvm_ppc_pvinfo)
/* Available with KVM_CAP_TSC_CONTROL */
#define KVM_SET_TSC_KHZ _IO(KVMIO, 0xa2)
#define KVM_GET_TSC_KHZ _IO(KVMIO, 0xa3)
/* Available with KVM_CAP_PCI_2_3 */
#define KVM_ASSIGN_SET_INTX_MASK _IOW(KVMIO, 0xa4, \
	struct kvm_assigned_pci_dev)
/* Available with KVM_CAP_SIGNAL_MSI */
#define KVM_SIGNAL_MSI _IOW(KVMIO, 0xa5, struct kvm_msi)
/* Available with KVM_CAP_PPC_GET_SMMU_INFO */
#define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa6, struct kvm_ppc_smmu_info)
/* Available with KVM_CAP_PPC_ALLOC_HTAB */
#define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32)
#define KVM_CREATE_SPAPR_TCE _IOW(KVMIO, 0xa8, struct kvm_create_spapr_tce)
/* Available with KVM_CAP_RMA */
#define KVM_ALLOCATE_RMA _IOR(KVMIO, 0xa9, struct kvm_allocate_rma)
/* Available with KVM_CAP_PPC_HTAB_FD */
#define KVM_PPC_GET_HTAB_FD _IOW(KVMIO, 0xaa, struct kvm_get_htab_fd)
/* Available with KVM_CAP_ARM_SET_DEVICE_ADDR */
#define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
/* Available with KVM_CAP_PPC_RTAS */
#define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)

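/*
 * Illustrative sketch, not part of the ABI: how two VM ioctls above can
 * both use command number 0x67 (KVM_IRQ_LINE_STATUS and
 * KVM_REGISTER_COALESCED_MMIO) without colliding. The encoding below
 * assumes the asm-generic ioctl layout (nr in bits 0-7, type in bits
 * 8-15, size in bits 16-29, direction in bits 30-31); some architectures
 * use a different layout. KVMIO is 0xAE, as defined earlier in this
 * header. All ex_ names are hypothetical.
 */

```c
#include <stdint.h>

#define EX_KVMIO     0xAEu
#define EX_IOC_NONE  0u
#define EX_IOC_WRITE 1u /* userspace writes, kernel reads (_IOW) */
#define EX_IOC_READ  2u /* kernel writes, userspace reads (_IOR) */

/* Compose an ioctl request number the way asm-generic _IOC() does. */
static inline uint32_t ex_ioc(uint32_t dir, uint32_t type, uint32_t nr,
			      uint32_t size)
{
	return (dir << 30) | (size << 16) | (type << 8) | nr;
}
```

/*
 * With this layout an _IOWR request and an _IOW request that share a
 * command number still differ in the direction bits, and usually in the
 * size field as well.
 */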
/* ioctl for vm fd */
#define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)

/* ioctls for fds returned by KVM_CREATE_DEVICE */
#define KVM_SET_DEVICE_ATTR _IOW(KVMIO, 0xe1, struct kvm_device_attr)
#define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr)

/*
 * ioctls for vcpu fds
 */
#define KVM_RUN _IO(KVMIO, 0x80)
#define KVM_GET_REGS _IOR(KVMIO, 0x81, struct kvm_regs)
#define KVM_SET_REGS _IOW(KVMIO, 0x82, struct kvm_regs)
#define KVM_GET_SREGS _IOR(KVMIO, 0x83, struct kvm_sregs)
#define KVM_SET_SREGS _IOW(KVMIO, 0x84, struct kvm_sregs)
#define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation)
#define KVM_INTERRUPT _IOW(KVMIO, 0x86, struct kvm_interrupt)
/* KVM_DEBUG_GUEST is no longer supported, use KVM_SET_GUEST_DEBUG instead */
#define KVM_DEBUG_GUEST __KVM_DEPRECATED_VCPU_W_0x87
#define KVM_GET_MSRS _IOWR(KVMIO, 0x88, struct kvm_msrs)
#define KVM_SET_MSRS _IOW(KVMIO, 0x89, struct kvm_msrs)
#define KVM_SET_CPUID _IOW(KVMIO, 0x8a, struct kvm_cpuid)
#define KVM_SET_SIGNAL_MASK _IOW(KVMIO, 0x8b, struct kvm_signal_mask)
#define KVM_GET_FPU _IOR(KVMIO, 0x8c, struct kvm_fpu)
#define KVM_SET_FPU _IOW(KVMIO, 0x8d, struct kvm_fpu)
#define KVM_GET_LAPIC _IOR(KVMIO, 0x8e, struct kvm_lapic_state)
#define KVM_SET_LAPIC _IOW(KVMIO, 0x8f, struct kvm_lapic_state)
#define KVM_SET_CPUID2 _IOW(KVMIO, 0x90, struct kvm_cpuid2)
#define KVM_GET_CPUID2 _IOWR(KVMIO, 0x91, struct kvm_cpuid2)
/* Available with KVM_CAP_VAPIC */
#define KVM_TPR_ACCESS_REPORTING _IOWR(KVMIO, 0x92, struct kvm_tpr_access_ctl)
/* Available with KVM_CAP_VAPIC */
#define KVM_SET_VAPIC_ADDR _IOW(KVMIO, 0x93, struct kvm_vapic_addr)
/* valid for virtual machine (for floating interrupt)_and_ vcpu */
#define KVM_S390_INTERRUPT _IOW(KVMIO, 0x94, struct kvm_s390_interrupt)
/* store status for s390 */
#define KVM_S390_STORE_STATUS_NOADDR (-1ul)
#define KVM_S390_STORE_STATUS_PREFIXED (-2ul)
#define KVM_S390_STORE_STATUS _IOW(KVMIO, 0x95, unsigned long)
/* initial ipl psw for s390 */
#define KVM_S390_SET_INITIAL_PSW _IOW(KVMIO, 0x96, struct kvm_s390_psw)
/* initial reset for s390 */
#define KVM_S390_INITIAL_RESET _IO(KVMIO, 0x97)
#define KVM_GET_MP_STATE _IOR(KVMIO, 0x98, struct kvm_mp_state)
#define KVM_SET_MP_STATE _IOW(KVMIO, 0x99, struct kvm_mp_state)
/* Available with KVM_CAP_USER_NMI */
#define KVM_NMI _IO(KVMIO, 0x9a)
/* Available with KVM_CAP_SET_GUEST_DEBUG */
#define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug)
/* MCE for x86 */
#define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64)
#define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64)
#define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce)
/* Available with KVM_CAP_VCPU_EVENTS */
#define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events)
#define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events)
/* Available with KVM_CAP_DEBUGREGS */
#define KVM_GET_DEBUGREGS _IOR(KVMIO, 0xa1, struct kvm_debugregs)
#define KVM_SET_DEBUGREGS _IOW(KVMIO, 0xa2, struct kvm_debugregs)
/*
 * vcpu version available with KVM_CAP_ENABLE_CAP
 * vm version available with KVM_CAP_ENABLE_CAP_VM
 */
#define KVM_ENABLE_CAP _IOW(KVMIO, 0xa3, struct kvm_enable_cap)
/* Available with KVM_CAP_XSAVE */
#define KVM_GET_XSAVE _IOR(KVMIO, 0xa4, struct kvm_xsave)
#define KVM_SET_XSAVE _IOW(KVMIO, 0xa5, struct kvm_xsave)
/* Available with KVM_CAP_XCRS */
#define KVM_GET_XCRS _IOR(KVMIO, 0xa6, struct kvm_xcrs)
#define KVM_SET_XCRS _IOW(KVMIO, 0xa7, struct kvm_xcrs)
/* Available with KVM_CAP_SW_TLB */
#define KVM_DIRTY_TLB _IOW(KVMIO, 0xaa, struct kvm_dirty_tlb)
/* Available with KVM_CAP_ONE_REG */
#define KVM_GET_ONE_REG _IOW(KVMIO, 0xab, struct kvm_one_reg)
#define KVM_SET_ONE_REG _IOW(KVMIO, 0xac, struct kvm_one_reg)
/* VM is being stopped by host */
#define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
#define KVM_ARM_VCPU_INIT _IOW(KVMIO, 0xae, struct kvm_vcpu_init)
#define KVM_ARM_PREFERRED_TARGET _IOR(KVMIO, 0xaf, struct kvm_vcpu_init)
#define KVM_GET_REG_LIST _IOWR(KVMIO, 0xb0, struct kvm_reg_list)
/* Available with KVM_CAP_S390_MEM_OP */
#define KVM_S390_MEM_OP _IOW(KVMIO, 0xb1, struct kvm_s390_mem_op)
/* Available with KVM_CAP_S390_SKEYS */
#define KVM_S390_GET_SKEYS _IOW(KVMIO, 0xb2, struct kvm_s390_skeys)
#define KVM_S390_SET_SKEYS _IOW(KVMIO, 0xb3, struct kvm_s390_skeys)
/* Available with KVM_CAP_S390_INJECT_IRQ */
#define KVM_S390_IRQ _IOW(KVMIO, 0xb4, struct kvm_s390_irq)
/* Available with KVM_CAP_S390_IRQ_STATE */
#define KVM_S390_SET_IRQ_STATE _IOW(KVMIO, 0xb5, struct kvm_s390_irq_state)
#define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)

#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)

struct kvm_assigned_pci_dev {
	__u32 assigned_dev_id;
	__u32 busnr;
	__u32 devfn;
	__u32 flags;
	__u32 segnr;
	union {
		__u32 reserved[11];
	};
};

#define KVM_DEV_IRQ_HOST_INTX (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX (1 << 2)

#define KVM_DEV_IRQ_GUEST_INTX (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX (1 << 10)

#define KVM_DEV_IRQ_HOST_MASK 0x00ff
#define KVM_DEV_IRQ_GUEST_MASK 0xff00

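/*
 * Illustrative sketch, not part of the ABI: the two masks above split a
 * kvm_assigned_irq flags word into a host-side part (low byte) and a
 * guest-side part (next byte). The ex_ names are hypothetical local
 * mirrors so the example builds on its own.
 */

```c
#include <stdint.h>

/* Local mirrors of the header constants. */
#define EX_KVM_DEV_IRQ_HOST_MSI   (1 << 1)
#define EX_KVM_DEV_IRQ_GUEST_MSI  (1 << 9)
#define EX_KVM_DEV_IRQ_HOST_MASK  0x00ff
#define EX_KVM_DEV_IRQ_GUEST_MASK 0xff00

/* Host interrupt type bits of a flags word. */
static inline uint32_t ex_irq_host_part(uint32_t flags)
{
	return flags & EX_KVM_DEV_IRQ_HOST_MASK;
}

/* Guest interrupt type bits of a flags word. */
static inline uint32_t ex_irq_guest_part(uint32_t flags)
{
	return flags & EX_KVM_DEV_IRQ_GUEST_MASK;
}
```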
struct kvm_assigned_irq {
	__u32 assigned_dev_id;
	__u32 host_irq; /* ignored (legacy field) */
	__u32 guest_irq;
	__u32 flags;
	union {
		__u32 reserved[12];
	};
};

struct kvm_assigned_msix_nr {
	__u32 assigned_dev_id;
	__u16 entry_nr;
	__u16 padding;
};

#define KVM_MAX_MSIX_PER_DEV 256
struct kvm_assigned_msix_entry {
	__u32 assigned_dev_id;
	__u32 gsi;
	__u16 entry; /* The index of entry in the MSI-X table */
	__u16 padding[3];
};

#endif /* __LINUX_KVM_H */