OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Rusty Russell	5dea1c88ed	lguest: use a special 1:1 linear pagetable mode until first switch. The Host used to create some page tables for the Guest to use at the top of Guest memory; it would then tell the Guest where this was. In particular, it created linear mappings for 0 and 0xC0000000 addresses because lguest used to switch to its real page tables quite late in boot. However, since `d50d8fe19` Linux initialized boot page tables in head_32.S even before the "are we lguest?" boot jump. So, now we can simplify things: the Host pagetable code assumes 1:1 linear mapping until it first calls the LHCALL_NEW_PGTABLE hypercall, which we now do before we reach C code. This also means that the Host doesn't need to know anything about the Guest's PAGE_OFFSET. (Non-Linux guests might not even have such a thing). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-07-22 14:39:48 +09:30
Rob Landley	6151658751	Correct occurrences of - Documentation/kvm/ to Documentation/virtual/kvm - Documentation/uml/ to Documentation/virtual/uml - Documentation/lguest/ to Documentation/virtual/lguest throughout the kernel source tree. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>	2011-05-06 09:27:55 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Rusty Russell	ced05dd741	lguest: compile fixes arch/x86/lguest/boot.c: In function ‘lguest_init_IRQ’: arch/x86/lguest/boot.c:824: error: macro "__this_cpu_write" requires 2 arguments, but only 1 given arch/x86/lguest/boot.c:824: error: ‘__this_cpu_write’ undeclared (first use in this function) arch/x86/lguest/boot.c:824: error: (Each undeclared identifier is reported only once arch/x86/lguest/boot.c:824: error: for each function it appears in.) drivers/lguest/x86/core.c: In function ‘copy_in_guest_info’: drivers/lguest/x86/core.c:94: error: lvalue required as left operand of assignment Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-01-20 21:37:29 +10:30
Christoph Lameter	c9f2954964	lguest: Use this_cpu_ops Use this_cpu_ops in a couple of places in lguest. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-01-20 21:37:29 +10:30
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Rusty Russell	091ebf07a2	lguest: stop using KVM hypercall mechanism This is a partial revert of `4cd8b5e2a1` "lguest: use KVM hypercalls"; we revert to using (just as questionable but more reliable) int $15 for hypercalls. I didn't revert the register mapping, so we still use the same calling convention as kvm. KVM in more recent incarnations stopped injecting a fault when a guest tried to use the VMCALL instruction from ring 1, so lguest under kvm fails to make hypercalls. It was nice to share code with our KVM cousins, but this was overreach. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Matias Zabaljauregui <zabaljauregui@gmail.com> Cc: Avi Kivity <avi@redhat.com>	2010-04-14 21:43:56 +09:30
Rusty Russell	5094aeafbb	lguest: workaround cmpxchg8b_emu by ignoring cli in the guest. It's only used by cmpxchg8b_emu (see `db677ffa5f` for the gory details), and fixing that to be paravirt aware would be more work than simply ignoring it (and AFAICT only help lguest). This makes lguest work on machines which have cmpxchg8b, for kernels compiled for older processors. (We can't emulate it properly: the popf which expects to restore interrupts does not trap). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: virtualization@lists.osdl.org	2010-04-14 21:43:54 +09:30
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Rusty Russell	3e27249c84	lguest: fix bug in setting guest GDT entry We kill the guest, but then we blatt random stuff. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-04 12:33:33 -08:00
Linus Torvalds	d0316554d3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c	2009-12-14 09:58:24 -08:00
Tejun Heo	390dfd95c5	percpu: make misc percpu symbols unique This patch updates misc percpu related symbols such that percpu symbols are unique and don't clash with local symbols. This serves two purposes of decreasing the possibility of global percpu symbol collision and allowing dropping per_cpu__ prefix from percpu symbols. * drivers/crypto/padlock-aes.c: s/last_cword/paes_last_cword/ * drivers/lguest/x86/core.c: s/last_cpu/lg_last_cpu/ * drivers/s390/net/netiucv.c: rename the variable used in a macro to avoid clashing with percpu symbol * arch/mn10300/kernel/kprobes.c: replace current_ prefix with cur_ for static variables. Please note that percpu symbol current_kprobe can't be changed as it's used by generic code. Partly based on Rusty Russell's "alloc_percpu: rename percpu vars which cause name clashes" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com	2009-10-29 22:34:14 +09:00
Alexey Dobriyan	d43c36dc6b	headers: remove sched.h from interrupt.h After m68k's task_thread_info() doesn't refer to current, it's possible to remove sched.h from interrupt.h and not break m68k! Many thanks to Heiko Carstens for allowing this. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-10-11 11:20:58 -07:00
Alexey Dobriyan	828c09509b	const: constify remaining file_operations [akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-01 16:11:11 -07:00
Linus Torvalds	1f0918d03f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: lguest: don't force VIRTIO_F_NOTIFY_ON_EMPTY lguest: cleanup for map_switcher() lguest: use PGDIR_SHIFT for PAE code to allow different PAGE_OFFSET lguest: use set_pte/set_pmd uniformly for real page table entries lguest: move panic notifier registration to its expected place. virtio_blk: add support for cache flush virtio: add virtio IDs file virtio: get rid of redundant VIRTIO_ID_9P definition virtio: make add_buf return capacity remaining virtio_pci: minor MSI-X cleanups	2009-09-23 09:23:45 -07:00
Xiao Guangrong	6c189d8312	lguest: cleanup for map_switcher() We can use alloc_page() instead of get_zeroed_page() and virt_to_page() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-09-23 22:26:47 +09:30
Rusty Russell	fb100d78c0	lguest: use PGDIR_SHIFT for PAE code to allow different PAGE_OFFSET We still assume the Guest and Host have the same PAGE_OFFSET settings, but now we don't assume 0xC0000000. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Matias Zabaljauregui <zabaljauregui@gmail.com>	2009-09-23 22:26:46 +09:30
Rusty Russell	4c1ea3dd71	lguest: use set_pte/set_pmd uniformly for real page table entries If we're building a pte, we can use simple assigment; only use set_pte etc. when we're actually going to use that destination as a PTE. I don't know that we'll ever run under Xen, but it's neater. And use set_pte/set_pmd rather than assuming native_ versions, even though that's probably true for most people. (Includes compile fix by Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Matias Zabaljauregui <zabaljauregui@gmail.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>	2009-09-23 22:26:46 +09:30
Anand Gadiyar	fd589a8f0a	trivial: fix typo "to to" in multiple files Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-09-21 15:14:55 +02:00
Rusty Russell	1842f23c05	lguest and virtio: cleanup struct definitions to Linux style. I've been doing this for years, and akpm picked me up on it about 12 months ago. lguest partly serves as example code, so let's do it Right. Also, remove two unused fields in struct vblk_info in the example launcher. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com>	2009-07-30 16:03:46 +09:30
Rusty Russell	a91d74a3c4	lguest: update commentry Every so often, after code shuffles, I need to go through and unbitrot the Lguest Journey (see drivers/lguest/README). Since we now use RCU in a simple form in one place I took the opportunity to expand that explanation. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>	2009-07-30 16:03:46 +09:30
Rusty Russell	2e04ef7691	lguest: fix comment style I don't really notice it (except to begrudge the extra vertical space), but Ingo does. And he pointed out that one excuse of lguest is as a teaching tool, it should set a good example. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com>	2009-07-30 16:03:45 +09:30
Dan Carpenter	f294526279	lguest: dereferencing freed mem in add_eventfd() "new" was freed and then dereferenced. Also the return value wasn't being used so I modified the caller as well. Compile tested only. Found by smatch (http://repo.or.cz/w/smatch.git). regards, dan carpenter Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-07-30 16:03:43 +09:30
Davide Libenzi	27de22d03d	lguest: remove unnecessary forward struct declaration While fixing lg.h to drop the fwd declaration, I noticed there's another one ;) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-07-17 21:47:44 +09:30
Davide Libenzi	133890103b	eventfd: revised interface and cleanups Change the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. This patch is required by KVM's IRQfd code, which is still under development. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Gregory Haskins <ghaskins@novell.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:55:58 -07:00
Linus Torvalds	7f3591cfac	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits) lguest: add support for indirect ring entries lguest: suppress notifications in example Launcher lguest: try to batch interrupts on network receive lguest: avoid sending interrupts to Guest when no activity occurs. lguest: implement deferred interrupts in example Launcher lguest: remove obsolete LHREQ_BREAK call lguest: have example Launcher service all devices in separate threads lguest: use eventfds for device notification eventfd: export eventfd_signal and eventfd_fget for lguest lguest: allow any process to send interrupts lguest: PAE fixes lguest: PAE support lguest: Add support for kvm_hypercall4() lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated lguest: map switcher with executable page table entries lguest: fix writev returning short on console output lguest: clean up length-used value in example launcher lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. lguest: beyond ARRAY_SIZE of cpu->arch.gdt ...	2009-06-12 09:32:26 -07:00
Rusty Russell	5dac051bc6	lguest: remove obsolete LHREQ_BREAK call We no longer need an efficient mechanism to force the Guest back into host userspace, as each device is serviced without bothering the main Guest process (aka. the Launcher). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:11 +09:30
Rusty Russell	df60aeef4f	lguest: use eventfds for device notification Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with an address: the main Launcher process returns with this address, and figures out what device to run. A far nicer model is to let processes bind an eventfd to an address: if we find one, we simply signal the eventfd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Davide Libenzi <davidel@xmailserver.org>	2009-06-12 22:27:10 +09:30
Rusty Russell	9f155a9b3d	lguest: allow any process to send interrupts We currently only allow the Launcher process to send interrupts, but it as we already send interrupts from the hrtimer, it's a simple matter of extracting that code into a common set_interrupt routine. As we switch to a thread per virtqueue, this avoids a bottleneck through the main Launcher process. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:09 +09:30
Rusty Russell	92b4d8df84	lguest: PAE fixes 1) j wasn't initialized in setup_pagetables, so they weren't set up for me causing immediate guest crashes. 2) gpte_addr should not re-read the pmd from the Guest. Especially not BUG_ON() based on the value. If we ever supported SMP guests, they could trigger that. And the Launcher could also trigger it (tho currently root-only). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:08 +09:30
Matias Zabaljauregui	acdd0b6292	lguest: PAE support This version requires that host and guest have the same PAE status. NX cap is not offered to the guest, yet. Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:08 +09:30
Matias Zabaljauregui	ebe0ba84f5	lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name (That's really what it is, and the confusion gets worse with PAE support) Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>	2009-06-12 22:27:07 +09:30
Matias Zabaljauregui	90603d15fa	lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated Some cleanups and replace direct assignment with native_set_* macros which properly handle 64-bit entries when PAE is activated Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:06 +09:30
Matias Zabaljauregui	ed1dc77810	lguest: map switcher with executable page table entries Map switcher with executable page table entries. (This bug didn't matter before PAE and hence NX support -- RR) Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:06 +09:30
Matias Zabaljauregui	f086122bb6	lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. If GDT_ENTRIES were every > 256, this could become a problem. Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:04 +09:30
Roel Kluin	81b79b01d0	lguest: beyond ARRAY_SIZE of cpu->arch.gdt Do not go beyond ARRAY_SIZE of cpu->arch.gdt Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:04 +09:30
Rusty Russell	a32a8813d0	lguest: improve interrupt handling, speed up stream networking lguest never checked for pending interrupts when enabling interrupts, and things still worked. However, it makes a significant difference to TCP performance, so it's time we fixed it by introducing a pending_irq flag and checking it on irq_restore and irq_enable. These two routines are now too big to patch into the 8/10 bytes patch space, so we drop that code. Note: The high latency on interrupt delivery had a very curious effect: once everything else was optimized, networking without GSO was faster than networking with GSO, since more interrupts were sent and hence a greater chance of one getting through to the Guest! Note2: (Almost) Closing the same loophole for iret doesn't have any measurable effect, so I'm leaving that patch for the moment. Before: 1GB tcpblast Guest->Host: 30.7 seconds 1GB tcpblast Guest->Host (no GSO): 76.0 seconds After: 1GB tcpblast Guest->Host: 6.8 seconds 1GB tcpblast Guest->Host (no GSO): 27.8 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:03 +09:30
Rusty Russell	abd41f037e	lguest: fix race in halt code When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting that a timer or the Waker will wake_up_process() us. But we do it in a stupid way, leaving a classic missing wakeup race. So split maybe_do_interrupt() into interrupt_pending() and try_deliver_interrupt(), and check maybe_do_interrupt() and the "break_out" flag before calling schedule. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:02 +09:30
Rusty Russell	a6c372de6e	lguest: fix lguest wake on guest clock tick, or fd activity The Launcher could be inside the Guest on another CPU; wake_up_process will do nothing because it is "running". kick_process will knock it back into our kernel in this case, otherwise we'll miss it until the next guest exit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:01 +09:30
Michael S. Tsirkin	d2a7ddda9f	virtio: find_vqs/del_vqs virtio operations This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations, and updates all drivers. This is needed for MSI support, because MSI needs to know the total number of vectors upfront. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)	2009-06-12 22:16:36 +09:30
Rusty Russell	9499f5e7ed	virtio: add names to virtqueue struct, mapping from devices to queues. Add a linked list of all virtqueues for a virtio device: this helps for debugging and is also needed for upcoming interface change. Also, add a "name" field for clearer debug messages. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:16:36 +09:30
Rusty Russell	564346224d	lguest: fix on Intel when KVM loaded (unhandled trap 13) When KVM is loaded, and hence VT set up, the vmcall instruction in an lguest guest causes a #GP, not #UD. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-05-26 12:13:11 -07:00
Rusty Russell	a489f0b555	lguest: fix guest crash on non-linear addresses in gdt pvops Fixes guest crash 'lguest: bad read address 0x4800000 len 256' The new per-cpu allocator ends up handing a non-linear address to write_gdt_entry. We do __pa() on it, and hand it to the host, which kills us. I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT code, but had no pressing reason until now. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: lguest@ozlabs.org	2009-04-19 23:14:01 +09:30
Matias Zabaljauregui	88df781afb	lguest: fix crash on vmlinux images Typical message: 'lguest: unhandled trap 6 at 0x418726 (0x0)' vmlinux guests were broken by `4cd8b5e2a1` 'lguest: use KVM hypercalls', which rewrites guest text from kvm hypercalls to trap 31. The Launcher mmaps the kernel image. The Guest executes and immediately faults in the first text page (read-only). Then it hits a hypercall, and we rewrite that hypercall, causing a copy-on-write. But the Guest pagetables still refer to the old page: we fault again, but as Host we see the hypercall already rewritten, and pass the fault back to the Guest. The Guest hasn't set up an IDT yet, so we kill it. This doesn't happen with bzImages: they unpack themselves and so the text pages are already read-write. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Tested-by: Patrick McHardy <kaber@trash.net>	2009-04-19 23:14:00 +09:30
Matias Zabaljauregui	df1693abc4	lguest: use bool instead of int Impact: clean up Rusty told me, some time ago, that he had become a fan of "bool". So, here are some replacements. Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-30 21:55:25 +10:30
Matias Zabaljauregui	4cd8b5e2a1	lguest: use KVM hypercalls Impact: cleanup This patch allow us to use KVM hypercalls Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-30 21:55:24 +10:30
Rusty Russell	6afbdd059c	lguest: fix spurious BUG_ON() on invalid guest stack. Impact: fix crash on misbehaving guest gpte_addr() contains a BUG_ON(), insisting that the present flag is set. We need to return before we call it if that isn't the case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org	2009-03-30 21:55:23 +10:30
Ingo Molnar	6e15cf0486	Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2 Conflicts: arch/parisc/kernel/irq.c arch/x86/include/asm/fixmap_64.h arch/x86/include/asm/setup.h kernel/irq/handle.c Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-27 17:28:43 +01:00
Rusty Russell	6db6a5f3ae	lguest: fix for CONFIG_SPARSE_IRQ=y Impact: remove lots of lguest boot WARN_ON() when CONFIG_SPARSE_IRQ=y We now need to call irq_to_desc_alloc_cpu() before set_irq_chip_and_handler_name(), but we can't do that from init_IRQ (no kmalloc available). So do it as we use interrupts instead. Also means we only alloc for irqs we use, which was the intent of CONFIG_SPARSE_IRQ anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com>	2009-03-09 10:06:29 +10:30
Ingo Molnar	965c7ecaf2	x86: remove the Voyager 32-bit subarch Impact: remove unused/broken code The Voyager subarch last built successfully on the v2.6.26 kernel and has been stale since then and does not build on the v2.6.27, v2.6.28 and v2.6.29-rc5 kernels. No actual users beyond the maintainer reported this breakage. Patches were sent and most of the fixes were accepted but the discussion around how to do a few remaining issues cleanly fizzled out with no resolution and the code remained broken. In the v2.6.30 x86 tree development cycle 32-bit subarch support has been reworked and removed - and the Voyager code, beyond the build problems already known, needs serious and significant changes and probably a rewrite to support it. CONFIG_X86_VOYAGER has been marked BROKEN then. The maintainer has been notified but no patches have been sent so far to fix it. While all other subarchs have been converted to the new scheme, voyager is still broken. We'd prefer to receive patches which clean up the current situation in a constructive way, but even in case of removal there is no obstacle to add that support back after the issues have been sorted out in a mutually acceptable fashion. So remove this inactive code for now. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-23 00:54:01 +01:00
Mark Wallis	05dfdbbd67	lguest: Fix a memory leak with the lg object during launcher close Fix a memory leak identified by Rusty Russell during LCA09 by kfree'ing the lg object instead of just clearing it when the launcher closes. Signed-off-by: Mark Wallis <mwallis@serialmonkey.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-30 11:34:11 +10:30
Atsushi SAKAI	72410af921	lguest: typos fix 3 points lguest_asm.S => i386_head.S LHCALL_BREAK => LHREQ_BREAK perferred => preferred Signed-off-by: Atsushi SAKAI <sakaia@jp.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-01-30 11:34:10 +10:30
Mark McLoughlin	ff8561c4ad	lguest: do not statically allocate root device We shouldn't be statically allocating the root device object, so dynamically allocate it using root_device_register() instead. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-01-06 10:44:34 -08:00
Linus Torvalds	b840d79631	Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually	2009-01-02 11:44:09 -08:00
Mark McLoughlin	bda53cd510	lguest: struct device - replace bus_id with dev_name() bus_id is gradually being removed, so use dev_name() instead. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-12-30 09:26:12 +10:30
Matias Zabaljauregui	58a2456644	lguest: move the initial guest page table creation code to the host This patch moves the initial guest page table creation code to the host, so the launcher keeps working with PAE enabled configs. Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-12-30 09:26:11 +10:30
Rusty Russell	87c7d57c17	virtio: hand virtio ring alignment as argument to vring_new_virtqueue This allows each virtio user to hand in the alignment appropriate to their virtio_ring structures. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>	2008-12-30 09:26:03 +10:30
Rusty Russell	2966af73e7	virtio: use LGUEST_VRING_ALIGN instead of relying on pagesize This doesn't really matter, since lguest is i386 only at the moment, but we could actually choose a different value. (lguest doesn't have a guarenteed ABI). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-12-30 09:26:02 +10:30
Yinghai Lu	b77b881f21	x86: fix lguest used_vectors breakage, -v2 Impact: fix lguest, clean up 32-bit lguest used used_vectors to record vectors, but that model of allocating vectors changed and got broken, after we changed vector allocation to a per_cpu array. Try enable that for 64bit, and the array is used for all vectors that are not managed by vector_irq per_cpu array. Also kill system_vectors[], that is now a duplication of the used_vectors bitmap. [ merged in cpus4096 due to io_apic.c cpumask changes. ] [ -v2, fix build failure ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-23 22:37:28 +01:00
Rusty Russell	1dc3e3bcbf	lguest: update commentry Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-08-26 00:19:28 +10:00
Rusty Russell	71a3f4edc1	lguest: use get_user_pages_fast() instead of get_user_pages() Using a simple page table thrashing program I measure a slight improvement. The program creates five processes. Each touches 1000 pages then schedules the next process. We repeat this 1000 times. As lguest only caches 4 cr3 values, this rebuilds a lot of shadow page tables requiring virt->phys mappings. Before: 5.93 seconds After: 5.40 seconds (Counts of slow vs fastpath in this usage are 6092 and 2852462 respectively.) And more importantly for lguest, the code is simpler. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-08-12 17:52:53 +10:00
Andrew Morton	cf485e566b	lguest: use cpu capability accessors To support my little make-x86-bitops-use-proper-typechecking projectlet. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-29 09:58:34 +10:00
Johannes Weiner	0a707210aa	lguest: fix switcher_page leak on unload map_switcher allocates the array, unmap_switcher has to free it accordingly. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-29 09:58:32 +10:00
Rusty Russell	0c12091d82	lguest: Guest int3 fix Ron Minnich noticed that guest userspace gets a GPF when it tries to int3: we need to copy the privilege level from the guest-supplied IDT to the real IDT. int3 is the only common case where guest userspace expects to invoke an interrupt, so that's the symptom of failing to do this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-29 09:58:31 +10:00
Rusty Russell	e34f872567	virtio: Add transport feature handling stub for virtio_ring. To prepare for virtio_ring transport feature bits, hook in a call in all the users to manipulate them. This currently just clears all the bits, since it doesn't understand any features. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:14 +10:00
Rusty Russell	c624896e48	virtio: Rename set_features to finalize_features Rather than explicitly handing the features to the lower-level, we just hand the virtio_device and have it set the features. This make it clear that it has the chance to manipulate the features of the device at this point (and that all feature negotiation is already done). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:12 +10:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Ingo Molnar	15e551d25e	x86, VisWS: turn into generic arch, eliminate Kconfig specials remove leftover traces of various VISWS related Kconfig specials. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-10 18:55:47 +02:00
Jens Axboe	15c8b6c1aa	on_each_cpu(): kill unused 'retry' parameter It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:24:38 +02:00
Ingo Molnar	d02859ecb3	Merge commit 'v2.6.26-rc8' into x86/xen Conflicts: arch/x86/xen/enlighten.c arch/x86/xen/mmu.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-25 12:16:51 +02:00
Suresh Siddha	54481cf88b	x86: fix NULL pointer deref in __switch_to I am able to reproduce the oops reported by Simon in __switch_to() with lguest. My debug showed that there is at least one lguest specific issue (which should be present in 2.6.25 and before aswell) and it got exposed with a kernel oops with the recent fpu dynamic allocation patches. In addition to the previous possible scenario (with fpu_counter), in the presence of lguest, it is possible that the cpu's TS bit it still set and the lguest launcher task's thread_info has TS_USEDFPU still set. This is because of the way the lguest launcher handling the guest's TS bit. (look at lguest_set_ts() in lguest_arch_run_guest()). This can result in a DNA fault while doing unlazy_fpu() in __switch_to(). This will end up causing a DNA fault in the context of new process thats getting context switched in (as opossed to handling DNA fault in the context of lguest launcher/helper process). This is wrong in both pre and post 2.6.25 kernels. In the recent 2.6.26-rc series, this is showing up as NULL pointer dereferences or sleeping function called from atomic context(__switch_to()), as we free and dynamically allocate the FPU context for the newly created threads. Older kernels might show some FPU corruption for processes running inside of lguest. With the appended patch, my test system is running for more than 50 mins now. So atleast some of your oops (hopefully all!) should get fixed. Please give it a try. I will spend more time with this fix tomorrow. Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk> Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-20 13:26:18 +02:00
Ingo Molnar	688d22e23a	Merge branch 'linus' into x86/xen	2008-06-16 11:21:27 +02:00
Rusty Russell	b769f57908	virtio: set device index in common code. Anthony Liguori points out that three different transports use the virtio code, but each one keeps its own counter to set the virtio_device's index field. In theory (though not in current practice) this means that names could be duplicated, and that risk grows as more transports are created. So we move the selection of the unique virtio_device.index into the common code in virtio.c, which has the side-benefit of removing duplicate code. The only complexity is that lguest and S/390 use the index to uniquely identify the device in case of catastrophic failure before register_virtio_device() is called: now we use the offset within the descriptor page as a unique identifier for the printks. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Anthony Liguori <anthony@codemonkey.ws>	2008-05-30 15:09:42 +10:00
Rusty Russell	e27810f113	lguest: use ioremap_cache, not ioremap Thanks to Jon Corbet & LWN. Only took me a day to join the dots. Host->Guest netcat before (with unnecessily large receive buffers): 1073741824 bytes (1.1 GB) copied, 24.7528 seconds, 43.4 MB/s After: 1073741824 bytes (1.1 GB) copied, 17.6369 seconds, 60.9 MB/s Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-30 15:09:41 +10:00
Jeremy Fitzhardinge	a15af1c9ea	x86/paravirt: add pte_flags to just get pte flags Add pte_flags() to extract the flags from a pte. This is a special case of pte_val() which is only guaranteed to return the pte's flags correctly; the page number may be corrupted or missing. The intent is to allow paravirt implementations to return pte flags without having to do any translation of the page number (most notably, Xen). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-27 10:11:36 +02:00
Rusty Russell	a007a751d9	lguest: make Launcher see device status updates This brings us closer to Real Life, where we'd examine the device features once it's set the DRIVER_OK status bit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:54 +10:00
Rusty Russell	9f3f746741	lguest: remove bogus NULL cpu check If lg isn't NULL, and cpu_id is sane, &lg->cpus[cpu_id] can't be NULL. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:52 +10:00
Rusty Russell	24adf12722	lguest: avoid using NR_CPUS as a bounds check. NR_CPUS (being a host number) is an arbitrary limit for the Guest. Using the array size directly (which currently happes to be NR_CPUS) is more futureproof. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:51 +10:00
Rusty Russell	c45a6816c1	virtio: explicit advertisement of driver features A recent proposed feature addition to the virtio block driver revealed some flaws in the API: in particular, we assume that feature negotiation is complete once a driver's probe function returns. There is nothing in the API to require this, however, and even I didn't notice when it was violated. So instead, we require the driver to specify what features it supports in a table, we can then move the feature negotiation into the virtio core. The intersection of device and driver features are presented in a new 'features' bitmap in the struct virtio_device. Note that this highlights the difference between Linux unsigned-long bitmaps where each unsigned long is in native endian, and a straight-forward little-endian array of bytes. Drivers can still remove feature bits in their probe routine if they really have to. API changes: - dev->config->feature() no longer gets and acks a feature. - drivers should advertise their features in the 'feature_table' field - use virtio_has_feature() for extra sanity when checking feature bits Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-05-02 21:50:50 +10:00
Matthew Wilcox	d3135846f6	drivers: Remove unnecessary inclusions of asm/semaphore.h None of these files use any of the functionality promised by asm/semaphore.h. It's possible that they rely on it dragging in some unrelated header file, but I can't build all these files, so we'll have fix any build failures as they come up. Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2008-04-18 22:16:32 -04:00
Al Viro	74dbf719ed	misc __user misannotations (pointless casts to long) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-30 14:20:23 -07:00
Rusty Russell	a6bd8e1303	lguest: comment documentation update. Took some cycles to re-read the Lguest Journey end-to-end, fix some rot and tighten some phrases. Only comments change. No new jokes, but a couple of recycled old jokes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-28 11:05:54 +11:00
Tim Ansell	b488f22d70	lguest: Add puppies which where previously missing. lguest doesn't have features, it has puppies! Signed-off-by: Timothy R Ansell <mithro@mithis.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-28 11:05:52 +11:00
Rusty Russell	4357bd9453	lguest: Revert `1ce70c4fac`, fix real problem. Ahmed managed to crash the Host in release_pgd(), which cannot be a Guest bug, and indeed it wasn't. The bug was that handing a 0 as the address of the toplevel page table being manipulated can cause the lookup code in find_pgdir() to return an uninitialized cache entry (we shadow up to 4 top level page tables for each Guest). Commit `37cc8d7f96` introduced this behaviour in the Guest, uncovering the bug. The patch which he submitted (which removed the /4 from the index calculation) simply ensured that these high-indexed entries hit the early exit path of guest_set_pmd(). But you get lots of segfaults in guest userspace as the PMDs aren't being updated. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-11 09:35:58 +11:00
Rusty Russell	f14ae652ba	lguest: fix __get_vm_area usage. Robert Bragg's `5dc3318528` tightened (ie. fixed) the checking in __get_vm_area, and it broke lguest. lguest should pass the exact "end" it wants, not some random constant (it was possible previously that it would actually get an address different from SWITCHER_ADDR). Also, Fabio Checconi pointed out that we should make sure we're not hitting the fixmap area. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Robert Bragg <robert@sixbynine.org>	2008-03-11 09:35:56 +11:00
Eugene Teo	f73d1e6ca6	lguest: make sure cpu is initialized before accessing it If req is LHREQ_INITIALIZE, and the guest has been initialized before (unlikely), it will attempt to access cpu->tsk even though cpu is not yet initialized. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-11 09:35:56 +11:00
Ahmed S. Darwish	31f4b46ec6	lguest: accept guest _PAGE_PWT page table entries Beginning from commit `4138cc3418`, ioremap_nocache() sets the _PAGE_PWT flag. Lguest doesn't accept a guest pte with a _PWT flag and reports a "bad page table entry" in that case. Accept guest _PAGE_PWT page table entries. Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-02-09 23:24:09 +01:00
Alexey Dobriyan	25478445c4	Fix container_of() usage Using "attr" twice is not OK, because it effectively prohibits such container_of() on variables not named "attr". Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Rusty Russell	6e5aa7efb2	virtio: reset function A reset function solves three problems: 1) It allows us to renegotiate features, eg. if we want to upgrade a guest driver without rebooting the guest. 2) It gives us a clean way of shutting down virtqueues: after a reset, we know that the buffers won't be used by the host, and 3) It helps the guest recover from messed-up drivers. So we remove the ->shutdown hook, and the only way we now remove feature bits is via reset. We leave it to the driver to do the reset before it deletes queues: the balloon driver, for example, needs to chat to the host in its remove function. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:50:03 +11:00
Rusty Russell	18445c4d50	virtio: explicit enable_cb/disable_cb rather than callback return. It seems that virtio_net wants to disable callbacks (interrupts) before calling netif_rx_schedule(), so we can't use the return value to do so. Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback now returns void, rather than a boolean. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:49:58 +11:00
Rusty Russell	a586d4f601	virtio: simplify config mechanism. Previously we used a type/len pair within the config space, but this seems overkill. We now simply define a structure which represents the layout in the config space: the config space can now only be extended at the end. The main driver-visible changes: 1) We indicate what fields are present with an explicit feature bit. 2) Virtqueues are explicitly numbered, and not in the config space. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-02-04 23:49:57 +11:00
Rusty Russell	e95035c61a	lguest: fix mis-merge against hpa's TSS renaming drivers/lguest/x86/core.c: In function ‘copy_in_guest_info’: drivers/lguest/x86/core.c:97: error: ‘struct x86_hw_tss’ has no member named ‘esp1’ Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-31 19:59:44 +11:00
Linus Torvalds	d145c7253c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits) lguest: use __PAGE_KERNEL instead of _PAGE_KERNEL lguest: Use explicit includes rateher than indirect lguest: get rid of lg variable assignments lguest: change gpte_addr header lguest: move changed bitmap to lg_cpu lguest: move last_pages to lg_cpu lguest: change last_guest to last_cpu lguest: change spte_addr header lguest: per-vcpu lguest pgdir management lguest: make pending notifications per-vcpu lguest: makes special fields be per-vcpu lguest: per-vcpu lguest task management lguest: replace lguest_arch with lg_cpu_arch. lguest: make registers per-vcpu lguest: make emulate_insn receive a vcpu struct. lguest: map_switcher_in_guest() per-vcpu lguest: per-vcpu interrupt processing. lguest: per-vcpu lguest timers lguest: make hypercalls use the vcpu struct lguest: make write() operation smp aware ... Manual conflict resolved (maybe even correctly, who knows) in drivers/lguest/x86/core.c	2008-01-31 09:35:32 +11:00
H. Peter Anvin	faca62273b	x86: use generic register name in the thread and tss structures This changes size-specific register names (eip/rip, esp/rsp, etc.) to generic names in the thread and tss structures. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:02 +01:00
Glauber de Oliveira Costa	84f12e39c8	lguest: use __PAGE_KERNEL instead of _PAGE_KERNEL x86_64 don't expose the intermediate representation with one underline, _PAGE_KERNEL, just the double-underlined one. Use it, to get a common ground between 32 and 64-bit Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:19 +11:00
Glauber de Oliveira Costa	ca94f2bdd1	lguest: Use explicit includes rateher than indirect explicitly use ktime.h include explicitly use hrtimer.h include explicitly use sched.h include This patch adds headers explicitly to lguest sources file, to avoid depending on them being included somewhere else. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:19 +11:00
Glauber de Oliveira Costa	382ac6b3fb	lguest: get rid of lg variable assignments We can save some lines of code by getting rid of *lg = cpu... lines of code spread everywhere by now. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:18 +11:00
Glauber de Oliveira Costa	934faab464	lguest: change gpte_addr header gpte_addr() does not depend on any guest information. So we wipe out the lg parameter from it completely. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:18 +11:00
Glauber de Oliveira Costa	ae3749dcd8	lguest: move changed bitmap to lg_cpu events represented in the 'changed' bitmap are per-cpu, not per-guest. move it to the lg_cpu structure Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:17 +11:00
Glauber de Oliveira Costa	f34f8c5fea	lguest: move last_pages to lg_cpu in our new model, pages are assigned to a virtual cpu, not to a guest. We move it to the lg_cpu structure. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:16 +11:00

1 2 3 4 5

234 Commits