OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Rusty Russell	f87e0434a3	lguest, x86/entry/32: Fix handling of guest syscalls using interrupt gates In `a798f09111` ("x86/entry/32: Change INT80 to be an interrupt gate") Andy broke lguest. This is because lguest had special code to allow the 0x80 trap gate go straight into the guest itself; interrupts gates (without more work, as mentioned in the file's comments) bounce via the hypervisor. His change made them go via the hypervisor, but as it's in the range of normal hardware interrupts, they were not directed through to the guest at all. Turns out the guest userspace isn't very effective if syscalls are all noops. I haven't ripped out all the now-useless trap-direct-to-guest-kernel code yet, since it will still be needed if someone decides to update this optimization. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Weisbecker <fweisbec@gmail.com> Cc: x86\@kernel.org Link: http://lkml.kernel.org/r/87fuv685kl.fsf@rustcorp.com.au Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-01 08:58:13 +02:00
Ingo Molnar	f435e68fe7	x86/asm/entry: Fix remaining use of SYSCALL_VECTOR Commit: `51bb92843e` ("x86/asm/entry: Remove SYSCALL_VECTOR") Converted most uses of SYSCALL_VECTOR to IA32_SYSCALL_VECTOR, but forgot about lguest. Cc: Brian Gerst <brgerst@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431185813-15413-4-git-send-email-brgerst@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-11 07:17:04 +02:00
Rusty Russell	3eebd233fc	lguest: handle traps on the "interrupt suppressed" iret instruction. Lguest's "iret" is non-atomic, as it needs to restore the interrupt state before the real iret (the guest can't actually suppress interrupts). For this reason, the host discards an interrupt if it occurs in this (1-instruction) window. We can do better, by emulating the iret execution, then immediately setting up the interrupt handler. In fact, we don't need to do much, as emulating the iret and setting up th stack for the interrupt handler basically cancel each other out. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-04-01 14:37:15 +10:30
Rusty Russell	2f921b5bb0	lguest: suppress interrupts for single insn, not range. The last patch reduced our interrupt-suppression region to one address, so simplify the code somewhat. Also, remove the obsolete undefined instruction ranges and the comment which refers to lguest_guest.S instead of head_32.S. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-03-24 11:52:08 +10:30
Rusty Russell	98fb4e5e6b	lguest: fix guest kernel stack overflow when TF bit set. The symptoms are that running gdb on a binary causes the guest to overflow the kernels stack (after some period of time), resulting in it finally being killed with a "Bad address" message. Reported-by: Sakari Ailus <sakari.ailus@iki.fi> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-09-06 08:09:27 +09:30
Rusty Russell	9f54288def	lguest: update comments Also removes a long-unused #define and an extraneous semicolon. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-07-22 14:39:50 +09:30
Rusty Russell	6d7a5d1ea3	lguest: don't rewrite vmcall instructions Now we no longer use vmcall, we don't need to rewrite it in the Guest. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-07-22 14:39:49 +09:30
Alexey Dobriyan	d43c36dc6b	headers: remove sched.h from interrupt.h After m68k's task_thread_info() doesn't refer to current, it's possible to remove sched.h from interrupt.h and not break m68k! Many thanks to Heiko Carstens for allowing this. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-10-11 11:20:58 -07:00
Rusty Russell	2e04ef7691	lguest: fix comment style I don't really notice it (except to begrudge the extra vertical space), but Ingo does. And he pointed out that one excuse of lguest is as a teaching tool, it should set a good example. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com>	2009-07-30 16:03:45 +09:30
Rusty Russell	9f155a9b3d	lguest: allow any process to send interrupts We currently only allow the Launcher process to send interrupts, but it as we already send interrupts from the hrtimer, it's a simple matter of extracting that code into a common set_interrupt routine. As we switch to a thread per virtqueue, this avoids a bottleneck through the main Launcher process. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:09 +09:30
Rusty Russell	a32a8813d0	lguest: improve interrupt handling, speed up stream networking lguest never checked for pending interrupts when enabling interrupts, and things still worked. However, it makes a significant difference to TCP performance, so it's time we fixed it by introducing a pending_irq flag and checking it on irq_restore and irq_enable. These two routines are now too big to patch into the 8/10 bytes patch space, so we drop that code. Note: The high latency on interrupt delivery had a very curious effect: once everything else was optimized, networking without GSO was faster than networking with GSO, since more interrupts were sent and hence a greater chance of one getting through to the Guest! Note2: (Almost) Closing the same loophole for iret doesn't have any measurable effect, so I'm leaving that patch for the moment. Before: 1GB tcpblast Guest->Host: 30.7 seconds 1GB tcpblast Guest->Host (no GSO): 76.0 seconds After: 1GB tcpblast Guest->Host: 6.8 seconds 1GB tcpblast Guest->Host (no GSO): 27.8 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:03 +09:30
Rusty Russell	abd41f037e	lguest: fix race in halt code When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting that a timer or the Waker will wake_up_process() us. But we do it in a stupid way, leaving a classic missing wakeup race. So split maybe_do_interrupt() into interrupt_pending() and try_deliver_interrupt(), and check maybe_do_interrupt() and the "break_out" flag before calling schedule. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:02 +09:30
Rusty Russell	a6c372de6e	lguest: fix lguest wake on guest clock tick, or fd activity The Launcher could be inside the Guest on another CPU; wake_up_process will do nothing because it is "running". kick_process will knock it back into our kernel in this case, otherwise we'll miss it until the next guest exit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:01 +09:30
Matias Zabaljauregui	df1693abc4	lguest: use bool instead of int Impact: clean up Rusty told me, some time ago, that he had become a fan of "bool". So, here are some replacements. Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-30 21:55:25 +10:30
Matias Zabaljauregui	4cd8b5e2a1	lguest: use KVM hypercalls Impact: cleanup This patch allow us to use KVM hypercalls Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-30 21:55:24 +10:30
Yinghai Lu	b77b881f21	x86: fix lguest used_vectors breakage, -v2 Impact: fix lguest, clean up 32-bit lguest used used_vectors to record vectors, but that model of allocating vectors changed and got broken, after we changed vector allocation to a per_cpu array. Try enable that for 64bit, and the array is used for all vectors that are not managed by vector_irq per_cpu array. Also kill system_vectors[], that is now a duplication of the used_vectors bitmap. [ merged in cpus4096 due to io_apic.c cpumask changes. ] [ -v2, fix build failure ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-23 22:37:28 +01:00
Rusty Russell	0c12091d82	lguest: Guest int3 fix Ron Minnich noticed that guest userspace gets a GPF when it tries to int3: we need to copy the privilege level from the guest-supplied IDT to the real IDT. int3 is the only common case where guest userspace expects to invoke an interrupt, so that's the symptom of failing to do this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-29 09:58:31 +10:00
Rusty Russell	a6bd8e1303	lguest: comment documentation update. Took some cycles to re-read the Lguest Journey end-to-end, fix some rot and tighten some phrases. Only comments change. No new jokes, but a couple of recycled old jokes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-03-28 11:05:54 +11:00
Glauber de Oliveira Costa	382ac6b3fb	lguest: get rid of lg variable assignments We can save some lines of code by getting rid of *lg = cpu... lines of code spread everywhere by now. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:18 +11:00
Glauber de Oliveira Costa	ae3749dcd8	lguest: move changed bitmap to lg_cpu events represented in the 'changed' bitmap are per-cpu, not per-guest. move it to the lg_cpu structure Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:17 +11:00
Glauber de Oliveira Costa	1713608f28	lguest: per-vcpu lguest pgdir management this patch makes the pgdir management per-vcpu. The pgdirs pool is still guest-wide (although it'll probably need to grow when we are really executing more vcpus), but the pgdidx index is gone, since it makes no sense anymore. Instead, we use a per-vcpu index. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:14 +11:00
Glauber de Oliveira Costa	4665ac8e28	lguest: makes special fields be per-vcpu lguest struct have room for some fields, namely, cr2, ts, esp1 and ss1, that are not really guest-wide, but rather, vcpu-wide. This patch puts it in the vcpu struct Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:13 +11:00
Glauber de Oliveira Costa	66686c2ab0	lguest: per-vcpu lguest task management lguest uses tasks to control its running behaviour (like sending breaks, controlling halted state, etc). In a per-vcpu environment, each vcpu will have its own underlying task. So this patch makes the infrastructure for that possible Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:12 +11:00
Glauber de Oliveira Costa	fc708b3e40	lguest: replace lguest_arch with lg_cpu_arch. The fields found in lguest_arch are not really per-guest, but per-cpu (gdt, idt, etc). So this patch turns lguest_arch into lg_cpu_arch. It makes sense to have a per-guest per-arch struct, but this can be addressed later, when the need arrives. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:11 +11:00
Glauber de Oliveira Costa	a53a35a8b4	lguest: make registers per-vcpu This is the most obvious per-vcpu field: registers. So this patch moves it from struct lguest to struct vcpu, and patch the places in which they are used, accordingly Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:11 +11:00
Glauber de Oliveira Costa	177e449dc5	lguest: per-vcpu interrupt processing. This patch adapts interrupt processing for using the vcpu struct. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:09 +11:00
Glauber de Oliveira Costa	ad8d8f3bc6	lguest: per-vcpu lguest timers Here, I introduce per-vcpu timers. With this, we can have local expiries, needed for accounting time in smp guests Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-01-30 22:50:08 +11:00
Rusty Russell	e1e72965ec	lguest: documentation update Went through the documentation doing typo and content fixes. This patch contains only comment and whitespace changes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-25 15:02:50 +10:00
Rusty Russell	2d37f94a28	generalize lgread_u32/lgwrite_u32. Jes complains that page table code still uses lgread_u32 even though it now uses general kernel pte types. The best thing to do is to generalize lgread_u32 and lgwrite_u32. This means we lose the efficiency of getuser(). We could potentially regain it if we used __copy_from_user instead of copy_from_user, but I'm not certain that our range check is equivalent to access_ok() on all platforms. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Jes Sorensen <jes@sgi.com>	2007-10-23 15:49:56 +10:00
Rusty Russell	47436aa4ad	Boot with virtual == physical to get closer to native Linux. 1) This allows us to get alot closer to booting bzImages. 2) It means we don't have to know page_offset. 3) The Guest needs to modify the boot pagetables to create the PAGE_OFFSET mapping before jumping to C code. 4) guest_pa() walks the page tables rather than using page_offset. 5) We don't use page_offset to figure out whether to emulate: it was always kinda quesationable, and won't work for instructions done before remapping (bzImage unpacking in particular). 6) We still want the kernel address for tlb flushing: have the initial hypercall give us that, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:54 +10:00
Rusty Russell	c18acd73ff	Allow guest to specify syscall vector to use. (Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch). This patch allows Guests to specify what system call vector they want, and we try to reserve it. We only allow one non-Linux system call vector, to try to avoid DoS on the Host. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:53 +10:00
Jes Sorensen	625efab1cd	Move i386 part of core.c to x86/core.c. Separate i386 architecture specific from core.c and move it to x86/core.c and add x86/lguest.h header file to match. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:51 +10:00
Rusty Russell	56adbe9ddc	Make shadow IDT a complete IDT with 256 entries. This simplifies the code a little, in preparation for allowing alternate system call vectors in guests (Plan 9 uses 0x40). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2007-10-23 15:49:51 +10:00
Rusty Russell	8057d763ed	Fix lguest page-pinning logic ("lguest: bad stack page 0xc057a000") If the stack pointer is 0xc057a000, then the first stack page is at 0xc0579000 (the stack pointer is decremented before use). Not calculating this correctly caused guests with CONFIG_DEBUG_PAGEALLOC=y to be killed with a "bad stack page" message: the initial kernel stack was just proceeding the .smp_locks section which CONFIG_DEBUG_PAGEALLOC marks read-only when freeing. Thanks to Frederik Deweerdt for the bug report! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-30 09:58:22 -07:00
Rusty Russell	0d027c01cd	lguest: Fix Malicious Guest GDT Host Crash If a Guest makes hypercall which sets a GDT entry to not present, we currently set any segment registers using that GDT entry to 0. Unfortunately, this is not sufficient: there are other ways of altering GDT entries which will cause a fault. The correct solution to do what Linux does: let them set any GDT value they want and handle the #GP when popping causes a fault. This has the added benefit of making our Switcher slightly more robust in the case of any other bugs which cause it to fault. We kill the Guest if it causes a fault in the Switcher: it's the Guest's responsibility to make sure it's not using segments when it changes them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-09 08:14:56 -07:00
Rusty Russell	6c8dca5d53	Provide timespec to guests rather than jiffies clock. A non-periodic clock_event_device and the "jiffies" clock don't mix well: tick_handle_periodic() can go into an infinite loop. Currently lguest guests use the jiffies clock when the TSC is unusable. Instead, make the Host write the current time into the lguest page on every interrupt. This doesn't cost much but is more precise and at least as accurate as the jiffies clock. It also gets rid of the GET_WALLCLOCK hypercall. Also, delay setting sched_clock until our clock is set up, otherwise the early printk timestamps can go backwards (not harmful, just ugly). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-28 19:54:33 -07:00
Rusty Russell	f56a384e98	lguest: documentation VII: FIXMEs Documentation: The FIXMEs Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Rusty Russell	bff672e630	lguest: documentation V: Host Documentation: The Host Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Rusty Russell	f938d2c892	lguest: documentation I: Preparation The netfilter code had very good documentation: the Netfilter Hacking HOWTO. Noone ever read it. So this time I'm trying something different, using a bit of Knuthiness. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:16 -07:00
Rusty Russell	e5faff45b3	lguest: fix sense if IF flag on interrupt injection The sense of the IF bit is backwards in the host interrupt handling. This means we always save "IF=1" on the stack when injecting an interrupt. It turns out this is almost always correct (unless the guest is taking a page fault in an interrupt due to an unpopulated vmalloc mapping), so went unnoticed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 09:05:16 -07:00
Rusty Russell	d7e28ffe6c	lguest: the host code This is the code for the "lg.ko" module, which allows lguest guests to be launched. [akpm@linux-foundation.org: update for futex-new-private-futexes] [akpm@linux-foundation.org: build fix] [jmorris@namei.org: lguest: use hrtimers] [akpm@linux-foundation.org: x86_64 build fix] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:52 -07:00

41 Commits