Commit Graph

547315 Commits

Author SHA1 Message Date
Andrey Smetanin 351dc6477c kvm/eventfd: avoid loop inside irqfd_update()
The loop(for) inside irqfd_update() is unnecessary
because any other value for irq_entry.type will just trigger
schedule_work(&irqfd->inject) in irqfd_wakeup.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:28 +02:00
Marcelo Tosatti 7cae2bedcb KVM: x86: move steal time initialization to vcpu entry time
As reported at https://bugs.launchpad.net/qemu/+bug/1494350,
it is possible to have vcpu->arch.st.last_steal initialized
from a thread other than vcpu thread, say the iothread, via
KVM_SET_MSRS.

Which can cause an overflow later (when subtracting from vcpu threads
sched_info.run_delay).

To avoid that, move steal time accumulation to vcpu entry time,
before copying steal time data to guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:16 +02:00
Takuya Yoshikawa 5225fdf8c8 KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level()
Calling kvm_vcpu_gfn_to_memslot() twice in mapping_level() should be
avoided since getting a slot by binary search may not be negligible,
especially for virtual machines with many memory slots.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:02 +02:00
Takuya Yoshikawa d8aacf5df8 KVM: x86: MMU: Remove mapping_level_dirty_bitmap()
Now that it has only one caller, and its name is not so helpful for
readers, remove it.  The new memslot_valid_for_gpte() function
makes it possible to share the common code between
gfn_to_memslot_dirty_bitmap() and mapping_level().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:01 +02:00
Takuya Yoshikawa fd13690218 KVM: x86: MMU: Move mapping_level_dirty_bitmap() call in mapping_level()
This is necessary to eliminate an extra memory slot search later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:00 +02:00
Takuya Yoshikawa 5ed5c5c8fd KVM: x86: MMU: Simplify force_pt_level calculation code in FNAME(page_fault)()
As a bonus, an extra memory slot search can be eliminated when
is_self_change_mapping is true.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:34:00 +02:00
Takuya Yoshikawa cd1872f028 KVM: x86: MMU: Make force_pt_level bool
This will be passed to a function later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:33:59 +02:00
Joerg Roedel 6092d3d3e6 kvm: svm: Only propagate next_rip when guest supports it
Currently we always write the next_rip of the shadow vmcb to
the guests vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Cc: Bandan Das <bsd@redhat.com>
Cc: Dirk Mueller <dmueller@suse.com>
Tested-By: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:32:17 +02:00
Paolo Bonzini 951f9fd74f KVM: x86: manually unroll bad_mt_xwr loop
The loop is computing one of two constants, it can be simpler to write
everything inline.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:32:16 +02:00
Wanpeng Li 089d7b6ec5 KVM: nVMX: expose VPID capability to L1
Expose VPID capability to L1. For nested guests, we don't do anything
specific for single context invalidation. Hence, only advertise support
for global context invalidation. The major benefit of nested VPID comes
from having separate vpids when switching between L1 and L2, and also
when L2's vCPUs not sched in/out on L1.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:55 +02:00
Wanpeng Li 5c614b3583 KVM: nVMX: nested VPID emulation
VPID is used to tag address space and avoid a TLB flush. Currently L0 use
the same VPID to run L1 and all its guests. KVM flushes VPID when switching
between L1 and L2.

This patch advertises VPID to the L1 hypervisor, then address space of L1
and L2 can be separately treated and avoid TLB flush when swithing between
L1 and L2. For each nested vmentry, if vpid12 is changed, reuse shadow vpid
w/ an invvpid.

Performance:

run lmbench on L2 w/ 3.5 kernel.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel    Linux 3.5.0-1 1.2200 1.3700 1.4500 4.7800 2.3300 5.60000 2.88000  nested VPID
kernel    Linux 3.5.0-1 1.2600 1.4300 1.5600   12.7   12.9 3.49000 7.46000  vanilla

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:35 +02:00
Wanpeng Li 99b83ac893 KVM: nVMX: emulate the INVVPID instruction
Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:24 +02:00
Wanpeng Li dd5f5341a3 KVM: VMX: introduce __vmx_flush_tlb to handle specific vpid
Introduce __vmx_flush_tlb() to handle specific vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:09 +02:00
Wanpeng Li 991e7a0eed KVM: VMX: adjust interface to allocate/free_vpid
Adjust allocate/free_vid so that they can be reused for the nested vpid.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:09 +02:00
Kosuke Tatsukawa 6003a42010 kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
async_pf_execute() seems to be missing a memory barrier which might
cause the waker to not notice the waiter and miss sending a wake_up as
in the following figure.

        async_pf_execute                    kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                    prepare_to_wait(&vcpu->wq, &wait,
                                      TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                      ...
                                      return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                        !vcpu->arch.apf.halted)
                                        || !list_empty_careful(&vcpu->async_pf.done)
                                     ...
                                     return 0;
list_add_tail(&apf->link,
  &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
------------------------------------------------------------------------

The attached patch adds the missing memory barrier.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Radim Krčmář 13db77347d KVM: x86: don't notify userspace IOAPIC on edge EOI
On real hardware, edge-triggered interrupts don't set a bit in TMR,
which means that IOAPIC isn't notified on EOI.  Do the same here.

Staying in guest/kernel mode after edge EOI is what we want for most
devices.  If some bugs could be nicely worked around with edge EOI
notifications, we should invest in a better interface.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Radim Krčmář db2bdcbbbd KVM: x86: fix edge EOI and IOAPIC reconfig race
KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
The problem is that IOAPIC can be reconfigured while an interrupt with
old configuration is pending and eoi_exit_bitmap only remembers the
newest configuration;  thus EOI from the pending interrupt is not
recognized.

(Reconfiguration is not a problem for level interrupts, because IOAPIC
 sends interrupt with the new configuration.)

For an edge interrupt with ACK notifiers, like i8254 timer; things can
happen in this order
 1) IOAPIC inject a vector from i8254
 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
    on original VCPU gets cleared
 3) guest's handler for the vector does EOI
 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
    not in that VCPU's eoi_exit_bitmap
 5) i8254 stops working

A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR.

This creates an unwanted situation if the vector is reused by a
non-IOAPIC source, but I think it is so rare that we don't want to make
the solution more sophisticated.  The simple solution also doesn't work
if we are reconfiguring the vector.  (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)

The are no races because ioapic injection and reconfig are locked.

Fixes: b053b2aef2 ("KVM: x86: Add EOI exit bitmap inference")
[Before b053b2aef2, this bug happened only with APICv.]
Fixes: c7c9c56ca2 ("x86, apicv: add virtual interrupt delivery support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Radim Krčmář c77f3fab44 kvm: x86: set KVM_REQ_EVENT when updating IRR
After moving PIR to IRR, the interrupt needs to be delivered manually.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:41:08 +02:00
Paolo Bonzini bff98d3b01 Merge branch 'kvm-master' into HEAD
Merge more important SMM fixes.
2015-10-14 16:40:46 +02:00
Paolo Bonzini b10d92a54d KVM: x86: fix RSM into 64-bit protected mode
In order to get into 64-bit protected mode, you need to enable
paging while EFER.LMA=1.  For this to work, CS.L must be 0.
Currently, we load the segments before CR0 and CR4, which means
that if RSM returns into 64-bit protected mode CS.L is already 1
and everything breaks.

Luckily, CS.L=0 is always the case when executing RSM, because it
is forbidden to execute RSM from 64-bit protected mode.  Hence it
is enough to load CR0 and CR4 first, and only then the segments.

Fixes: 660a5d517a
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:39:52 +02:00
Paolo Bonzini 25188b9986 KVM: x86: fix previous commit for 32-bit
Unfortunately I only noticed this after pushing.

Fixes: f0d648bdf0
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-14 16:39:25 +02:00
Paolo Bonzini 58f800d5ac Merge branch 'kvm-master' into HEAD
This merge brings in a couple important SMM fixes, which makes it
easier to test latest KVM with unrestricted_guest=0 and to test
the in-progress work on SMM support in the firmware.

Conflicts:
	arch/x86/kvm/x86.c
2015-10-13 21:32:50 +02:00
Paolo Bonzini 7391773933 KVM: x86: fix SMI to halted VCPU
An SMI to a halted VCPU must wake it up, hence a VCPU with a pending
SMI must be considered runnable.

Fixes: 64d6067057
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:29:41 +02:00
Paolo Bonzini 5d9bc648b9 KVM: x86: clean up kvm_arch_vcpu_runnable
Split the huge conditional in two functions.

Fixes: 64d6067057
Cc: stable@vger.kernel.org
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:59 +02:00
Paolo Bonzini f0d648bdf0 KVM: x86: map/unmap private slots in __x86_set_memory_region
Otherwise, two copies (one of them never populated and thus bogus)
are allocated for the regular and SMM address spaces.  This breaks
SMM with EPT but without unrestricted guest support, because the
SMM copy of the identity page map is all zeros.

By moving the allocation to the caller we also remove the last
vestiges of kernel-allocated memory regions (not accessible anymore
in userspace since commit b74a07beed, "KVM: Remove kernel-allocated
memory regions", 2010-06-21); that is a nice bonus.

Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5ac
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:58 +02:00
Paolo Bonzini 1d8007bdee KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region
The next patch will make x86_set_memory_region fill the
userspace_addr.  Since the struct is not used untouched
anymore, it makes sense to build it in x86_set_memory_region
directly; it also simplifies the callers.

Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5ac
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:46 +02:00
Paolo Bonzini 1330a0170a KVM: s390: Fixes for 4.4
A bunch of fixes and optimizations for interrupt and time
 handling. No fix is important enough to qualify for 4.3 or
 stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJWHQ0hAAoJEBF7vIC1phx8pmMP/jx+wYMA5U4Gi5OT4HzyFUoh
 nNoQUuwLSykg82vTZgajJFP5Wo4cIDXs2pLUOsuJFpufUY1S2fRgdqjNLcnIKarz
 BY/a6t9nXYQqEDxeHXIkS1sqTXcpEv6yHfitpZCyAz2D+oDmOXQLyJ7tEpp3JGUh
 WwTA1b4cTOpdASWCB2ldcgKDiKMA70dm7e+3ejGlib6v/aoEWDI0/n/0/UZbH8gm
 Q9hKcdxhwqTqVMlMSHcCkcKqKMJpY/8eNNWtlTgVc7gd0kFaLc+T5JToJKUTmE5G
 lCCkBO3TjOGKnoccIRYc7DW+vHVR5er5IaNIRpxnCf/g3FF9R1jbfm9DixYT29IG
 H3GJSZwQMo0glWNfzuBlgmBAgMTGMka9+0zXvXUw+TIOFmjgjIx0w5H0rYmUdE6j
 tZYLYGa96DqdDur1lLN6RJGaO2O08bI2J6TJXJ5h1x8qY6V2YKLRGOabXxLEUut2
 LvanLczT4ou27fgW2kOpcLgCYKT1l2nlH22WzilpITKpBQFSq1flFMQfB32jqQJI
 v41aNBwIEqE/9dR1Zrwad6m//t9u8PAv3fna8cdchYolq/ZZF30R8BJW9lYxeify
 htPKhITzL30JdN3bw5ItVFA/p4YIqVswwq6u+pc9vpWeI4xG71Vq2DybhmuOQ6yd
 kuojcihXhEzkk2vit3Cc
 =LJJQ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20151013' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 4.4

A bunch of fixes and optimizations for interrupt and time
handling. No fix is important enough to qualify for 4.3 or
stable.
2015-10-13 16:44:51 +02:00
David Hildenbrand 60417fcc2b KVM: s390: factor out reading of the guest TOD clock
Let's factor this out and always use get_tod_clock_fast() when
reading the guest TOD.

STORE CLOCK FAST does not do serialization and, therefore, might
result in some fuzziness between different processors in a way
that subsequent calls on different CPUs might have time stamps that
are earlier. This semantics is fine though for all KVM use cases.
To make it obvious that the new function has STORE CLOCK FAST
semantics we name it kvm_s390_get_tod_clock_fast.

With this patch, we only have a handful of places were we
have to care about STP sync (using preempt_disable() logic).

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:35 +02:00
David Hildenbrand 25ed167596 KVM: s390: factor out and fix setting of guest TOD clock
Let's move that whole logic into one function. We now always use unsigned
values when calculating the epoch (to avoid over/underflow defined).
Also, we always have to get all VCPUs out of SIE before doing the update
to avoid running differing VCPUs with different TODs.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:35 +02:00
David Hildenbrand 5a3d883a59 KVM: s390: switch to get_tod_clock() and fix STP sync races
Nobody except early.c makes use of store_tod_clock() to handle the
cc. So if we would get a cc != 0, we would be in more trouble.

Let's replace all users with get_tod_clock(). Returning a cc
on an ioctl sounded strange either way.

We can now also easily move the get_tod_clock() call into the
preempt_disable() section. This is in fact necessary to make the
STP sync work as expected. Otherwise the host TOD could change
and we would end up with a wrong epoch calculation.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 238293b14d KVM: s390: correctly handle injection of pgm irqs and per events
PER events can always co-exist with other program interrupts.

For now, we always overwrite all program interrupt parameters when
injecting any type of program interrupt.

Let's handle that correctly by only overwriting the relevant portion of
the program interrupt parameters. Therefore we can now inject PER events
and ordinary program interrupts concurrently, resulting in no loss of
program interrupts. This will especially by helpful when manually detecting
PER events later - as both types might be triggered during one SIE exit.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 66933b78e3 KVM: s390: simplify in-kernel program irq injection
The main reason to keep program injection in kernel separated until now
was that we were able to do some checking, if really only the owning
thread injects program interrupts (via waitqueue_active(li->wq)).

This BUG_ON was never triggered and the chances of really hitting it, if
another thread injected a program irq to another vcpu, were very small.

Let's drop this check and turn kvm_s390_inject_program_int() and
kvm_s390_inject_prog_irq() into simple inline functions that makes use of
kvm_s390_inject_vcpu().

__must_check can be dropped as they are implicitely given by
kvm_s390_inject_vcpu(), to avoid ugly long function prototypes.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:34 +02:00
David Hildenbrand 4d32ad6bec KVM: s390: drop out early in kvm_s390_has_irq()
Let's get rid of the local variable and exit directly if we found
any pending interrupt. This is not only faster, but also better
readable.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand 118b862b15 KVM: s390: kvm_arch_vcpu_runnable already cares about timer interrupts
We can remove that double check.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand 5f94c58ed0 KVM: s390: set interception requests for all floating irqs
No need to separate pending and floating irqs when setting interception
requests. Let's do it for all equally.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:33 +02:00
David Hildenbrand fee0e0fdb2 KVM: s390: disabled wait cares about machine checks, not PER
We don't care about program event recording irqs (synchronous
program irqs) but asynchronous irqs when checking for disabled
wait. Machine checks were missing.

Let's directly switch to the functions we have for that purpose
instead of testing once again for magic bits.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:32 +02:00
Christian Borntraeger f59922b47e KVM: s390: remove unused variable in __inject_vm
the float int structure is no longer used in __inject_vm.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-13 15:50:27 +02:00
Feng Wu b7d2063177 iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
Enable VT-d Posted-Interrtups and add a command line
parameter for it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:54 +02:00
Feng Wu bf9f6ac8d7 KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <feng.wu@intel.com>
[Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:53 +02:00
Feng Wu 28b835d60f KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress furture non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <feng.wu@intel.com>
[Include asm/cpu.h to fix !CONFIG_SMP compilation. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:53 +02:00
Feng Wu 8727688006 KVM: x86: select IRQ_BYPASS_MANAGER
Select IRQ_BYPASS_MANAGER for x86 when CONFIG_KVM is set

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:52 +02:00
Feng Wu efc644048e KVM: x86: Update IRTE for posted-interrupts
This patch adds the routine to update IRTE for posted-interrupts
when guest changes the interrupt configuration.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[Squashed in automatically generated patch from the build robot
 "KVM: x86: vcpu_to_pi_desc() can be static" - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:51 +02:00
Feng Wu 6d7425f109 vfio: Register/unregister irq_bypass_producer
This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSIx on vfio pci devices.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:50 +02:00
Feng Wu d84f1e0755 KVM: make kvm_set_msi_irq() public
Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:50 +02:00
Feng Wu 8feb4a04dc KVM: Define a new interface kvm_intr_is_single_vcpu()
This patch defines a new interface kvm_intr_is_single_vcpu(),
which can returns whether the interrupt is for single-CPU or not.

It is used by VT-d PI, since now we only support single-CPU
interrupts, For lowest-priority interrupts, if user configures
it via /proc/irq or uses irqbalance to make it single-CPU, we
can use PI to deliver the interrupts to it. Full functionality
of lowest-priority support will be added later.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:49 +02:00
Feng Wu ebbfc76536 KVM: Add some helper functions for Posted-Interrupts
This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
[Make the new functions inline. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:48 +02:00
Feng Wu 6ef1522f7e KVM: Extend struct pi_desc for VT-d Posted-Interrupts
Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:48 +02:00
Feng Wu f70c20aaf1 KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
This patch adds an arch specific hooks 'arch_update' in
'struct kvm_kernel_irqfd'. On Intel side, it is used to
update the IRTE when VT-d posted-interrupts is used.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:47 +02:00
Eric Auger 9016cfb577 KVM: eventfd: add irq bypass consumer management
This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:46 +02:00
Eric Auger 1a02b27035 KVM: introduce kvm_arch functions for IRQ bypass
This patch introduces
- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop
- kvm_arch_irq_bypass_start

They make possible to specialize the KVM IRQ bypass consumer in
case CONFIG_KVM_HAVE_IRQ_BYPASS is set.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
[Add weak implementations of the callbacks. - Feng]
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:45 +02:00