Commit Graph

662440 Commits

Author SHA1 Message Date
Alexey Kardashevskiy da6f59e192 KVM: PPC: Use preregistered memory API to access TCE list
VFIO on sPAPR already implements guest memory pre-registration
when the entire guest RAM gets pinned. This can be used to translate
the physical address of a guest page containing the TCE list
from H_PUT_TCE_INDIRECT.

This makes use of the pre-registrered memory API to access TCE list
pages in order to avoid unnecessary locking on the KVM memory
reverse map as we know that all of guest memory is pinned and
we have a flat array mapping GPA to HPA which makes it simpler and
quicker to index into that array (even with looking up the
kernel page tables in vmalloc_to_phys) than it is to find the memslot,
lock the rmap entry, look up the user page tables, and unlock the rmap
entry. Note that the rmap pointer is initialized to NULL
where declared (not in this patch).

If a requested chunk of memory has not been preregistered, this will
fall back to non-preregistered case and lock rmap.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:39:16 +10:00
Alexey Kardashevskiy 503bfcbe18 KVM: PPC: Pass kvm* to kvmppc_find_table()
The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm*
there. This will be used in the following patches where we will be
attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than
to VCPU).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:39:12 +10:00
Alexey Kardashevskiy e91aa8e6ec KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently
It does not make much sense to have KVM in book3s-64 and
not to have IOMMU bits for PCI pass through support as it costs little
and allows VFIO to function on book3s KVM.

Having IOMMU_API always enabled makes it unnecessary to have a lot of
"#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those
ifdef's we could have only user space emulated devices accelerated
(but not VFIO) which do not seem to be very useful.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:39:07 +10:00
Alexey Kardashevskiy 4898d3f49b KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
This adds a capability number for in-kernel support for VFIO on
SPAPR platform.

The capability will tell the user space whether in-kernel handlers of
H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
must not attempt allocating a TCE table in the host kernel via
the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
will not be passed to the user space which is desired action in
the situation like that.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:39:01 +10:00
Paul Mackerras 644d2d6fef Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in the commits in the topic/ppc-kvm branch of the powerpc
tree to get the changes to arch/powerpc which subsequent patches will
rely on.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:38:33 +10:00
Alexey Kardashevskiy 3762d45aa7 KVM: PPC: Align the table size to system page size
At the moment the userspace can request a table smaller than a page size
and this value will be stored as kvmppc_spapr_tce_table::size.
However the actual allocated size will still be aligned to the system
page size as alloc_page() is used there.

This aligns the table size up to the system page size. It should not
change the existing behaviour but when in-kernel TCE acceleration patchset
reaches the upstream kernel, this will allow small TCE tables be
accelerated as well: PCI IODA iommu_table allocator already aligns
the size and, without this patch, an IOMMU group won't attach to LIOBN
due to the mismatching table size.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:38:20 +10:00
Alexey Kardashevskiy 96df226769 KVM: PPC: Book3S PR: Preserve storage control bits
PR KVM page fault handler performs eaddr to pte translation for a guest,
however kvmppc_mmu_book3s_64_xlate() does not preserve WIMG bits
(storage control) in the kvmppc_pte struct. If PR KVM is running as
a second level guest under HV KVM, and PR KVM tries inserting HPT entry,
this fails in HV KVM if it already has this mapping.

This preserves WIMG bits between kvmppc_mmu_book3s_64_xlate() and
kvmppc_mmu_map_page().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:38:14 +10:00
Alexey Kardashevskiy bd9166ffe6 KVM: PPC: Book3S PR: Exit KVM on failed mapping
At the moment kvmppc_mmu_map_page() returns -1 if
mmu_hash_ops.hpte_insert() fails for any reason so the page fault handler
resumes the guest and it faults on the same address again.

This adds distinction to kvmppc_mmu_map_page() to return -EIO if
mmu_hash_ops.hpte_insert() failed for a reason other than full pteg.
At the moment only pSeries_lpar_hpte_insert() returns -2 if
plpar_pte_enter() failed with a code other than H_PTEG_FULL.
Other mmu_hash_ops.hpte_insert() instances can only fail with
-1 "full pteg".

With this change, if PR KVM fails to update HPT, it can signal
the userspace about this instead of returning to guest and having
the very same page fault over and over again.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:38:04 +10:00
Alexey Kardashevskiy 9eecec126e KVM: PPC: Book3S PR: Get rid of unused local variable
@is_mmio has never been used since introduction in
commit 2f4cf5e42d ("Add book3s.c") from 2009.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:38:00 +10:00
Markus Elfring 37655490db KVM: PPC: e500: Use kcalloc() in e500_mmu_host_init()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kcalloc".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:37:55 +10:00
Markus Elfring a1c52e1c7c KVM: PPC: Book3S HV: Use common error handling code in kvmppc_clr_passthru_irq()
Add a jump target so that a bit of exception handling can be better reused
at the end of this function.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:37:50 +10:00
Paul Mackerras 9b5ab00513 KVM: PPC: Add MMIO emulation for remaining floating-point instructions
For completeness, this adds emulation of the lfiwax and lfiwzx
instructions.  With this, all floating-point load and store instructions
as of Power ISA V2.07 are emulated.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:37:44 +10:00
Paul Mackerras ceba57df43 KVM: PPC: Emulation for more integer loads and stores
This adds emulation for the following integer loads and stores,
thus enabling them to be used in a guest for accessing emulated
MMIO locations.

- lhaux
- lwaux
- lwzux
- ldu
- lwa
- stdux
- stwux
- stdu
- ldbrx
- stdbrx

Previously, most of these would cause an emulation failure exit to
userspace, though ldu and lwa got treated incorrectly as ld, and
stdu got treated incorrectly as std.

This also tidies up some of the formatting and updates the comment
listing instructions that still need to be implemented.

With this, all integer loads and stores that are defined in the Power
ISA v2.07 are emulated, except for those that are permitted to trap
when used on cache-inhibited or write-through mappings (and which do
in fact trap on POWER8), that is, lmw/stmw, lswi/stswi, lswx/stswx,
lq/stq, and l[bhwdq]arx/st[bhwdq]cx.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:37:38 +10:00
Alexey Kardashevskiy 91242fd1a3 KVM: PPC: Add MMIO emulation for stdx (store doubleword indexed)
This adds missing stdx emulation for emulated MMIO accesses by KVM
guests.  This allows the Mellanox mlx5_core driver from recent kernels
to work when MMIO emulation is enforced by userspace.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:37:33 +10:00
Bin Lu 6f63e81bda KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions
This patch provides the MMIO load/store emulation for instructions
of 'double & vector unsigned char & vector signed char & vector
unsigned short & vector signed short & vector unsigned int & vector
signed int & vector double '.

The instructions that this adds emulation for are:

- ldx, ldux, lwax,
- lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux,
- stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx,
- lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx,
- stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x

[paulus@ozlabs.org - some cleanups, fixes and rework, make it
 compile for Book E, fix build when PR KVM is built in]

Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:36:41 +10:00
Paul Mackerras 307d927967 KVM: PPC: Provide functions for queueing up FP/VEC/VSX unavailable interrupts
This provides functions that can be used for generating interrupts
indicating that a given functional unit (floating point, vector, or
VSX) is unavailable.  These functions will be used in instruction
emulation code.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 10:39:50 +10:00
Radim Krčmář f7b1a77d3b KVM: s390: features for 4.12
1. guarded storage support for guests
    This contains an s390 base Linux feature branch that is necessary
    to implement the KVM part
 2. Provide an interface to implement adapter interruption suppression
    which is necessary for proper zPCI support
 3. Use more defines instead of numbers
 4. Provide logging for lazy enablement of runtime instrumentation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJY50fGAAoJEBF7vIC1phx8TK4P/0y7J6go7NFniVXf0K5/teAZ
 kH6BRwoWQDCGAwkpz8fqhKTNxdEq1bjLS5aNf2R3oGt61yoKim4GLSYeRymhKWo0
 WZbwq8VpCWxjysx4wyg1nm4J05EDqaPvfxtD9ONCxZNXvPeZR5pIY1uo3twuNROS
 z+bOtP1YK7FyzbgGsHnc47YKrOMa3LwTQfAAhf3eVHs5f9cksIqlQYlo/H7fJY96
 Z3qpZmrwWP5avZAenxfUKS2dEiT2lzFvUP1waYwm4sZ7fImdgBRlnFzFK/O7qiIV
 c9KHW+qao37NU1AsqJibXZvFQBJixKxWa8nCKsagpkZAMDoPNoutHsxXpNJNrp4S
 Zq+OWbiYRKXnndGI94RtMUE+bmRvcj37a648+nwzRPA9N889GQR829qKLANsqCwY
 7bM3gE9d5BkAASp0uZ4bgrfyU4tlGTH8WLNSLM+upeOGjWEk3HdOEFuJQNspgaWQ
 oaAl4TROQhGvZXYxzTHP26jEG0IAIkUd2CUULAe8lZ2vk+kc4xPAfhjQ41DFembz
 fDMXXKgjtHvT6Z5USbGJdkOBkhzGyOuETZJyrj7D60OmlHXeoBPgWIJ21cS8ZU4F
 M0cM+1DcXbOOKjfIbZRbLh9OdAhBya8VzHnyw7T2vl9hXIAhEH9MwTfVsnBujdca
 Q4BBOIh3HV9VfT0exaaF
 =Jtjb
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux

From: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: features for 4.12

1. guarded storage support for guests
   This contains an s390 base Linux feature branch that is necessary
   to implement the KVM part
2. Provide an interface to implement adapter interruption suppression
   which is necessary for proper zPCI support
3. Use more defines instead of numbers
4. Provide logging for lazy enablement of runtime instrumentation
2017-04-11 20:54:40 +02:00
Jim Mattson 28d0635388 kvm: nVMX: Disallow userspace-injected exceptions in guest mode
The userspace exception injection API and code path are entirely
unprepared for exceptions that might cause a VM-exit from L2 to L1, so
the best course of action may be to simply disallow this for now.

1. The API provides no mechanism for userspace to specify the new DR6
bits for a #DB exception or the new CR2 value for a #PF
exception. Presumably, userspace is expected to modify these registers
directly with KVM_SET_SREGS before the next KVM_RUN ioctl. However, in
the event that L1 intercepts the exception, these registers should not
be changed. Instead, the new values should be provided in the
exit_qualification field of vmcs12 (Intel SDM vol 3, section 27.1).

2. In the case of a userspace-injected #DB, inject_pending_event()
clears DR7.GD before calling vmx_queue_exception(). However, in the
event that L1 intercepts the exception, this is too early, because
DR7.GD should not be modified by a #DB that causes a VM-exit directly
(Intel SDM vol 3, section 27.1).

3. If the injected exception is a #PF, nested_vmx_check_exception()
doesn't properly check whether or not L1 is interested in the
associated error code (using the #PF error code mask and match fields
from vmcs12). It may either return 0 when it should call
nested_vmx_vmexit() or vice versa.

4. nested_vmx_check_exception() assumes that it is dealing with a
hardware-generated exception intercept from L2, with some of the
relevant details (the VM-exit interruption-information and the exit
qualification) live in vmcs02. For userspace-injected exceptions, this
is not the case.

5. prepare_vmcs12() assumes that when its exit_intr_info argument
specifies valid information with a valid error code that it can VMREAD
the VM-exit interruption error code from vmcs02. For
userspace-injected exceptions, this is not the case.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:01 +02:00
David Hildenbrand 28bf288879 KVM: x86: fix user triggerable warning in kvm_apic_accept_events()
If we already entered/are about to enter SMM, don't allow switching to
INIT/SIPI_RECEIVED, otherwise the next call to kvm_apic_accept_events()
will report a warning.

Same applies if we are already in MP state INIT_RECEIVED and SMM is
requested to be turned on. Refuse to set the VCPU events in this case.

Fixes: cd7764fe9f ("KVM: x86: latch INITs while in system management mode")
Cc: stable@vger.kernel.org # 4.2+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:01 +02:00
Paolo Bonzini 4b4357e025 kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public
Its value has never changed; we might as well make it part of the ABI instead
of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO).

Because PPC does not always make MMIO available, the code has to be made
dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:01 +02:00
Paolo Bonzini 3042255899 kvm: make KVM_CAP_COALESCED_MMIO architecture agnostic
Remove code from architecture files that can be moved to virt/kvm, since there
is already common code for coalesced MMIO.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[Removed a pointless 'break' after 'return'.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:00 +02:00
Paolo Bonzini a5f4645704 KVM: nVMX: support RDRAND and RDSEED exiting
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:00 +02:00
Paolo Bonzini 1f51999270 KVM: VMX: add missing exit reasons
In order to simplify adding exit reasons in the future,
the array of exit reason names is now also sorted by
exit reason code.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:00 +02:00
Paolo Bonzini ae1e2d1082 kvm: nVMX: support EPT accessed/dirty bits
Now use bit 6 of EPTP to optionally enable A/D bits for EPTP.  Another
thing to change is that, when EPT accessed and dirty bits are not in use,
VMX treats accesses to guest paging structures as data reads.  When they
are in use (bit 6 of EPTP is set), they are treated as writes and the
corresponding EPT dirty bit is set.  The MMU didn't know this detail,
so this patch adds it.

We also have to fix up the exit qualification.  It may be wrong because
KVM sets bit 6 but the guest might not.

L1 emulates EPT A/D bits using write permissions, so in principle it may
be possible for EPT A/D bits to be used by L1 even though not available
in hardware.  The problem is that guest page-table walks will be treated
as reads rather than writes, so they would not cause an EPT violation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed typo in walk_addr_generic() comment and changed bit clear +
 conditional-set pattern in handle_ept_violation() to conditional-clear]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:00 +02:00
Paolo Bonzini 86407bcb5c kvm: x86: MMU support for EPT accessed/dirty bits
This prepares the MMU paging code for EPT accessed and dirty bits,
which can be enabled optionally at runtime.  Code that updates the
accessed and dirty bits will need a pointer to the struct kvm_mmu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:00 +02:00
Paolo Bonzini 0047723130 KVM: VMX: remove bogus check for invalid EPT violation
handle_ept_violation is checking for "guest-linear-address invalid" +
"not a paging-structure walk".  However, _all_ EPT violations without
a valid guest linear address are paging structure walks, because those
EPT violations happen when loading the guest PDPTEs.

Therefore, the check can never be true, and even if it were, KVM doesn't
care about the guest linear address; it only uses the guest *physical*
address VMCS field.  So, remove the check altogether.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:00 +02:00
Paolo Bonzini 7db742654d KVM: nVMX: we support 1GB EPT pages
Large pages at the PDPE level can be emulated by the MMU, so the bit
can be set unconditionally in the EPT capabilities MSR.  The same is
true of 2MB EPT pages, though all Intel processors with EPT in practice
support those.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-07 16:49:00 +02:00
Paolo Bonzini ad6260da1e KVM: x86: drop legacy device assignment
Legacy device assignment has been deprecated since 4.2 (released
1.5 years ago).  VFIO is better and everyone should have switched to it.
If they haven't, this should convince them. :)

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-07 16:49:00 +02:00
Paolo Bonzini 2c82878b0c KVM: VMX: require virtual NMI support
Virtual NMIs are only missing in Prescott and Yonah chips.  Both are obsolete
for virtualization usage---Yonah is 32-bit only even---so drop vNMI emulation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-07 16:49:00 +02:00
Borislav Petkov 74f169090b kvm/svm: Setup MCG_CAP on AMD properly
MCG_CAP[63:9] bits are reserved on AMD. However, on an AMD guest, this
MSR returns 0x100010a. More specifically, bit 24 is set, which is simply
wrong. That bit is MCG_SER_P and is present only on Intel. Thus, clean
up the reserved bits in order not to confuse guests.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-07 16:49:00 +02:00
David Hildenbrand 1279a6b124 KVM: nVMX: single function for switching between vmcs
Let's combine it in a single function vmx_switch_vmcs().

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:00 +02:00
Jim Mattson f0b98c02c1 kvm: vmx: Don't use INVVPID when EPT is enabled
According to the Intel SDM, volume 3, section 28.3.2: Creating and
Using Cached Translation Information, "No linear mappings are used
while EPT is in use." INVEPT will invalidate both the guest-physical
mappings and the combined mappings in the TLBs and paging-structure
caches, so an INVVPID is superfluous.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:00 +02:00
Yi Min Zhao 47a4693e1d KVM: s390: introduce AIS capability
Introduce a cap to enable AIS facility bit, and add documentation
for this capability.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-04-07 09:11:11 +02:00
Radim Krčmář 715958f921 KVM: MIPS: VZ support, Octeon III, and TLBR
Add basic support for the MIPS Virtualization Module (generally known as
 MIPS VZ) in KVM. We primarily support the ImgTec P5600, P6600, I6400,
 and Cavium Octeon III cores so far. Support is included for the
 following VZ / guest hardware features:
 - MIPS32 and MIPS64, r5 (VZ requires r5 or later) and r6
 - TLBs with GuestID (IMG cores) or Root ASID Dealias (Octeon III)
 - Shared physical root/guest TLB (IMG cores)
 - FPU / MSA
 - Cop0 timer (up to 1GHz for now due to soft timer limit)
 - Segmentation control (EVA)
 - Hardware page table walker (HTW) both for root and guest TLB
 
 Also included is a proper implementation of the TLBR instruction for the
 trap & emulate MIPS KVM implementation.
 
 Preliminary MIPS architecture changes are applied directly with Ralf's
 ack.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJY5XPkAAoJEGwLaZPeOHZ6aLcQAI43z58kkUopHJVfXtUbS+p0
 Bno+oi6XKwEL0AD361A6jflbfxaSHQocilhCBvGKf7c7Rm/oRWrAxrXDnNEDi59s
 U7tH8KATdzgySu8mZOJNp8a0VcWS08yAbwOeZcqASPowBARPhlga3DCQdC6mWePi
 rlfHzRi2hBNKOc1q3KmGKDfiwi4x3dcLQYd9O8RmdpAjW5bfem0mJ76w9LRkPZHz
 YiCxnHYa0n4sNscT7HREe+P9/MzD2MQY04m+jhSMo/IHYPec9ap8kFN+de/4P1cT
 J2yTscywsQlC56E/pcRT5X0TYAZz/rsDhmRnIKRYuJBrGIXV8BKdYyqmBrxC7o6/
 K4HvXJtMzkyG/xGj5l4TqTgTlPH0k4iu/bBWvyRjd40v3ZpSq5GqNG+6VX1QfYDW
 ZNa0fviC9uHqbfHijHs9IV1Kdb4bII/xd2eotCUy8jKbikd6FJWUT/XqQB4NGQpW
 PZtgPXVs958vWLG1qrdh2dSMpGR21uPwp9NsqGim/3raQOlDeTUK+x384urqLcU/
 pQT2WROmXw8H9qPPKpkCs9xdhp0ja2TotTJcqH+mNk+r3QzWa4N95rpd9MZKtbyc
 YaQqC5FWru79ZfO53n2PsZidWyHHUS1rxYuYkopeGC7pgmoUdKdHwkzkvFdWLXHE
 Ol8lksYDC5aHiWD6V8Sh
 =smiL
 -----END PGP SIGNATURE-----

Merge tag 'kvm_mips_4.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/kvm-mips

From: James Hogan <james.hogan@imgtec.com>

KVM: MIPS: VZ support, Octeon III, and TLBR

Add basic support for the MIPS Virtualization Module (generally known as
MIPS VZ) in KVM. We primarily support the ImgTec P5600, P6600, I6400,
and Cavium Octeon III cores so far. Support is included for the
following VZ / guest hardware features:
- MIPS32 and MIPS64, r5 (VZ requires r5 or later) and r6
- TLBs with GuestID (IMG cores) or Root ASID Dealias (Octeon III)
- Shared physical root/guest TLB (IMG cores)
- FPU / MSA
- Cop0 timer (up to 1GHz for now due to soft timer limit)
- Segmentation control (EVA)
- Hardware page table walker (HTW) both for root and guest TLB

Also included is a proper implementation of the TLBR instruction for the
trap & emulate MIPS KVM implementation.

Preliminary MIPS architecture changes are applied directly with Ralf's
ack.
2017-04-06 14:47:03 +02:00
Yi Min Zhao a892095013 KVM: s390: introduce adapter interrupt inject function
Inject adapter interrupts on a specified adapter which allows to
retrieve the adapter flags, e.g. if the adapter is subject to AIS
facility or not. And add documentation for this interface.

For adapters subject to AIS, handle the airq injection suppression
for a given ISC according to the interruption mode:
- before injection, if NO-Interruptions Mode, just return 0 and
  suppress, otherwise, allow the injection.
- after injection, if SINGLE-Interruption Mode, change it to
  NO-Interruptions Mode to suppress the following interrupts.

Besides, add tracepoint for suppressed airq and AIS mode transitions.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-04-06 13:15:37 +02:00
Fei Li 5197839354 KVM: s390: introduce ais mode modify function
Provide an interface for userspace to modify AIS
(adapter-interruption-suppression) mode state, and add documentation
for the interface. Allowed target modes are ALL-Interruptions mode
and SINGLE-Interruption mode.

We introduce the 'simm' and 'nimm' fields in kvm_s390_float_interrupt
to store interruption modes for each ISC. Each bit in 'simm' and
'nimm' targets to one ISC, and collaboratively indicate three modes:
ALL-Interruptions, SINGLE-Interruption and NO-Interruptions. This
interface can initiate most transitions between the states; transition
from SINGLE-Interruption to NO-Interruptions via adapter interrupt
injection will be introduced in a following patch. The meaningful
combinations are as follows:

    interruption mode | simm bit | nimm bit
    ------------------|----------|----------
             ALL      |    0     |     0
           SINGLE     |    1     |     0
             NO       |    1     |     1

Besides, add tracepoint to track AIS mode transitions.

Co-Authored-By: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-04-06 13:15:36 +02:00
Fei Li 08fab50da6 KVM: s390: interface for suppressible I/O adapters
In order to properly implement adapter-interruption suppression, we
need a way for userspace to specify which adapters are subject to
suppression. Let's convert the existing (and unused) 'pad' field into
a 'flags' field and define a flag value for suppressible adapters.

Besides, add documentation for the interface.

Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-04-06 13:15:36 +02:00
Alexey Kardashevskiy e5afdf9dd5 powerpc/vfio_spapr_tce: Add reference counting to iommu_table
So far iommu_table obejcts were only used in virtual mode and had
a single owner. We are going to change this by implementing in-kernel
acceleration of DMA mapping requests. The proposed acceleration
will handle requests in real mode and KVM will keep references to tables.

This adds a kref to iommu_table and defines new helpers to update it.
This replaces iommu_free_table() with iommu_tce_table_put() and makes
iommu_free_table() static. iommu_tce_table_get() is not used in this patch
but it will be in the following patch.

Since this touches prototypes, this also removes @node_name parameter as
it has never been really useful on powernv and carrying it for
the pseries platform code to iommu_free_table() seems to be quite
useless as well.

This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-30 21:42:11 +11:00
Alexey Kardashevskiy 11edf116e3 powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposal
At the moment iommu_table can be disposed by either calling
iommu_table_free() directly or it_ops::free(); the only implementation
of free() is in IODA2 - pnv_ioda2_table_free() - and it calls
iommu_table_free() anyway.

As we are going to have reference counting on tables, we need an unified
way of disposing tables.

This moves it_ops::free() call into iommu_free_table() and makes use
of the latter. The free() callback now handles only platform-specific
data.

As from now on the iommu_free_table() calls it_ops->free(), we need
to have it_ops initialized before calling iommu_free_table() so this
moves this initialization in pnv_pci_ioda2_create_table().

This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-30 21:42:11 +11:00
Alexey Kardashevskiy a540aa56ba powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()
In real mode, TCE tables are invalidated using special
cache-inhibited store instructions which are not available in
virtual mode

This defines and implements exchange_rm() callback. This does not
define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
exchange/exchange_rm are only to be used by KVM for VFIO.

The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.

This replaces list_for_each_entry_rcu with its lockless version as
from now on pnv_pci_ioda2_tce_invalidate() can be called in
the real mode too.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-30 21:42:01 +11:00
Stefan Raspl e55fe3cccc tools/kvm_stat: add '%Total' column
Add column '%Total' next to 'Total' for easier comparison of numbers between
hosts.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:33 +02:00
Stefan Raspl 9f114a03c6 tools/kvm_stat: add interactive command 'r'
Provide an interactive command to reset the tracepoint statistics.
Requires some extra work for debugfs, as the counters cannot be reset.

On the up side, this offers us the opportunity to have debugfs values
reset on startup and whenever a filter is modified, becoming consistent
with the tracepoint provider. As a bonus, 'kvmstat -dt' will now provide
useful output, instead of mixing values in totally different orders of
magnitude.
Furthermore, we avoid unnecessary resets when any of the filters is
"changed" interactively to the previous value.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:32 +02:00
Stefan Raspl 4443084fa0 tools/kvm_stat: add interactive command 'c'
Provide a real simple way to erase any active filter.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:32 +02:00
Stefan Raspl f9ff108735 tools/kvm_stat: add option '--guest'
Add a new option '-g'/'--guest' to select a particular process by providing
the QEMU guest name.
Notes:
- The logic to figure out the pid corresponding to the guest name might look
  scary, but works pretty reliably in practice; in the unlikely event that it
  returns add'l flukes, it will bail out and hint at using '-p' instead, no
  harm done.
- Mixing '-g' and '-p' is possible, and the final instance specified on the
  command line is the significant one. This is consistent with current
  behavior for '-p' which, if specified multiple times, also regards the final
  instance as the significant one.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:31 +02:00
Stefan Raspl 645c1728a9 tools/kvm_stat: remove regex filter on empty input
Behavior on empty/0 input for regex and pid filtering was inconsistent, as
the former would keep the current filter, while the latter would (naturally)
remove any pid filtering.
Make things consistent by falling back to the default filter on empty input
for the regex filter dialogue.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:30 +02:00
Stefan Raspl 72187dfa8e tools/kvm_stat: display regex when set to non-default
If a user defines a regex filter through the interactive command, display
the active regex in the header's second line.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:30 +02:00
Stefan Raspl 0152c20f04 tools/kvm_stat: print error messages on faulty pid filter input
Print helpful messages in case users enter invalid input or invalid pids in
the interactive pid filter dialogue.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:29 +02:00
Stefan Raspl be03ea3b77 tools/kvm_stat: remove pid filter on empty input
Improve consistency in the interactive dialogue for pid filtering by
removing any filters on empty input (in addition to entering 0).

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:29 +02:00
Stefan Raspl a24e85f6a6 tools/kvm_stat: display guest name when using pid filter
When running kvm_stat with option '-p' to filter per process, display
the QEMU guest name next to the pid, if available.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Reviewed-By: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:28 +02:00
Stefan Raspl 1eaa2f9022 tools/kvm_stat: document list of interactive commands
Apart from the source code, there does not seem to be a place that documents
the interactive capabilities of kvm_stat yet.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-29 12:01:28 +02:00