OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Paolo Bonzini	1c2af4968e	Merge tag 'kvm-for-3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into kvm-next	2014-03-04 15:58:00 +01:00
Andrew Jones	332967a3ea	x86: kvm: introduce periodic global clock updates commit `0061d53daf` introduced a mechanism to execute a global clock update for a vm. We can apply this periodically in order to propagate host NTP corrections. Also, if all vcpus of a vm are pinned, then without an additional trigger, no guest NTP corrections can propagate either, as the current trigger is only vcpu cpu migration. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-04 11:50:54 +01:00
Andrew Jones	7e44e4495a	x86: kvm: rate-limit global clock updates When we update a vcpu's local clock it may pick up an NTP correction. We can't wait an indeterminate amount of time for other vcpus to pick up that correction, so commit `0061d53daf` introduced a global clock update. However, we can't request a global clock update on every vcpu load either (which is what happens if the tsc is marked as unstable). The solution is to rate-limit the global clock updates. Marcelo calculated that we should delay the global clock updates no more than 0.1s as follows: Assume an NTP correction c is applied to one vcpu, but not the other, then in n seconds the delta of the vcpu system_timestamps will be c * n. If we assume a correction of 500ppm (worst-case), then the two vcpus will diverge 50us in 0.1s, which is a considerable amount. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-03-04 11:50:47 +01:00
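
The C sketch below makes Marcelo's bound from commit 7e44e4495a and the 0.1 s limit concrete. It is illustrative only. `clock_divergence_ns`, `struct clock_update_limiter`, `request_global_update` and `run_due_update` are hypothetical names, and timestamps are plain nanosecond counters supplied by the caller. The limiter is modeled as deferral with coalescing: each request is served within 0.1 s, and a burst of requests collapses into a single update. The commit message does not say that KVM implements it exactly this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper bound on how long a global clock update may be delayed (0.1 s). */
#define GLOBAL_UPDATE_DELAY_NS 100000000ULL

/* Divergence (ns) after n_ns nanoseconds for a rate error of ppm: c * n. */
static uint64_t clock_divergence_ns(uint64_t ppm, uint64_t n_ns)
{
    assert(ppm == 0 || n_ns <= UINT64_MAX / ppm); /* avoid overflow */
    return ppm * n_ns / 1000000ULL;
}

/* Hypothetical per-VM state: at most one pending global update. */
struct clock_update_limiter {
    bool     pending;  /* an update is already scheduled */
    uint64_t due_ns;   /* time at which the scheduled update runs */
};

/*
 * Request a global clock update at now_ns. If one is already pending, the
 * request is coalesced into it; otherwise a new one is scheduled
 * GLOBAL_UPDATE_DELAY_NS later. Returns the time the update will run.
 */
static uint64_t request_global_update(struct clock_update_limiter *l,
                                      uint64_t now_ns)
{
    if (!l->pending) {
        l->pending = true;
        l->due_ns = now_ns + GLOBAL_UPDATE_DELAY_NS;
    }
    return l->due_ns;
}

/* Timer callback: runs the pending update once it is due. */
static bool run_due_update(struct clock_update_limiter *l, uint64_t now_ns)
{
    if (!l->pending || now_ns < l->due_ns)
        return false;
    l->pending = false;  /* here every vcpu would be asked to refresh its clock */
    return true;
}
```

With 500 ppm and 0.1 s, `clock_divergence_ns(500, 100000000)` gives 50000 ns, the 50us quoted in the commit. The sketch defers requests instead of dropping them on purpose. A dropped request could leave an NTP correction unpropagated if no further trigger arrives.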
Andrew Honig	a08d3b3b99	kvm: x86: fix emulator buffer overflow (CVE-2014-0049) The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address. When doing repeated emulated pushes, emulator_read_write sets mmio_needed to 1 on the first one. On a later push, when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_needed is not set to 0. As a result, KVM exits to userspace, and then returns to complete_emulated_mmio. In complete_emulated_mmio vcpu->mmio_cur_fragment is incremented. The termination condition of vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved. The code bounces back and forth to userspace, incrementing mmio_cur_fragment past its buffer. If the guest does nothing else, it eventually leads to a crash on a memcpy from an invalid memory address. However, if guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data that's pointed to by the call to cancel_work_item, which can be used to gain execution. Fixes: `f78146b0f9` Signed-off-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org (3.5+) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-27 19:35:22 +01:00
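
At the heart of CVE-2014-0049 is an equality-based termination test that can be stepped over. The C sketch below is a hypothetical, self-contained model, not the actual patch. `struct mmio_state` and `simulate_completion` only borrow the counter names from the commit message, and `MAX_FRAGMENTS` is an assumed array size. Start from the inconsistent state described above: `mmio_needed` still set and `nr_fragments == 0`. An `==` test then never fires and the index runs off the end of the array, whereas a `>=` test stops after one step.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAGMENTS 2  /* hypothetical size of the fragment array */

/* Hypothetical model of the per-vcpu MMIO bookkeeping. */
struct mmio_state {
    bool     mmio_needed;
    unsigned cur_fragment;
    unsigned nr_fragments;
};

/*
 * Simulate repeated returns from userspace into the completion path.
 * Each step would touch fragments[cur_fragment] and then advance it.
 * use_ge selects the termination test: false = "==", true = ">=".
 * Returns the number of steps until the test fires, or -1 if the next
 * step would index past the fragment array.
 */
static int simulate_completion(struct mmio_state *s, bool use_ge)
{
    assert(s->nr_fragments <= MAX_FRAGMENTS);
    int steps = 0;
    while (s->mmio_needed) {
        if (s->cur_fragment >= MAX_FRAGMENTS)
            return -1;              /* would read fragments[] out of bounds */
        s->cur_fragment++;
        steps++;
        bool done = use_ge ? s->cur_fragment >= s->nr_fragments
                           : s->cur_fragment == s->nr_fragments;
        if (done)
            s->mmio_needed = false;
    }
    return steps;
}
```

An upper-bound comparison is one defensive way to make such a loop terminate. The other half of the problem is keeping `mmio_needed` consistent with `mmio_nr_fragments` in the first place.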
Takuya Yoshikawa	684851a157	KVM: x86: Break kvm_for_each_vcpu loop after finding the VP_INDEX No need to scan the entire VCPU array. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-27 19:25:39 +01:00
Marcelo Tosatti	d3714010c3	KVM: x86: emulator_cmpxchg_emulated should mark_page_dirty emulator_cmpxchg_emulated writes to guest memory, therefore it should update the dirty bitmap accordingly. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-26 10:11:08 +01:00
Liu, Jinsong	390bd528ae	KVM: x86: Enable Intel MPX for guest From 44c2abca2c2eadc6f2f752b66de4acc8131880c4 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:12:31 +0800 Subject: [PATCH v5 3/3] KVM: x86: Enable Intel MPX for guest This patch enables the Intel MPX feature for guests. Signed-off-by: Xudong Hao <xudong.hao@intel.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-25 20:17:12 +01:00
Liu, Jinsong	0dd376e709	KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save From 5d5a80cd172ea6fb51786369bcc23356b1e9e956 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:11:55 +0800 Subject: [PATCH v5 2/3] KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save Add MSR_IA32_BNDCFGS to msrs_to_save, and corresponding logic to kvm_get/set_msr(). Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-25 20:17:09 +01:00
Liu, Jinsong	56c103ec04	KVM: x86: Fix xsave cpuid exposing bug From 00c920c96127d20d4c3bb790082700ae375c39a0 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Fri, 21 Feb 2014 23:47:18 +0800 Subject: [PATCH] KVM: x86: Fix xsave cpuid exposing bug EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable. Bit 63 of XCR0 is reserved for future expansion. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-22 15:53:34 +01:00
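
EBX of CPUID(0xD, 0) reports the XSAVE area size needed for the features currently enabled in XCR0, so it has to be recomputed whenever XCR0 changes. Here is a minimal C sketch, assuming only the x87, SSE and AVX components. x87 and SSE live in the 512-byte legacy region, which together with the 64-byte XSAVE header fills the first 576 bytes. The AVX component occupies 256 bytes at offset 576. Real code would take each component's offset and size from CPUID(0xD, i) rather than from the hard-coded table used here. `xsave_size_for_xcr0` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define XSAVE_LEGACY_AND_HEADER 576u  /* 512-byte legacy area + 64-byte header */

/* Offset/size of an extended state component in the standard XSAVE format. */
struct xstate_component { uint32_t offset, size; };

/* Assumed layout; only AVX (XCR0 bit 2) is modeled. */
static const struct xstate_component xstate_table[] = {
    [2] = { 576u, 256u },  /* AVX: upper halves of YMM0-15 */
};

/* Size in bytes of the XSAVE area required by the features enabled in xcr0. */
static uint32_t xsave_size_for_xcr0(uint64_t xcr0)
{
    assert(xcr0 & 1);  /* x87 state (bit 0) is always enabled */
    uint32_t size = XSAVE_LEGACY_AND_HEADER;
    for (unsigned i = 2; i < sizeof(xstate_table) / sizeof(xstate_table[0]); i++) {
        if (!(xcr0 & (1ULL << i)) || xstate_table[i].size == 0)
            continue;
        uint32_t end = xstate_table[i].offset + xstate_table[i].size;
        if (end > size)
            size = end;
    }
    return size;
}
```

With XCR0 = 0x3 (x87 | SSE) the sketch reports 576 bytes. Enabling AVX (XCR0 = 0x7) grows it to 832.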
Marcelo Tosatti	4f34d683e5	KVM: x86: remove unused last_kernel_ns variable Remove unused last_kernel_ns variable. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-04 04:20:42 +01:00
Paolo Bonzini	5f66b62095	kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef Self explanatory. Reported-by: Radim Krcmar <rkrcmar@redhat.com> Cc: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-29 18:10:45 +01:00
Jan Kiszka	58cb628dbe	KVM: x86: Validate guest writes to MSR_IA32_APICBASE Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This addresses both enabling the x2APIC when it is not supported and all invalid transitions described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> [Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-27 14:39:44 +01:00
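
The checks in commit 58cb628dbe can be sketched in C from the architectural layout of IA32_APIC_BASE rather than from KVM's code. Bit 8 is the BSP flag, bit 10 the x2APIC enable (EXTD) and bit 11 the global enable (EN). The APIC base occupies bits 12 up to MAXPHYADDR - 1, and bits 0-7 and 9 are reserved. EXTD is treated as reserved when the guest CPUID does not advertise x2APIC. The three rejected transitions follow the SDM's x2APIC state diagram. `apicbase_write_ok` and the macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APICBASE_BSP    (1ULL << 8)
#define APICBASE_EXTD   (1ULL << 10)  /* x2APIC enable */
#define APICBASE_ENABLE (1ULL << 11)  /* xAPIC global enable */
#define APICBASE_MODE   (APICBASE_ENABLE | APICBASE_EXTD)

/*
 * Returns true if a guest write of new_val to IA32_APIC_BASE is allowed,
 * given the current value old_val, the guest's MAXPHYADDR and whether the
 * guest CPUID advertises x2APIC.
 */
static bool apicbase_write_ok(uint64_t old_val, uint64_t new_val,
                              unsigned maxphyaddr, bool has_x2apic)
{
    assert(maxphyaddr > 12 && maxphyaddr < 64);  /* keep the shift defined */
    uint64_t reserved = (~0ULL << maxphyaddr) | 0x2ffULL |
                        (has_x2apic ? 0 : APICBASE_EXTD);
    uint64_t old_mode = old_val & APICBASE_MODE;
    uint64_t new_mode = new_val & APICBASE_MODE;

    if (new_val & reserved)
        return false;                 /* reserved bit set */
    if (new_mode == APICBASE_EXTD)
        return false;                 /* EXTD=1 with EN=0 is an invalid state */
    if (old_mode == APICBASE_MODE && new_mode == APICBASE_ENABLE)
        return false;                 /* x2APIC -> xAPIC directly */
    if (old_mode == 0 && new_mode == APICBASE_MODE)
        return false;                 /* disabled -> x2APIC directly */
    return true;
}
```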
Vadim Rozenfeld	b3af1e889e	KVM: x86: mark hyper-v vapic assist page as dirty Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-24 16:51:15 +01:00
Vadim Rozenfeld	b94b64c9a7	KVM: x86: mark hyper-v hypercall page as dirty Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-23 19:00:04 +01:00
Linus Torvalds	7ebd3faa9b	First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) kvm: make KVM_MMU_AUDIT help text more readable KVM: s390: Fix memory access error detection KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: VMX: Fix DR6 update on #DB exception KVM: SVM: Fix reading of DR6 KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS add support for Hyper-V reference time counter KVM: remove useless write to vcpu->hv_clock.tsc_timestamp KVM: x86: fix tsc catchup issue with tsc scaling KVM: x86: limit PIT timer frequency KVM: x86: handle invalid root_hpa everywhere kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub kvm: vfio: silence GCC warning KVM: ARM: Remove duplicate include arm/arm64: KVM: relax the requirements of VMA alignment for THP ...	2014-01-22 21:40:43 -08:00
Jan Kiszka	73aaf249ee	KVM: SVM: Fix reading of DR6 In contrast to VMX, SVM does not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of `020df0794f`. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:10 +01:00
Jan Kiszka	9926c9fdbd	KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS Whenever we change arch.dr7, we also have to call kvm_update_dr7. In case guest debugging is off, this will synchronize the new state into hardware. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:09 +01:00
Vadim Rozenfeld	e984097b55	add support for Hyper-V reference time counter Signed-off: Peter Lieven <pl@kamp.de> Signed-off: Gleb Natapov Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com> After some consideration I decided to submit only Hyper-V reference counters support this time. I will submit iTSC support as a separate patch as soon as it is ready. v1 -> v2 1. mark TSC page dirty as suggested by Eric Northup <digitaleric@google.com> and Gleb 2. disable local irq when calling get_kernel_ns, as it was done by Peter Lieven <pl@amp.de> 3. move check for TSC page enable from second patch to this one. v3 -> v4 Get rid of ref counter offset. v4 -> v5 replace __copy_to_user with kvm_write_guest when updating iTSC page. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-17 10:22:08 +01:00
Paolo Bonzini	aab6d7ce37	KVM: remove useless write to vcpu->hv_clock.tsc_timestamp After the previous patch from Marcelo, the comment before this write became obsolete. In fact, the write is unnecessary. The calls to kvm_write_tsc ultimately result in a master clock update as soon as all TSCs agree and the master clock is re-enabled. This master clock update will rewrite tsc_timestamp. So, together with the comment, delete the dead write too. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 18:08:25 +01:00
Marcelo Tosatti	f25e656d31	KVM: x86: fix tsc catchup issue with tsc scaling To fix a problem related to different resolution of TSC and system clock, the offset in TSC units is approximated by delta = vcpu->hv_clock.tsc_timestamp - vcpu->last_guest_tsc, where tsc_timestamp is the guest TSC value at the last kvm_guest_time_update call and last_guest_tsc is the guest TSC value at the last VM-exit. Delta is then later scaled using mult,shift pair found in hv_clock structure (which is correct against tsc_timestamp in that structure). However, if a frequency change is performed between these two points, this delta is measured using different TSC frequencies, but scaled using mult,shift pair for one frequency only. The end result is an incorrect delta. The bug which this code works around is not the only cause for clock backwards events. The global accumulator is still necessary, so remove the max_kernel_ns fix and rely on the global accumulator for no clock backwards events. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 13:44:46 +01:00
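
A fixed-point cycles-to-nanoseconds conversion reproduces the effect of measuring a delta across a frequency change. In the C sketch below, `scale_tsc_delta` is a hypothetical helper modeled on a (mult, shift) pair. It first shifts the delta by `shift`, then multiplies it by the 32.32 fraction `mul_frac` and shifts right by 32. A 1 GHz clock is encoded as 0.5 × 2^1 and a 2 GHz clock as 0.5 × 2^0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a TSC delta to nanoseconds: ns = ((delta << shift) * mul_frac) >> 32
 * (right shift instead when shift < 0). The 64x32-bit product is split into
 * high and low halves so the result is exact without 128-bit arithmetic.
 */
static uint64_t scale_tsc_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    assert(shift > -32 && shift < 32);
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffULL;
    return hi * mul_frac + ((lo * mul_frac) >> 32);
}
```

Consider 1000 cycles at 1 GHz followed by 1000 cycles at 2 GHz, which is 1500 ns of real time. The combined 2000-cycle delta scales to 1000 ns with the 2 GHz pair and to 2000 ns with the 1 GHz pair. Neither is correct.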
Marcelo Tosatti	9ed96e87c5	KVM: x86: limit PIT timer frequency Limit PIT timer frequency similarly to the limit applied by LAPIC timer. Cc: stable@kernel.org Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-01-15 12:43:54 +01:00
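
Limiting the PIT amounts to clamping the period derived from the reload value. The C sketch below assumes the standard 8254 input clock of 1193182 Hz and leaves the minimum period as a caller-supplied parameter. The value KVM uses is not stated in the commit message, so 500000 ns in the usage below is only an example. `pit_period_ns` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_INPUT_HZ 1193182ULL  /* standard 8254 input clock */

/*
 * Period in nanoseconds for a PIT reload value, clamped to min_period_ns so
 * a guest cannot program an arbitrarily fast periodic timer. A reload value
 * of 0 means 65536 on the 8254.
 */
static uint64_t pit_period_ns(uint32_t count, uint64_t min_period_ns)
{
    assert(count <= 0xffff);
    uint64_t n = count ? count : 65536;
    uint64_t period = n * 1000000000ULL / PIT_INPUT_HZ;
    return period < min_period_ns ? min_period_ns : period;
}
```

A reload value of 1 would otherwise request a period of about 838 ns. With a 500 us floor it is clamped to 500000 ns. A reload value of 0 (65536) yields roughly 54.9 ms and is left unchanged.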
Takuya Yoshikawa	9357d93952	KVM: x86: Add comment on vcpu_enter_guest()'s return value Giving proper names to the 0 and 1 was once suggested. But since 0 is returned to the userspace, giving it another name can introduce extra confusion. This patch just explains the meanings instead. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-13 14:23:54 +01:00
Takuya Yoshikawa	c08ac06ab3	KVM: Use cond_resched() directly and remove useless kvm_resched() Since the commit `15ad7146` ("KVM: Use the scheduler preemption notifiers to make kvm preemptible"), the remaining stuff in this function is a simple cond_resched() call with an extra need_resched() check which was there to avoid dropping VCPUs unnecessarily. Now it is meaningless. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-13 14:23:45 +01:00
Andy Honig	fda4e2e855	KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patch converts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and older Intel processors that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: `b93463aa59` ('KVM: Accelerated apic support') Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-12-12 22:39:46 +01:00
Michael S. Tsirkin	01b71917b5	kvm: optimize out smp_mb after srcu_read_unlock I noticed that srcu_read_lock/unlock both have a memory barrier, so just by moving srcu_read_unlock earlier we can get rid of one call to smp_mb() using smp_mb__after_srcu_read_unlock instead. Unsurprisingly, the gain is small but measurable using the unit test microbenchmark: before, vmcall in the ballpark of 1410 cycles; after, vmcall in the ballpark of 1360 cycles. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-11-06 09:32:31 +02:00
Gleb Natapov	95f328d3ad	Merge branch 'kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into queue Conflicts: arch/powerpc/include/asm/processor.h	2013-11-04 10:20:57 +02:00
Paolo Bonzini	c67a04cb9a	KVM: x86: fix KVM_SET_XCRS loop The loop was always using 0 as the index. This means that any rubbish after the first element of the array went undetected. It seems reasonable to assume that no KVM userspace did that. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-31 11:31:19 +01:00
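
The bug in commit c67a04cb9a is a loop whose body ignores its own index. Below is a stripped-down C illustration with hypothetical types. It is not the KVM_SET_XCRS handler: here any entry that does not name XCR0 (index 0) is treated as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the XCR array passed by userspace. */
struct xcr_entry { uint32_t xcr; uint64_t value; };

/* Buggy shape: every iteration inspects entry 0. */
static bool all_xcrs_valid_buggy(const struct xcr_entry *e, unsigned n)
{
    assert(n == 0 || e != NULL);
    for (unsigned i = 0; i < n; i++)
        if (e[0].xcr != 0)   /* bug: should be e[i] */
            return false;
    return true;
}

/* Fixed shape: every entry is checked. */
static bool all_xcrs_valid(const struct xcr_entry *e, unsigned n)
{
    assert(n == 0 || e != NULL);
    for (unsigned i = 0; i < n; i++)
        if (e[i].xcr != 0)   /* only XCR0 is accepted in this sketch */
            return false;
    return true;
}
```

Given `{ {0, 1}, {7, 0} }`, the buggy version accepts the array because it looks at the first entry twice. The fixed version rejects the second entry.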
Paolo Bonzini	46c34cb059	KVM: x86: fix KVM_SET_XCRS for CPUs that do not support XSAVE The KVM_SET_XCRS ioctl must accept anything that KVM_GET_XCRS could return. XCR0's bit 0 is always 1 in real processors with XSAVE, and KVM_GET_XCRS will always leave bit 0 set even if the emulated processor does not have XSAVE. So, KVM_SET_XCRS must ignore that bit when checking for attempts to enable unsupported save states. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-31 11:30:46 +01:00
Alex Williamson	e0f0bbc527	kvm: Create non-coherent DMA registeration We currently use some ad-hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 19:02:23 +01:00
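
The registration interface in commit e0f0bbc527 is essentially reference counting. Each source of non-coherent DMA (a legacy-assigned device, a VFIO group) registers once and unregisters once, and emulation only asks whether the count is non-zero. Below is a C11 sketch with hypothetical names (`struct vm_dma_state`, `register_noncoherent_dma` and friends). The commit message does not spell out which instructions consult the predicate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VM state: number of registered non-coherent DMA sources. */
struct vm_dma_state { atomic_int noncoherent_count; };

static void register_noncoherent_dma(struct vm_dma_state *vm)
{
    atomic_fetch_add(&vm->noncoherent_count, 1);
}

static void unregister_noncoherent_dma(struct vm_dma_state *vm)
{
    int prev = atomic_fetch_sub(&vm->noncoherent_count, 1);
    assert(prev > 0);  /* catches an unbalanced unregister */
    (void)prev;        /* silence unused-variable warnings under NDEBUG */
}

/* Emulation paths that depend on DMA coherency consult this predicate. */
static bool has_noncoherent_dma(struct vm_dma_state *vm)
{
    return atomic_load(&vm->noncoherent_count) > 0;
}
```

A counter rather than a flag lets several independent sources come and go without coordinating with each other. This matches the commit's assumption that any VFIO group may introduce non-coherent DMA.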
Alex Williamson	d96eb2c6f4	kvm/x86: Convert iommu_flags to iommu_noncoherent Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 19:02:13 +01:00
Borislav Petkov	b51e974fcd	kvm, emulator: Rename VendorSpecific flag Call it EmulateOnUD which is exactly what we're trying to do with vendor-specific instructions. Rename ->only_vendor_specific_insn to something shorter, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 18:54:40 +01:00
Borislav Petkov	1ce19dc16c	kvm, emulator: Use opcode length Add a field to the current emulation context which contains the instruction opcode length. This will streamline handling of opcodes of different length. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 18:54:39 +01:00
Borislav Petkov	9c15bb1d0a	kvm: Add KVM_GET_EMULATED_CPUID Add a kvm ioctl which states which system functionality kvm emulates. The format used is that of CPUID and we return the corresponding CPUID bits set for which we do emulate functionality. Make sure ->padding is being passed on clean from userspace so that we can use it for something in the future, after the ioctl gets cast in stone. s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-30 18:54:39 +01:00
Aneesh Kumar K.V	5587027ce9	kvm: Add struct kvm arg to memslot APIs We will use that in the later patch to find the kvm ops handler Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-10-17 15:49:23 +02:00
chai wen	f2e106692d	KVM: Drop FOLL_GET in GUP when doing async page fault Page pinning is not mandatory in kvm async page fault processing since after async page fault event is delivered to a guest it accesses page once again and does its own GUP. Drop the FOLL_GET flag in GUP in async_pf code, and do some simplifying in check/clear processing. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Gu zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-15 13:43:37 +03:00
Paolo Bonzini	8a3c1a3347	KVM: mmu: change useless int return types to void kvm_mmu initialization is mostly filling in function pointers, there is no way for it to fail. Clean up unused return values. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 15:44:02 +03:00
Paolo Bonzini	d8d173dab2	KVM: mmu: remove uninteresting MMU "new_cr3" callbacks The new_cr3 MMU callback has been a wrapper for mmu_free_roots since commit `e676505` (KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation, 2012-07-08). The commit message mentioned that "mmu_free_roots() is somewhat of an overkill, but fixing that is more complicated and will be done after this minimal fix". One year has passed, and no one really felt the need to do a different fix. Wrap the call with a kvm_mmu_new_cr3 function for clarity, but remove the callback. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 15:43:59 +03:00
Paolo Bonzini	4344ee981e	KVM: x86: only copy XSAVE state for the supported features This makes the interface more deterministic for userspace, which can expect (after configuring only the features it supports) to get exactly the same state from the kernel, independent of the host CPU and kernel version. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 12:29:09 +03:00
Paolo Bonzini	d7876f1be4	KVM: x86: prevent setting unsupported XSAVE states A guest can still attempt to save and restore XSAVE states even if they have been masked in CPUID leaf 0Dh. This usually is not visible to the guest, but is still wrong: "Any attempt to set a reserved bit (as determined by the contents of EAX and EDX after executing CPUID with EAX=0DH, ECX= 0H) in XCR0 for a given processor will result in a #GP exception". The patch also performs the same checks as __kvm_set_xcr in KVM_SET_XSAVE. This catches migration from newer to older kernel/processor before the guest starts running. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-10-03 12:29:07 +03:00
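
The architectural rules for XSETBV with ECX = 0 can be sketched in C. Bit 0 (x87) must be set, and AVX state (bit 2) cannot be enabled without SSE state (bit 1). Any bit outside the guest-visible mask from CPUID leaf 0Dh is reserved. `xcr0_write_ok` and the macro names are hypothetical, and only the first three state components are modeled. A `false` return stands for the #GP that the quoted SDM text requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_FP  (1ULL << 0)  /* x87 */
#define XFEATURE_SSE (1ULL << 1)
#define XFEATURE_YMM (1ULL << 2)  /* AVX */

/*
 * Returns true if writing xcr0 would be accepted, given the feature mask
 * exposed to the guest in CPUID leaf 0Dh (EDX:EAX of sub-leaf 0).
 */
static bool xcr0_write_ok(uint64_t xcr0, uint64_t guest_supported)
{
    assert(guest_supported & XFEATURE_FP);  /* x87 is always supported */
    if (!(xcr0 & XFEATURE_FP))
        return false;                       /* bit 0 must be 1 */
    if ((xcr0 & XFEATURE_YMM) && !(xcr0 & XFEATURE_SSE))
        return false;                       /* AVX state requires SSE state */
    if (xcr0 & ~guest_supported)
        return false;                       /* reserved for this guest */
    return true;
}
```

Suppose AVX is masked out of CPUID (mask 0x3). A guest write of 0x7 is then rejected even if the host CPU supports AVX, which is the case the commit addresses.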
Paolo Bonzini	2f303b74a6	KVM: Convert kvm_lock back to non-raw spinlock In commit `e935b8372c` ("KVM: Convert kvm_lock to raw_spinlock"), the kvm_lock was made a raw lock. However, the kvm mmu_shrink() function tries to grab the (non-raw) mmu_lock within the scope of the raw locked kvm_lock being held. This leads to the following: BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0 Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm] Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt Call Trace: [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160 [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50 [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm] [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0 [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260 [<ffffffff8111824a>] balance_pgdat+0x54a/0x730 [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0 [<ffffffff811185bf>] kswapd+0x18f/0x490 [<ffffffff81070961>] ? get_parent_ip+0x11/0x50 [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff81118430>] ? balance_pgdat+0x730/0x730 [<ffffffff81060d2b>] kthread+0xdb/0xe0 [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100 [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10 [<ffffffff81060c50>] ? __init_kthread_worker+0x After the previous patch, kvm_lock need not be a raw spinlock anymore, so change it back. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-30 09:21:51 +02:00
Marcelo Tosatti	2e762ff79f	KVM: x86: update masterclock when kvmclock_offset is calculated (v2) The offset to add to the hosts monotonic time, kvmclock_offset, is calculated against the monotonic time at KVM_SET_CLOCK ioctl time. Request a master clock update at this time, to reduce a potentially unbounded difference between the values of the masterclock and the clock value used to calculate kvmclock_offset. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-28 17:36:11 +03:00
Paolo Bonzini	0912c9771e	KVM: x86: add comments where MMIO does not return to the emulator Support for single-step in the emulator (new in 3.12) does not work for MMIO or PIO writes, because they are completed without returning to the emulator. This is not worse than what we had in 3.11; still, add comments so that the issue is not forgotten. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-28 17:15:54 +03:00
Raghavendra K T	24d2166beb	kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic Note that we are using APIC_DM_REMRD which has reserved usage. In future if APIC_DM_REMRD usage is standardized, then we should find some other way or go back to old method. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-26 12:47:09 +03:00
Srivatsa Vaddagiri	6aef266c6e	kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state. The presence of these hypercalls is indicated to guest via kvm_feature_pv_unhalt. Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration. During migration, any vcpu that got kicked but did not become runnable (still in halted state) should be runnable after migration. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com> [Raghu: Apic related changes, folding pvunhalted into vcpu_runnable Added flags for future use (suggested by Gleb)] [ Raghu: fold pv_unhalt flag as suggested by Eric Northup] Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Gleb Natapov <gleb@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-08-26 12:47:09 +03:00
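
The migration rule from commit 6aef266c6e can be sketched in C. This model is hypothetical: `struct vcpu_model`, `kick_vcpu`, `vcpu_runnable`, `reported_mp_state` and the two-value enum are illustrative names, not KVM's. A kick sets a flag on the target, and a halted vCPU with that flag counts as runnable. The state reported to userspace folds the flag into RUNNABLE so the destination host does not restore the vCPU as halted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of vCPU multiprocessing states. */
enum mp_state { MP_STATE_RUNNABLE, MP_STATE_HALTED };

struct vcpu_model {
    enum mp_state mp_state;
    bool pv_unhalted;  /* set when another vCPU issued a kick hypercall */
};

/* Kick: mark the target so it leaves halt at its next runnability check. */
static void kick_vcpu(struct vcpu_model *target)
{
    target->pv_unhalted = true;
}

/* A halted vCPU with a pending kick counts as runnable. */
static bool vcpu_runnable(const struct vcpu_model *v)
{
    return v->mp_state == MP_STATE_RUNNABLE || v->pv_unhalted;
}

/* State reported to userspace: a pending kick is folded into RUNNABLE. */
static enum mp_state reported_mp_state(const struct vcpu_model *v)
{
    assert(v->mp_state == MP_STATE_RUNNABLE || v->mp_state == MP_STATE_HALTED);
    if (v->mp_state == MP_STATE_HALTED && v->pv_unhalted)
        return MP_STATE_RUNNABLE;
    return v->mp_state;
}
```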
Nadav Har'El	b7e914501c	nEPT: Fix wrong test in kvm_set_cr3 kvm_set_cr3() attempts to check if the new cr3 is a valid guest physical address. The problem is that with nested EPT, cr3 is an L2 physical address, not an L1 physical address as this test expects. As the comment above this test explains, it isn't necessary, and doesn't correspond to anything a real processor would do. So this patch removes it. Note that this wrong test could have also theoretically caused problems in nested NPT, not just in nested EPT. However, in practice, the problem was avoided: nested_svm_vmexit()/vmrun() do not call kvm_set_cr3 in the nested NPT case, and instead set the vmcb (and arch.cr3) directly, thus circumventing the problem. Additional potential calls to the buggy function are avoided in that we don't trap cr3 modifications when nested NPT is enabled. However, because in nested VMX we did want to use kvm_set_cr3() (as requested in Avi Kivity's review of the original nested VMX patches), we can't avoid this problem and need to fix it. Reviewed-by: Orit Wasserman <owasserm@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-08-07 15:57:35 +02:00
Paolo Bonzini	663f4c61b8	KVM: x86: handle singlestep during emulation This lets debugging work better during emulation of invalid guest state. This time the check is done after emulation, but before writeback of the flags; we need to check the flags before execution of the instruction, and we cannot check singlestep_rip because the CS base may have already been modified. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Conflicts: arch/x86/kvm/x86.c	2013-07-29 09:01:14 +02:00
Paolo Bonzini	4a1e10d5b5	KVM: x86: handle hardware breakpoints during emulation This lets debugging work better during emulation of invalid guest state. The check is done before emulating the instruction, and (in the case of guest debugging) reuses EMULATE_DO_MMIO to exit with KVM_EXIT_DEBUG. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-29 09:01:14 +02:00
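
The pre-emulation check amounts to comparing the instruction address with DR0-DR3 under the control of DR7. The C sketch below follows the architectural DR7 layout. Bits 0-7 hold an L/G enable pair per breakpoint. Bits 16-31 hold a 4-bit R/W+LEN field per breakpoint, which is 0 for an instruction breakpoint. `match_hw_breakpoints` is a hypothetical name. The result has bit i set for each matching breakpoint, in the style of DR6's B0-B3 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a DR6-style mask of the enabled DR0-DR3 breakpoints whose address
 * equals addr and whose R/W+LEN nibble equals type (0 = instruction fetch).
 */
static uint32_t match_hw_breakpoints(uint64_t addr, uint32_t type,
                                     uint32_t dr7, const uint64_t db[4])
{
    assert(type <= 15);
    uint32_t hits = 0;
    uint32_t enable = dr7;       /* Ln/Gn pairs in bits 0-7 */
    uint32_t rwlen = dr7 >> 16;  /* R/Wn + LENn nibbles in bits 16-31 */

    for (int i = 0; i < 4; i++, enable >>= 2, rwlen >>= 4)
        if ((enable & 3) && (rwlen & 15) == type && db[i] == addr)
            hits |= 1u << i;
    return hits;
}
```

A non-zero result before emulating an instruction is the condition under which a debug exit (or an injected #DB) would be raised.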
Paolo Bonzini	ac0a48c39a	KVM: x86: rename EMULATE_DO_MMIO The next patch will reuse it for other userspace exits than MMIO, namely debug events. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-29 09:01:14 +02:00
Nadav Har'El	b3897a49e2	KVM: nVMX: Fix read/write to MSR_IA32_FEATURE_CONTROL Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment. This patch emulates this MSR in nested_vmx, with a default value of 0x0. BIOS should set it to 0x5 before VMXON. After the lock bit is set, any write to it causes #GP(0). A separate QEMU patch is also needed to handle emulation of reset and migration: a vCPU reset should clear this MSR, and migration should preserve its value. This patch is based on Nadav's previous commit. http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478 Signed-off-by: Nadav Har'El <nyh@math.technion.ac.il> Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-07-18 12:29:29 +02:00
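
The MSR semantics in commit b3897a49e2 fit in a few lines of C. This sketch uses hypothetical names and models only bits 0 and 2 of IA32_FEATURE_CONTROL. The MSR starts at 0, and a write fails once the lock bit is set. VMXON outside SMX is allowed only when both the lock bit and bit 2 are set, which is why BIOS writes 0x5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCKED            (1ULL << 0)
#define FEATURE_CONTROL_VMXON_OUTSIDE_SMX (1ULL << 2)

/* Hypothetical per-vCPU copy of the emulated MSR; resets to 0. */
struct nested_msr_state { uint64_t feature_control; };

/*
 * Emulate a guest WRMSR to IA32_FEATURE_CONTROL. Returns false when the
 * write must fail with #GP(0), i.e. once the lock bit has been set.
 */
static bool write_feature_control(struct nested_msr_state *s, uint64_t val)
{
    if (s->feature_control & FEATURE_CONTROL_LOCKED)
        return false;
    s->feature_control = val;
    return true;
}

/* VMXON outside SMX requires the MSR to be locked with bit 2 set. */
static bool vmxon_allowed(const struct nested_msr_state *s)
{
    const uint64_t need = FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_OUTSIDE_SMX;
    return (s->feature_control & need) == need;
}
```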
Mathias Krause	6b61edf765	KVM: x86: Drop useless cast Void pointers don't need no casting, drop it. Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-07-18 12:29:29 +02:00

885 Commits