OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Alexey Kardashevskiy	121f80ba68	KVM: PPC: VFIO: Add in-kernel acceleration for VFIO This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO without passing them to user space which saves time on switching to user space and back. This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM. KVM tries to handle a TCE request in the real mode, if failed it passes the request to the virtual mode to complete the operation. If it a virtual mode handler fails, the request is passed to the user space; this is not expected to happen though. To avoid dealing with page use counters (which is tricky in real mode), this only accelerates SPAPR TCE IOMMU v2 clients which are required to pre-register the userspace memory. The very first TCE request will be handled in the VFIO SPAPR TCE driver anyway as the userspace view of the TCE table (iommu_table::it_userspace) is not allocated till the very first mapping happens and we cannot call vmalloc in real mode. If we fail to update a hardware IOMMU table unexpected reason, we just clear it and move on as there is nothing really we can do about it - for example, if we hot plug a VFIO device to a guest, existing TCE tables will be mirrored automatically to the hardware and there is no interface to report to the guest about possible failures. This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd and associates a physical IOMMU table with the SPAPR TCE table (which is a guest view of the hardware IOMMU table). The iommu_table object is cached and referenced so we do not have to look up for it in real mode. This does not implement the UNSET counterpart as there is no use for it - once the acceleration is enabled, the existing userspace won't disable it unless a VFIO container is destroyed; this adds necessary cleanup to the KVM_DEV_VFIO_GROUP_DEL handler. This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user space. This adds real mode version of WARN_ON_ONCE() as the generic version causes problems with rcu_sched. Since we testing what vmalloc_to_phys() returns in the code, this also adds a check for already existing vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect(). This finally makes use of vfio_external_user_iommu_id() which was introduced quite some time ago and was considered for removal. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:26 +10:00
Marc Zyngier	cffcd9df10	KVM: arm/arm64: vgic-v3: Fix off-by-one LR access When iterating over the used LRs, be careful not to try to access an unused LR, or even an unimplemented one if you're unlucky... Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-19 17:28:38 +02:00
Marc Zyngier	ff567614d5	KVM: arm/arm64: vgic-v3: De-optimize VMCR save/restore when emulating a GICv2 When emulating a GICv2-on-GICv3, special care must be taken to only save/restore VMCR_EL2 when ICC_SRE_EL1.SRE is cleared. Otherwise, all Group-0 interrupts end-up being delivered as FIQ, which is probably not what the guest expects, as demonstrated here with an unhappy EFI: FIQ Exception at 0x000000013BD21CC4 This means that we cannot perform the load/put trick when dealing with VMCR_EL2 (because the host has SRE set), and we have to deal with it in the world-switch. Fortunately, this is not the most common case (modern guests should be able to deal with GICv3 directly), and the performance is not worse than what it was before the VMCR optimization. Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-19 17:28:38 +02:00
David Hildenbrand	8c6b7828c2	KVM: x86: cleanup return handling in setup_routing_entry() Let's drop the goto and return the error value directly. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-12 20:17:14 +02:00
David Hildenbrand	993225adf4	KVM: x86: rename kvm_vcpu_request_scan_ioapic() Let's rename it into a proper arch specific callback. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-12 20:17:14 +02:00
David Hildenbrand	1df6ddede1	KVM: x86: race between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP Avoid races between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP by taking the kvm->lock when setting up routes. If KVM_CREATE_IRQCHIP fails, KVM_SET_GSI_ROUTING could have already set up routes pointing at pic/ioapic, being silently removed already. Also, as a side effect, this patch makes sure that KVM_SET_GSI_ROUTING and KVM_CAP_SPLIT_IRQCHIP cannot run in parallel. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-12 20:17:13 +02:00
Christoffer Dall	3dbbdf7863	KVM: arm/arm64: Report PMU overflow interrupts to userspace irqchip When not using an in-kernel VGIC, but instead emulating an interrupt controller in userspace, we should report the PMU overflow status to that userspace interrupt controller using the KVM_CAP_ARM_USER_IRQ feature. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:39 -07:00
Alexander Graf	d9e1397783	KVM: arm/arm64: Support arch timers with a userspace gic If you're running with a userspace gic or other interrupt controller (that is no vgic in the kernel), then you have so far not been able to use the architected timers, because the output of the architected timers, which are driven inside the kernel, was a kernel-only construct between the arch timer code and the vgic. This patch implements the new KVM_CAP_ARM_USER_IRQ feature, where we use a side channel on the kvm_run structure, run->s.regs.device_irq_level, to always notify userspace of the timer output levels when using a userspace irqchip. This works by ensuring that before we enter the guest, if the timer output level has changed compared to what we last told userspace, we don't enter the guest, but instead return to userspace to notify it of the new level. If we are exiting, because of an MMIO for example, and the level changed at the same time, the value is also updated and userspace can sample the line as it needs. This is nicely achieved simply always updating the timer_irq_level field after the main run loop. Note that the kvm_timer_update_irq trace event is changed to show the host IRQ number for the timer instead of the guest IRQ number, because the kernel no longer know which IRQ userspace wires up the timer signal to. Also note that this patch implements all required functionality but does not yet advertise the capability. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:38 -07:00
Christoffer Dall	b22e7df2d8	KVM: arm/arm64: Cleanup the arch timer code's irqchip checking Currently we check if we have an in-kernel irqchip and if the vgic was properly implemented several places in the arch timer code. But, we already predicate our enablement of the arm timers on having a valid and initialized gic, so we can simply check if the timers are enabled or not. This also gets rid of the ugly "error that's not an error but used to signal that the timer shouldn't poke the gic" construct we have. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:37 -07:00
Christoffer Dall	8ac76ef4b5	KVM: arm/arm64: vgic: Improve sync_hwstate performance There is no need to call any functions to fold LRs when we don't use any LRs and we don't need to mess with overflow flags, take spinlocks, or prune the AP list if the AP list is empty. Note: list_empty is a single atomic read (uses READ_ONCE) and can therefore check if a list is empty or not without the need to take the spinlock protecting the list. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:49:12 -07:00
Christoffer Dall	0b09b6e519	KVM: arm/arm64: vgic: Don't check vgic_initialized in sync/flush Now when we do an early init of the static parts of the VGIC data structures, we can do things like checking if the AP lists are empty directly without having to explicitly check if the vgic is initialized and reduce a bit of work in our critical path. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:49:11 -07:00
Christoffer Dall	966e014919	KVM: arm/arm64: vgic: Implement early VGIC init functionality Implement early initialization for both the distributor and the CPU interfaces. The basic idea is that even though the VGIC is not functional or not requested from user space, the critical path of the run loop can still call VGIC functions that just won't do anything, without them having to check additional initialization flags to ensure they don't look at uninitialized data structures. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:49:11 -07:00
Christoffer Dall	096f31c436	KVM: arm/arm64: vgic: Get rid of MISR and EISR fields We don't use these fields anymore so let's nuke them completely. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:10 -07:00
Christoffer Dall	b6095b084d	KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state Now when we don't look at the MISR and EISR values anymore, we can get rid of the logic to save them in the GIC save/restore code. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:09 -07:00
Christoffer Dall	af0614991a	KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation Since we always read back the LRs that we wrote to the guest and the MISR and EISR registers simply provide a summary of the configuration of the bits in the LRs, there is really no need to read back those status registers and process them. We might as well just signal the notifyfd when folding the LR state and save some cycles in the process. We now clear the underflow bit in the fold_lr_state functions as we only need to clear this bit if we had used all the LRs, so this is as good a place as any to do that work. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:49:07 -07:00
Christoffer Dall	90cac1f52a	KVM: arm/arm64: vgic: Only set underflow when actually out of LRs We currently assume that all the interrupts in our AP list will be queued to LRs, but that's not necessarily the case, because some of them could have been migrated away to different VCPUs and only the VCPU thread itself can remove interrupts from its AP list. Therefore, slightly change the logic to only setting the underflow interrupt when we actually run out of LRs. As it turns out, this allows us to further simplify the handling in vgic_sync_hwstate in later patches. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:45:32 -07:00
Christoffer Dall	00dafa0fcf	KVM: arm/arm64: vgic: Get rid of live_lrs There is no need to calculate and maintain live_lrs when we always populate the lowest numbered LRs first on every entry and clear all LRs on every exit. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-04-09 07:45:31 -07:00
Shih-Wei Li	f6769581e9	KVM: arm/arm64: vgic: Avoid flushing vgic state when there's no pending IRQ We do not need to flush vgic states in each world switch unless there is pending IRQ queued to the vgic's ap list. We can thus reduce the overhead by not grabbing the spinlock and not making the extra function call to vgic_flush_lr_state. Note: list_empty is a single atomic read (uses READ_ONCE) and can therefore check if a list is empty or not without the need to take the spinlock protecting the list. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:45:31 -07:00
Christoffer Dall	328e566479	KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put We don't have to save/restore the VMCR on every entry to/from the guest, since on GICv2 we can access the control interface from EL1 and on VHE systems with GICv3 we can access the control interface from KVM running in EL2. GICv3 systems without VHE becomes the rare case, which has to save/restore the register on each round trip. Note that userspace accesses may see out-of-date values if the VCPU is running while accessing the VGIC state via the KVM device API, but this is already the case and it is up to userspace to quiesce the CPUs before reading the CPU registers from the GIC for an up-to-date view. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-09 07:45:22 -07:00
Paolo Bonzini	4b4357e025	kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-07 16:49:01 +02:00
Paolo Bonzini	3042255899	kvm: make KVM_CAP_COALESCED_MMIO architecture agnostic Remove code from architecture files that can be moved to virt/kvm, since there is already common code for coalesced MMIO. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed a pointless 'break' after 'return'.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-07 16:49:00 +02:00
Paolo Bonzini	ad6260da1e	KVM: x86: drop legacy device assignment Legacy device assignment has been deprecated since 4.2 (released 1.5 years ago). VFIO is better and everyone should have switched to it. If they haven't, this should convince them. :) Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-07 16:49:00 +02:00
Radim Krčmář	6fd6410311	KVM/ARM Fixes for v4.11-rc6 Fixes include: - Fix a problem with GICv3 userspace save/restore - Clarify GICv2 userspace save/restore ABI - Be more careful in clearing GIC LRs - Add missing synchronization primitive to our MMU handling code -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJY5MItAAoJEEtpOizt6ddy4mUH/1Z2rt2mUAYFQpWD/vy9WMxf zJKMtcLlZZGjeU78zFfWuOxEo1bbDO+tOTV1docNnY8xjyszCZ5XKOqMeo2a7Vfh 1QYHxJTOmgxcRmMsOnJpqUXhhYm9hDxrbU88U/wvoNllLjWBea01ZXiJbWFPBssT jrdtcCVstDGp3x3D91RgYNNzj9jNw80RBekACZZwYokDRpBZyUb8DYKfUgABFEKT UPiHrxb8UOVqvbCuXMBNzhUZcuMoAh3oY02R9sV7u1QOXAJYfRV4fOV12fIcYbHf tnyU8cCxEkSI1pHrpVG6SStcMt8yznQ+UPo0okQNBJXim2yI8+QKHtQlvx7Tjo8= =tPDd -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm From: Christoffer Dall <cdall@linaro.org> KVM/ARM Fixes for v4.11-rc6 Fixes include: - Fix a problem with GICv3 userspace save/restore - Clarify GICv2 userspace save/restore ABI - Be more careful in clearing GIC LRs - Add missing synchronization primitive to our MMU handling code	2017-04-05 16:27:47 +02:00
Christoffer Dall	6d56111c92	KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI As an oversight, for GICv2, we accidentally export the GICC_PMR register in the format of the GICH_VMCR.VMPriMask field in the lower 5 bits of a word, meaning that userspace must always use the lower 5 bits to communicate with the KVM device and must shift the value left by 3 places to obtain the actual priority mask level. Since GICv3 supports the full 8 bits of priority masking in the ICH_VMCR, we have to fix the value we export when emulating a GICv2 on top of a hardware GICv3 and exporting the emulated GICv2 state to userspace. Take the chance to clarify this aspect of the ABI. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>	2017-04-04 14:33:59 +02:00
Christoffer Dall	5b0d2cc280	KVM: arm64: Ensure LRs are clear when they should be We currently have some code to clear the list registers on GICv3, but we never call this code, because the caller got nuked when removing the old vgic. We also used to have a similar GICv2 part, but that got lost in the process too. Let's reintroduce the logic for GICv2 and call the logic when we initialize the use of hypervisors on the CPU, for example when first loading KVM or when exiting a low power state. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-04-04 14:33:58 +02:00
Herongguang (Stephen)	0292e169b2	KVM: pci-assign: do not map smm memory slot pages in vt-d page tables or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when destroy VM. This is consistent with current vfio implementation. Signed-off-by: herongguang <herongguang.he@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-03-28 10:08:54 +02:00
David Hildenbrand	90db10434b	KVM: kvm_io_bus_unregister_dev() should never fail No caller currently checks the return value of kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on freeing their device. A stale reference will remain in the io_bus, getting at least used again, when the iobus gets teared down on kvm_destroy_vm() - leading to use after free errors. There is nothing the callers could do, except retrying over and over again. So let's simply remove the bus altogether, print an error and make sure no one can access this broken bus again (returning -ENOMEM on any attempt to access it). Fixes: `e93f8a0f82` ("KVM: convert io_bus to SRCU") Cc: stable@vger.kernel.org # 3.4+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-03-23 19:02:25 +01:00
Ard Biesheuvel	63d7c6afc5	arm: kvm: move kvm_vgic_global_state out of .text section The kvm_vgic_global_state struct contains a static key which is written to by jump_label_init() at boot time. So in preparation of making .text regions truly (well, almost truly) read-only, mark kvm_vgic_global_state __ro_after_init so it moves to the .rodata section instead. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2017-03-23 13:53:46 +00:00
Peter Xu	df630b8c1e	KVM: x86: clear bus pointer when destroyed When releasing the bus, let's clear the bus pointers to mark it out. If any further device unregister happens on this bus, we know that we're done if we found the bus being released already. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-03-21 17:11:18 +01:00
Andre Przywara	a5e1e6ca94	KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled The ITS spec says that ITS commands are only processed when the ITS is enabled (section 8.19.4, Enabled, bit[0]). Our emulation was not taking this into account. Fix this by checking the enabled state before handling CWRITER writes. On the other hand that means that CWRITER could advance while the ITS is disabled, and enabling it would need those commands to be processed. Fix this case as well by refactoring actual command processing and calling this from both the GITS_CWRITER and GITS_CTLR handlers. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-03-07 15:44:08 +00:00
Jintack Lim	370a0ec181	KVM: arm/arm64: Let vcpu thread modify its own active state Currently, if a vcpu thread tries to change the active state of an interrupt which is already on the same vcpu's AP list, it will loop forever. Since the VGIC mmio handler is called after a vcpu has already synced back the LR state to the struct vgic_irq, we can just let it proceed safely. Cc: stable@vger.kernel.org Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-03-07 14:48:16 +00:00
Marc Zyngier	4dfc050571	KVM: arm/arm64: vgic-v3: Don't pretend to support IRQ/FIQ bypass Our GICv3 emulation always presents ICC_SRE_EL1 with DIB/DFB set to zero, which implies that there is a way to bypass the GIC and inject raw IRQ/FIQ by driving the CPU pins. Of course, we don't allow that when the GIC is configured, but we fail to indicate that to the guest. The obvious fix is to set these bits (and never let them being changed again). Reported-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-03-06 10:30:57 +00:00
Linus Torvalds	2d62e0768d	Second batch of KVM changes for 4.11 merge window PPC: * correct assumption about ASDR on POWER9 * fix MMIO emulation on POWER9 x86: * add a simple test for ioperm * cleanup TSS (going through KVM tree as the whole undertaking was caused by VMX's use of TSS) * fix nVMX interrupt delivery * fix some performance counters in the guest And two cleanup patches. -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJYuu5qAAoJEED/6hsPKofoRAUH/jkx/KFDcw3FggixysWVgRai iLSbbAZemnSLFSOkOU/t7Bz0fXCUgB0tAcMJd9ow01Dg1zObiTpuUIo6qEPaYHdX gqtUzlHuyECZEcgK0RXS9kDYLrvw7EFocxnDWQfV91qCZSS6nBSSLF3ST1rNV69W mUvcZG+MciDcZUe1lTexoswVTh1m7avvozEnQ5OHnZR9yicoXiadBQjzL6yqWoqf Ml/29zRk5+MvloTudxjkAKm3mh7psW88jNMh37TXbAA7i+Xwl9cU6GLR9mFWstoP 7Ot7ecq9mNAUO3lTIQh7lqvB60LMFznS4IlYK7MbplC3kvJLkfzhTWaN1aGvh90= =cqHo -----END PGP SIGNATURE----- Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more KVM updates from Radim Krčmář: "Second batch of KVM changes for the 4.11 merge window: PPC: - correct assumption about ASDR on POWER9 - fix MMIO emulation on POWER9 x86: - add a simple test for ioperm - cleanup TSS (going through KVM tree as the whole undertaking was caused by VMX's use of TSS) - fix nVMX interrupt delivery - fix some performance counters in the guest ... and two cleanup patches" * tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: Fix pending events injection x86/kvm/vmx: remove unused variable in segment_base() selftests/x86: Add a basic selftest for ioperm x86/asm: Tidy up TSS limit code kvm: convert kvm.users_count from atomic_t to refcount_t KVM: x86: never specify a sample period for virtualized in_tx_cp counters KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9 KVM: PPC: Book3S HV: Fix software walk of guest process page tables	2017-03-04 11:36:19 -08:00
Ingo Molnar	03441a3482	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/stat.h> We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/stat.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:34 +01:00
Ingo Molnar	174cd4b1e5	sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:32 +01:00
Ingo Molnar	6e84f31522	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/mm.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. The APIs that are going to be moved first are: mm_alloc() __mmdrop() mmdrop() mmdrop_async_fn() mmdrop_async() mmget_not_zero() mmput() mmput_async() get_task_mm() mm_access() mm_release() Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:28 +01:00
Elena Reshetova	e3736c3eb3	kvm: convert kvm.users_count from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-03-01 17:03:21 +01:00
Vegard Nossum	3fce371bfa	mm: add new mmget() helper Apart from adding the helper function itself, the rest of the kernel is converted mechanically using: git grep -l 'atomic_inc.mm_users' \| xargs sed -i 's/atomic_inc(&$.$->mm_users);/mmget$\1$;/' git grep -l 'atomic_inc.mm_users' \| xargs sed -i 's/atomic_inc(&$.$\.mm_users);/mmget$\&\1$;/' This is needed for a later patch that hooks into the helper, but might be a worthwhile cleanup on its own. (Michal Hocko provided most of the kerneldoc comment.) Link: http://lkml.kernel.org/r/20161218123229.22952-2-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-27 18:43:48 -08:00
Vegard Nossum	f1f1007644	mm: add new mmgrab() helper Apart from adding the helper function itself, the rest of the kernel is converted mechanically using: git grep -l 'atomic_inc.mm_count' \| xargs sed -i 's/atomic_inc(&$.$->mm_count);/mmgrab$\1$;/' git grep -l 'atomic_inc.mm_count' \| xargs sed -i 's/atomic_inc(&$.$\.mm_count);/mmgrab$\&\1$;/' This is needed for a later patch that hooks into the helper, but might be a worthwhile cleanup on its own. (Michal Hocko provided most of the kerneldoc comment.) Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-27 18:43:48 -08:00
Dave Jiang	11bac80004	mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:54 -08:00
Markus Elfring	843574a3ed	KVM: Return an error code only as a constant in kvm_get_dirty_log() * Return an error code without storing it in an intermediate variable. * Delete the local variable "r" and the jump label "out" which became unnecessary with this refactoring. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-17 12:30:14 +01:00
Markus Elfring	58d6db3491	KVM: Return an error code only as a constant in kvm_get_dirty_log_protect() * Return an error code without storing it in an intermediate variable. * Delete the local variable "r" and the jump label "out" which became unnecessary with this refactoring. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-17 12:30:06 +01:00
Markus Elfring	f6a3b168e5	KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl() * Return directly after a call of the function "copy_from_user" failed in a case block. This issue was detected by using the Coccinelle software. * Delete the jump label "out" which became unnecessary with this refactoring. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-17 12:29:42 +01:00
Cao, Lei	bbd6411513	KVM: Support vCPU-based gfn->hva cache Provide versions of struct gfn_to_hva_cache functions that take vcpu as a parameter instead of struct kvm. The existing functions are not needed anymore, so delete them. This allows dirty pages to be logged in the vcpu dirty ring, instead of the global dirty ring, for ring-based dirty memory tracking. Signed-off-by: Lei Cao <lei.cao@stratus.com> Message-Id: <CY1PR08MB19929BD2AC47A291FD680E83F04F0@CY1PR08MB1992.namprd08.prod.outlook.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-16 18:42:46 +01:00
Paolo Bonzini	4bd518f159	KVM: use separate generations for each address space This will make it easier to support multiple address spaces in kvm_gfn_to_hva_cache_init. Instead of having to check the address space id, we can keep on checking just the generation number. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-16 18:42:41 +01:00
Paolo Bonzini	5a2d4365d2	KVM: only retrieve memslots once when initializing cache This will make it a bit simpler to handle multiple address spaces in gfn_to_hva_cache. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-16 18:42:33 +01:00
Jintack Lim	7b6b46311a	KVM: arm/arm64: Emulate the EL1 phys timer registers Emulate read and write operations to CNTP_TVAL, CNTP_CVAL and CNTP_CTL. Now VMs are able to use the EL1 physical timer. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:37 +00:00
Jintack Lim	f242adaf0c	KVM: arm/arm64: Set up a background timer for the physical timer emulation Set a background timer for the EL1 physical timer emulation while VMs are running, so that VMs get the physical timer interrupts in a timely manner. Schedule the background timer on entry to the VM and cancel it on exit. This would not have any performance impact to the guest OSes that currently use the virtual timer since the physical timer is always not enabled. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:36 +00:00
Jintack Lim	fb280e9757	KVM: arm/arm64: Set a background timer to the earliest timer expiration When scheduling a background timer, consider both of the virtual and physical timer and pick the earliest expiration time. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:35 +00:00
Jintack Lim	58e0c9732a	KVM: arm/arm64: Update the physical timer interrupt level Now that we maintain the EL1 physical timer register states of VMs, update the physical timer interrupt level along with the virtual one. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:35 +00:00
Jintack Lim	a91d18551e	KVM: arm/arm64: Initialize the emulated EL1 physical timer Initialize the emulated EL1 physical timer with the default irq number. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:34 +00:00
Jintack Lim	9171fa2e09	KVM: arm/arm64: Decouple kvm timer functions from virtual timer Now that we have a separate structure for timer context, make functions generic so that they can work with any timer context, not just the virtual timer context. This does not change the virtual timer functionality. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:33 +00:00
Jintack Lim	90de943a43	KVM: arm/arm64: Move cntvoff to each timer context Make cntvoff per each timer context. This is helpful to abstract kvm timer functions to work with timer context without considering timer types (e.g. physical timer or virtual timer). This also would pave the way for ever doing adjustments of the cntvoff on a per-CPU basis if that should ever make sense. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:33 +00:00
Jintack Lim	fbb4aeec5f	KVM: arm/arm64: Abstract virtual timer context into separate structure Abstract virtual timer context into a separate structure and change all callers referring to timer registers, irq state and so on. No change in functionality. This is about to become very handy when adding the EL1 physical timer. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:32 +00:00
Shanker Donthineni	0bdbf3b071	KVM: arm/arm64: vgic: Stop injecting the MSI occurrence twice The IRQFD framework calls the architecture dependent function twice if the corresponding GSI type is edge triggered. For ARM, the function kvm_set_msi() is getting called twice whenever the IRQFD receives the event signal. The rest of the code path is trying to inject the MSI without any validation checks. No need to call the function vgic_its_inject_msi() second time to avoid an unnecessary overhead in IRQ queue logic. It also avoids the possibility of VM seeing the MSI twice. Simple fix, return -1 if the argument 'level' value is zero. Cc: stable@vger.kernel.org Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-02-08 15:13:14 +00:00
Christoffer Dall	11710dec8a	KVM: arm/arm64: Remove kvm_vgic_inject_mapped_irq The only benefit of having kvm_vgic_inject_mapped_irq separate from kvm_vgic_inject_irq is that we pass a boolean that we use for error checking on the injection path. While this could potentially help in some aspect of robustness, it's also a little bit of a defensive move, and arguably callers into the vgic should have make sure they have marked their virtual IRQs as mapped if required. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-02-01 11:56:35 +01:00
Vijaya Kumar K	e96a006cb0	KVM: arm/arm64: vgic: Implement KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO ioctl Userspace requires to store and restore of line_level for level triggered interrupts using ioctl KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-30 13:47:29 +00:00
Vijaya Kumar K	d017d7b0bd	KVM: arm/arm64: vgic: Implement VGICv3 CPU interface access VGICv3 CPU interface registers are accessed using KVM_DEV_ARM_VGIC_CPU_SYSREGS ioctl. These registers are accessed as 64-bit. The cpu MPIDR value is passed along with register id. It is used to identify the cpu for registers access. The VM that supports SEIs expect it on destination machine to handle guest aborts and hence checked for ICC_CTLR_EL1.SEIS compatibility. Similarly, VM that supports Affinity Level 3 that is required for AArch64 mode, is required to be supported on destination machine. Hence checked for ICC_CTLR_EL1.A3V compatibility. The arch/arm64/kvm/vgic-sys-reg-v3.c handles read and write of VGIC CPU registers for AArch64. For AArch32 mode, arch/arm/kvm/vgic-v3-coproc.c file is created but APIs are not implemented. Updated arch/arm/include/uapi/asm/kvm.h with new definitions required to compile for AArch32. The version of VGIC v3 specification is defined here Documentation/virtual/kvm/devices/arm-vgic-v3.txt Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-30 13:47:25 +00:00
Vijaya Kumar K	5fb247d79c	KVM: arm/arm64: vgic: Introduce VENG0 and VENG1 fields to vmcr struct ICC_VMCR_EL2 supports virtual access to ICC_IGRPEN1_EL1.Enable and ICC_IGRPEN0_EL1.Enable fields. Add grpen0 and grpen1 member variables to struct vmcr to support read and write of these fields. Also refactor vgic_set_vmcr and vgic_get_vmcr() code. Drop ICH_VMCR_CTLR_SHIFT and ICH_VMCR_CTLR_MASK macros and instead use ICH_VMCR_EOI* and ICH_VMCR_CBPR* macros. Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-30 13:47:21 +00:00
Vijaya Kumar K	94574c9488	KVM: arm/arm64: vgic: Add distributor and redistributor access VGICv3 Distributor and Redistributor registers are accessed using KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_REDIST_REGS with KVM_SET_DEVICE_ATTR and KVM_GET_DEVICE_ATTR ioctls. These registers are accessed as 32-bit and cpu mpidr value passed along with register offset is used to identify the cpu for redistributor registers access. The version of VGIC v3 specification is defined here Documentation/virtual/kvm/devices/arm-vgic-v3.txt Also update arch/arm/include/uapi/asm/kvm.h to compile for AArch32 mode. Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-30 13:47:07 +00:00
Vijaya Kumar K	2df903a89a	KVM: arm/arm64: vgic: Implement support for userspace access Read and write of some registers like ISPENDR and ICPENDR from userspace requires special handling when compared to guest access for these registers. Refer to Documentation/virtual/kvm/devices/arm-vgic-v3.txt for handling of ISPENDR, ICPENDR registers handling. Add infrastructure to support guest and userspace read and write for the required registers Also moved vgic_uaccess from vgic-mmio-v2.c to vgic-mmio.c Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-30 13:47:02 +00:00
Christoffer Dall	10f92c4c53	KVM: arm/arm64: vgic: Add debugfs vgic-state file Add a file to debugfs to read the in-kernel state of the vgic. We don't do any locking of the entire VGIC state while traversing all the IRQs, so if the VM is running the user/developer may not see a quiesced state, but should take care to pause the VM using facilities in user space for that purpose. We also don't support LPIs yet, but they can be added easily if needed. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-01-25 13:50:03 +01:00
Christoffer Dall	8694e4da66	KVM: arm/arm64: Remove struct vgic_irq pending field One of the goals behind the VGIC redesign was to get rid of cached or intermediate state in the data structures, but we decided to allow ourselves to precompute the pending value of an IRQ based on the line level and pending latch state. However, this has now become difficult to base proper GICv3 save/restore on, because there is a potential to modify the pending state without knowing if an interrupt is edge or level configured. See the following post and related message for more background: https://lists.cs.columbia.edu/pipermail/kvmarm/2017-January/023195.html This commit gets rid of the precomputed pending field in favor of a function that calculates the value when needed, irq_is_pending(). The soft_pending field is renamed to pending_latch to represent that this latch is the equivalent hardware latch which gets manipulated by the input signal for edge-triggered interrupts and when writing to the SPENDR/CPENDR registers. After this commit save/restore code should be able to simply restore the pending_latch state, line_level state, and config state in any order and get the desired result. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2017-01-25 13:26:13 +01:00
Radim Krčmář	1b1973ef9a	KVM/ARM updates for 4.10-rc4 - Fix for timer setup on VHE machines - Drop spurious warning when the timer races against the vcpu running again - Prevent a vgic deadlock when the initialization fails -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYeLkqAAoJECPQ0LrRPXpDpasP/iynAP2/28LNQ7L8r3be0m7N PCZv3gpsTtu1cmTlpiPQVPqzBUVPXQMJJAnDIhA7e1YbYfYJUIhEirVUQ56oT+lY nmnoquzNFGUNkYRHUj8RTeQ8ISuP/iPlWOVlSh4n2rbrHFAxX1+W1xXdOedbdFJp oFDRR6DCB4qdIxt7dtTcfObhwVlro8WDOfClNqiUDUdhxfDBD7mEPYNO2J2D0/ca 6YQAkxPzP/MJgKl0mVRjjxhkoNSt6lP0uyU65X67g7i2ZlFRyU3PvqwPm3JOY++V Uuff2ud81D5yYwf0+6sA+i+707jezgsoiuFnwUFQVjW/9rpIK7W1gzqht2VWZNgJ NjhFPtNhOFkam/sxPFiSaTNyhUNgrg6C/qPDBF0pPlyGd0mtrvJdVG4y/R9YGSB3 JoiwaFdoi9buEakpRbhu4y5bhcDZZPAo+l+2OanpeE6y3zypqTsJRop8H/qhnv8R FFzqycBjeEcVo9ZKIpqQVEnp0njBcHKmqblMswgSgGKe3816iGC+kj4oMm7J57Yq vy1OKQMuur+rSRyH7LpVN1HYiq9yBHdbHsXxUjEWSD5dtPAGdOhAP0kAzXlPbEot WkVQO2uEV3EKs0UQjWkLKNVJKh/pPaWH3z8ENRlf+pASzvO+UYRmY3sjfDcg3AsI 7Lxc+CVJJ4ExJbFe/OTk =7cIH -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for 4.10-rc4 - Fix for timer setup on VHE machines - Drop spurious warning when the timer races against the vcpu running again - Prevent a vgic deadlock when the initialization fails	2017-01-17 15:04:59 +01:00
Marc Zyngier	1193e6aeec	KVM: arm/arm64: vgic: Fix deadlock on error handling Dmitry Vyukov reported that the syzkaller fuzzer triggered a deadlock in the vgic setup code when an error was detected, as the cleanup code tries to take a lock that is already held by the setup code. The fix is to avoid retaking the lock when cleaning up, by telling the cleanup function that we already hold it. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-13 11:19:35 +00:00
Jintack Lim	488f94d721	KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems Current KVM world switch code is unintentionally setting wrong bits to CNTHCTL_EL2 when E2H == 1, which may allow guest OS to access physical timer. Bit positions of CNTHCTL_EL2 are changing depending on HCR_EL2.E2H bit. EL1PCEN and EL1PCTEN are 1st and 0th bits when E2H is not set, but they are 11th and 10th bits respectively when E2H is set. In fact, on VHE we only need to set those bits once, not for every world switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE == 1, which makes those bits have no effect for the host kernel execution. So we just set those bits once for guests, and that's it. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-13 11:19:25 +00:00
Christoffer Dall	63e41226af	KVM: arm/arm64: Fix occasional warning from the timer work function When a VCPU blocks (WFI) and has programmed the vtimer, we program a soft timer to expire in the future to wake up the vcpu thread when appropriate. Because such as wake up involves a vcpu kick, and the timer expire function can get called from interrupt context, and the kick may sleep, we have to schedule the kick in the work function. The work function currently has a warning that gets raised if it turns out that the timer shouldn't fire when it's run, which was added because the idea was that in that case the work should never have been cancelled. However, it turns out that this whole thing is racy and we can get spurious warnings. The problem is that we clear the armed flag in the work function, which may run in parallel with the kvm_timer_unschedule->timer_disarm() call. This results in a possible situation where the timer_disarm() call does not call cancel_work_sync(), which effectively synchronizes the completion of the work function with running the VCPU. As a result, the VCPU thread proceeds before the work function completees, causing changes to the timer state such that kvm_timer_should_fire(vcpu) returns false in the work function. All we do in the work function is to kick the VCPU, and an occasional rare extra kick never harmed anyone. Since the race above is extremely rare, we don't bother checking if the race happens but simply remove the check and the clearing of the armed flag from the work function. Reported-by: Matthias Brugger <mbrugger@suse.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2017-01-13 11:15:30 +00:00
Wanpeng Li	4f3dbdf47e	KVM: eventfd: fix NULL deref irqbypass consumer Reported syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] PGD 0 Oops: 0002 [#1] SMP CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1 Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm] task: ffff9bbe0dfbb900 task.stack: ffffb61802014000 RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] Call Trace: irqfd_shutdown+0x66/0xa0 [kvm] process_one_work+0x16b/0x480 worker_thread+0x4b/0x500 kthread+0x101/0x140 ? process_one_work+0x480/0x480 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x25/0x30 RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20 CR2: 0000000000000008 The syzkaller folks reported a NULL pointer dereference that due to unregister an consumer which fails registration before. The syzkaller creates two VMs w/ an equal eventfd occasionally. So the second VM fails to register an irqbypass consumer. It will make irqfd as inactive and queue an workqueue work to shutdown irqfd and unregister the irqbypass consumer when eventfd is closed. However, the second consumer has been initialized though it fails registration. So the token(same as the first VM's) is taken to unregister the consumer through the workqueue, the consumer of the first VM is found and unregistered, then NULL deref incurred in the path of deleting consumer from the consumers list. This patch fixes it by making irq_bypass_register/unregister_consumer() looks for the consumer entry based on consumer pointer itself instead of token matching. Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-12 14:42:34 +01:00
Linus Torvalds	3ddc76dfc7	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t	2016-12-25 14:30:04 -08:00
Linus Torvalds	b272f732f8	Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug notifier removal from Thomas Gleixner: "This is the final cleanup of the hotplug notifier infrastructure. The series has been reintgrated in the last two days because there came a new driver using the old infrastructure via the SCSI tree. Summary: - convert the last leftover drivers utilizing notifiers - fixup for a completely broken hotplug user - prevent setup of already used states - removal of the notifiers - treewide cleanup of hotplug state names - consolidation of state space There is a sphinx based documentation pending, but that needs review from the documentation folks" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-xp: Consolidate hotplug state space irqchip/gic: Consolidate hotplug state space coresight/etm3/4x: Consolidate hotplug state space cpu/hotplug: Cleanup state names cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions staging/lustre/libcfs: Convert to hotplug state machine scsi/bnx2i: Convert to hotplug state machine scsi/bnx2fc: Convert to hotplug state machine cpu/hotplug: Prevent overwriting of callbacks x86/msr: Remove bogus cleanup from the error path bus: arm-ccn: Prevent hotplug callback leak perf/x86/intel/cstate: Prevent hotplug callback leak ARM/imx/mmcd: Fix broken cpu hotplug handling scsi: qedi: Convert to hotplug state machine	2016-12-25 14:05:56 -08:00
Thomas Gleixner	a5a1d1c291	clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>	2016-12-25 11:04:12 +01:00
Thomas Gleixner	73c1b41e63	cpu/hotplug: Cleanup state names When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-25 10:47:44 +01:00
Linus Torvalds	7c0f6ba682	Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]#[[:blank:]]include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"\|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-24 11:46:01 -08:00
Lorenzo Stoakes	8b7457ef9a	mm: unexport __get_user_pages_unlocked() Unexport the low-level __get_user_pages_unlocked() function and replaces invocations with calls to more appropriate higher-level functions. In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with get_user_pages_unlocked() since we can now pass gup_flags. In async_pf_execute() and process_vm_rw_single_vec() we need to pass different tsk, mm arguments so get_user_pages_remote() is the sane replacement in these cases (having added manual acquisition and release of mmap_sem.) Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH flag. However, this flag was originally silently dropped by commit `1e9877902d` ("mm/gup: Introduce get_user_pages_remote()"), so this appears to have been unintentional and reintroducing it is therefore not an issue. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:09 -08:00
Linus Torvalds	93173b5bf2	Small release, the most interesting stuff is x86 nested virt improvements. x86: userspace can now hide nested VMX features from guests; nested VMX can now run Hyper-V in a guest; support for AVX512_4VNNIW and AVX512_FMAPS in KVM; infrastructure support for virtual Intel GPUs. PPC: support for KVM guests on POWER9; improved support for interrupt polling; optimizations and cleanups. s390: two small optimizations, more stuff is in flight and will be in 4.11. ARM: support for the GICv3 ITS on 32bit platforms. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJYTkP0FBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D lZIH/iT1n9OQXcuTpYYnQhuCenzI3GZZOIMTbCvK2i5bo0FIJKxVn0EiAAqZSXvO nO185FqjOgLuJ1AD1kJuxzye5suuQp4HIPWWgNHcexLuy43WXWKZe0IQlJ4zM2Xf u31HakpFmVDD+Cd1qN3yDXtDrRQ79/xQn2kw7CWb8olp+pVqwbceN3IVie9QYU+3 gCz0qU6As0aQIwq2PyalOe03sO10PZlm4XhsoXgWPG7P18BMRhNLTDqhLhu7A/ry qElVMANT7LSNLzlwNdpzdK8rVuKxETwjlc1UP8vSuhrwad4zM2JJ1Exk26nC2NaG D0j4tRSyGFIdx6lukZm7HmiSHZ0= =mkoB -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Small release, the most interesting stuff is x86 nested virt improvements. x86: - userspace can now hide nested VMX features from guests - nested VMX can now run Hyper-V in a guest - support for AVX512_4VNNIW and AVX512_FMAPS in KVM - infrastructure support for virtual Intel GPUs. PPC: - support for KVM guests on POWER9 - improved support for interrupt polling - optimizations and cleanups. s390: - two small optimizations, more stuff is in flight and will be in 4.11. ARM: - support for the GICv3 ITS on 32bit platforms" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits) arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest KVM: arm/arm64: timer: Check for properly initialized timer on init KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs KVM: x86: Handle the kthread worker using the new API KVM: nVMX: invvpid handling improvements KVM: nVMX: check host CR3 on vmentry and vmexit KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry KVM: nVMX: propagate errors from prepare_vmcs02 KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation KVM: nVMX: support restore of VMX capability MSRs KVM: nVMX: generate non-true VMX MSRs based on true versions KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs. KVM: x86: Add kvm_skip_emulated_instruction and use it. KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 KVM: VMX: Reorder some skip_emulated_instruction calls KVM: x86: Add a return value to kvm_emulate_cpuid KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h ...	2016-12-13 15:47:02 -08:00
Linus Torvalds	edc5f445a6	VFIO updates for v4.10-rc1 - VFIO updates for v4.10 primarily include a new Mediated Device interface, which essentially allows software defined devices to be exposed to users through VFIO. The host vendor driver providing this virtual device polices, or mediates user access to the device. These devices often incorporate portions of real devices, for instance the primary initial users of this interface expose vGPUs which allow the user to map mediated devices, or mdevs, to a portion of a physical GPU. QEMU composes these mdevs into PCI representations using the existing VFIO user API. This enables both Intel KVM-GT support, which is also expected to arrive into Linux mainline during the v4.10 merge window, as well as NVIDIA vGPU, and also Channel I/O devices (aka CCW devices) for s390 virtualization support. (Kirti Wankhede, Neo Jia) - Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin) - Fixes to VFIO capability chain handling (Eric Auger) - Error handling fixes for fallout from mdev (Christophe JAILLET) - Notifiers to expose struct kvm to mdev vendor drivers (Jike Song) - type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJYSyCtAAoJECObm247sIsi+rIP/3Q/GE3zaDdz1iKQK/c/qhs6 0Pl45opAqw4wCJDCZIhRmoHmsCaT4KkeJKU1fiYc0mKJhW11HfA4DTFwzBqrHBj7 7wPjHTaWwlFRHCYVCWYEp5g9UASyD8ubWGyZKzqIXELFoAvwuBL3SULNj4neJKKR rPcHTVxJ7laYIjHFzuNUi/MWEdjxPT9oJn8Bm9mhISwPglIMU9nkIR20ChaSeFJb MiFqFW7BcvkVyqupjpksM9DodpNZu+3uSMVtgASNVNbilf0FXJr0d8RCbeSxTIfm rEsZ5+0PrklhCtmRRl5EB+tNawgaism8wAF74KIO//76vE02Usrxb0b5mTIZ8TiN 6/Z+WID5D+ZRt8hp9hJIJmGE/sM/odH4r174dPaiEkMvOB9ksDIPkzgbtDbVY40c DACb7/n3ZZA0an2Eq2HEx/BqTOvt9sgu367KVvhuoIArQcb5SM94GT03Dv+pKnax Cxmro2oaWmAV3IS0vNzbCIddsFqlPjkFIYxjtzBy+bVLg2RN3STyaSL6cwJsydSU KLcCPiYtovczKFj7RJlgVlqh5/8uZ7SEffTkIggehdnVPAfDlK9p9BYqLCgAoWpN vwWidM3qOIjooRXQgxUwJgJsl4MLRMoA/gFP4iHbqOgIAGtUDRHuQ4muvkf+LLxg wpgfXsBQNRuVcZHBUEVe =gc6j -----END PGP SIGNATURE----- Merge tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - VFIO updates for v4.10 primarily include a new Mediated Device interface, which essentially allows software defined devices to be exposed to users through VFIO. The host vendor driver providing this virtual device polices, or mediates user access to the device. These devices often incorporate portions of real devices, for instance the primary initial users of this interface expose vGPUs which allow the user to map mediated devices, or mdevs, to a portion of a physical GPU. QEMU composes these mdevs into PCI representations using the existing VFIO user API. This enables both Intel KVM-GT support, which is also expected to arrive into Linux mainline during the v4.10 merge window, as well as NVIDIA vGPU, and also Channel I/O devices (aka CCW devices) for s390 virtualization support. (Kirti Wankhede, Neo Jia) - Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin) - Fixes to VFIO capability chain handling (Eric Auger) - Error handling fixes for fallout from mdev (Christophe JAILLET) - Notifiers to expose struct kvm to mdev vendor drivers (Jike Song) - type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia) * tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits) vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages vfio iommu type1: Fix size argument to vfio_find_dma() during DMA UNMAP. vfio iommu type1: WARN_ON if notifier block is not unregistered kvm: set/clear kvm to/from vfio_group when group add/delete vfio: support notifier chain in vfio_group vfio: vfio_register_notifier: classify iommu notifier vfio: Fix handling of error returned by 'vfio_group_get_from_dev()' vfio: fix vfio_info_cap_add/shift vfio/pci: Drop unnecessary pcibios_err_to_errno() MAINTAINERS: Add entry VFIO based Mediated device drivers docs: Sample driver to demonstrate how to use Mediated device framework. docs: Sysfs ABI for mediated device framework docs: Add Documentation for Mediated devices vfio: Define device_api strings vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare() vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare() vfio: Introduce vfio_set_irqs_validate_and_prepare() vfio_pci: Update vfio_pci to use vfio_info_add_capability() vfio: Introduce common function to add capabilities vfio iommu: Add blocking notifier to notify DMA_UNMAP ...	2016-12-13 09:23:56 -08:00
Paolo Bonzini	f673b5b2a6	KVM/ARM updates for 4.10: - Support for the GICv3 ITS on 32bit platforms - A handful of timer and GIC emulation fixes - A PMU architecture fix -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYStI0AAoJECPQ0LrRPXpD6kcP/0J+fynLo/uhe3VAP7pZ0fH5 dFmvcgZaHQ6wpWgkHYbyuAkZ2tiQfthylErjt9Xay2qf3f0BZScsNKSkTOmVTOJH NO+4yo7YDIbRbQO3h+QX2YB3uBqdZvn6eRLCDWNLwSa/GkNmLGvhcorQer0GduCl qnsRRrNIewzSYI+U3821jVUjLgXuBuGoFt0yT/197ZBRIrowNJ4vqAvaqVaLQ4jt aOd+aCPKCaatkeewEo6Es4lX86JOytpxtVfNpRe6/gSr1mK2fHAfycQ5Txkl7oTX T/vsYUusYDSJbiz7PUMFBfNYvVijBY8QCtm6yJZHQNg6q25r3pjn//3BiuSDf4Dz o0DDMoFPjEi23myfGI91oeL9Svbtk06ERGyN7MY2vMNtORrwhmgNiSfIsqI9V0d8 Slru3REMZg+ZbY6rgyJZa9/09vlwKfqZpkwJlfQkJO9tsXn4WwwdyvwIXmaH9p5X mqnjgbIMRipBs5Teedb++pC5XQcbC8ed2KMEBXlgORDm6fC0Pz/q623tVRYhIm4B 4YKHI1A8I8XaYd0VJkZOns2Uq7/Uwc2j5wGWRIa0IwB6LXlzNw4kbD+omj0Mmo0V Fxio610jyTfrPidx/XzO0zsEzVW794Si8S4F1nFShdkk1NuzClVnQzce5TA8K3Zu cCUKISR4oi5IWVcimDQt =zxXl -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for 4.10: - Support for the GICv3 ITS on 32bit platforms - A handful of timer and GIC emulation fixes - A PMU architecture fix	2016-12-12 07:29:39 +01:00
Ingo Molnar	6f38751510	Merge branch 'linus' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-11 13:07:13 +01:00
Christoffer Dall	8e1a0476f8	KVM: arm/arm64: timer: Check for properly initialized timer on init When the arch timer code fails to initialize (for example because the memory mapped timer doesn't work, which is currently seen with the AEM model), then KVM just continues happily with a final result that KVM eventually does a NULL pointer dereference of the uninitialized cycle counter. Check directly for this in the init path and give the user a reasonable error in this case. Cc: Shih-Wei Li <shihwei@cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-12-09 15:47:00 +00:00
Andre Przywara	266068eabb	KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs The GICv2 spec says in section 4.3.12 that a "CPU targets field bit that corresponds to an unimplemented CPU interface is RAZ/WI." Currently we allow the guest to write any value in there and it can read that back. Mask the written value with the proper CPU mask to be spec compliant. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-12-09 15:46:59 +00:00
Jike Song	2fc1bec158	kvm: set/clear kvm to/from vfio_group when group add/delete Sometimes users need to be aware when a vfio_group attaches to a KVM or detaches from it. KVM already calls get/put method from vfio to manipulate the vfio_group reference, it can notify vfio_group in a similar way. Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-12-01 10:42:17 -07:00
Dan Carpenter	a0f1d21c1c	KVM: use after free in kvm_ioctl_create_device() We should move the ops->destroy(dev) after the list_del(&dev->vm_node) so that we don't use "dev" after freeing it. Fixes: `a28ebea2ad` ("KVM: Protect device ops->create and list_add with kvm->lock") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-12-01 16:10:50 +01:00
Radim Krčmář	0f4828a1da	KVM/ARM updates for v4.9-rc7 - Do not call kvm_notify_acked for PPIs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYNzGYAAoJECPQ0LrRPXpDwC8P/3SlsYK9ickZfxoX05tfwbmy H5IVmMvnhqQwi2ALe1PycKU9a9c5MISEvFyzGtr/SVkwZdiGRztGCQsYgxAyL0Tr mJDttavNU8B9YKC/d+pNNl18uue1Ny297aPDwL6eo3i9s7MX7EZRdRG3U0MiGlbB MFVCOLCAd8eUGI68eE5CsRC5+3OFqbkh2JlgtZJPV1BDu/K1ojViijUnpv/CJX52 8g8qKU9xTgHnd1pTAaE22u5+odgOvOa62rGqVAF8T9eOMpVHxUDeAvzaFLXQAgty tVwYlEtoglLKXFa/B0dqBX639J8hLKBC3gBM/1sEbUU4Ii026iPuCbWLjDGju7Ra ggaeFp9X8IK9wcwyT88yUAFLwk/neApm5YemzdD7VWSb/5Np3mJpuIH7McwoJp3p cvXrTV4P+XBSYgYSdBsGKSQo38dynW8m8Gqq3D5DEAJc33P/kvwBMFRuzj/F3GwZ 5w1uTDJx+tTdGhpEvxY+Mwb17XDid9WPKyYdgI5Xy662g904m7WmQvP08VezxVcw woMlqqSpJvsNxOphj3xRb00W61MTu7zcfYQlwiDwtEqXgIPlpk3tBZO651eMMaSF bQmP2qPDKw5UQHtRfcDq4SmcyvaDn6j9BMYCR/XvXmtlFi7+zyglhkIn+wkJF0Dz J/hmZNTPVN6rtRv9wY/2 =1IXI -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.9-rc7 - Do not call kvm_notify_acked for PPIs	2016-12-01 14:56:34 +01:00
Suraj Jitindar Singh	ec76d819d2	KVM: Export kvm module parameter variables The kvm module has the parameters halt_poll_ns, halt_poll_ns_grow, and halt_poll_ns_shrink. Halt polling was recently added to the powerpc kvm-hv module and these parameters were essentially duplicated for that. There is no benefit to this duplication and it can lead to confusion when trying to tune halt polling. Thus move the definition of these variables to kvm_host.h and export them. This will allow the kvm-hv module to use the same module parameters by accessing these variables, which will be implemented in the next patch, meaning that they will no longer be duplicated. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-11-28 11:48:47 +11:00
Marc Zyngier	8ca18eec2b	KVM: arm/arm64: vgic: Don't notify EOI for non-SPIs When we inject a level triggerered interrupt (and unless it is backed by the physical distributor - timer style), we request a maintenance interrupt. Part of the processing for that interrupt is to feed to the rest of KVM (and to the eventfd subsystem) the information that the interrupt has been EOIed. But that notification only makes sense for SPIs, and not PPIs (such as the PMU interrupt). Skip over the notification if the interrupt is not an SPI. Cc: stable@vger.kernel.org # 4.7+ Fixes: `140b086dd1` ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend") Fixes: `59529f69f5` ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-11-24 13:12:07 +00:00
Pan Xinhui	4ec6e86362	kvm: Introduce kvm_write_guest_offset_cached() It allows us to update some status or field of a structure partially. We can also save a kvm_read_guest_cached() call if we just update one fild of the struct regardless of its current value. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-8-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-11-22 12:48:07 +01:00
Paolo Bonzini	22583f0d9c	KVM: async_pf: avoid recursive flushing of work items This was reported by syzkaller: [ INFO: possible recursive locking detected ] 4.9.0-rc4+ #49 Not tainted --------------------------------------------- kworker/2:1/5658 is trying to acquire lock: ([ 1644.769018] (&work->work) [< inline >] list_empty include/linux/compiler.h:243 [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511 but task is already holding lock: ([ 1644.769018] (&work->work) [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093 stack backtrace: CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events async_pf_execute ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480 0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27 ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0 Call Trace: ... [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846 [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916 [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951 [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126 [< inline >] kvm_free_vcpus arch/x86/kvm/x86.c:7841 [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946 [< inline >] kvm_destroy_vm virt/kvm/kvm_main.c:731 [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752 [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111 [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096 [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230 [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209 [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 The reason is that kvm_put_kvm is causing the destruction of the VM, but the page fault is still on the ->queue list. The ->queue list is owned by the VCPU, not by the work items, so we cannot just add list_del to the work item. Instead, use work->vcpu to note async page faults that have been resolved and will be processed through the done list. There is no need to flush those. Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-19 19:04:17 +01:00
Radim Krčmář	e5dbc4bf0b	KVM/ARM updates for v4.9-rc6 - Fix handling of the 32bit cycle counter - Fix cycle counter filtering -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYLsbzAAoJECPQ0LrRPXpDXdoQAL4tI3HDNKGP71aNNBrCqmOw WZFYagsTRgpAePctjxkFZAGHmJoQ/SDOeg6qcb0LKTMQ6ZaorV8+MGWOjvpNtQHz ltpdbVUxPCfLzZAUYWyg6PoF5geHrSVHfb+AMShiZePp2/5Rf+9M2MioGz53cDZW UmjmvUYi3LF9lwSqdbGJZtpfEOZp4aNeKLQ6I9Cw65NuVjrJzEJ4cRKCk4id9PlW jeULDNX5EsnKnyjwROyghCV2RITZ7lpgvQr9PGBleZ0k5kEAqN0pxi9gAWA8D2lC uLdBdfFBW9wM31urCFeOMu6S3Ff0v3tquPZK6f2m1Ul+Bii+Kfr5i0U6VfwsvOc6 TRn6r6FiiQV/OXz3GYqHkd7qEGyIPNv7j5Y3OFZo1uN3v60nnkU32NfalBRDCJE4 9Q4SvZ3z5oZ12QYYNaCwwR1g3Xd6wuV4JYH+6Z4JFfazJLQ5zgr123iglhmDAneC Gurmn1GnkgiwXzMaYCRYKXxX/D+Gob6hRCT9OszqqrpgOzlRIIbZcEKua8T9ihnS xDY4+QFwaVsGeWJCjOXPw4wU0l0HUQ+J5u/3DRwv9u0qnW4VBvWCoHHeXxjypqtC Lzw04M8ZH98p0zsN4SX7pXjkkRtcTOnwdW7gVyIbq10kT/ylBvrOaFfiXtuIZCQ2 yD0Qvg/cUs4vWZqhFx2t =cJHy -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.9-rc6 - Fix handling of the 32bit cycle counter - Fix cycle counter filtering	2016-11-19 18:02:07 +01:00
Wei Huang	b112c84a6f	KVM: arm64: Fix the issues when guest PMCCFILTR is configured KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured. But this function can't deals with PMCCFILTR correctly because the evtCount bits of PMCCFILTR, which is reserved 0, conflits with the SW_INCR event type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this function shouldn't return immediately; instead it needs to check further if select_idx is ARMV8_PMU_CYCLE_IDX. Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTER blindly to attr.config. Instead it ought to convert the request to the "cpu cycle" event type (i.e. 0x11). To support this patch and to prevent duplicated definitions, a limited set of ARMv8 perf event types were relocated from perf_event.c to asm/perf_event.h. Cc: stable@vger.kernel.org # 4.6+ Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-11-18 09:06:58 +00:00
Radim Krčmář	813ae37e6a	Merge branch 'x86/cpufeature' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into kvm/next Topic branch for AVX512_4VNNIW and AVX512_4FMAPS support in KVM.	2016-11-16 22:07:36 +01:00
Longpeng(Mike)	fd5ebf99f8	arm/arm64: KVM: Clean up useless code in kvm_timer_enable 1) Since commit:41a54482 changed timer enabled variable to per-vcpu, the correlative comment in kvm_timer_enable is useless now. 2) After the kvm module init successfully, the timecounter is always non-null, so we can remove the checking of timercounter. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-11-15 11:54:16 +00:00
Vladimir Murzin	2988509dd8	ARM: KVM: Support vGICv3 ITS This patch allows to build and use vGICv3 ITS in 32-bit mode. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-11-14 10:32:54 +00:00
Vladimir Murzin	e29bd6f267	KVM: arm64: vgic-its: Fix compatibility with 32-bit Evaluate GITS_BASER_ENTRY_SIZE once as an int data (GITS_BASER<n>'s Entry Size is 5-bit wide only), so when used as divider no reference to __aeabi_uldivmod is generated when build for AArch32. Use unsigned long long for GITS_BASER_PAGE_SIZE_* since they are used in conjunction with 64-bit data. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-11-14 10:32:24 +00:00
Paolo Bonzini	05d36a7dff	KVM/ARM updates for v4.9-rc4 - Kick the vcpu when a pending interrupt becomes pending again - Prevent access to invalid interrupt registers - Invalid TLBs when two vcpus from the same VM share a CPU -----BEGIN PGP SIGNATURE----- iQIyBAABCAAcBQJYHNMTFRxtYXJjLnp5bmdpZXJAYXJtLmNvbQAKCRAj0NC60T16 Q1WDD/9d5KfQ3dWiLtBXbeD3w2K0gXknwLAMsCCAdhgkCdLenxSBjlB7lmVYi1lZ pTnshnR4HC0P3yW3bA78J7LZnUzJg72pq/S5K/om9KylVUdXz9WzQ3u+XyB3KTFW b+viTUK3mqose67UcBSKGfFEWpIOmJ/nZVvWAIaUTg49btxnetKjyhv2Ux744Hm/ Jba3trcA4m8RPJ8Vu6mIfd6gkTXzSkQaN2wGVaEFhCFHOPDCQHjcdspe20Ig9fmY kTXEBe4r0sC+8fXoymEM6TDQFWB8WthIIqfeIJ3FgfoETKrwmyJ23YfLAh49m1cB nFpyy/lr9PNsOjJKXFi84pzx6l8U/CDslnBm5klYTT2kFc3stKbyDtIILvUOwKl8 n9UZSO8NGhOpKscGXLzO/CmIO+wgL15LTsxYsOh3HK7KjzocspQpxyD7pPWN8CUI M2IGLvYMzCaBAOzs6WO4P9xlJRNtUMK8lvAthnBiCeE2Nnu3Oajf8krR4DZmBcQh Q/GOACa1kuBMfqmWNrCVq3UNiFLxxAseShgxq9/E/dNe20daXOnxSaRGdRzTvAQF dRBEtHXdY0qDgLz3tVzBdTTmx3M2k4B4/t+VxnsFFVlvbr0OyOozvFH42tGeTw5t IBoXP9x87+Rpl6P6wW+ICketXQMRmdl40JXNjR96sXN94Y/Z4A== =vj/s -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for v4.9-rc4 - Kick the vcpu when a pending interrupt becomes pending again - Prevent access to invalid interrupt registers - Invalid TLBs when two vcpus from the same VM share a CPU	2016-11-11 11:13:36 +01:00
Linus Torvalds	66cecb6789	One NULL pointer dereference, and two fixes for regressions introduced during the merge window. The rest are fixes for MIPS, s390 and nested VMX. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJYG2H5AAoJEL/70l94x66DK/cH/0jEQ3ynuLAd5CKux7JxI/EP msSJh1Xqr4+XhXZnuDpGQWrdsBlxoiqA6PsJrUTtyi4nQCDXlT8g+2MDuvqhWIHz 7vw58j/EMJDCVQzYAbN5VDUfk13uB5aSWTo3M9Rf09v0hU1Ql7z8u4CtKEdLpN5Y LY9bT9fxUmXO7REKP7bdW6ZrDX/hUShYHgMqzXGFMyGBG3ym3a9bggXEzTCD6eNQ ioogQIWqg+icdhta0iLNAwFClPlcKB2/xo4IUuNgrPwGoHFGJN/8+qxT4+sVbp2B v8u1zOXlCFXBcskWE+yRRsGe72+mIzz6QScCyO+5HbhKYVfbE9H7KBlFX9rZZ2c= =IbKx -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "One NULL pointer dereference, and two fixes for regressions introduced during the merge window. The rest are fixes for MIPS, s390 and nested VMX" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Check memopp before dereference (CVE-2016-8630) kvm: nVMX: VMCLEAR an active shadow VMCS after last use KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK KVM: x86: fix wbinvd_dirty_mask use-after-free kvm/x86: Show WRMSR data is in hex kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types KVM: document lock orders KVM: fix OOPS on flush_work KVM: s390: Fix STHYI buffer alignment for diag224 KVM: MIPS: Precalculate MMIO load resume PC KVM: MIPS: Make ERET handle ERL before EXL KVM: MIPS: Fix lazy user ASID regenerate for SMP	2016-11-04 13:08:05 -07:00
Shih-Wei Li	d42c79701a	KVM: arm/arm64: vgic: Kick VCPUs when queueing already pending IRQs In cases like IPI, we could be queueing an interrupt for a VCPU that is already running and is not about to exit, because the VCPU has entered the VM with the interrupt pending and would not trap on EOI'ing that interrupt. This could result to delays in interrupt deliveries or even loss of interrupts. To guarantee prompt interrupt injection, here we have to try to kick the VCPU. Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-11-04 17:56:56 +00:00
Andre Przywara	112b0b8f8f	KVM: arm/arm64: vgic: Prevent access to invalid SPIs In our VGIC implementation we limit the number of SPIs to a number that the userland application told us. Accordingly we limit the allocation of memory for virtual IRQs to that number. However in our MMIO dispatcher we didn't check if we ever access an IRQ beyond that limit, leading to out-of-bound accesses. Add a test against the number of allocated SPIs in check_region(). Adjust the VGIC_ADDR_TO_INT macro to avoid an actual division, which is not implemented on ARM(32). [maz: cleaned-up original patch] Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-11-04 17:56:54 +00:00
Suraj Jitindar Singh	ce35ef27d4	kvm/stats: Update kvm stats to clear on write to their debugfs entry Various kvm vm and vcpu stats are provided via debugfs entries. Currently there is no way to reset these stats back to zero. Add the ability to clear (reset back to zero) these stats on a per stat basis by writing to the debugfs files. Only a write value of 0 is accepted. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-02 21:32:17 +01:00
Paolo Bonzini	36343f6ea7	KVM: fix OOPS on flush_work The conversion done by commit `3706feacd0` ("KVM: Remove deprecated create_singlethread_workqueue") is broken. It flushes a single work item &irqfd->shutdown instead of all of them, and even worse if there is no irqfd on the list then you get a NULL pointer dereference. Revert the virt/kvm/eventfd.c part of that patch; to avoid the deprecated function, just allocate our own workqueue---it does not even have to be unbound---with alloc_workqueue. Fixes: `3706feacd0` Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-26 14:06:51 +02:00
Lorenzo Stoakes	0d73175982	mm: unexport __get_user_pages() This patch unexports the low-level __get_user_pages() function. Recent refactoring of the get_user_pages* functions allow flags to be passed through get_user_pages() which eliminates the need for access to this function from its one user, kvm. We can see that the two calls to get_user_pages() which replace __get_user_pages() in kvm_main.c are equivalent by examining their call stacks: get_user_page_nowait(): get_user_pages(start, 1, flags, page, NULL) __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL, false, flags \| FOLL_TOUCH) __get_user_pages(current, current->mm, start, 1, flags \| FOLL_TOUCH \| FOLL_GET, page, NULL, NULL) check_user_page_hwpoison(): get_user_pages(addr, 1, flags, NULL, NULL) __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL, false, flags \| FOLL_TOUCH) __get_user_pages(current, current->mm, addr, 1, flags \| FOLL_TOUCH, NULL, NULL, NULL) Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-24 19:13:20 -07:00
Lorenzo Stoakes	d4944b0ece	mm: remove write/force parameters from __get_user_pages_unlocked() This removes the redundant 'write' and 'force' parameters from __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-18 14:13:37 -07:00
Radim Krčmář	45ca877ad0	KVM/ARM Changes for v4.9 - Various cleanups and removal of redundant code - Two important fixes for not using an in-kernel irqchip - A bit of optimizations - Handle SError exceptions and present them to guests if appropriate - Proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - Preparations for GICv3 save/restore, including ABI docs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJX6rKQAAoJEEtpOizt6ddy8i4H/0bfB1EVukggoL/FfGeds/dg p2FG0oOsggcSBwK7VXUUvVllO7ioUssRCqqkn1e0/bCLtQrN4ex4PqJ3618EHFz/ pLP72hf8Zl33rP3OVtPaDcxzjjKKdf+xGbBIv3AE7x7O5rFZg4lWHeWjy4yuhFv2 Jm+8ul7JCxCMse08Xc90riou4i/jWjyoLadHbAoeX3tR+dVcZyOUZSlgAPI1bS/P rOQi/zkl3bT2R3kh28QuEFTrJ9BVTnmw25BRW8DNr6+CWmR9bpM6y7AGzOwrZ3FZ F1MbsPpN3ogcjvPg2QTYuOoqrwz8NLLHw5pR5YNj84VppjSpSsAhKU7Ug5Uhsr0= =1z/L -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next KVM/ARM Changes for v4.9 - Various cleanups and removal of redundant code - Two important fixes for not using an in-kernel irqchip - A bit of optimizations - Handle SError exceptions and present them to guests if appropriate - Proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - Preparations for GICv3 save/restore, including ABI docs	2016-09-29 16:01:51 +02:00
Christoffer Dall	0099b7701f	KVM: arm/arm64: vgic: Don't flush/sync without a working vgic If the vgic hasn't been created and initialized, we shouldn't attempt to look at its data structures or flush/sync anything to the GIC hardware. This fixes an issue reported by Alexander Graf when using a userspace irqchip. Fixes: `0919e84c0f` ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") Cc: stable@vger.kernel.org Reported-by: Alexander Graf <agraf@suse.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-27 18:57:35 +02:00
Christoffer Dall	6fe407f2d1	KVM: arm64: Require in-kernel irqchip for PMU support If userspace creates a PMU for the VCPU, but doesn't create an in-kernel irqchip, then we end up in a nasty path where we try to take an uninitialized spinlock, which can lead to all sorts of breakages. Luckily, QEMU always creates the VGIC before the PMU, so we can establish this as ABI and check for the VGIC in the PMU init stage. This can be relaxed at a later time if we want to support PMU with a userspace irqchip. Cc: stable@vger.kernel.org Cc: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-27 18:57:07 +02:00
Vladimir Murzin	acda5430be	ARM: KVM: Support vgic-v3 This patch allows to build and use vgic-v3 in 32-bit mode. Unfortunately, it can not be split in several steps without extra stubs to keep patches independent and bisectable. For instance, virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre to be already defined. It is how support has been done: * handle SGI requests from the guest * report configured SRE on access to GICv3 cpu interface from the guest * required vgic-v3 macros are provided via uapi.h * static keys are used to select GIC backend * to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with the static inlines Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-22 13:22:21 +02:00
Vladimir Murzin	d7d0a11e44	KVM: arm: vgic: Support 64-bit data manipulation on 32-bit host systems We have couple of 64-bit registers defined in GICv3 architecture, so unsigned long accesses to these registers will only access a single 32-bit part of that regitser. On the other hand these registers can't be accessed as 64-bit with a single instruction like ldrd/strd or ldmia/stmia if we run a 32-bit host because KVM does not support access to MMIO space done by these instructions. It means that a 32-bit guest accesses these registers in 32-bit chunks, so the only thing we need to do is to ensure that extract_bytes() always takes 64-bit data. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-22 13:21:59 +02:00
Vladimir Murzin	e533a37f7b	KVM: arm: vgic: Fix compiler warnings when built for 32-bit Well, this patch is looking ahead of time, but we'll get following compiler warnings as soon as we introduce vgic-v3 to 32-bit world CC arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.o arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_mmio_read_v3r_typer': arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:184:35: warning: left shift count >= width of type [-Wshift-count-overflow] value = (mpidr & GENMASK(23, 0)) << 32; ^ In file included from ./include/linux/kernel.h:10:0, from ./include/asm-generic/bug.h:13, from ./arch/arm/include/asm/bug.h:59, from ./include/linux/bug.h:4, from ./include/linux/io.h:23, from ./arch/arm/include/asm/arch_gicv3.h:23, from ./include/linux/irqchip/arm-gic-v3.h:411, from arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:14: arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_v3_dispatch_sgi': ./include/linux/bitops.h:6:24: warning: left shift count >= width of type [-Wshift-count-overflow] #define BIT(nr) (1UL << (nr)) ^ arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:614:20: note: in expansion of macro 'BIT' broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT); ^ Let's fix them now. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-22 13:21:48 +02:00
Vladimir Murzin	7a1ff70828	KVM: arm64: vgic-its: Introduce config option to guard ITS specific code By now ITS code guarded with KVM_ARM_VGIC_V3 config option which was introduced to hide everything specific to vgic-v3 from 32-bit world. We are going to support vgic-v3 in 32-bit world and KVM_ARM_VGIC_V3 will gone, but we don't have support for ITS there yet and we need to continue keeping ITS away. Introduce the new config option to prevent ITS code being build in 32-bit mode when support for vgic-v3 is done. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-22 13:21:47 +02:00
Vladimir Murzin	19f0ece439	arm64: KVM: Move vgic-v3 save/restore to virt/kvm/arm/hyp So we can reuse the code under arch/arm Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-22 13:21:46 +02:00
Vladimir Murzin	5a7a8426b2	arm64: KVM: Use static keys for selecting the GIC backend Currently GIC backend is selected via alternative framework and this is fine. We are going to introduce vgic-v3 to 32-bit world and there we don't have patching framework in hand, so we can either check support for GICv3 every time we need to choose which backend to use or try to optimise it by using static keys. The later looks quite promising because we can share logic involved in selecting GIC backend between architectures if both uses static keys. This patch moves arm64 from alternative to static keys framework for selecting GIC backend. For that we embed static key into vgic_global and enable the key during vgic initialisation based on what has already been exposed by the host GIC driver. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-22 13:21:35 +02:00
Luiz Capitulino	45b5939e50	kvm: create per-vcpu dirs in debugfs This commit adds the ability for archs to export per-vcpu information via a new per-vcpu dir in the VM's debugfs directory. If kvm_arch_has_vcpu_debugfs() returns true, then KVM will create a vcpu dir for each vCPU in the VM's debugfs directory. Then kvm_arch_create_vcpu_debugfs() is responsible for populating each vcpu directory with arch specific entries. The per-vcpu path in debugfs will look like: /sys/kernel/debug/kvm/29162-10/vcpu0 /sys/kernel/debug/kvm/29162-10/vcpu1 This is all arch specific for now because the only user of this interface (x86) wants to export x86-specific per-vcpu information to user-space. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-16 16:57:47 +02:00
Luiz Capitulino	9d5a1dcebf	kvm: kvm_destroy_vm_debugfs(): check debugfs_stat_data pointer This make it possible to call kvm_destroy_vm_debugfs() from kvm_create_vm_debugfs() in error conditions. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-16 16:57:46 +02:00
Paolo Bonzini	ad53e35ae5	Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Paul Mackerras writes: The highlights are: * Reduced latency for interrupts from PCI pass-through devices, from Suresh Warrier and me. * Halt-polling implementation from Suraj Jitindar Singh. * 64-bit VCPU statistics, also from Suraj. * Various other minor fixes and improvements.	2016-09-13 15:20:55 +02:00
Paolo Bonzini	5d947a1447	KVM: ARM: cleanup kvm_timer_hyp_init Remove two unnecessary labels now that kvm_timer_hyp_init is not creating its own workqueue anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:54:00 +02:00
Marc Zyngier	3272f0d08e	arm64: KVM: Inject a vSerror if detecting a bad GICV access at EL2 If, when proxying a GICV access at EL2, we detect that the guest is doing something silly, report an EL1 SError instead ofgnoring the access. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:53:00 +02:00
Marc Zyngier	a07d3b07a8	arm64: KVM: vgic-v2: Enable GICV access from HYP if access from guest is unsafe So far, we've been disabling KVM on systems where the GICV region couldn't be safely given to a guest. Now that we're able to handle this access safely by emulating it in HYP, we can enable this feature when we detect an unsafe configuration. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:53:00 +02:00
Marc Zyngier	bf8feb3964	arm64: KVM: vgic-v2: Add GICV access from HYP Now that we have the necessary infrastructure to handle MMIO accesses in HYP, perform the GICV access on behalf of the guest. This requires checking that the access is strictly 32bit, properly aligned, and falls within the expected range. When all condition are satisfied, we perform the access and tell the rest of the HYP code that the instruction has been correctly emulated. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:53:00 +02:00
Marc Zyngier	fb5ee369cc	arm64: KVM: vgic-v2: Add the GICV emulation infrastructure In order to efficiently perform the GICV access on behalf of the guest, we need to be able to avoid going back all the way to the host kernel. For this, we introduce a new hook in the world switch code, conveniently placed just after populating the fault info. At that point, we only have saved/restored the GP registers, and we can quickly perform all the required checks (data abort, translation fault, valid faulting syndrome, not an external abort, not a PTW). Coming back from the emulation code, we need to skip the emulated instruction. This involves an additional bit of save/restore in order to be able to access the guest's PC (and possibly CPSR if this is a 32bit guest). At this stage, no emulation code is provided. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:53:00 +02:00
Marc Zyngier	8cebe750c4	arm64: KVM: Make kvm_skip_instr32 available to HYP As we plan to do some emulation at HYP, let's make kvm_skip_instr32 as part of the hyp_text section. This doesn't preclude the kernel from using it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:53:00 +02:00
Marc Zyngier	3aedd5c49e	arm: KVM: Use common AArch32 conditional execution code Add the bit of glue and const-ification that is required to use the code inherited from the arm64 port, and move over to it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:53:00 +02:00
Marc Zyngier	427d7cacf9	arm64: KVM: Move the AArch32 conditional execution to common code It would make some sense to share the conditional execution code between 32 and 64bit. In order to achieve this, let's move that code to virt/kvm/arm/aarch32.c. While we're at it, drop a superfluous BUG_ON() that wasn't that useful. Following patches will migrate the 32bit port to that code base. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:53:00 +02:00
Marc Zyngier	55d7cad6a9	KVM: arm: vgic: Drop build compatibility hack for older kernel versions As kvm_set_routing_entry() was changing prototype between 4.7 and 4.8, an ugly hack was put in place in order to survive both building in -next and the merge window. Now that everything has been merged, let's dump the compatibility hack for good. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-09-08 12:53:00 +02:00
Christoffer Dall	1fe0009833	KVM: arm/arm64: Rename vgic_attr_regs_access to vgic_attr_regs_access_v2 Just a rename so we can implement a v3-specific function later. We take the chance to get rid of the V2/V3 ops comments as well. No functional change. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>	2016-09-08 12:53:00 +02:00
Christoffer Dall	ba7b9169b5	KVM: arm/arm64: Factor out vgic_attr_regs_access functionality As we are about to deal with multiple data types and situations where the vgic should not be initialized when doing userspace accesses on the register attributes, factor out the functionality of vgic_attr_regs_access into smaller bits which can be reused by a new function later. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>	2016-09-08 12:53:00 +02:00
Suraj Jitindar Singh	8a7e75d47b	KVM: Add provisioning for ulong vm stats and u64 vcpu stats vms and vcpus have statistics associated with them which can be viewed within the debugfs. Currently it is assumed within the vcpu_stat_get() and vm_stat_get() functions that all of these statistics are represented as u32s, however the next patch adds some u64 vcpu statistics. Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly. Since vcpu statistics are per vcpu, they will only be updated by a single vcpu at a time so this shouldn't present a problem on 32-bit machines which can't atomically increment 64-bit numbers. However vm statistics could potentially be updated by multiple vcpus from that vm at a time. To avoid the overhead of atomics make all vm statistics ulong such that they are 64-bit on 64-bit systems where they can be atomically incremented and are 32-bit on 32-bit systems which may not be able to atomically increment 64-bit numbers. Modify vm_stat_get() to expect ulongs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Matlack <dmatlack@google.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-08 12:25:37 +10:00
Bhaktipriya Shridhar	3706feacd0	KVM: Remove deprecated create_singlethread_workqueue The workqueue "irqfd_cleanup_wq" queues a single work item &irqfd->shutdown and hence doesn't require ordering. It is a host-wide workqueue for issuing deferred shutdown requests aggregated from all vm* instances. It is not being used on a memory reclaim path. Hence, it has been converted to use system_wq. The work item has been flushed in kvm_irqfd_release(). The workqueue "wqueue" queues a single work item &timer->expired and hence doesn't require ordering. Also, it is not being used on a memory reclaim path. Hence, it has been converted to use system_wq. System workqueues have been able to handle high level of concurrency for a long time now and hence it's not required to have a singlethreaded workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue created with create_singlethread_workqueue(), system_wq allows multiple work items to overlap executions even on the same CPU; however, a per-cpu workqueue doesn't have any CPU locality or global ordering guarantee unless the target CPU is explicitly specified and thus the increase of local concurrency shouldn't make any difference. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-07 19:34:28 +02:00
Paolo Bonzini	2eeb321fd2	KVM/ARM Fixes for v4.8-rc3 This tag contains the following fixes on top of v4.8-rc1: - ITS init issues - ITS error handling issues - ITS IRQ leakage fix - Plug a couple of ITS race conditions - An erratum workaround for timers - Some removal of misleading use of errors and comments - A fix for GICv3 on 32-bit guests -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXtLicAAoJEEtpOizt6ddyEC4H/16IngntN6Gz1WPwtBBelgyj ZfU970uzGOyDtDOeOX1NT+gJpkDvUMhsNlngWnMrMwqqqPVKdE4XBShPiW2v53E7 JquDTd2kKl+OO9e9XnkLw9yUcARmJFKIjHdISlg+E78t2kcNHn+XB2jrfTLKQVl8 tk1ztDALb4LXSGYPZQ/uHTYp9U0qei+2SbbQufRcdQ3ggyxLDwPP2aO25amctzEP 0Y42tlnNoZj7yBBp0X9BWRrHF2AZuOp+qBJnpFiQdsgLL6G1P3DcU/t9+KDjVBVr LYKN8jId2r5eyGGg8aKb4I3trevayToWhDw/jzarrTNAovB1cp8G5J7ozfmeS3g= =4PCW -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Fixes for v4.8-rc3 This tag contains the following fixes on top of v4.8-rc1: - ITS init issues - ITS error handling issues - ITS IRQ leakage fix - Plug a couple of ITS race conditions - An erratum workaround for timers - Some removal of misleading use of errors and comments - A fix for GICv3 on 32-bit guests	2016-08-18 12:19:19 +02:00
Marc Zyngier	cabdc5c59a	KVM: arm/arm64: timer: Workaround misconfigured timer interrupt Similarily to `f005bd7e3b` ("clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered"), make sure we can survive an interrupt that has been misconfigured as edge-triggered by forcing it to be level-triggered (active low is assumed, but the GIC doesn't really care whether this is high or low). Hopefully, the amount of shouting in the kernel log will convince the user to do something about their firmware. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-08-17 12:23:47 +02:00
Andre Przywara	286054a7a8	KVM: arm64: ITS: avoid re-mapping LPIs When a guest wants to map a device-ID/event-ID combination that is already mapped, we may end up in a situation where an LPI is never "put", thus never being freed. Since the GICv3 spec says that mapping an already mapped LPI is UNPREDICTABLE, lets just bail out early in this situation to avoid any potential leaks. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-08-16 19:27:22 +02:00
Andre Przywara	505a19eec4	KVM: arm64: check for ITS device on MSI injection When userspace provides the doorbell address for an MSI to be injected into the guest, we find a KVM device which feels responsible. Lets check that this device is really an emulated ITS before we make real use of the container_of-ed pointer. [ Moved NULL-pointer check to caller of static function - Christoffer ] Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-08-15 23:00:22 +02:00
Andre Przywara	c7735769d5	KVM: arm64: ITS: move ITS registration into first VCPU run Currently we register an ITS device upon userland issuing the CTLR_INIT ioctl to mark initialization of the ITS as done. This deviates from the initialization sequence of the existing GIC devices and does not play well with the way QEMU handles things. To be more in line with what we are used to, register the ITS(es) just before the first VCPU is about to run, so in the map_resources() call. This involves iterating through the list of KVM devices and map each ITS that we find. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-08-15 23:00:21 +02:00
Christoffer Dall	d9ae449b3d	KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomic There are two problems with the current implementation of the MMIO handlers for the propbaser and pendbaser: First, the write to the value itself is not guaranteed to be an atomic 64-bit write so two concurrent writes to the structure field could be intermixed. Second, because we do a read-modify-update operation without any synchronization, if we have two 32-bit accesses to separate parts of the register, we can loose one of them. By using the atomic cmpxchg64 we should cover both issues above. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-08-15 23:00:20 +02:00
Christoffer Dall	a28ebea2ad	KVM: Protect device ops->create and list_add with kvm->lock KVM devices were manipulating list data structures without any form of synchronization, and some implementations of the create operations also suffered from a lack of synchronization. Now when we've split the xics create operation into create and init, we can hold the kvm->lock mutex while calling the create operation and when manipulating the devices list. The error path in the generic code gets slightly ugly because we have to take the mutex again and delete the device from the list, but holding the mutex during anon_inode_getfd or releasing/locking the mutex in the common non-error path seemed wrong. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-08-12 12:01:27 +02:00
Christoffer Dall	023e9fddc3	KVM: PPC: Move xics_debugfs_init out of create As we are about to hold the kvm->lock during the create operation on KVM devices, we should move the call to xics_debugfs_init into its own function, since holding a mutex over extended amounts of time might not be a good idea. Introduce an init operation on the kvm_device_ops struct which cannot fail and call this, if configured, after the device has been created. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-08-12 12:01:26 +02:00
Christoffer Dall	2cccbb368a	KVM: arm64: vgic-its: Plug race in vgic_put_irq Right now the following sequence of events can happen: 1. Thread X calls vgic_put_irq 2. Thread Y calls vgic_add_lpi 3. Thread Y gets lpi_list_lock 4. Thread X drops the ref count to 0 and blocks on lpi_list_lock 5. Thread Y finds the irq via the lpi_list_lock, raises the ref count to 1, and release the lpi_list_lock. 6. Thread X proceeds and frees the irq. Avoid this by holding the spinlock around the kref_put. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-08-10 11:41:54 +02:00
Christoffer Dall	99e5e886a0	KVM: arm64: vgic-its: Handle errors from vgic_add_lpi During low memory conditions, we could be dereferencing a NULL pointer when vgic_add_lpi fails to allocate memory. Consider for example this call sequence: vgic_its_cmd_handle_mapi itte->irq = vgic_add_lpi(kvm, lpi_nr); update_lpi_config(kvm, itte->irq, NULL); ret = kvm_read_guest(kvm, propbase + irq->intid ^^^^ kaboom? Instead, return an error pointer from vgic_add_lpi and check the return value from its single caller. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-08-10 11:41:35 +02:00
Andre Przywara	fd837b08d9	KVM: arm64: ITS: return 1 on successful MSI injection According to the KVM API documentation a successful MSI injection should return a value > 0 on success. Return possible errors in vgic_its_trigger_msi() and report a successful injection back to userland, while also reporting the case where the MSI could not be delivered due to the guest not having the LPI mapped, for instance. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-08-09 16:43:23 +02:00
Paolo Bonzini	6f49b2f341	KVM/ARM Changes for v4.8 - Take 2 Includes GSI routing support to go along with the new VGIC and a small fix that has been cooking in -next for a while. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXoydqAAoJEEtpOizt6ddyM3oH/1A4VeG/J9q4fBPXqY2tVWXs c3P7UgNcrEgUNs/F9ykQY/lb31deecUzaBt1OyTf+RlsNbihq3dQdYcBhxtUODw/ Faok582ya3UFgLW+IRHcID0EbkVOpIzMhOStYsnU/Dz7HG1JL9HdPzwkid7iu9LT fI6yrrBnJFjdWAAQ4BkcEKBENRsY8NTs7jX5vnFA92MkUBby7BmariPDD3FtrB+f Ob9B7CxM30pNqsN7OA/QvFOHMJHxf3s1TBKwmPHe5TLIfSzV1YxcEGiMc0lWqF4v BT8ZeMGCtjDw94tND1DskfQQRPaMqPmGuRTrAW/IuE2n92bFtbqIqs7Cbw0fzLE= =Vm6Q -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.8-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.8 - Take 2 Includes GSI routing support to go along with the new VGIC and a small fix that has been cooking in -next for a while.	2016-08-04 13:59:56 +02:00
Linus Torvalds	221bb8a46e	- ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. The ugly bit is the conflicts. A couple of them are simple conflicts due to 4.7 fixes, but most of them are with other trees. There was definitely too much reliance on Acked-by here. Some conflicts are for KVM patches where _I_ gave my Acked-by, but the worst are for this pull request's patches that touch files outside arch//kvm. KVM submaintainers should probably learn to synchronize better with arch maintainers, with the latter providing topic branches whenever possible instead of Acked-by. This is what we do with arch/x86. And I should learn to refuse pull requests when linux-next sends scary signals, even if that means that submaintainers have to rebase their branches. Anyhow, here's the list: - arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT was removed by the nvdimm tree. This tree adds handle_preemption_timer and EXIT_REASON_PREEMPTION_TIMER at the same place. In general all mentions of pcommit have to go. There is also a conflict between a stable fix and this patch, where the stable fix removed the vmx_create_pml_buffer function and its call. - virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree. This tree adds kvm_io_bus_get_dev at the same place. - virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the file was completely removed for 4.8. - include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault; this is a change that should have gone in through the irqchip tree and pulled by kvm-arm. I think I would have rejected this kvm-arm pull request. The KVM version is the right one, except that it lacks GITS_BASER_PAGES_SHIFT. - arch/powerpc: what a mess. For the idle_book3s.S conflict, the KVM tree is the right one; everything else is trivial. In this case I am not quite sure what went wrong. The commit that is causing the mess (`fd7bacbca4`, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/ and arch/powerpc/kvm/. It's large, but at 396 insertions/5 deletions I guessed that it wasn't really possible to split it and that the 5 deletions wouldn't conflict. That wasn't the case. - arch/s390: also messy. First is hypfs_diag.c where the KVM tree moved some code and the s390 tree patched it. You have to reapply the relevant part of commits `6c22c98637`, plus all of `e030c1125e`, to arch/s390/kernel/diag.c. Or pick the linux-next conflict resolution from http://marc.info/?l=kvm&m=146717549531603&w=2. Second, there is a conflict in gmap.c between a stable fix and 4.8. The KVM version here is the correct one. I have pushed my resolution at refs/heads/merge-20160802 (commit 3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJXoGm7AAoJEL/70l94x66DugQIAIj703ePAFepB/fCrKHkZZia SGrsBdvAtNsOhr7FQ5qvvjLxiv/cv7CymeuJivX8H+4kuUHUllDzey+RPHYHD9X7 U6n1PdCH9F15a3IXc8tDjlDdOMNIKJixYuq1UyNZMU6NFwl00+TZf9JF8A2US65b x/41W98ilL6nNBAsoDVmCLtPNWAqQ3lajaZELGfcqRQ9ZGKcAYOaLFXHv2YHf2XC qIDMf+slBGSQ66UoATnYV2gAopNlWbZ7n0vO6tE2KyvhHZ1m399aBX1+k8la/0JI 69r+Tz7ZHUSFtmlmyByi5IAB87myy2WQHyAPwj+4vwJkDGPcl0TrupzbG7+T05Y= =42ti -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...	2016-08-02 16:11:27 -04:00
Linus Torvalds	a6408f6cb6	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the next part of the hotplug rework. - Convert all notifiers with a priority assigned - Convert all CPU_STARTING/DYING notifiers The final removal of the STARTING/DYING infrastructure will happen when the merge window closes. Another 700 hundred line of unpenetrable maze gone :)" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) timers/core: Correct callback order during CPU hot plug leds/trigger/cpu: Move from CPU_STARTING to ONLINE level powerpc/numa: Convert to hotplug state machine arm/perf: Fix hotplug state machine conversion irqchip/armada: Avoid unused function warnings ARC/time: Convert to hotplug state machine clocksource/atlas7: Convert to hotplug state machine clocksource/armada-370-xp: Convert to hotplug state machine clocksource/exynos_mct: Convert to hotplug state machine clocksource/arm_global_timer: Convert to hotplug state machine rcu: Convert rcutree to hotplug state machine KVM/arm/arm64/vgic-new: Convert to hotplug state machine smp/cfd: Convert core to hotplug state machine x86/x2apic: Convert to CPU hotplug state machine profile: Convert to hotplug state machine timers/core: Convert to hotplug state machine hrtimer: Convert to hotplug state machine x86/tboot: Convert to hotplug state machine arm64/armv8 deprecated: Convert to hotplug state machine hwtracing/coresight-etm4x: Convert to hotplug state machine ...	2016-07-29 13:55:30 -07:00
Marc Zyngier	3f312db6b6	KVM: arm: vgic-irqfd: Workaround changing kvm_set_routing_entry prototype kvm_set_routing_entry is changing in -next, and causes things to explode. Add a temporary workaround that should be dropped when we hit 4.8-rc1 Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-24 10:37:38 +01:00
Radim Krčmář	912902ce78	KVM/ARM changes for Linux 4.8 - GICv3 ITS emulation - Simpler idmap management that fixes potential TLB conflicts - Honor the kernel protection in HYP mode - Removal of the old vgic implementation -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXkk6wAAoJECPQ0LrRPXpDkIQP/iJ2yXTxrfbJoyaVq1vuMn3R UFhVwNXP8OEjQrmp5lvMBazB1MRBkNDzlVXL1fSb+ijKmbIELOqHhO6ijrkK4zmc 0Ie0x5Bt4gIFPTZyZORVpy1eU/0YFGWERAfsAjYdMCeKwHjaUCRSrZBXF2YsFTfo Hh/ILvHa8TjUXWsQXvtZCL6AAnkDKBsbDWqsq5zspuT+PA8umI+dGLIiULXBpc4t S2TCDxOU1JgsAn+Y0XVbPXV9id+bs5LRd6nNH/RmipIVqWmukSrScXOjg/po/l2S laO4tHmyEeN6ecnCxWttpjacNwyTDNh5n3lL1ceBnBZFqn1k/7NjqV3fQzJxGd1T 1U6edE9+EuS9uXWF5XcEuAD660EiMs4FLVSjPgqYQtto3gOHilmuWL9eeeOOgCem Lknnu/7G8h36PaQuLnEXWXQb7jeS2rTuC0RqxCG62gD9UWEJTckRz5pRh/e6gz7n ZVXMrwGiVZ3zR78qE6i2j5CZ6A0BMAK3nZ85AI3kmgKg0CfVY28uPOj8llAOaYm+ 0XVdfRj7ed75eu3GobjHUyZ0fQ40jovmH2vy3mupBm5XBUHgH/j6X510KJ1UTLWI C2EO9KogbjoVeu60mQi4bKGSPi8/wdgYqVft/Qzl5D5iFvQ7Ia+TQNMArCQazBID Ihe1E09NGrHjV3Yw/GWV =2Del -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next KVM/ARM changes for Linux 4.8 - GICv3 ITS emulation - Simpler idmap management that fixes potential TLB conflicts - Honor the kernel protection in HYP mode - Removal of the old vgic implementation	2016-07-22 20:27:26 +02:00
Eric Auger	995a0ee980	KVM: arm/arm64: Enable MSI routing Up to now, only irqchip routing entries could be set. This patch adds the capability to insert MSI routing entries. For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096: this include SPI irqchip routes plus MSI routes. In the future this might be extended. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-22 18:52:03 +01:00
Eric Auger	180ae7b118	KVM: arm/arm64: Enable irqchip routing This patch adds compilation and link against irqchip. Main motivation behind using irqchip code is to enable MSI routing code. In the future irqchip routing may also be useful when targeting multiple irqchips. Routing standard callbacks now are implemented in vgic-irqfd: - kvm_set_routing_entry - kvm_set_irq - kvm_set_msi They only are supported with new_vgic code. Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined. KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed. So from now on IRQCHIP routing is enabled and a routing table entry must exist for irqfd injection to succeed for a given SPI. This patch builds a default flat irqchip routing table (gsi=irqchip.pin) covering all the VGIC SPI indexes. This routing table is overwritten by the first first user-space call to KVM_SET_GSI_ROUTING ioctl. MSI routing setup is not yet allowed. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-22 18:52:01 +01:00
Eric Auger	ebe915259c	KVM: irqchip: Convey devid to kvm_set_msi on ARM, a devid field is populated in kvm_msi struct in case the flag is set to KVM_MSI_VALID_DEVID. Let's propagate both flags and devid field in kvm_kernel_irq_routing_entry. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-22 18:51:58 +01:00
Marc Zyngier	3a88bded20	KVM: arm64: vgic-its: Simplify MAPI error handling If we care to move all the checks that do not involve any memory allocation, we can simplify the MAPI error handling. Let's do that, it cannot hurt. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:20 +01:00
Marc Zyngier	a3e7aa271e	KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers vgic_its_cmd_handle_mapi has an extra "subcmd" argument, which is already contained in the command buffer that all command handlers obtain from the command queue. Let's drop it, as it is not that useful. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:19 +01:00
Marc Zyngier	6d03a68f80	KVM: arm64: vgic-its: Turn device_id validation into generic ID validation There is no need to have separate functions to validate devices and collections, as the architecture doesn't really distinguish the two, and they are supposed to be managed the same way. Let's turn the DevID checker into a generic one. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:19 +01:00
Marc Zyngier	bb7176449f	KVM: arm64: vgic-its: Add pointer to corresponding kvm_device Going from the ITS structure to the corresponding KVM structure would be quite handy at times. The kvm_device pointer that is passed at create time is quite convenient for this, so let's keep a copy of it in the vgic_its structure. This will be put to a good use in subsequent patches. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:18 +01:00
Marc Zyngier	17a21f58ff	KVM: arm64: vgic-its: Add collection allocator/destructor Instead of spreading random allocations all over the place, consolidate allocation/init/freeing of collections in a pair of constructor/destructor. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:18 +01:00
Marc Zyngier	d6c7f865f0	KVM: arm64: vgic-its: Fix L2 entry validation for indirect tables When checking that the storage address of a device entry is valid, it is critical to compute the actual address of the entry, rather than relying on the beginning of the page to match a CPU page of the same size: for example, if the guest places the table at the last 64kB boundary of RAM, but RAM size isn't a multiple of 64kB... Fix this by computing the actual offset of the device ID in the L2 page, and check the corresponding GFN. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:18 +01:00
Marc Zyngier	333a53ff7f	KVM: arm64: vgic-its: Validate the device table L1 entry Checking that the device_id fits if the table, and we must make sure that the associated memory is also accessible. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:17 +01:00
Marc Zyngier	b90338b7cb	KVM: arm64: vgic-its: Fix misleading nr_entries in vgic_its_check_device_id The nr_entries variable in vgic_its_check_device_id actually describe the size of the L1 table, and not the number of entries in this table. Rename it to l1_tbl_size, so that we can now change the code with a better understanding of what is what. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:17 +01:00
Marc Zyngier	7e3963a515	KVM: arm64: vgic-its: Fix vgic_its_check_device_id BE handling The ITS tables are stored in LE format. If the host is reading a L1 table entry to check its validity, it must convert it to the CPU endianness. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:16 +01:00
Marc Zyngier	c0091073dd	KVM: arm64: vgic-its: Fix handling of indirect tables The current code will fail on valid indirect tables, and happily use the ones that are pointing out of the guest RAM. Funny what a small "!" can do for you... Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:16 +01:00
Marc Zyngier	d97594e6bc	KVM: arm64: vgic-its: Generalize use of vgic_get_irq_kref Instead of sprinkling raw kref_get() calls everytime we cannot do a normal vgic_get_irq(), use the existing vgic_get_irq_kref(), which does the same thing and is paired with a vgic_put_irq(). vgic_get_irq_kref is moved to vgic.h in order to be easily shared. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:16 +01:00
Eric Auger	9d5fcb9dd7	KVM: arm/arm64: Fix vGICv2 KVM_DEV_ARM_VGIC_GRP_CPU/DIST_REGS For VGICv2 save and restore the CPU interface registers are accessed. Restore the modality which has been altered. Also explicitly set the iodev_type for both the DIST and CPU interface. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:15:15 +01:00
Andre Przywara	0e4e82f154	KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controller Now that all ITS emulation functionality is in place, we advertise MSI functionality to userland and also the ITS device to the guest - if userland has configured that. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:38 +01:00
Andre Przywara	2891a7dfb6	KVM: arm64: vgic-its: Implement MSI injection in ITS emulation When userland wants to inject an MSI into the guest, it uses the KVM_SIGNAL_MSI ioctl, which carries the doorbell address along with the payload and the device ID. With the help of the KVM IO bus framework we learn the corresponding ITS from the doorbell address. We then use our wrapper functions to iterate the linked lists and find the proper Interrupt Translation Table Entry (ITTE) and thus the corresponding struct vgic_irq to finally set the pending bit. We also provide the handler for the ITS "INT" command, which allows a guest to trigger an MSI via the ITS command queue. Since this one knows about the right ITS already, we directly call the MMIO handler function without using the kvm_io_bus framework. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:38 +01:00
Andre Przywara	df9f58fbea	KVM: arm64: vgic-its: Implement ITS command queue command handlers The connection between a device, an event ID, the LPI number and the associated CPU is stored in in-memory tables in a GICv3, but their format is not specified by the spec. Instead software uses a command queue in a ring buffer to let an ITS implementation use its own format. Implement handlers for the various ITS commands and let them store the requested relation into our own data structures. Those data structures are protected by the its_lock mutex. Our internal ring buffer read and write pointers are protected by the its_cmd mutex, so that only one VCPU per ITS can handle commands at any given time. Error handling is very basic at the moment, as we don't have a good way of communicating errors to the guest (usually an SError). The INT command handler is missing from this patch, as we gain the capability of actually injecting MSIs into the guest only later on. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:38 +01:00
Andre Przywara	f9f77af9e2	KVM: arm64: vgic-its: Allow updates of LPI configuration table The (system-wide) LPI configuration table is held in a table in (guest) memory. To achieve reasonable performance, we cache this data in our struct vgic_irq. If the guest updates the configuration data (which consists of the enable bit and the priority value), it issues an INV or INVALL command to allow us to update our information. Provide functions that update that information for one LPI or all LPIs mapped to a specific collection. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:37 +01:00
Andre Przywara	33d3bc9556	KVM: arm64: vgic-its: Read initial LPI pending table The LPI pending status for a GICv3 redistributor is held in a table in (guest) memory. To achieve reasonable performance, we cache the pending bit in our struct vgic_irq. The initial pending state must be read from guest memory upon enabling LPIs for this redistributor. As we can't access the guest memory while we hold the lpi_list spinlock, we create a snapshot of the LPI list and iterate over that. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:37 +01:00
Andre Przywara	3802411d01	KVM: arm64: vgic-its: Connect LPIs to the VGIC emulation LPIs are dynamically created (mapped) at guest runtime and their actual number can be quite high, but is mostly assigned using a very sparse allocation scheme. So arrays are not an ideal data structure to hold the information. We use a spin-lock protected linked list to hold all mapped LPIs, represented by their struct vgic_irq. This lock is grouped between the ap_list_lock and the vgic_irq lock in our locking order. Also we store a pointer to that struct vgic_irq in our struct its_itte, so we can easily access it. Eventually we call our new vgic_get_lpi() from vgic_get_irq(), so the VGIC code gets transparently access to LPIs. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:36 +01:00
Andre Przywara	424c33830f	KVM: arm64: vgic-its: Implement basic ITS register handlers Add emulation for some basic MMIO registers used in the ITS emulation. This includes: - GITS_{CTLR,TYPER,IIDR} - ID registers - GITS_{CBASER,CREADR,CWRITER} (which implement the ITS command buffer handling) - GITS_BASER<n> Most of the handlers are pretty straight forward, only the CWRITER handler is a bit more involved by taking the new its_cmd mutex and then iterating over the command buffer. The registers holding base addresses and attributes are sanitised before storing them. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:36 +01:00
Andre Przywara	1085fdc68c	KVM: arm64: vgic-its: Introduce new KVM ITS device Introduce a new KVM device that represents an ARM Interrupt Translation Service (ITS) controller. Since there can be multiple of this per guest, we can't piggy back on the existing GICv3 distributor device, but create a new type of KVM device. On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data structure and store the pointer in the kvm_device data. Upon an explicit init ioctl from userland (after having setup the MMIO address) we register the handlers with the kvm_io_bus framework. Any reference to an ITS thus has to go via this interface. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:35 +01:00
Andre Przywara	59c5ab4098	KVM: arm64: vgic-its: Introduce ITS emulation file with MMIO framework The ARM GICv3 ITS emulation code goes into a separate file, but needs to be connected to the GICv3 emulation, of which it is an option. The ITS MMIO handlers require the respective ITS pointer to be passed in, so we amend the existing VGIC MMIO framework to let it cope with that. Also we introduce the basic ITS data structure and initialize it, but don't return any success yet, as we are not yet ready for the show. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:35 +01:00
Andre Przywara	0aa1de5731	KVM: arm64: vgic: Handle ITS related GICv3 redistributor registers In the GICv3 redistributor there are the PENDBASER and PROPBASER registers which we did not emulate so far, as they only make sense when having an ITS. In preparation for that emulate those MMIO accesses by storing the 64-bit data written into it into a variable which we later read in the ITS emulation. We also sanitise the registers, making sure RES0 regions are respected and checking for valid memory attributes. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:14:35 +01:00
Andre Przywara	5dd4b924e3	KVM: arm/arm64: vgic: Add refcounting for IRQs In the moment our struct vgic_irq's are statically allocated at guest creation time. So getting a pointer to an IRQ structure is trivial and safe. LPIs are more dynamic, they can be mapped and unmapped at any time during the guest's _runtime_. In preparation for supporting LPIs we introduce reference counting for those structures using the kernel's kref infrastructure. Since private IRQs and SPIs are statically allocated, we avoid actually refcounting them, since they would never be released anyway. But we take provisions to increase the refcount when an IRQ gets onto a VCPU list and decrease it when it gets removed. Also this introduces vgic_put_irq(), which wraps kref_put and hides the release function from the callers. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:10:48 +01:00
Andre Przywara	8a39d00670	KVM: kvm_io_bus: Add kvm_io_bus_get_dev() call The kvm_io_bus framework is a nice place of holding information about various MMIO regions for kernel emulated devices. Add a call to retrieve the kvm_io_device structure which is associated with a certain MMIO address. This avoids to duplicate kvm_io_bus' knowledge of MMIO regions without having to fake MMIO calls if a user needs the device a certain MMIO address belongs to. This will be used by the ITS emulation to get the associated ITS device when someone triggers an MSI via an ioctl from userspace. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:10:40 +01:00
Andre Przywara	42c8870f90	KVM: arm/arm64: vgic: Check return value for kvm_register_vgic_device kvm_register_device_ops() can return an error, so lets check its return value and propagate this up the call chain. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:10:08 +01:00
Andre Przywara	8f6cdc1c2e	KVM: arm/arm64: vgic: Move redistributor kvm_io_devices Logically a GICv3 redistributor is assigned to a (v)CPU, so we should aim to keep redistributor related variables out of our struct vgic_dist. Let's start by replacing the redistributor related kvm_io_device array with two members in our existing struct vgic_cpu, which are naturally per-VCPU and thus don't require any allocation / freeing. So apart from the better fit with the redistributor design this saves some code as well. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-07-18 18:09:40 +01:00
Anna-Maria Gleixner	15d7e3d349	KVM/arm/arm64/vgic-new: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Eric Auger <eric.auger@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-arm-kernel@lists.infradead.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153337.900484868@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-07-15 10:41:43 +02:00
Richard Cochran	b3c9950a5c	arm/kvm/arch_timer: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-arm-kernel@lists.infradead.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153336.634155707@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-07-15 10:40:27 +02:00
Richard Cochran	42ec50b5f2	arm/kvm/vgic: Convert to hotplug state machine Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. The VGIC callback is run after KVM's main callback since it reflects the makefile order. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-arm-kernel@lists.infradead.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153336.546953286@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-07-15 10:40:26 +02:00
Thomas Gleixner	8c18b2d2d0	virt: Convert kvm hotplug to state machine Install the callbacks via the state machine. The core won't invoke the callbacks on already online CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153335.886159080@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-07-15 10:40:23 +02:00
Al Viro	506cfba9e7	KVM: don't use anon_inode_getfd() before possible failures Once anon_inode_getfd() has succeeded, it's impossible to undo in a clean way and no, sys_close() is not usable in such cases. Use anon_inode_getfile() and get_unused_fd_flags() to get struct file and descriptor and do not install the file into the descriptor table until after the last possible failure exit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-14 19:11:22 +02:00
Paolo Bonzini	7964218c7d	Revert "KVM: release anon file in failure path of vm creation" This reverts commit 77ecc085fed1af1000ca719522977b960aa6da52. Al Viro colorfully says: "You should NEVER use sys_close() on failure exit paths like that. Moreover, this kvm_put_kvm() becomes a double-put, since closing the damn file will drop that reference to kvm. Please, revert. anon_inode_getfd() should be used only when there's no possible failures past its call". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-14 19:11:21 +02:00
Liu Shuo	2be5b3f6dc	KVM: release anon file in failure path of vm creation The failure of create debugfs of VM will return directly without release the anon file. It will leak memory and file descriptors, even through be not serious. Signed-off-by: Liu Shuo <shuo.a.liu@intel.com> Fixes: `536a6f88c4` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-14 19:11:21 +02:00
Jim Mattson	2f1fe81123	KVM: nVMX: Fix memory corruption when using VMCS shadowing When freeing the nested resources of a vcpu, there is an assumption that the vcpu's vmcs01 is the current VMCS on the CPU that executes nested_release_vmcs12(). If this assumption is violated, the vcpu's vmcs01 may be made active on multiple CPUs at the same time, in violation of Intel's specification. Moreover, since the vcpu's vmcs01 is not VMCLEARed on every CPU on which it is active, it can linger in a CPU's VMCS cache after it has been freed and potentially repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity miss can result in memory corruption. It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If the vcpu in question was last loaded on a different CPU, it must be migrated to the current CPU before calling vmx_load_vmcs01(). Signed-off-by: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-14 19:11:20 +02:00
Radim Krčmář	c63cf538eb	KVM: pass struct kvm to kvm_set_routing_entry Arch-specific code will use it. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-14 09:03:56 +02:00
Paolo Bonzini	add6a0cd1c	KVM: MMU: try to fix up page faults before giving up The vGPU folks would like to trap the first access to a BAR by setting vm_ops on the VMAs produced by mmap-ing a VFIO device. The fault handler then can use remap_pfn_range to place some non-reserved pages in the VMA. This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn and fixup_user_fault together help supporting it. The patch also supports VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to reference counting. Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Neo Jia <cjia@nvidia.com> Reported-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-05 14:41:26 +02:00
Paolo Bonzini	92176a8ede	KVM: MMU: prepare to support mapping of VM_IO and VM_PFNMAP frames Handle VM_IO like VM_PFNMAP, as is common in the rest of Linux; extract the formula to convert hva->pfn into a new function, which will soon gain more capabilities. Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-07-05 14:41:03 +02:00
Marc Zyngier	50926d82fa	KVM: arm/arm64: The GIC is dead, long live the GIC I don't think any single piece of the KVM/ARM code ever generated as much hatred as the GIC emulation. It was written by someone who had zero experience in modeling hardware (me), was riddled with design flaws, should have been scrapped and rewritten from scratch long before having a remote chance of reaching mainline, and yet we supported it for a good three years. No need to mention the names of those who suffered, the git log is singing their praises. Thankfully, we now have a much more maintainable implementation, and we can safely put the grumpy old GIC to rest. Fellow hackers, please raise your glass in memory of the GIC: The GIC is dead, long live the GIC! Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-07-03 23:09:37 +02:00
Xiubo Li	caf1ff26e1	kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES These days, we experienced one guest crash with 8 cores and 3 disks, with qemu error logs as bellow: qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984: kvm_irqchip_commit_routes: Assertion `ret == 0' failed. And then we found one patch(bdf026317d) in qemu tree, which said could fix this bug. Execute the following script will reproduce the BUG quickly: irq_affinity.sh ======================================================================== vda_irq_num=25 vdb_irq_num=27 while [ 1 ] do for irq in {1,2,4,8,10,20,40,80} do echo $irq > /proc/irq/$vda_irq_num/smp_affinity echo $irq > /proc/irq/$vdb_irq_num/smp_affinity dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct done done ======================================================================== The following qemu log is added in the qemu code and is displayed when this bug reproduced: kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024, irq_routes->nr: 1024, gsi_count: 1024. That's to say when irq_routes->nr == 1024, there are 1024 routing entries, but in the kernel code when routes->nr >= 1024, will just return -EINVAL; The nr is the number of the routing entries which is in of [1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1]. This patch fix the BUG above. Cc: stable@vger.kernel.org Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 09:38:15 +02:00
Paolo Bonzini	557abc40d1	KVM: remove kvm_vcpu_compatible The new created_vcpus field makes it possible to avoid the race between irqchip and VCPU creation in a much nicer way; just check under kvm->lock whether a VCPU has already been created. We can then remove KVM_APIC_ARCHITECTURE too, because at this point the symbol is only governing the default definition of kvm_vcpu_compatible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 00:05:00 +02:00
Paolo Bonzini	6c7caebc26	KVM: introduce kvm->created_vcpus The race between creating the irqchip and the first VCPU is currently fixed by checking the presence of an irqchip before updating kvm->online_vcpus, and undoing the whole VCPU creation if someone created the irqchip in the meanwhile. Instead, introduce a new field in struct kvm that will count VCPUs under a mutex, without the atomic access and memory ordering that we need elsewhere to protect the vcpus array. This also plugs the race and is more easily applicable in all similar circumstances. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 00:05:00 +02:00
Paolo Bonzini	f8c1b85b25	KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID This causes an ugly dmesg splat. Beautified syzkaller testcase: #include <unistd.h> #include <sys/syscall.h> #include <sys/ioctl.h> #include <fcntl.h> #include <linux/kvm.h> long r[8]; int main() { struct kvm_irq_routing ir = { 0 }; r[2] = open("/dev/kvm", O_RDWR); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_SET_GSI_ROUTING, &ir); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-06-02 17:38:50 +02:00
Paolo Bonzini	c622a3c21e	KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi Found by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000120 IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm] PGD 6f80b067 PUD b6535067 PMD 0 Oops: 0000 [#1] SMP CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 [...] Call Trace: [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm] [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm] [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm] [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480 [<ffffffff812418a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a1062>] tracesys_phase2+0x84/0x89 Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85 RIP [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm] RSP <ffff8800926cbca8> CR2: 0000000000000120 Testcase: #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[26]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", 0); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); struct kvm_irqfd ifd; ifd.fd = syscall(SYS_eventfd2, 5, 0); ifd.gsi = 3; ifd.flags = 2; ifd.resamplefd = ifd.fd; r[25] = ioctl(r[3], KVM_IRQFD, &ifd); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-06-02 17:38:50 +02:00
Marc Zyngier	05fb05a6ca	KVM: arm/arm64: vgic-new: Removel harmful BUG_ON When changing the active bit from an MMIO trap, we decide to explode if the intid is that of a private interrupt. This flawed logic comes from the fact that we were assuming that kvm_vcpu_kick() as called by kvm_arm_halt_vcpu() would not return before the called vcpu responded, but this is not the case, so we need to perform this wait even for private interrupts. Dropping the BUG_ON seems like the right thing to do. [ Commit message tweaked by Christoffer ] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-06-02 11:52:21 +02:00
Marc Zyngier	637d122baa	KVM: arm/arm64: vgic-v3: Always resample level interrupts When reading back from the list registers, we need to perform two actions for level interrupts: 1) clear the soft-pending bit if the interrupt is not pending anymore in the list register 2) resample the line level and propagate it to the pending state But these two actions shouldn't be linked, and we should always resample the line level, no matter what state is in the list register. Otherwise, we may end-up injecting spurious interrupts that have been already retired. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-05-31 16:12:16 +02:00
Marc Zyngier	df7942d17e	KVM: arm/arm64: vgic-v2: Always resample level interrupts When reading back from the list registers, we need to perform two actions for level interrupts: 1) clear the soft-pending bit if the interrupt is not pending anymore in the list register 2) resample the line level and propagate it to the pending state But these two actions shouldn't be linked, and we should always resample the line level, no matter what state is in the list register. Otherwise, we may end-up injecting spurious interrupts that have been already retired. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-05-31 16:12:15 +02:00
Christoffer Dall	4d3afc9bad	KVM: arm/arm64: vgic-v2: Clear all dirty LRs When saving the state of the list registers, it is critical to reset them zero, as we could otherwise leave unexpected EOI interrupts pending for virtual level interrupts. Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-05-31 16:09:28 +02:00
Janosch Frank	536a6f88c4	KVM: Create debugfs dir and stat files for each VM This patch adds a kvm debugfs subdirectory for each VM, which is named after its pid and file descriptor. The directories contain the same kind of files that are already in the kvm debugfs directory, but the data exported through them is now VM specific. This makes the debugfs kvm data a convenient alternative to the tracepoints which already have per VM data. The debugfs data is easy to read and low overhead. CC: Dan Carpenter <dan.carpenter@oracle.com> [includes fixes by Dan Carpenter] Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-25 16:12:05 +02:00
Paolo Bonzini	44bcc92238	KVM/ARM Changes for v4.7 take 2 "The GIC is dead; Long live the GIC" This set of changes include the new vgic, which is a reimplementation of our horribly broken legacy vgic implementation. The two implementations will live side-by-side (with the new being the configured default) for one kernel release and then we'll remove it. Also fixes a non-critical issue with virtual abort injection to guests. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXRA7PAAoJEEtpOizt6ddy9a0H+wQ5LTC4rrgcWOsLjfa7res7 SqB3HEqrzmN3zMbv4dgYeMSBDqr6F3b1+8DJ6n1WkG7afXOKrpMWsl9SXgwmRc1q 8H54DYLcu/CDakzi/FKTeZEIp/u+tpMC7xQLFk8PFx/NUPfspPt1NU6Qi0fumZj6 5AbXwC6FKojplgO6wxV7oHRRiEnfrN5F+1whD3QlaDmI+rlNUSYNp2Ljhp8k9m9E NSeGzZgeMdHOeZpv60uhjotuxegG1zmHeHI59ltNytdfjuFL3LvNm18yG8u1yjKm 9ZDjIu1m7dfuPw39DYC99cxP6tvEh03/N2zw86ZFgj7QfdW756WrV+UkSFfeIKE= =xy+z -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4-7-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/ARM Changes for v4.7 take 2 "The GIC is dead; Long live the GIC" This set of changes include the new vgic, which is a reimplementation of our horribly broken legacy vgic implementation. The two implementations will live side-by-side (with the new being the configured default) for one kernel release and then we'll remove it. Also fixes a non-critical issue with virtual abort injection to guests.	2016-05-24 12:10:51 +02:00
Christoffer Dall	35a2d58588	KVM: arm/arm64: vgic-new: Synchronize changes to active state When modifying the active state of an interrupt via the MMIO interface, we should ensure that the write has the intended effect. If a guest sets an interrupt to active, but that interrupt is already flushed into a list register on a running VCPU, then that VCPU will write the active state back into the struct vgic_irq upon returning from the guest and syncing its state. This is a non-benign race, because the guest can observe that an interrupt is not active, and it can have a reasonable expectations that other VCPUs will not ack any IRQs, and then set the state to active, and expect it to stay that way. Currently we are not honoring this case. Thefore, change both the SACTIVE and CACTIVE mmio handlers to stop the world, change the irq state, potentially queue the irq if we're setting it to active, and then continue. We take this chance to slightly optimize these functions by not stopping the world when touching private interrupts where there is inherently no possible race. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-05-20 16:26:38 +02:00
Andre Przywara	efffe55af5	KVM: arm/arm64: vgic-new: enable build Now that the new VGIC implementation has reached feature parity with the old one, add the new files to the build system and add a Kconfig option to switch between the two versions. We set the default to the new version to get maximum test coverage, in case people experience problems they can switch back to the old behaviour if needed. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-05-20 15:40:09 +02:00
Andre Przywara	568e8c901e	KVM: arm/arm64: vgic-new: implement mapped IRQ handling We now store the mapped hardware IRQ number in our struct, so we don't need the irq_phys_map for the new VGIC. Implement the hardware IRQ mapping on top of the reworked arch timer interface. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-05-20 15:40:09 +02:00
Andre Przywara	03f0c94c73	KVM: arm/arm64: vgic-new: Wire up irqfd injection Connect to the new VGIC to the irqfd framework, so that we can inject IRQs. GSI routing and MSI routing is not yet implemented. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-05-20 15:40:08 +02:00
Eric Auger	f7b6985cc3	KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable Enable the VGIC operation by properly initialising the registers in the hypervisor GIC interface. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-05-20 15:40:07 +02:00
Eric Auger	b0442ee227	KVM: arm/arm64: vgic-new: vgic_init: implement map_resources map_resources is the last initialization step. It is executed on first VCPU run. At that stage the code checks that userspace has provided the base addresses for the relevant VGIC regions, which depend on the type of VGIC that is exposed to the guest. Also we check if the two regions overlap. If the checks succeeded, we register the respective register frames with the kvm_io_bus framework. If we emulate a GICv2, the function also forces vgic_init execution if it has not been executed yet. Also we map the virtual GIC CPU interface onto the guest's CPU interface. Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>	2016-05-20 15:40:07 +02:00

... 2 3 4 5 6 ...

1403 Commits