OpenCloudOS-Kernel

History

Rik van Riel f775b13eed x86,kvm: move qemu/guest FPU switching out to vcpu_run

Currently, every time a VCPU is scheduled out, the host kernel will first save the guest FPU/xstate context, then load the qemu userspace FPU context, only to then immediately save the qemu userspace FPU context back to memory. When scheduling in a VCPU, the same extraneous FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is loaded and stored with preemption disabled, to a model where the qemu userspace FPU is swapped out for the guest FPU context for the duration of the KVM_RUN ioctl.

This is done under the VCPU mutex, which is also taken when other tasks inspect the VCPU FPU context, so the code should already be safe for this change. That should come as no surprise, given that s390 already has this optimization.

This can fix a bug where KVM calls get_user_pages while owning the FPU, and the file system ends up requesting the FPU again:

    [258270.527947] __warn+0xcb/0xf0
    [258270.527948] warn_slowpath_null+0x1d/0x20
    [258270.527951] kernel_fpu_disable+0x3f/0x50
    [258270.527953] __kernel_fpu_begin+0x49/0x100
    [258270.527955] kernel_fpu_begin+0xe/0x10
    [258270.527958] crc32c_pcl_intel_update+0x84/0xb0
    [258270.527961] crypto_shash_update+0x3f/0x110
    [258270.527968] crc32c+0x63/0x8a [libcrc32c]
    [258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
    [258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data]
    [258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
    [258270.527988] submit_io+0x170/0x1b0 [dm_bufio]
    [258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio]
    [258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio]
    [258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio]
    [258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
    [258270.528002] shrink_slab.part.40+0x1f5/0x420
    [258270.528004] shrink_node+0x22c/0x320
    [258270.528006] do_try_to_free_pages+0xf5/0x330
    [258270.528008] try_to_free_pages+0xe9/0x190
    [258270.528009] __alloc_pages_slowpath+0x40f/0xba0
    [258270.528011] __alloc_pages_nodemask+0x209/0x260
    [258270.528014] alloc_pages_vma+0x1f1/0x250
    [258270.528017] do_huge_pmd_anonymous_page+0x123/0x660
    [258270.528021] handle_mm_fault+0xfd3/0x1330
    [258270.528025] __get_user_pages+0x113/0x640
    [258270.528027] get_user_pages+0x4f/0x60
    [258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
    [258270.528108] try_async_pf+0x66/0x230 [kvm]
    [258270.528135] tdp_page_fault+0x130/0x280 [kvm]
    [258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm]
    [258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel]
    [258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel]

No performance changes were detected in quick ping-pong tests on my 4 socket system, which is expected since an FPU+xstate load is on the order of 0.1us, while ping-ponging between CPUs is on the order of 20us, and somewhat noisy.

Cc: stable@vger.kernel.org
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu, which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

2017-12-05 21:16:43 +01:00
..
alpha	treewide: setup_timer() -> timer_setup()	2017-11-21 15:57:07 -08:00
arc	ARC updates for 4.15-rc1	2017-11-25 08:21:54 -10:00
arm	KVM/ARM Fixes for v4.15.	2017-12-05 18:02:03 +01:00
arm64	KVM/ARM Fixes for v4.15.	2017-12-05 18:02:03 +01:00
blackfin	treewide: setup_timer() -> timer_setup()	2017-11-21 15:57:07 -08:00
c6x	Kbuild updates for v4.15	2017-11-17 17:45:29 -08:00
cris	pci-v4.15-changes	2017-11-15 15:01:28 -08:00
frv	Kbuild updates for v4.15	2017-11-17 17:45:29 -08:00
h8300	mm, arch: remove empty_bad_page*	2017-11-15 18:21:03 -08:00
hexagon	Kbuild updates for v4.15	2017-11-17 17:45:29 -08:00
ia64	arch/ia64/include/asm/topology.h: remove unused parent_node() macro	2017-11-17 16:10:04 -08:00
m32r	m32r: fix endianness constraints	2017-11-15 18:21:00 -08:00
m68k	m68k/macboing: Fix missed timer callback assignment	2017-11-24 16:19:40 +01:00
metag	DeviceTree for 4.15:	2017-11-14 18:25:40 -08:00
microblaze	Microblaze patch for 4.15-rc2	2017-11-29 14:19:22 -08:00
mips	* x86 bugfixes: APIC, nested virtualization, IOAPIC	2017-11-30 08:15:19 -08:00
mn10300	bug: define the "cut here" string in a single place	2017-11-17 16:10:01 -08:00
nios2	DeviceTree for 4.15:	2017-11-14 18:25:40 -08:00
openrisc	kmemcheck: remove annotations	2017-11-15 18:21:04 -08:00
parisc	treewide: Switch DEFINE_TIMER callbacks to struct timer_list *	2017-11-21 15:57:05 -08:00
powerpc	powerpc fixes for 4.15 #3	2017-12-01 08:40:17 -05:00
riscv	RISC-V: Fixes for clean allmodconfig build	2017-12-01 13:31:31 -08:00
s390	* x86 bugfixes: APIC, nested virtualization, IOAPIC	2017-11-30 08:15:19 -08:00
score	License cleanup: add SPDX license identifier to uapi header files with no license	2017-11-02 11:19:54 +01:00
sh	treewide: setup_timer() -> timer_setup()	2017-11-21 15:57:07 -08:00
sparc	Merge branch 'akpm' (patches from Andrew)	2017-11-29 19:12:44 -08:00
tile	mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE	2017-11-29 18:40:42 -08:00
um	This pull request contains the following core changes:	2017-11-22 20:46:06 -10:00
unicore32	kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK	2017-11-15 18:21:04 -08:00
x86	x86,kvm: move qemu/guest FPU switching out to vcpu_run	2017-12-05 21:16:43 +01:00
xtensa	libnvdimm for 4.15	2017-11-17 09:51:57 -08:00
.gitignore	…
Kconfig	bpf: Revert bpf_overrid_function() helper changes.	2017-11-11 18:24:55 +09:00