OpenCloudOS-Kernel/arch
Rik van Riel f775b13eed x86,kvm: move qemu/guest FPU switching out to vcpu_run
Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.

This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.

This can fix a bug where KVM calls get_user_pages while owning the
FPU, and the file system ends up requesting the FPU again:

    [258270.527947]  __warn+0xcb/0xf0
    [258270.527948]  warn_slowpath_null+0x1d/0x20
    [258270.527951]  kernel_fpu_disable+0x3f/0x50
    [258270.527953]  __kernel_fpu_begin+0x49/0x100
    [258270.527955]  kernel_fpu_begin+0xe/0x10
    [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
    [258270.527961]  crypto_shash_update+0x3f/0x110
    [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
    [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
    [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
    [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
    [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
    [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
    [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
    [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
    [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
    [258270.528002]  shrink_slab.part.40+0x1f5/0x420
    [258270.528004]  shrink_node+0x22c/0x320
    [258270.528006]  do_try_to_free_pages+0xf5/0x330
    [258270.528008]  try_to_free_pages+0xe9/0x190
    [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
    [258270.528011]  __alloc_pages_nodemask+0x209/0x260
    [258270.528014]  alloc_pages_vma+0x1f1/0x250
    [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
    [258270.528021]  handle_mm_fault+0xfd3/0x1330
    [258270.528025]  __get_user_pages+0x113/0x640
    [258270.528027]  get_user_pages+0x4f/0x60
    [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
    [258270.528108]  try_async_pf+0x66/0x230 [kvm]
    [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
    [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
    [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
    [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]

No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy.

Cc: stable@vger.kernel.org
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
 which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-05 21:16:43 +01:00
..
alpha treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
arc ARC updates for 4.15-rc1 2017-11-25 08:21:54 -10:00
arm KVM/ARM Fixes for v4.15. 2017-12-05 18:02:03 +01:00
arm64 KVM/ARM Fixes for v4.15. 2017-12-05 18:02:03 +01:00
blackfin treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
c6x Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
cris pci-v4.15-changes 2017-11-15 15:01:28 -08:00
frv Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
h8300 mm, arch: remove empty_bad_page* 2017-11-15 18:21:03 -08:00
hexagon Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
ia64 arch/ia64/include/asm/topology.h: remove unused parent_node() macro 2017-11-17 16:10:04 -08:00
m32r m32r: fix endianness constraints 2017-11-15 18:21:00 -08:00
m68k m68k/macboing: Fix missed timer callback assignment 2017-11-24 16:19:40 +01:00
metag DeviceTree for 4.15: 2017-11-14 18:25:40 -08:00
microblaze Microblaze patch for 4.15-rc2 2017-11-29 14:19:22 -08:00
mips * x86 bugfixes: APIC, nested virtualization, IOAPIC 2017-11-30 08:15:19 -08:00
mn10300 bug: define the "cut here" string in a single place 2017-11-17 16:10:01 -08:00
nios2 DeviceTree for 4.15: 2017-11-14 18:25:40 -08:00
openrisc kmemcheck: remove annotations 2017-11-15 18:21:04 -08:00
parisc treewide: Switch DEFINE_TIMER callbacks to struct timer_list * 2017-11-21 15:57:05 -08:00
powerpc powerpc fixes for 4.15 #3 2017-12-01 08:40:17 -05:00
riscv RISC-V: Fixes for clean allmodconfig build 2017-12-01 13:31:31 -08:00
s390 * x86 bugfixes: APIC, nested virtualization, IOAPIC 2017-11-30 08:15:19 -08:00
score License cleanup: add SPDX license identifier to uapi header files with no license 2017-11-02 11:19:54 +01:00
sh treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
sparc Merge branch 'akpm' (patches from Andrew) 2017-11-29 19:12:44 -08:00
tile mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE 2017-11-29 18:40:42 -08:00
um This pull request contains the following core changes: 2017-11-22 20:46:06 -10:00
unicore32 kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK 2017-11-15 18:21:04 -08:00
x86 x86,kvm: move qemu/guest FPU switching out to vcpu_run 2017-12-05 21:16:43 +01:00
xtensa libnvdimm for 4.15 2017-11-17 09:51:57 -08:00
.gitignore
Kconfig bpf: Revert bpf_overrid_function() helper changes. 2017-11-11 18:24:55 +09:00