OpenCloudOS-Kernel

Andrea Arcangeli 127393fbe5 mm: thp: kvm: fix memory corruption in KVM with THP enabled

After the THP refcounting change, obtaining a compound page from get_user_pages() no longer allows us to assume the entire compound page is immediately mappable from a secondary MMU.

A secondary MMU doesn't want to call get_user_pages() more than once for each compound page, in order to know if it can map the whole compound page. So a secondary MMU needs to know from a single get_user_pages() invocation when it can immediately map the entire compound page, to avoid a flood of unnecessary secondary MMU faults and spurious atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier users).

Ideally, instead of the page->_mapcount < 1 check, get_user_pages() should return the granularity of the "page" mapping in the "mm" passed to get_user_pages(). However, it's a non-trivial change to pass the "pmd" status belonging to the "mm" walked by get_user_pages up the stack (up to the caller of get_user_pages). So the fix just checks if there is not a single pte mapping on the page returned by get_user_pages, and in turn if the caller can assume that the whole compound page is mapped in the current "mm" (in a pmd_trans_huge()). In such a case the entire compound page is safe to map into the secondary MMU without additional get_user_pages() calls on the surrounding tail/head pages. In addition to being faster, not having to run other get_user_pages() calls also reduces the memory footprint of the secondary MMU fault in case the pmd split happened as a result of memory pressure.

Without this fix, after a MADV_DONTNEED (like invoked by QEMU during postcopy live migration or ballooning) or after generic swapping (with a failure in split_huge_page() that would only result in pmd splitting and not a physical page split), KVM would map the whole compound page into the shadow pagetables, even though regular faults or userfaults (like UFFDIO_COPY) may map regular pages into the primary MMU as a result of the pte faults, leading to the guest mode and userland mode going out of sync and not working on the same memory at all times.

Any other secondary MMU notifier manager (KVM is just one of the many MMU notifier users) will need the same information if it doesn't want to run a flood of get_user_pages_fast and it can support multiple granularity in the secondary MMU mappings, so I think it is justified to be exposed not just to KVM.

The other option would be to move transparent_hugepage_adjust to mm/huge_memory.c, but that currently has all kinds of KVM data structures in it, so it's definitely not a cut-and-paste job, and I couldn't do a fix as clean as this one for 4.6.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Li, Liang Z" <liang.z.li@intel.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2016-05-05 17:38:53 -07:00
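The check described in the commit message can be illustrated with a small userspace sketch. This is not the kernel code from the commit: `struct sketch_page`, `sketch_can_map_whole_compound()` and `sketch_secondary_map_size()` are hypothetical names, and the `mapcount` field is a simplified stand-in for the per-page pte mapping count the message refers to. It only shows the decision a secondary MMU would make from a single returned page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct page. */
struct sketch_page {
	bool compound;	/* page belongs to a compound (huge) page */
	int mapcount;	/* assumed: number of pte mappings of this page, 0 if none */
};

/*
 * The whole compound page may be mapped at once only if the page is
 * compound and not a single pte maps it, i.e. the "mapcount < 1" test
 * mentioned in the commit message.
 */
static bool sketch_can_map_whole_compound(const struct sketch_page *page)
{
	return page->compound && page->mapcount < 1;
}

/* Choose the mapping size a secondary MMU would install for this page. */
static size_t sketch_secondary_map_size(const struct sketch_page *page,
					size_t base_size, size_t huge_size)
{
	return sketch_can_map_whole_compound(page) ? huge_size : base_size;
}
```

With this model, a compound page whose pmd has been split (so at least one pte maps it) falls back to the base page size, which is the behaviour the fix is meant to guarantee.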
..
hyp	ARM: KVM: Use common version of timer-sr.c	2016-02-29 18:34:19 +00:00
Kconfig	arm/arm64: KVM : Enable vhost device selection under KVM config menu	2015-10-22 23:01:45 +02:00
Makefile	ARM: KVM: Add TLB invalidation code	2016-02-29 18:34:13 +00:00
arm.c	arm64: KVM: unregister notifiers in hyp mode teardown path	2016-04-06 13:47:52 +02:00
coproc.c	ARM: KVM: Switch the CP reg search to be a binary search	2016-02-29 18:34:22 +00:00
coproc.h	ARM: KVM: Rename struct coproc_reg::is_64 to is_64bit	2016-02-29 18:34:22 +00:00
coproc_a7.c	arm/arm64: KVM: Use set/way op trapping to track the state of the caches	2015-01-29 23:24:56 +01:00
coproc_a15.c	arm/arm64: KVM: Use set/way op trapping to track the state of the caches	2015-01-29 23:24:56 +01:00
emulate.c	ARM: KVM: Move GP registers into the CPU context structure	2016-02-29 18:34:12 +00:00
guest.c	One of the largest releases for KVM... Hardly any generic improvement,	2016-03-16 09:55:35 -07:00
handle_exit.c	ARM: KVM: Remove handling of ARM_EXCEPTION_DATA/PREF_ABORT	2016-02-29 18:34:15 +00:00
init.S	ARM: KVM: Switch to C-based stage2 init	2016-02-29 18:34:14 +00:00
interrupts.S	ARM: KVM: Remove the old world switch	2016-02-29 18:34:14 +00:00
mmio.c	arm/arm64: KVM: Feed initialized memory to MMIO accesses	2016-02-24 11:53:09 +00:00
mmu.c	mm: thp: kvm: fix memory corruption in KVM with THP enabled	2016-05-05 17:38:53 -07:00
perf.c	ARM: KVM: add support for minimal host vs guest profiling	2013-04-28 21:44:01 -07:00
psci.c	KVM: Use simple waitqueue for vcpu->wq	2016-02-25 11:27:16 +01:00
reset.c	ARM: KVM: Move GP registers into the CPU context structure	2016-02-29 18:34:12 +00:00
trace.h	arm/arm64: KVM: Improve kvm_exit tracepoint	2015-10-22 23:01:47 +02:00