OpenCloudOS-Kernel/arch/powerpc/mm
Aneesh Kumar K.V 75358ea359 powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race
MADV_DONTNEED holds mmap_sem in read mode and that implies a
parallel page fault is possible and the kernel can end up with a level 1 PTE
entry (THP entry) converted to a level 0 PTE entry without flushing
the THP TLB entry.

Most architectures including POWER have issues with kernel instantiating a level
0 PTE entry while holding level 1 TLB entries.

The code sequence I am looking at is

down_read(mmap_sem)                         down_read(mmap_sem)

zap_pmd_range()
 zap_huge_pmd()
  pmd lock held
  pmd_cleared
  table details added to mmu_gather
  pmd_unlock()
                                         insert a level 0 PTE entry()

tlb_finish_mmu().

Fix this by forcing a tlb flush before releasing pmd lock if this is
not a fullmm invalidate. We can safely skip this invalidate for
task exit case (fullmm invalidate) because in that case we are sure
there can be no parallel fault handlers.

This do change the Qemu guest RAM del/unplug time as below

128 core, 496GB guest:

Without patch:
munmap start: timer = 196449 ms, PID=6681
munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms

With patch:
munmap start: timer = 196345 ms, PID=6879
munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-23-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:16 +10:00
..
book3s32 powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits. 2020-03-25 12:09:27 +11:00
book3s64 powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race 2020-05-05 21:20:16 +10:00
kasan powerpc updates for 5.7 2020-04-05 11:12:59 -07:00
nohash powerpc/fsl_booke: Avoid creating duplicate tlb1 entry 2020-03-17 23:40:35 +11:00
ptdump powerpc/mm: ptdump: no need to check return value of debugfs_create functions 2020-03-04 22:44:25 +11:00
Makefile powerpc/mm: Move ioremap functions out of pgtable_32/64.c 2019-08-27 13:03:35 +10:00
copro_fault.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 153 2019-05-30 11:26:32 -07:00
dma-noncoherent.c dma-mapping: drop the dev argument to arch_sync_dma_for_* 2019-11-20 20:31:38 +01:00
drmem.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
fault.c powerpc/pkeys: Check vma before returning key fault error to the user 2020-05-05 21:20:14 +10:00
highmem.c powerpc/highmem: Change BUG_ON() to WARN_ON() 2019-04-20 22:02:11 +10:00
hugetlbpage.c powerpc/hugetlb: Fix 512k hugepages on 8xx with 16k page size 2020-02-17 12:47:05 +11:00
init-common.c powerpc: introduce kernstart_virt_addr to store the kernel base 2019-11-13 19:27:32 +11:00
init_32.c powerpc: move memstart_addr and kernstart_addr to init-common.c 2019-11-13 19:27:28 +11:00
init_64.c Merge branch 'topic/kaslr-book3e32' into next 2019-11-14 19:23:33 +11:00
ioremap.c mm/memremap_pages: Introduce memremap_compat_align() 2020-02-20 16:58:55 -08:00
ioremap_32.c powerpc/ioremap: warn on early use of ioremap() 2019-11-19 19:38:38 +11:00
ioremap_64.c powerpc/ioremap: warn on early use of ioremap() 2019-11-19 19:38:38 +11:00
mem.c mm/memory_hotplug: add pgprot_t to mhp_params 2020-04-10 15:36:21 -07:00
mmap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
mmu_context.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
mmu_decl.h powerpc/ptdump: don't entirely rebuild kernel when selecting CONFIG_PPC_DEBUG_WX 2020-01-23 21:31:11 +11:00
numa.c powerpc/numa: Remove late request for home node associativity 2020-03-04 22:44:31 +11:00
pgtable-frag.c mm: treewide: clarify pgtable_page_{ctor,dtor}() naming 2019-09-26 10:10:44 -07:00
pgtable.c powerpc updates for 5.3 2019-07-13 16:08:36 -07:00
pgtable_32.c powerpc/32: drop get_pteptr() 2020-02-26 10:34:41 +11:00
pgtable_64.c powerpc/mm: Move ioremap functions out of pgtable_32/64.c 2019-08-27 13:03:35 +10:00
slice.c powerpc/mm: Mark get_slice_psize() & slice_addr_is_low() as notrace 2019-12-23 21:12:51 +11:00