linux-sg2042

Yang Shi 7a30df49f6 mm: mmu_gather: remove __tlb_reset_range() for force flush

A few new fields were added to mmu_gather to make TLB flush smarter for
huge page by telling what level of page table is changed.

__tlb_reset_range() is used to reset all these page table state to
unchanged, which is called by TLB flush for parallel mapping changes for
the same range under non-exclusive lock (i.e. read mmap_sem).

Before commit `dd2283f260` ("mm: mmap: zap pages with read mmap_sem in
munmap"), the syscalls (e.g. MADV_DONTNEED, MADV_FREE) which may update
PTEs in parallel don't remove page tables. But, the forementioned
commit may do munmap() under read mmap_sem and free page tables. This
may result in program hang on aarch64 reported by Jan Stancek. The
problem could be reproduced by his test program with slightly modified
below.

---8<---

static int map_size = 4096;
static int num_iter = 500;
static long threads_total;

static void *distant_area;

void *map_write_unmap(void *ptr)
{
	int *fd = ptr;
	unsigned char *map_address;
	int i, j = 0;

	for (i = 0; i < num_iter; i++) {
		map_address = mmap(distant_area, (size_t) map_size, PROT_WRITE | PROT_READ,
			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
		if (map_address == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}

		for (j = 0; j < map_size; j++)
			map_address[j] = 'b';

		if (munmap(map_address, map_size) == -1) {
			perror("munmap");
			exit(1);
		}
	}

	return NULL;
}

void *dummy(void *ptr)
{
	return NULL;
}

int main(void)
{
	pthread_t thid[2];

	/* hint for mmap in map_write_unmap() */
	distant_area = mmap(0, DISTANT_MMAP_SIZE, PROT_WRITE | PROT_READ,
			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	munmap(distant_area, (size_t)DISTANT_MMAP_SIZE);
	distant_area += DISTANT_MMAP_SIZE / 2;

	while (1) {
		pthread_create(&thid[0], NULL, map_write_unmap, NULL);
		pthread_create(&thid[1], NULL, dummy, NULL);

		pthread_join(thid[0], NULL);
		pthread_join(thid[1], NULL);
	}
}
---8<---

The program may bring in parallel execution like below:

        t1                                        t2
munmap(map_address)
  downgrade_write(&mm->mmap_sem);
  unmap_region()
  tlb_gather_mmu()
    inc_tlb_flush_pending(tlb->mm);
  free_pgtables()
    tlb->freed_tables = 1
    tlb->cleared_pmds = 1

                                        pthread_exit()
                                        madvise(thread_stack, 8M, MADV_DONTNEED)
                                          zap_page_range()
                                            tlb_gather_mmu()
                                              inc_tlb_flush_pending(tlb->mm);

  tlb_finish_mmu()
    if (mm_tlb_flush_nested(tlb->mm))
      __tlb_reset_range()

__tlb_reset_range() would reset freed_tables and cleared_* bits, but this
may cause inconsistency for munmap() which do free page tables. Then it
may result in some architectures, e.g. aarch64, may not flush TLB
completely as expected to have stale TLB entries remained.

Use fullmm flush since it yields much better performance on aarch64 and
non-fullmm doesn't yields significant difference on x86.

The original proposed fix came from Jan Stancek who mainly debugged this
issue, I just wrapped up everything together.

Jan's testing results:

v5.2-rc2-24-gbec7550cca10
--------------------------
         mean     stddev
real    37.382   2.780
user     1.420   0.078
sys     54.658   1.855

v5.2-rc2-24-gbec7550cca10 + "mm: mmu_gather: remove __tlb_reset_range() for force flush"
---------------------------------------------------------------------------------------_
         mean     stddev
real    37.119   2.105
user     1.548   0.087
sys     55.698   1.357

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1558322252-113575-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: `dd2283f260` ("mm: mmap: zap pages with read mmap_sem in munmap")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Suggested-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [4.20+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2019-06-13 17:34:56 -10:00
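
The interleaving described in the commit message above can be replayed with a small user-space model. This is only an illustrative sketch, not kernel code: `struct toy_gather` and every `toy_*` function are hypothetical names invented here, the struct keeps just three of the flags the message mentions, and the rule in `toy_flush_covers_freed_tables()` (a flush is complete only if it is a full-mm flush or still knows that page tables were freed) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct mmu_gather. */
struct toy_gather {
	bool fullmm;        /* flush the whole address space */
	bool freed_tables;  /* page-table pages were freed */
	bool cleared_pmds;  /* PMD-level entries were cleared */
};

/* Models the effect the commit message ascribes to __tlb_reset_range(). */
static void toy_reset_range(struct toy_gather *tlb)
{
	tlb->freed_tables = false;
	tlb->cleared_pmds = false;
}

/* Behaviour before the fix: a nested (parallel) flush resets the tracked state. */
static void toy_finish_reset(struct toy_gather *tlb, bool nested)
{
	if (nested)
		toy_reset_range(tlb);
}

/* Behaviour the message proposes: skip the reset, force a full-mm flush. */
static void toy_finish_fullmm(struct toy_gather *tlb, bool nested)
{
	if (nested)
		tlb->fullmm = true;
}

/* Assumption: freed tables are covered by a full flush or by the freed_tables hint. */
static bool toy_flush_covers_freed_tables(const struct toy_gather *tlb)
{
	return tlb->fullmm || tlb->freed_tables;
}

/* Replays t1's side of the interleaving; t2 is reduced to "nested == true". */
static bool toy_race(bool use_fullmm_fix)
{
	struct toy_gather t1 = { false, false, false };

	/* free_pgtables() on t1 records that page tables went away */
	t1.freed_tables = true;
	t1.cleared_pmds = true;
	assert(toy_flush_covers_freed_tables(&t1));

	/* t2's madvise() raised the pending count, so t1 sees a nested flush */
	if (use_fullmm_fix)
		toy_finish_fullmm(&t1, true);
	else
		toy_finish_reset(&t1, true);

	return toy_flush_covers_freed_tables(&t1);
}
```

In this model `toy_race(false)` returns `false`, which corresponds to the stale-entry case the message describes, while `toy_race(true)` returns `true` because the forced full-mm flush no longer depends on the per-level flags.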
..
kasan	kasan: initialize tag to 0xff in __kasan_kmalloc	2019-06-01 15:51:31 -07:00
Kconfig	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
Kconfig.debug	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
Makefile	mm: shuffle initial free memory to improve memory-side-cache utilization	2019-05-14 19:52:48 -07:00
backing-dev.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
balloon_compaction.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
cleancache.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
cma.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98	2019-05-24 17:37:54 +02:00
cma.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
cma_debug.c	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()	2019-05-14 09:47:45 -07:00
compaction.c	mm, compaction: make sure we isolate a valid PFN	2019-06-01 15:51:32 -07:00
debug.c	mm: update references to page _refcount	2019-05-14 19:52:47 -07:00
debug_page_ref.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
dmapool.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 403	2019-06-05 17:37:13 +02:00
early_ioremap.c	mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep	2017-12-11 14:54:44 +01:00
fadvise.c	vfs: implement readahead(2) using POSIX_FADV_WILLNEED	2018-08-30 20:01:32 +02:00
failslab.c	mm: no need to check return value of debugfs_create functions	2019-03-05 21:07:17 -08:00
filemap.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
frame_vector.c	mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'	2017-12-14 16:00:48 -08:00
frontswap.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
gup.c	mm/gup: continue VM_FAULT_RETRY processing even for pre-faults	2019-06-01 15:51:31 -07:00
gup_benchmark.c	mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM	2019-05-14 09:47:45 -07:00
highmem.c	mm: convert totalram_pages and totalhigh_pages variables to atomic	2018-12-28 12:11:47 -08:00
hmm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00
huge_memory.c	mm/huge_memory.c: make __thp_get_unmapped_area static	2019-05-14 09:47:51 -07:00
hugetlb.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
hugetlb_cgroup.c	mm: rename page_counter's count/limit into usage/max	2018-06-07 17:34:35 -07:00
hwpoison-inject.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
init-mm.c	mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids	2018-07-17 09:35:30 +02:00
internal.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
interval_tree.c	mm/interval_tree.c: use vma_pages() helper	2018-01-31 17:18:37 -08:00
khugepaged.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
kmemleak-test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333	2019-06-05 17:37:06 +02:00
kmemleak.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333	2019-06-05 17:37:06 +02:00
ksm.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
list_lru.c	mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node	2019-06-13 17:34:56 -10:00
maccess.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
madvise.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
memblock.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
memcontrol.c	mm: memcontrol: don't batch updates of local VM stats and events	2019-06-13 17:34:56 -10:00
memfd.c	mm: page cache: store only head pages in i_pages	2019-05-14 09:47:45 -07:00
memory-failure.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 263	2019-06-05 17:30:28 +02:00
memory.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
memory_hotplug.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mempolicy.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225	2019-05-30 11:29:56 -07:00
mempool.c	docs/core-api/mm: fix return value descriptions in mm/	2019-03-05 21:07:20 -08:00
memtest.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
migrate.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
mincore.c	mm/mincore.c: make mincore() more conservative	2019-05-14 19:52:48 -07:00
mlock.c	mm/mlock.c: mlockall error for flag MCL_ONFAULT	2019-06-13 17:34:56 -10:00
mm_init.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mmap.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mmu_context.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
mmu_gather.c	mm: mmu_gather: remove __tlb_reset_range() for force flush	2019-06-13 17:34:56 -10:00
mmu_notifier.c	mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper	2019-05-14 09:47:49 -07:00
mmzone.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mprotect.c	mm/mprotect.c: fix compilation warning because of unused 'mm' variable	2019-05-14 09:47:51 -07:00
mremap.c	mm/mmu_notifier: contextual information for event triggering invalidation	2019-05-14 09:47:49 -07:00
msync.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
nommu.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
oom_kill.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
page-writeback.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
page_alloc.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
page_counter.c	memcg: introduce memory.min	2018-06-07 17:34:36 -07:00
page_ext.c	memblock: drop memblock_alloc_*_nopanic() variants	2019-03-12 10:04:02 -07:00
page_idle.c	mm: remove zone_lru_lock() function, access ->lru_lock directly	2019-03-05 21:07:21 -08:00
page_io.c	mm/page_io.c: fix polled swap page in	2019-01-04 13:13:48 -08:00
page_isolation.c	mm/page_isolation.c: remove redundant pfn_valid_within() in __first_valid_page()	2019-05-14 09:47:46 -07:00
page_owner.c	mm/page_owner: Simplify stack trace handling	2019-04-29 12:37:50 +02:00
page_poison.c	page_poison: play nicely with KASAN	2019-03-05 21:07:13 -08:00
page_vma_mapped.c	mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly	2018-10-31 08:54:11 -07:00
pagewalk.c	mm: kernel-doc: add missing parameter descriptions	2018-04-05 21:36:27 -07:00
percpu-internal.h	percpu: convert chunk hints to be based on pcpu_block_md	2019-03-13 12:25:31 -07:00
percpu-km.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-stats.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-vm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
pgtable-generic.c	x86/mm: Page size aware flush_tlb_mm_range()	2018-10-09 16:51:11 +02:00
process_vm_access.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
quicklist.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
readahead.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
rmap.c	mm/rmap.c: use the pra.mapcount to do the check	2019-05-14 09:47:49 -07:00
rodata_test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
shmem.c	mm: page cache: store only head pages in i_pages	2019-05-14 09:47:45 -07:00
shuffle.c	mm: maintain randomization of page free lists	2019-05-14 19:52:48 -07:00
shuffle.h	mm: maintain randomization of page free lists	2019-05-14 19:52:48 -07:00
slab.c	slab: remove /proc/slab_allocators	2019-05-16 15:51:55 -07:00
slab.h	mm: add support for kmem caches in DMA32 zone	2019-03-29 10:01:37 -07:00
slab_common.c	mm: add support for kmem caches in DMA32 zone	2019-03-29 10:01:37 -07:00
slob.c	slob: use slab_list instead of lru	2019-05-14 09:47:44 -07:00
slub.c	mm/slub.c: update the comment about slab frozen	2019-05-14 09:47:45 -07:00
sparse-vmemmap.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
sparse.c	mm/sparse.c: clean up obsolete code comment	2019-05-14 09:47:48 -07:00
swap.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
swap_cgroup.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
swap_slots.c	mm, swap, get_swap_pages: use entry_size instead of cluster in parameter	2018-08-22 10:52:44 -07:00
swap_state.c	mm: page cache: store only head pages in i_pages	2019-05-14 09:47:45 -07:00
swapfile.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
truncate.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
usercopy.c	mm/usercopy.c: no check page span for stack objects	2019-01-08 17:15:11 -08:00
userfaultfd.c	hugetlb: use same fault hash key for shared and private mappings	2019-05-14 09:47:48 -07:00
util.c	prctl_set_mm: downgrade mmap_sem to read lock	2019-06-01 15:51:31 -07:00
vmacache.c	mm: get rid of vmacache_flush_all() entirely	2018-09-13 15:18:04 -10:00
vmalloc.c	mm/vmalloc.c: fix typo in comment	2019-06-01 15:51:31 -07:00
vmpressure.c	mm/vmpressure.c: convert to use match_string() helper	2018-06-07 17:34:36 -07:00
vmscan.c	mm/vmscan.c: fix recent_rotated history	2019-06-13 17:34:56 -10:00
vmstat.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
workingset.c	mm: memcontrol: make cgroup stats and events query API explicitly local	2019-05-14 19:52:53 -07:00
z3fold.c	z3fold: fix sheduling while atomic	2019-06-01 15:51:31 -07:00
zbud.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
zpool.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
zsmalloc.c	mm/zsmalloc.c: fix fall-through annotation	2018-10-26 16:26:35 -07:00
zswap.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00