linux-sg2042/mm
Mel Gorman 2f84a8990e mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

	I think I find a linux bug, I have the test cases is constructed. I
	can stable recurring problems in fedora22(4.0.4) kernel version,
	arch for x86_64.  I construct transparent huge page, when the parent
	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
	huge page area, it has the opportunity to lead to huge page copy on
	write failure, and then it will munmap the child corresponding mmap
	area, but then the child mmap area with VM_MAYSHARE attributes, child
	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
	functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g.  it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this

	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
	 pgoff 0 file ffff88106bdb9800 private_data           (null)
	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
	 ------------
	 kernel BUG at mm/hugetlb.c:462!
	 SMP
	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
	 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private.  During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered.  This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01 21:42:35 -04:00
..
kasan kasan: fix last shadow judgement in memory_is_poisoned_16() 2015-09-17 21:16:07 -07:00
Kconfig media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
Kconfig.debug mm/debug_pagealloc: remove obsolete Kconfig options 2015-01-08 15:10:52 -08:00
Makefile media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
backing-dev.c Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block 2015-09-10 18:56:14 -07:00
balloon_compaction.c mm/balloon_compaction: fix deflation when compaction is disabled 2014-10-29 16:33:15 -07:00
bootmem.c bootmem: avoid freeing to bootmem after bootmem is done 2015-09-08 15:35:28 -07:00
cleancache.c cleancache: remove limit on the number of cleancache enabled filesystems 2015-04-14 16:49:03 -07:00
cma.c mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute 2015-06-24 17:49:44 -07:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
compaction.c mm/compaction: correct to flush migrated pages if pageblock skip happens 2015-09-08 15:35:28 -07:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
dmapool.c mm: add support for __GFP_ZERO flag to dma_pool_alloc() 2015-09-08 15:35:28 -07:00
early_ioremap.c mm/early_ioremap: add explicit #include of asm/early_ioremap.h 2015-09-11 15:21:34 -07:00
fadvise.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
failslab.c
filemap.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
frame_vector.c [media] mm: Provide new get_vaddr_frames() helper 2015-08-16 13:02:47 -03:00
frontswap.c frontswap: allow multiple backends 2015-06-24 17:49:45 -07:00
gup.c mm: make GUP handle pfn mapping unless FOLL_GET is requested 2015-09-04 16:54:41 -07:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
hugetlb.c mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault 2015-10-01 21:42:35 -04:00
hugetlb_cgroup.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm/compaction: correct to flush migrated pages if pageblock skip happens 2015-09-08 15:35:28 -07:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c kmemleak: use seq_hex_dump() to dump buffers 2015-09-10 13:29:01 -07:00
ksm.c mm: remove rest of ACCESS_ONCE() usages 2015-04-15 16:35:18 -07:00
list_lru.c list_lru: don't call list_lru_from_kmem if the list_head is empty 2015-09-08 15:35:28 -07:00
maccess.c lib: move strncpy_from_unsafe() into mm/maccess.c 2015-08-31 12:36:10 -07:00
madvise.c mm: madvise allow remove operation for hugetlbfs 2015-09-08 15:35:28 -07:00
memblock.c mm/memblock.c: fix comment in __next_mem_range() 2015-09-08 15:35:28 -07:00
memcontrol.c memcg: zap try_get_mem_cgroup_from_page 2015-09-10 13:29:01 -07:00
memory-failure.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
memory.c mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() 2015-09-10 13:29:01 -07:00
memory_hotplug.c libnvdimm for 4.3: 2015-09-08 14:35:59 -07:00
mempolicy.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
mempool.c mm/mempool: allow NULL `pool' pointer in mempool_destroy() 2015-09-08 15:35:28 -07:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c mm: migrate: hugetlb: putback destination hugepage to active list 2015-09-22 15:09:53 -07:00
mincore.c mincore: apply page table walker on do_mincore() 2015-02-11 17:06:06 -08:00
mlock.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified 2015-09-22 15:09:53 -07:00
mmu_context.c
mmu_notifier.c mmu-notifier: add clear_young callback 2015-09-10 13:29:01 -07:00
mmzone.c mm: microoptimize zonelist operations 2015-02-11 17:06:02 -08:00
mprotect.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mremap.c mremap: simplify the "overlap" check in mremap_to() 2015-09-04 16:54:41 -07:00
msync.c mm: remove rest usage of VM_NONLINEAR and pte_file() 2015-02-10 14:30:31 -08:00
nobootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
nommu.c mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() 2015-09-10 13:29:01 -07:00
oom_kill.c mm, oom: remove unnecessary variable 2015-09-08 15:35:28 -07:00
page-writeback.c Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block 2015-09-10 18:56:14 -07:00
page_alloc.c Merge branch 'akpm' (patches from Andrew) 2015-09-08 17:52:23 -07:00
page_counter.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
page_ext.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_idle.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_io.c fs: use helper bio_add_page() instead of open coding on bi_io_vec 2015-08-13 12:32:00 -06:00
page_isolation.c mm, page_isolation: make set/unset_migratetype_isolate() file-local 2015-09-08 15:35:28 -07:00
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
pagewalk.c mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers 2015-03-25 16:20:30 -07:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: clean up of schunk->map[] assignment in pcpu_setup_first_chunk 2015-07-21 11:31:00 -04:00
pgtable-generic.c mm: clarify that the function operates on hugepage pte 2015-06-24 17:49:44 -07:00
process_vm_access.c process_vm_access: switch to {compat_,}import_iovec() 2015-04-11 22:27:12 -04:00
quicklist.c
readahead.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
rmap.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
shmem.c shmem: recalculate file inode when fstat 2015-09-08 15:35:28 -07:00
slab.c mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1) 2015-10-01 21:42:35 -04:00
slab.h mm/slab.h: fix argument order in cache_from_obj's error message 2015-09-04 16:54:41 -07:00
slab_common.c memcg: export struct mem_cgroup 2015-09-08 15:35:28 -07:00
slob.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
slub.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
sparse-vmemmap.c
sparse.c
swap.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: swap: zswap: maybe_preload & refactoring 2015-09-08 15:35:28 -07:00
swapfile.c mm: /proc/pid/smaps:: show proportional swap share of the mapping 2015-09-08 15:35:28 -07:00
truncate.c memcg: add per cgroup dirty page accounting 2015-06-02 08:33:33 -06:00
userfaultfd.c userfaultfd: avoid mmap_sem read recursion in mcopy_atomic 2015-09-04 16:54:41 -07:00
util.c mm: uninline and cleanup page-mapping related helpers 2015-04-15 16:35:19 -07:00
vmacache.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
vmalloc.c mm/vmalloc: get rid of dirty bitmap inside vmap_block structure 2015-04-15 16:35:18 -07:00
vmpressure.c mm/vmpressure.c: fix race in vmpressure_work_fn() 2014-12-02 17:32:07 -08:00
vmscan.c vmscan: fix sane_reclaim helper for legacy memcg 2015-09-22 15:09:53 -07:00
vmstat.c vmstat: Reduce time interval to stat update on idle cpu 2015-02-11 17:06:07 -08:00
workingset.c list_lru: add helpers to isolate items 2015-02-12 18:54:10 -08:00
zbud.c mm: zbud: constify the zbud_ops 2015-09-08 15:35:28 -07:00
zpool.c zpool: add zpool_has_pool() 2015-09-10 13:29:01 -07:00
zsmalloc.c mm: zpool: constify the zpool_ops 2015-09-08 15:35:28 -07:00
zswap.c zswap: change zpool/compressor at runtime 2015-09-10 13:29:01 -07:00