OpenCloudOS-Kernel

Huang, Ying 235b621767 mm/swap: add cluster lock	2017-02-22 16:41:30 -08:00

This patch is to reduce the lock contention of swap_info_struct->lock via using a more fine grained lock in swap_cluster_info for some swap operations. swap_info_struct->lock is heavily contended if multiple processes reclaim pages simultaneously. Because there is only one lock for each swap device. While in common configuration, there is only one or several swap devices in the system. The lock protects almost all swap related operations.

In fact, many swap operations only access one element of swap_info_struct->swap_map array. And there is no dependency between different elements of swap_info_struct->swap_map. So a fine grained lock can be used to allow parallel access to the different elements of swap_info_struct->swap_map.

In this patch, a spinlock is added to swap_cluster_info to protect the elements of swap_info_struct->swap_map in the swap cluster and the fields of swap_cluster_info. This reduced locking contention for swap_info_struct->swap_map access greatly.

Because of the added spinlock, the size of swap_cluster_info increases from 4 bytes to 8 bytes on the 64 bit and 32 bit system. This will use additional 4k RAM for every 1G swap space.

Because the size of swap_cluster_info is much smaller than the size of the cache line (8 vs 64 on x86_64 architecture), there may be false cache line sharing between spinlocks in swap_cluster_info. To avoid the false sharing in the first round of the swap cluster allocation, the order of the swap clusters in the free clusters list is changed. So that, the swap_cluster_info sharing the same cache line will be placed as far as possible. After the first round of allocation, the order of the clusters in free clusters list is expected to be random. So the false sharing should be not serious.

Compared with a previous implementation using bit_spin_lock, the sequential swap out throughput improved about 3.2%. Test was done on a Xeon E5 v3 system. The swap device used is a RAM simulated PMEM (persistent memory) device. To test the sequential swapping out, the test case created 32 processes, which sequentially allocate and write to the anonymous pages until the RAM and part of the swap device is used.

[ying.huang@intel.com: v5]
  Link: http://lkml.kernel.org/r/878tqeuuic.fsf_-_@yhuang-dev.intel.com
[minchan@kernel.org: initialize spinlock for swap_cluster_info]
  Link: http://lkml.kernel.org/r/1486434945-29753-1-git-send-email-minchan@kernel.org
[hughd@google.com: annotate nested locking for cluster lock]
  Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1702161050540.21773@eggly.anvils
Link: http://lkml.kernel.org/r/dbb860bbd825b1aaba18988015e8963f263c3f0d.1484082593.git.tim.c.chen@linux.intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net> escreveu:
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
..
kasan	arm64 updates for 4.11:	2017-02-22 10:46:44 -08:00
Kconfig	mm: THP page cache support for ppc64	2016-12-12 18:55:08 -08:00
Kconfig.debug	PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO	2016-09-13 02:35:27 +02:00
Makefile	Disable the __builtin_return_address() warning globally after all	2016-10-12 10:23:41 -07:00
backing-dev.c	block: fix double-free in the failure path of cgwb_bdi_init()	2017-02-08 13:52:01 -07:00
balloon_compaction.c	mm: balloon: use general non-lru movable page feature	2016-07-26 16:19:19 -07:00
bootmem.c	mm/bootmem.c: cosmetic improvement of code readability	2017-02-22 16:41:29 -08:00
cleancache.c	cleancache: constify cleancache_ops structure	2016-01-27 09:09:57 -05:00
cma.c	mm/cma: Cleanup highmem check	2017-01-11 13:56:49 +00:00
cma.h	mm: cma: mark cma_bitmap_maxno() inline in header	2015-08-14 15:56:32 -07:00
cma_debug.c	mm/cma_debug: correct size input to bitmap function	2015-07-17 16:39:54 -07:00
compaction.c	mm,compaction: serialize waitqueue_active() checks	2017-02-22 16:41:29 -08:00
debug.c	mm, debug: print raw struct page data in __dump_page()	2016-12-12 18:55:08 -08:00
debug_page_ref.c	mm/page_ref: add tracepoint to track down page reference manipulation	2016-03-17 15:09:34 -07:00
dmapool.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
early_ioremap.c	mm/early_ioremap: use offset_in_page macro	2015-11-05 19:34:48 -08:00
fadvise.c	mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED	2016-12-20 09:48:46 -08:00
failslab.c	mm: fault-inject take over bootstrap kmem_cache check	2016-03-15 16:55:16 -07:00
filemap.c	mm: fix filemap.c kernel-doc warnings	2017-02-22 16:41:29 -08:00
frame_vector.c	mm: replace get_vaddr_frames() write/force parameters with gup_flags	2016-10-19 08:11:24 -07:00
frontswap.c	mm, frontswap: convert frontswap_enabled to static key	2016-07-26 16:19:19 -07:00
gup.c	userfaultfd: hugetlbfs: gup: support VM_FAULT_RETRY	2017-02-22 16:41:28 -08:00
highmem.c	mm/highmem: make nr_free_highpages() handles all highmem zones by itself	2016-05-19 19:12:14 -07:00
huge_memory.c	mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp	2017-01-24 16:26:14 -08:00
hugetlb.c	userfaultfd: hugetlbfs: add UFFDIO_COPY support for shared mappings	2017-02-22 16:41:28 -08:00
hugetlb_cgroup.c	mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size	2016-05-20 17:58:30 -07:00
hwpoison-inject.c	hwpoison: use page_cgroup_ino for filtering by memcg	2015-09-10 13:29:01 -07:00
init-mm.c	mm: Add a user_ns owner to mm_struct and fix ptrace permission checks	2016-11-22 11:49:48 -06:00
internal.h	mm, compaction: add vmstats for kcompactd work	2017-02-22 16:41:29 -08:00
interval_tree.c	mm: replace vma->sharead.linear with vma->shared	2015-02-10 14:30:31 -08:00
khugepaged.c	mm: get rid of __GFP_OTHER_NODE	2017-01-10 18:31:55 -08:00
kmemcheck.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak-test.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak.c	kmemleak: fix reference to Documentation	2016-12-12 18:55:07 -08:00
ksm.c	mm,ksm: add __GFP_HIGH to the allocation in alloc_stable_node()	2016-10-07 18:46:29 -07:00
list_lru.c	mm/list_lru.c: avoid error-path NULL pointer deref	2016-10-27 18:43:42 -07:00
maccess.c	x86: remove more uaccess_32.h complexity	2016-05-22 17:21:27 -07:00
madvise.c	userfaultfd: non-cooperative: avoid MADV_DONTNEED race condition	2017-02-22 16:41:28 -08:00
memblock.c	mm/memblock.c: check return value of memblock_reserve() in memblock_virt_alloc_internal()	2017-02-22 16:41:29 -08:00
memcontrol.c	slab: use memcg_kmem_cache_wq for slab destruction operations	2017-02-22 16:41:27 -08:00
memory-failure.c	mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked	2016-12-25 11:54:48 -08:00
memory.c	userfaultfd: hugetlbfs: fix __mcopy_atomic_hugetlb retry/error processing	2017-02-22 16:41:28 -08:00
memory_hotplug.c	mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next	2017-02-22 16:41:29 -08:00
mempolicy.c	mm/mempolicy.c: do not put mempolicy before using its nodemask	2017-01-24 16:26:14 -08:00
mempool.c	Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"	2016-07-28 16:07:41 -07:00
memtest.c	memtest: remove unused header files	2015-09-08 15:35:28 -07:00
migrate.c	mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked	2016-12-25 11:54:48 -08:00
mincore.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
mlock.c	thp: fix corner case of munlock() of PTE-mapped THPs	2016-11-30 16:32:52 -08:00
mm_init.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
mmap.c	powerpc: do not make the entire heap executable	2017-02-22 16:41:29 -08:00
mmu_context.c	mm/mmu_context, sched/core: Fix mmu_context.h assumption	2016-04-28 11:44:19 +02:00
mmu_notifier.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
mmzone.c	mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist()	2017-02-22 16:41:29 -08:00
mprotect.c	mm: mprotect: use pmd_trans_unstable instead of taking the pmd_lock	2017-02-22 16:41:29 -08:00
mremap.c	userfaultfd: non-cooperative: optimize mremap_userfaultfd_complete()	2017-02-22 16:41:28 -08:00
msync.c	mm/msync: use offset_in_page macro	2015-11-05 19:34:48 -08:00
nobootmem.c	mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping	2016-10-11 15:06:33 -07:00
nommu.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
oom_kill.c	oom: print nodemask in the oom report	2016-10-07 18:46:29 -07:00
page-writeback.c	block: Use pointer to backing_dev_info from request_queue	2017-02-02 08:20:48 -07:00
page_alloc.c	mm: page_alloc: skip over regions of invalid pfns where possible	2017-02-22 16:41:29 -08:00
page_counter.c	mm: page_counter: let page_counter_try_charge() return bool	2015-11-05 19:34:48 -08:00
page_ext.c	mm/page_ext: support extra space allocation by page_ext user	2016-10-07 18:46:27 -07:00
page_idle.c	mm, vmscan: move lru_lock to the node	2016-07-28 16:07:41 -07:00
page_io.c	writeback: add wbc_to_write_flags()	2016-11-02 10:24:03 -06:00
page_isolation.c	mm, page_alloc: avoid page_to_pfn() when merging buddies	2017-02-22 16:41:27 -08:00
page_owner.c	mm/page_owner: don't define fields on struct page_ext by hard-coding	2016-10-07 18:46:27 -07:00
page_poison.c	mm: check the return value of lookup_page_ext for all call sites	2016-06-03 15:06:22 -07:00
pagewalk.c	thp: rename split_huge_page_pmd() to split_huge_pmd()	2016-01-15 17:56:32 -08:00
percpu-km.c	mm: percpu: use pr_fmt to prefix output	2016-03-17 15:09:34 -07:00
percpu-vm.c	…
percpu.c	Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu	2016-12-13 12:34:47 -08:00
pgtable-generic.c	mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range	2016-03-17 15:09:34 -07:00
process_vm_access.c	mm: unexport __get_user_pages_unlocked()	2016-12-14 16:04:09 -08:00
quicklist.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
readahead.c	mm: don't cap request size based on read-ahead setting	2016-12-12 18:55:08 -08:00
rmap.c	mm, rmap: handle anon_vma_prepare() common case inline	2016-12-12 18:55:08 -08:00
shmem.c	userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY	2017-02-22 16:41:29 -08:00
slab.c	slab: introduce __kmemcg_cache_deactivate()	2017-02-22 16:41:27 -08:00
slab.h	slab: remove synchronous synchronize_sched() from memcg cache deactivation path	2017-02-22 16:41:27 -08:00
slab_common.c	slab: use memcg_kmem_cache_wq for slab destruction operations	2017-02-22 16:41:27 -08:00
slob.c	slab: introduce __kmemcg_cache_deactivate()	2017-02-22 16:41:27 -08:00
slub.c	slub: make sysfs directories for memcg sub-caches optional	2017-02-22 16:41:27 -08:00
sparse-vmemmap.c	treewide: replace obsolete _refok by __ref	2016-08-02 17:31:41 -04:00
sparse.c	mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next	2017-02-22 16:41:29 -08:00
swap.c	mm: add PageWaiters indicating tasks are waiting for a page bit	2016-12-25 11:54:48 -08:00
swap_cgroup.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
swap_state.c	mm, swap: use offset of swap entry as key of swap cache	2016-10-07 18:46:28 -07:00
swapfile.c	mm/swap: add cluster lock	2017-02-22 16:41:30 -08:00
truncate.c	mm: Invalidate DAX radix tree entries only if appropriate	2016-12-26 20:29:24 -08:00
usercopy.c	mm/usercopy: Switch to using lm_alias	2017-01-11 13:56:50 +00:00
userfaultfd.c	userfaultfd: hugetlbfs: add UFFDIO_COPY support for shared mappings	2017-02-22 16:41:28 -08:00
util.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
vmacache.c	mm: unrig VMA cache hit ratio	2016-10-07 18:46:27 -07:00
vmalloc.c	mm/vmalloc.c: use rb_entry_safe	2017-02-22 16:41:27 -08:00
vmpressure.c	mm/vmpressure.c: fix subtree pressure detection	2016-02-03 08:28:43 -08:00
vmscan.c	mm, vmscan: add mm_vmscan_inactive_list_is_low tracepoint	2017-02-22 16:41:29 -08:00
vmstat.c	mm, compaction: add vmstats for kcompactd work	2017-02-22 16:41:29 -08:00
workingset.c	mm: workingset: fix use-after-free in shadow node shrinker	2017-01-07 18:22:40 -08:00
z3fold.c	mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup	2016-06-03 16:02:55 -07:00
zbud.c	mm/zbud.c: use list_last_entry() instead of list_tail_entry()	2016-01-15 11:40:52 -08:00
zpool.c	mm: zsmalloc: constify struct zs_pool name	2015-11-06 17:50:42 -08:00
zsmalloc.c	mm: fix some typos in mm/zsmalloc.c	2017-02-22 16:41:29 -08:00
zswap.c	zswap: disable changing params if init fails	2017-02-03 14:13:19 -08:00