OpenCloudOS-Kernel/include
Johannes Weiner 73f576c04b mm: memcontrol: fix cgroup creation failure after many small jobs
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears.  At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild.  Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs.  Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.

Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.

Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later.  They pose no hurdle.

Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages.  And those
references are under the user's control, so they are manageable.

This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that.  This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.

This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:

  set -e
  mkdir -p pages
  for x in `seq 128000`; do
    [ $((x % 1000)) -eq 0 ] && echo $x
    mkdir /cgroup/foo
    echo $$ >/cgroup/foo/cgroup.procs
    echo trex >pages/$x
    echo $$ >/cgroup/cgroup.procs
    rmdir /cgroup/foo
  done

When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:

  [root@ham ~]# ./cssidstress.sh
  [...]
  65000
  mkdir: cannot create directory '/cgroup/foo': No space left on device

After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.

[hannes@cmpxchg.org: init the IDR]
  Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e6 ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Cc: <stable@vger.kernel.org>	[3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23 10:25:54 +09:00
..
acpi Revert "ACPI 2.0 / AML: Improve module level execution by moving the If/Else/While execution to per-table basis" 2016-07-11 16:21:08 +02:00
asm-generic vmlinux.lds: account for destructor sections 2016-07-15 14:54:27 +09:00
clocksource clocksource: arm_arch_timer: Remove arch_timer_get_timecounter 2016-05-03 12:54:21 +02:00
crypto Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-05-19 09:21:36 -07:00
drm Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes 2016-07-15 13:51:55 +10:00
dt-bindings Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2016-05-26 09:23:43 -07:00
keys
kvm arm64: KVM: fix build with CONFIG_ARM_PMU disabled 2016-06-27 12:55:51 +02:00
linux mm: memcontrol: fix cgroup creation failure after many small jobs 2016-07-23 10:25:54 +09:00
math-emu
media Update my main e-mails at the Kernel tree 2016-06-15 15:35:37 -10:00
memory
misc cxl: Add kernel API to allow a context to operate with relocate disabled 2016-05-11 21:54:10 +10:00
net bonding: prevent out of bound accesses 2016-07-01 06:06:09 -04:00
pcmcia
ras
rdma IB/rdmavt: Correct qp_priv_alloc() return value test 2016-06-23 10:16:15 -04:00
rxrpc
scsi Merge branch '4.7/scsi-queue' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi into for-4.7-zac 2016-05-09 12:34:39 -04:00
soc ARC updates for 4.7-rc1 2016-05-19 09:46:18 -07:00
sound ASoC: Updates for v4.7 2016-05-16 14:59:00 +02:00
target Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2016-05-28 12:04:17 -07:00
trace - move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat 2016-05-27 13:41:54 -07:00
uapi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2016-07-16 07:04:12 +09:00
video imx-drm probing fix 2016-05-25 12:36:20 +10:00
xen
Kbuild