linux-sg2042/include
Michal Hocko 4db9b2efe9 hugetlb, memory_hotplug: prefer to use reserved pages for migration
new_node_page will try to use the origin's next NUMA node as the
migration destination for hugetlb pages.  If such a node doesn't have
any preallocated pool it falls back to __alloc_buddy_huge_page_no_mpol
to allocate a surplus page instead.  This is quite suboptimal for any
configuration where hugetlb pages are not distributed to all NUMA nodes
evenly.  Say we have a hotpluggable node 4 and spare hugetlb pages are
on node 0

  /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:10000
  /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node3/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node4/hugepages/hugepages-2048kB/nr_hugepages:10000
  /sys/devices/system/node/node5/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node6/hugepages/hugepages-2048kB/nr_hugepages:0
  /sys/devices/system/node/node7/hugepages/hugepages-2048kB/nr_hugepages:0

Now we consume the whole pool on node 4 and try to offline this node.
All the allocated pages should be moved to node 0, which has enough
preallocated pages to hold them.  With the current implementation,
offlining very likely fails because hugetlb allocations during runtime
are much less reliable.

Fix this by reusing the nodemask which excludes the migration source and
trying to find the first node which has a page in the preallocated pool,
falling back to __alloc_buddy_huge_page_no_mpol only when the whole pool
is consumed.
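
The selection order described above can be sketched in plain userspace C.
This is only an illustration of the idea, not the kernel code: struct
node_pool, pick_migration_node() and the buddy_can_allocate flag are
hypothetical names invented for the sketch, and the nodemask is modelled
as a simple bitmask over eight nodes.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical per-node state; this is not the kernel's struct hstate. */
struct node_pool {
    unsigned long free_prealloc; /* free pages left in the preallocated pool */
    bool buddy_can_allocate;     /* assumed outcome of a runtime (surplus) allocation */
};

/*
 * Pick the node that supplies the destination page for a migration away
 * from 'source'.  Returns the node id, or -1 if nothing is available.
 * '*used_surplus' reports whether the runtime fallback was needed.
 */
int pick_migration_node(struct node_pool pools[MAX_NODES], int source,
                        bool *used_surplus)
{
    /* Nodemask with every node allowed except the migration source. */
    unsigned int allowed = ((1u << MAX_NODES) - 1) & ~(1u << source);
    int nid;

    *used_surplus = false;

    /* First pass: take a page from the first node with a preallocated one. */
    for (nid = 0; nid < MAX_NODES; nid++) {
        if (!(allowed & (1u << nid)))
            continue;
        if (pools[nid].free_prealloc > 0) {
            pools[nid].free_prealloc--;
            return nid;
        }
    }

    /* Fallback: only when the whole pool is consumed, try a surplus page. */
    for (nid = 0; nid < MAX_NODES; nid++) {
        if (!(allowed & (1u << nid)))
            continue;
        if (pools[nid].buddy_can_allocate) {
            *used_surplus = true;
            return nid;
        }
    }

    return -1;
}
```

With the layout from the example (free pages on node 0, source node 4),
every call returns node 0 until its pool is empty, and only then is the
less reliable runtime allocation attempted.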

[akpm@linux-foundation.org: remove bogus arg from alloc_huge_page_nodemask() stub]
Link: http://lkml.kernel.org/r/20170608074553.22152-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
..
acpi arm64 updates for 4.13: 2017-07-05 17:09:27 -07:00
asm-generic powerpc updates for 4.13 2017-07-07 13:55:45 -07:00
clocksource
crypto crypto: engine - replace pr_xxx by dev_xxx 2017-06-19 14:19:54 +08:00
drm main drm pull for v4.13 2017-07-09 18:48:37 -07:00
dt-bindings This is the bulk of GPIO changes for the v4.13 series: 2017-07-07 12:40:27 -07:00
keys
kvm
linux hugetlb, memory_hotplug: prefer to use reserved pages for migration 2017-07-10 16:32:31 -07:00
math-emu
media main drm pull for v4.13 2017-07-09 18:48:37 -07:00
memory
misc cxl: Export library to support IBM XSL 2017-07-03 23:07:03 +10:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-08 12:01:22 -07:00
pcmcia
ras trace, ras: add ARM processor error trace event 2017-06-22 18:22:05 +01:00
rdma Fixes #3 for 4.12-rc 2017-07-06 11:45:08 -07:00
rxrpc
scsi SCSI misc on 20170704 2017-07-06 12:10:33 -07:00
soc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-07-05 12:31:59 -07:00
sound ASoC: Updates for v4.13 2017-07-03 19:51:42 +02:00
target
trace oom, trace: remove ENUM evaluation of COMPACTION_FEEDBACK 2017-07-10 16:32:31 -07:00
uapi main drm pull for v4.13 2017-07-09 18:48:37 -07:00
video
xen This is the first pull request for the new dma-mapping subsystem 2017-07-06 19:20:54 -07:00