mm: numa: Document automatic NUMA balancing sysctls

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-3-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

parent 37bf06375c
commit 10fc05d0e5

@@ -355,6 +355,72 @@ utilize.

==============================================================

numa_balancing

Enables/disables automatic page fault based NUMA memory
balancing. Memory is moved automatically to nodes
that access it often.

Enables/disables automatic NUMA memory balancing. On NUMA machines, there
is a performance penalty if remote memory is accessed by a CPU. When this
feature is enabled, the kernel samples which task thread is accessing memory
by periodically unmapping pages and later trapping a page fault. At the
time of the page fault, it is determined whether the data being accessed
should be migrated to a local memory node.

The unmapping of pages and trapping of faults incur additional overhead that
ideally is offset by improved memory locality, but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled. Otherwise, if the system overhead from the
feature is too high then the rate at which the kernel samples for NUMA
hinting faults may be controlled by the numa_balancing_scan_period_min_ms,
numa_balancing_scan_delay_ms, numa_balancing_scan_period_reset,
numa_balancing_scan_period_max_ms and numa_balancing_scan_size_mb sysctls.
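
As an illustration of checking the switch described above, the following
Python sketch reads the current setting. It assumes the sysctl is exposed as
a file named numa_balancing under /proc/sys/kernel, as is usual for entries
documented in this file; the helper names read_numa_balancing and describe
are hypothetical, and treating any non-zero value as "enabled" is a
simplifying assumption.

```python
from pathlib import Path
from typing import Optional


def read_numa_balancing(proc_root: str = "/proc/sys/kernel") -> Optional[int]:
    """Return the integer stored in <proc_root>/numa_balancing.

    Returns None when the file is missing, unreadable, or not an integer,
    for example on a kernel built without automatic NUMA balancing.
    """
    path = Path(proc_root) / "numa_balancing"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def describe(value: Optional[int]) -> str:
    """Map the raw value to a label (assumption: non-zero means enabled)."""
    if value is None:
        return "unavailable"
    return "enabled" if value != 0 else "disabled"
```

Calling describe(read_numa_balancing()) on a live system reports the state
without changing it. The proc_root parameter exists only so the helper can
be pointed at a scratch directory when testing.
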

==============================================================

numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
numa_balancing_scan_period_max_ms, numa_balancing_scan_period_reset,
numa_balancing_scan_size_mb

Automatic NUMA balancing scans a task's address space and unmaps pages to
detect if pages are properly placed or if the data should be migrated to a
memory node local to where the task is running. Every "scan delay" the task
scans the next "scan size" number of pages in its address space. When the
end of the address space is reached the scanner restarts from the beginning.
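
The wrap-around behaviour described above can be sketched as a toy model in
Python. This is not the kernel's implementation: it treats the address space
as a flat range of page numbers, whereas a real task has separate mappings,
and the function name scan_windows and its parameters are hypothetical.

```python
from typing import Iterator, Tuple


def scan_windows(address_space_pages: int,
                 scan_size_pages: int,
                 passes: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) page ranges covered by successive scans.

    Each pass covers up to scan_size_pages pages. When a pass reaches the
    end of the address space, the next pass restarts from page 0.
    """
    if address_space_pages <= 0 or scan_size_pages <= 0:
        raise ValueError("sizes must be positive")
    offset = 0
    for _ in range(passes):
        end = min(offset + scan_size_pages, address_space_pages)
        yield (offset, end)
        # Restart from the beginning once the end has been reached.
        offset = 0 if end >= address_space_pages else end
```

For a 10-page address space scanned 4 pages at a time,
list(scan_windows(10, 4, 4)) gives [(0, 4), (4, 8), (8, 10), (0, 4)].
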

In combination, the "scan delay" and "scan size" determine the scan rate.
When "scan delay" decreases, the scan rate increases. The scan delay and
hence the scan rate of every task is adaptive and depends on historical
behaviour. If pages are properly placed then the scan delay increases,
otherwise the scan delay decreases. The "scan size" is not adaptive but
the higher the "scan size", the higher the scan rate.
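
The relationship between the two quantities can be made concrete with a
small Python calculation. It assumes the scan rate is simply "scan size"
divided by "scan delay", which follows from the description above but
ignores any time spent scanning; the function name and the numbers are
illustrative and are not claimed to be kernel defaults.

```python
def scan_rate_mb_per_s(scan_size_mb: float, scan_delay_ms: float) -> float:
    """Return the scan rate in MB/s for a given scan size and scan delay."""
    if scan_delay_ms <= 0:
        raise ValueError("scan delay must be positive")
    # One scan of scan_size_mb happens every scan_delay_ms milliseconds.
    return scan_size_mb * 1000.0 / scan_delay_ms


# Illustrative values only:
base = scan_rate_mb_per_s(256, 1000)          # 256 MB once per second
shorter_delay = scan_rate_mb_per_s(256, 100)  # same size, 10x shorter delay
larger_size = scan_rate_mb_per_s(512, 1000)   # double size, same delay
```

Here base is 256.0 MB/s, shorter_delay is 2560.0 MB/s and larger_size is
512.0 MB/s: shortening the delay or enlarging the scan size both raise the
rate.
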

Higher scan rates incur higher system overhead as page faults must be
trapped and potentially data must be migrated. However, the higher the scan
rate, the more quickly a task's memory is migrated to a local node if the
workload pattern changes, which minimises the performance impact of remote
memory accesses. These sysctls control the thresholds for scan delays and
the number of pages scanned.

numa_balancing_scan_period_min_ms is the minimum delay in milliseconds
between scans. It effectively controls the maximum scanning rate for
each task.

numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
when it initially forks.

numa_balancing_scan_period_max_ms is the maximum delay between scans. It
effectively controls the minimum scanning rate for each task.

numa_balancing_scan_size_mb is how many megabytes worth of pages are
scanned for a given scan.

numa_balancing_scan_period_reset is a blunt instrument that controls how
often a task's scan delay is reset to detect sudden changes in task behaviour.
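
How the minimum, maximum and starting delays interact can be sketched with
a toy model in Python. The doubling and halving step is an assumption made
purely for illustration and is not how the kernel adjusts the delay;
numa_balancing_scan_period_reset is not modelled. ScanTunables and
next_scan_delay are hypothetical names and the values are illustrative, not
claimed defaults.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScanTunables:
    scan_delay_ms: int        # starting delay when a task forks
    scan_period_min_ms: int   # lower bound on the delay (maximum scan rate)
    scan_period_max_ms: int   # upper bound on the delay (minimum scan rate)


def next_scan_delay(current_ms: int, well_placed: bool,
                    t: ScanTunables, factor: int = 2) -> int:
    """Return the next delay, clamped to [min, max].

    Assumption: the delay grows by `factor` when pages were well placed and
    shrinks by `factor` otherwise. Only the direction of the change and the
    clamping come from the text above.
    """
    proposed = current_ms * factor if well_placed else current_ms // factor
    return max(t.scan_period_min_ms, min(t.scan_period_max_ms, proposed))


def simulate(t: ScanTunables, outcomes: Iterable[bool]) -> List[int]:
    """Start from scan_delay_ms and apply one adjustment per outcome."""
    delay = t.scan_delay_ms
    history = []
    for well_placed in outcomes:
        delay = next_scan_delay(delay, well_placed, t)
        history.append(delay)
    return history
```

With ScanTunables(1000, 100, 60000) and the outcomes
[False, False, False, False, True], simulate returns
[500, 250, 125, 100, 200]: the fourth step would fall to 62 ms but is held
at the 100 ms minimum.
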

==============================================================

osrelease, ostype & version:

# cat osrelease