mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
|
|
|
/*
|
|
|
|
* DAMON api
|
|
|
|
*
|
|
|
|
* Author: SeongJae Park <sjpark@amazon.de>
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef _DAMON_H_
|
|
|
|
#define _DAMON_H_
|
|
|
|
|
|
|
|
#include <linux/mutex.h>
|
|
|
|
#include <linux/time64.h>
|
|
|
|
#include <linux/types.h>
|
2022-01-15 06:09:53 +08:00
|
|
|
#include <linux/random.h>
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
|
2021-09-08 10:56:36 +08:00
|
|
|
/* Minimal region size. Every damon_region is aligned by this. */
|
|
|
|
#define DAMON_MIN_REGION PAGE_SIZE
|
2021-11-06 04:47:33 +08:00
|
|
|
/* Max priority score for DAMON-based operation schemes */
|
|
|
|
#define DAMOS_MAX_SCORE (99)
|
2021-09-08 10:56:36 +08:00
|
|
|
|
2022-01-15 06:09:53 +08:00
|
|
|
/* Get a random number in [l, r) */
|
2022-01-15 06:09:56 +08:00
|
|
|
static inline unsigned long damon_rand(unsigned long l, unsigned long r)
|
|
|
|
{
|
|
|
|
return l + prandom_u32_max(r - l);
|
|
|
|
}
|
2022-01-15 06:09:53 +08:00
|
|
|
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
/**
|
|
|
|
* struct damon_addr_range - Represents an address region of [@start, @end).
|
|
|
|
* @start: Start address of the region (inclusive).
|
|
|
|
* @end: End address of the region (exclusive).
|
|
|
|
*/
|
|
|
|
struct damon_addr_range {
|
|
|
|
unsigned long start;
|
|
|
|
unsigned long end;
|
|
|
|
};
|
|
|
|
|
|
|
|
/**
|
|
|
|
* struct damon_region - Represents a monitoring target region.
|
|
|
|
* @ar: The address range of the region.
|
|
|
|
* @sampling_addr: Address of the sample for the next access check.
|
|
|
|
* @nr_accesses: Access frequency of this region.
|
|
|
|
* @list: List head for siblings.
|
mm/damon/core: account age of target regions
Patch series "Implement Data Access Monitoring-based Memory Operation Schemes".
Introduction
============
DAMON[1] can be used as a primitive for data access aware memory
management optimizations. For that, users who want such optimizations
should run DAMON, read the monitoring results, analyze it, plan a new
memory management scheme, and apply the new scheme by themselves. Such
efforts will be inevitable for some complicated optimizations.
However, in many other cases, the users would simply want the system to
apply a memory management action to a memory region of a specific size
having a specific access frequency for a specific time. For example,
"page out a memory region larger than 100 MiB keeping only rare accesses
more than 2 minutes", or "Do not use THP for a memory region larger than
2 MiB rarely accessed for more than 1 seconds".
To make the works easier and non-redundant, this patchset implements a
new feature of DAMON, which is called Data Access Monitoring-based
Operation Schemes (DAMOS). Using the feature, users can describe the
normal schemes in a simple way and ask DAMON to execute those on its
own.
[1] https://damonitor.github.io
Evaluations
===========
DAMOS is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation
are not for production but only for proof of concepts.
Please refer to the showcase web site's evaluation document[1] for
detailed evaluation setup and results.
[1] https://damonitor.github.io/doc/html/v34/vm/damon/eval.html
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are
another couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://git.kernel.org/sj/h/damon/for-v5.4.y
- For v5.10.y: https://git.kernel.org/sj/h/damon/for-v5.10.y
Sequence Of Patches
===================
The 1st patch accounts age of each region. The 2nd patch implements the
core of the DAMON-based operation schemes feature. The 3rd patch makes
the default monitoring primitives for virtual address spaces to support
the schemes. From this point, the kernel space users can use DAMOS.
The 4th patch exports the feature to the user space via the debugfs
interface. The 5th patch implements schemes statistics feature for
easier tuning of the schemes and runtime access pattern analysis, and
the 6th patch adds selftests for these changes. Finally, the 7th patch
documents this new feature.
This patch (of 7):
DAMON can be used for data access pattern aware memory management
optimizations. For that, users should run DAMON, read the monitoring
results, analyze it, plan a new memory management scheme, and apply the
new scheme by themselves. It would not be too hard, but still require
some level of effort. For complicated cases, this effort is inevitable.
That said, in many cases, users would simply want to apply an actions to
a memory region of a specific size having a specific access frequency
for a specific time. For example, "page out a memory region larger than
100 MiB but having a low access frequency more than 10 minutes", or "Use
THP for a memory region larger than 2 MiB having a high access frequency
for more than 2 seconds".
For such optimizations, users will need to first account the age of each
region themselves. To reduce such efforts, this implements a simple age
account of each region in DAMON. For each aggregation step, DAMON
compares the access frequency with that from last aggregation and reset
the age of the region if the change is significant. Else, the age is
incremented. Also, in case of the merge of regions, the region
size-weighted average of the ages is set as the age of merged new
region.
Link: https://lkml.kernel.org/r/20211001125604.29660-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211001125604.29660-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Greg Thelen <gthelen@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: David Rienjes <rientjes@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:18 +08:00
|
|
|
* @age: Age of this region.
|
|
|
|
*
|
|
|
|
* @age is initially zero, increased for each aggregation interval, and reset
|
|
|
|
* to zero again if the access frequency is significantly changed. If two
|
|
|
|
* regions are merged into a new region, both @nr_accesses and @age of the new
|
|
|
|
* region are set as region size-weighted average of those of the two regions.
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
*/
|
|
|
|
struct damon_region {
|
|
|
|
struct damon_addr_range ar;
|
|
|
|
unsigned long sampling_addr;
|
|
|
|
unsigned int nr_accesses;
|
|
|
|
struct list_head list;
|
mm/damon/core: account age of target regions
Patch series "Implement Data Access Monitoring-based Memory Operation Schemes".
Introduction
============
DAMON[1] can be used as a primitive for data access aware memory
management optimizations. For that, users who want such optimizations
should run DAMON, read the monitoring results, analyze it, plan a new
memory management scheme, and apply the new scheme by themselves. Such
efforts will be inevitable for some complicated optimizations.
However, in many other cases, the users would simply want the system to
apply a memory management action to a memory region of a specific size
having a specific access frequency for a specific time. For example,
"page out a memory region larger than 100 MiB keeping only rare accesses
more than 2 minutes", or "Do not use THP for a memory region larger than
2 MiB rarely accessed for more than 1 seconds".
To make the works easier and non-redundant, this patchset implements a
new feature of DAMON, which is called Data Access Monitoring-based
Operation Schemes (DAMOS). Using the feature, users can describe the
normal schemes in a simple way and ask DAMON to execute those on its
own.
[1] https://damonitor.github.io
Evaluations
===========
DAMOS is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation
are not for production but only for proof of concepts.
Please refer to the showcase web site's evaluation document[1] for
detailed evaluation setup and results.
[1] https://damonitor.github.io/doc/html/v34/vm/damon/eval.html
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are
another couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://git.kernel.org/sj/h/damon/for-v5.4.y
- For v5.10.y: https://git.kernel.org/sj/h/damon/for-v5.10.y
Sequence Of Patches
===================
The 1st patch accounts age of each region. The 2nd patch implements the
core of the DAMON-based operation schemes feature. The 3rd patch makes
the default monitoring primitives for virtual address spaces to support
the schemes. From this point, the kernel space users can use DAMOS.
The 4th patch exports the feature to the user space via the debugfs
interface. The 5th patch implements schemes statistics feature for
easier tuning of the schemes and runtime access pattern analysis, and
the 6th patch adds selftests for these changes. Finally, the 7th patch
documents this new feature.
This patch (of 7):
DAMON can be used for data access pattern aware memory management
optimizations. For that, users should run DAMON, read the monitoring
results, analyze it, plan a new memory management scheme, and apply the
new scheme by themselves. It would not be too hard, but still require
some level of effort. For complicated cases, this effort is inevitable.
That said, in many cases, users would simply want to apply an actions to
a memory region of a specific size having a specific access frequency
for a specific time. For example, "page out a memory region larger than
100 MiB but having a low access frequency more than 10 minutes", or "Use
THP for a memory region larger than 2 MiB having a high access frequency
for more than 2 seconds".
For such optimizations, users will need to first account the age of each
region themselves. To reduce such efforts, this implements a simple age
account of each region in DAMON. For each aggregation step, DAMON
compares the access frequency with that from last aggregation and reset
the age of the region if the change is significant. Else, the age is
incremented. Also, in case of the merge of regions, the region
size-weighted average of the ages is set as the age of merged new
region.
Link: https://lkml.kernel.org/r/20211001125604.29660-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211001125604.29660-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Greg Thelen <gthelen@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: David Rienjes <rientjes@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:18 +08:00
|
|
|
|
|
|
|
unsigned int age;
|
|
|
|
/* private: Internal value for age calculation. */
|
|
|
|
unsigned int last_nr_accesses;
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
/**
|
|
|
|
* struct damon_target - Represents a monitoring target.
|
mm/damon: remove the target id concept
DAMON asks each monitoring target ('struct damon_target') to have one
'unsigned long' integer called 'id', which should be unique among the
targets of same monitoring context. Meaning of it is, however, totally up
to the monitoring primitives that registered to the monitoring context.
For example, the virtual address spaces monitoring primitives treats the
id as a 'struct pid' pointer.
This makes the code flexible, but ugly, not well-documented, and
type-unsafe[1]. Also, identification of each target can be done via its
index. For the reason, this commit removes the concept and uses clear
type definition. For now, only 'struct pid' pointer is used for the
virtual address spaces monitoring. If DAMON is extended in future so that
we need to put another identifier field in the struct, we will use a union
for such primitives-dependent fields and document which primitives are
using which type.
[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/
Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 05:48:40 +08:00
|
|
|
* @pid: The PID of the virtual address space to monitor.
|
2021-09-08 10:56:36 +08:00
|
|
|
* @nr_regions: Number of monitoring target regions of this target.
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
* @regions_list: Head of the monitoring target regions of this target.
|
|
|
|
* @list: List head for siblings.
|
|
|
|
*
|
|
|
|
* Each monitoring context could have multiple targets. For example, a context
|
|
|
|
* for virtual memory address spaces could have multiple target processes. The
|
2022-03-23 05:48:46 +08:00
|
|
|
* @pid should be set for appropriate &struct damon_operations including the
|
|
|
|
* virtual address spaces monitoring operations.
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
*/
|
|
|
|
struct damon_target {
|
mm/damon: remove the target id concept
DAMON asks each monitoring target ('struct damon_target') to have one
'unsigned long' integer called 'id', which should be unique among the
targets of same monitoring context. Meaning of it is, however, totally up
to the monitoring primitives that registered to the monitoring context.
For example, the virtual address spaces monitoring primitives treats the
id as a 'struct pid' pointer.
This makes the code flexible, but ugly, not well-documented, and
type-unsafe[1]. Also, identification of each target can be done via its
index. For the reason, this commit removes the concept and uses clear
type definition. For now, only 'struct pid' pointer is used for the
virtual address spaces monitoring. If DAMON is extended in future so that
we need to put another identifier field in the struct, we will use a union
for such primitives-dependent fields and document which primitives are
using which type.
[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/
Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 05:48:40 +08:00
|
|
|
struct pid *pid;
|
2021-09-08 10:56:36 +08:00
|
|
|
unsigned int nr_regions;
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
struct list_head regions_list;
|
|
|
|
struct list_head list;
|
|
|
|
};
|
|
|
|
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
/**
|
|
|
|
* enum damos_action - Represents an action of a Data Access Monitoring-based
|
|
|
|
* Operation Scheme.
|
|
|
|
*
|
|
|
|
* @DAMOS_WILLNEED: Call ``madvise()`` for the region with MADV_WILLNEED.
|
|
|
|
* @DAMOS_COLD: Call ``madvise()`` for the region with MADV_COLD.
|
|
|
|
* @DAMOS_PAGEOUT: Call ``madvise()`` for the region with MADV_PAGEOUT.
|
|
|
|
* @DAMOS_HUGEPAGE: Call ``madvise()`` for the region with MADV_HUGEPAGE.
|
|
|
|
* @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
|
2021-11-06 04:46:32 +08:00
|
|
|
* @DAMOS_STAT: Do nothing but count the stat.
|
2022-03-23 05:49:24 +08:00
|
|
|
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
*/
|
|
|
|
enum damos_action {
|
|
|
|
DAMOS_WILLNEED,
|
|
|
|
DAMOS_COLD,
|
|
|
|
DAMOS_PAGEOUT,
|
|
|
|
DAMOS_HUGEPAGE,
|
|
|
|
DAMOS_NOHUGEPAGE,
|
2021-11-06 04:46:32 +08:00
|
|
|
DAMOS_STAT, /* Do nothing but only record the stat */
|
2022-03-23 05:49:24 +08:00
|
|
|
NR_DAMOS_ACTIONS,
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
};
|
|
|
|
|
2021-11-06 04:47:16 +08:00
|
|
|
/**
|
|
|
|
* struct damos_quota - Controls the aggressiveness of the given scheme.
|
2021-11-06 04:47:23 +08:00
|
|
|
* @ms: Maximum milliseconds that the scheme can use.
|
2021-11-06 04:47:16 +08:00
|
|
|
* @sz: Maximum bytes of memory that the action can be applied.
|
|
|
|
* @reset_interval: Charge reset interval in milliseconds.
|
|
|
|
*
|
2021-11-06 04:47:33 +08:00
|
|
|
* @weight_sz: Weight of the region's size for prioritization.
|
|
|
|
* @weight_nr_accesses: Weight of the region's nr_accesses for prioritization.
|
|
|
|
* @weight_age: Weight of the region's age for prioritization.
|
|
|
|
*
|
2021-11-06 04:47:16 +08:00
|
|
|
* To avoid consuming too much CPU time or IO resources for applying the
|
2021-11-06 04:47:23 +08:00
|
|
|
* &struct damos->action to large memory, DAMON allows users to set time and/or
|
|
|
|
* size quotas. The quotas can be set by writing non-zero values to &ms and
|
|
|
|
* &sz, respectively. If the time quota is set, DAMON tries to use only up to
|
|
|
|
* &ms milliseconds within &reset_interval for applying the action. If the
|
|
|
|
* size quota is set, DAMON tries to apply the action only up to &sz bytes
|
|
|
|
* within &reset_interval.
|
|
|
|
*
|
|
|
|
* Internally, the time quota is transformed to a size quota using estimated
|
|
|
|
* throughput of the scheme's action. DAMON then compares it against &sz and
|
|
|
|
* uses smaller one as the effective quota.
|
2021-11-06 04:47:33 +08:00
|
|
|
*
|
|
|
|
* For selecting regions within the quota, DAMON prioritizes current scheme's
|
2022-03-23 05:48:46 +08:00
|
|
|
* target memory regions using the &struct damon_operations->get_scheme_score.
|
2021-11-06 04:47:33 +08:00
|
|
|
* You could customize the prioritization logic by setting &weight_sz,
|
2022-03-23 05:48:46 +08:00
|
|
|
* &weight_nr_accesses, and &weight_age, because monitoring operations are
|
2021-11-06 04:47:33 +08:00
|
|
|
* encouraged to respect those.
|
2021-11-06 04:47:16 +08:00
|
|
|
*/
|
|
|
|
struct damos_quota {
|
2021-11-06 04:47:23 +08:00
|
|
|
unsigned long ms;
|
2021-11-06 04:47:16 +08:00
|
|
|
unsigned long sz;
|
|
|
|
unsigned long reset_interval;
|
|
|
|
|
2021-11-06 04:47:33 +08:00
|
|
|
unsigned int weight_sz;
|
|
|
|
unsigned int weight_nr_accesses;
|
|
|
|
unsigned int weight_age;
|
|
|
|
|
2021-11-06 04:47:23 +08:00
|
|
|
/* private: */
|
|
|
|
/* For throughput estimation */
|
|
|
|
unsigned long total_charged_sz;
|
|
|
|
unsigned long total_charged_ns;
|
|
|
|
|
|
|
|
unsigned long esz; /* Effective size quota in bytes */
|
|
|
|
|
|
|
|
/* For charging the quota */
|
2021-11-06 04:47:16 +08:00
|
|
|
unsigned long charged_sz;
|
|
|
|
unsigned long charged_from;
|
2021-11-06 04:47:20 +08:00
|
|
|
struct damon_target *charge_target_from;
|
|
|
|
unsigned long charge_addr_from;
|
2021-11-06 04:47:33 +08:00
|
|
|
|
|
|
|
/* For prioritization */
|
|
|
|
unsigned long histogram[DAMOS_MAX_SCORE + 1];
|
|
|
|
unsigned int min_score;
|
2021-11-06 04:47:16 +08:00
|
|
|
};
|
|
|
|
|
mm/damon/schemes: activate schemes based on a watermarks mechanism
DAMON-based operation schemes need to be manually turned on and off. In
some use cases, however, the condition for turning a scheme on and off
would depend on the system's situation. For example, schemes for
proactive pages reclamation would need to be turned on when some memory
pressure is detected, and turned off when the system has enough free
memory.
For easier control of schemes activation based on the system situation,
this introduces a watermarks-based mechanism. The client can describe
the watermark metric (e.g., amount of free memory in the system),
watermark check interval, and three watermarks, namely high, mid, and
low. If the scheme is deactivated, it only gets the metric and compare
that to the three watermarks for every check interval. If the metric is
higher than the high watermark, the scheme is deactivated. If the
metric is between the mid watermark and the low watermark, the scheme is
activated. If the metric is lower than the low watermark, the scheme is
deactivated again. This is to allow users fall back to traditional
page-granularity mechanisms.
Link: https://lkml.kernel.org/r/20211019150731.16699-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:47:47 +08:00
|
|
|
/**
|
|
|
|
* enum damos_wmark_metric - Represents the watermark metric.
|
|
|
|
*
|
|
|
|
* @DAMOS_WMARK_NONE: Ignore the watermarks of the given scheme.
|
|
|
|
* @DAMOS_WMARK_FREE_MEM_RATE: Free memory rate of the system in [0,1000].
|
2022-03-23 05:49:24 +08:00
|
|
|
* @NR_DAMOS_WMARK_METRICS: Total number of DAMOS watermark metrics
|
mm/damon/schemes: activate schemes based on a watermarks mechanism
DAMON-based operation schemes need to be manually turned on and off. In
some use cases, however, the condition for turning a scheme on and off
would depend on the system's situation. For example, schemes for
proactive pages reclamation would need to be turned on when some memory
pressure is detected, and turned off when the system has enough free
memory.
For easier control of schemes activation based on the system situation,
this introduces a watermarks-based mechanism. The client can describe
the watermark metric (e.g., amount of free memory in the system),
watermark check interval, and three watermarks, namely high, mid, and
low. If the scheme is deactivated, it only gets the metric and compare
that to the three watermarks for every check interval. If the metric is
higher than the high watermark, the scheme is deactivated. If the
metric is between the mid watermark and the low watermark, the scheme is
activated. If the metric is lower than the low watermark, the scheme is
deactivated again. This is to allow users fall back to traditional
page-granularity mechanisms.
Link: https://lkml.kernel.org/r/20211019150731.16699-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:47:47 +08:00
|
|
|
*/
|
|
|
|
enum damos_wmark_metric {
|
|
|
|
DAMOS_WMARK_NONE,
|
|
|
|
DAMOS_WMARK_FREE_MEM_RATE,
|
2022-03-23 05:49:24 +08:00
|
|
|
NR_DAMOS_WMARK_METRICS,
|
mm/damon/schemes: activate schemes based on a watermarks mechanism
DAMON-based operation schemes need to be manually turned on and off. In
some use cases, however, the condition for turning a scheme on and off
would depend on the system's situation. For example, schemes for
proactive pages reclamation would need to be turned on when some memory
pressure is detected, and turned off when the system has enough free
memory.
For easier control of schemes activation based on the system situation,
this introduces a watermarks-based mechanism. The client can describe
the watermark metric (e.g., amount of free memory in the system),
watermark check interval, and three watermarks, namely high, mid, and
low. If the scheme is deactivated, it only gets the metric and compare
that to the three watermarks for every check interval. If the metric is
higher than the high watermark, the scheme is deactivated. If the
metric is between the mid watermark and the low watermark, the scheme is
activated. If the metric is lower than the low watermark, the scheme is
deactivated again. This is to allow users fall back to traditional
page-granularity mechanisms.
Link: https://lkml.kernel.org/r/20211019150731.16699-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:47:47 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
/**
|
|
|
|
* struct damos_watermarks - Controls when a given scheme should be activated.
|
|
|
|
* @metric: Metric for the watermarks.
|
|
|
|
* @interval: Watermarks check time interval in microseconds.
|
|
|
|
* @high: High watermark.
|
|
|
|
* @mid: Middle watermark.
|
|
|
|
* @low: Low watermark.
|
|
|
|
*
|
|
|
|
* If &metric is &DAMOS_WMARK_NONE, the scheme is always active. Being active
|
|
|
|
* means DAMON does monitoring and applying the action of the scheme to
|
|
|
|
* appropriate memory regions. Else, DAMON checks &metric of the system for at
|
|
|
|
* least every &interval microseconds and works as below.
|
|
|
|
*
|
|
|
|
* If &metric is higher than &high, the scheme is inactivated. If &metric is
|
|
|
|
* between &mid and &low, the scheme is activated. If &metric is lower than
|
|
|
|
* &low, the scheme is inactivated.
|
|
|
|
*/
|
|
|
|
struct damos_watermarks {
|
|
|
|
enum damos_wmark_metric metric;
|
|
|
|
unsigned long interval;
|
|
|
|
unsigned long high;
|
|
|
|
unsigned long mid;
|
|
|
|
unsigned long low;
|
|
|
|
|
|
|
|
/* private: */
|
|
|
|
bool activated;
|
|
|
|
};
|
|
|
|
|
mm/damon/schemes: account scheme actions that successfully applied
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning".
To help online access pattern analysis and tuning of DAMON-based
Operation Schemes (DAMOS), DAMOS provides simple statistics for each
scheme. Introduction of DAMOS time/space quota further made the tuning
easier by making the risk management easier. However, that also made
understanding of the working schemes a little bit more difficult.
For an example, progress of a given scheme can now be throttled by not
only the aggressiveness of the target access pattern, but also the
time/space quotas. So, when a scheme is showing unexpectedly slow
progress, it's difficult to know by what the progress of the scheme is
throttled, with currently provided statistics.
This patchset extends the statistics to contain some metrics that can be
helpful for such online schemes analysis and tuning (patches 1-2),
exports those to users (patches 3 and 5), and add documents (patches 4
and 6).
This patch (of 6):
DAMON-based operation schemes (DAMOS) stats provide only the number and
the amount of regions that the action of the scheme has tried to be
applied. Because the action could be failed for some reasons, the
currently provided information is sometimes not useful or convenient
enough for schemes profiling and tuning. To improve this situation,
this commit extends the DAMOS stats to provide the number and the amount
of regions that the action has successfully applied.
Link: https://lkml.kernel.org/r/20211210150016.35349-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211210150016.35349-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 06:10:17 +08:00
|
|
|
/**
|
|
|
|
* struct damos_stat - Statistics on a given scheme.
|
|
|
|
* @nr_tried: Total number of regions that the scheme is tried to be applied.
|
|
|
|
* @sz_tried: Total size of regions that the scheme is tried to be applied.
|
|
|
|
* @nr_applied: Total number of regions that the scheme is applied.
|
|
|
|
* @sz_applied: Total size of regions that the scheme is applied.
|
2022-01-15 06:10:20 +08:00
|
|
|
* @qt_exceeds: Total number of times the quota of the scheme has exceeded.
|
mm/damon/schemes: account scheme actions that successfully applied
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning".
To help online access pattern analysis and tuning of DAMON-based
Operation Schemes (DAMOS), DAMOS provides simple statistics for each
scheme. Introduction of DAMOS time/space quota further made the tuning
easier by making the risk management easier. However, that also made
understanding of the working schemes a little bit more difficult.
For an example, progress of a given scheme can now be throttled by not
only the aggressiveness of the target access pattern, but also the
time/space quotas. So, when a scheme is showing unexpectedly slow
progress, it's difficult to know by what the progress of the scheme is
throttled, with currently provided statistics.
This patchset extends the statistics to contain some metrics that can be
helpful for such online schemes analysis and tuning (patches 1-2),
exports those to users (patches 3 and 5), and add documents (patches 4
and 6).
This patch (of 6):
DAMON-based operation schemes (DAMOS) stats provide only the number and
the amount of regions that the action of the scheme has tried to be
applied. Because the action could be failed for some reasons, the
currently provided information is sometimes not useful or convenient
enough for schemes profiling and tuning. To improve this situation,
this commit extends the DAMOS stats to provide the number and the amount
of regions that the action has successfully applied.
Link: https://lkml.kernel.org/r/20211210150016.35349-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211210150016.35349-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 06:10:17 +08:00
|
|
|
*/
|
|
|
|
struct damos_stat {
|
|
|
|
unsigned long nr_tried;
|
|
|
|
unsigned long sz_tried;
|
|
|
|
unsigned long nr_applied;
|
|
|
|
unsigned long sz_applied;
|
2022-01-15 06:10:20 +08:00
|
|
|
unsigned long qt_exceeds;
|
mm/damon/schemes: account scheme actions that successfully applied
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning".
To help online access pattern analysis and tuning of DAMON-based
Operation Schemes (DAMOS), DAMOS provides simple statistics for each
scheme. Introduction of DAMOS time/space quota further made the tuning
easier by making the risk management easier. However, that also made
understanding of the working schemes a little bit more difficult.
For an example, progress of a given scheme can now be throttled by not
only the aggressiveness of the target access pattern, but also the
time/space quotas. So, when a scheme is showing unexpectedly slow
progress, it's difficult to know by what the progress of the scheme is
throttled, with currently provided statistics.
This patchset extends the statistics to contain some metrics that can be
helpful for such online schemes analysis and tuning (patches 1-2),
exports those to users (patches 3 and 5), and add documents (patches 4
and 6).
This patch (of 6):
DAMON-based operation schemes (DAMOS) stats provide only the number and
the amount of regions that the action of the scheme has tried to be
applied. Because the action could be failed for some reasons, the
currently provided information is sometimes not useful or convenient
enough for schemes profiling and tuning. To improve this situation,
this commit extends the DAMOS stats to provide the number and the amount
of regions that the action has successfully applied.
Link: https://lkml.kernel.org/r/20211210150016.35349-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211210150016.35349-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 06:10:17 +08:00
|
|
|
};
|
|
|
|
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
/**
|
|
|
|
* struct damos - Represents a Data Access Monitoring-based Operation Scheme.
|
|
|
|
* @min_sz_region: Minimum size of target regions.
|
|
|
|
* @max_sz_region: Maximum size of target regions.
|
|
|
|
* @min_nr_accesses: Minimum ``->nr_accesses`` of target regions.
|
|
|
|
* @max_nr_accesses: Maximum ``->nr_accesses`` of target regions.
|
|
|
|
* @min_age_region: Minimum age of target regions.
|
|
|
|
* @max_age_region: Maximum age of target regions.
|
|
|
|
* @action: &damo_action to be applied to the target regions.
|
2021-11-06 04:47:16 +08:00
|
|
|
* @quota: Control the aggressiveness of this scheme.
|
mm/damon/schemes: activate schemes based on a watermarks mechanism
DAMON-based operation schemes need to be manually turned on and off. In
some use cases, however, the condition for turning a scheme on and off
would depend on the system's situation. For example, schemes for
proactive pages reclamation would need to be turned on when some memory
pressure is detected, and turned off when the system has enough free
memory.
For easier control of schemes activation based on the system situation,
this introduces a watermarks-based mechanism. The client can describe
the watermark metric (e.g., amount of free memory in the system),
watermark check interval, and three watermarks, namely high, mid, and
low. If the scheme is deactivated, it only gets the metric and compare
that to the three watermarks for every check interval. If the metric is
higher than the high watermark, the scheme is deactivated. If the
metric is between the mid watermark and the low watermark, the scheme is
activated. If the metric is lower than the low watermark, the scheme is
deactivated again. This is to allow users fall back to traditional
page-granularity mechanisms.
Link: https://lkml.kernel.org/r/20211019150731.16699-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:47:47 +08:00
|
|
|
* @wmarks: Watermarks for automated (in)activation of this scheme.
|
mm/damon/schemes: account scheme actions that successfully applied
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning".
To help online access pattern analysis and tuning of DAMON-based
Operation Schemes (DAMOS), DAMOS provides simple statistics for each
scheme. Introduction of DAMOS time/space quota further made the tuning
easier by making the risk management easier. However, that also made
understanding of the working schemes a little bit more difficult.
For an example, progress of a given scheme can now be throttled by not
only the aggressiveness of the target access pattern, but also the
time/space quotas. So, when a scheme is showing unexpectedly slow
progress, it's difficult to know by what the progress of the scheme is
throttled, with currently provided statistics.
This patchset extends the statistics to contain some metrics that can be
helpful for such online schemes analysis and tuning (patches 1-2),
exports those to users (patches 3 and 5), and add documents (patches 4
and 6).
This patch (of 6):
DAMON-based operation schemes (DAMOS) stats provide only the number and
the amount of regions that the action of the scheme has tried to be
applied. Because the action could be failed for some reasons, the
currently provided information is sometimes not useful or convenient
enough for schemes profiling and tuning. To improve this situation,
this commit extends the DAMOS stats to provide the number and the amount
of regions that the action has successfully applied.
Link: https://lkml.kernel.org/r/20211210150016.35349-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211210150016.35349-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 06:10:17 +08:00
|
|
|
* @stat: Statistics of this scheme.
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
* @list: List head for siblings.
|
|
|
|
*
|
2021-11-06 04:47:16 +08:00
|
|
|
* For each aggregation interval, DAMON finds regions which fit in the
|
|
|
|
* condition (&min_sz_region, &max_sz_region, &min_nr_accesses,
|
|
|
|
* &max_nr_accesses, &min_age_region, &max_age_region) and applies &action to
|
|
|
|
* those. To avoid consuming too much CPU time or IO resources for the
|
|
|
|
* &action, "a is used.
|
|
|
|
*
|
mm/damon/schemes: activate schemes based on a watermarks mechanism
DAMON-based operation schemes need to be manually turned on and off. In
some use cases, however, the condition for turning a scheme on and off
would depend on the system's situation. For example, schemes for
proactive pages reclamation would need to be turned on when some memory
pressure is detected, and turned off when the system has enough free
memory.
For easier control of schemes activation based on the system situation,
this introduces a watermarks-based mechanism. The client can describe
the watermark metric (e.g., amount of free memory in the system),
watermark check interval, and three watermarks, namely high, mid, and
low. If the scheme is deactivated, it only gets the metric and compare
that to the three watermarks for every check interval. If the metric is
higher than the high watermark, the scheme is deactivated. If the
metric is between the mid watermark and the low watermark, the scheme is
activated. If the metric is lower than the low watermark, the scheme is
deactivated again. This is to allow users fall back to traditional
page-granularity mechanisms.
Link: https://lkml.kernel.org/r/20211019150731.16699-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:47:47 +08:00
|
|
|
* To do the work only when needed, schemes can be activated for specific
|
|
|
|
* system situations using &wmarks. If all schemes that registered to the
|
|
|
|
* monitoring context are inactive, DAMON stops monitoring either, and just
|
|
|
|
* repeatedly checks the watermarks.
|
|
|
|
*
|
|
|
|
* If all schemes that registered to a &struct damon_ctx are inactive, DAMON
|
|
|
|
* stops monitoring and just repeatedly checks the watermarks.
|
|
|
|
*
|
2021-11-06 04:47:16 +08:00
|
|
|
* After applying the &action to each region, &stat_count and &stat_sz is
|
|
|
|
* updated to reflect the number of regions and total size of regions that the
|
|
|
|
* &action is applied.
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
*/
|
|
|
|
struct damos {
|
|
|
|
unsigned long min_sz_region;
|
|
|
|
unsigned long max_sz_region;
|
|
|
|
unsigned int min_nr_accesses;
|
|
|
|
unsigned int max_nr_accesses;
|
|
|
|
unsigned int min_age_region;
|
|
|
|
unsigned int max_age_region;
|
|
|
|
enum damos_action action;
|
2021-11-06 04:47:16 +08:00
|
|
|
struct damos_quota quota;
|
mm/damon/schemes: activate schemes based on a watermarks mechanism
DAMON-based operation schemes need to be manually turned on and off. In
some use cases, however, the condition for turning a scheme on and off
would depend on the system's situation. For example, schemes for
proactive pages reclamation would need to be turned on when some memory
pressure is detected, and turned off when the system has enough free
memory.
For easier control of schemes activation based on the system situation,
this introduces a watermarks-based mechanism. The client can describe
the watermark metric (e.g., amount of free memory in the system),
watermark check interval, and three watermarks, namely high, mid, and
low. If the scheme is deactivated, it only gets the metric and compare
that to the three watermarks for every check interval. If the metric is
higher than the high watermark, the scheme is deactivated. If the
metric is between the mid watermark and the low watermark, the scheme is
activated. If the metric is lower than the low watermark, the scheme is
deactivated again. This is to allow users fall back to traditional
page-granularity mechanisms.
Link: https://lkml.kernel.org/r/20211019150731.16699-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:47:47 +08:00
|
|
|
struct damos_watermarks wmarks;
|
mm/damon/schemes: account scheme actions that successfully applied
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning".
To help online access pattern analysis and tuning of DAMON-based
Operation Schemes (DAMOS), DAMOS provides simple statistics for each
scheme. Introduction of DAMOS time/space quota further made the tuning
easier by making the risk management easier. However, that also made
understanding of the working schemes a little bit more difficult.
For an example, progress of a given scheme can now be throttled by not
only the aggressiveness of the target access pattern, but also the
time/space quotas. So, when a scheme is showing unexpectedly slow
progress, it's difficult to know by what the progress of the scheme is
throttled, with currently provided statistics.
This patchset extends the statistics to contain some metrics that can be
helpful for such online schemes analysis and tuning (patches 1-2),
exports those to users (patches 3 and 5), and add documents (patches 4
and 6).
This patch (of 6):
DAMON-based operation schemes (DAMOS) stats provide only the number and
the amount of regions that the action of the scheme has tried to be
applied. Because the action could be failed for some reasons, the
currently provided information is sometimes not useful or convenient
enough for schemes profiling and tuning. To improve this situation,
this commit extends the DAMOS stats to provide the number and the amount
of regions that the action has successfully applied.
Link: https://lkml.kernel.org/r/20211210150016.35349-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211210150016.35349-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 06:10:17 +08:00
|
|
|
struct damos_stat stat;
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
struct list_head list;
|
|
|
|
};
|
|
|
|
|
2022-03-23 05:48:49 +08:00
|
|
|
/**
|
|
|
|
* enum damon_ops_id - Identifier for each monitoring operations implementation
|
|
|
|
*
|
|
|
|
* @DAMON_OPS_VADDR: Monitoring operations for virtual address spaces
|
2022-05-10 09:20:52 +08:00
|
|
|
* @DAMON_OPS_FVADDR: Monitoring operations for only fixed ranges of virtual
|
|
|
|
* address spaces
|
2022-03-23 05:48:49 +08:00
|
|
|
* @DAMON_OPS_PADDR: Monitoring operations for the physical address space
|
|
|
|
*/
|
|
|
|
enum damon_ops_id {
|
|
|
|
DAMON_OPS_VADDR,
|
2022-05-10 09:20:52 +08:00
|
|
|
DAMON_OPS_FVADDR,
|
2022-03-23 05:48:49 +08:00
|
|
|
DAMON_OPS_PADDR,
|
|
|
|
NR_DAMON_OPS,
|
|
|
|
};
|
|
|
|
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
struct damon_ctx;
|
|
|
|
|
|
|
|
/**
|
2022-03-23 05:48:46 +08:00
|
|
|
* struct damon_operations - Monitoring operations for given use cases.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
*
|
2022-03-23 05:48:49 +08:00
|
|
|
* @id: Identifier of this operations set.
|
2022-03-23 05:48:46 +08:00
|
|
|
* @init: Initialize operations-related data structures.
|
|
|
|
* @update: Update operations-related data structures.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* @prepare_access_checks: Prepare next access check of target regions.
|
|
|
|
* @check_accesses: Check the accesses to target regions.
|
|
|
|
* @reset_aggregated: Reset aggregated accesses monitoring results.
|
2021-11-06 04:47:33 +08:00
|
|
|
* @get_scheme_score: Get the score of a region for a scheme.
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
* @apply_scheme: Apply a DAMON-based operation scheme.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* @target_valid: Determine if the target is valid.
|
|
|
|
* @cleanup: Clean up the context.
|
|
|
|
*
|
|
|
|
* DAMON can be extended for various address spaces and usages. For this,
|
2022-03-23 05:48:46 +08:00
|
|
|
* users should register the low level operations for their target address
|
|
|
|
* space and usecase via the &damon_ctx.ops. Then, the monitoring thread
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting
|
2022-03-23 05:48:46 +08:00
|
|
|
* the monitoring, @update after each &damon_ctx.ops_update_interval, and
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* @check_accesses, @target_valid and @prepare_access_checks after each
|
|
|
|
* &damon_ctx.sample_interval. Finally, @reset_aggregated is called after each
|
|
|
|
* &damon_ctx.aggr_interval.
|
|
|
|
*
|
2022-03-23 05:48:49 +08:00
|
|
|
* Each &struct damon_operations instance having valid @id can be registered
|
|
|
|
* via damon_register_ops() and selected by damon_select_ops() later.
|
2022-03-23 05:48:46 +08:00
|
|
|
* @init should initialize operations-related data structures. For example,
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* this could be used to construct proper monitoring target regions and link
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
* those to @damon_ctx.adaptive_targets.
|
2022-03-23 05:48:46 +08:00
|
|
|
* @update should update the operations-related data structures. For example,
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* this could be used to update monitoring target regions for current status.
|
|
|
|
* @prepare_access_checks should manipulate the monitoring regions to be
|
|
|
|
* prepared for the next access check.
|
|
|
|
* @check_accesses should check the accesses to each region that made after the
|
|
|
|
* last preparation and update the number of observed accesses of each region.
|
2021-09-08 10:56:36 +08:00
|
|
|
* It should also return max number of observed accesses that made as a result
|
|
|
|
* of its update. The value will be used for regions adjustment threshold.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* @reset_aggregated should reset the access monitoring results that aggregated
|
|
|
|
* by @check_accesses.
|
2021-11-06 04:47:33 +08:00
|
|
|
* @get_scheme_score should return the priority score of a region for a scheme
|
|
|
|
* as an integer in [0, &DAMOS_MAX_SCORE].
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
* @apply_scheme is called from @kdamond when a region for user provided
|
|
|
|
* DAMON-based operation scheme is found. It should apply the scheme's action
|
mm/damon/schemes: account scheme actions that successfully applied
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning".
To help online access pattern analysis and tuning of DAMON-based
Operation Schemes (DAMOS), DAMOS provides simple statistics for each
scheme. Introduction of DAMOS time/space quota further made the tuning
easier by making the risk management easier. However, that also made
understanding of the working schemes a little bit more difficult.
For an example, progress of a given scheme can now be throttled by not
only the aggressiveness of the target access pattern, but also the
time/space quotas. So, when a scheme is showing unexpectedly slow
progress, it's difficult to know by what the progress of the scheme is
throttled, with currently provided statistics.
This patchset extends the statistics to contain some metrics that can be
helpful for such online schemes analysis and tuning (patches 1-2),
exports those to users (patches 3 and 5), and add documents (patches 4
and 6).
This patch (of 6):
DAMON-based operation schemes (DAMOS) stats provide only the number and
the amount of regions that the action of the scheme has tried to be
applied. Because the action could be failed for some reasons, the
currently provided information is sometimes not useful or convenient
enough for schemes profiling and tuning. To improve this situation,
this commit extends the DAMOS stats to provide the number and the amount
of regions that the action has successfully applied.
Link: https://lkml.kernel.org/r/20211210150016.35349-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211210150016.35349-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 06:10:17 +08:00
|
|
|
* to the region and return bytes of the region that the action is successfully
|
|
|
|
* applied.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* @target_valid should check whether the target is still valid for the
|
|
|
|
* monitoring.
|
|
|
|
* @cleanup is called from @kdamond just before its termination.
|
|
|
|
*/
|
2022-03-23 05:48:46 +08:00
|
|
|
struct damon_operations {
|
2022-03-23 05:48:49 +08:00
|
|
|
enum damon_ops_id id;
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
void (*init)(struct damon_ctx *context);
|
|
|
|
void (*update)(struct damon_ctx *context);
|
|
|
|
void (*prepare_access_checks)(struct damon_ctx *context);
|
2021-09-08 10:56:36 +08:00
|
|
|
unsigned int (*check_accesses)(struct damon_ctx *context);
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
void (*reset_aggregated)(struct damon_ctx *context);
|
2021-11-06 04:47:33 +08:00
|
|
|
int (*get_scheme_score)(struct damon_ctx *context,
|
|
|
|
struct damon_target *t, struct damon_region *r,
|
|
|
|
struct damos *scheme);
|
mm/damon/schemes: account scheme actions that successfully applied
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning".
To help online access pattern analysis and tuning of DAMON-based
Operation Schemes (DAMOS), DAMOS provides simple statistics for each
scheme. Introduction of DAMOS time/space quota further made the tuning
easier by making the risk management easier. However, that also made
understanding of the working schemes a little bit more difficult.
For an example, progress of a given scheme can now be throttled by not
only the aggressiveness of the target access pattern, but also the
time/space quotas. So, when a scheme is showing unexpectedly slow
progress, it's difficult to know by what the progress of the scheme is
throttled, with currently provided statistics.
This patchset extends the statistics to contain some metrics that can be
helpful for such online schemes analysis and tuning (patches 1-2),
exports those to users (patches 3 and 5), and add documents (patches 4
and 6).
This patch (of 6):
DAMON-based operation schemes (DAMOS) stats provide only the number and
the amount of regions that the action of the scheme has tried to be
applied. Because the action could be failed for some reasons, the
currently provided information is sometimes not useful or convenient
enough for schemes profiling and tuning. To improve this situation,
this commit extends the DAMOS stats to provide the number and the amount
of regions that the action has successfully applied.
Link: https://lkml.kernel.org/r/20211210150016.35349-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211210150016.35349-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 06:10:17 +08:00
|
|
|
unsigned long (*apply_scheme)(struct damon_ctx *context,
|
|
|
|
struct damon_target *t, struct damon_region *r,
|
|
|
|
struct damos *scheme);
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
bool (*target_valid)(void *target);
|
|
|
|
void (*cleanup)(struct damon_ctx *context);
|
|
|
|
};
|
|
|
|
|
2021-11-06 04:46:04 +08:00
|
|
|
/**
|
|
|
|
* struct damon_callback - Monitoring events notification callbacks.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
*
|
|
|
|
* @before_start: Called before starting the monitoring.
|
mm/damon/core: add a new callback for watermarks checks
Patch series "mm/damon: Support online tuning".
Effects of DAMON and DAMON-based Operation Schemes highly depends on the
configurations. Wrong configurations could even result in unexpected
efficiency degradations. For finding a best configuration, repeating
incremental configuration changes and results measurements, in other
words, online tuning, could be helpful.
Nevertheless, DAMON kernel API supports only restrictive online tuning.
Worse yet, the sysfs-based DAMON user interface doesn't support online
tuning at all. DAMON_RECLAIM also doesn't support online tuning.
This patchset makes the DAMON kernel API, DAMON sysfs interface, and
DAMON_RECLAIM supports online tuning.
Sequence of patches
-------------------
First two patches enhance DAMON online tuning for kernel API users.
Specifically, patch 1 let kernel API users to be able to do DAMON online
tuning without a restriction, and patch 2 makes error handling easier.
Following seven patches (patches 3-9) refactor code for better readability
and easier reuse of code fragments that will be useful for online tuning
support.
Patch 10 introduces DAMON callback based user request handling structure
for DAMON sysfs interface, and patch 11 enables DAMON online tuning via
DAMON sysfs interface. Documentation patch (patch 12) for usage of it
follows.
Patch 13 enables online tuning of DAMON_RECLAIM and finally patch 14
documents the DAMON_RECLAIM online tuning usage.
This patch (of 14):
For updating input parameters for running DAMON contexts, DAMON kernel API
users can use the contexts' callbacks, as it is the safe place for context
internal data accesses. When the context has DAMON-based operation
schemes and all schemes are deactivated due to their watermarks, however,
DAMON does nothing but only watermarks checks. As a result, no callbacks
will be called back, and therefore the kernel API users cannot update the
input parameters including monitoring attributes, DAMON-based operation
schemes, and watermarks.
To let users easily update such DAMON input parameters in such a case,
this commit adds a new callback, 'after_wmarks_check()'. It will be
called after each watermarks check. Users can do the online input
parameters update in the callback even under the schemes deactivated case.
Link: https://lkml.kernel.org/r/20220429160606.127307-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-10 09:20:54 +08:00
|
|
|
* @after_wmarks_check: Called after each schemes' watermarks check.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* @after_sampling: Called after each sampling.
|
|
|
|
* @after_aggregation: Called after each aggregation.
|
|
|
|
* @before_terminate: Called before terminating the monitoring.
|
|
|
|
* @private: User private data.
|
|
|
|
*
|
|
|
|
* The monitoring thread (&damon_ctx.kdamond) calls @before_start and
|
|
|
|
* @before_terminate just before starting and finishing the monitoring,
|
|
|
|
* respectively. Therefore, those are good places for installing and cleaning
|
|
|
|
* @private.
|
|
|
|
*
|
mm/damon/core: add a new callback for watermarks checks
Patch series "mm/damon: Support online tuning".
Effects of DAMON and DAMON-based Operation Schemes highly depends on the
configurations. Wrong configurations could even result in unexpected
efficiency degradations. For finding a best configuration, repeating
incremental configuration changes and results measurements, in other
words, online tuning, could be helpful.
Nevertheless, DAMON kernel API supports only restrictive online tuning.
Worse yet, the sysfs-based DAMON user interface doesn't support online
tuning at all. DAMON_RECLAIM also doesn't support online tuning.
This patchset makes the DAMON kernel API, DAMON sysfs interface, and
DAMON_RECLAIM supports online tuning.
Sequence of patches
-------------------
First two patches enhance DAMON online tuning for kernel API users.
Specifically, patch 1 let kernel API users to be able to do DAMON online
tuning without a restriction, and patch 2 makes error handling easier.
Following seven patches (patches 3-9) refactor code for better readability
and easier reuse of code fragments that will be useful for online tuning
support.
Patch 10 introduces DAMON callback based user request handling structure
for DAMON sysfs interface, and patch 11 enables DAMON online tuning via
DAMON sysfs interface. Documentation patch (patch 12) for usage of it
follows.
Patch 13 enables online tuning of DAMON_RECLAIM and finally patch 14
documents the DAMON_RECLAIM online tuning usage.
This patch (of 14):
For updating input parameters for running DAMON contexts, DAMON kernel API
users can use the contexts' callbacks, as it is the safe place for context
internal data accesses. When the context has DAMON-based operation
schemes and all schemes are deactivated due to their watermarks, however,
DAMON does nothing but only watermarks checks. As a result, no callbacks
will be called back, and therefore the kernel API users cannot update the
input parameters including monitoring attributes, DAMON-based operation
schemes, and watermarks.
To let users easily update such DAMON input parameters in such a case,
this commit adds a new callback, 'after_wmarks_check()'. It will be
called after each watermarks check. Users can do the online input
parameters update in the callback even under the schemes deactivated case.
Link: https://lkml.kernel.org/r/20220429160606.127307-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-10 09:20:54 +08:00
|
|
|
* The monitoring thread calls @after_wmarks_check after each DAMON-based
|
|
|
|
* operation schemes' watermarks check. If users need to make changes to the
|
|
|
|
* attributes of the monitoring context while it's deactivated due to the
|
|
|
|
* watermarks, this is the good place to do.
|
|
|
|
*
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* The monitoring thread calls @after_sampling and @after_aggregation for each
|
|
|
|
* of the sampling intervals and aggregation intervals, respectively.
|
|
|
|
* Therefore, users can safely access the monitoring results without additional
|
|
|
|
* protection. For the reason, users are recommended to use these callback for
|
|
|
|
* the accesses to the results.
|
|
|
|
*
|
|
|
|
* If any callback returns non-zero, monitoring stops.
|
|
|
|
*/
|
|
|
|
struct damon_callback {
|
|
|
|
void *private;
|
|
|
|
|
|
|
|
int (*before_start)(struct damon_ctx *context);
|
mm/damon/core: add a new callback for watermarks checks
Patch series "mm/damon: Support online tuning".
Effects of DAMON and DAMON-based Operation Schemes highly depends on the
configurations. Wrong configurations could even result in unexpected
efficiency degradations. For finding a best configuration, repeating
incremental configuration changes and results measurements, in other
words, online tuning, could be helpful.
Nevertheless, DAMON kernel API supports only restrictive online tuning.
Worse yet, the sysfs-based DAMON user interface doesn't support online
tuning at all. DAMON_RECLAIM also doesn't support online tuning.
This patchset makes the DAMON kernel API, DAMON sysfs interface, and
DAMON_RECLAIM supports online tuning.
Sequence of patches
-------------------
First two patches enhance DAMON online tuning for kernel API users.
Specifically, patch 1 let kernel API users to be able to do DAMON online
tuning without a restriction, and patch 2 makes error handling easier.
Following seven patches (patches 3-9) refactor code for better readability
and easier reuse of code fragments that will be useful for online tuning
support.
Patch 10 introduces DAMON callback based user request handling structure
for DAMON sysfs interface, and patch 11 enables DAMON online tuning via
DAMON sysfs interface. Documentation patch (patch 12) for usage of it
follows.
Patch 13 enables online tuning of DAMON_RECLAIM and finally patch 14
documents the DAMON_RECLAIM online tuning usage.
This patch (of 14):
For updating input parameters for running DAMON contexts, DAMON kernel API
users can use the contexts' callbacks, as it is the safe place for context
internal data accesses. When the context has DAMON-based operation
schemes and all schemes are deactivated due to their watermarks, however,
DAMON does nothing but only watermarks checks. As a result, no callbacks
will be called back, and therefore the kernel API users cannot update the
input parameters including monitoring attributes, DAMON-based operation
schemes, and watermarks.
To let users easily update such DAMON input parameters in such a case,
this commit adds a new callback, 'after_wmarks_check()'. It will be
called after each watermarks check. Users can do the online input
parameters update in the callback even under the schemes deactivated case.
Link: https://lkml.kernel.org/r/20220429160606.127307-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-10 09:20:54 +08:00
|
|
|
int (*after_wmarks_check)(struct damon_ctx *context);
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
int (*after_sampling)(struct damon_ctx *context);
|
|
|
|
int (*after_aggregation)(struct damon_ctx *context);
|
2021-11-06 04:48:27 +08:00
|
|
|
void (*before_terminate)(struct damon_ctx *context);
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
/**
|
|
|
|
* struct damon_ctx - Represents a context for each monitoring. This is the
|
|
|
|
* main interface that allows users to set the attributes and get the results
|
|
|
|
* of the monitoring.
|
|
|
|
*
|
|
|
|
* @sample_interval: The time between access samplings.
|
|
|
|
* @aggr_interval: The time between monitor results aggregations.
|
2022-03-23 05:48:46 +08:00
|
|
|
* @ops_update_interval: The time between monitoring operations updates.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
*
|
|
|
|
* For each @sample_interval, DAMON checks whether each region is accessed or
|
|
|
|
* not. It aggregates and keeps the access information (number of accesses to
|
|
|
|
* each region) for @aggr_interval time. DAMON also checks whether the target
|
|
|
|
* memory regions need update (e.g., by ``mmap()`` calls from the application,
|
|
|
|
* in case of virtual memory monitoring) and applies the changes for each
|
2022-03-23 05:48:46 +08:00
|
|
|
* @ops_update_interval. All time intervals are in micro-seconds.
|
|
|
|
* Please refer to &struct damon_operations and &struct damon_callback for more
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* detail.
|
|
|
|
*
|
|
|
|
* @kdamond: Kernel thread who does the monitoring.
|
|
|
|
* @kdamond_stop: Notifies whether kdamond should stop.
|
|
|
|
* @kdamond_lock: Mutex for the synchronizations with @kdamond.
|
|
|
|
*
|
|
|
|
* For each monitoring context, one kernel thread for the monitoring is
|
|
|
|
* created. The pointer to the thread is stored in @kdamond.
|
|
|
|
*
|
|
|
|
* Once started, the monitoring thread runs until explicitly required to be
|
|
|
|
* terminated or every monitoring target is invalid. The validity of the
|
2022-03-23 05:48:46 +08:00
|
|
|
* targets is checked via the &damon_operations.target_valid of @ops. The
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* termination can also be explicitly requested by writing non-zero to
|
|
|
|
* @kdamond_stop. The thread sets @kdamond to NULL when it terminates.
|
|
|
|
* Therefore, users can know whether the monitoring is ongoing or terminated by
|
|
|
|
* reading @kdamond. Reads and writes to @kdamond and @kdamond_stop from
|
|
|
|
* outside of the monitoring thread must be protected by @kdamond_lock.
|
|
|
|
*
|
|
|
|
* Note that the monitoring thread protects only @kdamond and @kdamond_stop via
|
|
|
|
* @kdamond_lock. Accesses to other fields must be protected by themselves.
|
|
|
|
*
|
2022-03-23 05:48:46 +08:00
|
|
|
* @ops: Set of monitoring operations for given use cases.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
* @callback: Set of callbacks for monitoring events notifications.
|
|
|
|
*
|
2021-09-08 10:56:36 +08:00
|
|
|
* @min_nr_regions: The minimum number of adaptive monitoring regions.
|
|
|
|
* @max_nr_regions: The maximum number of adaptive monitoring regions.
|
|
|
|
* @adaptive_targets: Head of monitoring targets (&damon_target) list.
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
* @schemes: Head of schemes (&damos) list.
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
*/
|
|
|
|
struct damon_ctx {
|
|
|
|
unsigned long sample_interval;
|
|
|
|
unsigned long aggr_interval;
|
2022-03-23 05:48:46 +08:00
|
|
|
unsigned long ops_update_interval;
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
|
|
|
|
/* private: internal use only */
|
|
|
|
struct timespec64 last_aggregation;
|
2022-03-23 05:48:46 +08:00
|
|
|
struct timespec64 last_ops_update;
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
|
|
|
|
/* public: */
|
|
|
|
struct task_struct *kdamond;
|
|
|
|
struct mutex kdamond_lock;
|
|
|
|
|
2022-03-23 05:48:46 +08:00
|
|
|
struct damon_operations ops;
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
struct damon_callback callback;
|
|
|
|
|
2021-09-08 10:56:36 +08:00
|
|
|
unsigned long min_nr_regions;
|
|
|
|
unsigned long max_nr_regions;
|
|
|
|
struct list_head adaptive_targets;
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
struct list_head schemes;
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
};
|
|
|
|
|
2022-01-15 06:09:59 +08:00
|
|
|
static inline struct damon_region *damon_next_region(struct damon_region *r)
|
|
|
|
{
|
|
|
|
return container_of(r->list.next, struct damon_region, list);
|
|
|
|
}
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
|
2022-01-15 06:09:59 +08:00
|
|
|
static inline struct damon_region *damon_prev_region(struct damon_region *r)
|
|
|
|
{
|
|
|
|
return container_of(r->list.prev, struct damon_region, list);
|
|
|
|
}
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
|
2022-01-15 06:09:59 +08:00
|
|
|
static inline struct damon_region *damon_last_region(struct damon_target *t)
|
|
|
|
{
|
|
|
|
return list_last_entry(&t->regions_list, struct damon_region, list);
|
|
|
|
}
|
2021-11-06 04:47:20 +08:00
|
|
|
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
#define damon_for_each_region(r, t) \
|
|
|
|
list_for_each_entry(r, &t->regions_list, list)
|
|
|
|
|
|
|
|
#define damon_for_each_region_safe(r, next, t) \
|
|
|
|
list_for_each_entry_safe(r, next, &t->regions_list, list)
|
|
|
|
|
|
|
|
#define damon_for_each_target(t, ctx) \
|
2021-09-08 10:56:36 +08:00
|
|
|
list_for_each_entry(t, &(ctx)->adaptive_targets, list)
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
|
|
|
|
#define damon_for_each_target_safe(t, next, ctx) \
|
2021-09-08 10:56:36 +08:00
|
|
|
list_for_each_entry_safe(t, next, &(ctx)->adaptive_targets, list)
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
#define damon_for_each_scheme(s, ctx) \
|
|
|
|
list_for_each_entry(s, &(ctx)->schemes, list)
|
|
|
|
|
|
|
|
#define damon_for_each_scheme_safe(s, next, ctx) \
|
|
|
|
list_for_each_entry_safe(s, next, &(ctx)->schemes, list)
|
|
|
|
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
#ifdef CONFIG_DAMON
|
|
|
|
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
struct damon_region *damon_new_region(unsigned long start, unsigned long end);
|
2022-01-15 06:10:38 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Add a region between two other regions
|
|
|
|
*/
|
|
|
|
static inline void damon_insert_region(struct damon_region *r,
|
2021-09-08 10:56:36 +08:00
|
|
|
struct damon_region *prev, struct damon_region *next,
|
2022-01-15 06:10:38 +08:00
|
|
|
struct damon_target *t)
|
|
|
|
{
|
|
|
|
__list_add(&r->list, &prev->list, &next->list);
|
|
|
|
t->nr_regions++;
|
|
|
|
}
|
|
|
|
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
void damon_add_region(struct damon_region *r, struct damon_target *t);
|
2021-09-08 10:56:36 +08:00
|
|
|
void damon_destroy_region(struct damon_region *r, struct damon_target *t);
|
2022-05-10 09:20:55 +08:00
|
|
|
int damon_set_regions(struct damon_target *t, struct damon_addr_range *ranges,
|
|
|
|
unsigned int nr_ranges);
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
struct damos *damon_new_scheme(
|
|
|
|
unsigned long min_sz_region, unsigned long max_sz_region,
|
|
|
|
unsigned int min_nr_accesses, unsigned int max_nr_accesses,
|
|
|
|
unsigned int min_age_region, unsigned int max_age_region,
|
mm/damon/schemes: activate schemes based on a watermarks mechanism
DAMON-based operation schemes need to be manually turned on and off. In
some use cases, however, the condition for turning a scheme on and off
would depend on the system's situation. For example, schemes for
proactive pages reclamation would need to be turned on when some memory
pressure is detected, and turned off when the system has enough free
memory.
For easier control of schemes activation based on the system situation,
this introduces a watermarks-based mechanism. The client can describe
the watermark metric (e.g., amount of free memory in the system),
watermark check interval, and three watermarks, namely high, mid, and
low. If the scheme is deactivated, it only gets the metric and compare
that to the three watermarks for every check interval. If the metric is
higher than the high watermark, the scheme is deactivated. If the
metric is between the mid watermark and the low watermark, the scheme is
activated. If the metric is lower than the low watermark, the scheme is
deactivated again. This is to allow users fall back to traditional
page-granularity mechanisms.
Link: https://lkml.kernel.org/r/20211019150731.16699-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:47:47 +08:00
|
|
|
enum damos_action action, struct damos_quota *quota,
|
|
|
|
struct damos_watermarks *wmarks);
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
|
|
|
|
void damon_destroy_scheme(struct damos *s);
|
|
|
|
|
mm/damon: remove the target id concept
DAMON asks each monitoring target ('struct damon_target') to have one
'unsigned long' integer called 'id', which should be unique among the
targets of same monitoring context. Meaning of it is, however, totally up
to the monitoring primitives that registered to the monitoring context.
For example, the virtual address spaces monitoring primitives treats the
id as a 'struct pid' pointer.
This makes the code flexible, but ugly, not well-documented, and
type-unsafe[1]. Also, identification of each target can be done via its
index. For the reason, this commit removes the concept and uses clear
type definition. For now, only 'struct pid' pointer is used for the
virtual address spaces monitoring. If DAMON is extended in future so that
we need to put another identifier field in the struct, we will use a union
for such primitives-dependent fields and document which primitives are
using which type.
[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/
Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 05:48:40 +08:00
|
|
|
struct damon_target *damon_new_target(void);
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
void damon_add_target(struct damon_ctx *ctx, struct damon_target *t);
|
2021-11-06 04:48:07 +08:00
|
|
|
bool damon_targets_empty(struct damon_ctx *ctx);
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
void damon_free_target(struct damon_target *t);
|
|
|
|
void damon_destroy_target(struct damon_target *t);
|
2021-09-08 10:56:36 +08:00
|
|
|
unsigned int damon_nr_regions(struct damon_target *t);
|
mm/damon/core: implement region-based sampling
To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a
region. As long as the assumption (pages in a region have the same
access frequencies) is kept, only one page in the region is required to
be checked. Thus, for each ``sampling interval``,
1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.
Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON to use not only time-based sampling but also
space-based sampling.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. Next commit will address this problem.
Link: https://lkml.kernel.org/r/20210716081449.22187-3-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:32 +08:00
|
|
|
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
struct damon_ctx *damon_new_ctx(void);
|
|
|
|
void damon_destroy_ctx(struct damon_ctx *ctx);
|
|
|
|
int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
|
2022-03-23 05:48:46 +08:00
|
|
|
unsigned long aggr_int, unsigned long ops_upd_int,
|
2021-09-08 10:56:36 +08:00
|
|
|
unsigned long min_nr_reg, unsigned long max_nr_reg);
|
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
In many cases, users might use DAMON for simple data access aware memory
management optimizations such as applying an operation scheme to a
memory region of a specific size having a specific access frequency for
a specific time. For example, "page out a memory region larger than 100
MiB but having a low access frequency more than 10 minutes", or "Use THP
for a memory region larger than 2 MiB having a high access frequency for
more than 2 seconds".
Most simple form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Or, developing
a daemon constructed with two modules (one for access monitoring and the
other for applying memory management actions via mlock(), madvise(),
sysctl, etc) is imaginable.
To avoid users spending their time for implementation of such simple
data access monitoring-based operation schemes, this makes DAMON to
handle such schemes directly. With this change, users can simply
specify their desired schemes to DAMON. Then, DAMON will automatically
apply the schemes to the user-specified target processes.
Each of the schemes is composed with conditions for filtering of the
target memory regions and desired memory management action for the
target. Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are size of memory region, number of accesses
to the region monitored by DAMON, and the age of the region. The age of
region is incremented periodically but reset when its addresses or
access frequency has significantly changed or the action of a scheme was
applied. For the action, current implementation supports a few of
madvise()-like hints, ``WILLNEED``, ``COLD``, ``PAGEOUT``, ``HUGEPAGE``,
and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and application of the
actions to a monitoring target region is dependent to the type of the
target address space, the application code should be implemented by each
primitives and registered to the framework. Note that this only
implements the framework part. Following commit will implement the
action applications for virtual address spaces primitives.
Link: https://lkml.kernel.org/r/20211001125604.29660-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 04:46:22 +08:00
|
|
|
int damon_set_schemes(struct damon_ctx *ctx,
|
|
|
|
struct damos **schemes, ssize_t nr_schemes);
|
mm/damon: implement a debugfs-based user space interface
DAMON is designed to be used by kernel space code such as the memory
management subsystems, and therefore it provides only kernel space API.
That said, letting the user space control DAMON could provide some
benefits to them. For example, it will allow user space to analyze their
specific workloads and make their own special optimizations.
For such cases, this commit implements a simple DAMON application kernel
module, namely 'damon-dbgfs', which merely wraps the DAMON api and exports
those to the user space via the debugfs.
'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and
``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``.
Attributes
----------
Users can read and write the ``sampling interval``, ``aggregation
interval``, ``regions update interval``, and min/max number of monitoring
target regions by reading from and writing to the ``attrs`` file. For
example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10,
1000 and check it again::
# cd <debugfs>/damon
# echo 5000 100000 1000000 10 1000 > attrs
# cat attrs
5000 100000 1000000 10 1000
Target IDs
----------
Some types of address spaces supports multiple monitoring target. For
example, the virtual memory address spaces monitoring can have multiple
processes as the monitoring targets. Users can set the targets by writing
relevant id values of the targets to, and get the ids of the current
targets by reading from the ``target_ids`` file. In case of the virtual
address spaces monitoring, the values should be pids of the monitoring
target processes. For example, below commands set processes having pids
42 and 4242 as the monitoring targets and check it again::
# cd <debugfs>/damon
# echo 42 4242 > target_ids
# cat target_ids
42 4242
Note that setting the target ids doesn't start the monitoring.
Turning On/Off
--------------
Setting the files as described above doesn't incur effect unless you
explicitly start the monitoring. You can start, stop, and check the
current status of the monitoring by writing to and reading from the
``monitor_on`` file. Writing ``on`` to the file starts the monitoring of
the targets with the attributes. Writing ``off`` to the file stops those.
DAMON also stops if every targets are invalidated (in case of the virtual
memory monitoring, target processes are invalidated when terminated).
Below example commands turn on, off, and check the status of DAMON::
# cd <debugfs>/damon
# echo on > monitor_on
# echo off > monitor_on
# cat monitor_on
off
Please note that you cannot write to the above-mentioned debugfs files
while the monitoring is turned on. If you write to the files while DAMON
is running, an error code such as ``-EBUSY`` will be returned.
[akpm@linux-foundation.org: remove unneeded "alloc failed" printks]
[akpm@linux-foundation.org: replace macro with static inline]
Link: https://lkml.kernel.org/r/20210716081449.22187-8-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:53 +08:00
|
|
|
int damon_nr_running_ctxs(void);
|
2022-05-10 09:20:51 +08:00
|
|
|
bool damon_is_registered_ops(enum damon_ops_id id);
|
2022-03-23 05:48:49 +08:00
|
|
|
int damon_register_ops(struct damon_operations *ops);
|
|
|
|
int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id);
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
|
mm/damon/core: allow non-exclusive DAMON start/stop
Patch series "Introduce DAMON sysfs interface", v3.
Introduction
============
DAMON's debugfs-based user interface (DAMON_DBGFS) served very well, so
far. However, it unnecessarily depends on debugfs, while DAMON is not
aimed to be used for only debugging. Also, the interface receives
multiple values via one file. For example, schemes file receives 18
values. As a result, it is inefficient, hard to be used, and difficult to
be extended. Especially, keeping backward compatibility of user space
tools is getting only challenging. It would be better to implement
another reliable and flexible interface and deprecate DAMON_DBGFS in long
term.
For the reason, this patchset introduces a sysfs-based new user interface
of DAMON. The idea of the new interface is, using directory hierarchies
and having one dedicated file for each value. For a short example, users
can do the virtual address monitoring via the interface as below:
# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
# echo on > kdamonds/0/state
A brief representation of the files hierarchy of DAMON sysfs interface is
as below. Childs are represented with indentation, directories are having
'/' suffix, and files in each directory are separated by comma.
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Detailed usage of the files will be described in the final Documentation
patch of this patchset.
Main Difference Between DAMON_DBGFS and DAMON_SYSFS
---------------------------------------------------
At the moment, DAMON_DBGFS and DAMON_SYSFS provides same features. One
important difference between them is their exclusiveness. DAMON_DBGFS
works in an exclusive manner, so that no DAMON worker thread (kdamond) in
the system can run concurrently and interfere somehow. For the reason,
DAMON_DBGFS asks users to construct all monitoring contexts and start them
at once. It's not a big problem but makes the operation a little bit
complex and unflexible.
For more flexible usage, DAMON_SYSFS moves the responsibility of
preventing any possible interference to the admins and work in a
non-exclusive manner. That is, users can configure and start contexts one
by one. Note that DAMON respects both exclusive groups and non-exclusive
groups of contexts, in a manner similar to that of reader-writer locks.
That is, if any exclusive monitoring contexts (e.g., contexts that started
via DAMON_DBGFS) are running, DAMON_SYSFS does not start new contexts, and
vice versa.
Future Plan of DAMON_DBGFS Deprecation
======================================
Once this patchset is merged, DAMON_DBGFS development will be frozen.
That is, we will maintain it to work as is now so that no users will be
break. But, it will not be extended to provide any new feature of DAMON.
The support will be continued only until next LTS release. After that, we
will drop DAMON_DBGFS.
User-space Tooling Compatibility
--------------------------------
As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space
tooling can move to DAMON_SYSFS. As we will continue supporting
DAMON_DBGFS until next LTS kernel release, user space tools would have
enough time to move to DAMON_SYSFS.
The official user space tool, damo[1], is already supporting both
DAMON_SYSFS and DAMON_DBGFS. Both correctness tests[2] and performance
tests[3] of DAMON using DAMON_SYSFS also passed.
[1] https://github.com/awslabs/damo
[2] https://github.com/awslabs/damon-tests/tree/master/corr
[3] https://github.com/awslabs/damon-tests/tree/master/perf
Sequence of Patches
===================
First two patches (patches 1-2) make core changes for DAMON_SYSFS. The
first one (patch 1) allows non-exclusive DAMON contexts so that
DAMON_SYSFS can work in non-exclusive mode, while the second one (patch 2)
adds size of DAMON enum types so that DAMON API users can safely iterate
the enums.
Third patch (patch 3) implements basic sysfs stub for virtual address
spaces monitoring. Note that this implements only sysfs files and DAMON
is not linked. Fourth patch (patch 4) links the DAMON_SYSFS to DAMON so
that users can control DAMON using the sysfs files.
Following six patches (patches 5-10) implements other DAMON features that
DAMON_DBGFS supports one by one (physical address space monitoring,
DAMON-based operation schemes, schemes quotas, schemes prioritization
weights, schemes watermarks, and schemes stats).
Following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and the
final one (patch 12) documents DAMON_SYSFS.
This patch (of 13):
To avoid interference between DAMON contexts monitoring overlapping memory
regions, damon_start() works in an exclusive manner. That is,
damon_start() does nothing bug fails if any context that started by
another instance of the function is still running. This makes its usage a
little bit restrictive. However, admins could aware each DAMON usage and
address such interferences on their own in some cases.
This commit hence implements non-exclusive mode of the function and allows
the callers to select the mode. Note that the exclusive groups and
non-exclusive groups of contexts will respect each other in a manner
similar to that of reader-writer locks. Therefore, this commit will not
cause any behavioral change to the exclusive groups.
Link: https://lkml.kernel.org/r/20220228081314.5770-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220228081314.5770-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 05:49:21 +08:00
|
|
|
int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive);
|
mm: introduce Data Access MONitor (DAMON)
Patch series "Introduce Data Access MONitor (DAMON)", v34.
Introduction
============
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it
- accurate (The monitored information is useful for DRAM level memory
management. It might not appropriate for Cache-level accuracy,
though.),
- light-weight (The monitoring overhead is low enough to be applied
online while making no impact on the performance of the target
workloads.), and
- scalable (the upper-bound of the instrumentation overhead is
controllable regardless of the size of target workloads.).
Using this framework, therefore, several memory management mechanisms such
as reclamation and THP can be optimized to aware real data access
patterns. Experimental access pattern aware memory management
optimization works that incurring high instrumentation overhead will be
able to have another try.
Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
The userspace tool[1] is available, released under GPLv2, and actively
being maintained. I am also planning to implement another basic user
interface in perf[2]. Also, the basic test suite for DAMON is available
under GPLv2[3].
[1] https://github.com/awslabs/damo
[2] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[3] https://github.com/awslabs/damon-tests
Long-term Plan
--------------
DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space and
implement user space library / tools. Below shows the layers and
components for the project.
---------------------------------------------------------------------------
Primitives: PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework: DAMON
Features: DAMOS, virtual addr, physical addr, ...
Applications: DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^^^ KERNEL SPACE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Raw Interface: debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvvvvvvvvvvvvvvvvvvvvvv USER SPACE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Library: (libdamon), ...
Tools: DAMO, (perf), ...
---------------------------------------------------------------------------
The components in parentheses or marked as '...' are not implemented yet
but in the future plan. IOW, those are the TODO tasks of DAOS project.
For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjpark@amazon.com/
Evaluations
===========
We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that v24 DAMON patchset is applied.
DAMON is lightweight. It increases system memory usage by 0.39% and slows
target workloads down by 1.16%.
DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of residential sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.
Please refer to the official document[1] or "Documentation/admin-guide/mm:
Add a document for DAMON" patch in this patchset for detailed evaluation
setup and results.
[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html
Real-world User Story
=====================
In summary, DAMON has used on production systems and proved its usefulness.
DAMON as a profiler
-------------------
We analyzed characteristics of a large scale production systems of our
customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs. From
this, we were able to find interesting things below.
There were obviously different access pattern under idle workload and
active workload. Under the idle workload, it accessed large memory
regions with low frequency, while the active workload accessed small
memory regions with high freuqnecy.
DAMON found a 7GB memory region that showing obviously high access
frequency under the active workload. We believe this is the
performance-effective working set and need to be protected.
There was a 4KB memory region that showing highest access frequency under
not only active but also idle workloads. We think this must be a hottest
code section like thing that should never be paged out.
For this analysis, DAMON used only 0.3-1% of single CPU time. Because we
used recording-based analysis, it consumed about 3-12 MB of disk space per
20 minutes. This is only small amount of disk space, but we can further
reduce the disk usage by using non-recording-based DAMON features. I'd
like to argue that only DAMON can do such detailed analysis (finding 4KB
highest region in 70GB memory) with the light overhead.
DAMON as a system optimization tool
-----------------------------------
We also found below potential performance problems on the systems and made
DAMON-based solutions.
The system doesn't want to make the workload suffer from the page
reclamation and thus it utilizes enough DRAM but no swap device. However,
we found the system is actively reclaiming file-backed pages, because the
system has intensive file IO. The file IO turned out to be not
performance critical for the workload, but the customer wanted to ensure
performance critical file-backed pages like code section to not mistakenly
be evicted.
Using direct IO should or `mlock()` would be a straightforward solution,
but modifying the user space code is not easy for the customer.
Alternatively, we could use DAMON-based operation scheme[1]. By using it,
we can ask DAMON to track access frequency of each region and make
'process_madvise(MADV_WILLNEED)[2]' call for regions having specific size
and access frequency for a time interval.
We also found the system is having high number of TLB misses. We tried
'always' THP enabled policy and it greatly reduced TLB misses, but the
page reclamation also been more frequent due to the THP internal
fragmentation caused memory bloat. We could try another DAMON-based
operation scheme that applies 'MADV_HUGEPAGE' to memory regions having
>=2MB size and high access frequency, while applying 'MADV_NOHUGEPAGE' to
regions having <2MB size and low access frequency.
We do not own the systems so we only reported the analysis results and
possible optimization solutions to the customers. The customers satisfied
about the analysis results and promised to try the optimization guides.
[1] https://lore.kernel.org/linux-mm/20201006123931.5847-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-api/20200622192900.22757-4-minchan@kernel.org/
Comparison with Idle Page Tracking
==================================
Idle Page Tracking allows users to set and read idleness of pages using a
bitmap file which represents each page with each bit of the file. One
recommended usage of it is working set size detection. Users can do that
by
1. find PFN of each page for workloads in interest,
2. set all the pages as idle by doing writes to the bitmap file,
3. wait until the workload accesses its working set, and
4. read the idleness of the pages again and count pages became not idle.
NOTE: While Idle Page Tracking is for user space users, DAMON is primarily
designed for kernel subsystems though it can easily exposed to the user
space. Hence, this section only assumes such user space use of DAMON.
For what use cases Idle Page Tracking would be better?
------------------------------------------------------
1. Flexible usecases other than hotness monitoring.
Because Idle Page Tracking allows users to control the primitive (Page
idleness) by themselves, Idle Page Tracking users can do anything they
want. Meanwhile, DAMON is primarily designed to monitor the hotness of
each memory region. For this, DAMON asks users to provide sampling
interval and aggregation interval. For the reason, there could be some
use case that using Idle Page Tracking is simpler.
2. Physical memory monitoring.
Idle Page Tracking receives PFN range as input, so natively supports
physical memory monitoring.
DAMON is designed to be extensible for multiple address spaces and use
cases by implementing and using primitives for the given use case.
Therefore, by theory, DAMON has no limitation in the type of target
address space as long as primitives for the given address space exists.
However, the default primitives introduced by this patchset supports only
virtual address spaces.
Therefore, for physical memory monitoring, you should implement your own
primitives and use it, or simply use Idle Page Tracking.
Nonetheless, RFC patchsets[1] for the physical memory address space
primitives is already available. It also supports user memory same to
Idle Page Tracking.
[1] https://lore.kernel.org/linux-mm/20200831104730.28970-1-sjpark@amazon.com/
For what use cases DAMON is better?
-----------------------------------
1. Hotness Monitoring.
Idle Page Tracking let users know only if a page frame is accessed or not.
For hotness check, the user should write more code and use more memory.
DAMON do that by itself.
2. Low Monitoring Overhead
DAMON receives user's monitoring request with one step and then provide
the results. So, roughly speaking, DAMON require only O(1) user/kernel
context switches.
In case of Idle Page Tracking, however, because the interface receives
contiguous page frames, the number of user/kernel context switches
increases as the monitoring target becomes complex and huge. As a result,
the context switch overhead could be not negligible.
Moreover, DAMON is born to handle with the monitoring overhead. Because
the core mechanism is pure logical, Idle Page Tracking users might be able
to implement the mechanism on their own, but it would be time consuming
and the user/kernel context switching will still more frequent than that
of DAMON. Also, the kernel subsystems cannot use the logic in this case.
3. Page granularity working set size detection.
Until v22 of this patchset, this was categorized as the thing Idle Page
Tracking could do better, because DAMON basically maintains additional
metadata for each of the monitoring target regions. So, in the page
granularity working set size detection use case, DAMON would incur (number
of monitoring target pages * size of metadata) memory overhead. Size of
the single metadata item is about 54 bytes, so assuming 4KB pages, about
1.3% of monitoring target pages will be additionally used.
All essential metadata for Idle Page Tracking are embedded in 'struct
page' and page table entries. Therefore, in this use case, only one
counter variable for working set size accounting is required if Idle Page
Tracking is used.
There are more details to consider, but roughly speaking, this is true in
most cases.
However, the situation changed from v23. Now DAMON supports arbitrary
types of monitoring targets, which don't use the metadata. Using that,
DAMON can do the working set size detection with no additional space
overhead but less user-kernel context switch. A first draft for the
implementation of monitoring primitives for this usage is available in a
DAMON development tree[1]. An RFC patchset for it based on this patchset
will also be available soon.
Since v24, the arbitrary type support is dropped from this patchset
because this patchset doesn't introduce real use of the type. You can
still get it from the DAMON development tree[2], though.
[1] https://github.com/sjp38/linux/tree/damon/pgidle_hack
[2] https://github.com/sjp38/linux/tree/damon/master
4. More future usecases
While Idle Page Tracking has tight coupling with base primitives (PG_Idle
and page table Accessed bits), DAMON is designed to be extensible for many
use cases and address spaces. If you need some special address type or
want to use special h/w access check primitives, you can write your own
primitives for that and configure DAMON to use those. Therefore, if your
use case could be changed a lot in future, using DAMON could be better.
Can I use both Idle Page Tracking and DAMON?
--------------------------------------------
Yes, though using them concurrently for overlapping memory regions could
result in interference to each other. Nevertheless, such use case would
be rare or makes no sense at all. Even in the case, the noise would bot
be really significant. So, you can choose whatever you want depending on
the characteristics of your use cases.
More Information
================
We prepared a showcase web site[1] that you can get more information.
There are
- the official documentations[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].
[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damon
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.png.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.png.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html
Baseline and Complete Git Trees
===============================
The patches are based on the latest -mm tree, specifically
v5.14-rc1-mmots-2021-07-15-18-47 of https://github.com/hnaz/linux-mm. You can
also clone the complete git tree:
$ git clone git://github.com/sjp38/linux -b damon/patches/v34
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon/patches/v34
Development Trees
-----------------
There are a couple of trees for entire DAMON patchset series and features
for future release.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test DAMON but using LTS kernels, there are another
couple of trees based on two latest LTS kernels respectively and
containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Amazon Linux Kernel Trees
-------------------------
DAMON is also merged in two public Amazon Linux kernel trees that based on
v5.4.y[1] and v5.10.y[2].
[1] https://github.com/amazonlinux/linux/tree/amazon-5.4.y/master/mm/damon
[2] https://github.com/amazonlinux/linux/tree/amazon-5.10.y/master/mm/damon
Git Tree for Diff of Patches
============================
For easy review of diff between different versions of each patch, I
prepared a git tree containing all versions of the DAMON patchset series:
https://github.com/sjp38/damon-patches
You can clone it and use 'diff' for easy review of changes between
different versions of the patchset. For example:
$ git clone https://github.com/sjp38/damon-patches && cd damon-patches
$ diff -u damon/v33 damon/v34
Sequence Of Patches
===================
First three patches implement the core logics of DAMON. The 1st patch
introduces basic sampling based hotness monitoring for arbitrary types of
targets. Following two patches implement the core mechanisms for control
of overhead and accuracy, namely regions based sampling (patch 2) and
adaptive regions adjustment (patch 3).
Now the essential parts of DAMON is complete, but it cannot work unless
someone provides monitoring primitives for a specific use case. The
following two patches make it just work for virtual address spaces
monitoring. The 4th patch makes 'PG_idle' can be used by DAMON and the
5th patch implements the virtual memory address space specific monitoring
primitives using page table Accessed bits and the 'PG_idle' page flag.
Now DAMON just works for virtual address space monitoring via the kernel
space api. To let the user space users can use DAMON, following four
patches add interfaces for them. The 6th patch adds a tracepoint for
monitoring results. The 7th patch implements a DAMON application kernel
module, namely damon-dbgfs, that simply wraps DAMON and exposes DAMON
interface to the user space via the debugfs interface. The 8th patch
further exports pid of monitoring thread (kdamond) to user space for
easier cpu usage accounting, and the 9th patch makes the debugfs interface
to support multiple contexts.
Three patches for maintainability follows. The 10th patch adds
documentations for both the user space and the kernel space. The 11th
patch provides unit tests (based on the kunit) while the 12th patch adds
user space tests (based on the kselftest).
Finally, the last patch (13th) updates the MAINTAINERS file.
This patch (of 13):
DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it
- accurate (the monitoring output is useful enough for DRAM level
performance-centric memory management; It might be inappropriate for
CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
applied online), and
- scalable (the upper-bound of the overhead is in constant range
regardless of the size of target workloads).
Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that incurring high
access monitoring overhead could again be implemented on top of this.
Due to its simple and flexible interface, providing user space interface
would be also easy. Then, user space users who have some special
workloads can write personalized applications for better understanding and
optimizations of their workloads and systems.
===
Nevertheless, this commit is defining and implementing only basic access
check part without the overhead-accuracy handling core logic. The basic
access check is as below.
The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results. In other words, counts the number of the accesses
to each region. After each ``aggregation interval`` passes, DAMON calls
callback functions that previously registered by users so that users can
read the aggregated results and then clears the results. This can be
described in below simple pseudo-code::
init()
while monitoring_on:
for page in monitoring_target:
if accessed(page):
nr_accesses[page] += 1
if time() % aggregation_interval == 0:
for callback in user_registered_callbacks:
callback(monitoring_target, nr_accesses)
for page in monitoring_target:
nr_accesses[page] = 0
if time() % update_interval == 0:
update()
sleep(sampling interval)
The target regions constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target regions
could be dynamically changed (e.g., mmap() or memory hotplug). The
monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
The basic monitoring primitives for actual access check and dynamic target
regions construction aren't in the core part of DAMON. Instead, it allows
users to implement their own primitives that are optimized for their use
case and configure DAMON to use those. In other words, users cannot use
current version of DAMON without some additional works.
Following commits will implement the core mechanisms for the
overhead-accuracy control and default primitives implementations.
Link: https://lkml.kernel.org/r/20210716081449.22187-1-sj38.park@gmail.com
Link: https://lkml.kernel.org/r/20210716081449.22187-2-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Leonard Foerster <foersleo@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Marco Elver <elver@google.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 10:56:28 +08:00
|
|
|
int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);
|
|
|
|
|
|
|
|
#endif /* CONFIG_DAMON */
|
|
|
|
|
|
|
|
#endif /* _DAMON_H */
|