266 lines
13 KiB
ReStructuredText
266 lines
13 KiB
ReStructuredText
========
|
|
zsmalloc
|
|
========
|
|
|
|
This allocator is designed for use with zram. Thus, the allocator is
|
|
supposed to work well under low memory conditions. In particular, it
|
|
never attempts higher order page allocation which is very likely to
|
|
fail under memory pressure. On the other hand, if we just use single
|
|
(0-order) pages, it would suffer from very high fragmentation --
|
|
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
|
|
This was one of the major issues with its predecessor (xvmalloc).
|
|
|
|
To overcome these issues, zsmalloc allocates a bunch of 0-order pages
|
|
and links them together using various 'struct page' fields. These linked
|
|
pages act as a single higher-order page i.e. an object can span 0-order
|
|
page boundaries. The code refers to these linked pages as a single entity
|
|
called zspage.
|
|
|
|
For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
|
|
since this satisfies the requirements of all its current users (in the
|
|
worst case, page is incompressible and is thus stored "as-is" i.e. in
|
|
uncompressed form). For allocation requests larger than this size, failure
|
|
is returned (see zs_malloc).
|
|
|
|
Additionally, zs_malloc() does not return a dereferenceable pointer.
|
|
Instead, it returns an opaque handle (unsigned long) which encodes actual
|
|
location of the allocated object. The reason for this indirection is that
|
|
zsmalloc does not keep zspages permanently mapped since that would cause
|
|
issues on 32-bit systems where the VA region for kernel space mappings
|
|
is very small. So, before using the allocating memory, the object has to
|
|
be mapped using zs_map_object() to get a usable pointer and subsequently
|
|
unmapped using zs_unmap_object().
|
|
|
|
stat
|
|
====
|
|
|
|
With CONFIG_ZSMALLOC_STAT, we could see zsmalloc internal information via
|
|
``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::
|
|
|
|
# cat /sys/kernel/debug/zsmalloc/zram0/classes
|
|
|
|
class size 10% 20% 30% 40% 50% 60% 70% 80% 90% 99% 100% obj_allocated obj_used pages_used pages_per_zspage freeable
|
|
...
|
|
...
|
|
30 512 0 12 4 1 0 1 0 0 1 0 414 3464 3346 433 1 14
|
|
31 528 2 7 2 2 1 0 1 0 0 2 117 4154 3793 536 4 44
|
|
32 544 6 3 4 1 2 1 0 0 0 1 260 4170 3965 556 2 26
|
|
...
|
|
...
|
|
|
|
|
|
class
|
|
index
|
|
size
|
|
object size zspage stores
|
|
10%
|
|
the number of zspages with usage ratio less than 10% (see below)
|
|
20%
|
|
the number of zspages with usage ratio between 10% and 20%
|
|
30%
|
|
the number of zspages with usage ratio between 20% and 30%
|
|
40%
|
|
the number of zspages with usage ratio between 30% and 40%
|
|
50%
|
|
the number of zspages with usage ratio between 40% and 50%
|
|
60%
|
|
the number of zspages with usage ratio between 50% and 60%
|
|
70%
|
|
the number of zspages with usage ratio between 60% and 70%
|
|
80%
|
|
the number of zspages with usage ratio between 70% and 80%
|
|
90%
|
|
the number of zspages with usage ratio between 80% and 90%
|
|
99%
|
|
the number of zspages with usage ratio between 90% and 99%
|
|
100%
|
|
the number of zspages with usage ratio 100%
|
|
obj_allocated
|
|
the number of objects allocated
|
|
obj_used
|
|
the number of objects allocated to the user
|
|
pages_used
|
|
the number of pages allocated for the class
|
|
pages_per_zspage
|
|
the number of 0-order pages to make a zspage
|
|
freeable
|
|
the approximate number of pages class compaction can free
|
|
|
|
Each zspage maintains inuse counter which keeps track of the number of
|
|
objects stored in the zspage. The inuse counter determines the zspage's
|
|
"fullness group" which is calculated as the ratio of the "inuse" objects to
|
|
the total number of objects the zspage can hold (objs_per_zspage). The
|
|
closer the inuse counter is to objs_per_zspage, the better.
|
|
|
|
Internals
|
|
=========
|
|
|
|
zsmalloc has 255 size classes, each of which can hold a number of zspages.
|
|
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
|
|
The optimal zspage chain size for each size class is calculated during the
|
|
creation of the zsmalloc pool (see calculate_zspage_chain_size()).
|
|
|
|
As an optimization, zsmalloc merges size classes that have similar
|
|
characteristics in terms of the number of pages per zspage and the number
|
|
of objects that each zspage can store.
|
|
|
|
For instance, consider the following size classes:::
|
|
|
|
class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
|
|
...
|
|
94 1536 0 .... 0 0 0 0 3 0
|
|
100 1632 0 .... 0 0 0 0 2 0
|
|
...
|
|
|
|
|
|
Size classes #95-99 are merged with size class #100. This means that when we
|
|
need to store an object of size, say, 1568 bytes, we end up using size class
|
|
#100 instead of size class #96. Size class #100 is meant for objects of size
|
|
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.
|
|
|
|
Size class #100 consists of zspages with 2 physical pages each, which can
|
|
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
|
|
end up allocating three zspages, or 6 physical pages.
|
|
|
|
However, if we take a closer look at size class #96 (which is meant for
|
|
objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
|
|
find that the most optimal zspage configuration for this class is a chain
|
|
of 5 physical pages:::
|
|
|
|
pages per zspage wasted bytes used%
|
|
1 960 76
|
|
2 352 95
|
|
3 1312 89
|
|
4 704 95
|
|
5 96 99
|
|
|
|
This means that a class #96 configuration with 5 physical pages can store 13
|
|
objects of size 1568 in a single zspage, using a total of 5 physical pages.
|
|
This is more efficient than the class #100 configuration, which would use 6
|
|
physical pages to store the same number of objects.
|
|
|
|
As the zspage chain size for class #96 increases, its key characteristics
|
|
such as pages per-zspage and objects per-zspage also change. This leads to
|
|
dewer class mergers, resulting in a more compact grouping of classes, which
|
|
reduces memory wastage.
|
|
|
|
Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`:::
|
|
|
|
class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
|
|
|
|
...
|
|
202 3264 0 .. 0 0 0 0 4 0
|
|
254 4096 0 .. 0 0 0 0 1 0
|
|
...
|
|
|
|
Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
|
|
per zspage. Any object larger than 3264 bytes is considered huge and belongs
|
|
to size class #254, which stores each object in its own physical page (objects
|
|
in huge classes do not share pages).
|
|
|
|
Increasing the size of the chain of zspages also results in a higher watermark
|
|
for the huge size class and fewer huge classes overall. This allows for more
|
|
efficient storage of large objects.
|
|
|
|
For zspage chain size of 8, huge class watermark becomes 3632 bytes:::
|
|
|
|
class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
|
|
|
|
...
|
|
202 3264 0 .. 0 0 0 0 4 0
|
|
211 3408 0 .. 0 0 0 0 5 0
|
|
217 3504 0 .. 0 0 0 0 6 0
|
|
222 3584 0 .. 0 0 0 0 7 0
|
|
225 3632 0 .. 0 0 0 0 8 0
|
|
254 4096 0 .. 0 0 0 0 1 0
|
|
...
|
|
|
|
For zspage chain size of 16, huge class watermark becomes 3840 bytes:::
|
|
|
|
class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
|
|
|
|
...
|
|
202 3264 0 .. 0 0 0 0 4 0
|
|
206 3328 0 .. 0 0 0 0 13 0
|
|
207 3344 0 .. 0 0 0 0 9 0
|
|
208 3360 0 .. 0 0 0 0 14 0
|
|
211 3408 0 .. 0 0 0 0 5 0
|
|
212 3424 0 .. 0 0 0 0 16 0
|
|
214 3456 0 .. 0 0 0 0 11 0
|
|
217 3504 0 .. 0 0 0 0 6 0
|
|
219 3536 0 .. 0 0 0 0 13 0
|
|
222 3584 0 .. 0 0 0 0 7 0
|
|
223 3600 0 .. 0 0 0 0 15 0
|
|
225 3632 0 .. 0 0 0 0 8 0
|
|
228 3680 0 .. 0 0 0 0 9 0
|
|
230 3712 0 .. 0 0 0 0 10 0
|
|
232 3744 0 .. 0 0 0 0 11 0
|
|
234 3776 0 .. 0 0 0 0 12 0
|
|
235 3792 0 .. 0 0 0 0 13 0
|
|
236 3808 0 .. 0 0 0 0 14 0
|
|
238 3840 0 .. 0 0 0 0 15 0
|
|
254 4096 0 .. 0 0 0 0 1 0
|
|
...
|
|
|
|
Overall the combined zspage chain size effect on zsmalloc pool configuration:::
|
|
|
|
pages per zspage number of size classes (clusters) huge size class watermark
|
|
4 69 3264
|
|
5 86 3408
|
|
6 93 3504
|
|
7 112 3584
|
|
8 123 3632
|
|
9 140 3680
|
|
10 143 3712
|
|
11 159 3744
|
|
12 164 3776
|
|
13 180 3792
|
|
14 183 3808
|
|
15 188 3840
|
|
16 191 3840
|
|
|
|
|
|
A synthetic test
|
|
----------------
|
|
|
|
zram as a build artifacts storage (Linux kernel compilation).
|
|
|
|
* `CONFIG_ZSMALLOC_CHAIN_SIZE=4`
|
|
|
|
zsmalloc classes stats:::
|
|
|
|
class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
|
|
|
|
...
|
|
Total 13 .. 51 413836 412973 159955 3
|
|
|
|
zram mm_stat:::
|
|
|
|
1691783168 628083717 655175680 0 655175680 60 0 34048 34049
|
|
|
|
|
|
* `CONFIG_ZSMALLOC_CHAIN_SIZE=8`
|
|
|
|
zsmalloc classes stats:::
|
|
|
|
class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
|
|
|
|
...
|
|
Total 18 .. 87 414852 412978 156666 0
|
|
|
|
zram mm_stat:::
|
|
|
|
1691803648 627793930 641703936 0 641703936 60 0 33591 33591
|
|
|
|
Using larger zspage chains may result in using fewer physical pages, as seen
|
|
in the example where the number of physical pages used decreased from 159955
|
|
to 156666, at the same time maximum zsmalloc pool memory usage went down from
|
|
655175680 to 641703936 bytes.
|
|
|
|
However, this advantage may be offset by the potential for increased system
|
|
memory pressure (as some zspages have larger chain sizes) in cases where there
|
|
is heavy internal fragmentation and zspool compaction is unable to relocate
|
|
objects and release zspages. In these cases, it is recommended to decrease
|
|
the limit on the size of the zspage chains (as specified by the
|
|
CONFIG_ZSMALLOC_CHAIN_SIZE option).
|