# SPDX-License-Identifier: GPL-2.0-only
mm/page_ext: resurrect struct page extending code for debugging
When we debug something, we'd like to attach some extra information to every
page. For this purpose, we sometimes modify struct page itself, but this
has drawbacks. First, it requires a recompile, which makes us hesitate to
use this powerful debug feature, so the development process is slowed down.
Second, it is sometimes impossible to rebuild the kernel due to third-party
module dependencies. Third, system behaviour would be largely different
after the recompile, because it greatly changes the size of struct page,
and this structure is accessed by every part of the kernel. Keeping struct
page as it is makes it easier to reproduce an erroneous situation.
This feature is intended to overcome the problems mentioned above. It
allocates the memory for extended per-page data somewhere other than struct
page itself. This memory can be accessed through the accessor functions
provided by this code. During the boot process, the code checks whether the
allocation of this huge chunk of memory is needed at all; if not, it avoids
allocating any memory. With this advantage, we can include this feature in
the kernel by default and so avoid the rebuild and the related problems.
Until now, memcg used this technique. But memcg has since decided to embed
its variable in struct page itself, and its code to extend struct page has
been removed. I'd like to use this code to develop debug features, so this
patch resurrects it.
To help these things work well, this patch introduces two callbacks for
clients. One is the need callback, which is mandatory if the user wants to
avoid useless memory allocation at boot time. The other is the optional
init callback, which is used to do proper initialization after the memory
is allocated. A detailed explanation of the purpose of these functions is
in the code comments; please refer to them.
Everything else is the same as the previous extension code in memcg.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
config PAGE_EXTENSION
	bool "Extend memmap on extra space for more information on page"
	---help---
	  Extend memmap on extra space for more information on page. This
	  could be used for debugging features that need to insert extra
	  field for every page. This extension enables us to save memory
	  by not allocating this extra memory according to boottime
	  configuration.

config DEBUG_PAGEALLOC
	bool "Debug page memory allocations"
	depends on DEBUG_KERNEL
	depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && !SPARC
	select PAGE_POISONING if !ARCH_SUPPORTS_DEBUG_PAGEALLOC
	---help---
	  Unmap pages from the kernel linear mapping after free_pages().
	  Depending on runtime enablement, this results in a small or large
	  slowdown, but helps to find certain types of memory corruption.
mm, page_alloc: more extensive free page checking with debug_pagealloc
The page allocator checks struct pages for expected state (mapcount,
flags etc.) as pages are being allocated (check_new_page()) and freed
(free_pages_check()) to provide some defense against errors in page
allocator users.
Prior to commits 479f854a207c ("mm, page_alloc: defer debugging checks of
pages allocated from the PCP") and 4db7548ccbd9 ("mm, page_alloc: defer
debugging checks of freed pages until a PCP drain"), this happened for
order-0 pages as they were allocated from or freed to the per-cpu caches
(pcplists). Since those are fast paths, the checks are now performed only
when pages are moved between pcplists and the global free lists. This
however lowers the chances of catching errors soon enough.
To increase the chances of the checks catching errors, the kernel has to
be rebuilt with CONFIG_DEBUG_VM, which also enables multiple other
internal debug checks (VM_BUG_ON() etc.), which is suboptimal when the
goal is to catch errors in mm users, not in mm code itself.
To catch some wrong users of the page allocator we have
CONFIG_DEBUG_PAGEALLOC, which is designed to have virtually no overhead
unless enabled at boot time. Memory corruptions from writing to freed
pages often have the same underlying causes (use-after-free, double free)
as corruption of the corresponding struct pages, so this existing
debugging functionality is a good fit to extend by also performing struct
page checks at least as often as if CONFIG_DEBUG_VM were enabled.
Specifically, after this patch, when debug_pagealloc is enabled on boot
and CONFIG_DEBUG_VM is disabled, pages are checked when allocated from or
freed to the pcplists *in addition* to being moved between pcplists and
free lists. When both debug_pagealloc and CONFIG_DEBUG_VM are enabled,
pages are checked when being moved between pcplists and free lists *in
addition* to when allocated from or freed to the pcplists.
When debug_pagealloc is not enabled on boot, the overhead in the fast
paths should be virtually nil thanks to the use of a static key.
Link: http://lkml.kernel.org/r/20190603143451.27353-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

	  Also, the state of page tracking structures is checked more often as
	  pages are being allocated and freed, as unexpected state changes
	  often happen for the same reasons as memory corruption (e.g. double
	  free, use-after-free).

	  For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
	  fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages(). Additionally, this option cannot
	  be enabled in combination with hibernation as that would result in
	  incorrect warnings of memory corruption after a resume because free
	  pages are not saved to the suspend image.

	  By default this option will have a small overhead, e.g. by not
	  allowing the kernel mapping to be backed by large pages on some
	  architectures. Even bigger overhead comes when the debugging is
	  enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
	  command line parameter.

config DEBUG_PAGEALLOC_ENABLE_DEFAULT
	bool "Enable debug page memory allocations by default?"
	depends on DEBUG_PAGEALLOC
	---help---
	  Enable debug page memory allocations by default? This value
	  can be overridden by debug_pagealloc=off|on.

config PAGE_OWNER
	bool "Track page owner"
	depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
	select DEBUG_FS
	select STACKTRACE
	select STACKDEPOT
	select PAGE_EXTENSION
	help
	  This keeps track of which call chain is the owner of a page, and
	  may help to find bare alloc_page(s) leaks. Even if you include this
	  feature in your build, it is disabled by default. You should pass
	  the "page_owner=on" boot parameter in order to enable it. It eats
	  a fair amount of memory if enabled. See tools/vm/page_owner_sort.c
	  for a user-space helper.

	  If unsure, say N.

config PAGE_POISONING
	bool "Poison pages after freeing"
	select PAGE_POISONING_NO_SANITY if HIBERNATION
	---help---
	  Fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages(). The filling of the memory helps
	  reduce the risk of information leaks from freed data. This does
	  have a potential performance impact if enabled with the
	  "page_poison=1" kernel boot option.

	  Note that "poison" here is not the same thing as the "HWPoison"
	  for CONFIG_MEMORY_FAILURE. This is software poisoning only.

	  If unsure, say N.

config PAGE_POISONING_NO_SANITY
	depends on PAGE_POISONING
	bool "Only poison, don't sanity check"
	---help---
	  Skip the sanity checking on alloc, only fill the pages with
	  poison on free. This reduces some of the overhead of the
	  poisoning feature.

	  If you are only interested in sanitization, say Y. Otherwise
	  say N.

config PAGE_POISONING_ZERO
	bool "Use zero for poisoning instead of debugging value"
	depends on PAGE_POISONING
	---help---
	  Instead of using the existing poison value, fill the pages with
	  zeros. This makes it harder to detect when errors are occurring
	  due to sanitization but the zeroing at free means that it is
	  no longer necessary to write zeros when GFP_ZERO is used on
	  allocation.

	  If unsure, say N.

mm/page_ref: add tracepoint to track down page reference manipulation
A CMA allocation should be guaranteed to succeed by definition, but,
unfortunately, it sometimes fails. The problem is hard to track down,
because it is related to page reference manipulation and we don't have
any facility to analyze it.
This patch adds tracepoints to track down page reference manipulation.
With them, we can find the exact reason for a failure and fix the
problem. The following is an example of the tracepoint output. (Note:
this example is from a stale version that printed the flags as a number;
recent versions print them as a human-readable string.)
<...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
<...>-9018 [004] 92.678378: kernel_stack:
=> get_page_from_freelist (ffffffff81176659)
=> __alloc_pages_nodemask (ffffffff81176d22)
=> alloc_pages_vma (ffffffff811bf675)
=> handle_mm_fault (ffffffff8119e693)
=> __do_page_fault (ffffffff810631ea)
=> trace_do_page_fault (ffffffff81063543)
=> do_async_page_fault (ffffffff8105c40a)
=> async_page_fault (ffffffff817581d8)
[snip]
<...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
[snip]
...
...
<...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
[snip]
<...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
=> release_pages (ffffffff8117c9e4)
=> free_pages_and_swap_cache (ffffffff811b0697)
=> tlb_flush_mmu_free (ffffffff81199616)
=> tlb_finish_mmu (ffffffff8119a62c)
=> exit_mmap (ffffffff811a53f7)
=> mmput (ffffffff81073f47)
=> do_exit (ffffffff810794e9)
=> do_group_exit (ffffffff81079def)
=> SyS_exit_group (ffffffff81079e74)
=> entry_SYSCALL_64_fastpath (ffffffff817560b6)
This output shows that the problem comes from the exit path. In the exit
path, to improve performance, pages are not freed immediately; they are
gathered and processed in batches. During this process migration is not
possible, so the CMA allocation fails. This problem would be hard to find
without this page reference tracepoint facility.
Enabling this feature bloats the kernel text by about 30 KB in my
configuration.
text data bss dec hex filename
12127327 2243616 1507328 15878271 f2487f vmlinux_disabled
12157208 2258880 1507328 15923416 f2f8d8 vmlinux_enabled
Note that, due to a header file dependency problem between mm.h and
tracepoint.h, this feature has to open-code the static key functions for
tracepoints, as proposed by Steven Rostedt at the following link:
https://lkml.org/lkml/2015/12/9/699
[arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()]
[iamjoonsoo.kim@lge.com: fix build failure for xtensa]
[akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
config DEBUG_PAGE_REF
	bool "Enable tracepoint to track down page reference manipulation"
	depends on DEBUG_KERNEL
	depends on TRACEPOINTS
	---help---
	  This is a feature to add tracepoint for tracking down page reference
	  manipulation. This tracking is useful to diagnose functional failure
	  due to migration failures caused by page reference mismatches. Be
	  careful when enabling this feature because it adds about 30 KB to the
	  kernel code. However the runtime performance overhead is virtually
	  nil until the tracepoints are actually enabled.

config DEBUG_RODATA_TEST
	bool "Testcase for the marking rodata read-only"
	depends on STRICT_KERNEL_RWX
	---help---
	  This option enables a testcase for setting rodata read-only.