mm/page_ext: resurrect struct page extending code for debugging
When debugging, we often want to attach some information to every
page. To do so, we sometimes modify struct page itself, but this has
drawbacks. First, it requires a recompile, which makes us hesitate to
use a powerful debug feature and slows down development. Second,
rebuilding the kernel is sometimes impossible because of third-party
module dependencies. Third, system behaviour can differ greatly after
the recompile, because it changes the size of struct page considerably
and this structure is accessed by every part of the kernel. Keeping
struct page as it is makes it easier to reproduce the erroneous
situation.

This feature is intended to overcome the problems mentioned above. It
allocates memory for extended per-page data in a separate place rather
than in struct page itself. This memory can be accessed through the
accessor functions provided by this code. During boot, it checks whether
allocating this huge chunk of memory is needed at all. If not, it
allocates no memory. Thanks to this, we can include the feature in the
kernel by default, avoiding the rebuild and the related problems.

Until now, memcg used this technique. But memcg has now decided to embed
its variable in struct page itself, and its code to extend struct page
has been removed. I'd like to use this code to develop debug features,
so this patch resurrects it.

To make this work well, this patch introduces two callbacks for
clients. One is the need callback, which is mandatory if the user wants
to avoid useless memory allocation at boot time. The other is the
optional init callback, which is used to do proper initialization after
the memory is allocated. A detailed explanation of the purpose of these
functions is in the code comments; please refer to them.

Everything else is the same as the previous extension code in memcg.
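The need/init flow described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `ext_client`, `ext_boot`, `ext_lookup` and the `dbg_*` names are hypothetical and are not the kernel's types or functions, and treating a missing need callback as "always allocate" is one reading of the message rather than something verified against the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a page-extension client (not the kernel's types). */
struct ext_client {
    bool (*need)(void);  /* does this client want per-page memory this boot? */
    void (*init)(void);  /* optional: runs once the memory exists */
};

static unsigned char *ext_area;  /* one byte of extra data per page */
static size_t ext_pages;

/* Returns true if memory was allocated, false if it was skipped or failed. */
static bool ext_boot(struct ext_client *const *clients, size_t n, size_t npages)
{
    bool needed = false;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Assumption of this sketch: no need callback means "always needed". */
        if (!clients[i]->need || clients[i]->need())
            needed = true;
    }
    if (!needed)
        return false;  /* nobody asked: allocate nothing */

    ext_area = calloc(npages, 1);
    if (!ext_area)
        return false;
    ext_pages = npages;

    /* Memory exists now, so clients may finish their own setup. */
    for (i = 0; i < n; i++)
        if (clients[i]->init)
            clients[i]->init();
    return true;
}

/* Accessor: the extra data lives outside the page descriptor, indexed by pfn. */
static unsigned char *ext_lookup(size_t pfn)
{
    assert(ext_area != NULL && pfn < ext_pages);
    return &ext_area[pfn];
}

/* Example client whose need depends on a (hypothetical) boot-time switch. */
static bool debug_requested;
static int dbg_init_calls;

static bool dbg_need(void) { return debug_requested; }
static void dbg_init(void) { dbg_init_calls++; }

static struct ext_client dbg_client = { dbg_need, dbg_init };
```

With `debug_requested` left false, `ext_boot()` returns without allocating anything, which is the property that lets such a feature be compiled in by default.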
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 08:55:46 +08:00
config PAGE_EXTENSION
	bool "Extend memmap on extra space for more information on page"
	---help---
	  Extend memmap on extra space for more information on page. This
	  could be used for debugging features that need to insert extra
	  field for every page. This extension enables us to save memory
	  by not allocating this extra memory according to boottime
	  configuration.

2009-04-03 07:56:30 +08:00
config DEBUG_PAGEALLOC
	bool "Debug page memory allocations"
2011-03-23 07:32:46 +08:00
	depends on DEBUG_KERNEL
	depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && !SPARC
mm/debug-pagealloc: prepare boottime configurable on/off
Until now, debug-pagealloc needed extra flags in struct page, so we had
to recompile the whole source tree whenever we decided to use it. This is
really painful, because the recompile takes time and a rebuild is
sometimes impossible due to third-party modules that depend on struct
page. So, in many cases we can't use this good feature.

Now we have the page extension feature, which allows us to put the extra
flags outside of struct page. This gets rid of the third-party module
issue mentioned above. It also allows us to determine at boot time
whether we need extra memory for this page extension. With these
properties, a kernel built with CONFIG_DEBUG_PAGEALLOC can leave
debug-pagealloc off at boot time with low computational overhead. This
will help our development process greatly.

This patch is the preparation step for achieving that goal.
debug-pagealloc originally used an extra field of struct page, but after
this patch it uses a field of struct page_ext. Because memory for
page_ext is allocated later than the page allocator is initialized with
CONFIG_SPARSEMEM, we must disable the debug-pagealloc feature temporarily
until page_ext is initialized. This patch implements this.
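The "disabled until page_ext is initialized" ordering can be sketched in plain C. This is a minimal model, not the patch's code: `boot_requested`, `debug_active`, `poisoned[]`, `dbg_need`, `dbg_init` and `on_page_free` are hypothetical names, and a static array stands in for the side table that the kernel would allocate later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

static bool boot_requested;    /* stands in for a boot-time switch */
static bool debug_active;      /* stays false until the side table is ready */
static bool poisoned[NPAGES];  /* stands in for a per-page flag in the side table */

/* Reports whether this boot wants the extra per-page memory at all. */
static bool dbg_need(void)
{
    return boot_requested;
}

/* Called once the side table exists; only from now on may hooks touch it. */
static void dbg_init(void)
{
    if (boot_requested)
        debug_active = true;
}

/* Free-path hook: a no-op both when disabled and when called too early. */
static void on_page_free(size_t pfn)
{
    assert(pfn < NPAGES);
    if (!debug_active)
        return;
    poisoned[pfn] = true;
}
```

The single `debug_active` test is what keeps the disabled case cheap: pages freed before `dbg_init()` runs are simply not tracked.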
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 08:55:49 +08:00
	select PAGE_EXTENSION
2011-03-23 07:32:46 +08:00
	select PAGE_POISONING if !ARCH_SUPPORTS_DEBUG_PAGEALLOC
2009-04-03 07:56:30 +08:00
	---help---
	  Unmap pages from the kernel linear mapping after free_pages().
2016-03-16 05:55:30 +08:00
	  Depending on runtime enablement, this results in a small or large
	  slowdown, but helps to find certain types of memory corruption.
2009-04-03 07:56:30 +08:00
2011-03-23 07:32:46 +08:00
	  For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
	  fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages(). Additionally,
	  this option cannot be enabled in combination with hibernation as
	  that would result in incorrect warnings of memory corruption after
	  a resume because free pages are not saved to the suspend image.

2016-03-16 05:55:30 +08:00
	  By default this option will have a small overhead, e.g. by not
	  allowing the kernel mapping to be backed by large pages on some
	  architectures. Even bigger overhead comes when the debugging is
	  enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
	  command line parameter.

config DEBUG_PAGEALLOC_ENABLE_DEFAULT
	bool "Enable debug page memory allocations by default?"
	default n
	depends on DEBUG_PAGEALLOC
	---help---
	  Enable debug page memory allocations by default? This value
	  can be overridden by debug_pagealloc=off|on.

2009-04-01 06:23:17 +08:00
config PAGE_POISONING
2016-03-16 05:56:27 +08:00
	bool "Poison pages after freeing"
	select PAGE_POISONING_NO_SANITY if HIBERNATION
	---help---
	  Fill the pages with poison patterns after free_pages() and verify
	  the patterns before alloc_pages. The filling of the memory helps
	  reduce the risk of information leaks from freed data. This does
2018-08-22 12:53:10 +08:00
	  have a potential performance impact if enabled with the
	  "page_poison=1" kernel boot option.
2016-03-16 05:56:27 +08:00

	  Note that "poison" here is not the same thing as the "HWPoison"
	  for CONFIG_MEMORY_FAILURE. This is software poisoning only.

	  If unsure, say N
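The fill-on-free, verify-on-alloc idea in the help text above can be shown with a short userspace sketch. The names `poison_on_free`, `check_on_alloc`, `SKETCH_PAGE_SIZE` and `SKETCH_POISON` are hypothetical, and the 0xaa byte is just the pattern picked for this sketch, not a statement about what the kernel writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_POISON    0xaa  /* illustrative pattern chosen for this sketch */

/* On free: overwrite the whole page so stale data cannot leak later. */
static void poison_on_free(unsigned char *page)
{
    assert(page != NULL);
    memset(page, SKETCH_POISON, SKETCH_PAGE_SIZE);
}

/*
 * On alloc: return true if the page still holds the pattern everywhere,
 * i.e. nothing wrote to it while it was supposed to be free.
 */
static bool check_on_alloc(const unsigned char *page)
{
    size_t i;

    assert(page != NULL);
    for (i = 0; i < SKETCH_PAGE_SIZE; i++)
        if (page[i] != SKETCH_POISON)
            return false;
    return true;
}
```

Skipping the `check_on_alloc()` step while keeping `poison_on_free()` corresponds to the "only poison, don't sanity check" trade-off: the data is still scrubbed, but a write after free goes unnoticed.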

config PAGE_POISONING_NO_SANITY
	depends on PAGE_POISONING
	bool "Only poison, don't sanity check"
	---help---
	  Skip the sanity checking on alloc, only fill the pages with
	  poison on free. This reduces some of the overhead of the
	  poisoning feature.

	  If you are only interested in sanitization, say Y. Otherwise
	  say N.
2016-03-16 05:56:30 +08:00

config PAGE_POISONING_ZERO
2018-08-22 12:53:10 +08:00
	bool "Use zero for poisoning instead of debugging value"
2016-03-16 05:56:30 +08:00
	depends on PAGE_POISONING
	---help---
	  Instead of using the existing poison value, fill the pages with
	  zeros. This makes it harder to detect when errors are occurring
	  due to sanitization but the zeroing at free means that it is
	  no longer necessary to write zeros when GFP_ZERO is used on
	  allocation.

	  If unsure, say N

mm/page_ref: add tracepoint to track down page reference manipulation
CMA allocation should be guaranteed to succeed by definition but,
unfortunately, it sometimes fails. The problem is hard to track down
because it is related to page reference manipulation and we don't have
any facility to analyze it.

This patch adds tracepoints to track down page reference manipulation.
With them, we can find the exact reason for a failure and fix the
problem. The following is an example of tracepoint output. (Note: this
example is from a stale version that prints flags as a number. Recent
versions print them as a human-readable string.)
<...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
<...>-9018 [004] 92.678378: kernel_stack:
=> get_page_from_freelist (ffffffff81176659)
=> __alloc_pages_nodemask (ffffffff81176d22)
=> alloc_pages_vma (ffffffff811bf675)
=> handle_mm_fault (ffffffff8119e693)
=> __do_page_fault (ffffffff810631ea)
=> trace_do_page_fault (ffffffff81063543)
=> do_async_page_fault (ffffffff8105c40a)
=> async_page_fault (ffffffff817581d8)
[snip]
<...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
[snip]
...
...
<...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
[snip]
<...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
=> release_pages (ffffffff8117c9e4)
=> free_pages_and_swap_cache (ffffffff811b0697)
=> tlb_flush_mmu_free (ffffffff81199616)
=> tlb_finish_mmu (ffffffff8119a62c)
=> exit_mmap (ffffffff811a53f7)
=> mmput (ffffffff81073f47)
=> do_exit (ffffffff810794e9)
=> do_group_exit (ffffffff81079def)
=> SyS_exit_group (ffffffff81079e74)
=> entry_SYSCALL_64_fastpath (ffffffff817560b6)
This output shows that the problem comes from the exit path. In the exit
path, to improve performance, pages are not freed immediately. They are
gathered and processed in batches. During this process, migration is not
possible and CMA allocation fails. This problem is hard to find without
this page reference tracepoint facility.

Enabling this feature bloats the kernel text by about 30 KB in my
configuration.
text      data     bss      dec       hex     filename
12127327  2243616  1507328  15878271  f2487f  vmlinux_disabled
12157208  2258880  1507328  15923416  f2f8d8  vmlinux_enabled
Note that, due to a header file dependency problem between mm.h and
tracepoint.h, this feature has to open-code the static key functions for
tracepoints. This was proposed by Steven Rostedt at the following link:
https://lkml.org/lkml/2015/12/9/699
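The pattern of guarding each reference-count hook with a cheap "is tracing on?" test can be sketched in userspace C. This is a rough model under stated assumptions: `sketch_page`, `trace_enabled`, `trace_ref`, `ref_mod` and `ref_mod_and_test` are hypothetical names, a plain bool stands in for the static key, and the counter is not atomic as a real reference count would have to be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct sketch_page {
    unsigned long pfn;
    int count;  /* plain int here; a real refcount would be atomic */
};

static bool trace_enabled;         /* stands in for the tracepoint's static key */
static unsigned long trace_events; /* number of events emitted so far */

/* Slow path: only reached when tracing has been switched on. */
static void trace_ref(const char *ev, const struct sketch_page *p, int val)
{
    trace_events++;
    fprintf(stderr, "%s: pfn=0x%lx count=%d val=%d\n", ev, p->pfn, p->count, val);
}

/* Adjust the count; the only cost when tracing is off is one flag test. */
static void ref_mod(struct sketch_page *p, int val)
{
    assert(p != NULL);
    p->count += val;
    if (trace_enabled)
        trace_ref("page_ref_mod", p, val);
}

/* Adjust the count and report whether it dropped to zero. */
static bool ref_mod_and_test(struct sketch_page *p, int val)
{
    bool zero;

    assert(p != NULL);
    p->count += val;
    zero = (p->count == 0);
    if (trace_enabled)
        trace_ref("page_ref_mod_and_test", p, val);
    return zero;
}
```

Keeping the formatting work behind the flag is what makes the disabled case nearly free, at the price of an extra branch and call site at every reference-count operation.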
[arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()]
[iamjoonsoo.kim@lge.com: fix build failure for xtensa]
[akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 05:19:29 +08:00

config DEBUG_PAGE_REF
	bool "Enable tracepoint to track down page reference manipulation"
	depends on DEBUG_KERNEL
	depends on TRACEPOINTS
	---help---
	  This is a feature to add tracepoint for tracking down page reference
	  manipulation. This tracking is useful to diagnose functional failure
	  due to migration failures caused by page reference mismatches. Be
	  careful when enabling this feature because it adds about 30 KB to the
	  kernel code. However the runtime performance overhead is virtually
	  nil until the tracepoints are actually enabled.
2017-02-28 06:30:22 +08:00

config DEBUG_RODATA_TEST
	bool "Testcase for the marking rodata read-only"
	depends on STRICT_KERNEL_RWX
	---help---
	  This option enables a testcase for the setting rodata read-only.