OpenCloudOS-Kernel/lib/Kconfig.debug

2040 lines
68 KiB
Plaintext
Raw Normal View History

menu "Kernel hacking"
menu "printk and dmesg options"
config PRINTK_TIME
bool "Show timing information on printks"
depends on PRINTK
help
Selecting this option causes time stamps of the printk()
messages to be added to the output of the syslog() system
call and at the console.
The timestamp is always recorded internally, and exported
to /dev/kmsg. This flag just specifies if the timestamp should
be included, not that the timestamp is recorded.
The behavior is also controlled by the kernel command line
parameter printk.time=1. See Documentation/admin-guide/kernel-parameters.rst
config CONSOLE_LOGLEVEL_DEFAULT
int "Default console loglevel (1-15)"
range 1 15
default "7"
help
Default loglevel to determine what will be printed on the console.
Setting a default here is equivalent to passing in loglevel=<x> in
the kernel bootargs. loglevel=<x> continues to override whatever
value is specified here as well.
Note: This does not affect the log level of un-prefixed printk()
usage in the kernel. That is controlled by the MESSAGE_LOGLEVEL_DEFAULT
option.
config CONSOLE_LOGLEVEL_QUIET
int "quiet console loglevel (1-15)"
range 1 15
default "4"
help
loglevel to use when "quiet" is passed on the kernel commandline.
When "quiet" is passed on the kernel commandline this loglevel
will be used as the loglevel. IOW passing "quiet" will be the
equivalent of passing "loglevel=<CONSOLE_LOGLEVEL_QUIET>"
config MESSAGE_LOGLEVEL_DEFAULT
int "Default message log level (1-7)"
range 1 7
default "4"
help
Default log level for printk statements with no specified priority.
This was hard-coded to KERN_WARNING since at least 2.6.10 but folks
that are auditing their logs closely may want to set it to a lower
priority.
Note: This does not affect what message level gets printed on the console
by default. To change that, use loglevel=<x> in the kernel bootargs,
or pick a different CONSOLE_LOGLEVEL_DEFAULT configuration value.
config BOOT_PRINTK_DELAY
bool "Delay each boot printk message by N milliseconds"
depends on DEBUG_KERNEL && PRINTK && GENERIC_CALIBRATE_DELAY
help
This build option allows you to read kernel boot messages
by inserting a short delay after each one. The delay is
specified in milliseconds on the kernel command line,
using "boot_delay=N".
It is likely that you would also need to use "lpj=M" to preset
the "loops per jiffie" value.
See a previous boot log for the "lpj" value to use for your
system, and then set "lpj=M" before setting "boot_delay=N".
NOTE: Using this option may adversely affect SMP systems.
I.e., processors other than the first one may not boot up.
BOOT_PRINTK_DELAY also may cause LOCKUP_DETECTOR to detect
what it believes to be lockup conditions.
config DYNAMIC_DEBUG
bool "Enable dynamic printk() support"
default n
depends on PRINTK
depends on DEBUG_FS
help
Compiles debug level messages into the kernel, which would not
otherwise be available at runtime. These messages can then be
enabled/disabled based on various levels of scope - per source file,
function, module, format string, and line number. This mechanism
implicitly compiles in all pr_debug() and dev_dbg() calls, which
enlarges the kernel text size by about 2%.
If a source file is compiled with DEBUG flag set, any
pr_debug() calls in it are enabled by default, but can be
disabled at runtime as below. Note that DEBUG flag is
turned on by many CONFIG_*DEBUG* options.
Usage:
Dynamic debugging is controlled via the 'dynamic_debug/control' file,
which is contained in the 'debugfs' filesystem. Thus, the debugfs
filesystem must first be mounted before making use of this feature.
We refer the control file as: <debugfs>/dynamic_debug/control. This
file contains a list of the debug statements that can be enabled. The
format for each line of the file is:
filename:lineno [module]function flags format
filename : source file of the debug statement
lineno : line number of the debug statement
module : module that contains the debug statement
function : function that contains the debug statement
flags : '=p' means the line is turned 'on' for printing
format : the format used for the debug statement
From a live system:
nullarbor:~ # cat <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
fs/aio.c:222 [aio]__put_ioctx =_ "__put_ioctx:\040freeing\040%p\012"
fs/aio.c:248 [aio]ioctx_alloc =_ "ENOMEM:\040nr_events\040too\040high\012"
fs/aio.c:1770 [aio]sys_io_cancel =_ "calling\040cancel\012"
Example usage:
// enable the message at line 1603 of file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
<debugfs>/dynamic_debug/control
// enable all the messages in file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c +p' >
<debugfs>/dynamic_debug/control
// enable all the messages in the NFS server module
nullarbor:~ # echo -n 'module nfsd +p' >
<debugfs>/dynamic_debug/control
// enable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process +p' >
<debugfs>/dynamic_debug/control
// disable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process -p' >
<debugfs>/dynamic_debug/control
See Documentation/admin-guide/dynamic-debug-howto.rst for additional
information.
endmenu # "printk and dmesg options"
menu "Compile-time checks and compiler options"
config DEBUG_INFO
bool "Compile the kernel with debug info"
depends on DEBUG_KERNEL && !COMPILE_TEST
help
If you say Y here the resulting kernel image will include
debugging info resulting in a larger kernel image.
This adds debug symbols to the kernel and modules (gcc -g), and
is needed if you intend to use kernel crashdump or binary object
tools like crash, kgdb, LKCD, gdb, etc on the kernel.
Say Y here only if you plan to debug the kernel.
If unsure, say N.
config DEBUG_INFO_REDUCED
bool "Reduce debugging information"
depends on DEBUG_INFO
help
If you say Y here gcc is instructed to generate less debugging
information for structure types. This means that tools that
need full debugging information (like kgdb or systemtap) won't
be happy. But if you merely need debugging information to
resolve line numbers there is no loss. Advantage is that
build directory object sizes shrink dramatically over a full
DEBUG_INFO build and compile times are reduced too.
Only works with newer gcc versions.
kbuild: Support split debug info v4 This is an alternative approach to lower the overhead of debug info (as we discussed a few days ago) gcc 4.7+ and newer binutils have a new "split debug info" debug info model where the debug info is only written once into central ".dwo" files. This avoids having to copy it around multiple times, from the object files to the final executable. It lowers the disk space requirements. In addition it defaults to compressed debug data. More details here: http://gcc.gnu.org/wiki/DebugFission This patch adds a new option to enable it. It has to be an option, because it'll undoubtedly break everyone's debuginfo packaging scheme. gdb/objdump/etc. all still work, if you have new enough versions. I don't see big compile wins (maybe a second or two faster or so), but the object dirs with debuginfo get significantly smaller. My standard kernel config (slightly bigger than defconfig) shrinks from 2.9G disk space to 1.1G objdir (with non reduced debuginfo). I presume if you are IO limited the compile time difference will be larger. Only problem I've seen so far is that it doesn't play well with older versions of ccache (apparently fixed, see https://bugzilla.samba.org/show_bug.cgi?id=10005) v2: various fixes from Dirk Gouders. Improve commit message slightly. v3: Fix clean rules and improve Kconfig slightly v4: Fix merge error in last version (Sam Ravnborg) Clarify description that it mainly helps disk size. Cc: Dirk Gouders <dirk@gouders.net> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Michal Marek <mmarek@suse.cz>
2014-07-31 02:50:18 +08:00
config DEBUG_INFO_SPLIT
bool "Produce split debuginfo in .dwo files"
depends on DEBUG_INFO
kbuild: Support split debug info v4 This is an alternative approach to lower the overhead of debug info (as we discussed a few days ago) gcc 4.7+ and newer binutils have a new "split debug info" debug info model where the debug info is only written once into central ".dwo" files. This avoids having to copy it around multiple times, from the object files to the final executable. It lowers the disk space requirements. In addition it defaults to compressed debug data. More details here: http://gcc.gnu.org/wiki/DebugFission This patch adds a new option to enable it. It has to be an option, because it'll undoubtedly break everyone's debuginfo packaging scheme. gdb/objdump/etc. all still work, if you have new enough versions. I don't see big compile wins (maybe a second or two faster or so), but the object dirs with debuginfo get significantly smaller. My standard kernel config (slightly bigger than defconfig) shrinks from 2.9G disk space to 1.1G objdir (with non reduced debuginfo). I presume if you are IO limited the compile time difference will be larger. Only problem I've seen so far is that it doesn't play well with older versions of ccache (apparently fixed, see https://bugzilla.samba.org/show_bug.cgi?id=10005) v2: various fixes from Dirk Gouders. Improve commit message slightly. v3: Fix clean rules and improve Kconfig slightly v4: Fix merge error in last version (Sam Ravnborg) Clarify description that it mainly helps disk size. Cc: Dirk Gouders <dirk@gouders.net> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Michal Marek <mmarek@suse.cz>
2014-07-31 02:50:18 +08:00
help
Generate debug info into separate .dwo files. This significantly
reduces the build directory size for builds with DEBUG_INFO,
because it stores the information only once on disk in .dwo
files instead of multiple times in object files and executables.
In addition the debug information is also compressed.
Requires recent gcc (4.7+) and recent gdb/binutils.
Any tool that packages or reads debug information would need
to know about the .dwo files and include them.
Incompatible with older versions of ccache.
config DEBUG_INFO_DWARF4
bool "Generate dwarf4 debuginfo"
depends on DEBUG_INFO
help
Generate dwarf4 debug info. This requires recent versions
of gcc and gdb. It makes the debug information larger.
But it significantly improves the success of resolving
variables in gdb on optimized code.
config GDB_SCRIPTS
bool "Provide GDB scripts for kernel debugging"
depends on DEBUG_INFO
help
This creates the required links to GDB helper scripts in the
build directory. If you load vmlinux into gdb, the helper
scripts will be automatically imported by gdb as well, and
additional functions are available to analyze a Linux kernel
instance. See Documentation/dev-tools/gdb-kernel-debugging.rst
for further details.
config ENABLE_MUST_CHECK
bool "Enable __must_check logic"
default y
help
Enable the __must_check logic in the kernel build. Disable this to
suppress the "warning: ignoring return value of 'foo', declared with
attribute warn_unused_result" messages.
config FRAME_WARN
int "Warn for stack frames larger than (needs gcc 4.4)"
range 0 8192
kasan: rework Kconfig settings We get a lot of very large stack frames using gcc-7.0.1 with the default -fsanitize-address-use-after-scope --param asan-stack=1 options, which can easily cause an overflow of the kernel stack, e.g. drivers/gpu/drm/i915/gvt/handlers.c:2434:1: warning: the frame size of 46176 bytes is larger than 3072 bytes drivers/net/wireless/ralink/rt2x00/rt2800lib.c:5650:1: warning: the frame size of 23632 bytes is larger than 3072 bytes lib/atomic64_test.c:250:1: warning: the frame size of 11200 bytes is larger than 3072 bytes drivers/gpu/drm/i915/gvt/handlers.c:2621:1: warning: the frame size of 9208 bytes is larger than 3072 bytes drivers/media/dvb-frontends/stv090x.c:3431:1: warning: the frame size of 6816 bytes is larger than 3072 bytes fs/fscache/stats.c:287:1: warning: the frame size of 6536 bytes is larger than 3072 bytes To reduce this risk, -fsanitize-address-use-after-scope is now split out into a separate CONFIG_KASAN_EXTRA Kconfig option, leading to stack frames that are smaller than 2 kilobytes most of the time on x86_64. An earlier version of this patch also prevented combining KASAN_EXTRA with KASAN_INLINE, but that is no longer necessary with gcc-7.0.1. All patches to get the frame size below 2048 bytes with CONFIG_KASAN=y and CONFIG_KASAN_EXTRA=n have been merged by maintainers now, so we can bring back that default now. KASAN_EXTRA=y still causes lots of warnings but now defaults to !COMPILE_TEST to disable it in allmodconfig, and it remains disabled in all other defconfigs since it is a new option. I arbitrarily raise the warning limit for KASAN_EXTRA to 3072 to reduce the noise, but an allmodconfig kernel still has around 50 warnings on gcc-7. I experimented a bit more with smaller stack frames and have another follow-up series that reduces the warning limit for 64-bit architectures to 1280 bytes (without CONFIG_KASAN). With earlier versions of this patch series, I also had patches to address the warnings we get with KASAN and/or KASAN_EXTRA, using a "noinline_if_stackbloat" annotation. That annotation now got replaced with a gcc-8 bugfix (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715) and a workaround for older compilers, which means that KASAN_EXTRA is now just as bad as before and will lead to an instant stack overflow in a few extreme cases. This reverts parts of commit 3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=y"). Two patches in linux-next should be merged first to avoid introducing warnings in an allmodconfig build: 3cd890dbe2a4 ("media: dvb-frontends: fix i2c access helpers for KASAN") 16c3ada89cff ("media: r820t: fix r820t_write_reg for KASAN") Do we really need to backport this? I think we do: without this patch, enabling KASAN will lead to unavoidable kernel stack overflow in certain device drivers when built with gcc-7 or higher on linux-4.10+ or any version that contains a backport of commit c5caf21ab0cf8. Most people are probably still on older compilers, but it will get worse over time as they upgrade their distros. The warnings we get on kernels older than this should all be for code that uses dangerously large stack frames, though most of them do not cause an actual stack overflow by themselves.The asan-stack option was added in linux-4.0, and commit 3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=y") effectively turned off the warning for allmodconfig kernels, so I would like to see this fix backported to any kernels later than 4.0. I have done dozens of fixes for individual functions with stack frames larger than 2048 bytes with asan-stack, and I plan to make sure that all those fixes make it into the stable kernels as well (most are already there). Part of the complication here is that asan-stack (from 4.0) was originally assumed to always require much larger stacks, but that turned out to be a combination of multiple gcc bugs that we have now worked around and fixed, but sanitize-address-use-after-scope (from v4.10) has a much higher inherent stack usage and also suffers from at least three other problems that we have analyzed but not yet fixed upstream, each of them makes the stack usage more severe than it should be. Link: http://lkml.kernel.org/r/20171221134744.2295529-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-07 07:41:41 +08:00
default 3072 if KASAN_EXTRA
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
default 1280 if (!64BIT && PARISC)
default 1024 if (!64BIT && !PARISC)
default 2048 if 64BIT
help
Tell gcc to warn at build time for stack frames larger than this.
Setting this too low will cause a lot of warnings.
Setting it to 0 disables the warning.
Requires gcc 4.4
config STRIP_ASM_SYMS
bool "Strip assembler-generated symbols during link"
default n
help
Strip internal assembler-generated symbols during a link (symbols
that look like '.Lxxx') so they don't pollute the output of
get_wchan() and suchlike.
config READABLE_ASM
bool "Generate readable assembler code"
depends on DEBUG_KERNEL
help
Disable some compiler optimizations that tend to generate human unreadable
assembler output. This may make the kernel slightly slower, but it helps
to keep kernel developers who have to stare a lot at assembler listings
sane.
config UNUSED_SYMBOLS
bool "Enable unused/obsolete exported symbols"
default y if X86
help
Unused but exported symbols make the kernel needlessly bigger. For
that reason most of these unused exports will soon be removed. This
option is provided temporarily to provide a transition period in case
some external kernel module needs one of these symbols anyway. If you
encounter such a case in your module, consider if you are actually
using the right API. (rationale: since nobody in the kernel is using
this in a module, there is a pretty good chance it's actually the
wrong interface to use). If you really need the symbol, please send a
mail to the linux kernel mailing list mentioning the symbol and why
you really need it, and what the merge plan to the mainline kernel for
your module is.
mm/page_owner: keep track of page owners This is the page owner tracking code which is introduced so far ago. It is resident on Andrew's tree, though, nobody tried to upstream so it remain as is. Our company uses this feature actively to debug memory leak or to find a memory hogger so I decide to upstream this feature. This functionality help us to know who allocates the page. When allocating a page, we store some information about allocation in extra memory. Later, if we need to know status of all pages, we can get and analyze it from this stored information. In previous version of this feature, extra memory is statically defined in struct page, but, in this version, extra memory is allocated outside of struct page. It enables us to turn on/off this feature at boottime without considerable memory waste. Although we already have tracepoint for tracing page allocation/free, using it to analyze page owner is rather complex. We need to enlarge the trace buffer for preventing overlapping until userspace program launched. And, launched program continually dump out the trace buffer for later analysis and it would change system behaviour with more possibility rather than just keeping it in memory, so bad for debug. Moreover, we can use page_owner feature further for various purposes. For example, we can use it for fragmentation statistics implemented in this patch. And, I also plan to implement some CMA failure debugging feature using this interface. I'd like to give the credit for all developers contributed this feature, but, it's not easy because I don't know exact history. Sorry about that. Below is people who has "Signed-off-by" in the patches in Andrew's tree. Contributor: Alexander Nyberg <alexn@dsv.su.se> Mel Gorman <mgorman@suse.de> Dave Hansen <dave@linux.vnet.ibm.com> Minchan Kim <minchan@kernel.org> Michal Nazarewicz <mina86@mina86.com> Andrew Morton <akpm@linux-foundation.org> Jungsoo Son <jungsoo.son@lge.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 08:56:01 +08:00
config PAGE_OWNER
bool "Track page owner"
depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
select DEBUG_FS
select STACKTRACE
mm/page_owner: use stackdepot to store stacktrace Currently, we store each page's allocation stacktrace on corresponding page_ext structure and it requires a lot of memory. This causes the problem that memory tight system doesn't work well if page_owner is enabled. Moreover, even with this large memory consumption, we cannot get full stacktrace because we allocate memory at boot time and just maintain 8 stacktrace slots to balance memory consumption. We could increase it to more but it would make system unusable or change system behaviour. To solve the problem, this patch uses stackdepot to store stacktrace. It obviously provides memory saving but there is a drawback that stackdepot could fail. stackdepot allocates memory at runtime so it could fail if system has not enough memory. But, most of allocation stack are generated at very early time and there are much memory at this time. So, failure would not happen easily. And, one failure means that we miss just one page's allocation stacktrace so it would not be a big problem. In this patch, when memory allocation failure happens, we store special stracktrace handle to the page that is failed to save stacktrace. With it, user can guess memory usage properly even if failure happens. Memory saving looks as following. (4GB memory system with page_owner) (before the patch -> after the patch) static allocation: 92274688 bytes -> 25165824 bytes dynamic allocation after boot + kernel build: 0 bytes -> 327680 bytes total: 92274688 bytes -> 25493504 bytes 72% reduction in total. Note that implementation looks complex than someone would imagine because there is recursion issue. stackdepot uses page allocator and page_owner is called at page allocation. Using stackdepot in page_owner could re-call page allcator and then page_owner. That is a recursion. To detect and avoid it, whenever we obtain stacktrace, recursion is checked and page_owner is set to dummy information if found. Dummy information means that this page is allocated for page_owner feature itself (such as stackdepot) and it's understandable behavior for user. [iamjoonsoo.kim@lge.com: mm-page_owner-use-stackdepot-to-store-stacktrace-v3] Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com Link: http://lkml.kernel.org/r/1466150259-27727-7-git-send-email-iamjoonsoo.kim@lge.com Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-27 06:23:55 +08:00
select STACKDEPOT
mm/page_owner: keep track of page owners This is the page owner tracking code which is introduced so far ago. It is resident on Andrew's tree, though, nobody tried to upstream so it remain as is. Our company uses this feature actively to debug memory leak or to find a memory hogger so I decide to upstream this feature. This functionality help us to know who allocates the page. When allocating a page, we store some information about allocation in extra memory. Later, if we need to know status of all pages, we can get and analyze it from this stored information. In previous version of this feature, extra memory is statically defined in struct page, but, in this version, extra memory is allocated outside of struct page. It enables us to turn on/off this feature at boottime without considerable memory waste. Although we already have tracepoint for tracing page allocation/free, using it to analyze page owner is rather complex. We need to enlarge the trace buffer for preventing overlapping until userspace program launched. And, launched program continually dump out the trace buffer for later analysis and it would change system behaviour with more possibility rather than just keeping it in memory, so bad for debug. Moreover, we can use page_owner feature further for various purposes. For example, we can use it for fragmentation statistics implemented in this patch. And, I also plan to implement some CMA failure debugging feature using this interface. I'd like to give the credit for all developers contributed this feature, but, it's not easy because I don't know exact history. Sorry about that. Below is people who has "Signed-off-by" in the patches in Andrew's tree. Contributor: Alexander Nyberg <alexn@dsv.su.se> Mel Gorman <mgorman@suse.de> Dave Hansen <dave@linux.vnet.ibm.com> Minchan Kim <minchan@kernel.org> Michal Nazarewicz <mina86@mina86.com> Andrew Morton <akpm@linux-foundation.org> Jungsoo Son <jungsoo.son@lge.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 08:56:01 +08:00
select PAGE_EXTENSION
help
This keeps track of what call chain is the owner of a page, may
help to find bare alloc_page(s) leaks. Even if you include this
feature on your build, it is disabled in default. You should pass
"page_owner=on" to boot parameter in order to enable it. Eats
a fair amount of memory if enabled. See tools/vm/page_owner_sort.c
for user-space helper.
If unsure, say N.
config DEBUG_FS
bool "Debug Filesystem"
help
debugfs is a virtual file system that kernel developers use to put
debugging files into. Enable this option to be able to read and
write to these files.
For detailed documentation on the debugfs API, see
Documentation/filesystems/.
If unsure, say N.
config HEADERS_CHECK
bool "Run 'make headers_check' when building vmlinux"
depends on !UML
help
This option will extract the user-visible kernel headers whenever
building the kernel, and will run basic sanity checks on them to
ensure that exported files do not attempt to include files which
were not exported, etc.
If you're making modifications to header files which are
relevant for userspace, say 'Y', and check the headers
exported to $(INSTALL_HDR_PATH) (usually 'usr/include' in
your build tree), to make sure they're suitable.
config DEBUG_SECTION_MISMATCH
bool "Enable full Section mismatch analysis"
help
The section mismatch analysis checks if there are illegal
references from one section to another section.
During linktime or runtime, some sections are dropped;
any use of code/data previously in these sections would
most likely result in an oops.
In the code, functions and variables are annotated with
__init,, etc. (see the full list in include/linux/init.h),
which results in the code/data being placed in specific sections.
The section mismatch analysis is always performed after a full
kernel build, and enabling this option causes the following
additional steps to occur:
- Add the option -fno-inline-functions-called-once to gcc commands.
When inlining a function annotated with __init in a non-init
function, we would lose the section information and thus
the analysis would not catch the illegal reference.
This option tells gcc to inline less (but it does result in
a larger kernel).
- Run the section mismatch analysis for each module/built-in.a file.
When we run the section mismatch analysis on vmlinux.o, we
lose valuable information about where the mismatch was
introduced.
Running the analysis for each module/built-in.a file
tells where the mismatch happens much closer to the
source. The drawback is that the same mismatch is
reported at least twice.
- Enable verbose reporting from modpost in order to help resolve
the section mismatches that are reported.
config SECTION_MISMATCH_WARN_ONLY
bool "Make section mismatch errors non-fatal"
default y
help
If you say N here, the build process will fail if there are any
section mismatch, instead of just throwing warnings.
If unsure, say Y.
#
# Select this config option from the architecture Kconfig, if it
# is preferred to always offer frame pointers as a config
# option on the architecture (regardless of KERNEL_DEBUG):
#
config ARCH_WANT_FRAME_POINTERS
bool
config FRAME_POINTER
bool "Compile the kernel with frame pointers"
depends on DEBUG_KERNEL && (M68K || UML || SUPERH) || ARCH_WANT_FRAME_POINTERS
default y if (DEBUG_INFO && UML) || ARCH_WANT_FRAME_POINTERS
help
If you say Y here the resulting kernel image will be slightly
larger and slower, but it gives very useful debugging information
in case of kernel bugs. (precise oopses/stacktraces/warnings)
config STACK_VALIDATION
bool "Compile-time stack metadata validation"
depends on HAVE_STACK_VALIDATION
default n
help
Add compile-time checks to validate stack metadata, including frame
pointers (if CONFIG_FRAME_POINTER is enabled). This helps ensure
that runtime stack traces are more reliable.
This is also a prerequisite for generation of ORC unwind data, which
is needed for CONFIG_UNWINDER_ORC.
For more information, see
tools/objtool/Documentation/stack-validation.txt.
config DEBUG_FORCE_WEAK_PER_CPU
bool "Force weak per-cpu definitions"
depends on DEBUG_KERNEL
help
s390 and alpha require percpu variables in modules to be
defined weak to work around addressing range issue which
puts the following two restrictions on percpu variable
definitions.
1. percpu symbols must be unique whether static or not
2. percpu variables can't be defined inside a function
To ensure that generic code follows the above rules, this
option forces all percpu variables to be defined as weak.
endmenu # "Compiler options"
config MAGIC_SYSRQ
bool "Magic SysRq key"
depends on !UML
help
If you say Y here, you will have some control over the system even
if the system crashes for example during kernel debugging (e.g., you
will be able to flush the buffer cache to disk, reboot the system
immediately or dump some status information). This is accomplished
by pressing various keys while holding SysRq (Alt+PrintScreen). It
also works on a serial console (on PC hardware at least), if you
send a BREAK and then within 5 seconds a command keypress. The
keys are documented in <file:Documentation/admin-guide/sysrq.rst>.
Don't say Y unless you really know what this hack does.
config MAGIC_SYSRQ_DEFAULT_ENABLE
hex "Enable magic SysRq key functions by default"
depends on MAGIC_SYSRQ
default 0x1
help
Specifies which SysRq key functions are enabled by default.
This may be set to 1 or 0 to enable or disable them all, or
to a bitmask as described in Documentation/admin-guide/sysrq.rst.
config MAGIC_SYSRQ_SERIAL
bool "Enable magic SysRq key over serial"
depends on MAGIC_SYSRQ
default y
help
Many embedded boards have a disconnected TTL level serial which can
generate some garbage that can lead to spurious false sysrq detects.
This option allows you to decide whether you want to enable the
magic SysRq key.
config DEBUG_KERNEL
bool "Kernel debugging"
help
Say Y here if you are developing drivers or trying to debug and
identify kernel problems.
menu "Memory Debugging"
source mm/Kconfig.debug
config DEBUG_OBJECTS
bool "Debug object operations"
depends on DEBUG_KERNEL
help
If you say Y here, additional code will be inserted into the
kernel to track the life time of various objects and validate
the operations on those objects.
config DEBUG_OBJECTS_SELFTEST
bool "Debug objects selftest"
depends on DEBUG_OBJECTS
help
This enables the selftest of the object debug code.
config DEBUG_OBJECTS_FREE
bool "Debug objects in freed memory"
depends on DEBUG_OBJECTS
help
This enables checks whether a k/v free operation frees an area
which contains an object which has not been deactivated
properly. This can make kmalloc/kfree-intensive workloads
much slower.
infrastructure to debug (dynamic) objects We can see an ever repeating problem pattern with objects of any kind in the kernel: 1) freeing of active objects 2) reinitialization of active objects Both problems can be hard to debug because the crash happens at a point where we have no chance to decode the root cause anymore. One problem spot are kernel timers, where the detection of the problem often happens in interrupt context and usually causes the machine to panic. While working on a timer related bug report I had to hack specialized code into the timer subsystem to get a reasonable hint for the root cause. This debug hack was fine for temporary use, but far from a mergeable solution due to the intrusiveness into the timer code. The code further lacked the ability to detect and report the root cause instantly and keep the system operational. Keeping the system operational is important to get hold of the debug information without special debugging aids like serial consoles and special knowledge of the bug reporter. The problems described above are not restricted to timers, but timers tend to expose it usually in a full system crash. Other objects are less explosive, but the symptoms caused by such mistakes can be even harder to debug. Instead of creating specialized debugging code for the timer subsystem a generic infrastructure is created which allows developers to verify their code and provides an easy to enable debug facility for users in case of trouble. The debugobjects core code keeps track of operations on static and dynamic objects by inserting them into a hashed list and sanity checking them on object operations and provides additional checks whenever kernel memory is freed. The tracked object operations are: - initializing an object - adding an object to a subsystem list - deleting an object from a subsystem list Each operation is sanity checked before the operation is executed and the subsystem specific code can provide a fixup function which allows to prevent the damage of the operation. When the sanity check triggers a warning message and a stack trace is printed. The list of operations can be extended if the need arises. For now it's limited to the requirements of the first user (timers). The core code enqueues the objects into hash buckets. The hash index is generated from the address of the object to simplify the lookup for the check on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a global lock. The debug code can be compiled in without being active. The runtime overhead is minimal and could be optimized by asm alternatives. A kernel command line option enables the debugging code. Thanks to Ingo Molnar for review, suggestions and cleanup patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 15:55:01 +08:00
config DEBUG_OBJECTS_TIMERS
bool "Debug timer objects"
depends on DEBUG_OBJECTS
help
If you say Y here, additional code will be inserted into the
timer routines to track the life time of timer objects and
validate the timer operations.
config DEBUG_OBJECTS_WORK
bool "Debug work objects"
depends on DEBUG_OBJECTS
help
If you say Y here, additional code will be inserted into the
work queue routines to track the life time of work objects and
validate the work operations.
tree/tiny rcu: Add debug RCU head objects Helps finding racy users of call_rcu(), which results in hangs because list entries are overwritten and/or skipped. Changelog since v4: - Bissectability is now OK - Now generate a WARN_ON_ONCE() for non-initialized rcu_head passed to call_rcu(). Statically initialized objects are detected with object_is_static(). - Rename rcu_head_init_on_stack to init_rcu_head_on_stack. - Remove init_rcu_head() completely. Changelog since v3: - Include comments from Lai Jiangshan This new patch version is based on the debugobjects with the newly introduced "active state" tracker. Non-initialized entries are all considered as "statically initialized". An activation fixup (triggered by call_rcu()) takes care of performing the debug object initialization without issuing any warning. Since we cannot increase the size of struct rcu_head, I don't see much room to put an identifier for statically initialized rcu_head structures. So for now, we have to live without "activation without explicit init" detection. But the main purpose of this debug option is to detect double-activations (double call_rcu() use of a rcu_head before the callback is executed), which is correctly addressed here. This also detects potential internal RCU callback corruption, which would cause the callbacks to be executed twice. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: David S. Miller <davem@davemloft.net> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: akpm@linux-foundation.org CC: mingo@elte.hu CC: laijs@cn.fujitsu.com CC: dipankar@in.ibm.com CC: josh@joshtriplett.org CC: dvhltc@us.ibm.com CC: niv@us.ibm.com CC: tglx@linutronix.de CC: peterz@infradead.org CC: rostedt@goodmis.org CC: Valdis.Kletnieks@vt.edu CC: dhowells@redhat.com CC: eric.dumazet@gmail.com CC: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2010-04-17 20:48:42 +08:00
config DEBUG_OBJECTS_RCU_HEAD
bool "Debug RCU callbacks objects"
depends on DEBUG_OBJECTS
tree/tiny rcu: Add debug RCU head objects Helps finding racy users of call_rcu(), which results in hangs because list entries are overwritten and/or skipped. Changelog since v4: - Bissectability is now OK - Now generate a WARN_ON_ONCE() for non-initialized rcu_head passed to call_rcu(). Statically initialized objects are detected with object_is_static(). - Rename rcu_head_init_on_stack to init_rcu_head_on_stack. - Remove init_rcu_head() completely. Changelog since v3: - Include comments from Lai Jiangshan This new patch version is based on the debugobjects with the newly introduced "active state" tracker. Non-initialized entries are all considered as "statically initialized". An activation fixup (triggered by call_rcu()) takes care of performing the debug object initialization without issuing any warning. Since we cannot increase the size of struct rcu_head, I don't see much room to put an identifier for statically initialized rcu_head structures. So for now, we have to live without "activation without explicit init" detection. But the main purpose of this debug option is to detect double-activations (double call_rcu() use of a rcu_head before the callback is executed), which is correctly addressed here. This also detects potential internal RCU callback corruption, which would cause the callbacks to be executed twice. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: David S. Miller <davem@davemloft.net> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: akpm@linux-foundation.org CC: mingo@elte.hu CC: laijs@cn.fujitsu.com CC: dipankar@in.ibm.com CC: josh@joshtriplett.org CC: dvhltc@us.ibm.com CC: niv@us.ibm.com CC: tglx@linutronix.de CC: peterz@infradead.org CC: rostedt@goodmis.org CC: Valdis.Kletnieks@vt.edu CC: dhowells@redhat.com CC: eric.dumazet@gmail.com CC: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2010-04-17 20:48:42 +08:00
help
Enable this to turn on debugging of RCU list heads (call_rcu() usage).
percpu_counter: add debugobj support All percpu counters are linked to a global list on initialization and removed from it on destruction. The list is walked during CPU up/down. If a percpu counter is freed without being properly destroyed, the system will oops only on the next CPU up/down making it pretty nasty to track down. This patch adds debugobj support for percpu counters so that such problems can be found easily. As percpu counters don't make sense on stack and can't be statically initialized, debugobj support is pretty simple. It's initialized and activated on counter initialization, and deactivatd and destroyed on counter destruction. With this patch applied, the bug fixed by commit 602586a83b719df0fbd94196a1359ed35aeb2df3 (shmem: put_super must percpu_counter_destroy) triggers the following warning on tmpfs unmount and the system won't oops on the next cpu up/down operation. ------------[ cut here ]------------ WARNING: at lib/debugobjects.c:259 debug_print_object+0x5c/0x70() Hardware name: Bochs ODEBUG: free active (active state 0) object type: percpu_counter Modules linked in: Pid: 3999, comm: umount Not tainted 2.6.36-rc2-work+ #5 Call Trace: [<ffffffff81083f7f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81084076>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813b45cc>] debug_print_object+0x5c/0x70 [<ffffffff813b50e5>] debug_check_no_obj_freed+0x125/0x210 [<ffffffff811577d3>] kfree+0xb3/0x2f0 [<ffffffff81132edd>] shmem_put_super+0x1d/0x30 [<ffffffff81162e96>] generic_shutdown_super+0x56/0xe0 [<ffffffff81162f86>] kill_anon_super+0x16/0x60 [<ffffffff81162ff7>] kill_litter_super+0x27/0x30 [<ffffffff81163295>] deactivate_locked_super+0x45/0x60 [<ffffffff81163cfa>] deactivate_super+0x4a/0x70 [<ffffffff8117d446>] mntput_no_expire+0x86/0xe0 [<ffffffff8117df7f>] sys_umount+0x6f/0x360 [<ffffffff8103f01b>] system_call_fastpath+0x16/0x1b ---[ end trace cce2a341ba3611a7 ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglxlinutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 05:23:05 +08:00
config DEBUG_OBJECTS_PERCPU_COUNTER
bool "Debug percpu counter objects"
depends on DEBUG_OBJECTS
help
If you say Y here, additional code will be inserted into the
percpu counter routines to track the life time of percpu counter
objects and validate the percpu counter operations.
config DEBUG_OBJECTS_ENABLE_DEFAULT
int "debug_objects bootup default value (0-1)"
range 0 1
default "1"
depends on DEBUG_OBJECTS
help
Debug objects boot parameter default value
config DEBUG_SLAB
bool "Debug slab memory allocations"
depends on DEBUG_KERNEL && SLAB
help
Say Y here to have the kernel do limited verification on memory
allocation as well as poisoning memory on free to catch use of freed
memory. This can make kmalloc/kfree-intensive workloads much slower.
[PATCH] slab: implement /proc/slab_allocators Implement /proc/slab_allocators. It produces output like: idr_layer_cache: 80 idr_pre_get+0x33/0x4e buffer_head: 2555 alloc_buffer_head+0x20/0x75 mm_struct: 9 mm_alloc+0x1e/0x42 mm_struct: 20 dup_mm+0x36/0x370 vm_area_struct: 384 dup_mm+0x18f/0x370 vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3 vm_area_struct: 1 split_vma+0x5a/0x10e vm_area_struct: 11 do_brk+0x206/0x2e2 vm_area_struct: 2 copy_vma+0xda/0x142 vm_area_struct: 9 setup_arg_pages+0x99/0x214 fs_cache: 8 copy_fs_struct+0x21/0x133 fs_cache: 29 copy_process+0xf38/0x10e3 files_cache: 30 alloc_files+0x1b/0xcf signal_cache: 81 copy_process+0xbaa/0x10e3 sighand_cache: 77 copy_process+0xe65/0x10e3 sighand_cache: 1 de_thread+0x4d/0x5f8 anon_vma: 241 anon_vma_prepare+0xd9/0xf3 size-2048: 1 add_sect_attrs+0x5f/0x145 size-2048: 2 journal_init_revoke+0x99/0x302 size-2048: 2 journal_init_revoke+0x137/0x302 size-2048: 2 journal_init_inode+0xf9/0x1c4 Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> DESC slab-leaks3-locking-fix EDESC From: Andrew Morton <akpm@osdl.org> Update for slab-remove-cachep-spinlock.patch Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 19:06:39 +08:00
config DEBUG_SLAB_LEAK
bool "Memory leak debugging"
depends on DEBUG_SLAB
config SLUB_DEBUG_ON
bool "SLUB debugging on by default"
depends on SLUB && SLUB_DEBUG
default n
help
Boot with debugging on by default. SLUB boots by default with
the runtime debug capabilities switched off. Enabling this is
equivalent to specifying the "slub_debug" parameter on boot.
There is no support for more fine grained debug control like
possible with slub_debug=xxx. SLUB debugging may be switched
off in a kernel built with CONFIG_SLUB_DEBUG_ON by specifying
"slub_debug=-".
SLUB: Support for performance statistics The statistics provided here allow the monitoring of allocator behavior but at the cost of some (minimal) loss of performance. Counters are placed in SLUB's per cpu data structure. The per cpu structure may be extended by the statistics to grow larger than one cacheline which will increase the cache footprint of SLUB. There is a compile option to enable/disable the inclusion of the runtime statistics and its off by default. The slabinfo tool is enhanced to support these statistics via two options: -D Switches the line of information displayed for a slab from size mode to activity mode. -A Sorts the slabs displayed by activity. This allows the display of the slabs most important to the performance of a certain load. -r Report option will report detailed statistics on Example (tbench load): slabinfo -AD ->Shows the most active slabs Name Objects Alloc Free %Fast skbuff_fclone_cache 33 111953835 111953835 99 99 :0000192 2666 5283688 5281047 99 99 :0001024 849 5247230 5246389 83 83 vm_area_struct 1349 119642 118355 91 22 :0004096 15 66753 66751 98 98 :0000064 2067 25297 23383 98 78 dentry 10259 28635 18464 91 45 :0000080 11004 18950 8089 98 98 :0000096 1703 12358 10784 99 98 :0000128 762 10582 9875 94 18 :0000512 184 9807 9647 95 81 :0002048 479 9669 9195 83 65 anon_vma 777 9461 9002 99 71 kmalloc-8 6492 9981 5624 99 97 :0000768 258 7174 6931 58 15 So the skbuff_fclone_cache is of highest importance for the tbench load. Pretty high load on the 192 sized slab. Look for the aliases slabinfo -a | grep 000192 :0000192 <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili Likely skbuff_head_cache. Looking into the statistics of the skbuff_fclone_cache is possible through slabinfo skbuff_fclone_cache ->-r option implied if cache name is mentioned .... Usual output ... Slab Perf Counter Alloc Free %Al %Fr -------------------------------------------------- Fastpath 111953360 111946981 99 99 Slowpath 1044 7423 0 0 Page Alloc 272 264 0 0 Add partial 25 325 0 0 Remove partial 86 264 0 0 RemoteObj/SlabFrozen 350 4832 0 0 Total 111954404 111954404 Flushes 49 Refill 0 Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%) Looks good because the fastpath is overwhelmingly taken. skbuff_head_cache: Slab Perf Counter Alloc Free %Al %Fr -------------------------------------------------- Fastpath 5297262 5259882 99 99 Slowpath 4477 39586 0 0 Page Alloc 937 824 0 0 Add partial 0 2515 0 0 Remove partial 1691 824 0 0 RemoteObj/SlabFrozen 2621 9684 0 0 Total 5301739 5299468 Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%) Descriptions of the output: Total: The total number of allocation and frees that occurred for a slab Fastpath: The number of allocations/frees that used the fastpath. Slowpath: Other allocations Page Alloc: Number of calls to the page allocator as a result of slowpath processing Add Partial: Number of slabs added to the partial list through free or alloc (occurs during cpuslab flushes) Remove Partial: Number of slabs removed from the partial list as a result of allocations retrieving a partial slab or by a free freeing the last object of a slab. RemoteObj/Froz: How many times were remotely freed object encountered when a slab was about to be deactivated. Frozen: How many times was free able to skip list processing because the slab was in use as the cpuslab of another processor. Flushes: Number of times the cpuslab was flushed on request (kmem_cache_shrink, may result from races in __slab_alloc) Refill: Number of times we were able to refill the cpuslab from remotely freed objects for the same slab. Deactivate: Statistics how slabs were deactivated. Shows how they were put onto the partial list. In general fastpath is very good. Slowpath without partial list processing is also desirable. Any touching of partial list uses node specific locks which may potentially cause list lock contention. Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-02-08 09:47:41 +08:00
config SLUB_STATS
default n
bool "Enable SLUB performance statistics"
depends on SLUB && SYSFS
SLUB: Support for performance statistics The statistics provided here allow the monitoring of allocator behavior but at the cost of some (minimal) loss of performance. Counters are placed in SLUB's per cpu data structure. The per cpu structure may be extended by the statistics to grow larger than one cacheline which will increase the cache footprint of SLUB. There is a compile option to enable/disable the inclusion of the runtime statistics and its off by default. The slabinfo tool is enhanced to support these statistics via two options: -D Switches the line of information displayed for a slab from size mode to activity mode. -A Sorts the slabs displayed by activity. This allows the display of the slabs most important to the performance of a certain load. -r Report option will report detailed statistics on Example (tbench load): slabinfo -AD ->Shows the most active slabs Name Objects Alloc Free %Fast skbuff_fclone_cache 33 111953835 111953835 99 99 :0000192 2666 5283688 5281047 99 99 :0001024 849 5247230 5246389 83 83 vm_area_struct 1349 119642 118355 91 22 :0004096 15 66753 66751 98 98 :0000064 2067 25297 23383 98 78 dentry 10259 28635 18464 91 45 :0000080 11004 18950 8089 98 98 :0000096 1703 12358 10784 99 98 :0000128 762 10582 9875 94 18 :0000512 184 9807 9647 95 81 :0002048 479 9669 9195 83 65 anon_vma 777 9461 9002 99 71 kmalloc-8 6492 9981 5624 99 97 :0000768 258 7174 6931 58 15 So the skbuff_fclone_cache is of highest importance for the tbench load. Pretty high load on the 192 sized slab. Look for the aliases slabinfo -a | grep 000192 :0000192 <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili Likely skbuff_head_cache. Looking into the statistics of the skbuff_fclone_cache is possible through slabinfo skbuff_fclone_cache ->-r option implied if cache name is mentioned .... Usual output ... Slab Perf Counter Alloc Free %Al %Fr -------------------------------------------------- Fastpath 111953360 111946981 99 99 Slowpath 1044 7423 0 0 Page Alloc 272 264 0 0 Add partial 25 325 0 0 Remove partial 86 264 0 0 RemoteObj/SlabFrozen 350 4832 0 0 Total 111954404 111954404 Flushes 49 Refill 0 Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%) Looks good because the fastpath is overwhelmingly taken. skbuff_head_cache: Slab Perf Counter Alloc Free %Al %Fr -------------------------------------------------- Fastpath 5297262 5259882 99 99 Slowpath 4477 39586 0 0 Page Alloc 937 824 0 0 Add partial 0 2515 0 0 Remove partial 1691 824 0 0 RemoteObj/SlabFrozen 2621 9684 0 0 Total 5301739 5299468 Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%) Descriptions of the output: Total: The total number of allocation and frees that occurred for a slab Fastpath: The number of allocations/frees that used the fastpath. Slowpath: Other allocations Page Alloc: Number of calls to the page allocator as a result of slowpath processing Add Partial: Number of slabs added to the partial list through free or alloc (occurs during cpuslab flushes) Remove Partial: Number of slabs removed from the partial list as a result of allocations retrieving a partial slab or by a free freeing the last object of a slab. RemoteObj/Froz: How many times were remotely freed object encountered when a slab was about to be deactivated. Frozen: How many times was free able to skip list processing because the slab was in use as the cpuslab of another processor. Flushes: Number of times the cpuslab was flushed on request (kmem_cache_shrink, may result from races in __slab_alloc) Refill: Number of times we were able to refill the cpuslab from remotely freed objects for the same slab. Deactivate: Statistics how slabs were deactivated. Shows how they were put onto the partial list. In general fastpath is very good. Slowpath without partial list processing is also desirable. Any touching of partial list uses node specific locks which may potentially cause list lock contention. Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-02-08 09:47:41 +08:00
help
SLUB statistics are useful to debug SLUBs allocation behavior in
order find ways to optimize the allocator. This should never be
enabled for production use since keeping statistics slows down
the allocator by a few percentage points. The slabinfo command
supports the determination of the most active slabs to figure
out which slabs are relevant to a particular load.
Try running: slabinfo -DA
config HAVE_DEBUG_KMEMLEAK
bool
config DEBUG_KMEMLEAK
bool "Kernel memory leak detector"
depends on DEBUG_KERNEL && HAVE_DEBUG_KMEMLEAK
select DEBUG_FS
select STACKTRACE if STACKTRACE_SUPPORT
select KALLSYMS
select CRC32
help
Say Y here if you want to enable the memory leak
detector. The memory allocation/freeing is traced in a way
similar to the Boehm's conservative garbage collector, the
difference being that the orphan objects are not freed but
only shown in /sys/kernel/debug/kmemleak. Enabling this
feature will introduce an overhead to memory
allocations. See Documentation/dev-tools/kmemleak.rst for more
details.
Enabling DEBUG_SLAB or SLUB_DEBUG may increase the chances
of finding leaks due to the slab objects poisoning.
In order to access the kmemleak file, debugfs needs to be
mounted (usually at /sys/kernel/debug).
config DEBUG_KMEMLEAK_EARLY_LOG_SIZE
int "Maximum kmemleak early log entries"
depends on DEBUG_KMEMLEAK
range 200 40000
default 400
help
Kmemleak must track all the memory allocations to avoid
reporting false positives. Since memory may be allocated or
freed before kmemleak is initialised, an early log buffer is
used to store these actions. If kmemleak reports "early log
buffer exceeded", please increase this value.
config DEBUG_KMEMLEAK_TEST
tristate "Simple test for the kernel memory leak detector"
depends on DEBUG_KMEMLEAK && m
help
This option enables a module that explicitly leaks memory.
If unsure, say N.
config DEBUG_KMEMLEAK_DEFAULT_OFF
bool "Default kmemleak to off"
depends on DEBUG_KMEMLEAK
help
Say Y here to disable kmemleak by default. It can then be enabled
on the command line via kmemleak=on.
config DEBUG_STACK_USAGE
bool "Stack utilization instrumentation"
depends on DEBUG_KERNEL && !IA64
help
Enables the display of the minimum amount of free stack which each
task has ever had available in the sysrq-T and sysrq-P debug output.
This option will slow down process creation somewhat.
config DEBUG_VM
bool "Debug VM"
depends on DEBUG_KERNEL
help
Enable this to turn on extended checks in the virtual-memory system
that may impact performance.
If unsure, say N.
config DEBUG_VM_VMACACHE
bool "Debug VMA caching"
depends on DEBUG_VM
help
Enable this to turn on VMA caching debug information. Doing so
can cause significant overhead, so only enable it in non-production
environments.
If unsure, say N.
config DEBUG_VM_RB
bool "Debug VM red-black trees"
depends on DEBUG_VM
help
Enable VM red-black tree debugging information and extra validations.
If unsure, say N.
page-flags: introduce page flags policies wrt compound pages This patch adds a third argument to macros which create function definitions for page flags. This argument defines how page-flags helpers behave on compound functions. For now we define four policies: - PF_ANY: the helper function operates on the page it gets, regardless if it's non-compound, head or tail. - PF_HEAD: the helper function operates on the head page of the compound page if it gets tail page. - PF_NO_TAIL: only head and non-compond pages are acceptable for this helper function. - PF_NO_COMPOUND: only non-compound pages are acceptable for this helper function. For now we use policy PF_ANY for all helpers, which matches current behaviour. We do not enforce the policy for TESTPAGEFLAG, because we have flags checked for random pages all over the kernel. Noticeable exception to this is PageTransHuge() which triggers VM_BUG_ON() for tail page. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:51:21 +08:00
config DEBUG_VM_PGFLAGS
bool "Debug page-flags operations"
depends on DEBUG_VM
help
Enables extra validation on page flags operations.
If unsure, say N.
config ARCH_HAS_DEBUG_VIRTUAL
bool
config DEBUG_VIRTUAL
bool "Debug VM translations"
depends on DEBUG_KERNEL && ARCH_HAS_DEBUG_VIRTUAL
help
Enable some costly sanity checks in virtual to page code. This can
catch mistakes with virt_to_page() and friends.
If unsure, say N.
config DEBUG_NOMMU_REGIONS
bool "Debug the global anon/private NOMMU mapping region tree"
depends on DEBUG_KERNEL && !MMU
help
This option causes the global tree of anonymous and private mapping
regions to be regularly checked for invalid topology.
config DEBUG_MEMORY_INIT
bool "Debug memory initialisation" if EXPERT
default !EXPERT
help
Enable this for additional checks during memory initialisation.
The sanity checks verify aspects of the VM such as the memory model
and other information provided by the architecture. Verbose
information will be printed at KERN_DEBUG loglevel depending
on the mminit_loglevel= command-line option.
If unsure, say Y
config MEMORY_NOTIFIER_ERROR_INJECT
tristate "Memory hotplug notifier error injection module"
depends on MEMORY_HOTPLUG_SPARSE && NOTIFIER_ERROR_INJECTION
help
This option provides the ability to inject artificial errors to
memory hotplug notifier chain callbacks. It is controlled through
debugfs interface under /sys/kernel/debug/notifier-error-inject/memory
If the notifier call chain should be failed with some events
notified, write the error code to "actions/<notifier event>/error".
Example: Inject memory hotplug offline error (-12 == -ENOMEM)
# cd /sys/kernel/debug/notifier-error-inject/memory
# echo -12 > actions/MEM_GOING_OFFLINE/error
# echo offline > /sys/devices/system/memory/memoryXXX/state
bash: echo: write error: Cannot allocate memory
To compile this code as a module, choose M here: the module will
be called memory-notifier-error-inject.
If unsure, say N.
config DEBUG_PER_CPU_MAPS
bool "Debug access to per_cpu maps"
depends on DEBUG_KERNEL
depends on SMP
help
Say Y to verify that the per_cpu map being accessed has
been set up. This adds a fair amount of code to kernel memory
and decreases performance.
Say N if unsure.
config DEBUG_HIGHMEM
bool "Highmem debugging"
depends on DEBUG_KERNEL && HIGHMEM
help
This option enables additional error checking for high memory
systems. Disable for production systems.
config HAVE_DEBUG_STACKOVERFLOW
bool
config DEBUG_STACKOVERFLOW
bool "Check for stack overflows"
depends on DEBUG_KERNEL && HAVE_DEBUG_STACKOVERFLOW
---help---
Say Y here if you want to check for overflows of kernel, IRQ
and exception stacks (if your architecture uses them). This
option will show detailed messages if free stack space drops
below a certain limit.
These kinds of bugs usually occur when call-chains in the
kernel get too deep, especially when interrupts are
involved.
Use this in cases where you see apparently random memory
corruption, especially if it appears in 'struct thread_info'
If in doubt, say "N".
kasan: add kernel address sanitizer infrastructure Kernel Address sanitizer (KASan) is a dynamic memory error detector. It provides fast and comprehensive solution for finding use-after-free and out-of-bounds bugs. KASAN uses compile-time instrumentation for checking every memory access, therefore GCC > v4.9.2 required. v4.9.2 almost works, but has issues with putting symbol aliases into the wrong section, which breaks kasan instrumentation of globals. This patch only adds infrastructure for kernel address sanitizer. It's not available for use yet. The idea and some code was borrowed from [1]. Basic idea: The main idea of KASAN is to use shadow memory to record whether each byte of memory is safe to access or not, and use compiler's instrumentation to check the shadow memory on each memory access. Address sanitizer uses 1/8 of the memory addressable in kernel for shadow memory and uses direct mapping with a scale and offset to translate a memory address to its corresponding shadow address. Here is function to translate address to corresponding shadow address: unsigned long kasan_mem_to_shadow(unsigned long addr) { return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET; } where KASAN_SHADOW_SCALE_SHIFT = 3. So for every 8 bytes there is one corresponding byte of shadow memory. The following encoding used for each shadow byte: 0 means that all 8 bytes of the corresponding memory region are valid for access; k (1 <= k <= 7) means that the first k bytes are valid for access, and other (8 - k) bytes are not; Any negative value indicates that the entire 8-bytes are inaccessible. Different negative values used to distinguish between different kinds of inaccessible memory (redzones, freed memory) (see mm/kasan/kasan.h). To be able to detect accesses to bad memory we need a special compiler. Such compiler inserts a specific function calls (__asan_load*(addr), __asan_store*(addr)) before each memory access of size 1, 2, 4, 8 or 16. These functions check whether memory region is valid to access or not by checking corresponding shadow memory. If access is not valid an error printed. Historical background of the address sanitizer from Dmitry Vyukov: "We've developed the set of tools, AddressSanitizer (Asan), ThreadSanitizer and MemorySanitizer, for user space. We actively use them for testing inside of Google (continuous testing, fuzzing, running prod services). To date the tools have found more than 10'000 scary bugs in Chromium, Google internal codebase and various open-source projects (Firefox, OpenSSL, gcc, clang, ffmpeg, MySQL and lots of others): [2] [3] [4]. The tools are part of both gcc and clang compilers. We have not yet done massive testing under the Kernel AddressSanitizer (it's kind of chicken and egg problem, you need it to be upstream to start applying it extensively). To date it has found about 50 bugs. Bugs that we've found in upstream kernel are listed in [5]. We've also found ~20 bugs in out internal version of the kernel. Also people from Samsung and Oracle have found some. [...] As others noted, the main feature of AddressSanitizer is its performance due to inline compiler instrumentation and simple linear shadow memory. User-space Asan has ~2x slowdown on computational programs and ~2x memory consumption increase. Taking into account that kernel usually consumes only small fraction of CPU and memory when running real user-space programs, I would expect that kernel Asan will have ~10-30% slowdown and similar memory consumption increase (when we finish all tuning). I agree that Asan can well replace kmemcheck. We have plans to start working on Kernel MemorySanitizer that finds uses of unitialized memory. Asan+Msan will provide feature-parity with kmemcheck. As others noted, Asan will unlikely replace debug slab and pagealloc that can be enabled at runtime. Asan uses compiler instrumentation, so even if it is disabled, it still incurs visible overheads. Asan technology is easily portable to other architectures. Compiler instrumentation is fully portable. Runtime has some arch-dependent parts like shadow mapping and atomic operation interception. They are relatively easy to port." Comparison with other debugging features: ======================================== KMEMCHECK: - KASan can do almost everything that kmemcheck can. KASan uses compile-time instrumentation, which makes it significantly faster than kmemcheck. The only advantage of kmemcheck over KASan is detection of uninitialized memory reads. Some brief performance testing showed that kasan could be x500-x600 times faster than kmemcheck: $ netperf -l 30 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec no debug: 87380 16384 16384 30.00 41624.72 kasan inline: 87380 16384 16384 30.00 12870.54 kasan outline: 87380 16384 16384 30.00 10586.39 kmemcheck: 87380 16384 16384 30.03 20.23 - Also kmemcheck couldn't work on several CPUs. It always sets number of CPUs to 1. KASan doesn't have such limitation. DEBUG_PAGEALLOC: - KASan is slower than DEBUG_PAGEALLOC, but KASan works on sub-page granularity level, so it able to find more bugs. SLUB_DEBUG (poisoning, redzones): - SLUB_DEBUG has lower overhead than KASan. - SLUB_DEBUG in most cases are not able to detect bad reads, KASan able to detect both reads and writes. - In some cases (e.g. redzone overwritten) SLUB_DEBUG detect bugs only on allocation/freeing of object. KASan catch bugs right before it will happen, so we always know exact place of first bad read/write. [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel [2] https://code.google.com/p/address-sanitizer/wiki/FoundBugs [3] https://code.google.com/p/thread-sanitizer/wiki/FoundBugs [4] https://code.google.com/p/memory-sanitizer/wiki/FoundBugs [5] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel#Trophies Based on work by Andrey Konovalov. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14 06:39:17 +08:00
source "lib/Kconfig.kasan"
endmenu # "Memory Debugging"
kernel: add kcov code coverage kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support. kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic or non-interesting parts of kernel is disbled (e.g. scheduler, locking). Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity. This patch adds the necessary support on kernel side. The complimentary compiler support was added in gcc revision 231296. We've used this support to build syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months: https://github.com/google/syzkaller/wiki/Found-Bugs We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire. Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input). In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible. kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible. Based on a patch by Quentin Casasnovas. [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode'] [akpm@linux-foundation.org: unbreak allmodconfig] [akpm@linux-foundation.org: follow x86 Makefile layout standards] Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: syzkaller <syzkaller@googlegroups.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tavis Ormandy <taviso@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@google.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: David Drysdale <drysdale@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-23 05:27:30 +08:00
config ARCH_HAS_KCOV
bool
help
KCOV does not have any arch-specific code, but currently it is enabled
only for x86_64. KCOV requires testing on other archs, and most likely
disabling of instrumentation for some early boot code.
config CC_HAS_SANCOV_TRACE_PC
def_bool $(cc-option,-fsanitize-coverage=trace-pc)
kernel: add kcov code coverage kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support. kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic or non-interesting parts of kernel is disbled (e.g. scheduler, locking). Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity. This patch adds the necessary support on kernel side. The complimentary compiler support was added in gcc revision 231296. We've used this support to build syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months: https://github.com/google/syzkaller/wiki/Found-Bugs We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire. Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input). In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible. kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible. Based on a patch by Quentin Casasnovas. [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode'] [akpm@linux-foundation.org: unbreak allmodconfig] [akpm@linux-foundation.org: follow x86 Makefile layout standards] Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: syzkaller <syzkaller@googlegroups.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tavis Ormandy <taviso@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@google.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: David Drysdale <drysdale@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-23 05:27:30 +08:00
config KCOV
bool "Code coverage for fuzzing"
depends on ARCH_HAS_KCOV
depends on CC_HAS_SANCOV_TRACE_PC || GCC_PLUGINS
kernel: add kcov code coverage kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support. kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic or non-interesting parts of kernel is disbled (e.g. scheduler, locking). Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity. This patch adds the necessary support on kernel side. The complimentary compiler support was added in gcc revision 231296. We've used this support to build syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months: https://github.com/google/syzkaller/wiki/Found-Bugs We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire. Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input). In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible. kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible. Based on a patch by Quentin Casasnovas. [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode'] [akpm@linux-foundation.org: unbreak allmodconfig] [akpm@linux-foundation.org: follow x86 Makefile layout standards] Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: syzkaller <syzkaller@googlegroups.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tavis Ormandy <taviso@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@google.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: David Drysdale <drysdale@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-23 05:27:30 +08:00
select DEBUG_FS
select GCC_PLUGIN_SANCOV if !CC_HAS_SANCOV_TRACE_PC
kernel: add kcov code coverage kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support. kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic or non-interesting parts of kernel is disbled (e.g. scheduler, locking). Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity. This patch adds the necessary support on kernel side. The complimentary compiler support was added in gcc revision 231296. We've used this support to build syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months: https://github.com/google/syzkaller/wiki/Found-Bugs We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire. Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input). In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible. kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible. Based on a patch by Quentin Casasnovas. [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode'] [akpm@linux-foundation.org: unbreak allmodconfig] [akpm@linux-foundation.org: follow x86 Makefile layout standards] Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: syzkaller <syzkaller@googlegroups.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tavis Ormandy <taviso@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@google.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: David Drysdale <drysdale@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-23 05:27:30 +08:00
help
KCOV exposes kernel code coverage information in a form suitable
for coverage-guided fuzzing (randomized testing).
If RANDOMIZE_BASE is enabled, PC values will not be stable across
different machines and across reboots. If you need stable PC values,
disable RANDOMIZE_BASE.
For more details, see Documentation/dev-tools/kcov.rst.
kernel: add kcov code coverage kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support. kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic or non-interesting parts of kernel is disbled (e.g. scheduler, locking). Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity. This patch adds the necessary support on kernel side. The complimentary compiler support was added in gcc revision 231296. We've used this support to build syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months: https://github.com/google/syzkaller/wiki/Found-Bugs We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire. Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input). In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible. kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible. Based on a patch by Quentin Casasnovas. [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode'] [akpm@linux-foundation.org: unbreak allmodconfig] [akpm@linux-foundation.org: follow x86 Makefile layout standards] Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: syzkaller <syzkaller@googlegroups.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tavis Ormandy <taviso@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@google.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: David Drysdale <drysdale@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-23 05:27:30 +08:00
config KCOV_ENABLE_COMPARISONS
bool "Enable comparison operands collection by KCOV"
depends on KCOV
depends on $(cc-option,-fsanitize-coverage=trace-cmp)
help
KCOV also exposes operands of every comparison in the instrumented
code along with operand sizes and PCs of the comparison instructions.
These operands can be used by fuzzing engines to improve the quality
of fuzzing coverage.
config KCOV_INSTRUMENT_ALL
bool "Instrument all code by default"
depends on KCOV
default y
help
If you are doing generic system call fuzzing (like e.g. syzkaller),
then you will want to instrument the whole kernel and you should
say y here. If you are doing more targeted fuzzing (like e.g.
filesystem fuzzing with AFL) then you will want to enable coverage
for more specific subsets of files, and should say n here.
config DEBUG_SHIRQ
bool "Debug shared IRQ handlers"
depends on DEBUG_KERNEL
help
Enable this to generate a spurious interrupt as soon as a shared
interrupt handler is registered, and just before one is deregistered.
Drivers ought to be able to handle interrupts coming in at those
points; some don't and need to be caught.
menu "Debug Lockups and Hangs"
lockup_detector: Combine nmi_watchdog and softlockup detector The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-08 05:11:44 +08:00
config LOCKUP_DETECTOR
bool
config SOFTLOCKUP_DETECTOR
bool "Detect Soft Lockups"
depends on DEBUG_KERNEL && !S390
select LOCKUP_DETECTOR
help
lockup_detector: Combine nmi_watchdog and softlockup detector The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-08 05:11:44 +08:00
Say Y here to enable the kernel to act as a watchdog to detect
soft lockups.
lockup_detector: Combine nmi_watchdog and softlockup detector The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-08 05:11:44 +08:00
Softlockups are bugs that cause the kernel to loop in kernel
mode for more than 20 seconds, without giving other tasks a
lockup_detector: Combine nmi_watchdog and softlockup detector The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-08 05:11:44 +08:00
chance to run. The current stack trace is displayed upon
detection and the system will stay locked up.
config BOOTPARAM_SOFTLOCKUP_PANIC
bool "Panic (Reboot) On Soft Lockups"
depends on SOFTLOCKUP_DETECTOR
help
Say Y here to enable the kernel to panic on "soft lockups",
which are bugs that cause the kernel to loop in kernel
mode for more than 20 seconds (configurable using the watchdog_thresh
sysctl), without giving other tasks a chance to run.
The panic can be used in combination with panic_timeout,
to cause the system to reboot automatically after a
lockup has been detected. This feature is useful for
high-availability systems that have uptime guarantees and
where a lockup must be resolved ASAP.
Say N if unsure.
config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
int
depends on SOFTLOCKUP_DETECTOR
range 0 1
default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
config HARDLOCKUP_DETECTOR_PERF
bool
select SOFTLOCKUP_DETECTOR
kernel/watchdog: Prevent false positives with turbo modes The hardlockup detector on x86 uses a performance counter based on unhalted CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the performance counter period, so the hrtimer should fire 2-3 times before the performance counter NMI fires. The NMI code checks whether the hrtimer fired since the last invocation. If not, it assumess a hard lockup. The calculation of those periods is based on the nominal CPU frequency. Turbo modes increase the CPU clock frequency and therefore shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x nominal frequency) the perf/NMI period is shorter than the hrtimer period which leads to false positives. A simple fix would be to shorten the hrtimer period, but that comes with the side effect of more frequent hrtimer and softlockup thread wakeups, which is not desired. Implement a low pass filter, which checks the perf/NMI period against kernel time. If the perf/NMI fires before 4/5 of the watchdog period has elapsed then the event is ignored and postponed to the next perf/NMI. That solves the problem and avoids the overhead of shorter hrtimer periods and more frequent softlockup thread wakeups. Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector") Reported-and-tested-by: Kan Liang <Kan.liang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: dzickus@redhat.com Cc: prarit@redhat.com Cc: ak@linux.intel.com Cc: babu.moger@oracle.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: acme@redhat.com Cc: stable@vger.kernel.org Cc: atomlin@redhat.com Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
2017-08-15 15:50:13 +08:00
#
# Enables a timestamp based low pass filter to compensate for perf based
# hard lockup detection which runs too fast due to turbo modes.
#
config HARDLOCKUP_CHECK_TIMESTAMP
bool
#
# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
# lockup detector rather than the perf based detector.
#
config HARDLOCKUP_DETECTOR
bool "Detect Hard Lockups"
depends on DEBUG_KERNEL && !S390
depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
select LOCKUP_DETECTOR
select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
select HARDLOCKUP_DETECTOR_ARCH if HAVE_HARDLOCKUP_DETECTOR_ARCH
help
Say Y here to enable the kernel to act as a watchdog to detect
hard lockups.
lockup_detector: Combine nmi_watchdog and softlockup detector The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-08 05:11:44 +08:00
Hardlockups are bugs that cause the CPU to loop in kernel mode
for more than 10 seconds, without letting other interrupts have a
lockup_detector: Combine nmi_watchdog and softlockup detector The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-08 05:11:44 +08:00
chance to run. The current stack trace is displayed upon detection
and the system will stay locked up.
config BOOTPARAM_HARDLOCKUP_PANIC
bool "Panic (Reboot) On Hard Lockups"
depends on HARDLOCKUP_DETECTOR
help
Say Y here to enable the kernel to panic on "hard lockups",
which are bugs that cause the kernel to loop in kernel
mode with interrupts disabled for more than 10 seconds (configurable
using the watchdog_thresh sysctl).
Say N if unsure.
config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
int
depends on HARDLOCKUP_DETECTOR
range 0 1
default 0 if !BOOTPARAM_HARDLOCKUP_PANIC
default 1 if BOOTPARAM_HARDLOCKUP_PANIC
config DETECT_HUNG_TASK
bool "Detect Hung Tasks"
depends on DEBUG_KERNEL
default SOFTLOCKUP_DETECTOR
help
Say Y here to enable the kernel to detect "hung tasks",
which are bugs that cause the task to be stuck in
uninterruptible "D" state indefinitely.
When a hung task is detected, the kernel will print the
current stack trace (which you should report), but the
task will stay in uninterruptible state. If lockdep is
enabled then all held locks will also be reported. This
feature has negligible overhead.
[PATCH] slab: implement /proc/slab_allocators Implement /proc/slab_allocators. It produces output like: idr_layer_cache: 80 idr_pre_get+0x33/0x4e buffer_head: 2555 alloc_buffer_head+0x20/0x75 mm_struct: 9 mm_alloc+0x1e/0x42 mm_struct: 20 dup_mm+0x36/0x370 vm_area_struct: 384 dup_mm+0x18f/0x370 vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3 vm_area_struct: 1 split_vma+0x5a/0x10e vm_area_struct: 11 do_brk+0x206/0x2e2 vm_area_struct: 2 copy_vma+0xda/0x142 vm_area_struct: 9 setup_arg_pages+0x99/0x214 fs_cache: 8 copy_fs_struct+0x21/0x133 fs_cache: 29 copy_process+0xf38/0x10e3 files_cache: 30 alloc_files+0x1b/0xcf signal_cache: 81 copy_process+0xbaa/0x10e3 sighand_cache: 77 copy_process+0xe65/0x10e3 sighand_cache: 1 de_thread+0x4d/0x5f8 anon_vma: 241 anon_vma_prepare+0xd9/0xf3 size-2048: 1 add_sect_attrs+0x5f/0x145 size-2048: 2 journal_init_revoke+0x99/0x302 size-2048: 2 journal_init_revoke+0x137/0x302 size-2048: 2 journal_init_inode+0xf9/0x1c4 Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> DESC slab-leaks3-locking-fix EDESC From: Andrew Morton <akpm@osdl.org> Update for slab-remove-cachep-spinlock.patch Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexander Nyberg <alexn@telia.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 19:06:39 +08:00
config DEFAULT_HUNG_TASK_TIMEOUT
int "Default timeout for hung task detection (in seconds)"
depends on DETECT_HUNG_TASK
default 120
help
This option controls the default timeout (in seconds) used
to determine when a task has become non-responsive and should
be considered hung.
It can be adjusted at runtime via the kernel.hung_task_timeout_secs
sysctl or by writing a value to
/proc/sys/kernel/hung_task_timeout_secs.
SLUB: Support for performance statistics The statistics provided here allow the monitoring of allocator behavior but at the cost of some (minimal) loss of performance. Counters are placed in SLUB's per cpu data structure. The per cpu structure may be extended by the statistics to grow larger than one cacheline which will increase the cache footprint of SLUB. There is a compile option to enable/disable the inclusion of the runtime statistics and its off by default. The slabinfo tool is enhanced to support these statistics via two options: -D Switches the line of information displayed for a slab from size mode to activity mode. -A Sorts the slabs displayed by activity. This allows the display of the slabs most important to the performance of a certain load. -r Report option will report detailed statistics on Example (tbench load): slabinfo -AD ->Shows the most active slabs Name Objects Alloc Free %Fast skbuff_fclone_cache 33 111953835 111953835 99 99 :0000192 2666 5283688 5281047 99 99 :0001024 849 5247230 5246389 83 83 vm_area_struct 1349 119642 118355 91 22 :0004096 15 66753 66751 98 98 :0000064 2067 25297 23383 98 78 dentry 10259 28635 18464 91 45 :0000080 11004 18950 8089 98 98 :0000096 1703 12358 10784 99 98 :0000128 762 10582 9875 94 18 :0000512 184 9807 9647 95 81 :0002048 479 9669 9195 83 65 anon_vma 777 9461 9002 99 71 kmalloc-8 6492 9981 5624 99 97 :0000768 258 7174 6931 58 15 So the skbuff_fclone_cache is of highest importance for the tbench load. Pretty high load on the 192 sized slab. Look for the aliases slabinfo -a | grep 000192 :0000192 <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili Likely skbuff_head_cache. Looking into the statistics of the skbuff_fclone_cache is possible through slabinfo skbuff_fclone_cache ->-r option implied if cache name is mentioned .... Usual output ... Slab Perf Counter Alloc Free %Al %Fr -------------------------------------------------- Fastpath 111953360 111946981 99 99 Slowpath 1044 7423 0 0 Page Alloc 272 264 0 0 Add partial 25 325 0 0 Remove partial 86 264 0 0 RemoteObj/SlabFrozen 350 4832 0 0 Total 111954404 111954404 Flushes 49 Refill 0 Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%) Looks good because the fastpath is overwhelmingly taken. skbuff_head_cache: Slab Perf Counter Alloc Free %Al %Fr -------------------------------------------------- Fastpath 5297262 5259882 99 99 Slowpath 4477 39586 0 0 Page Alloc 937 824 0 0 Add partial 0 2515 0 0 Remove partial 1691 824 0 0 RemoteObj/SlabFrozen 2621 9684 0 0 Total 5301739 5299468 Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%) Descriptions of the output: Total: The total number of allocation and frees that occurred for a slab Fastpath: The number of allocations/frees that used the fastpath. Slowpath: Other allocations Page Alloc: Number of calls to the page allocator as a result of slowpath processing Add Partial: Number of slabs added to the partial list through free or alloc (occurs during cpuslab flushes) Remove Partial: Number of slabs removed from the partial list as a result of allocations retrieving a partial slab or by a free freeing the last object of a slab. RemoteObj/Froz: How many times were remotely freed object encountered when a slab was about to be deactivated. Frozen: How many times was free able to skip list processing because the slab was in use as the cpuslab of another processor. Flushes: Number of times the cpuslab was flushed on request (kmem_cache_shrink, may result from races in __slab_alloc) Refill: Number of times we were able to refill the cpuslab from remotely freed objects for the same slab. Deactivate: Statistics how slabs were deactivated. Shows how they were put onto the partial list. In general fastpath is very good. Slowpath without partial list processing is also desirable. Any touching of partial list uses node specific locks which may potentially cause list lock contention. Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-02-08 09:47:41 +08:00
A timeout of 0 disables the check. The default is two minutes.
Keeping the default should be fine in most cases.
config BOOTPARAM_HUNG_TASK_PANIC
bool "Panic (Reboot) On Hung Tasks"
depends on DETECT_HUNG_TASK
help
Say Y here to enable the kernel to panic on "hung tasks",
which are bugs that cause the kernel to leave a task stuck
in uninterruptible "D" state.
The panic can be used in combination with panic_timeout,
to cause the system to reboot automatically after a
hung task has been detected. This feature is useful for
high-availability systems that have uptime guarantees and
where a hung tasks must be resolved ASAP.
Say N if unsure.
config BOOTPARAM_HUNG_TASK_PANIC_VALUE
int
depends on DETECT_HUNG_TASK
range 0 1
default 0 if !BOOTPARAM_HUNG_TASK_PANIC
default 1 if BOOTPARAM_HUNG_TASK_PANIC
workqueue: implement lockup detector Workqueue stalls can happen from a variety of usage bugs such as missing WQ_MEM_RECLAIM flag or concurrency managed work item indefinitely staying RUNNING. These stalls can be extremely difficult to hunt down because the usual warning mechanisms can't detect workqueue stalls and the internal state is pretty opaque. To alleviate the situation, this patch implements workqueue lockup detector. It periodically monitors all worker_pools periodically and, if any pool failed to make forward progress longer than the threshold duration, triggers warning and dumps workqueue state as follows. BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s! Showing busy workqueues and worker pools: workqueue events: flags=0x0 pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256 pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent workqueue events_power_efficient: flags=0x80 pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 pending: check_lifetime, neigh_periodic_work workqueue cgroup_pidlist_destroy: flags=0x0 pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 pending: cgroup_pidlist_destroy_work_fn ... The detection mechanism is controller through kernel parameter workqueue.watchdog_thresh and can be updated at runtime through the sysfs module parameter file. v2: Decoupled from softlockup control knobs. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Andrew Morton <akpm@linux-foundation.org>
2015-12-09 00:28:04 +08:00
config WQ_WATCHDOG
bool "Detect Workqueue Stalls"
depends on DEBUG_KERNEL
help
Say Y here to enable stall detection on workqueues. If a
worker pool doesn't make forward progress on a pending work
item for over a given amount of time, 30s by default, a
warning message is printed along with dump of workqueue
state. This can be configured through kernel parameter
"workqueue.watchdog_thresh" and its sysfs counterpart.
endmenu # "Debug lockups and hangs"
config PANIC_ON_OOPS
bool "Panic on Oops"
help
Say Y here to enable the kernel to panic when it oopses. This
has the same effect as setting oops=panic on the kernel command
line.
This feature is useful to ensure that the kernel does not do
anything erroneous after an oops which could result in data
corruption or other issues.
Say N if unsure.
config PANIC_ON_OOPS_VALUE
int
range 0 1
default 0 if !PANIC_ON_OOPS
default 1 if PANIC_ON_OOPS
config PANIC_TIMEOUT
int "panic timeout"
default 0
help
Set the timeout value (in seconds) until a reboot occurs when the
the kernel panics. If n = 0, then we wait forever. A timeout
value n > 0 will wait n seconds before rebooting, while a timeout
value n < 0 will reboot immediately.
config SCHED_DEBUG
bool "Collect scheduler debugging info"
depends on DEBUG_KERNEL && PROC_FS
default y
help
If you say Y here, the /proc/sched_debug file will be provided
that can help debug the scheduler. The runtime overhead of this
option is minimal.
config SCHED_INFO
bool
default n
config SCHEDSTATS
bool "Collect scheduler statistics"
depends on DEBUG_KERNEL && PROC_FS
select SCHED_INFO
help
If you say Y here, additional code will be inserted into the
scheduler and related routines to collect statistics about
scheduler behavior and provide them in /proc/schedstat. These
stats may be useful for both tuning and debugging the scheduler
If you aren't debugging the scheduler or trying to tune a specific
application, you can say N to avoid the very slight overhead
this adds.
sched: Add default-disabled option to BUG() when stack end location is overwritten Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. This patch checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: oleg@redhat.com Cc: riel@redhat.com Cc: prarit@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: mpe@ellerman.id.au Cc: tglx@linutronix.de Cc: rostedt@goodmis.org Cc: hannes@cmpxchg.org Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: David S. Miller <davem@davemloft.net> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lubomir Rintel <lkundrak@v3.sk> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1410527779-8133-4-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-12 21:16:19 +08:00
config SCHED_STACK_END_CHECK
bool "Detect stack corruption on calls to schedule()"
depends on DEBUG_KERNEL
default n
help
This option checks for a stack overrun on calls to schedule().
If the stack end location is found to be over written always panic as
the content of the corrupted region can no longer be trusted.
This is to ensure no erroneous behaviour occurs which could result in
data corruption or a sporadic crash at a later stage once the region
is examined. The runtime overhead introduced is minimal.
config DEBUG_TIMEKEEPING
bool "Enable extra timekeeping sanity checking"
help
This option will enable additional timekeeping sanity checks
which may be helpful when diagnosing issues where timekeeping
problems are suspected.
This may include checks in the timekeeping hotpaths, so this
option may have a (very small) performance impact to some
workloads.
If unsure, say N.
config DEBUG_PREEMPT
bool "Debug preemptible kernel"
depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
default y
help
If you say Y here then the kernel will use a debug variant of the
commonly used smp_processor_id() function and will print warnings
if kernel code uses it in a preemption-unsafe way. Also, the kernel
will detect preemption count underflows.
menu "Lock Debugging (spinlocks, mutexes, etc...)"
config LOCK_DEBUGGING_SUPPORT
bool
depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
default y
config PROVE_LOCKING
bool "Lock debugging: prove locking correctness"
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
select LOCKDEP
select DEBUG_SPINLOCK
select DEBUG_MUTEXES
select DEBUG_RT_MUTEXES if RT_MUTEXES
select DEBUG_RWSEMS if RWSEM_SPIN_ON_OWNER
select DEBUG_WW_MUTEX_SLOWPATH
select DEBUG_LOCK_ALLOC
select TRACE_IRQFLAGS
default n
help
This feature enables the kernel to prove that all locking
that occurs in the kernel runtime is mathematically
correct: that under no circumstance could an arbitrary (and
not yet triggered) combination of observed locking
sequences (on an arbitrary number of CPUs, running an
arbitrary number of tasks and interrupt contexts) cause a
deadlock.
In short, this feature enables the kernel to report locking
related deadlocks before they actually occur.
The proof does not depend on how hard and complex a
deadlock scenario would be to trigger: how many
participant CPUs, tasks and irq-contexts would be needed
for it to trigger. The proof also does not depend on
timing: if a race and a resulting deadlock is possible
theoretically (no matter how unlikely the race scenario
is), it will be proven so and will immediately be
reported by the kernel (once the event is observed that
makes the deadlock theoretically possible).
If a deadlock is impossible (i.e. the locking rules, as
observed by the kernel, are mathematically correct), the
kernel reports nothing.
NOTE: this feature can also be enabled for rwlocks, mutexes
and rwsems - in which case all dependencies between these
different locking variants are observed and mapped too, and
the proof of observed correctness is also maintained for an
arbitrary combination of these separate locking variants.
For more details, see Documentation/locking/lockdep-design.txt.
config LOCK_STAT
bool "Lock usage statistics"
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
select LOCKDEP
select DEBUG_SPINLOCK
select DEBUG_MUTEXES
select DEBUG_RT_MUTEXES if RT_MUTEXES
select DEBUG_LOCK_ALLOC
default n
help
This feature enables tracking lock contention points
For more details, see Documentation/locking/lockstat.txt
This also enables lock events required by "perf lock",
subcommand of perf.
If you want to use "perf lock", you also need to turn on
CONFIG_EVENT_TRACING.
CONFIG_LOCK_STAT defines "contended" and "acquired" lock events.
(CONFIG_LOCKDEP defines "acquire" and "release" events.)
config DEBUG_RT_MUTEXES
bool "RT Mutex debugging, deadlock detection"
depends on DEBUG_KERNEL && RT_MUTEXES
help
This allows rt mutex semantics violations and rt mutex related
deadlocks (lockups) to be detected and reported automatically.
config DEBUG_SPINLOCK
bool "Spinlock and rw-lock debugging: basic checks"
depends on DEBUG_KERNEL
select UNINLINE_SPIN_UNLOCK
help
Say Y here and build SMP to catch missing spinlock initialization
and certain other kinds of spinlock errors commonly made. This is
best used in conjunction with the NMI watchdog so that spinlock
deadlocks are also debuggable.
config DEBUG_MUTEXES
bool "Mutex debugging: basic checks"
depends on DEBUG_KERNEL
help
This feature allows mutex semantics violations to be detected and
reported.
mutex: Add w/w mutex slowpath debugging Injects EDEADLK conditions at pseudo-random interval, with exponential backoff up to UINT_MAX (to ensure that every lock operation still completes in a reasonable time). This way we can test the wound slowpath even for ww mutex users where contention is never expected, and the ww deadlock avoidance algorithm is only needed for correctness against malicious userspace. An example would be protecting kernel modesetting properties, which thanks to single-threaded X isn't really expected to contend, ever. I've looked into using the CONFIG_FAULT_INJECTION infrastructure, but decided against it for two reasons: - EDEADLK handling is mandatory for ww mutex users and should never affect the outcome of a syscall. This is in contrast to -ENOMEM injection. So fine configurability isn't required. - The fault injection framework only allows to set a simple probability for failure. Now the probability that a ww mutex acquire stage with N locks will never complete (due to too many injected EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock operations for the completely uncontended case would be O(exp(N)). The per-acuiqire ctx exponential backoff solution choosen here only results in O(log N) overhead due to injection and so O(log N * N) lock operations. This way we can fail with high probability (and so have good test coverage even for fancy backoff and lock acquisition paths) without running into patalogical cases. Note that EDEADLK will only ever be injected when we managed to acquire the lock. This prevents any behaviour changes for users which rely on the EALREADY semantics. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: rostedt@goodmis.org Cc: daniel@ffwll.ch Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 19:31:17 +08:00
config DEBUG_WW_MUTEX_SLOWPATH
bool "Wait/wound mutex debugging: Slowpath testing"
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
mutex: Add w/w mutex slowpath debugging Injects EDEADLK conditions at pseudo-random interval, with exponential backoff up to UINT_MAX (to ensure that every lock operation still completes in a reasonable time). This way we can test the wound slowpath even for ww mutex users where contention is never expected, and the ww deadlock avoidance algorithm is only needed for correctness against malicious userspace. An example would be protecting kernel modesetting properties, which thanks to single-threaded X isn't really expected to contend, ever. I've looked into using the CONFIG_FAULT_INJECTION infrastructure, but decided against it for two reasons: - EDEADLK handling is mandatory for ww mutex users and should never affect the outcome of a syscall. This is in contrast to -ENOMEM injection. So fine configurability isn't required. - The fault injection framework only allows to set a simple probability for failure. Now the probability that a ww mutex acquire stage with N locks will never complete (due to too many injected EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock operations for the completely uncontended case would be O(exp(N)). The per-acuiqire ctx exponential backoff solution choosen here only results in O(log N) overhead due to injection and so O(log N * N) lock operations. This way we can fail with high probability (and so have good test coverage even for fancy backoff and lock acquisition paths) without running into patalogical cases. Note that EDEADLK will only ever be injected when we managed to acquire the lock. This prevents any behaviour changes for users which rely on the EALREADY semantics. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: rostedt@goodmis.org Cc: daniel@ffwll.ch Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 19:31:17 +08:00
select DEBUG_LOCK_ALLOC
select DEBUG_SPINLOCK
select DEBUG_MUTEXES
help
This feature enables slowpath testing for w/w mutex users by
injecting additional -EDEADLK wound/backoff cases. Together with
the full mutex checks enabled with (CONFIG_PROVE_LOCKING) this
will test all possible w/w mutex interface abuse with the
exception of simply not acquiring all the required locks.
Note that this feature can introduce significant overhead, so
it really should not be enabled in a production or distro kernel,
even a debug kernel. If you are a driver writer, enable it. If
you are a distro, do not.
mutex: Add w/w mutex slowpath debugging Injects EDEADLK conditions at pseudo-random interval, with exponential backoff up to UINT_MAX (to ensure that every lock operation still completes in a reasonable time). This way we can test the wound slowpath even for ww mutex users where contention is never expected, and the ww deadlock avoidance algorithm is only needed for correctness against malicious userspace. An example would be protecting kernel modesetting properties, which thanks to single-threaded X isn't really expected to contend, ever. I've looked into using the CONFIG_FAULT_INJECTION infrastructure, but decided against it for two reasons: - EDEADLK handling is mandatory for ww mutex users and should never affect the outcome of a syscall. This is in contrast to -ENOMEM injection. So fine configurability isn't required. - The fault injection framework only allows to set a simple probability for failure. Now the probability that a ww mutex acquire stage with N locks will never complete (due to too many injected EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock operations for the completely uncontended case would be O(exp(N)). The per-acuiqire ctx exponential backoff solution choosen here only results in O(log N) overhead due to injection and so O(log N * N) lock operations. This way we can fail with high probability (and so have good test coverage even for fancy backoff and lock acquisition paths) without running into patalogical cases. Note that EDEADLK will only ever be injected when we managed to acquire the lock. This prevents any behaviour changes for users which rely on the EALREADY semantics. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org Cc: rostedt@goodmis.org Cc: daniel@ffwll.ch Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 19:31:17 +08:00
config DEBUG_RWSEMS
bool "RW Semaphore debugging: basic checks"
depends on DEBUG_KERNEL && RWSEM_SPIN_ON_OWNER
help
This debugging feature allows mismatched rw semaphore locks and unlocks
to be detected and reported.
config DEBUG_LOCK_ALLOC
bool "Lock debugging: detect incorrect freeing of live locks"
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
select DEBUG_SPINLOCK
select DEBUG_MUTEXES
select DEBUG_RT_MUTEXES if RT_MUTEXES
select LOCKDEP
help
This feature will check whether any held lock (spinlock, rwlock,
mutex or rwsem) is incorrectly freed by the kernel, via any of the
memory-freeing routines (kfree(), kmem_cache_free(), free_pages(),
vfree(), etc.), whether a live lock is incorrectly reinitialized via
spin_lock_init()/mutex_init()/etc., or whether there is any lock
held during task exit.
config LOCKDEP
bool
depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
select STACKTRACE
arch: remove obsolete architecture ports This removes the entire architecture code for blackfin, cris, frv, m32r, metag, mn10300, score, and tile, including the associated device drivers. I have been working with the (former) maintainers for each one to ensure that my interpretation was right and the code is definitely unused in mainline kernels. Many had fond memories of working on the respective ports to start with and getting them included in upstream, but also saw no point in keeping the port alive without any users. In the end, it seems that while the eight architectures are extremely different, they all suffered the same fate: There was one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem, which was more costly than licensing newer off-the-shelf CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems that all the SoC product lines are still around, but have not used the custom CPU architectures for several years at this point. In contrast, CPU instruction sets that remain popular and have actively maintained kernel ports tend to all be used across multiple licensees. The removal came out of a discussion that is now documented at https://lwn.net/Articles/748074/. Unlike the original plans, I'm not marking any ports as deprecated but remove them all at once after I made sure that they are all unused. Some architectures (notably tile, mn10300, and blackfin) are still being shipped in products with old kernels, but those products will never be updated to newer kernel releases. After this series, we still have a few architectures without mainline gcc support: - unicore32 and hexagon both have very outdated gcc releases, but the maintainers promised to work on providing something newer. At least in case of hexagon, this will only be llvm, not gcc. - openrisc, risc-v and nds32 are still in the process of finishing their support or getting it added to mainline gcc in the first place. They all have patched gcc-7.3 ports that work to some degree, but complete upstream support won't happen before gcc-8.1. Csky posted their first kernel patch set last week, their situation will be similar. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJawdL2AAoJEGCrR//JCVInuH0P/RJAZh1nTD+TR34ZhJq2TBoo PgygwDU7Z2+tQVU+EZ453Gywz9/NMRFk1RWAZqrLix4ZtyIMvC6A1qfT2yH1Y7Fb Qh6tccQeLe4ezq5u4S/46R/fQXu3Txr92yVwzJJUuPyU0arF9rv5MmI8e6p7L1en yb74kSEaCe+/eMlsEj1Cc1dgthDNXGKIURHkRsILoweysCpesjiTg4qDcL+yTibV FP2wjVbniKESMKS6qL71tiT5sexvLsLwMNcGiHPj94qCIQuI7DLhLdBVsL5Su6gI sbtgv0dsq4auRYAbQdMaH1hFvu6WptsuttIbOMnz2Yegi2z28H8uVXkbk2WVLbqG ZESUwutGh8MzOL2RJ4jyyQq5sfo++CRGlfKjr6ImZRv03dv0pe/W85062cK5cKNs cgDDJjGRorOXW7dyU6jG2gRqODOQBObIv3w5efdq5OgzOWlbI4EC+Y5u1Z0JF/76 pSwtGXA6YhwC+9LLAlnVTHG+yOwuLmAICgoKcTbzTVDKA2YQZG/cYuQfI5S1wD8e X6urPx3Md2GCwLXQ9mzKBzKZUpu/Tuhx0NvwF4qVxy6x1PELjn68zuP7abDHr46r 57/09ooVN+iXXnEGMtQVS/OPvYHSa2NgTSZz6Y86lCRbZmUOOlK31RDNlMvYNA+s 3iIVHovno/JuJnTOE8LY =fQ8z -----END PGP SIGNATURE----- Merge tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pul removal of obsolete architecture ports from Arnd Bergmann: "This removes the entire architecture code for blackfin, cris, frv, m32r, metag, mn10300, score, and tile, including the associated device drivers. I have been working with the (former) maintainers for each one to ensure that my interpretation was right and the code is definitely unused in mainline kernels. Many had fond memories of working on the respective ports to start with and getting them included in upstream, but also saw no point in keeping the port alive without any users. In the end, it seems that while the eight architectures are extremely different, they all suffered the same fate: There was one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem, which was more costly than licensing newer off-the-shelf CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems that all the SoC product lines are still around, but have not used the custom CPU architectures for several years at this point. In contrast, CPU instruction sets that remain popular and have actively maintained kernel ports tend to all be used across multiple licensees. [ See the new nds32 port merged in the previous commit for the next generation of "one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem" - Linus ] The removal came out of a discussion that is now documented at https://lwn.net/Articles/748074/. Unlike the original plans, I'm not marking any ports as deprecated but remove them all at once after I made sure that they are all unused. Some architectures (notably tile, mn10300, and blackfin) are still being shipped in products with old kernels, but those products will never be updated to newer kernel releases. After this series, we still have a few architectures without mainline gcc support: - unicore32 and hexagon both have very outdated gcc releases, but the maintainers promised to work on providing something newer. At least in case of hexagon, this will only be llvm, not gcc. - openrisc, risc-v and nds32 are still in the process of finishing their support or getting it added to mainline gcc in the first place. They all have patched gcc-7.3 ports that work to some degree, but complete upstream support won't happen before gcc-8.1. Csky posted their first kernel patch set last week, their situation will be similar [ Palmer Dabbelt points out that RISC-V support is in mainline gcc since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]" This really says it all: 2498 files changed, 95 insertions(+), 467668 deletions(-) * tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits) MAINTAINERS: UNICORE32: Change email account staging: iio: remove iio-trig-bfin-timer driver tty: hvc: remove tile driver tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers serial: remove tile uart driver serial: remove m32r_sio driver serial: remove blackfin drivers serial: remove cris/etrax uart drivers usb: Remove Blackfin references in USB support usb: isp1362: remove blackfin arch glue usb: musb: remove blackfin port usb: host: remove tilegx platform glue pwm: remove pwm-bfin driver i2c: remove bfin-twi driver spi: remove blackfin related host drivers watchdog: remove bfin_wdt driver can: remove bfin_can driver mmc: remove bfin_sdh driver input: misc: remove blackfin rotary driver input: keyboard: remove bf54x driver ...
2018-04-03 11:20:12 +08:00
select FRAME_POINTER if !MIPS && !PPC && !ARM_UNWIND && !S390 && !MICROBLAZE && !ARC && !X86
select KALLSYMS
select KALLSYMS_ALL
config LOCKDEP_SMALL
bool
config DEBUG_LOCKDEP
bool "Lock dependency engine debugging"
depends on DEBUG_KERNEL && LOCKDEP
help
If you say Y here, the lock dependency engine will do
additional runtime checks to debug itself, at the price
of more runtime overhead.
config DEBUG_ATOMIC_SLEEP
bool "Sleep inside atomic section checking"
select PREEMPT_COUNT
depends on DEBUG_KERNEL
depends on !ARCH_NO_PREEMPT
help
If you say Y here, various routines which may sleep will become very
noisy if they are called inside atomic sections: when a spinlock is
held, inside an rcu read side critical section, inside preempt disabled
sections, inside an interrupt, etc...
[PATCH] lockdep: locking API self tests Introduce DEBUG_LOCKING_API_SELFTESTS, which uses the generic lock debugging code's silent-failure feature to run a matrix of testcases. There are 210 testcases currently: +----------------------- | Locking API testsuite: +------------------------------+------+------+------+------+------+------+ | spin |wlock |rlock |mutex | wsem | rsem | -------------------------------+------+------+------+------+------+------+ A-A deadlock: ok | ok | ok | ok | ok | ok | A-B-B-A deadlock: ok | ok | ok | ok | ok | ok | A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok | A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok | A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok | A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok | A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok | double unlock: ok | ok | ok | ok | ok | ok | bad unlock order: ok | ok | ok | ok | ok | ok | --------------------------------------+------+------+------+------+------+ recursive read-lock: | ok | | ok | --------------------------------------+------+------+------+------+------+ non-nested unlock: ok | ok | ok | ok | --------------------------------------+------+------+------+ hard-irqs-on + irq-safe-A/12: ok | ok | ok | soft-irqs-on + irq-safe-A/12: ok | ok | ok | hard-irqs-on + irq-safe-A/21: ok | ok | ok | soft-irqs-on + irq-safe-A/21: ok | ok | ok | sirq-safe-A => hirqs-on/12: ok | ok | ok | sirq-safe-A => hirqs-on/21: ok | ok | ok | hard-safe-A + irqs-on/12: ok | ok | ok | soft-safe-A + irqs-on/12: ok | ok | ok | hard-safe-A + irqs-on/21: ok | ok | ok | soft-safe-A + irqs-on/21: ok | ok | ok | hard-safe-A + unsafe-B #1/123: ok | ok | ok | soft-safe-A + unsafe-B #1/123: ok | ok | ok | hard-safe-A + unsafe-B #1/132: ok | ok | ok | soft-safe-A + unsafe-B #1/132: ok | ok | ok | hard-safe-A + unsafe-B #1/213: ok | ok | ok | soft-safe-A + unsafe-B #1/213: ok | ok | ok | hard-safe-A + unsafe-B #1/231: ok | ok | ok | soft-safe-A + unsafe-B #1/231: ok | ok | ok | hard-safe-A + unsafe-B #1/312: ok | ok | ok | soft-safe-A + unsafe-B #1/312: ok | ok | ok | hard-safe-A + unsafe-B #1/321: ok | ok | ok | soft-safe-A + unsafe-B #1/321: ok | ok | ok | hard-safe-A + unsafe-B #2/123: ok | ok | ok | soft-safe-A + unsafe-B #2/123: ok | ok | ok | hard-safe-A + unsafe-B #2/132: ok | ok | ok | soft-safe-A + unsafe-B #2/132: ok | ok | ok | hard-safe-A + unsafe-B #2/213: ok | ok | ok | soft-safe-A + unsafe-B #2/213: ok | ok | ok | hard-safe-A + unsafe-B #2/231: ok | ok | ok | soft-safe-A + unsafe-B #2/231: ok | ok | ok | hard-safe-A + unsafe-B #2/312: ok | ok | ok | soft-safe-A + unsafe-B #2/312: ok | ok | ok | hard-safe-A + unsafe-B #2/321: ok | ok | ok | soft-safe-A + unsafe-B #2/321: ok | ok | ok | hard-irq lock-inversion/123: ok | ok | ok | soft-irq lock-inversion/123: ok | ok | ok | hard-irq lock-inversion/132: ok | ok | ok | soft-irq lock-inversion/132: ok | ok | ok | hard-irq lock-inversion/213: ok | ok | ok | soft-irq lock-inversion/213: ok | ok | ok | hard-irq lock-inversion/231: ok | ok | ok | soft-irq lock-inversion/231: ok | ok | ok | hard-irq lock-inversion/312: ok | ok | ok | soft-irq lock-inversion/312: ok | ok | ok | hard-irq lock-inversion/321: ok | ok | ok | soft-irq lock-inversion/321: ok | ok | ok | hard-irq read-recursion/123: ok | soft-irq read-recursion/123: ok | hard-irq read-recursion/132: ok | soft-irq read-recursion/132: ok | hard-irq read-recursion/213: ok | soft-irq read-recursion/213: ok | hard-irq read-recursion/231: ok | soft-irq read-recursion/231: ok | hard-irq read-recursion/312: ok | soft-irq read-recursion/312: ok | hard-irq read-recursion/321: ok | soft-irq read-recursion/321: ok | --------------------------------+-----+---------------- Good, all 210 testcases passed! | --------------------------------+ Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:24:48 +08:00
config DEBUG_LOCKING_API_SELFTESTS
bool "Locking API boot-time self-tests"
depends on DEBUG_KERNEL
help
Say Y here if you want the kernel to run a short self-test during
bootup. The self-test checks whether common types of locking bugs
are detected by debugging mechanisms or not. (if you disable
lock debugging then those bugs wont be detected of course.)
The following locking APIs are covered: spinlocks, rwlocks,
mutexes and rwsems.
config LOCK_TORTURE_TEST
tristate "torture tests for locking"
depends on DEBUG_KERNEL
select TORTURE_TEST
help
This option provides a kernel module that runs torture tests
on kernel locking primitives. The kernel module may be built
after the fact on the running kernel to be tested, if desired.
Say Y here if you want kernel locking-primitive torture tests
to be built into the kernel.
Say M if you want these torture tests to build as a module.
Say N if you are unsure.
config WW_MUTEX_SELFTEST
tristate "Wait/wound mutex selftests"
help
This option provides a kernel module that runs tests on the
on the struct ww_mutex locking API.
It is recommended to enable DEBUG_WW_MUTEX_SLOWPATH in conjunction
with this test harness.
Say M if you want these self tests to build as a module.
Say N if you are unsure.
endmenu # lock debugging
config TRACE_IRQFLAGS
bool
help
Enables hooks to interrupt enabling and disabling for
either tracing or lock debugging.
config STACKTRACE
bool "Stack backtrace support"
depends on STACKTRACE_SUPPORT
help
This option causes the kernel to create a /proc/pid/stack for
every process, showing its current stack trace.
It is also used by various kernel debugging features that require
stack trace generation.
config WARN_ALL_UNSEEDED_RANDOM
bool "Warn for all uses of unseeded randomness"
default n
help
Some parts of the kernel contain bugs relating to their use of
cryptographically secure random numbers before it's actually possible
to generate those numbers securely. This setting ensures that these
flaws don't go unnoticed, by enabling a message, should this ever
occur. This will allow people with obscure setups to know when things
are going wrong, so that they might contact developers about fixing
it.
Unfortunately, on some models of some architectures getting
a fully seeded CRNG is extremely difficult, and so this can
result in dmesg getting spammed for a surprisingly long
time. This is really bad from a security perspective, and
so architecture maintainers really need to do what they can
to get the CRNG seeded sooner after the system is booted.
However, since users can not do anything actionble to
address this, by default the kernel will issue only a single
warning for the first use of unseeded randomness.
Say Y here if you want to receive warnings for all uses of
unseeded randomness. This will be of use primarily for
those developers interersted in improving the security of
Linux kernels running on their architecture (or
subarchitecture).
config DEBUG_KOBJECT
bool "kobject debugging"
depends on DEBUG_KERNEL
help
If you say Y here, some extra kobject debugging messages will be sent
to the syslog.
config DEBUG_KOBJECT_RELEASE
bool "kobject release debugging"
depends on DEBUG_OBJECTS_TIMERS
help
kobjects are reference counted objects. This means that their
last reference count put is not predictable, and the kobject can
live on past the point at which a driver decides to drop it's
initial reference to the kobject gained on allocation. An
example of this would be a struct device which has just been
unregistered.
However, some buggy drivers assume that after such an operation,
the memory backing the kobject can be immediately freed. This
goes completely against the principles of a refcounted object.
If you say Y here, the kernel will delay the release of kobjects
on the last reference count to improve the visibility of this
kind of kobject release bug.
config HAVE_DEBUG_BUGVERBOSE
bool
config DEBUG_BUGVERBOSE
bool "Verbose BUG() reporting (adds 70K)" if DEBUG_KERNEL && EXPERT
depends on BUG && (GENERIC_BUG || HAVE_DEBUG_BUGVERBOSE)
default y
help
Say Y here to make BUG() panics output the file name and line number
of the BUG call as well as the EIP and oops trace. This aids
debugging but costs about 70-100K of memory.
config DEBUG_LIST
bool "Debug linked list manipulation"
depends on DEBUG_KERNEL || BUG_ON_DATA_CORRUPTION
help
Enable this to turn on extended checks in the linked-list
walking routines.
If unsure, say N.
config DEBUG_PI_LIST
bool "Debug priority linked list manipulation"
depends on DEBUG_KERNEL
help
Enable this to turn on extended checks in the priority-ordered
linked-list (plist) walking routines. This checks the entire
list multiple times during each manipulation.
If unsure, say N.
config DEBUG_SG
bool "Debug SG table operations"
depends on DEBUG_KERNEL
help
Enable this to turn on checks on scatter-gather tables. This can
help find problems with drivers that do not properly initialize
their sg tables.
If unsure, say N.
config DEBUG_NOTIFIERS
bool "Debug notifier call chains"
depends on DEBUG_KERNEL
help
Enable this to turn on sanity checking for notifier call chains.
This is most useful for kernel developers to make sure that
modules properly unregister themselves from notifier chains.
This is a relatively cheap check but if you care about maximum
performance, say N.
config DEBUG_CREDENTIALS
bool "Debug credential management"
depends on DEBUG_KERNEL
help
Enable this to turn on some debug checking for credential
management. The additional code keeps track of the number of
pointers from task_structs to any given cred struct, and checks to
see that this number never exceeds the usage count of the cred
struct.
Furthermore, if SELinux is enabled, this also checks that the
security pointer in the cred struct is never seen to be invalid.
If unsure, say N.
source "kernel/rcu/Kconfig.debug"
config DEBUG_WQ_FORCE_RR_CPU
bool "Force round-robin CPU selection for unbound work items"
depends on DEBUG_KERNEL
default n
help
Workqueue used to implicitly guarantee that work items queued
without explicit CPU specified are put on the local CPU. This
guarantee is no longer true and while local CPU is still
preferred work items may be put on foreign CPUs. Kernel
parameter "workqueue.debug_force_rr_cpu" is added to force
round-robin CPU selection to flush out usages which depend on the
now broken guarantee. This config option enables the debug
feature by default. When enabled, memory and cache locality will
be impacted.
config DEBUG_BLOCK_EXT_DEVT
bool "Force extended block device numbers and spread them"
depends on DEBUG_KERNEL
depends on BLOCK
default n
help
BIG FAT WARNING: ENABLING THIS OPTION MIGHT BREAK BOOTING ON
SOME DISTRIBUTIONS. DO NOT ENABLE THIS UNLESS YOU KNOW WHAT
YOU ARE DOING. Distros, please enable this and fix whatever
is broken.
Conventionally, block device numbers are allocated from
predetermined contiguous area. However, extended block area
may introduce non-contiguous block device numbers. This
option forces most block device numbers to be allocated from
the extended space and spreads them to discover kernel or
userland code paths which assume predetermined contiguous
device number allocation.
Note that turning on this debug option shuffles all the
device numbers for all IDE and SCSI devices including libata
ones, so root partition specified using device number
directly (via rdev or root=MAJ:MIN) won't work anymore.
Textual device names (root=/dev/sdXn) will continue to work.
Say N if you are unsure.
config CPU_HOTPLUG_STATE_CONTROL
bool "Enable CPU hotplug state control"
depends on DEBUG_KERNEL
depends on HOTPLUG_CPU
default n
help
Allows to write steps between "offline" and "online" to the CPUs
sysfs target file so states can be stepped granular. This is a debug
option for now as the hotplug machinery cannot be stopped and
restarted at arbitrary points yet.
Say N if your are unsure.
fault-injection: notifier error injection This patchset provides kernel modules that can be used to test the error handling of notifier call chain failures by injecting artifical errors to the following notifier chain callbacks. * CPU notifier * PM notifier * memory hotplug notifier * powerpc pSeries reconfig notifier Example: Inject CPU offline error (-1 == -EPERM) # cd /sys/kernel/debug/notifier-error-inject/cpu # echo -1 > actions/CPU_DOWN_PREPARE/error # echo 0 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: Operation not permitted The patchset also adds cpu and memory hotplug tests to tools/testing/selftests These tests first do simple online and offline test and then do fault injection tests if notifier error injection module is available. This patch: The notifier error injection provides the ability to inject artifical errors to specified notifier chain callbacks. It is useful to test the error handling of notifier call chain failures. This adds common basic functions to define which type of events can be fail and to initialize the debugfs interface to control what error code should be returned and which event should be failed. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 05:43:02 +08:00
config NOTIFIER_ERROR_INJECTION
tristate "Notifier error injection"
depends on DEBUG_KERNEL
select DEBUG_FS
help
This option provides the ability to inject artificial errors to
fault-injection: notifier error injection This patchset provides kernel modules that can be used to test the error handling of notifier call chain failures by injecting artifical errors to the following notifier chain callbacks. * CPU notifier * PM notifier * memory hotplug notifier * powerpc pSeries reconfig notifier Example: Inject CPU offline error (-1 == -EPERM) # cd /sys/kernel/debug/notifier-error-inject/cpu # echo -1 > actions/CPU_DOWN_PREPARE/error # echo 0 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: Operation not permitted The patchset also adds cpu and memory hotplug tests to tools/testing/selftests These tests first do simple online and offline test and then do fault injection tests if notifier error injection module is available. This patch: The notifier error injection provides the ability to inject artifical errors to specified notifier chain callbacks. It is useful to test the error handling of notifier call chain failures. This adds common basic functions to define which type of events can be fail and to initialize the debugfs interface to control what error code should be returned and which event should be failed. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 05:43:02 +08:00
specified notifier chain callbacks. It is useful to test the error
handling of notifier call chain failures.
Say N if unsure.
config PM_NOTIFIER_ERROR_INJECT
tristate "PM notifier error injection module"
depends on PM && NOTIFIER_ERROR_INJECTION
default m if PM_DEBUG
help
This option provides the ability to inject artificial errors to
PM notifier chain callbacks. It is controlled through debugfs
interface /sys/kernel/debug/notifier-error-inject/pm
If the notifier call chain should be failed with some events
notified, write the error code to "actions/<notifier event>/error".
Example: Inject PM suspend error (-12 = -ENOMEM)
# cd /sys/kernel/debug/notifier-error-inject/pm/
# echo -12 > actions/PM_SUSPEND_PREPARE/error
# echo mem > /sys/power/state
bash: echo: write error: Cannot allocate memory
To compile this code as a module, choose M here: the module will
be called pm-notifier-error-inject.
If unsure, say N.
config OF_RECONFIG_NOTIFIER_ERROR_INJECT
tristate "OF reconfig notifier error injection module"
depends on OF_DYNAMIC && NOTIFIER_ERROR_INJECTION
help
This option provides the ability to inject artificial errors to
OF reconfig notifier chain callbacks. It is controlled
through debugfs interface under
/sys/kernel/debug/notifier-error-inject/OF-reconfig/
If the notifier call chain should be failed with some events
notified, write the error code to "actions/<notifier event>/error".
To compile this code as a module, choose M here: the module will
be called of-reconfig-notifier-error-inject.
If unsure, say N.
config NETDEV_NOTIFIER_ERROR_INJECT
tristate "Netdev notifier error injection module"
depends on NET && NOTIFIER_ERROR_INJECTION
help
This option provides the ability to inject artificial errors to
netdevice notifier chain callbacks. It is controlled through debugfs
interface /sys/kernel/debug/notifier-error-inject/netdev
If the notifier call chain should be failed with some events
notified, write the error code to "actions/<notifier event>/error".
Example: Inject netdevice mtu change error (-22 = -EINVAL)
# cd /sys/kernel/debug/notifier-error-inject/netdev
# echo -22 > actions/NETDEV_CHANGEMTU/error
# ip link set eth0 mtu 1024
RTNETLINK answers: Invalid argument
To compile this code as a module, choose M here: the module will
be called netdev-notifier-error-inject.
If unsure, say N.
config FUNCTION_ERROR_INJECTION
def_bool y
depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
config FAULT_INJECTION
bool "Fault-injection framework"
depends on DEBUG_KERNEL
help
Provide fault-injection framework.
For more details, see Documentation/fault-injection/.
config FAILSLAB
bool "Fault-injection capability for kmalloc"
depends on FAULT_INJECTION
depends on SLAB || SLUB
help
Provide fault-injection capability for kmalloc.
config FAIL_PAGE_ALLOC
bool "Fault-injection capabilitiy for alloc_pages()"
depends on FAULT_INJECTION
help
Provide fault-injection capability for alloc_pages().
config FAIL_MAKE_REQUEST
bool "Fault-injection capability for disk IO"
depends on FAULT_INJECTION && BLOCK
help
Provide fault-injection capability for disk IO.
config FAIL_IO_TIMEOUT
bool "Fault-injection capability for faking disk interrupts"
depends on FAULT_INJECTION && BLOCK
help
Provide fault-injection capability on end IO handling. This
will make the block layer "forget" an interrupt as configured,
thus exercising the error handling.
Only works with drivers that use the generic timeout handling,
for others it wont do anything.
config FAIL_FUTEX
bool "Fault-injection capability for futexes"
select DEBUG_FS
depends on FAULT_INJECTION && FUTEX
help
Provide fault-injection capability for futexes.
config FAULT_INJECTION_DEBUG_FS
bool "Debugfs entries for fault-injection capabilities"
depends on FAULT_INJECTION && SYSFS && DEBUG_FS
help
Enable configuration of fault-injection capabilities via debugfs.
config FAIL_FUNCTION
bool "Fault-injection capability for functions"
depends on FAULT_INJECTION_DEBUG_FS && FUNCTION_ERROR_INJECTION
help
Provide function-based fault-injection capability.
This will allow you to override a specific function with a return
with given return value. As a result, function caller will see
an error value and have to handle it. This is useful to test the
error handling in various subsystems.
config FAIL_MMC_REQUEST
bool "Fault-injection capability for MMC IO"
depends on FAULT_INJECTION_DEBUG_FS && MMC
help
Provide fault-injection capability for MMC IO.
This will make the mmc core return data errors. This is
useful to test the error handling in the mmc block device
and to test how the mmc host driver handles retries from
the block device.
config FAULT_INJECTION_STACKTRACE_FILTER
bool "stacktrace filter for fault-injection capabilities"
depends on FAULT_INJECTION_DEBUG_FS && STACKTRACE_SUPPORT
depends on !X86_64
select STACKTRACE
select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC && !X86
help
Provide stacktrace filter for fault-injection capabilities
config LATENCYTOP
bool "Latency measuring infrastructure"
depends on DEBUG_KERNEL
depends on STACKTRACE_SUPPORT
depends on PROC_FS
x86/kconfig: Make it easier to switch to the new ORC unwinder A couple of Kconfig changes which make it much easier to switch to the new CONFIG_ORC_UNWINDER: 1) Remove x86 dependencies on CONFIG_FRAME_POINTER for lockdep, latencytop, and fault injection. x86 has a 'guess' unwinder which just scans the stack for kernel text addresses. It's not 100% accurate but in many cases it's good enough. This allows those users who don't want the text overhead of the frame pointer or ORC unwinders to still use these features. More importantly, this also makes it much more straightforward to disable frame pointers. 2) Make CONFIG_ORC_UNWINDER depend on !CONFIG_FRAME_POINTER. While it would be possible to have both enabled, it doesn't really make sense to do so. So enforce a sane configuration to prevent the user from making a dumb mistake. With these changes, when you disable CONFIG_FRAME_POINTER, "make oldconfig" will ask if you want to enable CONFIG_ORC_UNWINDER. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/9985fb91ce5005fe33ea5cc2a20f14bd33c61d03.1500938583.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-25 07:36:58 +08:00
select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC && !X86
select KALLSYMS
select KALLSYMS_ALL
select STACKTRACE
select SCHEDSTATS
select SCHED_DEBUG
help
Enable this option if you want to use the LatencyTOP tool
to find out which userspace is blocking on what kernel operations.
source kernel/trace/Kconfig
config PROVIDE_OHCI1394_DMA_INIT
bool "Remote debugging over FireWire early on boot"
depends on PCI && X86
help
If you want to debug problems which hang or crash the kernel early
on boot and the crashing machine has a FireWire port, you can use
this feature to remotely access the memory of the crashed machine
over FireWire. This employs remote DMA as part of the OHCI1394
specification which is now the standard for FireWire controllers.
With remote DMA, you can monitor the printk buffer remotely using
firescope and access all memory below 4GB using fireproxy from gdb.
Even controlling a kernel debugger is possible using remote DMA.
Usage:
If ohci1394_dma=early is used as boot parameter, it will initialize
all OHCI1394 controllers which are found in the PCI config space.
As all changes to the FireWire bus such as enabling and disabling
devices cause a bus reset and thereby disable remote DMA for all
devices, be sure to have the cable plugged and FireWire enabled on
the debugging host before booting the debug target for debugging.
This code (~1k) is freed after boot. By then, the firewire stack
in charge of the OHCI-1394 controllers should be used instead.
See Documentation/debugging-via-ohci1394.txt for more information.
config DMA_API_DEBUG
bool "Enable debugging of DMA-API usage"
select NEED_DMA_MAP_STATE
help
Enable this option to debug the use of the DMA API by device drivers.
With this option you will be able to detect common bugs in device
drivers like double-freeing of DMA mappings or freeing mappings that
were never allocated.
This also attempts to catch cases where a page owned by DMA is
accessed by the cpu in a way that could cause data corruption. For
example, this enables cow_user_page() to check that the source page is
not undergoing DMA.
This option causes a performance degradation. Use only if you want to
debug device drivers and dma interactions.
If unsure, say N.
config DMA_API_DEBUG_SG
bool "Debug DMA scatter-gather usage"
default y
depends on DMA_API_DEBUG
help
Perform extra checking that callers of dma_map_sg() have respected the
appropriate segment length/boundary limits for the given device when
preparing DMA scatterlists.
This is particularly likely to have been overlooked in cases where the
dma_map_sg() API is used for general bulk mapping of pages rather than
preparing literal scatter-gather descriptors, where there is a risk of
unexpected behaviour from DMA API implementations if the scatterlist
is technically out-of-spec.
If unsure, say N.
menuconfig RUNTIME_TESTING_MENU
bool "Runtime Testing"
def_bool y
if RUNTIME_TESTING_MENU
config LKDTM
tristate "Linux Kernel Dump Test Tool Module"
depends on DEBUG_FS
depends on BLOCK
help
This module enables testing of the different dumping mechanisms by
inducing system failures at predefined crash points.
If you don't need it: say N
Choose M here to compile this code as a module. The module will be
called lkdtm.
Documentation on how to use the module can be found in
Documentation/fault-injection/provoke-crashes.txt
config TEST_LIST_SORT
tristate "Linked list sorting test"
depends on DEBUG_KERNEL || m
help
Enable this to turn on 'list_sort()' function test. This test is
executed only once during system boot (so affects only boot time),
or at module load time.
If unsure, say N.
config TEST_SORT
tristate "Array-based sort test"
depends on DEBUG_KERNEL || m
help
This option enables the self-test function of 'sort()' at boot,
or at module load time.
If unsure, say N.
config KPROBES_SANITY_TEST
bool "Kprobes sanity tests"
depends on DEBUG_KERNEL
depends on KPROBES
help
This option provides for testing basic kprobes functionality on
boot. Samples of kprobe and kretprobe are inserted and
verified for functionality.
Say N if you are unsure.
config BACKTRACE_SELF_TEST
tristate "Self test for the backtrace code"
depends on DEBUG_KERNEL
help
This option provides a kernel module that can be used to test
the kernel stack backtrace code. This option is not useful
for distributions or general kernels, but only for kernel
developers working on architecture code.
Note that if you want to also test saved backtraces, you will
have to enable STACKTRACE as well.
Say N if you are unsure.
config RBTREE_TEST
tristate "Red-Black tree test"
depends on DEBUG_KERNEL
help
A benchmark measuring the performance of the rbtree library.
Also includes rbtree invariant checks.
rbtree: add prio tree and interval tree tests Patch 1 implements support for interval trees, on top of the augmented rbtree API. It also adds synthetic tests to compare the performance of interval trees vs prio trees. Short answers is that interval trees are slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x) on search. It is debatable how realistic the synthetic test is, and I have not made such measurements yet, but my impression is that interval trees would still come out faster. Patch 2 uses a preprocessor template to make the interval tree generic, and uses it as a replacement for the vma prio_tree. Patch 3 takes the other prio_tree user, kmemleak, and converts it to use a basic rbtree. We don't actually need the augmented rbtree support here because the intervals are always non-overlapping. Patch 4 removes the now-unused prio tree library. Patch 5 proposes an additional optimization to rb_erase_augmented, now providing it as an inline function so that the augmented callbacks can be inlined in. This provides an additional 5-10% performance improvement for the interval tree insert/erase benchmark. There is a maintainance cost as it exposes augmented rbtree users to some of the rbtree library internals; however I think this cost shouldn't be too high as I expect the augmented rbtree will always have much less users than the base rbtree. I should probably add a quick summary of why I think it makes sense to replace prio trees with augmented rbtree based interval trees now. One of the drivers is that we need augmented rbtrees for Rik's vma gap finding code, and once you have them, it just makes sense to use them for interval trees as well, as this is the simpler and more well known algorithm. prio trees, in comparison, seem *too* clever: they impose an additional 'heap' constraint on the tree, which they use to guarantee a faster worst-case complexity of O(k+log N) for stabbing queries in a well-balanced prio tree, vs O(k*log N) for interval trees (where k=number of matches, N=number of intervals). Now this sounds great, but in practice prio trees don't realize this theorical benefit. First, the additional constraint makes them harder to update, so that the kernel implementation has to simplify things by balancing them like a radix tree, which is not always ideal. Second, the fact that there are both index and heap properties makes both tree manipulation and search more complex, which results in a higher multiplicative time constant. As it turns out, the simple interval tree algorithm ends up running faster than the more clever prio tree. This patch: Add two test modules: - prio_tree_test measures the performance of lib/prio_tree.c, both for insertion/removal and for stabbing searches - interval_tree_test measures the performance of a library of equivalent functionality, built using the augmented rbtree support. In order to support the second test module, lib/interval_tree.c is introduced. It is kept separate from the interval_tree_test main file for two reasons: first we don't want to provide an unfair advantage over prio_tree_test by having everything in a single compilation unit, and second there is the possibility that the interval tree functionality could get some non-test users in kernel over time. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 07:31:23 +08:00
config INTERVAL_TREE_TEST
tristate "Interval tree test"
depends on DEBUG_KERNEL
select INTERVAL_TREE
rbtree: add prio tree and interval tree tests Patch 1 implements support for interval trees, on top of the augmented rbtree API. It also adds synthetic tests to compare the performance of interval trees vs prio trees. Short answers is that interval trees are slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x) on search. It is debatable how realistic the synthetic test is, and I have not made such measurements yet, but my impression is that interval trees would still come out faster. Patch 2 uses a preprocessor template to make the interval tree generic, and uses it as a replacement for the vma prio_tree. Patch 3 takes the other prio_tree user, kmemleak, and converts it to use a basic rbtree. We don't actually need the augmented rbtree support here because the intervals are always non-overlapping. Patch 4 removes the now-unused prio tree library. Patch 5 proposes an additional optimization to rb_erase_augmented, now providing it as an inline function so that the augmented callbacks can be inlined in. This provides an additional 5-10% performance improvement for the interval tree insert/erase benchmark. There is a maintainance cost as it exposes augmented rbtree users to some of the rbtree library internals; however I think this cost shouldn't be too high as I expect the augmented rbtree will always have much less users than the base rbtree. I should probably add a quick summary of why I think it makes sense to replace prio trees with augmented rbtree based interval trees now. One of the drivers is that we need augmented rbtrees for Rik's vma gap finding code, and once you have them, it just makes sense to use them for interval trees as well, as this is the simpler and more well known algorithm. prio trees, in comparison, seem *too* clever: they impose an additional 'heap' constraint on the tree, which they use to guarantee a faster worst-case complexity of O(k+log N) for stabbing queries in a well-balanced prio tree, vs O(k*log N) for interval trees (where k=number of matches, N=number of intervals). Now this sounds great, but in practice prio trees don't realize this theorical benefit. First, the additional constraint makes them harder to update, so that the kernel implementation has to simplify things by balancing them like a radix tree, which is not always ideal. Second, the fact that there are both index and heap properties makes both tree manipulation and search more complex, which results in a higher multiplicative time constant. As it turns out, the simple interval tree algorithm ends up running faster than the more clever prio tree. This patch: Add two test modules: - prio_tree_test measures the performance of lib/prio_tree.c, both for insertion/removal and for stabbing searches - interval_tree_test measures the performance of a library of equivalent functionality, built using the augmented rbtree support. In order to support the second test module, lib/interval_tree.c is introduced. It is kept separate from the interval_tree_test main file for two reasons: first we don't want to provide an unfair advantage over prio_tree_test by having everything in a single compilation unit, and second there is the possibility that the interval tree functionality could get some non-test users in kernel over time. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 07:31:23 +08:00
help
A benchmark measuring the performance of the interval tree library
config PERCPU_TEST
tristate "Per cpu operations test"
depends on m && DEBUG_KERNEL
help
Enable this option to build test module which validates per-cpu
operations.
If unsure, say N.
config ATOMIC64_SELFTEST
tristate "Perform an atomic64_t self-test"
help
Enable this option to test the atomic64_t functions at boot or
at module load time.
If unsure, say N.
config ASYNC_RAID6_TEST
tristate "Self test for hardware accelerated raid6 recovery"
depends on ASYNC_RAID6_RECOV
select ASYNC_MEMCPY
---help---
This is a one-shot self test that permutes through the
recovery of all the possible two disk failure scenarios for a
N-disk array. Recovery is performed with the asynchronous
raid6 recovery routines, and will optionally use an offload
engine if one is available.
If unsure, say N.
config TEST_HEXDUMP
tristate "Test functions located in the hexdump module at runtime"
config TEST_STRING_HELPERS
tristate "Test functions located in the string_helpers module at runtime"
config TEST_KSTRTOX
tristate "Test kstrto*() family of functions at runtime"
config TEST_PRINTF
tristate "Test printf() family of functions at runtime"
config TEST_BITMAP
tristate "Test bitmap_*() family of functions at runtime"
help
Enable this option to test the bitmap functions at boot.
If unsure, say N.
config TEST_BITFIELD
tristate "Test bitfield functions at runtime"
help
Enable this option to test the bitfield functions at boot.
If unsure, say N.
config TEST_UUID
tristate "Test functions located in the uuid module at runtime"
config TEST_OVERFLOW
tristate "Test check_*_overflow() functions at runtime"
config TEST_RHASHTABLE
tristate "Perform selftest on resizable hash table"
help
Enable this option to test the rhashtable functions at boot.
If unsure, say N.
config TEST_HASH
tristate "Perform selftest on hash functions"
help
siphash: add cryptographically secure PRF SipHash is a 64-bit keyed hash function that is actually a cryptographically secure PRF, like HMAC. Except SipHash is super fast, and is meant to be used as a hashtable keyed lookup function, or as a general PRF for short input use cases, such as sequence numbers or RNG chaining. For the first usage: There are a variety of attacks known as "hashtable poisoning" in which an attacker forms some data such that the hash of that data will be the same, and then preceeds to fill up all entries of a hashbucket. This is a realistic and well-known denial-of-service vector. Currently hashtables use jhash, which is fast but not secure, and some kind of rotating key scheme (or none at all, which isn't good). SipHash is meant as a replacement for jhash in these cases. There are a modicum of places in the kernel that are vulnerable to hashtable poisoning attacks, either via userspace vectors or network vectors, and there's not a reliable mechanism inside the kernel at the moment to fix it. The first step toward fixing these issues is actually getting a secure primitive into the kernel for developers to use. Then we can, bit by bit, port things over to it as deemed appropriate. While SipHash is extremely fast for a cryptographically secure function, it is likely a bit slower than the insecure jhash, and so replacements will be evaluated on a case-by-case basis based on whether or not the difference in speed is negligible and whether or not the current jhash usage poses a real security risk. For the second usage: A few places in the kernel are using MD5 or SHA1 for creating secure sequence numbers, syn cookies, port numbers, or fast random numbers. SipHash is a faster and more fitting, and more secure replacement for MD5 in those situations. Replacing MD5 and SHA1 with SipHash for these uses is obvious and straight-forward, and so is submitted along with this patch series. There shouldn't be much of a debate over its efficacy. Dozens of languages are already using this internally for their hash tables and PRFs. Some of the BSDs already use this in their kernels. SipHash is a widely known high-speed solution to a widely known set of problems, and it's time we catch-up. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 20:54:00 +08:00
Enable this option to test the kernel's integer (<linux/hash.h>),
string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
hash functions on boot (or module load).
This is intended to help people writing architecture-specific
optimized versions. If unsure, say N.
config TEST_IDA
tristate "Perform selftest on IDA functions"
config TEST_PARMAN
tristate "Perform selftest on priority array manager"
depends on PARMAN
help
Enable this option to test priority array manager on boot
(or module load).
If unsure, say N.
config TEST_LKM
test: add minimal module for verification testing This is a pair of test modules I'd like to see in the tree. Instead of putting these in lkdtm, where I've been adding various tests that trigger crashes, these don't make sense there since they need to be either distinctly separate, or their pass/fail state don't need to crash the machine. These live in lib/ for now, along with a few other in-kernel test modules, and use the slightly more common "test_" naming convention, instead of "test-". We should likely standardize on the former: $ find . -name 'test_*.c' | grep -v /tools/ | wc -l 4 $ find . -name 'test-*.c' | grep -v /tools/ | wc -l 2 The first is entirely a no-op module, designed to allow simple testing of the module loading and verification interface. It's useful to have a module that has no other uses or dependencies so it can be reliably used for just testing module loading and verification. The second is a module that exercises the user memory access functions, in an effort to make sure that we can quickly catch any regressions in boundary checking (e.g. like what was recently fixed on ARM). This patch (of 2): When doing module loading verification tests (for example, with module signing, or LSM hooks), it is very handy to have a module that can be built on all systems under test, isn't auto-loaded at boot, and has no device or similar dependencies. This creates the "test_module.ko" module for that purpose, which only reports its load and unload to printk. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-24 07:54:37 +08:00
tristate "Test module loading with 'hello world' module"
depends on m
help
This builds the "test_module" module that emits "Hello, world"
on printk when loaded. It is designed to be used for basic
evaluation of the module loading subsystem (for example when
validating module verification). It lacks any extra dependencies,
and will not normally be loaded by the system unless explicitly
requested by name.
If unsure, say N.
config TEST_USER_COPY
tristate "Test user/kernel boundary protections"
depends on m
help
This builds the "test_user_copy" module that runs sanity checks
on the copy_to/from_user infrastructure, making sure basic
user/kernel boundary testing is working. If it fails to load,
a regression has been detected in the user/kernel memory boundary
protections.
If unsure, say N.
config TEST_BPF
tristate "Test BPF filter functionality"
depends on m && NET
help
This builds the "test_bpf" module that runs various test vectors
against the BPF interpreter or BPF JIT compiler depending on the
current setting. This is in particular useful for BPF JIT compiler
development, but also to run regression tests against changes in
the interpreter code. It also enables test stubs for eBPF maps and
verifier used by user space verifier testsuite.
If unsure, say N.
config FIND_BIT_BENCHMARK
tristate "Test find_bit functions"
help
This builds the "test_find_bit" module that measure find_*_bit()
functions performance.
If unsure, say N.
config TEST_FIRMWARE
tristate "Test firmware loading via userspace interface"
depends on FW_LOADER
help
This builds the "test_firmware" module that creates a userspace
interface for testing firmware loading. This can be used to
control the triggering of firmware loading without needing an
actual firmware-using device. The contents can be rechecked by
userspace.
If unsure, say N.
config TEST_SYSCTL
tristate "sysctl test driver"
depends on PROC_SYSCTL
help
This builds the "test_sysctl" module. This driver enables to test the
proc sysctl interfaces available to drivers safely without affecting
production knobs which might alter system functionality.
If unsure, say N.
config TEST_UDELAY
tristate "udelay test driver"
help
This builds the "udelay_test" module that helps to make sure
that udelay() is working properly.
If unsure, say N.
config TEST_STATIC_KEYS
tristate "Test static keys"
depends on m
help
Test the static key interfaces.
If unsure, say N.
kmod: add test driver to stress test the module loader This adds a new stress test driver for kmod: the kernel module loader. The new stress test driver, test_kmod, is only enabled as a module right now. It should be possible to load this as built-in and load tests early (refer to the force_init_test module parameter), however since a lot of test can get a system out of memory fast we leave this disabled for now. Using a system with 1024 MiB of RAM can *easily* get your kernel OOM fast with this test driver. The test_kmod driver exposes API knobs for us to fine tune simple request_module() and get_fs_type() calls. Since these API calls only allow each one parameter a test driver for these is rather simple. Other factors that can help out test driver though are the number of calls we issue and knowing current limitations of each. This exposes configuration as much as possible through userspace to be able to build tests directly from userspace. Since it allows multiple misc devices its will eventually (once we add a knob to let us create new devices at will) also be possible to perform more tests in parallel, provided you have enough memory. We only enable tests we know work as of right now. Demo screenshots: # tools/testing/selftests/kmod/kmod.sh kmod_test_0001_driver: OK! - loading kmod test kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND kmod_test_0001_fs: OK! - loading kmod test kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL kmod_test_0002_driver: OK! - loading kmod test kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND kmod_test_0002_fs: OK! - loading kmod test kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL kmod_test_0003: OK! - loading kmod test kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS kmod_test_0004: OK! - loading kmod test kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS kmod_test_0005: OK! - loading kmod test kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS kmod_test_0006: OK! - loading kmod test kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS kmod_test_0005: OK! - loading kmod test kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS kmod_test_0006: OK! - loading kmod test kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS XXX: add test restult for 0007 Test completed You can also request for specific tests: # tools/testing/selftests/kmod/kmod.sh -t 0001 kmod_test_0001_driver: OK! - loading kmod test kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND kmod_test_0001_fs: OK! - loading kmod test kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL Test completed Lastly, the current available number of tests: # tools/testing/selftests/kmod/kmod.sh --help Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ] Valid tests: 0001-0009 0001 - Simple test - 1 thread for empty string 0002 - Simple test - 1 thread for modules/filesystems that do not exist 0003 - Simple test - 1 thread for get_fs_type() only 0004 - Simple test - 2 threads for get_fs_type() only 0005 - multithreaded tests with default setup - request_module() only 0006 - multithreaded tests with default setup - get_fs_type() only 0007 - multithreaded tests with default setup test request_module() and get_fs_type() 0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module() 0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type() The following test cases currently fail, as such they are not currently enabled by default: # tools/testing/selftests/kmod/kmod.sh -t 0008 # tools/testing/selftests/kmod/kmod.sh -t 0009 To be sure to run them as intended please unload both of the modules: o test_module o xfs And ensure they are not loaded on your system prior to testing them. If you use these paritions for your rootfs you can change the default test driver used for get_fs_type() by exporting it into your environment. For example of other test defaults you can override refer to kmod.sh allow_user_defaults(). Behind the scenes this is how we fine tune at a test case prior to hitting a trigger to run it: cat /sys/devices/virtual/misc/test_kmod0/config echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads cat /sys/devices/virtual/misc/test_kmod0/config echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads Finally to trigger: echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config The kmod.sh script uses the above constructs to build different test cases. A bit of interpretation of the current failures follows, first two premises: a) When request_module() is used userspace figures out an optimized version of module order for us. Once it finds the modules it needs, as per depmod symbol dep map, it will finit_module() the respective modules which are needed for the original request_module() request. b) We have an optimization in place whereby if a kernel uses request_module() on a module already loaded we never bother userspace as the module already is loaded. This is all handled by kernel/kmod.c. A few things to consider to help identify root causes of issues: 0) kmod 19 has a broken heuristic for modules being assumed to be built-in to your kernel and will return 0 even though request_module() failed. Upgrade to a newer version of kmod. 1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs", not for "xfs". The optimization in kernel described in b) fails to catch if we have a lot of consecutive get_fs_type() calls. The reason is the optimization in place does not look for aliases. This means two consecutive get_fs_type() calls will bump kmod_concurrent, whereas request_module() will not. This one explanation why test case 0009 fails at least once for get_fs_type(). 2) If a module fails to load --- for whatever reason (kmod_concurrent limit reached, file not yet present due to rootfs switch, out of memory) we have a period of time during which module request for the same name either with request_module() or get_fs_type() will *also* fail to load even if the file for the module is ready. This explains why *multiple* NULLs are possible on test 0009. 3) finit_module() consumes quite a bit of memory. 4) Filesystems typically also have more dependent modules than other modules, its important to note though that even though a get_fs_type() call does not incur additional kmod_concurrent bumps, since userspace loads dependencies it finds it needs via finit_module_fd(), it *will* take much more memory to load a module with a lot of dependencies. Because of 3) and 4) we will easily run into out of memory failures with certain tests. For instance test 0006 fails on qemu with 1024 MiB of RAM. It panics a box after reaping all userspace processes and still not having enough memory to reap. [arnd@arndb.de: add dependencies for test module] Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Jessica Yu <jeyu@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Marek <mmarek@suse.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-15 05:50:08 +08:00
config TEST_KMOD
tristate "kmod stress tester"
depends on m
depends on BLOCK && (64BIT || LBDAF) # for XFS, BTRFS
depends on NETDEVICES && NET_CORE && INET # for TUN
select TEST_LKM
select XFS_FS
select TUN
select BTRFS_FS
help
Test the kernel's module loading mechanism: kmod. kmod implements
support to load modules using the Linux kernel's usermode helper.
This test provides a series of tests against kmod.
Although technically you can either build test_kmod as a module or
into the kernel we disallow building it into the kernel since
it stress tests request_module() and this will very likely cause
some issues by taking over precious threads available from other
module load requests, ultimately this could be fatal.
To run tests run:
tools/testing/selftests/kmod/kmod.sh --help
If unsure, say N.
config TEST_DEBUG_VIRTUAL
tristate "Test CONFIG_DEBUG_VIRTUAL feature"
depends on DEBUG_VIRTUAL
help
Test the kernel's ability to detect incorrect calls to
virt_to_phys() done against the non-linear part of the
kernel's virtual address map.
If unsure, say N.
endif # RUNTIME_TESTING_MENU
config MEMTEST
bool "Memtest"
depends on HAVE_MEMBLOCK
---help---
This option adds a kernel parameter 'memtest', which allows memtest
to be set.
memtest=0, mean disabled; -- default
memtest=1, mean do 1 test pattern;
...
memtest=17, mean do 17 test patterns.
If you are unsure how to answer this question, answer N.
config BUG_ON_DATA_CORRUPTION
bool "Trigger a BUG when data corruption is detected"
select DEBUG_LIST
help
Select this option if the kernel should BUG when it encounters
data corruption in kernel memory structures when they get checked
for validity.
If unsure, say N.
source "samples/Kconfig"
source "lib/Kconfig.kgdb"
UBSAN: run-time undefined behavior sanity checker UBSAN uses compile-time instrumentation to catch undefined behavior (UB). Compiler inserts code that perform certain kinds of checks before operations that could cause UB. If check fails (i.e. UB detected) __ubsan_handle_* function called to print error message. So the most of the work is done by compiler. This patch just implements ubsan handlers printing errors. GCC has this capability since 4.9.x [1] (see -fsanitize=undefined option and its suboptions). However GCC 5.x has more checkers implemented [2]. Article [3] has a bit more details about UBSAN in the GCC. [1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html [2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html [3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/ Issues which UBSAN has found thus far are: Found bugs: * out-of-bounds access - 97840cb67ff5 ("netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind") undefined shifts: * d48458d4a768 ("jbd2: use a better hash function for the revoke table") * 10632008b9e1 ("clockevents: Prevent shift out of bounds") * 'x << -1' shift in ext4 - http://lkml.kernel.org/r/<5444EF21.8020501@samsung.com> * undefined rol32(0) - http://lkml.kernel.org/r/<1449198241-20654-1-git-send-email-sasha.levin@oracle.com> * undefined dirty_ratelimit calculation - http://lkml.kernel.org/r/<566594E2.3050306@odin.com> * undefined roundown_pow_of_two(0) - http://lkml.kernel.org/r/<1449156616-11474-1-git-send-email-sasha.levin@oracle.com> * [WONTFIX] undefined shift in __bpf_prog_run - http://lkml.kernel.org/r/<CACT4Y+ZxoR3UjLgcNdUm4fECLMx2VdtfrENMtRRCdgHB2n0bJA@mail.gmail.com> WONTFIX here because it should be fixed in bpf program, not in kernel. signed overflows: * 32a8df4e0b33f ("sched: Fix odd values in effective_load() calculations") * mul overflow in ntp - http://lkml.kernel.org/r/<1449175608-1146-1-git-send-email-sasha.levin@oracle.com> * incorrect conversion into rtc_time in rtc_time64_to_tm() - http://lkml.kernel.org/r/<1449187944-11730-1-git-send-email-sasha.levin@oracle.com> * unvalidated timespec in io_getevents() - http://lkml.kernel.org/r/<CACT4Y+bBxVYLQ6LtOKrKtnLthqLHcw-BMp3aqP3mjdAvr9FULQ@mail.gmail.com> * [NOTABUG] signed overflow in ktime_add_safe() - http://lkml.kernel.org/r/<CACT4Y+aJ4muRnWxsUe1CMnA6P8nooO33kwG-c8YZg=0Xc8rJqw@mail.gmail.com> [akpm@linux-foundation.org: fix unused local warning] [akpm@linux-foundation.org: fix __int128 build woes] Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Marek <mmarek@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yury Gribov <y.gribov@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-21 07:00:55 +08:00
source "lib/Kconfig.ubsan"
config ARCH_HAS_DEVMEM_IS_ALLOWED
bool
config STRICT_DEVMEM
bool "Filter access to /dev/mem"
depends on MMU && DEVMEM
depends on ARCH_HAS_DEVMEM_IS_ALLOWED
default y if PPC || X86 || ARM64
---help---
If this option is disabled, you allow userspace (root) access to all
of memory, including kernel and userspace memory. Accidental
access to this is obviously disastrous, but specific access can
be used by people debugging the kernel. Note that with PAT support
enabled, even in this case there are restrictions on /dev/mem
use due to the cache aliasing requirements.
restrict /dev/mem to idle io memory ranges This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE semantics by default. If userspace really believes it is safe to access the memory region it can also perform the extra step of disabling an active driver. This protects device address ranges with read side effects and otherwise directs userspace to use the driver. Persistent memory presents a large "mistake surface" to /dev/mem as now accidental writes can corrupt a filesystem. In general if a device driver is busily using a memory region it already informs other parts of the kernel to not touch it via request_mem_region(). /dev/mem should honor the same safety restriction by default. Debugging a device driver from userspace becomes more difficult with this enabled. Any application using /dev/mem or mmap of sysfs pci resources will now need to perform the extra step of either: 1/ Disabling the driver, for example: echo <device id> > /dev/bus/<parent bus>/drivers/<driver name>/unbind 2/ Rebooting with "iomem=relaxed" on the command line 3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n Traditional users of /dev/mem like dosemu are unaffected because the first 1MB of memory is not subject to the IO_STRICT_DEVMEM restriction. Legacy X configurations use /dev/mem to talk to graphics hardware, but that functionality has since moved to kernel graphics drivers. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-24 07:49:03 +08:00
If this option is switched on, and IO_STRICT_DEVMEM=n, the /dev/mem
file only allows userspace access to PCI space and the BIOS code and
data regions. This is sufficient for dosemu and X and all common
users of /dev/mem.
If in doubt, say Y.
config IO_STRICT_DEVMEM
bool "Filter I/O access to /dev/mem"
depends on STRICT_DEVMEM
---help---
If this option is disabled, you allow userspace (root) access to all
io-memory regardless of whether a driver is actively using that
range. Accidental access to this is obviously disastrous, but
specific access can be used by people debugging kernel drivers.
If this option is switched on, the /dev/mem file only allows
restrict /dev/mem to idle io memory ranges This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE semantics by default. If userspace really believes it is safe to access the memory region it can also perform the extra step of disabling an active driver. This protects device address ranges with read side effects and otherwise directs userspace to use the driver. Persistent memory presents a large "mistake surface" to /dev/mem as now accidental writes can corrupt a filesystem. In general if a device driver is busily using a memory region it already informs other parts of the kernel to not touch it via request_mem_region(). /dev/mem should honor the same safety restriction by default. Debugging a device driver from userspace becomes more difficult with this enabled. Any application using /dev/mem or mmap of sysfs pci resources will now need to perform the extra step of either: 1/ Disabling the driver, for example: echo <device id> > /dev/bus/<parent bus>/drivers/<driver name>/unbind 2/ Rebooting with "iomem=relaxed" on the command line 3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n Traditional users of /dev/mem like dosemu are unaffected because the first 1MB of memory is not subject to the IO_STRICT_DEVMEM restriction. Legacy X configurations use /dev/mem to talk to graphics hardware, but that functionality has since moved to kernel graphics drivers. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-24 07:49:03 +08:00
userspace access to *idle* io-memory ranges (see /proc/iomem) This
may break traditional users of /dev/mem (dosemu, legacy X, etc...)
if the driver using a given range cannot be disabled.
If in doubt, say Y.
source "arch/$(SRCARCH)/Kconfig.debug"
endmenu # Kernel hacking