OpenCloudOS-Kernel/include/linux/kernel.h

743 lines
26 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_KERNEL_H
#define _LINUX_KERNEL_H
#include <stdarg.h>
#include <linux/align.h>
#include <linux/limits.h>
#include <linux/linkage.h>
#include <linux/stddef.h>
#include <linux/types.h>
#include <linux/compiler.h>
#include <linux/bitops.h>
#include <linux/log2.h>
#include <linux/math.h>
#include <linux/minmax.h>
#include <linux/typecheck.h>
#include <linux/printk.h>
kernel.h: handle pointers to arrays better in container_of() If the first parameter of container_of() is a pointer to a non-const-qualified array type (and the third parameter names a non-const-qualified array member), the local variable __mptr will be defined with a const-qualified array type. In ISO C, these types are incompatible. They work as expected in GNU C, but some versions will issue warnings. For example, GCC 4.9 produces the warning "initialization from incompatible pointer type". Here is an example of where the problem occurs: ------------------------------------------------------- #include <linux/kernel.h> #include <linux/module.h> MODULE_LICENSE("GPL"); struct st { int a; char b[16]; }; static int __init example_init(void) { struct st t = { .a = 101, .b = "hello" }; char (*p)[16] = &t.b; struct st *x = container_of(p, struct st, b); printk(KERN_DEBUG "%p %p\n", (void *)&t, (void *)x); return 0; } static void __exit example_exit(void) { } module_init(example_init); module_exit(example_exit); ------------------------------------------------------- Building the module with gcc-4.9 results in these warnings (where '{m}' is the module source and '{k}' is the kernel source): ------------------------------------------------------- In file included from {m}/example.c:1:0: {m}/example.c: In function `example_init': {k}/include/linux/kernel.h:854:48: warning: initialization from incompatible pointer type const typeof( ((type *)0)->member ) *__mptr = (ptr); \ ^ {m}/example.c:14:17: note: in expansion of macro `container_of' struct st *x = container_of(p, struct st, b); ^ {k}/include/linux/kernel.h:854:48: warning: (near initialization for `x') const typeof( ((type *)0)->member ) *__mptr = (ptr); \ ^ {m}/example.c:14:17: note: in expansion of macro `container_of' struct st *x = container_of(p, struct st, b); ^ ------------------------------------------------------- Replace the type checking performed by the macro to avoid these warnings. Make sure `*(ptr)` either has type compatible with the member, or has type compatible with `void`, ignoring qualifiers. Raise compiler errors if this is not true. This is stronger than the previous behaviour, which only resulted in compiler warnings for a type mismatch. [arnd@arndb.de: fix new warnings for container_of()] Link: http://lkml.kernel.org/r/20170620200940.90557-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20170525120316.24473-7-abbotti@mev.co.uk Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Borislav Petkov <bp@suse.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-13 05:33:04 +08:00
#include <linux/build_bug.h>
#include <linux/static_call_types.h>
#include <asm/byteorder.h>
#include <uapi/linux/kernel.h>
#define STACK_MAGIC 0xdeadbeef
/**
* REPEAT_BYTE - repeat the value @x multiple times as an unsigned long value
* @x: value to repeat
*
* NOTE: @x is not checked for > 0xff; larger values produce odd results.
*/
#define REPEAT_BYTE(x) ((~0ul / 0xff) * (x))
/* generic data direction definitions */
#define READ 0
#define WRITE 1
/**
* ARRAY_SIZE - get the number of elements in array @arr
* @arr: array to be sized
*/
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
#define PTR_IF(cond, ptr) ((cond) ? (ptr) : NULL)
#define u64_to_user_ptr(x) ( \
{ \
typecheck(u64, (x)); \
(void __user *)(uintptr_t)(x); \
} \
)
#define typeof_member(T, m) typeof(((T*)0)->m)
#define _RET_IP_ (unsigned long)__builtin_return_address(0)
#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
/**
* upper_32_bits - return bits 32-63 of a number
* @n: the number we're accessing
*
* A basic shift-right of a 64- or 32-bit quantity. Use this to suppress
* the "right shift count >= width of type" warning when that quantity is
* 32-bits.
*/
#define upper_32_bits(n) ((u32)(((n) >> 16) >> 16))
/**
* lower_32_bits - return bits 0-31 of a number
* @n: the number we're accessing
*/
#define lower_32_bits(n) ((u32)((n) & 0xffffffff))
struct completion;
struct pt_regs;
struct user;
#ifdef CONFIG_PREEMPT_VOLUNTARY
extern int __cond_resched(void);
# define might_resched() __cond_resched()
#elif defined(CONFIG_PREEMPT_DYNAMIC)
extern int __cond_resched(void);
DECLARE_STATIC_CALL(might_resched, __cond_resched);
static __always_inline void might_resched(void)
{
static_call_mod(might_resched)();
}
#else
# define might_resched() do { } while (0)
#endif /* CONFIG_PREEMPT_* */
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
extern void ___might_sleep(const char *file, int line, int preempt_offset);
extern void __might_sleep(const char *file, int line, int preempt_offset);
extern void __cant_sleep(const char *file, int line, int preempt_offset);
extern void __cant_migrate(const char *file, int line);
/**
* might_sleep - annotation for functions that can sleep
*
* this macro will print a stack trace if it is executed in an atomic
* context (spinlock, irq-handler, ...). Additional sections where blocking is
* not allowed can be annotated with non_block_start() and non_block_end()
* pairs.
*
* This is a useful debugging help to be able to catch problems early and not
* be bitten later when the calling function happens to sleep when it is not
* supposed to.
*/
# define might_sleep() \
do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
/**
* cant_sleep - annotation for functions that cannot sleep
*
* this macro will print a stack trace if it is executed with preemption enabled
*/
# define cant_sleep() \
do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
sched: don't cause task state changes in nested sleep debugging Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep. - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Rafael J Wysocki <rjw@rjwysocki.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-02 04:23:32 +08:00
# define sched_annotate_sleep() (current->task_state_change = 0)
/**
* cant_migrate - annotation for functions that cannot migrate
*
* Will print a stack trace if executed in code which is migratable
*/
# define cant_migrate() \
do { \
if (IS_ENABLED(CONFIG_SMP)) \
__cant_migrate(__FILE__, __LINE__); \
} while (0)
/**
* non_block_start - annotate the start of section where sleeping is prohibited
*
* This is on behalf of the oom reaper, specifically when it is calling the mmu
* notifiers. The problem is that if the notifier were to block on, for example,
* mutex_lock() and if the process which holds that mutex were to perform a
* sleeping memory allocation, the oom reaper is now blocked on completion of
* that memory allocation. Other blocking calls like wait_event() pose similar
* issues.
*/
# define non_block_start() (current->non_block_count++)
/**
* non_block_end - annotate the end of section where sleeping is prohibited
*
* Closes a section opened by non_block_start().
*/
# define non_block_end() WARN_ON(current->non_block_count-- == 0)
#else
static inline void ___might_sleep(const char *file, int line,
int preempt_offset) { }
static inline void __might_sleep(const char *file, int line,
int preempt_offset) { }
# define might_sleep() do { might_resched(); } while (0)
# define cant_sleep() do { } while (0)
# define cant_migrate() do { } while (0)
sched, exit: Deal with nested sleeps do_wait() is a big wait loop, but we set TASK_RUNNING too late; we end up calling potential sleeps before we reset it. Not strictly a bug since we're guaranteed to exit the loop and not call schedule(); put in annotations to quiet might_sleep(). WARNING: CPU: 0 PID: 1 at ../kernel/sched/core.c:7123 __might_sleep+0x7e/0x90() do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109a788>] do_wait+0x88/0x270 Call Trace: [<ffffffff81694991>] dump_stack+0x4e/0x7a [<ffffffff8109877c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff8109886c>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff810bca6e>] __might_sleep+0x7e/0x90 [<ffffffff811a1c15>] might_fault+0x55/0xb0 [<ffffffff8109a3fb>] wait_consider_task+0x90b/0xc10 [<ffffffff8109a804>] do_wait+0x104/0x270 [<ffffffff8109b837>] SyS_wait4+0x77/0x100 [<ffffffff8169d692>] system_call_fastpath+0x16/0x1b Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: tglx@linutronix.de Cc: umgwanakikbuti@gmail.com Cc: ilya.dryomov@inktank.com Cc: Alex Elder <alex.elder@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Axel Lin <axel.lin@ingics.com> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Guillaume Morin <guillaume@morinfr.org> Cc: Ionut Alexa <ionut.m.alexa@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Michal Schmidt <mschmidt@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140924082242.186408915@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 16:18:49 +08:00
# define sched_annotate_sleep() do { } while (0)
# define non_block_start() do { } while (0)
# define non_block_end() do { } while (0)
#endif
#define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
#if defined(CONFIG_MMU) && \
(defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP))
sched/preempt, mm/fault: Trigger might_sleep() in might_fault() with disabled pagefaults Commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with pagefault_disable()") removed might_sleep() checks for all user access code (that uses might_fault()). The reason was to disable wrong "sleep in atomic" warnings in the following scenario: pagefault_disable() rc = copy_to_user(...) pagefault_enable() Which is valid, as pagefault_disable() increments the preempt counter and therefore disables the pagefault handler. copy_to_user() will not sleep and return an error code if a page is not available. However, as all might_sleep() checks are removed, CONFIG_DEBUG_ATOMIC_SLEEP would no longer detect the following scenario: spin_lock(&lock); rc = copy_to_user(...) spin_unlock(&lock) If the kernel is compiled with preemption turned on, preempt_disable() will make in_atomic() detect disabled preemption. The fault handler would correctly never sleep on user access. However, with preemption turned off, preempt_disable() is usually a NOP (with !CONFIG_PREEMPT_COUNT), therefore in_atomic() will not be able to detect disabled preemption nor disabled pagefaults. The fault handler could sleep. We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for user access functions again, otherwise we can end up with horrible deadlocks. Root of all evil is that pagefault_disable() acts almost as preempt_disable(), depending on preemption being turned on/off. As we now have pagefault_disabled(), we can use it to distinguish whether user acces functions might sleep. Convert might_fault() into a makro that calls __might_fault(), to allow proper file + line messages in case of a might_sleep() warning. Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-3-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 23:52:07 +08:00
#define might_fault() __might_fault(__FILE__, __LINE__)
void __might_fault(const char *file, int line);
#else
static inline void might_fault(void) { }
#endif
[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 17:16:30 +08:00
extern struct atomic_notifier_head panic_notifier_list;
extern long (*panic_blink)(int state);
__printf(1, 2)
void panic(const char *fmt, ...) __noreturn __cold;
panic: change nmi_panic from macro to function Commit 1717f2096b54 ("panic, x86: Fix re-entrance problem due to panic on NMI") and commit 58c5661f2144 ("panic, x86: Allow CPUs to save registers even if looping in NMI context") introduced nmi_panic() which prevents concurrent/recursive execution of panic(). It also saves registers for the crash dump on x86. However, there are some cases where NMI handlers still use panic(). This patch set partially replaces them with nmi_panic() in those cases. Even this patchset is applied, some NMI or similar handlers (e.g. MCE handler) continue to use panic(). This is because I can't test them well and actual problems won't happen. For example, the possibility that normal panic and panic on MCE happen simultaneously is very low. This patch (of 3): Convert nmi_panic() to a proper function and export it instead of exporting internal implementation details to modules, for obvious reasons. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Javi Merino <javi.merino@arm.com> Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com> Cc: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-23 05:27:17 +08:00
void nmi_panic(struct pt_regs *regs, const char *msg);
extern void oops_enter(void);
extern void oops_exit(void);
extern bool oops_may_print(void);
void do_exit(long error_code) __noreturn;
void complete_and_exit(struct completion *, long) __noreturn;
/* Internal, do not use. */
int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
int __must_check _kstrtol(const char *s, unsigned int base, long *res);
int __must_check kstrtoull(const char *s, unsigned int base, unsigned long long *res);
int __must_check kstrtoll(const char *s, unsigned int base, long long *res);
/**
* kstrtoul - convert a string to an unsigned long
* @s: The start of the string. The string must be null-terminated, and may also
* include a single newline before its terminating null. The first character
* may also be a plus sign, but not a minus sign.
* @base: The number base to use. The maximum supported base is 16. If base is
* given as 0, then the base of the string is automatically detected with the
* conventional semantics - If it begins with 0x the number will be parsed as a
* hexadecimal (case insensitive), if it otherwise begins with 0, it will be
* parsed as an octal number. Otherwise it will be parsed as a decimal.
* @res: Where to write the result of the conversion on success.
*
* Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
* Preferred over simple_strtoul(). Return code must be checked.
*/
static inline int __must_check kstrtoul(const char *s, unsigned int base, unsigned long *res)
{
/*
* We want to shortcut function call, but
* __builtin_types_compatible_p(unsigned long, unsigned long long) = 0.
*/
if (sizeof(unsigned long) == sizeof(unsigned long long) &&
__alignof__(unsigned long) == __alignof__(unsigned long long))
return kstrtoull(s, base, (unsigned long long *)res);
else
return _kstrtoul(s, base, res);
}
/**
* kstrtol - convert a string to a long
* @s: The start of the string. The string must be null-terminated, and may also
* include a single newline before its terminating null. The first character
* may also be a plus sign or a minus sign.
* @base: The number base to use. The maximum supported base is 16. If base is
* given as 0, then the base of the string is automatically detected with the
* conventional semantics - If it begins with 0x the number will be parsed as a
* hexadecimal (case insensitive), if it otherwise begins with 0, it will be
* parsed as an octal number. Otherwise it will be parsed as a decimal.
* @res: Where to write the result of the conversion on success.
*
* Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
* Preferred over simple_strtol(). Return code must be checked.
*/
static inline int __must_check kstrtol(const char *s, unsigned int base, long *res)
{
/*
* We want to shortcut function call, but
* __builtin_types_compatible_p(long, long long) = 0.
*/
if (sizeof(long) == sizeof(long long) &&
__alignof__(long) == __alignof__(long long))
return kstrtoll(s, base, (long long *)res);
else
return _kstrtol(s, base, res);
}
int __must_check kstrtouint(const char *s, unsigned int base, unsigned int *res);
int __must_check kstrtoint(const char *s, unsigned int base, int *res);
static inline int __must_check kstrtou64(const char *s, unsigned int base, u64 *res)
{
return kstrtoull(s, base, res);
}
static inline int __must_check kstrtos64(const char *s, unsigned int base, s64 *res)
{
return kstrtoll(s, base, res);
}
static inline int __must_check kstrtou32(const char *s, unsigned int base, u32 *res)
{
return kstrtouint(s, base, res);
}
static inline int __must_check kstrtos32(const char *s, unsigned int base, s32 *res)
{
return kstrtoint(s, base, res);
}
int __must_check kstrtou16(const char *s, unsigned int base, u16 *res);
int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
int __must_check kstrtobool(const char *s, bool *res);
int __must_check kstrtoull_from_user(const char __user *s, size_t count, unsigned int base, unsigned long long *res);
int __must_check kstrtoll_from_user(const char __user *s, size_t count, unsigned int base, long long *res);
int __must_check kstrtoul_from_user(const char __user *s, size_t count, unsigned int base, unsigned long *res);
int __must_check kstrtol_from_user(const char __user *s, size_t count, unsigned int base, long *res);
int __must_check kstrtouint_from_user(const char __user *s, size_t count, unsigned int base, unsigned int *res);
int __must_check kstrtoint_from_user(const char __user *s, size_t count, unsigned int base, int *res);
int __must_check kstrtou16_from_user(const char __user *s, size_t count, unsigned int base, u16 *res);
int __must_check kstrtos16_from_user(const char __user *s, size_t count, unsigned int base, s16 *res);
int __must_check kstrtou8_from_user(const char __user *s, size_t count, unsigned int base, u8 *res);
int __must_check kstrtos8_from_user(const char __user *s, size_t count, unsigned int base, s8 *res);
int __must_check kstrtobool_from_user(const char __user *s, size_t count, bool *res);
static inline int __must_check kstrtou64_from_user(const char __user *s, size_t count, unsigned int base, u64 *res)
{
return kstrtoull_from_user(s, count, base, res);
}
static inline int __must_check kstrtos64_from_user(const char __user *s, size_t count, unsigned int base, s64 *res)
{
return kstrtoll_from_user(s, count, base, res);
}
static inline int __must_check kstrtou32_from_user(const char __user *s, size_t count, unsigned int base, u32 *res)
{
return kstrtouint_from_user(s, count, base, res);
}
static inline int __must_check kstrtos32_from_user(const char __user *s, size_t count, unsigned int base, s32 *res)
{
return kstrtoint_from_user(s, count, base, res);
}
/*
* Use kstrto<foo> instead.
*
* NOTE: simple_strto<foo> does not check for the range overflow and,
* depending on the input, may give interesting results.
*
* Use these functions if and only if you cannot use kstrto<foo>, because
* the conversion ends on the first non-digit character, which may be far
* beyond the supported range. It might be useful to parse the strings like
* 10x50 or 12:21 without altering original string or temporary buffer in use.
* Keep in mind above caveat.
*/
extern unsigned long simple_strtoul(const char *,char **,unsigned int);
extern long simple_strtol(const char *,char **,unsigned int);
extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
extern long long simple_strtoll(const char *,char **,unsigned int);
extern int num_to_str(char *buf, int size,
unsigned long long num, unsigned int width);
procfs: add num_to_str() to speed up /proc/stat == stat_check.py num = 0 with open("/proc/stat") as f: while num < 1000 : data = f.read() f.seek(0, 0) num = num + 1 == perf shows 20.39% stat_check.py [kernel.kallsyms] [k] format_decode 13.41% stat_check.py [kernel.kallsyms] [k] number 12.61% stat_check.py [kernel.kallsyms] [k] vsnprintf 10.85% stat_check.py [kernel.kallsyms] [k] memcpy 4.85% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup 4.43% stat_check.py [kernel.kallsyms] [k] seq_printf This patch removes most of calls to vsnprintf() by adding num_to_str() and seq_print_decimal_ull(), which prints decimal numbers without rich functions provided by printf(). On my 8cpu box. == Before patch == [root@bluextal test]# time ./stat_check.py real 0m0.150s user 0m0.026s sys 0m0.121s == After patch == [root@bluextal test]# time ./stat_check.py real 0m0.055s user 0m0.022s sys 0m0.030s [akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()] [andrea@betterlinux.com: avoid breaking the ABI in /proc/stat] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrea Righi <andrea@betterlinux.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Turner <pjt@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-24 06:02:54 +08:00
/* lib/printf utilities */
extern __printf(2, 3) int sprintf(char *buf, const char * fmt, ...);
extern __printf(2, 0) int vsprintf(char *buf, const char *, va_list);
extern __printf(3, 4)
int snprintf(char *buf, size_t size, const char *fmt, ...);
extern __printf(3, 0)
int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
extern __printf(3, 4)
int scnprintf(char *buf, size_t size, const char *fmt, ...);
extern __printf(3, 0)
int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
extern __printf(2, 3) __malloc
char *kasprintf(gfp_t gfp, const char *fmt, ...);
extern __printf(2, 0) __malloc
char *kvasprintf(gfp_t gfp, const char *fmt, va_list args);
extern __printf(2, 0)
const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list args);
extern __scanf(2, 3)
int sscanf(const char *, const char *, ...);
extern __scanf(2, 0)
int vsscanf(const char *, const char *, va_list);
extern int get_option(char **str, int *pint);
extern char *get_options(const char *str, int nints, int *ints);
extern unsigned long long memparse(const char *ptr, char **retptr);
extern bool parse_option_str(const char *str, const char *option);
extern char *next_arg(char *args, char **param, char **val);
extern int core_kernel_text(unsigned long addr);
extern int init_kernel_text(unsigned long addr);
extern int core_kernel_data(unsigned long addr);
extern int __kernel_text_address(unsigned long addr);
extern int kernel_text_address(unsigned long addr);
extern int func_ptr_is_kernel_text(void *ptr);
panic: add sysctl to dump all CPUs backtraces on oops event Usually when the kernel reaches an oops condition, it's a point of no return; in case not enough debug information is available in the kernel splat, one of the last resorts would be to collect a kernel crash dump and analyze it. The problem with this approach is that in order to collect the dump, a panic is required (to kexec-load the crash kernel). When in an environment of multiple virtual machines, users may prefer to try living with the oops, at least until being able to properly shutdown their VMs / finish their important tasks. This patch implements a way to collect a bit more debug details when an oops event is reached, by printing all the CPUs backtraces through the usage of NMIs (on architectures that support that). The sysctl added (and documented) here was called "oops_all_cpu_backtrace", and when set will (as the name suggests) dump all CPUs backtraces. Far from ideal, this may be the last option though for users that for some reason cannot panic on oops. Most of times oopses are clear enough to indicate the kernel portion that must be investigated, but in virtual environments it's possible to observe hypervisor/KVM issues that could lead to oopses shown in other guests CPUs (like virtual APIC crashes). This patch hence aims to help debug such complex issues without resorting to kdump. Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 12:40:48 +08:00
#ifdef CONFIG_SMP
extern unsigned int sysctl_oops_all_cpu_backtrace;
#else
#define sysctl_oops_all_cpu_backtrace 0
#endif /* CONFIG_SMP */
extern void bust_spinlocks(int yes);
extern int panic_timeout;
extern unsigned long panic_print;
extern int panic_on_oops;
extern int panic_on_unrecovered_nmi;
extern int panic_on_io_nmi;
kernel: add panic_on_warn There have been several times where I have had to rebuild a kernel to cause a panic when hitting a WARN() in the code in order to get a crash dump from a system. Sometimes this is easy to do, other times (such as in the case of a remote admin) it is not trivial to send new images to the user. A much easier method would be a switch to change the WARN() over to a panic. This makes debugging easier in that I can now test the actual image the WARN() was seen on and I do not have to engage in remote debugging. This patch adds a panic_on_warn kernel parameter and /proc/sys/kernel/panic_on_warn calls panic() in the warn_slowpath_common() path. The function will still print out the location of the warning. An example of the panic_on_warn output: The first line below is from the WARN_ON() to output the WARN_ON()'s location. After that the panic() output is displayed. WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]() Kernel panic - not syncing: panic_on_warn set ... CPU: 30 PID: 11698 Comm: insmod Tainted: G W OE 3.17.0+ #57 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013 0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190 0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df Call Trace: [<ffffffff81665190>] dump_stack+0x46/0x58 [<ffffffff8165e2ec>] panic+0xd0/0x204 [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module] [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0 [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module] [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module] [<ffffffff81002144>] do_one_initcall+0xd4/0x210 [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110 [<ffffffff810f8889>] load_module+0x16a9/0x1b30 [<ffffffff810f3d30>] ? store_uevent+0x70/0x70 [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180 [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0 [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17 Successfully tested by me. hpa said: There is another very valid use for this: many operators would rather a machine shuts down than being potentially compromised either functionally or security-wise. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11 07:45:50 +08:00
extern int panic_on_warn;
kernel: add panic_on_taint Analogously to the introduction of panic_on_warn, this patch introduces a kernel option named panic_on_taint in order to provide a simple and generic way to stop execution and catch a coredump when the kernel gets tainted by any given flag. This is useful for debugging sessions as it avoids having to rebuild the kernel to explicitly add calls to panic() into the code sites that introduce the taint flags of interest. For instance, if one is interested in proceeding with a post-mortem analysis at the point a given code path is hitting a bad page (i.e. unaccount_page_cache_page(), or slab_bug()), a coredump can be collected by rebooting the kernel with 'panic_on_taint=0x20' amended to the command line. Another, perhaps less frequent, use for this option would be as a means for assuring a security policy case where only a subset of taints, or no single taint (in paranoid mode), is allowed for the running system. The optional switch 'nousertaint' is handy in this particular scenario, as it will avoid userspace induced crashes by writes to sysctl interface /proc/sys/kernel/tainted causing false positive hits for such policies. [akpm@linux-foundation.org: tweak kernel-parameters.txt wording] Suggested-by: Qian Cai <cai@lca.pw> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Bunk <bunk@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 12:40:17 +08:00
extern unsigned long panic_on_taint;
extern bool panic_on_taint_nousertaint;
rcu: sysctl: Panic on RCU Stall It is not always easy to determine the cause of an RCU stall just by analysing the RCU stall messages, mainly when the problem is caused by the indirect starvation of rcu threads. For example, when preempt_rcu is not awakened due to the starvation of a timer softirq. We have been hard coding panic() in the RCU stall functions for some time while testing the kernel-rt. But this is not possible in some scenarios, like when supporting customers. This patch implements the sysctl kernel.panic_on_rcu_stall. If set to 1, the system will panic() when an RCU stall takes place, enabling the capture of a vmcore. The vmcore provides a way to analyze all kernel/tasks states, helping out to point to the culprit and the solution for the stall. The kernel.panic_on_rcu_stall sysctl is disabled by default. Changes from v1: - Fixed a typo in the git log - The if(sysctl_panic_on_rcu_stall) panic() is in a static function - Fixed the CONFIG_TINY_RCU compilation issue - The var sysctl_panic_on_rcu_stall is now __read_mostly Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org> Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-03 00:51:41 +08:00
extern int sysctl_panic_on_rcu_stall;
extern int sysctl_max_rcu_stall_to_panic;
extern int sysctl_panic_on_stackoverflow;
extern bool crash_kexec_post_notifiers;
panic, x86: Fix re-entrance problem due to panic on NMI If panic on NMI happens just after panic() on the same CPU, panic() is recursively called. Kernel stalls, as a result, after failing to acquire panic_lock. To avoid this problem, don't call panic() in NMI context if we've already entered panic(). For that, introduce nmi_panic() macro to reduce code duplication. In the case of panic on NMI, don't return from NMI handlers if another CPU already panicked. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Don Zickus <dzickus@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Javi Merino <javi.merino@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: kexec@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: lkml <linux-kernel@vger.kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seth Jennings <sjenning@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs [ Cleanup comments, fixup formatting. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-14 18:19:09 +08:00
/*
* panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
* holds a CPU number which is executing panic() currently. A value of
* PANIC_CPU_INVALID means no CPU has entered panic() or crash_kexec().
*/
extern atomic_t panic_cpu;
#define PANIC_CPU_INVALID -1
/*
* Only to be used by arch init code. If the user over-wrote the default
* CONFIG_PANIC_TIMEOUT, honor it.
*/
static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
{
if (panic_timeout == arch_default_timeout)
panic_timeout = timeout;
}
extern const char *print_tainted(void);
enum lockdep_ok {
LOCKDEP_STILL_OK,
LOCKDEP_NOW_UNRELIABLE
};
extern void add_taint(unsigned flag, enum lockdep_ok);
extern int test_taint(unsigned flag);
extern unsigned long get_taint(void);
extern int root_mountflags;
extern bool early_boot_irqs_disabled;
/*
* Values used for system_state. Ordering of the states must not be changed
* as code checks for <, <=, >, >= STATE.
*/
extern enum system_states {
SYSTEM_BOOTING,
SYSTEM_SCHEDULING,
SYSTEM_RUNNING,
SYSTEM_HALT,
SYSTEM_POWER_OFF,
SYSTEM_RESTART,
SYSTEM_SUSPEND,
} system_state;
/* This cannot be an enum because some may be used in assembly source. */
#define TAINT_PROPRIETARY_MODULE 0
#define TAINT_FORCED_MODULE 1
#define TAINT_CPU_OUT_OF_SPEC 2
#define TAINT_FORCED_RMMOD 3
#define TAINT_MACHINE_CHECK 4
#define TAINT_BAD_PAGE 5
#define TAINT_USER 6
#define TAINT_DIE 7
#define TAINT_OVERRIDDEN_ACPI_TABLE 8
#define TAINT_WARN 9
#define TAINT_CRAP 10
#define TAINT_FIRMWARE_WORKAROUND 11
#define TAINT_OOT_MODULE 12
Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE Users have reported being unable to trace non-signed modules loaded within a kernel supporting module signature. This is caused by tracepoint.c:tracepoint_module_coming() refusing to take into account tracepoints sitting within force-loaded modules (TAINT_FORCED_MODULE). The reason for this check, in the first place, is that a force-loaded module may have a struct module incompatible with the layout expected by the kernel, and can thus cause a kernel crash upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y. Tracepoints, however, specifically accept TAINT_OOT_MODULE and TAINT_CRAP, since those modules do not lead to the "very likely system crash" issue cited above for force-loaded modules. With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed module is tainted re-using the TAINT_FORCED_MODULE taint flag. Unfortunately, this means that Tracepoints treat that module as a force-loaded module, and thus silently refuse to consider any tracepoint within this module. Since an unsigned module does not fit within the "very likely system crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag to specifically address this taint behavior, and accept those modules within Tracepoints. We use the letter 'X' as a taint flag character for a module being loaded that doesn't know how to sign its name (proposed by Steven Rostedt). Also add the missing 'O' entry to trace event show_module_flags() list for the sake of completeness. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> NAKed-by: Ingo Molnar <mingo@redhat.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: David Howells <dhowells@redhat.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 09:41:30 +08:00
#define TAINT_UNSIGNED_MODULE 13
#define TAINT_SOFTLOCKUP 14
#define TAINT_LIVEPATCH 15
#define TAINT_AUX 16
#define TAINT_RANDSTRUCT 17
#define TAINT_FLAGS_COUNT 18
kernel: add panic_on_taint Analogously to the introduction of panic_on_warn, this patch introduces a kernel option named panic_on_taint in order to provide a simple and generic way to stop execution and catch a coredump when the kernel gets tainted by any given flag. This is useful for debugging sessions as it avoids having to rebuild the kernel to explicitly add calls to panic() into the code sites that introduce the taint flags of interest. For instance, if one is interested in proceeding with a post-mortem analysis at the point a given code path is hitting a bad page (i.e. unaccount_page_cache_page(), or slab_bug()), a coredump can be collected by rebooting the kernel with 'panic_on_taint=0x20' amended to the command line. Another, perhaps less frequent, use for this option would be as a means for assuring a security policy case where only a subset of taints, or no single taint (in paranoid mode), is allowed for the running system. The optional switch 'nousertaint' is handy in this particular scenario, as it will avoid userspace induced crashes by writes to sysctl interface /proc/sys/kernel/tainted causing false positive hits for such policies. [akpm@linux-foundation.org: tweak kernel-parameters.txt wording] Suggested-by: Qian Cai <cai@lca.pw> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Bunk <bunk@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 12:40:17 +08:00
#define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1)
taint/module: Clean up global and module taint flags handling The commit 66cc69e34e86a231 ("Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to potentially print one more character. But it did not increase the size of the corresponding buffers in m_show() and print_modules(). We have recently done the same mistake when adding a taint flag for livepatching, see https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.com Also struct module uses an incompatible type for mod-taints flags. It survived from the commit 2bc2d61a9638dab670d ("[PATCH] list module taint flags in Oops/panic"). There was used "int" for the global taint flags at these times. But only the global tain flags was later changed to "unsigned long" by the commit 25ddbb18aae33ad2 ("Make the taint flags reliable"). This patch defines TAINT_FLAGS_COUNT that can be used to create arrays and buffers of the right size. Note that we could not use enum because the taint flag indexes are used also in assembly code. Then it reworks the table that describes the taint flags. The TAINT_* numbers can be used as the index. Instead, we add information if the taint flag is also shown per-module. Finally, it uses "unsigned long", bit operations, and the updated taint_flags table also for mod->taints. It is not optimal because only few taint flags can be printed by module_taint_flags(). But better be on the safe side. IMHO, it is not worth the optimization and this is a good compromise. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: http://lkml.kernel.org/r/1474458442-21581-1-git-send-email-pmladek@suse.com [jeyu@redhat.com: fix broken lkml link in changelog] Signed-off-by: Jessica Yu <jeyu@redhat.com>
2016-09-21 19:47:22 +08:00
struct taint_flag {
char c_true; /* character printed when tainted */
char c_false; /* character printed when not tainted */
taint/module: Clean up global and module taint flags handling The commit 66cc69e34e86a231 ("Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to potentially print one more character. But it did not increase the size of the corresponding buffers in m_show() and print_modules(). We have recently done the same mistake when adding a taint flag for livepatching, see https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.com Also struct module uses an incompatible type for mod-taints flags. It survived from the commit 2bc2d61a9638dab670d ("[PATCH] list module taint flags in Oops/panic"). There was used "int" for the global taint flags at these times. But only the global tain flags was later changed to "unsigned long" by the commit 25ddbb18aae33ad2 ("Make the taint flags reliable"). This patch defines TAINT_FLAGS_COUNT that can be used to create arrays and buffers of the right size. Note that we could not use enum because the taint flag indexes are used also in assembly code. Then it reworks the table that describes the taint flags. The TAINT_* numbers can be used as the index. Instead, we add information if the taint flag is also shown per-module. Finally, it uses "unsigned long", bit operations, and the updated taint_flags table also for mod->taints. It is not optimal because only few taint flags can be printed by module_taint_flags(). But better be on the safe side. IMHO, it is not worth the optimization and this is a good compromise. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: http://lkml.kernel.org/r/1474458442-21581-1-git-send-email-pmladek@suse.com [jeyu@redhat.com: fix broken lkml link in changelog] Signed-off-by: Jessica Yu <jeyu@redhat.com>
2016-09-21 19:47:22 +08:00
bool module; /* also show as a per-module taint flag */
};
extern const struct taint_flag taint_flags[TAINT_FLAGS_COUNT];
extern const char hex_asc[];
#define hex_asc_lo(x) hex_asc[((x) & 0x0f)]
#define hex_asc_hi(x) hex_asc[((x) & 0xf0) >> 4]
static inline char *hex_byte_pack(char *buf, u8 byte)
{
*buf++ = hex_asc_hi(byte);
*buf++ = hex_asc_lo(byte);
return buf;
}
extern const char hex_asc_upper[];
#define hex_asc_upper_lo(x) hex_asc_upper[((x) & 0x0f)]
#define hex_asc_upper_hi(x) hex_asc_upper[((x) & 0xf0) >> 4]
static inline char *hex_byte_pack_upper(char *buf, u8 byte)
{
*buf++ = hex_asc_upper_hi(byte);
*buf++ = hex_asc_upper_lo(byte);
return buf;
}
extern int hex_to_bin(char ch);
extern int __must_check hex2bin(u8 *dst, const char *src, size_t count);
extern char *bin2hex(char *dst, const void *src, size_t count);
bool mac_pton(const char *s, u8 *mac);
/*
* General tracing related utility functions - trace_printk(),
tracing: add tracing_on/tracing_off to kernel.h Impact: cleanup The functions tracing_start/tracing_stop have been moved to kernel.h. These are not the functions a developer most likely wants to use when they want to insert a place to stop tracing and restart it from user space. tracing_start/tracing_stop was created to work with things like suspend to ram, where even calling smp_processor_id() can crash the system. The tracing_start/tracing_stop was used to stop the tracer from doing anything. These are still light weight functions, but add a bit more overhead to be able to stop the tracers. They also have no interface back to userland. That is, if the kernel calls tracing_stop, userland can not start tracing. What a developer most likely wants to use is tracing_on/tracing_off. These are very light weight functions (simply sets or clears a bit). These functions just stop recording into the ring buffer. The tracers don't even know that this happens except that they would receive NULL from the ring_buffer_lock_reserve function. Also, there's a way for the user land to enable or disable this bit. In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off()) or echo "1" (same as tracing_on()) into this file. This becomes handy when a kernel developer is debugging and wants tracing to turn off when it hits an anomaly. Then the developer can examine the trace, and restart tracing if they want to try again (echo 1 > tracing_on). This patch moves the prototypes for tracing_on/tracing_off to kernel.h and comments their use, so that a kernel developer will know how to use them. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-05 23:35:56 +08:00
* tracing_on/tracing_off and tracing_start()/tracing_stop
*
* Use tracing_on/tracing_off when you want to quickly turn on or off
* tracing. It simply enables or disables the recording of the trace events.
* This also corresponds to the user space /sys/kernel/debug/tracing/tracing_on
tracing: add tracing_on/tracing_off to kernel.h Impact: cleanup The functions tracing_start/tracing_stop have been moved to kernel.h. These are not the functions a developer most likely wants to use when they want to insert a place to stop tracing and restart it from user space. tracing_start/tracing_stop was created to work with things like suspend to ram, where even calling smp_processor_id() can crash the system. The tracing_start/tracing_stop was used to stop the tracer from doing anything. These are still light weight functions, but add a bit more overhead to be able to stop the tracers. They also have no interface back to userland. That is, if the kernel calls tracing_stop, userland can not start tracing. What a developer most likely wants to use is tracing_on/tracing_off. These are very light weight functions (simply sets or clears a bit). These functions just stop recording into the ring buffer. The tracers don't even know that this happens except that they would receive NULL from the ring_buffer_lock_reserve function. Also, there's a way for the user land to enable or disable this bit. In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off()) or echo "1" (same as tracing_on()) into this file. This becomes handy when a kernel developer is debugging and wants tracing to turn off when it hits an anomaly. Then the developer can examine the trace, and restart tracing if they want to try again (echo 1 > tracing_on). This patch moves the prototypes for tracing_on/tracing_off to kernel.h and comments their use, so that a kernel developer will know how to use them. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-05 23:35:56 +08:00
* file, which gives a means for the kernel and userspace to interact.
* Place a tracing_off() in the kernel where you want tracing to end.
* From user space, examine the trace, and then echo 1 > tracing_on
* to continue tracing.
*
* tracing_stop/tracing_start has slightly more overhead. It is used
* by things like suspend to ram where disabling the recording of the
* trace is not enough, but tracing must actually stop because things
* like calling smp_processor_id() may crash the system.
*
* Most likely, you want to use tracing_on/tracing_off.
*/
enum ftrace_dump_mode {
DUMP_NONE,
DUMP_ALL,
DUMP_ORIG,
};
#ifdef CONFIG_TRACING
void tracing_on(void);
void tracing_off(void);
int tracing_is_on(void);
tracing: Add internal tracing_snapshot() functions The new snapshot feature is quite handy. It's a way for the user to take advantage of the spare buffer that, until then, only the latency tracers used to "snapshot" the buffer when it hit a max latency. Now users can trigger a "snapshot" manually when some condition is hit in a program. But a snapshot currently can not be triggered by a condition inside the kernel. With the addition of tracing_snapshot() and tracing_snapshot_alloc(), snapshots can now be taking when a condition is hit, and the developer wants to snapshot the case without stopping the trace. Note, any snapshot will overwrite the old one, so take care in how this is done. These new functions are to be used like tracing_on(), tracing_off() and trace_printk() are. That is, they should never be called in the mainline Linux kernel. They are solely for the purpose of debugging. The tracing_snapshot() will not allocate a buffer, but it is safe to be called from any context (except NMIs). But if a snapshot buffer isn't allocated when it is called, it will write to the live buffer, complaining about the lack of a snapshot buffer, and then stop tracing (giving you the "permanent snapshot"). tracing_snapshot_alloc() will allocate the snapshot buffer if it was not already allocated and then take the snapshot. This routine *may sleep*, and must be called from context that can sleep. The allocation is done with GFP_KERNEL and not atomic. If you need a snapshot in an atomic context, say in early boot, then it is best to call the tracing_snapshot_alloc() before then, where it will allocate the buffer, and then you can use the tracing_snapshot() anywhere you want and still get snapshots. Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-07 10:45:37 +08:00
void tracing_snapshot(void);
void tracing_snapshot_alloc(void);
extern void tracing_start(void);
extern void tracing_stop(void);
static inline __printf(1, 2)
void ____trace_printk_check_format(const char *fmt, ...)
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
{
}
#define __trace_printk_check_format(fmt, args...) \
do { \
if (0) \
____trace_printk_check_format(fmt, ##args); \
} while (0)
/**
* trace_printk - printf formatting in the ftrace buffer
* @fmt: the printf format for printing
*
* Note: __trace_printk is an internal function for trace_printk() and
* the @ip is passed in via the trace_printk() macro.
*
* This function allows a kernel developer to debug fast path sections
* that printk is not appropriate for. By scattering in various
* printk like tracing in the code, a developer can quickly see
* where problems are occurring.
*
* This is intended as a debugging tool for the developer only.
* Please refrain from leaving trace_printks scattered around in
* your code. (Extra memory is used for special buffers that are
* allocated when trace_printk() is used.)
tracing: Optimize trace_printk() with one arg to use trace_puts() Although trace_printk() is extremely fast, especially when it uses trace_bprintk() (writes args straight to buffer instead of inserting into string), it still has the overhead of calling one of the printf sprintf() functions, that need to scan the fmt string to determine what, if any args it has. This is a waste of precious CPU cycles if the printk format has no args but a single constant string. It is better to use trace_puts() which does not have the overhead of the fmt scanning. But wouldn't it be nice if the developer didn't have to think about such things, and the compile would just do it for them? trace_printk("this string has no args\n"); [...] trace_printk("this sting does %p %d\n", foo, bar); As tracing is critical to have the least amount of overhead, especially when dealing with race conditions, and you want to eliminate any "Heisenbugs", you want the trace_printk() to use the fastest possible means of tracing. Currently the macro magic determines if it will use trace_bprintk() or if the fmt is a dynamic string (a variable), it will fall back to the slow trace_printk() method that does a full snprintf() before copying it into the buffer, where as trace_bprintk() only copys the pointer to the fmt and the args into the buffer. Well, now there's a way to spend some more Hogwarts cash and come up with new fancy macro magic. #define trace_printk(fmt, ...) \ do { \ char _______STR[] = __stringify((__VA_ARGS__)); \ if (sizeof(_______STR) > 3) \ do_trace_printk(fmt, ##__VA_ARGS__); \ else \ trace_puts(fmt); \ } while (0) The above needs a bit of explaining (both here and in the comments). By stringifying the __VA_ARGS__, we can, at compile time, determine the number of args that are being passed to trace_printk(). The extra parenthesis are required, otherwise the compiler complains about too many parameters for __stringify if there is more than one arg. When there are no args, the __stringify((__VA_ARGS__)) converts into "()\0", a string of 3 characters. Anything else, will be a string containing more than 3 characters. Now we assign that string to a dynamic char array, and then take the sizeof() of that array. If it is greater than 3 characters, we know trace_printk() has args and we need to do the full "do_trace_printk()" on them, otherwise it was only passed a single arg and we can optimize to use trace_puts(). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven "The King of Nasty Macros!" Rostedt <rostedt@goodmis.org>
2013-03-09 11:11:57 +08:00
*
* A little optimization trick is done here. If there's only one
tracing: Optimize trace_printk() with one arg to use trace_puts() Although trace_printk() is extremely fast, especially when it uses trace_bprintk() (writes args straight to buffer instead of inserting into string), it still has the overhead of calling one of the printf sprintf() functions, that need to scan the fmt string to determine what, if any args it has. This is a waste of precious CPU cycles if the printk format has no args but a single constant string. It is better to use trace_puts() which does not have the overhead of the fmt scanning. But wouldn't it be nice if the developer didn't have to think about such things, and the compile would just do it for them? trace_printk("this string has no args\n"); [...] trace_printk("this sting does %p %d\n", foo, bar); As tracing is critical to have the least amount of overhead, especially when dealing with race conditions, and you want to eliminate any "Heisenbugs", you want the trace_printk() to use the fastest possible means of tracing. Currently the macro magic determines if it will use trace_bprintk() or if the fmt is a dynamic string (a variable), it will fall back to the slow trace_printk() method that does a full snprintf() before copying it into the buffer, where as trace_bprintk() only copys the pointer to the fmt and the args into the buffer. Well, now there's a way to spend some more Hogwarts cash and come up with new fancy macro magic. #define trace_printk(fmt, ...) \ do { \ char _______STR[] = __stringify((__VA_ARGS__)); \ if (sizeof(_______STR) > 3) \ do_trace_printk(fmt, ##__VA_ARGS__); \ else \ trace_puts(fmt); \ } while (0) The above needs a bit of explaining (both here and in the comments). By stringifying the __VA_ARGS__, we can, at compile time, determine the number of args that are being passed to trace_printk(). The extra parenthesis are required, otherwise the compiler complains about too many parameters for __stringify if there is more than one arg. When there are no args, the __stringify((__VA_ARGS__)) converts into "()\0", a string of 3 characters. Anything else, will be a string containing more than 3 characters. Now we assign that string to a dynamic char array, and then take the sizeof() of that array. If it is greater than 3 characters, we know trace_printk() has args and we need to do the full "do_trace_printk()" on them, otherwise it was only passed a single arg and we can optimize to use trace_puts(). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven "The King of Nasty Macros!" Rostedt <rostedt@goodmis.org>
2013-03-09 11:11:57 +08:00
* argument, there's no need to scan the string for printf formats.
* The trace_puts() will suffice. But how can we take advantage of
* using trace_puts() when trace_printk() has only one argument?
* By stringifying the args and checking the size we can tell
* whether or not there are args. __stringify((__VA_ARGS__)) will
* turn into "()\0" with a size of 3 when there are no args, anything
* else will be bigger. All we need to do is define a string to this,
* and then take its size and compare to 3. If it's bigger, use
* do_trace_printk() otherwise, optimize it to trace_puts(). Then just
* let gcc optimize the rest.
*/
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
tracing: Optimize trace_printk() with one arg to use trace_puts() Although trace_printk() is extremely fast, especially when it uses trace_bprintk() (writes args straight to buffer instead of inserting into string), it still has the overhead of calling one of the printf sprintf() functions, that need to scan the fmt string to determine what, if any args it has. This is a waste of precious CPU cycles if the printk format has no args but a single constant string. It is better to use trace_puts() which does not have the overhead of the fmt scanning. But wouldn't it be nice if the developer didn't have to think about such things, and the compile would just do it for them? trace_printk("this string has no args\n"); [...] trace_printk("this sting does %p %d\n", foo, bar); As tracing is critical to have the least amount of overhead, especially when dealing with race conditions, and you want to eliminate any "Heisenbugs", you want the trace_printk() to use the fastest possible means of tracing. Currently the macro magic determines if it will use trace_bprintk() or if the fmt is a dynamic string (a variable), it will fall back to the slow trace_printk() method that does a full snprintf() before copying it into the buffer, where as trace_bprintk() only copys the pointer to the fmt and the args into the buffer. Well, now there's a way to spend some more Hogwarts cash and come up with new fancy macro magic. #define trace_printk(fmt, ...) \ do { \ char _______STR[] = __stringify((__VA_ARGS__)); \ if (sizeof(_______STR) > 3) \ do_trace_printk(fmt, ##__VA_ARGS__); \ else \ trace_puts(fmt); \ } while (0) The above needs a bit of explaining (both here and in the comments). By stringifying the __VA_ARGS__, we can, at compile time, determine the number of args that are being passed to trace_printk(). The extra parenthesis are required, otherwise the compiler complains about too many parameters for __stringify if there is more than one arg. When there are no args, the __stringify((__VA_ARGS__)) converts into "()\0", a string of 3 characters. Anything else, will be a string containing more than 3 characters. Now we assign that string to a dynamic char array, and then take the sizeof() of that array. If it is greater than 3 characters, we know trace_printk() has args and we need to do the full "do_trace_printk()" on them, otherwise it was only passed a single arg and we can optimize to use trace_puts(). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven "The King of Nasty Macros!" Rostedt <rostedt@goodmis.org>
2013-03-09 11:11:57 +08:00
#define trace_printk(fmt, ...) \
do { \
char _______STR[] = __stringify((__VA_ARGS__)); \
if (sizeof(_______STR) > 3) \
do_trace_printk(fmt, ##__VA_ARGS__); \
else \
trace_puts(fmt); \
} while (0)
#define do_trace_printk(fmt, args...) \
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
do { \
static const char *trace_printk_fmt __used \
__section("__trace_printk_fmt") = \
2011-09-23 02:01:55 +08:00
__builtin_constant_p(fmt) ? fmt : NULL; \
\
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
__trace_printk_check_format(fmt, ##args); \
\
2011-09-23 02:01:55 +08:00
if (__builtin_constant_p(fmt)) \
__trace_bprintk(_THIS_IP_, trace_printk_fmt, ##args); \
2011-09-23 02:01:55 +08:00
else \
__trace_printk(_THIS_IP_, fmt, ##args); \
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
} while (0)
extern __printf(2, 3)
int __trace_bprintk(unsigned long ip, const char *fmt, ...);
extern __printf(2, 3)
int __trace_printk(unsigned long ip, const char *fmt, ...);
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
/**
* trace_puts - write a string into the ftrace buffer
* @str: the string to record
*
* Note: __trace_bputs is an internal function for trace_puts and
* the @ip is passed in via the trace_puts macro.
*
* This is similar to trace_printk() but is made for those really fast
* paths that a developer wants the least amount of "Heisenbug" effects,
* where the processing of the print format is still too much.
*
* This function allows a kernel developer to debug fast path sections
* that printk is not appropriate for. By scattering in various
* printk like tracing in the code, a developer can quickly see
* where problems are occurring.
*
* This is intended as a debugging tool for the developer only.
* Please refrain from leaving trace_puts scattered around in
* your code. (Extra memory is used for special buffers that are
* allocated when trace_puts() is used.)
*
* Returns: 0 if nothing was written, positive # if string was.
* (1 when __trace_bputs is used, strlen(str) when __trace_puts is used)
*/
#define trace_puts(str) ({ \
static const char *trace_printk_fmt __used \
__section("__trace_printk_fmt") = \
__builtin_constant_p(str) ? str : NULL; \
\
if (__builtin_constant_p(str)) \
__trace_bputs(_THIS_IP_, trace_printk_fmt); \
else \
__trace_puts(_THIS_IP_, str, strlen(str)); \
})
extern int __trace_bputs(unsigned long ip, const char *str);
extern int __trace_puts(unsigned long ip, const char *str, int size);
extern void trace_dump_stack(int skip);
/*
* The double __builtin_constant_p is because gcc will give us an error
* if we try to allocate the static variable to fmt if it is not a
* constant. Even with the outer if statement.
*/
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
#define ftrace_vprintk(fmt, vargs) \
do { \
if (__builtin_constant_p(fmt)) { \
static const char *trace_printk_fmt __used \
__section("__trace_printk_fmt") = \
__builtin_constant_p(fmt) ? fmt : NULL; \
\
__ftrace_vbprintk(_THIS_IP_, trace_printk_fmt, vargs); \
} else \
__ftrace_vprintk(_THIS_IP_, fmt, vargs); \
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
} while (0)
extern __printf(2, 0) int
__ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap);
extern __printf(2, 0) int
__ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
#else
static inline void tracing_start(void) { }
static inline void tracing_stop(void) { }
static inline void trace_dump_stack(int skip) { }
static inline void tracing_on(void) { }
static inline void tracing_off(void) { }
static inline int tracing_is_on(void) { return 0; }
tracing: Add internal tracing_snapshot() functions The new snapshot feature is quite handy. It's a way for the user to take advantage of the spare buffer that, until then, only the latency tracers used to "snapshot" the buffer when it hit a max latency. Now users can trigger a "snapshot" manually when some condition is hit in a program. But a snapshot currently can not be triggered by a condition inside the kernel. With the addition of tracing_snapshot() and tracing_snapshot_alloc(), snapshots can now be taking when a condition is hit, and the developer wants to snapshot the case without stopping the trace. Note, any snapshot will overwrite the old one, so take care in how this is done. These new functions are to be used like tracing_on(), tracing_off() and trace_printk() are. That is, they should never be called in the mainline Linux kernel. They are solely for the purpose of debugging. The tracing_snapshot() will not allocate a buffer, but it is safe to be called from any context (except NMIs). But if a snapshot buffer isn't allocated when it is called, it will write to the live buffer, complaining about the lack of a snapshot buffer, and then stop tracing (giving you the "permanent snapshot"). tracing_snapshot_alloc() will allocate the snapshot buffer if it was not already allocated and then take the snapshot. This routine *may sleep*, and must be called from context that can sleep. The allocation is done with GFP_KERNEL and not atomic. If you need a snapshot in an atomic context, say in early boot, then it is best to call the tracing_snapshot_alloc() before then, where it will allocate the buffer, and then you can use the tracing_snapshot() anywhere you want and still get snapshots. Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-07 10:45:37 +08:00
static inline void tracing_snapshot(void) { }
static inline void tracing_snapshot_alloc(void) { }
static inline __printf(1, 2)
int trace_printk(const char *fmt, ...)
{
return 0;
}
static __printf(1, 0) inline int
ftrace_vprintk(const char *fmt, va_list ap)
{
return 0;
}
static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-07 00:21:49 +08:00
#endif /* CONFIG_TRACING */
/* This counts to 12. Any more, it will return 13th argument. */
#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, X...) _n
#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define __CONCAT(a, b) a ## b
#define CONCATENATE(a, b) __CONCAT(a, b)
/**
* container_of - cast a member of a structure out to the containing structure
* @ptr: the pointer to the member.
* @type: the type of the container struct this is embedded in.
* @member: the name of the member within the struct.
*
*/
kernel.h: handle pointers to arrays better in container_of() If the first parameter of container_of() is a pointer to a non-const-qualified array type (and the third parameter names a non-const-qualified array member), the local variable __mptr will be defined with a const-qualified array type. In ISO C, these types are incompatible. They work as expected in GNU C, but some versions will issue warnings. For example, GCC 4.9 produces the warning "initialization from incompatible pointer type". Here is an example of where the problem occurs: ------------------------------------------------------- #include <linux/kernel.h> #include <linux/module.h> MODULE_LICENSE("GPL"); struct st { int a; char b[16]; }; static int __init example_init(void) { struct st t = { .a = 101, .b = "hello" }; char (*p)[16] = &t.b; struct st *x = container_of(p, struct st, b); printk(KERN_DEBUG "%p %p\n", (void *)&t, (void *)x); return 0; } static void __exit example_exit(void) { } module_init(example_init); module_exit(example_exit); ------------------------------------------------------- Building the module with gcc-4.9 results in these warnings (where '{m}' is the module source and '{k}' is the kernel source): ------------------------------------------------------- In file included from {m}/example.c:1:0: {m}/example.c: In function `example_init': {k}/include/linux/kernel.h:854:48: warning: initialization from incompatible pointer type const typeof( ((type *)0)->member ) *__mptr = (ptr); \ ^ {m}/example.c:14:17: note: in expansion of macro `container_of' struct st *x = container_of(p, struct st, b); ^ {k}/include/linux/kernel.h:854:48: warning: (near initialization for `x') const typeof( ((type *)0)->member ) *__mptr = (ptr); \ ^ {m}/example.c:14:17: note: in expansion of macro `container_of' struct st *x = container_of(p, struct st, b); ^ ------------------------------------------------------- Replace the type checking performed by the macro to avoid these warnings. Make sure `*(ptr)` either has type compatible with the member, or has type compatible with `void`, ignoring qualifiers. Raise compiler errors if this is not true. This is stronger than the previous behaviour, which only resulted in compiler warnings for a type mismatch. [arnd@arndb.de: fix new warnings for container_of()] Link: http://lkml.kernel.org/r/20170620200940.90557-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20170525120316.24473-7-abbotti@mev.co.uk Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Borislav Petkov <bp@suse.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-13 05:33:04 +08:00
#define container_of(ptr, type, member) ({ \
void *__mptr = (void *)(ptr); \
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
!__same_type(*(ptr), void), \
"pointer type mismatch in container_of()"); \
((type *)(__mptr - offsetof(type, member))); })
/**
* container_of_safe - cast a member of a structure out to the containing structure
* @ptr: the pointer to the member.
* @type: the type of the container struct this is embedded in.
* @member: the name of the member within the struct.
*
* If IS_ERR_OR_NULL(ptr), ptr is returned unchanged.
*/
#define container_of_safe(ptr, type, member) ({ \
void *__mptr = (void *)(ptr); \
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
!__same_type(*(ptr), void), \
"pointer type mismatch in container_of()"); \
IS_ERR_OR_NULL(__mptr) ? ERR_CAST(__mptr) : \
((type *)(__mptr - offsetof(type, member))); })
/* Rebuild everything on CONFIG_FTRACE_MCOUNT_RECORD */
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
# define REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD
#endif
/* Permissions on a sysfs file: you didn't miss the 0 prefix did you? */
#define VERIFY_OCTAL_PERMISSIONS(perms) \
(BUILD_BUG_ON_ZERO((perms) < 0) + \
BUILD_BUG_ON_ZERO((perms) > 0777) + \
/* USER_READABLE >= GROUP_READABLE >= OTHER_READABLE */ \
BUILD_BUG_ON_ZERO((((perms) >> 6) & 4) < (((perms) >> 3) & 4)) + \
BUILD_BUG_ON_ZERO((((perms) >> 3) & 4) < ((perms) & 4)) + \
/* USER_WRITABLE >= GROUP_WRITABLE */ \
BUILD_BUG_ON_ZERO((((perms) >> 6) & 2) < (((perms) >> 3) & 2)) + \
/* OTHER_WRITABLE? Generally considered a bad idea. */ \
BUILD_BUG_ON_ZERO((perms) & 2) + \
(perms))
#endif