From 79ee842891595293be37c5aed0e75b4630166c5a Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 8 Dec 2017 11:56:08 +0900 Subject: [PATCH 01/27] sched/autogroup: remove unneeded kallsyms include Autogroup does not seem to use any of the kallsyms functions/defines. Link: http://lkml.kernel.org/r/20171208025616.16267-2-sergey.senozhatsky@gmail.com To: Andrew Morton To: Michal Hocko To: Rafael Wysocki To: Len Brown To: Bjorn Helgaas To: Vlastimil Babka To: Tejun Heo To: Lai Jiangshan To: Thomas Gleixner To: Fengguang Wu Cc: Steven Rostedt Cc: LKML Cc: linux-pm@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Sergey Senozhatsky Acked-by: Peter Zijlstra (Intel) Signed-off-by: Petr Mladek --- kernel/sched/autogroup.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c index a43df5193538..0786227a3f48 100644 --- a/kernel/sched/autogroup.c +++ b/kernel/sched/autogroup.c @@ -3,7 +3,6 @@ #include #include -#include #include #include #include From 25493e5fba2f7b8cdade29d0fc8945114ee7732b Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 8 Dec 2017 17:24:22 +0900 Subject: [PATCH 02/27] sched/autogroup: move sched.h include Move the local "sched.h" include to the bottom. sched.h defines several macros that get redefined in ARCH-specific code, for instance finish_arch_post_lock_switch() and prepare_arch_switch(), so the ARCH-specific definitions need to come first.
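The ordering problem can be illustrated with a minimal, self-contained sketch of the generic-fallback pattern. It assumes that the generic header supplies a no-op default only under `#ifndef`, as the scheduler code of that era did for hooks such as prepare_arch_switch(). The counter `arch_hook_calls` and the helper `simulate_switch()` are hypothetical, and the two macro blocks stand in for an ARCH-specific header and for "sched.h" respectively.

```c
/* Hypothetical counter bumped by the architecture-specific hook. */
static int arch_hook_calls;

/*
 * Stands in for an ARCH-specific header. It has to be seen first so
 * that its definition is the one that takes effect.
 */
#define prepare_arch_switch(next) \
	do { arch_hook_calls++; (void)(next); } while (0)

/*
 * Stands in for the generic "sched.h": it provides a no-op default only
 * when the architecture has not defined the hook. If this block came
 * first, the ARCH definition above would become a macro redefinition.
 */
#ifndef prepare_arch_switch
# define prepare_arch_switch(next) do { } while (0)
#endif

/* Simulates one context switch and reports how often the hook has run. */
static int simulate_switch(int next)
{
	prepare_arch_switch(next);
	return arch_hook_calls;
}
```

If the `#ifndef` block were moved above the ARCH-specific `#define`, the no-op default would be defined first and the later definition would trigger a redefinition warning. Moving the "sched.h" include to the bottom avoids exactly that.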
Link: http://lkml.kernel.org/r/20171208082422.5021-1-sergey.senozhatsky@gmail.com To: Martin Schwidefsky Cc: Steven Rostedt Cc: LKML Cc: linux-pm@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: linux-mm@kvack.org Suggested-by: Martin Schwidefsky Signed-off-by: Sergey Senozhatsky Acked-by: Peter Zijlstra (Intel) Signed-off-by: Petr Mladek --- kernel/sched/autogroup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c index 0786227a3f48..bb4b9fe026a1 100644 --- a/kernel/sched/autogroup.c +++ b/kernel/sched/autogroup.c @@ -1,12 +1,12 @@ // SPDX-License-Identifier: GPL-2.0 -#include "sched.h" - #include #include #include #include #include +#include "sched.h" + unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1; static struct autogroup autogroup_default; static atomic_t autogroup_seq_nr; From cca10d58d25d271f05e1115132b4c2d913bb652e Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Thu, 21 Dec 2017 14:41:49 +0900 Subject: [PATCH 03/27] printk: add console_msg_format command line option 0day and kernelCI automatically parse the kernel log (basically some sort of grepping using pre-defined text patterns) in order to detect and report regressions/errors. There are several sources they get the kernel logs from: a) dmesg or /proc/kmsg This is the preferred way, because `dmesg --raw' (see later Note) and /proc/kmsg output contain facility and log level, which greatly simplifies grepping for EMERG/ALERT/CRIT/ERR messages. b) serial consoles This option is harder to maintain, because serial console messages don't contain facility and log level. This patch introduces a `console_msg_format=' command line option to switch between different message formats on serial consoles. For the time being we have just two options: default and syslog. The "default" option just keeps the existing format.
The "syslog" option makes serial console messages appear in syslog format [syslog() syscall], matching the `dmesg -S --raw' and `cat /proc/kmsg' output formats: - facility and log level - time stamp (depends on printk_time/PRINTK_TIME) - message <%u>[time stamp] text\n NOTE: while Kevin and Fengguang talk about "dmesg --raw", it's actually "dmesg -S --raw" that always prints messages in syslog format [per Petr Mladek]. Running "dmesg --raw" may sometimes produce output in non-syslog format. console_msg_format=syslog enables syslog format, so the documentation mentions "dmesg -S --raw", not "dmesg --raw". Per Kevin Hilman: : Right now we can get this info from a "dmesg --raw" after bootup, : but it would be really nice in certain automation frameworks to : have a kernel command-line option to enable printing of loglevels : in default boot log. : : This is especially useful when ingesting kernel logs into advanced : search/analytics frameworks (I'm playing with and ELK stack: Elastic : Search, Logstash, Kibana). : : The other important reason for having this on the command line is that : for testing linux-next (and other bleeding edge developer branches), : it's common that we never make it to userspace, so can't even run : "dmesg --raw" (or equivalent.) So we really want this on the primary : boot (serial) console.
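The syslog-style line layout can be sketched in a few lines of user-space C. The sketch assumes the usual syslog encoding, in which the number in angle brackets is facility * 8 + level, so a kernel-facility KERN_ERR message (level 3) gets `<3>`. It also always emits the time stamp, which the kernel makes optional. `format_syslog_line()` and `is_err_or_worse()` are hypothetical helpers, not kernel functions.

```c
#include <stdio.h>

/*
 * Builds "<PRI>[time stamp] text\n" with PRI = (facility << 3) | level.
 * Returns the snprintf() result, i.e. the length of the full line.
 */
static int format_syslog_line(char *buf, size_t size, unsigned int facility,
			      unsigned int level, unsigned long sec,
			      unsigned long usec, const char *text)
{
	return snprintf(buf, size, "<%u>[%5lu.%06lu] %s\n",
			(facility << 3) | level, sec, usec, text);
}

/*
 * Mirrors the regular expression '^<[0123]>': accepts only a single-digit
 * kernel-facility priority 0..3 (EMERG, ALERT, CRIT, ERR).
 */
static int is_err_or_worse(const char *line)
{
	return line[0] == '<' && line[1] >= '0' && line[1] <= '3' &&
	       line[2] == '>';
}
```

For example, facility 0, level 3, time 1.234567 s and the text "oops" produce `<3>[    1.234567] oops`, which `is_err_or_worse()` accepts. A KERN_INFO line such as `<6>[    0.000000] info`, or a user-facility line starting with `<11>`, is rejected. This is the kind of filtering the 0day grep pattern `'^<[0123]>'` relies on.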
Per Fengguang Wu, 0day scripts should quickly benefit from that feature, because they will be able to switch to a more reliable parsing, based on messages' facility and log levels [1]: `#{grep} -a -E -e '^<[0123]>' -e '^kern :(err |crit |alert |emerg )' instead of doing text pattern matching `#{grep} -a -F -f /lkp/printk-error-messages #{kmsg_file} | grep -a -v -E -f #{LKP_SRC}/etc/oops-pattern | grep -a -v -F -f #{LKP_SRC}/etc/kmsg-blacklist` [1] https://github.com/fengguang/lkp-tests/blob/master/lib/dmesg.rb Link: http://lkml.kernel.org/r/20171221054149.4398-1-sergey.senozhatsky@gmail.com To: Steven Rostedt Cc: Linus Torvalds Cc: Fengguang Wu Cc: Kevin Hilman Cc: Mark Brown Cc: Greg Kroah-Hartman Cc: Andrew Morton Cc: LKML Signed-off-by: Sergey Senozhatsky Reviewed-by: Fengguang Wu Reviewed-by: Kevin Hilman Tested-by: Kevin Hilman Signed-off-by: Petr Mladek --- .../admin-guide/kernel-parameters.txt | 14 +++++++++++ kernel/printk/printk.c | 23 ++++++++++++++++++- 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 62436bd5f34a..af614995b71d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -640,6 +640,20 @@ console=brl,ttyS0 For now, only VisioBraille is supported. + console_msg_format= + [KNL] Change console messages format + default + By default we print messages on consoles in + "[time stamp] text\n" format (time stamp may not be + printed, depending on CONFIG_PRINTK_TIME or + `printk_time' param). + syslog + Switch to syslog format: "<%u>[time stamp] text\n" + IOW, each message will have a facility and loglevel + prefix. The format is similar to one used by syslog() + syscall, or to executing "dmesg -S --raw" or to reading + from /proc/kmsg. + consoleblank= [KNL] The console blank (screen saver) timeout in seconds. A value of 0 disables the blank timer. Defaults to 0. 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 5d81206a572d..568729e0dc2c 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -277,6 +277,13 @@ EXPORT_SYMBOL(console_set_on_cmdline); /* Flag: console code may call schedule() */ static int console_may_schedule; +enum con_msg_format_flags { + MSG_FORMAT_DEFAULT = 0, + MSG_FORMAT_SYSLOG = (1 << 0), +}; + +static int console_msg_format = MSG_FORMAT_DEFAULT; + /* * The printk log buffer consists of a chain of concatenated variable * length records. Every record starts with a record header, containing @@ -1913,6 +1920,17 @@ static int __add_preferred_console(char *name, int idx, char *options, c->index = idx; return 0; } + +static int __init console_msg_format_setup(char *str) +{ + if (!strcmp(str, "syslog")) + console_msg_format = MSG_FORMAT_SYSLOG; + if (!strcmp(str, "default")) + console_msg_format = MSG_FORMAT_DEFAULT; + return 1; +} +__setup("console_msg_format=", console_msg_format_setup); + /* * Set up a console. Called via do_early_param() in init/main.c * for each "console=" parameter in the boot command line. @@ -2215,7 +2233,10 @@ skip: goto skip; } - len += msg_print_text(msg, false, text + len, sizeof(text) - len); + len += msg_print_text(msg, + console_msg_format & MSG_FORMAT_SYSLOG, + text + len, + sizeof(text) - len); if (nr_ext_console_drivers) { ext_len = msg_print_ext_header(ext_text, sizeof(ext_text), From 3ea70d7ddbddef06222e75b1f94a3e5268812988 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:13 +0900 Subject: [PATCH 04/27] arm: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call. 
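For a kernel text address, `%pS` prints roughly `symbol+0xoffset/0xsize`, with ` [module]` appended for module code. The sketch below mimics that rendering against a tiny, made-up symbol table. `struct sym`, `format_symbol()` and the table contents are hypothetical stand-ins for the kallsyms lookup, and falling back to a plain hex value is a simplification.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical symbol table entry: start address, size in bytes, name. */
struct sym {
	unsigned long addr;
	unsigned long size;
	const char *name;
};

/*
 * Renders addr as "name+0xoff/0xsize" when a table entry covers it,
 * otherwise as a bare "0x..." value. Returns the snprintf() length.
 */
static int format_symbol(char *buf, size_t size, unsigned long addr,
			 const struct sym *tab, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (addr >= tab[i].addr && addr - tab[i].addr < tab[i].size)
			return snprintf(buf, size, "%s+0x%lx/0x%lx",
					tab[i].name, addr - tab[i].addr,
					tab[i].size);
	}
	return snprintf(buf, size, "0x%lx", addr);
}
```

Take a table with `do_work` at 0x1000 (size 0x40) and `handle_irq` at 0x1040 (size 0x80). The address 0x105c is rendered as `handle_irq+0x1c/0x80`. Because the whole string comes out of one format call, a caller can embed it in a single `printk()`. Replacing print_symbol() with `printk("%pS")` achieves exactly that.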
Link: http://lkml.kernel.org/r/20171211125025.2270-2-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message, fixed compilation warning] Signed-off-by: Petr Mladek --- arch/arm/kernel/process.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index d96714e1858c..1523cb18b109 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -21,7 +21,6 @@ #include #include #include -#include #include #include #include @@ -121,8 +120,8 @@ void __show_regs(struct pt_regs *regs) show_regs_print_info(KERN_DEFAULT); - print_symbol("PC is at %s\n", instruction_pointer(regs)); - print_symbol("LR is at %s\n", regs->ARM_lr); + printk("PC is at %pS\n", (void *)instruction_pointer(regs)); + printk("LR is at %pS\n", (void *)regs->ARM_lr); printk("pc : [<%08lx>] lr : [<%08lx>] psr: %08lx\n", regs->ARM_pc, regs->ARM_lr, regs->ARM_cpsr); printk("sp : %08lx ip : %08lx fp : %08lx\n", From 4ef7963843d3243260aa335dfb9cb2fede06aacf Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:14 +0900 Subject: [PATCH 05/27] arm64: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call.
Link: http://lkml.kernel.org/r/20171211125025.2270-3-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: Catalin Marinas Cc: Will Deacon Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- arch/arm64/kernel/process.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index b2adcce7bc18..37c9470cbdaa 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -35,7 +35,6 @@ #include #include #include -#include #include #include #include @@ -221,8 +220,8 @@ void __show_regs(struct pt_regs *regs) show_regs_print_info(KERN_DEFAULT); print_pstate(regs); - print_symbol("pc : %s\n", regs->pc); - print_symbol("lr : %s\n", lr); + printk("pc : %pS\n", (void *)regs->pc); + printk("lr : %pS\n", (void *)lr); printk("sp : %016llx\n", sp); i = top_reg; From 4717fc192ae92976e2707fa4f8129e3ac592fe51 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:15 +0900 Subject: [PATCH 06/27] c6x: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call and avoid using continuation lines.
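A toy, record-based log buffer shows why merging a split message into one call matters. The model is a simplification: it assumes every call starts a new record, as `printk()` does when `KERN_CONT` is not used. `log_record()`, `records[]` and the two `show_frame_*()` helpers are hypothetical. The symbol is also passed as a pre-resolved string rather than through `%pS`.

```c
#include <stdarg.h>
#include <stdio.h>

/* Toy log buffer: each log_record() call stores exactly one record. */
#define MAX_RECORDS 8
#define RECORD_LEN  64

static char records[MAX_RECORDS][RECORD_LEN];
static int nr_records;

static void log_record(const char *fmt, ...)
{
	va_list ap;

	if (nr_records >= MAX_RECORDS)
		return;		/* buffer full: drop the message */
	va_start(ap, fmt);
	vsnprintf(records[nr_records++], RECORD_LEN, fmt, ap);
	va_end(ap);
}

/* Old style: one stack frame split over two calls, i.e. two records. */
static void show_frame_split(unsigned long addr, const char *sym)
{
	log_record(" [<%08lx>]", addr);
	log_record(" %s\n", sym);
}

/* New style: the whole frame is formatted by a single call. */
static void show_frame_single(unsigned long addr, const char *sym)
{
	log_record(" [<%08lx>] %s\n", addr, sym);
}
```

`show_frame_split()` produces two records for one stack frame, so output from another CPU could land between them. `show_frame_single()` always produces one complete record.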
Link: http://lkml.kernel.org/r/20171211125025.2270-4-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: Aurelien Jacquiot Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- arch/c6x/kernel/traps.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/c6x/kernel/traps.c b/arch/c6x/kernel/traps.c index 09b8a40d5680..4c1d4b84dd2b 100644 --- a/arch/c6x/kernel/traps.c +++ b/arch/c6x/kernel/traps.c @@ -11,7 +11,6 @@ #include #include #include -#include #include #include @@ -375,8 +374,7 @@ static void show_trace(unsigned long *stack, unsigned long *endstack) if (i % 5 == 0) pr_debug("\n "); #endif - pr_debug(" [<%08lx>]", addr); - print_symbol(" %s\n", addr); + pr_debug(" [<%08lx>] %pS\n", addr, (void *)addr); i++; } } From 253c962020636252bdc4b9b682035ffe4a7eaa2e Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:16 +0900 Subject: [PATCH 07/27] ia64: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call and avoid using continuation lines.
Link: http://lkml.kernel.org/r/20171211125025.2270-5-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- arch/ia64/kernel/process.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c index dda0082056b3..968b5f33e725 100644 --- a/arch/ia64/kernel/process.c +++ b/arch/ia64/kernel/process.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include @@ -69,7 +68,6 @@ void ia64_do_show_stack (struct unw_frame_info *info, void *arg) { unsigned long ip, sp, bsp; - char buf[128]; /* don't make it so big that it overflows the stack! 
*/ printk("\nCall Trace:\n"); do { @@ -79,11 +77,9 @@ ia64_do_show_stack (struct unw_frame_info *info, void *arg) unw_get_sp(info, &sp); unw_get_bsp(info, &bsp); - snprintf(buf, sizeof(buf), - " [<%016lx>] %%s\n" + printk(" [<%016lx>] %pS\n" " sp=%016lx bsp=%016lx\n", - ip, sp, bsp); - print_symbol(buf, ip); + ip, (void *)ip, sp, bsp); } while (unw_unwind(info) >= 0); } @@ -111,7 +107,7 @@ show_regs (struct pt_regs *regs) printk("psr : %016lx ifs : %016lx ip : [<%016lx>] %s (%s)\n", regs->cr_ipsr, regs->cr_ifs, ip, print_tainted(), init_utsname()->release); - print_symbol("ip is at %s\n", ip); + printk("ip is at %pS\n", (void *)ip); printk("unat: %016lx pfs : %016lx rsc : %016lx\n", regs->ar_unat, regs->ar_pfs, regs->ar_rsc); printk("rnat: %016lx bsps: %016lx pr : %016lx\n", From 67580c9fc682bb6e0738a31089e7cc33982484ec Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:17 +0900 Subject: [PATCH 08/27] mn10300: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call. 
Link: http://lkml.kernel.org/r/20171211125025.2270-6-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- arch/mn10300/kernel/traps.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/mn10300/kernel/traps.c b/arch/mn10300/kernel/traps.c index 800fd0801969..72d1015b2ae7 100644 --- a/arch/mn10300/kernel/traps.c +++ b/arch/mn10300/kernel/traps.c @@ -22,7 +22,6 @@ #include #include #include -#include #include #include #include @@ -262,8 +261,7 @@ void show_trace(unsigned long *sp) raslot = ULONG_MAX; else printk(" ?"); - print_symbol(" %s", addr); - printk("\n"); + printk(" %pS\n", (void *)addr); } } From 9e6a42f039a97ee7aef42f4ea93873b534c6d3cf Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:18 +0900 Subject: [PATCH 09/27] sh: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call. 
Link: http://lkml.kernel.org/r/20171211125025.2270-7-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: Rich Felker Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message, fixed compilation warning] Signed-off-by: Petr Mladek --- arch/sh/kernel/process_32.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c index 2c7bdf8cb934..93522069cb15 100644 --- a/arch/sh/kernel/process_32.c +++ b/arch/sh/kernel/process_32.c @@ -20,7 +20,6 @@ #include #include #include -#include #include #include #include @@ -37,8 +36,8 @@ void show_regs(struct pt_regs * regs) printk("\n"); show_regs_print_info(KERN_DEFAULT); - print_symbol("PC is at %s\n", instruction_pointer(regs)); - print_symbol("PR is at %s\n", regs->pr); + printk("PC is at %pS\n", (void *)instruction_pointer(regs)); + printk("PR is at %pS\n", (void *)regs->pr); printk("PC : %08lx SP : %08lx SR : %08lx ", regs->pc, regs->regs[15], regs->sr); From cb380025d4694130b47a8e441ebdc45d93c71cc2 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:19 +0900 Subject: [PATCH 10/27] unicore32: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call.
Link: http://lkml.kernel.org/r/20171211125025.2270-8-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message, fixed compilation warning] Signed-off-by: Petr Mladek --- arch/unicore32/kernel/process.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c index ddaf78ae6854..2bc10b8e9cf4 100644 --- a/arch/unicore32/kernel/process.c +++ b/arch/unicore32/kernel/process.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include @@ -139,8 +138,8 @@ void __show_regs(struct pt_regs *regs) char buf[64]; show_regs_print_info(KERN_DEFAULT); - print_symbol("PC is at %s\n", instruction_pointer(regs)); - print_symbol("LR is at %s\n", regs->UCreg_lr); + printk("PC is at %pS\n", (void *)instruction_pointer(regs)); + printk("LR is at %pS\n", (void *)regs->UCreg_lr); printk(KERN_DEFAULT "pc : [<%08lx>] lr : [<%08lx>] psr: %08lx\n" "sp : %08lx ip : %08lx fp : %08lx\n", regs->UCreg_pc, regs->UCreg_lr, regs->UCreg_asr, From 7b6061627eb82721f224eb8f971b0c50cf585886 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:20 +0900 Subject: [PATCH 11/27] x86: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call.
Replace print_symbol() with a direct printk("%pS") call and correctly handle continuation lines. Link: http://lkml.kernel.org/r/20171211125025.2270-9-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky Acked-by: Borislav Petkov # mce.c part [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- arch/x86/kernel/cpu/mcheck/mce.c | 3 +-- arch/x86/mm/mmio-mod.c | 5 ++--- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index b1d616d08eee..8ca8f6eb32db 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -14,7 +14,6 @@ #include #include #include -#include #include #include #include @@ -235,7 +234,7 @@ static void __print_mce(struct mce *m) m->cs, m->ip); if (m->cs == __KERNEL_CS) - print_symbol("{%s}", m->ip); + pr_cont("{%pS}", (void *)m->ip); pr_cont("\n"); } diff --git a/arch/x86/mm/mmio-mod.c b/arch/x86/mm/mmio-mod.c index 4d434ddb75db..2c1ecf4763c4 100644 --- a/arch/x86/mm/mmio-mod.c +++ b/arch/x86/mm/mmio-mod.c @@ -29,7 +29,6 @@ #include #include #include -#include #include #include #include /* for ISA_START_ADDRESS */ @@ -123,8 +122,8 @@ static void die_kmmio_nesting_error(struct pt_regs *regs, unsigned long addr) pr_emerg("unexpected fault for address: 0x%08lx, last fault for address: 0x%08lx\n", addr, my_reason->addr); print_pte(addr); - print_symbol(KERN_EMERG "faulting IP is at
%s\n", regs->ip); - print_symbol(KERN_EMERG "last faulting IP was at %s\n", my_reason->ip); + pr_emerg("faulting IP is at %pS\n", (void *)regs->ip); + pr_emerg("last faulting IP was at %pS\n", (void *)my_reason->ip); #ifdef __i386__ pr_emerg("eax: %08lx ebx: %08lx ecx: %08lx edx: %08lx\n", regs->ax, regs->bx, regs->cx, regs->dx); From a52668c6748700756545e43d0cd06ec83d7667e0 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:21 +0900 Subject: [PATCH 12/27] drivers: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call. Link: http://lkml.kernel.org/r/20171211125025.2270-10-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- drivers/base/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 110230d86527..409298942d5a 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -22,7 +22,6 @@ #include #include #include -#include #include #include #include @@ -687,8 +686,8 @@ static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, if (dev_attr->show) ret = dev_attr->show(dev, dev_attr, buf); if (ret >= (ssize_t)PAGE_SIZE) { - print_symbol("dev_attr_show: %s returned bad count\n", - (unsigned 
long)dev_attr->show); + printk("dev_attr_show: %pS returned bad count\n", + dev_attr->show); } return ret; } From 9e6d35ff0a8fa9c442cbde9afe88babf5e8a2f1a Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:22 +0900 Subject: [PATCH 13/27] sysfs: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call. Link: http://lkml.kernel.org/r/20171211125025.2270-11-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- fs/sysfs/file.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index 39c75a86c67f..bfcbe486d385 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -12,7 +12,6 @@ #include #include -#include #include #include #include @@ -70,8 +69,8 @@ static int sysfs_kf_seq_show(struct seq_file *sf, void *v) * indicate truncated result or overflow in normal use cases. 
*/ if (count >= (ssize_t)PAGE_SIZE) { - print_symbol("fill_read_buffer: %s returned bad count\n", - (unsigned long)ops->show); + printk("fill_read_buffer: %pS returned bad count\n", + ops->show); /* Try to struggle along */ count = PAGE_SIZE - 1; } From 3b14f08d169400a5a635b299a273db23d2be8e49 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 12 Dec 2017 16:34:53 +0900 Subject: [PATCH 14/27] irq debug: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call and avoid using continuation lines. Link: http://lkml.kernel.org/r/20171212073453.21455-1-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu To: David Laight Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- kernel/irq/debug.h | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/kernel/irq/debug.h b/kernel/irq/debug.h index 17f05ef8f575..7e06dd275c17 100644 --- a/kernel/irq/debug.h +++ b/kernel/irq/debug.h @@ -3,8 +3,6 @@ * Debugging printout: */ -#include - #define ___P(f) if (desc->status_use_accessors & f) printk("%14s set\n", #f) #define ___PS(f) if (desc->istate & f) printk("%14s set\n", #f) /* FIXME */ @@ -14,14 +12,14 @@ static inline void print_irq_desc(unsigned int irq, struct irq_desc *desc) { printk("irq %d, desc: %p, depth: %d, count: %d, unhandled: %d\n",
irq, desc, desc->depth, desc->irq_count, desc->irqs_unhandled); - printk("->handle_irq(): %p, ", desc->handle_irq); - print_symbol("%s\n", (unsigned long)desc->handle_irq); - printk("->irq_data.chip(): %p, ", desc->irq_data.chip); - print_symbol("%s\n", (unsigned long)desc->irq_data.chip); + printk("->handle_irq(): %p, %pS\n", + desc->handle_irq, desc->handle_irq); + printk("->irq_data.chip(): %p, %pS\n", + desc->irq_data.chip, desc->irq_data.chip); printk("->action(): %p\n", desc->action); if (desc->action) { - printk("->action->handler(): %p, ", desc->action->handler); - print_symbol("%s\n", (unsigned long)desc->action->handler); + printk("->action->handler(): %p, %pS\n", + desc->action->handler, desc->action->handler); } ___P(IRQ_LEVEL); From d202d47b5e62cbd54d2cfffbe01471a7690dc652 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Mon, 11 Dec 2017 21:50:24 +0900 Subject: [PATCH 15/27] lib: do not use print_symbol() print_symbol() is a very old API that has been obsoleted by %pS format specifier in a normal printk() call. Replace print_symbol() with a direct printk("%pS") call. 
Link: http://lkml.kernel.org/r/20171211125025.2270-13-sergey.senozhatsky@gmail.com To: Andrew Morton To: Russell King To: Catalin Marinas To: Mark Salter To: Tony Luck To: David Howells To: Yoshinori Sato To: Guan Xuetao To: Borislav Petkov To: Greg Kroah-Hartman To: Thomas Gleixner To: Peter Zijlstra To: Vineet Gupta To: Fengguang Wu Cc: Steven Rostedt Cc: Petr Mladek Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Sergey Senozhatsky [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- lib/smp_processor_id.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c index 835cc6df2776..85925aaa4fff 100644 --- a/lib/smp_processor_id.c +++ b/lib/smp_processor_id.c @@ -5,7 +5,6 @@ * DEBUG_PREEMPT variant of smp_processor_id(). 
*/ #include -#include #include notrace static unsigned int check_preemption_disabled(const char *what1, @@ -43,7 +42,7 @@ notrace static unsigned int check_preemption_disabled(const char *what1, printk(KERN_ERR "BUG: using %s%s() in preemptible [%08x] code: %s/%d\n", what1, what2, preempt_count() - 1, current->comm, current->pid); - print_symbol("caller is %s\n", (long)__builtin_return_address(0)); + printk("caller is %pS\n", __builtin_return_address(0)); dump_stack(); out_enable: From ce666d917bc07469022f3bf713b3520b344995ae Mon Sep 17 00:00:00 2001 From: Joel Stanley Date: Wed, 13 Dec 2017 14:12:12 +1030 Subject: [PATCH 16/27] openrisc: Fix conflicting types for _etext and _stext MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The printk tree in linux-next has a patch "symbol lookup: introduce dereference_symbol_descriptor()" that includes sections.h in kallsyms.h, so arch/openrisc/kernel/traps.c gets a second extern declaration for _etext and _stext. Remove the local declarations and include sections.h directly in preparation for the kallsyms.h change.
This fixes the following (future) build error: CC arch/openrisc/kernel/traps.o arch/openrisc/kernel/traps.c:43:13: error: conflicting types for ‘_etext’ extern char _etext, _stext; ^ In file included from ./arch/openrisc/include/generated/asm/sections.h:1:0, from ./include/linux/kallsyms.h:15, from arch/openrisc/kernel/traps.c:35: ./include/asm-generic/sections.h:35:32: note: previous declaration of ‘_etext’ was here extern char _text[], _stext[], _etext[]; ^ Signed-off-by: Joel Stanley Reviewed-by: Sergey Senozhatsky Acked-by: Stafford Horne Signed-off-by: Petr Mladek --- arch/openrisc/kernel/traps.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c index 4085d72fa5ae..5fb6d512502d 100644 --- a/arch/openrisc/kernel/traps.c +++ b/arch/openrisc/kernel/traps.c @@ -39,8 +39,7 @@ #include #include #include - -extern char _etext, _stext; +#include int kstack_depth_to_print = 0x180; int lwa_flag; From b865ea64304ed591b7ab92d74efb12eff5ff4cbb Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 10 Nov 2017 08:48:25 +0900 Subject: [PATCH 17/27] sections: split dereference_function_descriptor() There are two format specifiers to print out a pointer in symbolic format: '%pS/%ps' and '%pF/%pf'. On most architectures, the two mean exactly the same thing, but some architectures (ia64, ppc64, parisc64) use an indirect pointer for C function pointers, where the function pointer points to a function descriptor (which in turn contains the actual pointer to the code). The '%pF/%pf' specifiers, when used appropriately, automatically do the appropriate function descriptor dereference on such architectures. The "when used appropriately" part is tricky. Basically, this is a subtle ABI detail, specific to some platforms, that made it to the API level; people can be unaware of it and miss the whole "we need to dereference the function" business.
[1] proves that point (note that it fixes only '%pF' and '%pS'; there might be '%pf' and '%ps' cases as well). It appears that we can handle everything within the affected arches and make '%pS/%ps' smart enough to retire '%pF/%pf'. Function descriptors live in the .opd ELF section, and all affected arches (ia64, ppc64, parisc64) handle it properly for the kernel and modules. So we can, technically, decide whether the dereference is needed by simply looking at the pointer: if it belongs to the .opd section, then we need to dereference it. The kernel and modules each have their own .opd sections, which is why we need to split dereference_function_descriptor() and use separate kernel and module dereference arch callbacks. This patch does the first step: it a) adds a dereference_kernel_function_descriptor() function; b) adds a weak default implementation of the dereference_module_function_descriptor() function. So, for the time being, we will have: 1) dereference_function_descriptor() A generic function that simply dereferences the pointer. There are a bunch of places that call it: kgdbts, init/main.c, extable, etc. 2) dereference_kernel_function_descriptor() A function to call on kernel symbols that does the kernel .opd section address range test. 3) dereference_module_function_descriptor() A function to call on modules' symbols that does the modules' .opd section address range test.
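To make the .opd address range test concrete, here is a minimal userspace sketch in C. It assumes a simplified two-word descriptor layout with the code address in the first word, and uses a static array as a stand-in for the kernel's .opd section (its bounds play the role of __start_opd and __end_opd). The `model_*` names and `struct model_fdesc` are hypothetical and are not kernel APIs; the real helpers also read the descriptor with fault-safe accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified function descriptor: a "function pointer"
 * points at this struct, and the first word holds the real code address.
 * (Assumption: real ABIs carry more, e.g. a gp/TOC value.)
 */
struct model_fdesc {
	void *addr;	/* entry address of the code */
	void *gp;	/* global pointer, unused in this model */
};

/* Pretend code addresses. */
static char model_text_a, model_text_b;

/*
 * Stand-in for the kernel .opd section: &model_opd[0] acts like
 * __start_opd and &model_opd[2] (one past the end) like __end_opd.
 */
static struct model_fdesc model_opd[2] = {
	{ &model_text_a, NULL },
	{ &model_text_b, NULL },
};

/* Unconditional dereference, like the generic per-arch helper. */
static void *model_dereference_function_descriptor(void *ptr)
{
	return ((struct model_fdesc *)ptr)->addr;
}

/* Dereference only if ptr lies within [start_opd, end_opd). */
static void *model_dereference_kernel_function_descriptor(void *ptr)
{
	uintptr_t p = (uintptr_t)ptr;
	uintptr_t start = (uintptr_t)&model_opd[0];
	uintptr_t end = (uintptr_t)&model_opd[2];

	if (p < start || p >= end)
		return ptr;	/* already a code address, e.g. a return address */
	return model_dereference_function_descriptor(ptr);
}
```

A pointer that already points at code (for example, a value from __builtin_return_address(0)) falls outside the range and is returned untouched, which is what lets '%pS' apply the check to every pointer without needing a separate '%pF'.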
[1] https://marc.info/?l=linux-kernel&m=150472969730573 Link: http://lkml.kernel.org/r/20171109234830.5067-2-sergey.senozhatsky@gmail.com To: Fenghua Yu To: Benjamin Herrenschmidt To: Paul Mackerras To: Michael Ellerman To: James Bottomley Cc: Andrew Morton Cc: Jessica Yu Cc: Steven Rostedt Cc: linux-ia64@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky Tested-by: Tony Luck #ia64 Tested-by: Santosh Sivaraj #powerpc Tested-by: Helge Deller #parisc64 Signed-off-by: Petr Mladek --- include/asm-generic/sections.h | 8 ++++++-- include/linux/module.h | 10 ++++++++++ kernel/module.c | 6 ++++++ 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index 03cc5f9bba71..849cd8eb5ca0 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -30,6 +30,7 @@ * __ctors_start, __ctors_end * __irqentry_text_start, __irqentry_text_end * __softirqentry_text_start, __softirqentry_text_end + * __start_opd, __end_opd */ extern char _text[], _stext[], _etext[]; extern char _data[], _sdata[], _edata[]; @@ -49,12 +50,15 @@ extern char __start_once[], __end_once[]; /* Start and end of .ctors section - used for constructor calls. */ extern char __ctors_start[], __ctors_end[]; +/* Start and end of .opd section - used for function descriptors. */ +extern char __start_opd[], __end_opd[]; + extern __visible const void __nosave_begin, __nosave_end; -/* function descriptor handling (if any). Override - * in asm/sections.h */ +/* Function descriptor handling (if any). Override in asm/sections.h */ #ifndef dereference_function_descriptor #define dereference_function_descriptor(p) (p) +#define dereference_kernel_function_descriptor(p) (p) #endif /* random extra sections (if any). 
Override diff --git a/include/linux/module.h b/include/linux/module.h index c69b49abe877..e6249795f9e2 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -606,6 +606,9 @@ int ref_module(struct module *a, struct module *b); __mod ? __mod->name : "kernel"; \ }) +/* Dereference module function descriptor */ +void *dereference_module_function_descriptor(struct module *mod, void *ptr); + /* For kallsyms to ask for address resolution. namebuf should be at * least KSYM_NAME_LEN long: a pointer to namebuf is returned if * found, otherwise NULL. */ @@ -760,6 +763,13 @@ static inline bool is_module_sig_enforced(void) return false; } +/* Dereference module function descriptor */ +static inline +void *dereference_module_function_descriptor(struct module *mod, void *ptr) +{ + return ptr; +} + #endif /* CONFIG_MODULES */ #ifdef CONFIG_SYSFS diff --git a/kernel/module.c b/kernel/module.c index f0411a271765..65f6561d70e1 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3938,6 +3938,12 @@ static const char *get_ksymbol(struct module *mod, return symname(kallsyms, best); } +void * __weak dereference_module_function_descriptor(struct module *mod, + void *ptr) +{ + return ptr; +} + /* For kallsyms to ask for address resolution. NULL means not found. Careful * not to lock to avoid deadlock on oopses, simply disable preemption. */ const char *module_address_lookup(unsigned long addr, From 8e30788816d4d4a991136cfe9ba8715d62f31d80 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 10 Nov 2017 08:48:26 +0900 Subject: [PATCH 18/27] ia64: Add .opd based function descriptor dereference We are moving towards separate kernel and module function descriptor dereference callbacks. This patch enables it for IA64. For pointers that belong to the kernel - Added __start_opd and __end_opd pointers, to track the kernel .opd section address range; - Added dereference_kernel_function_descriptor(). 
Now we will dereference only function pointers that are within [__start_opd, __end_opd); For pointers that belong to a module - Added dereference_module_function_descriptor() to handle module function descriptor dereference. Now we will dereference only pointers that are within [module->opd.start, module->opd.end). Link: http://lkml.kernel.org/r/20171109234830.5067-3-sergey.senozhatsky@gmail.com To: Fenghua Yu To: Helge Deller To: Benjamin Herrenschmidt To: Paul Mackerras To: Michael Ellerman To: James Bottomley Cc: Andrew Morton Cc: Jessica Yu Cc: Steven Rostedt Cc: linux-ia64@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky Tested-by: Tony Luck #ia64 Signed-off-by: Petr Mladek --- arch/ia64/include/asm/sections.h | 10 +++++++++- arch/ia64/kernel/module.c | 12 ++++++++++++ arch/ia64/kernel/vmlinux.lds.S | 2 ++ 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h index f3481408594e..cea15f2dd38d 100644 --- a/arch/ia64/include/asm/sections.h +++ b/arch/ia64/include/asm/sections.h @@ -27,6 +27,8 @@ extern char __start_gate_brl_fsys_bubble_down_patchlist[], __end_gate_brl_fsys_b extern char __start_unwind[], __end_unwind[]; extern char __start_ivt_text[], __end_ivt_text[]; +#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1 + #undef dereference_function_descriptor static inline void *dereference_function_descriptor(void *ptr) { @@ -38,6 +40,12 @@ static inline void *dereference_function_descriptor(void *ptr) return ptr; } +#undef dereference_kernel_function_descriptor +static inline void *dereference_kernel_function_descriptor(void *ptr) +{ + if (ptr < (void *)__start_opd || ptr >= (void *)__end_opd) + return ptr; + return dereference_function_descriptor(ptr); +} #endif /* _ASM_IA64_SECTIONS_H */ - diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c index 853b5611a894..326448f9df16 
100644 --- a/arch/ia64/kernel/module.c +++ b/arch/ia64/kernel/module.c @@ -36,6 +36,7 @@ #include #include +#include #define ARCH_MODULE_DEBUG 0 @@ -918,3 +919,14 @@ module_arch_cleanup (struct module *mod) if (mod->arch.core_unw_table) unw_remove_unwind_table(mod->arch.core_unw_table); } + +void *dereference_module_function_descriptor(struct module *mod, void *ptr) +{ + Elf64_Shdr *opd = mod->arch.opd; + + if (ptr < (void *)opd->sh_addr || + ptr >= (void *)(opd->sh_addr + opd->sh_size)) + return ptr; + + return dereference_function_descriptor(ptr); +} diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S index 58db59da0bd8..31e688981b4b 100644 --- a/arch/ia64/kernel/vmlinux.lds.S +++ b/arch/ia64/kernel/vmlinux.lds.S @@ -108,7 +108,9 @@ SECTIONS { RODATA .opd : AT(ADDR(.opd) - LOAD_OFFSET) { + __start_opd = .; *(.opd) + __end_opd = .; } /* From 5633e85b2c3133051d8201b586ba195f1733096b Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 10 Nov 2017 08:48:27 +0900 Subject: [PATCH 19/27] powerpc64: Add .opd based function descriptor dereference We are moving towards separate kernel and module function descriptor dereference callbacks. This patch enables it for powerpc64. For pointers that belong to the kernel - Added __start_opd and __end_opd pointers, to track the kernel .opd section address range; - Added dereference_kernel_function_descriptor(). Now we will dereference only function pointers that are within [__start_opd, __end_opd); For pointers that belong to a module - Added dereference_module_function_descriptor() to handle module function descriptor dereference. Now we will dereference only pointers that are within [module->opd.start, module->opd.end). 
Link: http://lkml.kernel.org/r/20171109234830.5067-4-sergey.senozhatsky@gmail.com To: Tony Luck To: Fenghua Yu To: Helge Deller To: Benjamin Herrenschmidt To: Paul Mackerras To: Michael Ellerman To: James Bottomley Cc: Andrew Morton Cc: Jessica Yu Cc: Petr Mladek Cc: Steven Rostedt Cc: linux-ia64@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Cc: Sergey Senozhatsky Cc: Sergey Senozhatsky Signed-off-by: Sergey Senozhatsky Tested-by: Santosh Sivaraj #powerpc Signed-off-by: Petr Mladek --- arch/powerpc/include/asm/module.h | 3 +++ arch/powerpc/include/asm/sections.h | 12 ++++++++++++ arch/powerpc/kernel/module_64.c | 14 ++++++++++++++ arch/powerpc/kernel/vmlinux.lds.S | 2 ++ 4 files changed, 31 insertions(+) diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h index 6c0132c7212f..7e28442827f1 100644 --- a/arch/powerpc/include/asm/module.h +++ b/arch/powerpc/include/asm/module.h @@ -45,6 +45,9 @@ struct mod_arch_specific { unsigned long tramp; #endif + /* For module function descriptor dereference */ + unsigned long start_opd; + unsigned long end_opd; #else /* powerpc64 */ /* Indices of PLT sections within module. 
*/ unsigned int core_plt_section; diff --git a/arch/powerpc/include/asm/sections.h b/arch/powerpc/include/asm/sections.h index 82bec63bbd4f..e335a8f846af 100644 --- a/arch/powerpc/include/asm/sections.h +++ b/arch/powerpc/include/asm/sections.h @@ -66,6 +66,9 @@ static inline int overlaps_kvm_tmp(unsigned long start, unsigned long end) } #ifdef PPC64_ELF_ABI_v1 + +#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1 + #undef dereference_function_descriptor static inline void *dereference_function_descriptor(void *ptr) { @@ -76,6 +79,15 @@ static inline void *dereference_function_descriptor(void *ptr) ptr = p; return ptr; } + +#undef dereference_kernel_function_descriptor +static inline void *dereference_kernel_function_descriptor(void *ptr) +{ + if (ptr < (void *)__start_opd || ptr >= (void *)__end_opd) + return ptr; + + return dereference_function_descriptor(ptr); +} #endif /* PPC64_ELF_ABI_v1 */ #endif diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c index 759104b99f9f..218971ac7e04 100644 --- a/arch/powerpc/kernel/module_64.c +++ b/arch/powerpc/kernel/module_64.c @@ -93,6 +93,15 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym) { return 0; } + +void *dereference_module_function_descriptor(struct module *mod, void *ptr) +{ + if (ptr < (void *)mod->arch.start_opd || + ptr >= (void *)mod->arch.end_opd) + return ptr; + + return dereference_function_descriptor(ptr); +} #endif #define STUB_MAGIC 0x73747562 /* stub */ @@ -344,6 +353,11 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr, else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0) dedotify_versions((void *)hdr + sechdrs[i].sh_offset, sechdrs[i].sh_size); + else if (!strcmp(secstrings + sechdrs[i].sh_name, ".opd")) { + me->arch.start_opd = sechdrs[i].sh_addr; + me->arch.end_opd = sechdrs[i].sh_addr + + sechdrs[i].sh_size; + } /* We don't handle .init for the moment: rename to _init */ while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init"))) diff --git 
a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 0494e1566ee2..5dac5ab22fa2 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -278,7 +278,9 @@ SECTIONS } .opd : AT(ADDR(.opd) - LOAD_OFFSET) { + __start_opd = .; *(.opd) + __end_opd = .; } . = ALIGN(256); From 1705bd6a68171d8b3ba05df9352757ee4e49317b Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 10 Nov 2017 08:48:28 +0900 Subject: [PATCH 20/27] parisc64: Add .opd based function descriptor dereference We are moving towards separate kernel and module function descriptor dereference callbacks. This patch enables it for parisc64. For pointers that belong to the kernel - Added __start_opd and __end_opd pointers, to track the kernel .opd section address range; - Added dereference_kernel_function_descriptor(). Now we will dereference only function pointers that are within [__start_opd, __end_opd); For pointers that belong to a module - Added dereference_module_function_descriptor() to handle module function descriptor dereference. Now we will dereference only pointers that are within [module->opd.start, module->opd.end). 
Link: http://lkml.kernel.org/r/20171109234830.5067-5-sergey.senozhatsky@gmail.com To: Tony Luck To: Fenghua Yu To: Helge Deller To: Benjamin Herrenschmidt To: Paul Mackerras To: Michael Ellerman To: James Bottomley Cc: Andrew Morton Cc: Jessica Yu Cc: Steven Rostedt Cc: linux-ia64@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky Tested-by: Helge Deller #parisc64 Signed-off-by: Petr Mladek --- arch/parisc/boot/compressed/vmlinux.lds.S | 2 ++ arch/parisc/include/asm/sections.h | 6 ++++++ arch/parisc/kernel/module.c | 16 ++++++++++++++++ arch/parisc/kernel/process.c | 9 +++++++++ arch/parisc/kernel/vmlinux.lds.S | 2 ++ 5 files changed, 35 insertions(+) diff --git a/arch/parisc/boot/compressed/vmlinux.lds.S b/arch/parisc/boot/compressed/vmlinux.lds.S index a4ce3314e78e..4ebd4e65524c 100644 --- a/arch/parisc/boot/compressed/vmlinux.lds.S +++ b/arch/parisc/boot/compressed/vmlinux.lds.S @@ -29,7 +29,9 @@ SECTIONS . 
= ALIGN(16); /* Linkage tables */ .opd : { + __start_opd = .; *(.opd) + __end_opd = .; } PROVIDE (__gp = .); .plt : { *(.plt) diff --git a/arch/parisc/include/asm/sections.h b/arch/parisc/include/asm/sections.h index accdf40aa5b7..5a40b51df80c 100644 --- a/arch/parisc/include/asm/sections.h +++ b/arch/parisc/include/asm/sections.h @@ -6,8 +6,14 @@ #include #ifdef CONFIG_64BIT + +#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1 + #undef dereference_function_descriptor void *dereference_function_descriptor(void *); + +#undef dereference_kernel_function_descriptor +void *dereference_kernel_function_descriptor(void *); #endif #endif diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c index f1a76935a314..b5b3cb00f1fb 100644 --- a/arch/parisc/kernel/module.c +++ b/arch/parisc/kernel/module.c @@ -66,6 +66,7 @@ #include #include +#include #if 0 #define DEBUGP printk @@ -954,3 +955,18 @@ void module_arch_cleanup(struct module *mod) { deregister_unwind_table(mod); } + +#ifdef CONFIG_64BIT +void *dereference_module_function_descriptor(struct module *mod, void *ptr) +{ + unsigned long start_opd = (Elf64_Addr)mod->core_layout.base + + mod->arch.fdesc_offset; + unsigned long end_opd = start_opd + + mod->arch.fdesc_count * sizeof(Elf64_Fdesc); + + if (ptr < (void *)start_opd || ptr >= (void *)end_opd) + return ptr; + + return dereference_function_descriptor(ptr); +} +#endif diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c index 30f92391a93e..6c4585103a91 100644 --- a/arch/parisc/kernel/process.c +++ b/arch/parisc/kernel/process.c @@ -276,6 +276,15 @@ void *dereference_function_descriptor(void *ptr) ptr = p; return ptr; } + +void *dereference_kernel_function_descriptor(void *ptr) +{ + if (ptr < (void *)__start_opd || + ptr >= (void *)__end_opd) + return ptr; + + return dereference_function_descriptor(ptr); +} #endif static inline unsigned long brk_rnd(void) diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S 
index 159a2ec0b4e0..da2e31190efa 100644 --- a/arch/parisc/kernel/vmlinux.lds.S +++ b/arch/parisc/kernel/vmlinux.lds.S @@ -100,7 +100,9 @@ SECTIONS . = ALIGN(16); /* Linkage tables */ .opd : { + __start_opd = .; *(.opd) + __end_opd = .; } PROVIDE (__gp = .); .plt : { *(.plt) From 04b8eb7a4ccd9ef9343e2720ccf2a5db8cfe2f67 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 6 Dec 2017 13:36:49 +0900 Subject: [PATCH 21/27] symbol lookup: introduce dereference_symbol_descriptor() dereference_symbol_descriptor() invokes the appropriate arch-specific function descriptor dereference callbacks: - dereference_kernel_function_descriptor() if the pointer is a kernel symbol; - dereference_module_function_descriptor() if the pointer is a module symbol. This is the last step needed to make '%pS/%ps' smart enough to handle function descriptor dereference on the affected ARCHs and to retire '%pF/%pf'. To recap: some architectures (ia64, ppc64, parisc64) use an indirect pointer for C function pointers; the function pointer points to a function descriptor, and we need to dereference it to get the actual address of the code. Function descriptors live in the .opd ELF section, and all affected ARCHs (ia64, ppc64, parisc64) handle it properly for the kernel and modules. So we can, technically, decide whether the dereference is needed by simply looking at the pointer: if it belongs to the .opd section, then we need to dereference it. The kernel and modules each have their own .opd sections, which is why we need to split dereference_function_descriptor() and use separate kernel and module dereference arch callbacks.
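The lookup order of dereference_symbol_descriptor() can be modelled in userspace as below. First it applies the kernel .opd range test. If the result is a kernel symbol address, it stops there. Otherwise it asks the owning module, if any, to apply its own .opd range test. This is a hedged sketch. The arrays standing in for kernel text, module text and the two .opd sections, and all `sketch_*` names, are hypothetical. The real code uses is_ksym_addr(), __module_address() under preempt_disable(), and the per-arch callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor layout: code address first. */
struct sketch_fdesc {
	void *addr;
	void *gp;
};

/* Pretend memory regions for the kernel and for one loaded module. */
static char sketch_kernel_text[32];
static char sketch_module_text[32];
static struct sketch_fdesc sketch_kernel_opd[1] = {
	{ &sketch_kernel_text[4], NULL },
};
static struct sketch_fdesc sketch_module_opd[1] = {
	{ &sketch_module_text[8], NULL },
};

/* True if p lies within [start, end). */
static bool sketch_in_range(const void *p, const void *start, const void *end)
{
	uintptr_t x = (uintptr_t)p;

	return x >= (uintptr_t)start && x < (uintptr_t)end;
}

/* Dereference ptr only if it points into the given .opd range. */
static void *sketch_deref_if_opd(void *ptr, const struct sketch_fdesc *start,
				 const struct sketch_fdesc *end)
{
	if (!sketch_in_range(ptr, start, end))
		return ptr;
	return ((struct sketch_fdesc *)ptr)->addr;
}

/* Stand-in for is_ksym_addr(): only kernel text counts in this model. */
static bool sketch_is_ksym_addr(const void *p)
{
	return sketch_in_range(p, sketch_kernel_text,
			       sketch_kernel_text + sizeof(sketch_kernel_text));
}

/* Stand-in for __module_address(): the module owns its text and .opd. */
static bool sketch_module_address(const void *p)
{
	return sketch_in_range(p, sketch_module_text,
			       sketch_module_text + sizeof(sketch_module_text)) ||
	       sketch_in_range(p, sketch_module_opd, sketch_module_opd + 1);
}

static void *sketch_dereference_symbol_descriptor(void *ptr)
{
	/* 1) Kernel .opd check: a no-op for pointers outside that range. */
	ptr = sketch_deref_if_opd(ptr, sketch_kernel_opd, sketch_kernel_opd + 1);
	if (sketch_is_ksym_addr(ptr))
		return ptr;

	/* 2) Otherwise let the owning module (if any) check its own .opd. */
	if (sketch_module_address(ptr))
		ptr = sketch_deref_if_opd(ptr, sketch_module_opd,
					  sketch_module_opd + 1);
	return ptr;
}
```

Descriptors in either .opd range resolve to their code addresses. Plain code addresses in kernel or module text pass through unchanged, which mirrors why '%pS' can now replace '%pF'.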
Link: http://lkml.kernel.org/r/20171206043649.GB15885@jagdpanzerIV Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: James Bottomley Cc: Andrew Morton Cc: Jessica Yu Cc: Steven Rostedt Cc: linux-ia64@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky Tested-by: Tony Luck #ia64 Tested-by: Santosh Sivaraj #powerpc Tested-by: Helge Deller #parisc64 Signed-off-by: Petr Mladek --- Documentation/printk-formats.txt | 35 +++++++-------------- include/linux/kallsyms.h | 54 ++++++++++++++++++++++++++++++++ kernel/kallsyms.c | 35 --------------------- lib/vsprintf.c | 5 ++- 4 files changed, 68 insertions(+), 61 deletions(-) diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index 361789df51ec..58c44cce90b6 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -50,42 +50,31 @@ Symbols/Function Pointers :: + %pS versatile_init+0x0/0x110 + %ps versatile_init %pF versatile_init+0x0/0x110 %pf versatile_init - %pS versatile_init+0x0/0x110 %pSR versatile_init+0x9/0x110 (with __builtin_extract_return_addr() translation) - %ps versatile_init %pB prev_fn_of_versatile_init+0x88/0x88 -The ``F`` and ``f`` specifiers are for printing function pointers, -for example, f->func, &gettimeofday. They have the same result as -``S`` and ``s`` specifiers. But they do an extra conversion on -ia64, ppc64 and parisc64 architectures where the function pointers -are actually function descriptors. +The ``S`` and ``s`` specifiers are used for printing a pointer in symbolic +format. They result in the symbol name with (``S``) or without (``s``) +offsets. If KALLSYMS are disabled then the symbol address is printed instead. -The ``S`` and ``s`` specifiers can be used for printing symbols -from direct addresses, for example, __builtin_return_address(0), -(void *)regs->ip. 
They result in the symbol name with (``S``) or -without (``s``) offsets. If KALLSYMS are disabled then the symbol -address is printed instead. +Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``) +and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and +parisc64 function pointers are indirect and, in fact, are function +descriptors, which require additional dereferencing before we can lookup +the symbol. As of now, ``S`` and ``s`` perform dereferencing on those +platforms (when needed), so ``F`` and ``f`` exist for compatibility +reasons only. The ``B`` specifier results in the symbol name with offsets and should be used when printing stack backtraces. The specifier takes into consideration the effect of compiler optimisations which may occur when tail-call``s are used and marked with the noreturn GCC attribute. -Examples:: - - printk("Going to call: %pF\n", gettimeofday); - printk("Going to call: %pF\n", p->func); - printk("%s: called from %pS\n", __func__, (void *)_RET_IP_); - printk("%s: called from %pS\n", __func__, - (void *)__builtin_return_address(0)); - printk("Faulted at %pS\n", (void *)regs->ip); - printk(" %s%pB\n", (reliable ? "" : "? 
"), (void *)*stack); - - Kernel Pointers =============== diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 708f337d780b..e4f2e5a65f14 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -9,6 +9,10 @@ #include #include #include +#include +#include + +#include #define KSYM_NAME_LEN 128 #define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s]") + (KSYM_NAME_LEN - 1) + \ @@ -22,6 +26,56 @@ struct module; +static inline int is_kernel_inittext(unsigned long addr) +{ + if (addr >= (unsigned long)_sinittext + && addr <= (unsigned long)_einittext) + return 1; + return 0; +} + +static inline int is_kernel_text(unsigned long addr) +{ + if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) || + arch_is_kernel_text(addr)) + return 1; + return in_gate_area_no_mm(addr); +} + +static inline int is_kernel(unsigned long addr) +{ + if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end) + return 1; + return in_gate_area_no_mm(addr); +} + +static inline int is_ksym_addr(unsigned long addr) +{ + if (IS_ENABLED(CONFIG_KALLSYMS_ALL)) + return is_kernel(addr); + + return is_kernel_text(addr) || is_kernel_inittext(addr); +} + +static inline void *dereference_symbol_descriptor(void *ptr) +{ +#ifdef HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR + struct module *mod; + + ptr = dereference_kernel_function_descriptor(ptr); + if (is_ksym_addr((unsigned long)ptr)) + return ptr; + + preempt_disable(); + mod = __module_address((unsigned long)ptr); + preempt_enable(); + + if (mod) + ptr = dereference_module_function_descriptor(mod, ptr); +#endif + return ptr; +} + #ifdef CONFIG_KALLSYMS /* Lookup the address for a symbol. Returns 0 if not found. 
*/ unsigned long kallsyms_lookup_name(const char *name); diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 531ffa984bc2..0e4c0922908a 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -12,7 +12,6 @@ * compression (see scripts/kallsyms.c for a more complete description) */ #include -#include #include #include #include @@ -20,15 +19,12 @@ #include #include #include /* for cond_resched */ -#include #include #include #include #include #include -#include - /* * These will be re-linked against their real values * during the second link stage. @@ -52,37 +48,6 @@ extern const u16 kallsyms_token_index[] __weak; extern const unsigned long kallsyms_markers[] __weak; -static inline int is_kernel_inittext(unsigned long addr) -{ - if (addr >= (unsigned long)_sinittext - && addr <= (unsigned long)_einittext) - return 1; - return 0; -} - -static inline int is_kernel_text(unsigned long addr) -{ - if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) || - arch_is_kernel_text(addr)) - return 1; - return in_gate_area_no_mm(addr); -} - -static inline int is_kernel(unsigned long addr) -{ - if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end) - return 1; - return in_gate_area_no_mm(addr); -} - -static int is_ksym_addr(unsigned long addr) -{ - if (IS_ENABLED(CONFIG_KALLSYMS_ALL)) - return is_kernel(addr); - - return is_kernel_text(addr) || is_kernel_inittext(addr); -} - /* * Expand a compressed symbol data into the resulting uncompressed string, * if uncompressed string is too long (>= maxlen), it will be truncated, diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 1746bae94d41..16e2eefb0f79 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -40,7 +40,6 @@ #include "../mm/internal.h" /* For the trace_print_flags arrays */ #include /* for PAGE_SIZE */ -#include /* for dereference_function_descriptor() */ #include /* cpu_to_le16 */ #include @@ -1723,10 +1722,10 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr, switch (*fmt) { 
case 'F': case 'f': - ptr = dereference_function_descriptor(ptr); - /* Fallthrough */ case 'S': case 's': + ptr = dereference_symbol_descriptor(ptr); + /* Fallthrough */ case 'B': return symbol_string(buf, end, ptr, spec, fmt); case 'R': From 1df7338ac96558d5ae4c1a9dd5d1cb60fcd1bdb2 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 10 Nov 2017 08:48:30 +0900 Subject: [PATCH 22/27] checkpatch: add pF/pf deprecation warning We deprecated '%pF/%pf' printk specifiers, since '%pS/%ps' is now smart enough to handle function pointer dereference on platforms where such dereference is required. Link: http://lkml.kernel.org/r/20171109234830.5067-7-sergey.senozhatsky@gmail.com To: Tony Luck To: Fenghua Yu To: Helge Deller To: Benjamin Herrenschmidt To: Paul Mackerras To: Michael Ellerman To: James Bottomley Cc: Andrew Morton Cc: Jessica Yu Cc: Petr Mladek Cc: Steven Rostedt Cc: linux-ia64@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Cc: Sergey Senozhatsky Cc: Sergey Senozhatsky Cc: Joe Perches Cc: Andy Whitcroft Signed-off-by: Sergey Senozhatsky Signed-off-by: Joe Perches Signed-off-by: Petr Mladek --- scripts/checkpatch.pl | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 95cda3ecc66b..0bb68e7ff173 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -5753,18 +5753,25 @@ sub process { for (my $count = $linenr; $count <= $lc; $count++) { my $fmt = get_quoted_string($lines[$count - 1], raw_line($count, 0)); $fmt =~ s/%%//g; - if ($fmt =~ /(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGNO]).)/) { + if ($fmt =~ /(\%[\*\d\.]*p(?![\WSsBKRraEhMmIiUDdgVCbGNO]).)/) { $bad_extension = $1; last; } } if ($bad_extension ne "") { my $stat_real = raw_line($linenr, 0); + my $ext_type = "Invalid"; + my $use = ""; for (my $count = $linenr + 1; $count <= $lc; $count++) { $stat_real = $stat_real . "\n" . 
raw_line($count, 0); } + if ($bad_extension =~ /p[Ff]/) { + $ext_type = "Deprecated"; + $use = " - use %pS instead"; + $use =~ s/pS/ps/ if ($bad_extension =~ /pf/); + } WARN("VSPRINTF_POINTER_EXTENSION", - "Invalid vsprintf pointer extension '$bad_extension'\n" . "$here\n$stat_real\n"); + "$ext_type vsprintf pointer extension '$bad_extension'$use\n" . "$here\n$stat_real\n"); } } From d2279c9d7f7db7f97567368bfc4539b3411adf8d Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 5 Jan 2018 19:25:38 +0900 Subject: [PATCH 23/27] kallsyms: remove print_symbol() function There are no print_symbol()/__print_symbol() users left, so remove these symbols. It was a very old API that encouraged people to use continuation lines. It has been obsoleted by the %pS format specifier in a normal printk() call. Link: http://lkml.kernel.org/r/20180105102538.GC471@jagdpanzerIV Cc: Andrew Morton Cc: Russell King Cc: Catalin Marinas Cc: Mark Salter Cc: Tony Luck Cc: David Howells Cc: Yoshinori Sato Cc: Guan Xuetao Cc: Borislav Petkov Cc: Greg Kroah-Hartman Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Vineet Gupta Cc: Fengguang Wu Cc: Steven Rostedt Cc: LKML Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: linux-ia64@vger.kernel.org Cc: linux-am33-list@redhat.com Cc: linux-sh@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: Sergey Senozhatsky Signed-off-by: Sergey Senozhatsky Suggested-by: Joe Perches [pmladek@suse.com: updated commit message] Signed-off-by: Petr Mladek --- Documentation/filesystems/sysfs.txt | 4 ++-- .../translations/zh_CN/filesystems/sysfs.txt | 4 ++-- include/linux/kallsyms.h | 18 ------------------ kernel/kallsyms.c | 11 ----------- 4 files changed, 4 insertions(+), 33 deletions(-) diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index 9a3658cc399e..a1426cabcef1 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@
-154,8 +154,8 @@ static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, if (dev_attr->show) ret = dev_attr->show(dev, dev_attr, buf); if (ret >= (ssize_t)PAGE_SIZE) { - print_symbol("dev_attr_show: %s returned bad count\n", - (unsigned long)dev_attr->show); + printk("dev_attr_show: %pS returned bad count\n", + dev_attr->show); } return ret; } diff --git a/Documentation/translations/zh_CN/filesystems/sysfs.txt b/Documentation/translations/zh_CN/filesystems/sysfs.txt index 7d3b05edb8ce..452271dda141 100644 --- a/Documentation/translations/zh_CN/filesystems/sysfs.txt +++ b/Documentation/translations/zh_CN/filesystems/sysfs.txt @@ -167,8 +167,8 @@ static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, if (dev_attr->show) ret = dev_attr->show(dev, dev_attr, buf); if (ret >= (ssize_t)PAGE_SIZE) { - print_symbol("dev_attr_show: %s returned bad count\n", - (unsigned long)dev_attr->show); + printk("dev_attr_show: %pS returned bad count\n", + dev_attr->show); } return ret; } diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 708f337d780b..c733941901b3 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -46,9 +46,6 @@ extern int sprint_symbol(char *buffer, unsigned long address); extern int sprint_symbol_no_offset(char *buffer, unsigned long address); extern int sprint_backtrace(char *buffer, unsigned long address); -/* Look up a kernel symbol and print it to the kernel messages. */ -extern void __print_symbol(const char *fmt, unsigned long address); - int lookup_symbol_name(unsigned long addr, char *symname); int lookup_symbol_attrs(unsigned long addr, unsigned long *size, unsigned long *offset, char *modname, char *name); @@ -118,23 +115,8 @@ static inline int kallsyms_show_value(void) return false; } -/* Stupid that this does nothing, but I didn't create this mess. 
*/ -#define __print_symbol(fmt, addr) #endif /*CONFIG_KALLSYMS*/ -/* This macro allows us to keep printk typechecking */ -static __printf(1, 2) -void __check_printsym_format(const char *fmt, ...) -{ -} - -static inline void print_symbol(const char *fmt, unsigned long addr) -{ - __check_printsym_format(fmt, ""); - __print_symbol(fmt, (unsigned long) - __builtin_extract_return_addr((void *)addr)); -} - static inline void print_ip_sym(unsigned long ip) { printk("[<%p>] %pS\n", (void *) ip, (void *) ip); diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 531ffa984bc2..32ba256f0092 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -464,17 +464,6 @@ int sprint_backtrace(char *buffer, unsigned long address) return __sprint_symbol(buffer, address, -1, 1); } -/* Look up a kernel symbol and print it to the kernel messages. */ -void __print_symbol(const char *fmt, unsigned long address) -{ - char buffer[KSYM_SYMBOL_LEN]; - - sprint_symbol(buffer, address); - - printk(fmt, buffer); -} -EXPORT_SYMBOL(__print_symbol); - /* To avoid using get_symbol_offset for every symbol, we carry prefix along. */ struct kallsym_iter { loff_t pos; From dbdda842fe96f8932bae554f0adf463c27c42bc7 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 10 Jan 2018 14:24:17 +0100 Subject: [PATCH 24/27] printk: Add console owner and waiter logic to load balance console writes

This patch implements what I discussed at Kernel Summit. I added lockdep
annotations (hopefully correctly), and there have not been any splats
(since I fixed some bugs in the first iterations). It did catch problems
when I had the owner covering too much, but now that the owner is only
set when actively calling the consoles, lockdep has stayed quiet.

Here's the design again:

I added a "console_owner", which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions.
It is protected by console_owner_lock, which is a raw spin lock.

There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.

In printk(), when it tries to write to the consoles, we have:

	if (console_trylock())
		console_unlock();

Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.

When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock, checks whether there is a
waiter, and clears console_owner. If there is a waiter, it breaks out of
the loop, clears the waiter flag (which releases the waiter from its
spin), and exits. Note that it does *not* release the console semaphore.
Because it is a semaphore, there is no owner, and another task may
release it. This means that the waiter is guaranteed to become the new
console owner. The waiter then calls console_unlock() and continues to
write to the consoles.

If another task comes along and does a printk(), it too can become the
new waiter, and we wash, rinse, and repeat!

By Petr Mladek about possible new deadlocks:

The thing is that we hand console_sem over only to a printk() call that
normally calls console_unlock() as well. It means that the transferred
ownership should not bring new types of dependencies.

As Steven said somewhere: "If there is a deadlock, it was there even
before."

We could look at it from this side. The possible deadlock would look
like:

CPU0                            CPU1

console_unlock()

  console_owner = current;

                                spin_lockA()
                                  printk()
                                    spin = true;
                                    while (...)

  call_console_drivers()
    spin_lockA()

This would be a deadlock. CPU0 would wait for lock A, while CPU1 would
own lock A and would wait for CPU0 to finish calling the console drivers
and pass over the console_sem ownership.
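The owner/waiter hand-off described above can be modelled in user space. The sketch below is an illustration under stated assumptions, not the kernel implementation. A pthread mutex stands in for the raw console_owner_lock, a C11 atomic flag stands in for console_waiter, and an atomic counter stands in for console_sem. All function names (sem_trylock(), spinning_enable(), trylock_spinning(), and so on) are hypothetical. The property it demonstrates is that the semaphore is never released during the hand-off, so the spinning waiter simply carries on as its holder.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; threads are identified by small ints.
 * None of these names are kernel APIs. */
#define NO_OWNER (-1)

static pthread_mutex_t owner_lock = PTHREAD_MUTEX_INITIALIZER; /* ~console_owner_lock */
static int owner = NO_OWNER;  /* ~console_owner, protected by owner_lock */
static atomic_bool waiter;    /* ~console_waiter */
static atomic_int sem_held;   /* ~console_sem: 1 while held, no owner field */

static bool sem_trylock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&sem_held, &expected, 1);
}

/* The sem holder is about to call the slow console drivers. */
static void spinning_enable(int self)
{
	pthread_mutex_lock(&owner_lock);
	owner = self;
	pthread_mutex_unlock(&owner_lock);
}

/* Returns true if the sem was handed over to a spinning waiter. */
static bool spinning_disable_and_check(void)
{
	bool w;

	pthread_mutex_lock(&owner_lock);
	w = atomic_load(&waiter);
	owner = NO_OWNER;
	pthread_mutex_unlock(&owner_lock);

	if (!w)
		return false;
	/* Release the waiter; sem_held stays 1, the sem is never "upped". */
	atomic_store(&waiter, false);
	return true;
}

/* Returns true if the caller now holds the sem. */
static bool trylock_spinning(int self)
{
	bool spin = false;

	if (sem_trylock())
		return true;

	pthread_mutex_lock(&owner_lock);
	/* Spin only if nobody waits yet and the active owner is not us. */
	if (!atomic_load(&waiter) && owner != NO_OWNER && owner != self) {
		atomic_store(&waiter, true);
		spin = true;
	}
	pthread_mutex_unlock(&owner_lock);

	if (!spin)
		return false;
	while (atomic_load(&waiter))
		;	/* busy wait, like the cpu_relax() loop */
	return true;
}

static bool second_got_sem;

static void *second_printer(void *arg)
{
	(void)arg;
	second_got_sem = trylock_spinning(1);
	return NULL;
}

/* Thread 0 holds the sem and "prints"; thread 1 takes over from it. */
static bool run_handoff_demo(void)
{
	pthread_t t;
	bool handed_off;

	if (!sem_trylock())
		return false;
	spinning_enable(0);
	if (pthread_create(&t, NULL, second_printer, NULL) != 0)
		return false;
	while (!atomic_load(&waiter))	/* wait until thread 1 spins */
		;
	handed_off = spinning_disable_and_check();
	pthread_join(t, NULL);

	assert(atomic_load(&sem_held) == 1);	/* never released in between */
	atomic_store(&sem_held, 0);		/* thread 1's eventual up() */
	return handed_off && second_got_sem;
}
```

Calling run_handoff_demo() from main() should return true (build with -pthread). The lockdep annotations and the printk_safe/IRQ handling of the real patch are deliberately left out. The model keeps only the rule that a task spins only when there is an active owner, that owner is not itself, and no other waiter is already queued.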
But if such a deadlock were possible, then the following scenario was
already possible before:

CPU0

spin_lockA()
  printk()
    console_unlock()
      call_console_drivers()
        spin_lockA()

In other words, this deadlock was there even before. Such deadlocks are
prevented by using printk_deferred() in the sections guarded by lock A.

By Steven Rostedt:

To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up an 8 CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

static bool stop_testing;
static unsigned int loops = 1;

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	while (!READ_ONCE(stop_testing)) {
		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
			preempt_disable();
			pr_emerg("%5d%-75s\n", smp_processor_id(),
				 " XXX NOPREEMPT");
			preempt_enable();
		}
		msleep(1);
	}
}

static struct work_struct __percpu *works;

static void finish(void)
{
	int cpu;

	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
}

static int __init test_init(void)
{
	int cpu;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_param(loops, uint, 0);
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com Cc: akpm@linux-foundation.org Cc: linux-mm@kvack.org Cc: Cong Wang Cc: Dave Hansen Cc: Johannes Weiner Cc: Mel Gorman Cc: Michal Hocko Cc: Vlastimil Babka Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Jan Kara Cc: Mathieu Desnoyers Cc: Tetsuo Handa Cc: Byungchul Park Cc: Tejun Heo Cc: Pavel Machek Cc: linux-kernel@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) [pmladek@suse.com: Commit message about possible deadlocks] Acked-by: Sergey Senozhatsky Signed-off-by: Petr Mladek --- kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 107 insertions(+), 1 deletion(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 5d81206a572d..040fb948924e 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers); static struct lockdep_map console_lock_dep_map = { .name = "console_lock" }; +static struct lockdep_map console_owner_dep_map = { + .name = "console_owner" +}; #endif +static DEFINE_RAW_SPINLOCK(console_owner_lock); +static struct task_struct *console_owner; +static bool console_waiter; + enum devkmsg_log_bits { __DEVKMSG_LOG_BIT_ON = 0, __DEVKMSG_LOG_BIT_OFF, @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level, * semaphore. The release will print out buffers and wake up * /dev/kmsg and syslog() users.
*/ - if (console_trylock()) + if (console_trylock()) { console_unlock(); + } else { + struct task_struct *owner = NULL; + bool waiter; + bool spin = false; + + printk_safe_enter_irqsave(flags); + + raw_spin_lock(&console_owner_lock); + owner = READ_ONCE(console_owner); + waiter = READ_ONCE(console_waiter); + if (!waiter && owner && owner != current) { + WRITE_ONCE(console_waiter, true); + spin = true; + } + raw_spin_unlock(&console_owner_lock); + + /* + * If there is an active printk() writing to the + * consoles, instead of having it write our data too, + * see if we can offload that load from the active + * printer, and do some printing ourselves. + * Go into a spin only if there isn't already a waiter + * spinning, and there is an active printer, and + * that active printer isn't us (recursive printk?). + */ + if (spin) { + /* We spin waiting for the owner to release us */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + /* Owner will clear console_waiter on hand off */ + while (READ_ONCE(console_waiter)) + cpu_relax(); + + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + printk_safe_exit_irqrestore(flags); + + /* + * The owner passed the console lock to us. + * Since we did not spin on console lock, annotate + * this as a trylock. Otherwise lockdep will + * complain. + */ + mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); + console_unlock(); + printk_safe_enter_irqsave(flags); + } + printk_safe_exit_irqrestore(flags); + + } } return printed_len; @@ -2141,6 +2196,7 @@ void console_unlock(void) static u64 seen_seq; unsigned long flags; bool wake_klogd = false; + bool waiter = false; bool do_cond_resched, retry; if (console_suspended) { @@ -2229,14 +2285,64 @@ skip: console_seq++; raw_spin_unlock(&logbuf_lock); + /* + * While actively printing out messages, if another printk() + * were to occur on another CPU, it may wait for this one to + * finish. This task can not be preempted if there is a + * waiter waiting to take over. 
+ */ + raw_spin_lock(&console_owner_lock); + console_owner = current; + raw_spin_unlock(&console_owner_lock); + + /* The waiter may spin on us after setting console_owner */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + stop_critical_timings(); /* don't trace print latency */ call_console_drivers(ext_text, ext_len, text, len); start_critical_timings(); + + raw_spin_lock(&console_owner_lock); + waiter = READ_ONCE(console_waiter); + console_owner = NULL; + raw_spin_unlock(&console_owner_lock); + + /* + * If there is a waiter waiting for us, then pass the + * rest of the work load over to that waiter. + */ + if (waiter) + break; + + /* There was no waiter, and nothing will spin on us here */ + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + printk_safe_exit_irqrestore(flags); if (do_cond_resched) cond_resched(); } + + /* + * If there is an active waiter waiting on the console_lock. + * Pass off the printing to the waiter, and the waiter + * will continue printing on its CPU, and when all writing + * has finished, the last printer will wake up klogd. + */ + if (waiter) { + WRITE_ONCE(console_waiter, false); + /* The waiter is now free to continue */ + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + /* + * Hand off console_lock to waiter. The waiter will perform + * the up(). After this, the waiter is the console_lock owner. + */ + mutex_release(&console_lock_dep_map, 1, _THIS_IP_); + printk_safe_exit_irqrestore(flags); + /* Note, if waiter is set, logbuf_lock is not held */ + return; + } + console_locked = 0; /* Release the exclusive_console once it is used */ From c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 12 Jan 2018 17:08:37 +0100 Subject: [PATCH 25/27] printk: Hide console waiter logic into helpers The commit ("printk: Add console owner and waiter logic to load balance console writes") made vprintk_emit() and console_unlock() even more complicated. 
This patch extracts the new code into 3 helper functions. They should help to keep it rather self-contained. It will be easier to use and maintain. This patch just shuffles the existing code. It does not change the functionality. Link: http://lkml.kernel.org/r/20180112160837.GD24497@linux.suse Cc: akpm@linux-foundation.org Cc: linux-mm@kvack.org Cc: Cong Wang Cc: Dave Hansen Cc: Johannes Weiner Cc: Mel Gorman Cc: Michal Hocko Cc: Vlastimil Babka Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Jan Kara Cc: Mathieu Desnoyers Cc: Tetsuo Handa Cc: rostedt@home.goodmis.org Cc: Byungchul Park Cc: Tejun Heo Cc: Pavel Machek Cc: linux-kernel@vger.kernel.org Reviewed-by: Steven Rostedt (VMware) Acked-by: Sergey Senozhatsky Signed-off-by: Petr Mladek --- kernel/printk/printk.c | 245 +++++++++++++++++++++++++---------------- 1 file changed, 148 insertions(+), 97 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 040fb948924e..3a475f58b749 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers); static struct lockdep_map console_lock_dep_map = { .name = "console_lock" }; -static struct lockdep_map console_owner_dep_map = { - .name = "console_owner" -}; #endif -static DEFINE_RAW_SPINLOCK(console_owner_lock); -static struct task_struct *console_owner; -static bool console_waiter; - enum devkmsg_log_bits { __DEVKMSG_LOG_BIT_ON = 0, __DEVKMSG_LOG_BIT_OFF, @@ -1550,6 +1543,146 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len) return do_syslog(type, buf, len, SYSLOG_FROM_READER); } +/* + * Special console_lock variants that help to reduce the risk of soft-lockups. + * They allow to pass console_lock to another printk() call using a busy wait. 
+ */ + +#ifdef CONFIG_LOCKDEP +static struct lockdep_map console_owner_dep_map = { + .name = "console_owner" +}; +#endif + +static DEFINE_RAW_SPINLOCK(console_owner_lock); +static struct task_struct *console_owner; +static bool console_waiter; + +/** + * console_lock_spinning_enable - mark beginning of code where another + * thread might safely busy wait + * + * This basically converts console_lock into a spinlock. This marks + * the section where the console_lock owner can not sleep, because + * there may be a waiter spinning (like a spinlock). Also it must be + * ready to hand over the lock at the end of the section. + */ +static void console_lock_spinning_enable(void) +{ + raw_spin_lock(&console_owner_lock); + console_owner = current; + raw_spin_unlock(&console_owner_lock); + + /* The waiter may spin on us after setting console_owner */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); +} + +/** + * console_lock_spinning_disable_and_check - mark end of code where another + * thread was able to busy wait and check if there is a waiter + * + * This is called at the end of the section where spinning is allowed. + * It has two functions. First, it is a signal that it is no longer + * safe to start busy waiting for the lock. Second, it checks if + * there is a busy waiter and passes the lock rights to her. + * + * Important: Callers lose the lock if there was a busy waiter. + * They must not touch items synchronized by console_lock + * in this case. + * + * Return: 1 if the lock rights were passed, 0 otherwise. 
+ */ +static int console_lock_spinning_disable_and_check(void) +{ + int waiter; + + raw_spin_lock(&console_owner_lock); + waiter = READ_ONCE(console_waiter); + console_owner = NULL; + raw_spin_unlock(&console_owner_lock); + + if (!waiter) { + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + return 0; + } + + /* The waiter is now free to continue */ + WRITE_ONCE(console_waiter, false); + + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + + /* + * Hand off console_lock to waiter. The waiter will perform + * the up(). After this, the waiter is the console_lock owner. + */ + mutex_release(&console_lock_dep_map, 1, _THIS_IP_); + return 1; +} + +/** + * console_trylock_spinning - try to get console_lock by busy waiting + * + * This allows to busy wait for the console_lock when the current + * owner is running in specially marked sections. It means that + * the current owner is running and cannot reschedule until it + * is ready to lose the lock. + * + * Return: 1 if we got the lock, 0 othrewise + */ +static int console_trylock_spinning(void) +{ + struct task_struct *owner = NULL; + bool waiter; + bool spin = false; + unsigned long flags; + + if (console_trylock()) + return 1; + + printk_safe_enter_irqsave(flags); + + raw_spin_lock(&console_owner_lock); + owner = READ_ONCE(console_owner); + waiter = READ_ONCE(console_waiter); + if (!waiter && owner && owner != current) { + WRITE_ONCE(console_waiter, true); + spin = true; + } + raw_spin_unlock(&console_owner_lock); + + /* + * If there is an active printk() writing to the + * consoles, instead of having it write our data too, + * see if we can offload that load from the active + * printer, and do some printing ourselves. + * Go into a spin only if there isn't already a waiter + * spinning, and there is an active printer, and + * that active printer isn't us (recursive printk?). 
+ */ + if (!spin) { + printk_safe_exit_irqrestore(flags); + return 0; + } + + /* We spin waiting for the owner to release us */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + /* Owner will clear console_waiter on hand off */ + while (READ_ONCE(console_waiter)) + cpu_relax(); + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + + printk_safe_exit_irqrestore(flags); + /* + * The owner passed the console lock to us. + * Since we did not spin on console lock, annotate + * this as a trylock. Otherwise lockdep will + * complain. + */ + mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); + + return 1; +} + /* * Call the console drivers, asking them to write out * log_buf[start] to log_buf[end - 1]. @@ -1760,56 +1893,8 @@ asmlinkage int vprintk_emit(int facility, int level, * semaphore. The release will print out buffers and wake up * /dev/kmsg and syslog() users. */ - if (console_trylock()) { + if (console_trylock_spinning()) console_unlock(); - } else { - struct task_struct *owner = NULL; - bool waiter; - bool spin = false; - - printk_safe_enter_irqsave(flags); - - raw_spin_lock(&console_owner_lock); - owner = READ_ONCE(console_owner); - waiter = READ_ONCE(console_waiter); - if (!waiter && owner && owner != current) { - WRITE_ONCE(console_waiter, true); - spin = true; - } - raw_spin_unlock(&console_owner_lock); - - /* - * If there is an active printk() writing to the - * consoles, instead of having it write our data too, - * see if we can offload that load from the active - * printer, and do some printing ourselves. - * Go into a spin only if there isn't already a waiter - * spinning, and there is an active printer, and - * that active printer isn't us (recursive printk?). 
- */ - if (spin) { - /* We spin waiting for the owner to release us */ - spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); - /* Owner will clear console_waiter on hand off */ - while (READ_ONCE(console_waiter)) - cpu_relax(); - - spin_release(&console_owner_dep_map, 1, _THIS_IP_); - printk_safe_exit_irqrestore(flags); - - /* - * The owner passed the console lock to us. - * Since we did not spin on console lock, annotate - * this as a trylock. Otherwise lockdep will - * complain. - */ - mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); - console_unlock(); - printk_safe_enter_irqsave(flags); - } - printk_safe_exit_irqrestore(flags); - - } } return printed_len; @@ -1910,6 +1995,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size, static ssize_t msg_print_ext_body(char *buf, size_t size, char *dict, size_t dict_len, char *text, size_t text_len) { return 0; } +static void console_lock_spinning_enable(void) { } +static int console_lock_spinning_disable_and_check(void) { return 0; } static void call_console_drivers(const char *ext_text, size_t ext_len, const char *text, size_t len) {} static size_t msg_print_text(const struct printk_log *msg, @@ -2196,7 +2283,6 @@ void console_unlock(void) static u64 seen_seq; unsigned long flags; bool wake_klogd = false; - bool waiter = false; bool do_cond_resched, retry; if (console_suspended) { @@ -2291,31 +2377,16 @@ skip: * finish. This task can not be preempted if there is a * waiter waiting to take over. 
*/ - raw_spin_lock(&console_owner_lock); - console_owner = current; - raw_spin_unlock(&console_owner_lock); - - /* The waiter may spin on us after setting console_owner */ - spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + console_lock_spinning_enable(); stop_critical_timings(); /* don't trace print latency */ call_console_drivers(ext_text, ext_len, text, len); start_critical_timings(); - raw_spin_lock(&console_owner_lock); - waiter = READ_ONCE(console_waiter); - console_owner = NULL; - raw_spin_unlock(&console_owner_lock); - - /* - * If there is a waiter waiting for us, then pass the - * rest of the work load over to that waiter. - */ - if (waiter) - break; - - /* There was no waiter, and nothing will spin on us here */ - spin_release(&console_owner_dep_map, 1, _THIS_IP_); + if (console_lock_spinning_disable_and_check()) { + printk_safe_exit_irqrestore(flags); + return; + } printk_safe_exit_irqrestore(flags); @@ -2323,26 +2394,6 @@ skip: cond_resched(); } - /* - * If there is an active waiter waiting on the console_lock. - * Pass off the printing to the waiter, and the waiter - * will continue printing on its CPU, and when all writing - * has finished, the last printer will wake up klogd. - */ - if (waiter) { - WRITE_ONCE(console_waiter, false); - /* The waiter is now free to continue */ - spin_release(&console_owner_dep_map, 1, _THIS_IP_); - /* - * Hand off console_lock to waiter. The waiter will perform - * the up(). After this, the waiter is the console_lock owner. 
*/ - mutex_release(&console_lock_dep_map, 1, _THIS_IP_); - printk_safe_exit_irqrestore(flags); - /* Note, if waiter is set, logbuf_lock is not held */ - return; - } - console_locked = 0; /* Release the exclusive_console once it is used */ From fd5f7cde1b85d4c8e09ca46ce948e008a2377f64 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 16 Jan 2018 13:47:16 +0900 Subject: [PATCH 26/27] printk: Never set console_may_schedule in console_trylock()

This patch basically reverts commit 6b97a20d3a79 ("printk: set
may_schedule for some of console_trylock() callers"). That commit was a
mistake: it introduced a big dependency on the scheduler by enabling
preemption under console_sem in the printk()->console_unlock() path,
which is too critical a path for that. The patch did not significantly
reduce the possibility of printk() lockups, but it made it possible to
stall printk(), as reported by Tetsuo Handa [1].

Another issue is that preemption under console_sem also interferes with
Steven Rostedt's hand-off scheme. It makes it possible to sleep while
holding console_sem both in console_unlock() and in vprintk_emit(),
after acquiring console_sem ownership (anywhere between
printk_safe_exit_irqrestore() in console_trylock_spinning() and
printk_safe_enter_irqsave() in console_unlock()). This makes a hand-off
less likely and, at the same time, may result in a significant amount of
pending logbuf messages. A preempted console_sem owner makes it
impossible for other CPUs to emit logbuf messages, but does not prevent
other CPUs from appending new messages to the logbuf.

Reinstate the old behavior and make printk() non-preemptible. Should any
printk() lockup reports arrive, they must be handled in a different way.
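The distinction between appending and emitting can be shown with a deliberately simplified, single-threaded sketch. This is not kernel code, and the struct and function names are hypothetical. log_next_seq counts stored records, console_seq counts records already written to the consoles, and sem_held stands in for console_sem. While the holder is preempted, every other printk() still appends a record but fails the trylock, so the backlog grows until the holder runs again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the log buffer; not the kernel's structures. */
struct logbuf_model {
	unsigned long log_next_seq;	/* records stored in the logbuf */
	unsigned long console_seq;	/* records already emitted */
	bool sem_held;			/* stands in for console_sem */
};

/* ~console_unlock(): emit everything pending, then release the sem. */
static void model_unlock(struct logbuf_model *m)
{
	assert(m->sem_held);
	while (m->console_seq < m->log_next_seq)
		m->console_seq++;	/* "write" one record to the consoles */
	m->sem_held = false;
}

/* ~printk(): appending never needs console_sem; emitting does. */
static void model_printk(struct logbuf_model *m)
{
	m->log_next_seq++;
	if (!m->sem_held) {		/* ~console_trylock() succeeded */
		m->sem_held = true;
		model_unlock(m);
	}
}

static unsigned long model_pending(const struct logbuf_model *m)
{
	return m->log_next_seq - m->console_seq;
}
```

For example, setting sem_held to true (a holder that was preempted) and then calling model_printk() five times leaves five records pending. A single model_unlock(), standing in for the holder resuming, drains them all. The model ignores hand-off and real concurrency entirely. It only isolates the append-versus-emit asymmetry described above.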
[1] http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL@I-love.SAKURA.ne.jp Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers") Link: http://lkml.kernel.org/r/20180116044716.GE6607@jagdpanzerIV To: Tetsuo Handa Cc: Sergey Senozhatsky Cc: Tejun Heo Cc: akpm@linux-foundation.org Cc: linux-mm@kvack.org Cc: Cong Wang Cc: Dave Hansen Cc: Johannes Weiner Cc: Mel Gorman Cc: Michal Hocko Cc: Vlastimil Babka Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Jan Kara Cc: Mathieu Desnoyers Cc: Byungchul Park Cc: Pavel Machek Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky Reported-by: Tetsuo Handa Reviewed-by: Steven Rostedt (VMware) Signed-off-by: Petr Mladek --- kernel/printk/printk.c | 22 ++++++++-------------- 1 file changed, 8 insertions(+), 14 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 3a475f58b749..6b9d8d56e0e2 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1888,6 +1888,12 @@ asmlinkage int vprintk_emit(int facility, int level, /* If called from the scheduler, we can not call up(). */ if (!in_sched) { + /* + * Disable preemption to avoid being preempted while holding + * console_sem which would prevent anyone from printing to + * console + */ + preempt_disable(); /* * Try to acquire and then immediately release the console * semaphore. The release will print out buffers and wake up @@ -1895,6 +1901,7 @@ asmlinkage int vprintk_emit(int facility, int level, */ if (console_trylock_spinning()) console_unlock(); + preempt_enable(); } return printed_len; @@ -2211,20 +2218,7 @@ int console_trylock(void) return 0; } console_locked = 1; - /* - * When PREEMPT_COUNT disabled we can't reliably detect if it's - * safe to schedule (e.g. calling printk while holding a spin_lock), - * because preempt_disable()/preempt_enable() are just barriers there - * and preempt_count() is always 0.
* * RCU read sections have a separate preemption counter when - * PREEMPT_RCU enabled thus we must take extra care and check - * rcu_preempt_depth(), otherwise RCU read sections modify - * preempt_count(). - */ - console_may_schedule = !oops_in_progress && - preemptible() && - !rcu_preempt_depth(); + console_may_schedule = 0; return 1; } EXPORT_SYMBOL(console_trylock); From 6fd78a1a99c9580da49ee8f951fdce9846256375 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 19 Jan 2018 13:39:01 +0900 Subject: [PATCH 27/27] printk: drop redundant devkmsg_log_str memsets

We copy in the null-terminated strings "on" and "off", so there is no need to zero out devkmsg_log_str in control_devkmsg().

Link: http://lkml.kernel.org/r/20180119043901.1728-1-sergey.senozhatsky@gmail.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky Reviewed-by: Steven Rostedt (VMware) Signed-off-by: Petr Mladek --- kernel/printk/printk.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 568729e0dc2c..bf2e6741ec12 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -131,13 +131,10 @@ static int __init control_devkmsg(char *str) /* * Set sysctl string accordingly: */ - if (devkmsg_log == DEVKMSG_LOG_MASK_ON) { - memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE); - strncpy(devkmsg_log_str, "on", 2); - } else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF) { - memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE); - strncpy(devkmsg_log_str, "off", 3); - } + if (devkmsg_log == DEVKMSG_LOG_MASK_ON) + strcpy(devkmsg_log_str, "on"); + else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF) + strcpy(devkmsg_log_str, "off"); /* else "ratelimit" which is set by default. */ /*