From 650b7b23cb1e32d77daeefbac1ceb1329abf3b23 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 20 Feb 2015 15:07:29 +0100 Subject: [PATCH 1/9] kprobes/x86: Use 5-byte NOP when the code might be modified by ftrace can_probe() checks if the given address points to the beginning of an instruction. It analyzes all the instructions from the beginning of the function until the given address. The code might be modified by another Kprobe. In this case, the current code is read into a buffer, int3 breakpoint is replaced by the saved opcode in the buffer, and can_probe() analyzes the buffer instead. There is a bug that __recover_probed_insn() tries to restore the original code even for Kprobes using the ftrace framework. But in this case, the opcode is not stored. See the difference between arch_prepare_kprobe() and arch_prepare_kprobe_ftrace(). The opcode is stored by arch_copy_kprobe() only from arch_prepare_kprobe(). This patch makes Kprobe to use the ideal 5-byte NOP when the code can be modified by ftrace. It is the original instruction, see ftrace_make_nop() and ftrace_nop_replace(). Note that we always need to use the NOP for ftrace locations. Kprobes do not block ftrace and the instruction might get modified at anytime. It might even be in an inconsistent state because it is modified step by step using the int3 breakpoint. The patch also fixes indentation of the touched comment. Note that I found this problem when playing with Kprobes. I did it on x86_64 with gcc-4.8.3 that supported -mfentry. I modified samples/kprobes/kprobe_example.c and added offset 5 to put the probe right after the fentry area: static struct kprobe kp = { .symbol_name = "do_fork", + .offset = 5, }; Then I was able to load kprobe_example before jprobe_example but not the other way around: $> modprobe jprobe_example $> modprobe kprobe_example modprobe: ERROR: could not insert 'kprobe_example': Invalid or incomplete multibyte or wide character It did not make much sense and debugging pointed to the bug described above. Signed-off-by: Petr Mladek Acked-by: Masami Hiramatsu Cc: Ananth NMavinakayanahalli Cc: Anil S Keshavamurthy Cc: David S. Miller Cc: Frederic Weisbecker Cc: Jiri Kosina Cc: Steven Rostedt Link: http://lkml.kernel.org/r/1424441250-27146-2-git-send-email-pmladek@suse.cz Signed-off-by: Ingo Molnar --- arch/x86/kernel/kprobes/core.c | 42 ++++++++++++++++++++++------------ 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 6a1146ea4d4d..c3b4b46b4797 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -223,27 +223,41 @@ static unsigned long __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr) { struct kprobe *kp; + unsigned long faddr; kp = get_kprobe((void *)addr); - /* There is no probe, return original address */ - if (!kp) + faddr = ftrace_location(addr); + /* + * Use the current code if it is not modified by Kprobe + * and it cannot be modified by ftrace. + */ + if (!kp && !faddr) return addr; /* - * Basically, kp->ainsn.insn has an original instruction. - * However, RIP-relative instruction can not do single-stepping - * at different place, __copy_instruction() tweaks the displacement of - * that instruction. In that case, we can't recover the instruction - * from the kp->ainsn.insn. + * Basically, kp->ainsn.insn has an original instruction. + * However, RIP-relative instruction can not do single-stepping + * at different place, __copy_instruction() tweaks the displacement of + * that instruction. In that case, we can't recover the instruction + * from the kp->ainsn.insn. * - * On the other hand, kp->opcode has a copy of the first byte of - * the probed instruction, which is overwritten by int3. And - * the instruction at kp->addr is not modified by kprobes except - * for the first byte, we can recover the original instruction - * from it and kp->opcode. + * On the other hand, in case on normal Kprobe, kp->opcode has a copy + * of the first byte of the probed instruction, which is overwritten + * by int3. And the instruction at kp->addr is not modified by kprobes + * except for the first byte, we can recover the original instruction + * from it and kp->opcode. + * + * In case of Kprobes using ftrace, we do not have a copy of + * the original instruction. In fact, the ftrace location might + * be modified at anytime and even could be in an inconsistent state. + * Fortunately, we know that the original code is the ideal 5-byte + * long NOP. */ - memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); - buf[0] = kp->opcode; + memcpy(buf, (void *)addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + if (faddr) + memcpy(buf, ideal_nops[NOP_ATOMIC5], 5); + else + buf[0] = kp->opcode; return (unsigned long)buf; } From 2a6730c8b6e075adf826a89a3e2caa705807afdb Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Fri, 20 Feb 2015 15:07:30 +0100 Subject: [PATCH 2/9] kprobes/x86: Check for invalid ftrace location in __recover_probed_insn() __recover_probed_insn() should always be called from an address where an instructions starts. The check for ftrace_location() might help to discover a potential inconsistency. This patch adds WARN_ON() when the inconsistency is detected. Also it adds handling of the situation when the original code can not get recovered. Suggested-by: Masami Hiramatsu Signed-off-by: Petr Mladek Cc: Ananth NMavinakayanahalli Cc: Anil S Keshavamurthy Cc: David S. Miller Cc: Frederic Weisbecker Cc: Jiri Kosina Cc: Steven Rostedt Link: http://lkml.kernel.org/r/1424441250-27146-3-git-send-email-pmladek@suse.cz Signed-off-by: Ingo Molnar --- arch/x86/kernel/kprobes/core.c | 12 ++++++++++++ arch/x86/kernel/kprobes/opt.c | 2 ++ 2 files changed, 14 insertions(+) diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index c3b4b46b4797..4e3d5a9621fe 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -227,6 +227,13 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr) kp = get_kprobe((void *)addr); faddr = ftrace_location(addr); + /* + * Addresses inside the ftrace location are refused by + * arch_check_ftrace_location(). Something went terribly wrong + * if such an address is checked here. + */ + if (WARN_ON(faddr && faddr != addr)) + return 0UL; /* * Use the current code if it is not modified by Kprobe * and it cannot be modified by ftrace. @@ -265,6 +272,7 @@ __recover_probed_insn(kprobe_opcode_t *buf, unsigned long addr) * Recover the probed instruction at addr for further analysis. * Caller must lock kprobes by kprobe_mutex, or disable preemption * for preventing to release referencing kprobes. + * Returns zero if the instruction can not get recovered. */ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr) { @@ -299,6 +307,8 @@ static int can_probe(unsigned long paddr) * normally used, we just go through if there is no kprobe. */ __addr = recover_probed_instruction(buf, addr); + if (!__addr) + return 0; kernel_insn_init(&insn, (void *)__addr, MAX_INSN_SIZE); insn_get_length(&insn); @@ -347,6 +357,8 @@ int __copy_instruction(u8 *dest, u8 *src) unsigned long recovered_insn = recover_probed_instruction(buf, (unsigned long)src); + if (!recovered_insn) + return 0; kernel_insn_init(&insn, (void *)recovered_insn, MAX_INSN_SIZE); insn_get_length(&insn); /* Another subsystem puts a breakpoint, failed to recover */ diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c index 7c523bbf3dc8..3aef248ec1ee 100644 --- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -259,6 +259,8 @@ static int can_optimize(unsigned long paddr) */ return 0; recovered_insn = recover_probed_instruction(buf, addr); + if (!recovered_insn) + return 0; kernel_insn_init(&insn, (void *)recovered_insn, MAX_INSN_SIZE); insn_get_length(&insn); /* Another subsystem puts a breakpoint */ From e17fdaeaec066c725f73cd3cda1feae52b2646f5 Mon Sep 17 00:00:00 2001 From: Bruce Merry Date: Thu, 15 Jan 2015 11:20:22 +0200 Subject: [PATCH 3/9] perf bench: Fix order of arguments to memcpy_alloc_mem This was causing the destination instead of the source to be filled. As a result, the source was typically all mapped to one zero page, and hence very cacheable. Signed-off-by: Bruce Merry Acked-by: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20150115092022.GA11292@kryton Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/bench/mem-memcpy.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/bench/mem-memcpy.c b/tools/perf/bench/mem-memcpy.c index 6c14afe8c1b1..db1d3a29d97f 100644 --- a/tools/perf/bench/mem-memcpy.c +++ b/tools/perf/bench/mem-memcpy.c @@ -289,7 +289,7 @@ static u64 do_memcpy_cycle(const struct routine *r, size_t len, bool prefault) memcpy_t fn = r->fn.memcpy; int i; - memcpy_alloc_mem(&src, &dst, len); + memcpy_alloc_mem(&dst, &src, len); if (prefault) fn(dst, src, len); @@ -312,7 +312,7 @@ static double do_memcpy_gettimeofday(const struct routine *r, size_t len, void *src = NULL, *dst = NULL; int i; - memcpy_alloc_mem(&src, &dst, len); + memcpy_alloc_mem(&dst, &src, len); if (prefault) fn(dst, src, len); From 8eb733829cd17b9b66971f08110df7224d391d65 Mon Sep 17 00:00:00 2001 From: Josh Boyer Date: Wed, 11 Feb 2015 11:24:05 -0500 Subject: [PATCH 4/9] perf tools: Define _GNU_SOURCE on pthread_attr_setaffinity_np feature check The man page for pthread_attr_set_affinity_np states that _GNU_SOURCE must be defined before pthread.h is included in order to get the proper function declaration. Define this in the Makefile. Without this defined, the feature check fails on a Fedora system with gcc5 and then the perf build later fails with conflicting prototypes for the function. Signed-off-by: Josh Boyer Cc: Jiri Olsa Cc: Peter Zijlstra Cc: Vineet Gupta Link: http://lkml.kernel.org/r/20150211162404.GA15522@hansolo.redhat.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/config/feature-checks/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/config/feature-checks/Makefile b/tools/perf/config/feature-checks/Makefile index 42ac05aaf8ac..b32ff3372514 100644 --- a/tools/perf/config/feature-checks/Makefile +++ b/tools/perf/config/feature-checks/Makefile @@ -49,7 +49,7 @@ test-hello.bin: $(BUILD) test-pthread-attr-setaffinity-np.bin: - $(BUILD) -Werror -lpthread + $(BUILD) -D_GNU_SOURCE -Werror -lpthread test-stackprotector-all.bin: $(BUILD) -Werror -fstack-protector-all From 95a09cfa3cdf94231ce511f1697754482b918d39 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 24 Feb 2015 12:46:06 +0200 Subject: [PATCH 5/9] perf tools: Fix pthread_attr_setaffinity_np build error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Feature detection for pthread_attr_setaffinity_np was failing, producing this error: In file included from bench/futex-hash.c:17:0: bench/futex.h:73:19: error: conflicting types for ‘pthread_attr_setaffinity_np’ static inline int pthread_attr_setaffinity_np(pthread_attr_t *attr, ^ In file included from bench/futex.h:72:0, from bench/futex-hash.c:17: /usr/include/pthread.h:407:12: note: previous declaration of ‘pthread_attr_setaffinity_np’ was here extern int pthread_attr_setaffinity_np (pthread_attr_t *__attr, ^ make[3]: *** [bench/futex-hash.o] Error 1 make[2]: *** [bench] Error 2 make[2]: *** Waiting for unfinished jobs.... This was because compiling test-pthread-attr-setaffinity-np.c failed due to the function arguments: test-pthread-attr-setaffinity-np.c: In function ‘main’: test-pthread-attr-setaffinity-np.c:11:2: warning: null argument where non-null required (argument 3) [-Wnonnull] ret = pthread_attr_setaffinity_np(&thread_attr, 0, NULL); ^ So fix the arguments. Signed-off-by: Adrian Hunter Tested-by: Stephane Eranian Cc: Jiri Olsa Link: http://lkml.kernel.org/r/1424774766-24194-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo --- .../config/feature-checks/test-pthread-attr-setaffinity-np.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/perf/config/feature-checks/test-pthread-attr-setaffinity-np.c b/tools/perf/config/feature-checks/test-pthread-attr-setaffinity-np.c index 0a0d3ecb4e8a..2b81b72eca23 100644 --- a/tools/perf/config/feature-checks/test-pthread-attr-setaffinity-np.c +++ b/tools/perf/config/feature-checks/test-pthread-attr-setaffinity-np.c @@ -5,10 +5,11 @@ int main(void) { int ret = 0; pthread_attr_t thread_attr; + cpu_set_t cs; pthread_attr_init(&thread_attr); /* don't care abt exact args, just the API itself in libpthread */ - ret = pthread_attr_setaffinity_np(&thread_attr, 0, NULL); + ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cs), &cs); return ret; } From 48536c9195ae8c2a00fd8f400bac72ab613feaab Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 24 Feb 2015 13:20:59 +0200 Subject: [PATCH 6/9] perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag Commit f6edb53c4993ffe92ce521fb449d1c146cea6ec2 converted the probe to a CPU wide event first (pid == -1). For kernels that do not support the PERF_FLAG_FD_CLOEXEC flag the probe fails with EINVAL. Since this errno is not handled pid is not reset to 0 and the subsequent use of pid = -1 as an argument brings in an additional failure path if perf_event_paranoid > 0: $ perf record -- sleep 1 perf_event_open(..., 0) failed unexpectedly with error 13 (Permission denied) [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (11 samples) ] Also, ensure the fd of the confirmation check is closed and comment why pid = -1 is used. Needs to go to 3.18 stable tree as well. Signed-off-by: Adrian Hunter Based-on-patch-by: David Ahern Acked-by: David Ahern Cc: David Ahern Link: http://lkml.kernel.org/r/54EC610C.8000403@intel.com Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/cloexec.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c index 47b78b3f0325..6da965bdbc2c 100644 --- a/tools/perf/util/cloexec.c +++ b/tools/perf/util/cloexec.c @@ -25,6 +25,10 @@ static int perf_flag_probe(void) if (cpu < 0) cpu = 0; + /* + * Using -1 for the pid is a workaround to avoid gratuitous jump label + * changes. + */ while (1) { /* check cloexec flag */ fd = sys_perf_event_open(&attr, pid, cpu, -1, @@ -47,16 +51,24 @@ static int perf_flag_probe(void) err, strerror_r(err, sbuf, sizeof(sbuf))); /* not supported, confirm error related to PERF_FLAG_FD_CLOEXEC */ - fd = sys_perf_event_open(&attr, pid, cpu, -1, 0); + while (1) { + fd = sys_perf_event_open(&attr, pid, cpu, -1, 0); + if (fd < 0 && pid == -1 && errno == EACCES) { + pid = 0; + continue; + } + break; + } err = errno; + if (fd >= 0) + close(fd); + if (WARN_ONCE(fd < 0 && err != EBUSY, "perf_event_open(..., 0) failed unexpectedly with error %d (%s)\n", err, strerror_r(err, sbuf, sizeof(sbuf)))) return -1; - close(fd); - return 0; } From a73b6c199a663d64a38198f547d5c5be42163193 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Wed, 18 Feb 2015 19:03:18 -0500 Subject: [PATCH 7/9] perf top: Fix SIGBUS on sparc64 perf-top is terminating due to SIGBUS on sparc64. git bisect points to: commit 82396986032915c1572bfb74b224fcc2e4e8ba7c Author: Arnaldo Carvalho de Melo Date: Mon Sep 8 13:26:35 2014 -0300 perf evlist: Refcount mmaps We need to know how many fds are using a perf mmap via PERF_EVENT_IOC_SET_OUTPUT, so that we can know when to ditch an mmap, refcount it. This commit added 'int refcnt' to struct perf_mmap and the addition makes the event_copy element no longer 8-byte aligned. Fix by adding __attribute__((aligned(8))) to the event_copy struct member. Signed-off-by: David Ahern Link: http://lkml.kernel.org/r/1424304198-92028-1-git-send-email-david.ahern@oracle.com [ Switched from 'int pad;' to using __attribute__, David tested/acked that ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/evlist.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index c94a9e03ecf1..e99a67632831 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -28,7 +28,7 @@ struct perf_mmap { int mask; int refcnt; unsigned int prev; - char event_copy[PERF_SAMPLE_MAX_SIZE]; + char event_copy[PERF_SAMPLE_MAX_SIZE] __attribute__((aligned(8))); }; struct perf_evlist { From e370a3d57664cd5e39c0b95d157ebc841b568409 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Wed, 18 Feb 2015 19:33:37 -0500 Subject: [PATCH 8/9] perf symbols: Define EM_AARCH64 for older OSes 4886f2ca19f6f added an arm-64 check, but the EM_AARCH64 macro is not defined in older releases (e.g., RHEL6). Define if it is not defined. Signed-off-by: David Ahern Cc: Victor Kamensky Link: http://lkml.kernel.org/r/1424306017-96797-1-git-send-email-david.ahern@oracle.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/symbol-elf.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c index b24f9d8727a8..33b7a2aef713 100644 --- a/tools/perf/util/symbol-elf.c +++ b/tools/perf/util/symbol-elf.c @@ -11,6 +11,11 @@ #include #include "debug.h" +#ifndef EM_AARCH64 +#define EM_AARCH64 183 /* ARM 64 bit */ +#endif + + #ifdef HAVE_CPLUS_DEMANGLE_SUPPORT extern char *cplus_demangle(const char *, int); From 4861f87cd3d133f03e3b39b6650f4e12f1a9e421 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Wed, 18 Feb 2015 19:37:02 -0500 Subject: [PATCH 9/9] perf tools: Make sparc64 arch point to sparc The recent build changes cause perf to not compile for sparc64 since the arch/sparc64/Build file does not exist: /home/dahern/kernels/linux.git/tools/build/Makefile.build:40: arch/sparc64/Build: No such file or directory Fix by converting the sparc64 RAW_ARCH to sparc ARCH -- similar to what is done for x86_64. Signed-off-by: David Ahern Cc: Jiri Olsa Link: http://lkml.kernel.org/r/1424306222-96843-1-git-send-email-david.ahern@oracle.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/config/Makefile.arch | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/perf/config/Makefile.arch b/tools/perf/config/Makefile.arch index ff95a68741d1..ac8721ffa6c8 100644 --- a/tools/perf/config/Makefile.arch +++ b/tools/perf/config/Makefile.arch @@ -21,6 +21,10 @@ ifeq ($(RAW_ARCH),x86_64) endif endif +ifeq ($(RAW_ARCH),sparc64) + ARCH ?= sparc +endif + ARCH ?= $(RAW_ARCH) LP64 := $(shell echo __LP64__ | ${CC} ${CFLAGS} -E -x c - | tail -n 1)