License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                            # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                        270
GPL-2.0+ WITH Linux-syscall-note                       169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
LGPL-2.1+ WITH Linux-syscall-note                       15
GPL-1.0+ WITH Linux-syscall-note                        14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
LGPL-2.0+ WITH Linux-syscall-note                        4
LGPL-2.1 WITH Linux-syscall-note                         3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
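The default rule for files in which neither scanner found license traces can be sketched in C as follows. This is only an illustration of the rule described above, not the tooling that produced the patches: the function name default_spdx_id and the substring test for "/uapi/" are assumptions made for the example.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the original tooling): pick the SPDX
 * identifier for a file with no license information in it. Files under a
 * uapi directory get the syscall-note exception; everything else falls
 * back to the license of the top level COPYING file.
 */
static const char *default_spdx_id(const char *path)
{
	/* Assumption: a "/uapi/" path component marks a uapi header. */
	if (strstr(path, "/uapi/"))
		return "GPL-2.0 WITH Linux-syscall-note";
	return "GPL-2.0";
}
```

With this sketch, a path such as include/uapi/linux/types.h maps to the syscall-note variant, while tools/perf/builtin-bench.c maps to plain GPL-2.0.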
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
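The distinction between source and header comment types can be illustrated with a small C sketch. It is not the script mentioned above; the helper names has_suffix and spdx_tag_line are hypothetical, and the sketch only covers .c files (a // comment, as on the first line of this file) and .h files (a /* */ comment).

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if path ends with suffix, 0 otherwise. */
static int has_suffix(const char *path, const char *suffix)
{
	size_t plen = strlen(path), slen = strlen(suffix);

	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/*
 * Hypothetical helper: write the SPDX tag line for a file into buf.
 * Returns the snprintf() result, or -1 for a file type this sketch
 * does not handle.
 */
static int spdx_tag_line(const char *path, const char *license,
			 char *buf, size_t len)
{
	if (has_suffix(path, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", license);
	if (has_suffix(path, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", license);
	return -1;
}
```

Other file types (Makefiles, assembly, scripts) would need their own comment syntax, which this sketch leaves out.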
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
2009-11-05 08:31:34 +08:00
|
|
|
/*
|
|
|
|
* builtin-bench.c
|
|
|
|
*
|
2013-10-23 20:37:56 +08:00
|
|
|
* General benchmarking collections provided by perf
|
2009-11-05 08:31:34 +08:00
|
|
|
*
|
|
|
|
* Copyright (C) 2009, Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
2013-10-23 20:37:56 +08:00
|
|
|
* Available benchmark collection list:
|
2009-11-05 08:31:34 +08:00
|
|
|
*
|
2013-10-23 20:37:56 +08:00
|
|
|
* sched ... scheduler and IPC performance
|
2019-03-09 02:17:47 +08:00
|
|
|
* syscall ... System call performance
|
2009-11-17 23:20:09 +08:00
|
|
|
* mem ... memory access performance
|
2013-10-23 20:37:56 +08:00
|
|
|
* numa ... NUMA scheduling and MM performance
|
2013-12-15 12:31:55 +08:00
|
|
|
* futex ... Futex performance
|
perf bench: Add epoll parallel epoll_wait benchmark
This program benchmarks concurrent epoll_wait(2) for file descriptors
that are monitored with EPOLLIN along various semantics, by a
single epoll instance. Such conditions can be found when using
single/combined or multiple queuing when load balancing.
Each thread has a number of private, nonblocking file descriptors,
referred to as fdmap. A writer thread will constantly be writing to the
fdmaps of all threads, minimizing each thread's chances of epoll_wait
not finding any ready read events and blocking as this is not what we
want to stress. Full details in the start of the C file.
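The writer/waiter interaction described above can be reduced to a single-threaded C sketch. It is not the benchmark code: the function name wait_once_on_eventfd is hypothetical, one eventfd stands in for a whole fdmap, and the use of eventfd(2) is an assumption based on the HAVE_EVENTFD note in this commit.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical illustration: register one nonblocking eventfd for EPOLLIN,
 * write to it once (the "writer" side), then call epoll_wait() once (the
 * "waiter" side). Returns the number of ready events, or -1 on error.
 */
static int wait_once_on_eventfd(void)
{
	struct epoll_event ev = { .events = EPOLLIN }, out;
	uint64_t val = 1;
	int epfd, fd, ret = -1;

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	fd = eventfd(0, EFD_NONBLOCK);
	if (fd < 0)
		goto out_close_ep;

	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
		goto out_close_fd;

	/* Writer side: make the descriptor readable. */
	if (write(fd, &val, sizeof(val)) != (ssize_t)sizeof(val))
		goto out_close_fd;

	/* Waiter side: a zero timeout never blocks; the event is already pending. */
	ret = epoll_wait(epfd, &out, 1, 0);

out_close_fd:
	close(fd);
out_close_ep:
	close(epfd);
	return ret;
}
```

Because the write happens before the wait, epoll_wait() finds a ready read event instead of blocking, which is the condition the writer thread tries to maintain in the benchmark.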
Committer testing:
# perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
all: All benchmarks
# perf bench epoll
# List of available benchmarks for collection 'epoll':
wait: Benchmark epoll concurrent epoll_waits
all: Run all futex benchmarks
# perf bench epoll wait
# Running 'epoll/wait' benchmark:
Run summary [PID 19295]: 3 threads monitoring on 64 file-descriptors for 8 secs.
[thread 0] fdmap: 0xdaa650 ... 0xdaa74c [ 328241 ops/sec ]
[thread 1] fdmap: 0xdaa900 ... 0xdaa9fc [ 351695 ops/sec ]
[thread 2] fdmap: 0xdaabb0 ... 0xdaacac [ 381423 ops/sec ]
Averaged 353786 operations/sec (+- 4.35%), total secs = 8
#
Committer notes:
Fix the build on debian:experimental-x-mips, debian:experimental-x-mipsel
and others:
CC /tmp/build/perf/bench/epoll-wait.o
bench/epoll-wait.c: In function 'writerfn':
bench/epoll-wait.c:399:12: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
printinfo("exiting writer-thread (total full-loops: %ld)\n", iter);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~
bench/epoll-wait.c:86:31: note: in definition of macro 'printinfo'
do { if (__verbose) { printf(fmt, ## arg); fflush(stdout); } } while (0)
^~~
cc1: all warnings being treated as errors
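One portable way to avoid this class of error is to print size_t values with the C99 %zu conversion, which matches the type on both 32-bit and 64-bit targets. The sketch below only illustrates that idea; it is not necessarily the fixup that was applied, and the function name format_loops is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a size_t counter without assuming that
 * size_t is the same as long. Returns the snprintf() result.
 */
static int format_loops(char *buf, size_t len, size_t iter)
{
	/* %zu is the C99 conversion for size_t, whatever its width. */
	return snprintf(buf, len, "total full-loops: %zu", iter);
}
```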
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181106152226.20883-2-dave@stgolabs.net
Link: http://lkml.kernel.org/r/20181106182349.thdkpvshkna5vd7o@linux-r8p5
[ Applied above fixup as per Davidlohr's request ]
[ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
[ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-06 23:22:25 +08:00
|
|
|
* epoll ... Event poll performance
|
2009-11-05 08:31:34 +08:00
|
|
|
*/
|
2015-12-15 23:39:39 +08:00
|
|
|
#include <subcmd/parse-options.h>
|
2009-11-05 08:31:34 +08:00
|
|
|
#include "builtin.h"
|
|
|
|
#include "bench/bench.h"
|
|
|
|
|
|
|
|
#include <stdio.h>
|
|
|
|
#include <stdlib.h>
|
|
|
|
#include <string.h>
|
2013-10-23 20:37:56 +08:00
|
|
|
#include <sys/prctl.h>
|
2019-07-04 22:32:27 +08:00
|
|
|
#include <linux/zalloc.h>
|
2009-11-05 08:31:34 +08:00
|
|
|
|
2017-03-27 22:47:20 +08:00
|
|
|
typedef int (*bench_fn_t)(int argc, const char **argv);
|
2013-10-23 20:37:56 +08:00
|
|
|
|
|
|
|
struct bench {
|
|
|
|
const char *name;
|
|
|
|
const char *summary;
|
|
|
|
bench_fn_t fn;
|
2009-11-05 08:31:34 +08:00
|
|
|
};
|
|
|
|
|
2013-09-30 18:07:11 +08:00
|
|
|
#ifdef HAVE_LIBNUMA_SUPPORT
|
2013-10-23 20:37:56 +08:00
|
|
|
static struct bench numa_benchmarks[] = {
|
|
|
|
{ "mem", "Benchmark for NUMA workloads", bench_numa },
|
2015-10-19 16:04:30 +08:00
|
|
|
{ "all", "Run all NUMA benchmarks", NULL },
|
2013-10-23 20:37:56 +08:00
|
|
|
{ NULL, NULL, NULL }
|
perf: Add 'perf bench numa mem' NUMA performance measurement suite
Add a suite of NUMA performance benchmarks.
The goal was to simulate the behavior and access patterns of real NUMA
workloads, via a wide range of parameters, so this tool goes well
beyond simple bzero() measurements that most NUMA micro-benchmarks use:
- It processes the data and creates a chain of data dependencies,
like a real workload would. Neither the compiler, nor the
kernel (via KSM and other optimizations) nor the CPU can
eliminate parts of the workload.
- It randomizes the initial state and also randomizes the target
addresses of the processing - it's not a simple forward scan
of addresses.
- It provides flexible options to set process, thread and memory
relationship information: -G sets "global" memory shared between
all test processes, -P sets "process" memory shared by all
threads of a process and -T sets "thread" private memory.
- There's a NUMA convergence monitoring and convergence latency
measurement option via -c and -m.
- Micro-sleeps and synchronization can be injected to provoke lock
contention and scheduling, via the -u and -S options. This simulates
IO and contention.
- The -x option instructs the workload to 'perturb' itself artificially
every N seconds, by moving to the first and last CPU of the system
periodically. This way the stability of convergence equilibrium and
the number of steps taken for the scheduler to reach equilibrium again
can be measured.
- The amount of work can be specified via the -l loop count, and/or
via a -s seconds-timeout value.
- CPU and node memory binding options, to test hard binding scenarios.
THP can be turned on and off via madvise() calls.
- Live reporting of convergence progress in an 'at a glance' output format.
Printing of convergence and deconvergence events.
The 'perf bench numa mem -a' option will start an array of about 30
individual tests that will each output such measurements:
# Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1"
5x5-bw-thread, 20.276, secs, runtime-max/thread
5x5-bw-thread, 20.004, secs, runtime-min/thread
5x5-bw-thread, 20.155, secs, runtime-avg/thread
5x5-bw-thread, 0.671, %, spread-runtime/thread
5x5-bw-thread, 21.153, GB, data/thread
5x5-bw-thread, 528.818, GB, data-total
5x5-bw-thread, 0.959, nsecs, runtime/byte/thread
5x5-bw-thread, 1.043, GB/sec, thread-speed
5x5-bw-thread, 26.081, GB/sec, total-speed
See the help text and the code for more details.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-06 20:51:59 +08:00
|
|
|
};
|
2013-01-28 09:51:22 +08:00
|
|
|
#endif
|
2012-12-06 20:51:59 +08:00
|
|
|
|
2013-10-23 20:37:56 +08:00
|
|
|
static struct bench sched_benchmarks[] = {
|
|
|
|
{ "messaging", "Benchmark for scheduling and IPC", bench_sched_messaging },
|
|
|
|
{ "pipe", "Benchmark for pipe() between two processes", bench_sched_pipe },
|
2015-10-19 16:04:30 +08:00
|
|
|
{ "all", "Run all scheduler benchmarks", NULL },
|
2013-10-23 20:37:56 +08:00
|
|
|
{ NULL, NULL, NULL }
|
2009-11-05 08:31:34 +08:00
|
|
|
};
|
|
|
|
|
2019-03-09 02:17:47 +08:00
|
|
|
static struct bench syscall_benchmarks[] = {
|
|
|
|
{ "basic", "Benchmark for basic getppid(2) calls", bench_syscall_basic },
|
|
|
|
{ "all", "Run all syscall benchmarks", NULL },
|
|
|
|
{ NULL, NULL, NULL },
|
|
|
|
};
|
|
|
|
|
2013-10-23 20:37:56 +08:00
|
|
|
static struct bench mem_benchmarks[] = {
|
2015-10-19 16:04:26 +08:00
|
|
|
{ "memcpy", "Benchmark for memcpy() functions", bench_mem_memcpy },
|
|
|
|
{ "memset", "Benchmark for memset() functions", bench_mem_memset },
|
2020-07-30 06:00:34 +08:00
|
|
|
{ "find_bit", "Benchmark for find_bit() functions", bench_mem_find_bit },
|
2015-10-19 16:04:30 +08:00
|
|
|
{ "all", "Run all memory access benchmarks", NULL },
|
2013-10-23 20:37:56 +08:00
|
|
|
{ NULL, NULL, NULL }
|
2009-11-17 23:20:09 +08:00
|
|
|
};
|
|
|
|
|
2013-12-15 12:31:55 +08:00
|
|
|
static struct bench futex_benchmarks[] = {
|
|
|
|
{ "hash", "Benchmark for futex hash table", bench_futex_hash },
|
2013-12-15 12:31:56 +08:00
|
|
|
{ "wake", "Benchmark for futex wake calls", bench_futex_wake },
|
2015-05-09 02:37:59 +08:00
|
|
|
{ "wake-parallel", "Benchmark for parallel futex wake calls", bench_futex_wake_parallel },
|
2013-12-15 12:31:57 +08:00
|
|
|
{ "requeue", "Benchmark for futex requeue calls", bench_futex_requeue },
|
2015-07-07 16:55:53 +08:00
|
|
|
/* pi-futexes */
|
|
|
|
{ "lock-pi", "Benchmark for futex lock_pi calls", bench_futex_lock_pi },
|
2015-10-19 16:04:30 +08:00
|
|
|
{ "all", "Run all futex benchmarks", NULL },
|
2013-12-15 12:31:55 +08:00
|
|
|
{ NULL, NULL, NULL }
|
|
|
|
};
|
|
|
|
|
2020-05-20 23:21:07 +08:00
|
|
|
#ifdef HAVE_EVENTFD_SUPPORT
|
2018-11-06 23:22:25 +08:00
|
|
|
static struct bench epoll_benchmarks[] = {
|
|
|
|
{ "wait", "Benchmark epoll concurrent epoll_waits", bench_epoll_wait },
|
perf bench: Add epoll_ctl(2) benchmark
Benchmark the various operations allowed for epoll_ctl(2). The idea is
to concurrently stress a single epoll instance doing add/mod/del
operations.
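A single add/mod/del cycle of the kind this benchmark repeats can be sketched in C as follows. This is an illustration only, not the benchmark code: the function name epoll_ctl_cycle is hypothetical, it runs in one thread on one descriptor, and it uses epoll_create1() and eventfd() as assumptions where the real program may differ.

```c
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical illustration: run one add/mod/del cycle against a single
 * epoll instance. Returns 0 if all three epoll_ctl() calls succeed,
 * -1 otherwise.
 */
static int epoll_ctl_cycle(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int epfd, fd, ret = -1;

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	fd = eventfd(0, EFD_NONBLOCK);
	if (fd < 0)
		goto out_close_ep;
	ev.data.fd = fd;

	/* ADD: start monitoring the descriptor. */
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
		goto out_close_fd;

	/* MOD: change the event mask of the registered descriptor. */
	ev.events = EPOLLIN | EPOLLET;
	if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) < 0)
		goto out_close_fd;

	/* DEL: stop monitoring; the event argument is ignored here. */
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) < 0)
		goto out_close_fd;

	ret = 0;
out_close_fd:
	close(fd);
out_close_ep:
	close(epfd);
	return ret;
}
```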
Committer testing:
# perf bench epoll ctl
# Running 'epoll/ctl' benchmark:
Run summary [PID 20344]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs.
[thread 0] fdmap: 0x21a46b0 ... 0x21a47ac [ add: 1680960 ops; mod: 1680960 ops; del: 1680960 ops ]
[thread 1] fdmap: 0x21a4960 ... 0x21a4a5c [ add: 1685440 ops; mod: 1685440 ops; del: 1685440 ops ]
[thread 2] fdmap: 0x21a4c10 ... 0x21a4d0c [ add: 1674368 ops; mod: 1674368 ops; del: 1674368 ops ]
[thread 3] fdmap: 0x21a4ec0 ... 0x21a4fbc [ add: 1677568 ops; mod: 1677568 ops; del: 1677568 ops ]
Averaged 1679584 ADD operations (+- 0.14%)
Averaged 1679584 MOD operations (+- 0.14%)
Averaged 1679584 DEL operations (+- 0.14%)
#
Let's measure those calls with 'perf trace' to get a glimpse at what this
benchmark is doing in terms of syscalls:
# perf trace -m32768 -s perf bench epoll ctl
# Running 'epoll/ctl' benchmark:
Run summary [PID 20405]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs.
[thread 0] fdmap: 0x21764e0 ... 0x21765dc [ add: 1100480 ops; mod: 1100480 ops; del: 1100480 ops ]
[thread 1] fdmap: 0x2176790 ... 0x217688c [ add: 1250176 ops; mod: 1250176 ops; del: 1250176 ops ]
[thread 2] fdmap: 0x2176a40 ... 0x2176b3c [ add: 1022464 ops; mod: 1022464 ops; del: 1022464 ops ]
[thread 3] fdmap: 0x2176cf0 ... 0x2176dec [ add: 705472 ops; mod: 705472 ops; del: 705472 ops ]
Averaged 1019648 ADD operations (+- 11.27%)
Averaged 1019648 MOD operations (+- 11.27%)
Averaged 1019648 DEL operations (+- 11.27%)
Summary of events:
epoll-ctl (20405), 1264 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
eventfd2 256 9.514 0.001 0.037 5.243 68.00%
clone 4 1.245 0.204 0.311 0.531 24.13%
mprotect 66 0.345 0.002 0.005 0.021 7.43%
openat 45 0.313 0.004 0.007 0.073 21.93%
mmap 88 0.302 0.002 0.003 0.013 5.02%
futex 4 0.160 0.002 0.040 0.140 83.43%
sched_setaffinity 4 0.124 0.005 0.031 0.070 49.39%
read 44 0.103 0.001 0.002 0.013 15.54%
fstat 40 0.052 0.001 0.001 0.003 5.43%
close 39 0.039 0.001 0.001 0.001 1.48%
stat 9 0.034 0.003 0.004 0.006 7.30%
access 3 0.023 0.007 0.008 0.008 4.25%
open 2 0.021 0.008 0.011 0.013 22.60%
getdents 4 0.019 0.001 0.005 0.009 37.15%
write 2 0.013 0.004 0.007 0.009 38.48%
munmap 1 0.010 0.010 0.010 0.010 0.00%
brk 3 0.006 0.001 0.002 0.003 26.34%
rt_sigprocmask 2 0.004 0.001 0.002 0.003 43.95%
rt_sigaction 3 0.004 0.001 0.001 0.002 16.07%
prlimit64 3 0.004 0.001 0.001 0.001 5.39%
prctl 1 0.003 0.003 0.003 0.003 0.00%
epoll_create 1 0.003 0.003 0.003 0.003 0.00%
lseek 2 0.002 0.001 0.001 0.001 11.42%
sched_getaffinity 1 0.002 0.002 0.002 0.002 0.00%
arch_prctl 1 0.002 0.002 0.002 0.002 0.00%
set_tid_address 1 0.001 0.001 0.001 0.001 0.00%
getpid 1 0.001 0.001 0.001 0.001 0.00%
set_robust_list 1 0.001 0.001 0.001 0.001 0.00%
execve 1 0.000 0.000 0.000 0.000 0.00%
epoll-ctl (20406), 1245480 events, 14.6%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
epoll_ctl 619511 1034.927 0.001 0.002 6.691 0.67%
nanosleep 3226 616.114 0.006 0.191 10.376 7.57%
futex 2 11.336 0.002 5.668 11.334 99.97%
set_robust_list 1 0.001 0.001 0.001 0.001 0.00%
clone 1 0.000 0.000 0.000 0.000 0.00%
epoll-ctl (20407), 1243151 events, 14.5%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
epoll_ctl 618350 1042.181 0.001 0.002 2.512 0.40%
nanosleep 3220 366.261 0.012 0.114 18.162 9.59%
futex 4 5.463 0.001 1.366 5.427 99.12%
set_robust_list 1 0.002 0.002 0.002 0.002 0.00%
epoll-ctl (20408), 1801690 events, 21.1%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
epoll_ctl 896174 1540.581 0.001 0.002 6.987 0.74%
nanosleep 4667 783.393 0.006 0.168 10.419 7.10%
futex 2 4.682 0.002 2.341 4.681 99.93%
set_robust_list 1 0.002 0.002 0.002 0.002 0.00%
clone 1 0.000 0.000 0.000 0.000 0.00%
epoll-ctl (20409), 4254890 events, 49.8%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
epoll_ctl 2116416 3768.097 0.001 0.002 9.956 0.41%
nanosleep 11023 1141.778 0.006 0.104 9.447 4.95%
futex 3 0.037 0.002 0.012 0.029 70.50%
set_robust_list 1 0.008 0.008 0.008 0.008 0.00%
madvise 1 0.005 0.005 0.005 0.005 0.00%
clone 1 0.000 0.000 0.000 0.000 0.00%
#
Committer notes:
Fix build on fedora:24-x-ARC-uClibc, debian:experimental-x-mips,
debian:experimental-x-mipsel, ubuntu:16.04-x-arm and ubuntu:16.04-x-powerpc
CC /tmp/build/perf/bench/epoll-ctl.o
bench/epoll-ctl.c: In function 'init_fdmaps':
bench/epoll-ctl.c:214:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
for (i = 0; i < nfds; i+=inc) {
^
bench/epoll-ctl.c: In function 'bench_epoll_ctl':
bench/epoll-ctl.c:377:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
for (i = 0; i < nthreads; i++) {
^
bench/epoll-ctl.c:388:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
for (i = 0; i < nthreads; i++) {
^
cc1: all warnings being treated as errors
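The sign-compare errors above come from comparing a signed loop index with an unsigned bound. A common way to avoid them is to give the index the same unsigned type as the bound, as in this C sketch. The function count_strided is a hypothetical stand-in for the loops in the log, not the code that was committed.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the loops in the log: count how many slots a
 * strided walk over nfds descriptors visits. The index has the same
 * unsigned type as the bound, so "i < nfds" compares like with like.
 */
static size_t count_strided(size_t nfds, size_t inc)
{
	size_t i, visited = 0;

	if (inc == 0)
		return 0;	/* avoid an endless loop on a zero stride */

	for (i = 0; i < nfds; i += inc)
		visited++;
	return visited;
}
```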
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181106152226.20883-3-dave@stgolabs.net
[ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
[ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-06 23:22:26 +08:00
|
|
|
{ "ctl", "Benchmark epoll concurrent epoll_ctls", bench_epoll_ctl },
|
2018-11-06 23:22:25 +08:00
|
|
|
{ "all", "Run all epoll benchmarks", NULL },
|
|
|
|
{ NULL, NULL, NULL }
|
|
|
|
};
|
2020-05-20 23:21:07 +08:00
|
|
|
#endif // HAVE_EVENTFD_SUPPORT
|
perf bench: Add epoll parallel epoll_wait benchmark
This program benchmarks concurrent epoll_wait(2) calls on file descriptors
that are monitored with EPOLLIN, under various semantics, by a
single epoll instance. Such conditions can be found when using
single/combined or multiple queuing when load balancing.
Each thread has a number of private, nonblocking file descriptors,
referred to as its fdmap. A writer thread constantly writes to the
fdmaps of all threads, minimizing each thread's chances of blocking in
epoll_wait because no read events are ready, as blocking is not what we
want to stress. Full details are at the start of the C file.
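As a rough illustration of the pattern described above (not the benchmark's actual code), the following C sketch registers a few nonblocking eventfd descriptors with EPOLLIN on a single epoll instance, plays the writer role by writing to each of them, and counts the events that one non-blocking epoll_wait(2) call reports. The helper name count_ready_events() and the MAX_FDS cap are assumptions made for the example; the real benchmark uses multiple threads, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_FDS 64	/* hypothetical cap, chosen for this example only */

/*
 * Hypothetical helper: monitor nfds eventfds with EPOLLIN on one epoll
 * instance, make all of them readable, and return how many ready events
 * a single non-blocking epoll_wait() reports (-1 on error).
 */
int count_ready_events(int nfds)
{
	struct epoll_event ev, events[MAX_FDS];
	int fds[MAX_FDS];
	int epfd, created = 0, ready = -1, i;
	uint64_t one = 1;

	if (nfds < 1 || nfds > MAX_FDS)
		return -1;

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	for (i = 0; i < nfds; i++) {
		fds[i] = eventfd(0, EFD_NONBLOCK);
		if (fds[i] < 0)
			goto out;
		created++;

		ev.events = EPOLLIN;
		ev.data.fd = fds[i];
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0)
			goto out;
	}

	/* Writer role: bump every counter so each descriptor becomes readable. */
	for (i = 0; i < nfds; i++) {
		if (write(fds[i], &one, sizeof(one)) != (ssize_t)sizeof(one))
			goto out;
	}

	/* Timeout 0: report what is ready right now, never block. */
	ready = epoll_wait(epfd, events, nfds, 0);
out:
	for (i = 0; i < created; i++)
		close(fds[i]);
	close(epfd);
	return ready;
}
```

Because every descriptor has been written to before the wait, the call returns immediately with all of them ready, which is the non-blocking path the benchmark aims to stress.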
Committer testing:
# perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
all: All benchmarks
# perf bench epoll
# List of available benchmarks for collection 'epoll':
wait: Benchmark epoll concurrent epoll_waits
all: Run all futex benchmarks
# perf bench epoll wait
# Running 'epoll/wait' benchmark:
Run summary [PID 19295]: 3 threads monitoring on 64 file-descriptors for 8 secs.
[thread 0] fdmap: 0xdaa650 ... 0xdaa74c [ 328241 ops/sec ]
[thread 1] fdmap: 0xdaa900 ... 0xdaa9fc [ 351695 ops/sec ]
[thread 2] fdmap: 0xdaabb0 ... 0xdaacac [ 381423 ops/sec ]
Averaged 353786 operations/sec (+- 4.35%), total secs = 8
#
Committer notes:
Fix the build on debian:experimental-x-mips, debian:experimental-x-mipsel
and others:
CC /tmp/build/perf/bench/epoll-wait.o
bench/epoll-wait.c: In function 'writerfn':
bench/epoll-wait.c:399:12: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
printinfo("exiting writer-thread (total full-loops: %ld)\n", iter);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~
bench/epoll-wait.c:86:31: note: in definition of macro 'printinfo'
do { if (__verbose) { printf(fmt, ## arg); fflush(stdout); } } while (0)
^~~
cc1: all warnings being treated as errors
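The diagnostic above arises because size_t is unsigned int on these 32-bit targets, while '%ld' expects a long. One portable way to print a size_t is the C99 '%zu' conversion, sketched below. This is an illustration only: the helper name format_loops() is hypothetical, and the log does not show which conversion the applied fixup actually used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the writer-thread exit message with '%zu',
 * which matches size_t on both 32-bit and 64-bit targets.
 * Returns the number of characters snprintf() would have written.
 */
int format_loops(char *buf, size_t len, size_t iter)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"exiting writer-thread (total full-loops: %zu)\n", iter);
}
```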
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181106152226.20883-2-dave@stgolabs.net
Link: http://lkml.kernel.org/r/20181106182349.thdkpvshkna5vd7o@linux-r8p5
[ Applied above fixup as per Davidlohr's request ]
[ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
[ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-06 23:22:25 +08:00

2020-04-02 23:43:53 +08:00
static struct bench internals_benchmarks[] = {
	{ "synthesize", "Benchmark perf event synthesis", bench_synthesize },
perf bench: Add kallsyms parsing
Add a benchmark for kallsyms parsing. Example output:
Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 103.971 ms (+- 0.121 ms)
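For readers unfamiliar with the input being parsed: each line of /proc/kallsyms carries a hexadecimal address, a one-character symbol type and a symbol name (optionally followed by a module name). The C sketch below parses one such line with sscanf(). It is a simplified stand-in and not perf's kallsyms__parse(); the struct ksym type and the parse_kallsyms_line() helper are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one parsed symbol. */
struct ksym {
	unsigned long long addr;
	char type;
	char name[128];
};

/*
 * Parse "<hex address> <type> <name>" into *sym.
 * Returns 0 on success, -1 if the line does not have all three fields.
 */
int parse_kallsyms_line(const char *line, struct ksym *sym)
{
	if (sscanf(line, "%llx %c %127s",
		   &sym->addr, &sym->type, sym->name) != 3)
		return -1;
	return 0;
}
```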
Committer testing:
Test Machine: AMD Ryzen 5 3600X 6-Core Processor
[root@five ~]# perf bench internals kallsyms-parse
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 79.692 ms (+- 0.101 ms)
[root@five ~]# perf stat -r5 perf bench internals kallsyms-parse
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 80.563 ms (+- 0.079 ms)
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 81.046 ms (+- 0.155 ms)
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 80.874 ms (+- 0.104 ms)
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 81.173 ms (+- 0.133 ms)
# Running 'internals/kallsyms-parse' benchmark:
Average kallsyms__parse took: 81.169 ms (+- 0.074 ms)
Performance counter stats for 'perf bench internals kallsyms-parse' (5 runs):
8,093.54 msec task-clock # 0.999 CPUs utilized ( +- 0.14% )
3,165 context-switches # 0.391 K/sec ( +- 0.18% )
10 cpu-migrations # 0.001 K/sec ( +- 23.13% )
744 page-faults # 0.092 K/sec ( +- 0.21% )
34,551,564,954 cycles # 4.269 GHz ( +- 0.05% ) (83.33%)
1,160,584,308 stalled-cycles-frontend # 3.36% frontend cycles idle ( +- 1.60% ) (83.33%)
14,974,323,985 stalled-cycles-backend # 43.34% backend cycles idle ( +- 0.24% ) (83.33%)
58,712,905,705 instructions # 1.70 insn per cycle
# 0.26 stalled cycles per insn ( +- 0.01% ) (83.34%)
14,136,433,778 branches # 1746.632 M/sec ( +- 0.01% ) (83.33%)
141,943,217 branch-misses # 1.00% of all branches ( +- 0.04% ) (83.33%)
8.1040 +- 0.0115 seconds time elapsed ( +- 0.14% )
[root@five ~]#
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200501221315.54715-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-02 06:13:13 +08:00
	{ "kallsyms-parse", "Benchmark kallsyms parsing", bench_kallsyms_parse },
perf bench: Add build-id injection benchmark
Sometimes I can see that 'perf record' piped with 'perf inject' takes a
long time processing build-ids.
So introduce an inject-build-id benchmark to the internals benchmark
suite to measure its overhead regularly.
It runs the 'perf inject' command internally and feeds it the given number
of synthesized events (MMAP2 + SAMPLE basically).
Usage: perf bench internals inject-build-id <options>
-i, --iterations <n> Number of iterations used to compute average (default: 100)
-m, --nr-mmaps <n> Number of mmap events for each iteration (default: 100)
-n, --nr-samples <n> Number of sample events per mmap event (default: 100)
-v, --verbose be more verbose (show iteration count, DSO name, etc)
By default, it measures average processing time of 100 MMAP2 events
and 10000 SAMPLE events. Below is a result on my laptop.
$ perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 25.789 msec (+- 0.202 msec)
Average time per event: 2.528 usec (+- 0.020 usec)
Average memory usage: 8411 KB (+- 7 KB)
Committer testing:
$ perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
syscall: System call benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
internals: Perf-internals benchmarks
all: All benchmarks
$ perf bench internals
# List of available benchmarks for collection 'internals':
synthesize: Benchmark perf event synthesis
kallsyms-parse: Benchmark kallsyms parsing
inject-build-id: Benchmark build-id injection
$ perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.202 msec (+- 0.059 msec)
Average time per event: 1.392 usec (+- 0.006 usec)
Average memory usage: 12650 KB (+- 10 KB)
Average build-id-all injection took: 12.831 msec (+- 0.071 msec)
Average time per event: 1.258 usec (+- 0.007 usec)
Average memory usage: 11895 KB (+- 10 KB)
$
$ perf stat -r5 perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.380 msec (+- 0.056 msec)
Average time per event: 1.410 usec (+- 0.006 usec)
Average memory usage: 12608 KB (+- 11 KB)
Average build-id-all injection took: 11.889 msec (+- 0.064 msec)
Average time per event: 1.166 usec (+- 0.006 usec)
Average memory usage: 11838 KB (+- 10 KB)
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.246 msec (+- 0.065 msec)
Average time per event: 1.397 usec (+- 0.006 usec)
Average memory usage: 12744 KB (+- 10 KB)
Average build-id-all injection took: 12.019 msec (+- 0.066 msec)
Average time per event: 1.178 usec (+- 0.006 usec)
Average memory usage: 11963 KB (+- 10 KB)
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.321 msec (+- 0.067 msec)
Average time per event: 1.404 usec (+- 0.007 usec)
Average memory usage: 12690 KB (+- 10 KB)
Average build-id-all injection took: 11.909 msec (+- 0.041 msec)
Average time per event: 1.168 usec (+- 0.004 usec)
Average memory usage: 11938 KB (+- 10 KB)
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.287 msec (+- 0.059 msec)
Average time per event: 1.401 usec (+- 0.006 usec)
Average memory usage: 12864 KB (+- 10 KB)
Average build-id-all injection took: 11.862 msec (+- 0.058 msec)
Average time per event: 1.163 usec (+- 0.006 usec)
Average memory usage: 12103 KB (+- 10 KB)
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 14.402 msec (+- 0.053 msec)
Average time per event: 1.412 usec (+- 0.005 usec)
Average memory usage: 12876 KB (+- 10 KB)
Average build-id-all injection took: 11.826 msec (+- 0.061 msec)
Average time per event: 1.159 usec (+- 0.006 usec)
Average memory usage: 12111 KB (+- 10 KB)
Performance counter stats for 'perf bench internals inject-build-id' (5 runs):
4,267.48 msec task-clock:u # 1.502 CPUs utilized ( +- 0.14% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
102,092 page-faults:u # 0.024 M/sec ( +- 0.08% )
3,894,589,578 cycles:u # 0.913 GHz ( +- 0.19% ) (83.49%)
140,078,421 stalled-cycles-frontend:u # 3.60% frontend cycles idle ( +- 0.77% ) (83.34%)
948,581,189 stalled-cycles-backend:u # 24.36% backend cycles idle ( +- 0.46% ) (83.25%)
5,835,587,719 instructions:u # 1.50 insn per cycle
# 0.16 stalled cycles per insn ( +- 0.21% ) (83.24%)
1,267,423,636 branches:u # 296.996 M/sec ( +- 0.22% ) (83.12%)
17,484,290 branch-misses:u # 1.38% of all branches ( +- 0.12% ) (83.55%)
2.84176 +- 0.00222 seconds time elapsed ( +- 0.08% )
$
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-12 15:02:09 +08:00
	{ "inject-build-id", "Benchmark build-id injection", bench_inject_build_id },
perf bench: Add benchmark for evlist open/close operations
This new benchmark measures the total time taken to open, mmap,
enable, disable, munmap, and close an evlist (the time taken by new,
create_maps, config, and delete is not counted).
The evlist can be configured as in perf-record using the
-a,-C,-e,-u,--per-thread,-t,-p options.
The events can be duplicated in the evlist to quickly test performance
with many events using the -n option.
Furthermore, the number of iterations used to calculate the
statistics is also customizable.
Examples:
- Open one dummy event system-wide:
$ sudo ./perf bench internals evlist-open-close
Number of cpus: 4
Number of threads: 1
Number of events: 1 (4 fds)
Number of iterations: 100
Average open-close took: 613.870 usec (+- 32.852 usec)
- Open the group '{cs,cycles}' on CPU 0
$ sudo ./perf bench internals evlist-open-close -e '{cs,cycles}' -C 0
Number of cpus: 1
Number of threads: 1
Number of events: 2 (2 fds)
Number of iterations: 100
Average open-close took: 8503.220 usec (+- 252.652 usec)
- Open 10 'cycles' events for user 0, calculate average over 100 runs
$ sudo ./perf bench internals evlist-open-close -e cycles -n 10 -u 0 -i 100
Number of cpus: 4
Number of threads: 328
Number of events: 10 (13120 fds)
Number of iterations: 100
Average open-close took: 180043.140 usec (+- 2295.889 usec)
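The fd counts in the three examples above are consistent with one file descriptor per event, per CPU, per thread (1 x 4 x 1 = 4, 2 x 1 x 1 = 2, 10 x 4 x 328 = 13120). The C sketch below encodes that relationship. Note that it is inferred from the sample output only, not taken from the benchmark source, and expected_fds() is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of fds implied by the sample output,
 * assuming one fd per (event, cpu, thread) combination.
 */
unsigned long expected_fds(unsigned int nr_events, unsigned int nr_cpus,
			   unsigned int nr_threads)
{
	return (unsigned long)nr_events * nr_cpus * nr_threads;
}
```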
Committer notes:
Replaced a deprecated bzero() call with designated initialized zeroing.
Added some missing evlist allocation checks, one noted by Riccardo on
the mailing list.
Minor cosmetic changes (sent in private).
Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210809201101.277594-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-10 04:11:02 +08:00
	{ "evlist-open-close", "Benchmark evlist open and close", bench_evlist_open_close },
2020-04-02 23:43:53 +08:00
	{ NULL, NULL, NULL }
};

2013-10-23 20:37:56 +08:00
struct collection {
	const char *name;
	const char *summary;
	struct bench *benchmarks;
2009-11-05 08:31:34 +08:00
};
2013-10-23 20:37:56 +08:00
static struct collection collections[] = {
2013-12-15 12:31:55 +08:00
	{ "sched", "Scheduler and IPC benchmarks", sched_benchmarks },
2019-03-09 02:17:47 +08:00
	{ "syscall", "System call benchmarks", syscall_benchmarks },
2013-10-23 20:37:56 +08:00
	{ "mem", "Memory access benchmarks", mem_benchmarks },
2013-09-30 18:07:11 +08:00
#ifdef HAVE_LIBNUMA_SUPPORT
2013-10-23 20:37:56 +08:00
	{ "numa", "NUMA scheduling and MM benchmarks", numa_benchmarks },
2013-01-28 09:51:22 +08:00
#endif
2013-12-15 12:31:55 +08:00
	{"futex", "Futex stressing benchmarks", futex_benchmarks },
2020-05-20 23:21:07 +08:00
#ifdef HAVE_EVENTFD_SUPPORT
2018-11-06 23:22:25 +08:00
	{"epoll", "Epoll stressing benchmarks", epoll_benchmarks },
#endif
2020-04-02 23:43:53 +08:00
	{ "internals", "Perf-internals benchmarks", internals_benchmarks },
2013-10-23 20:37:56 +08:00
	{ "all", "All benchmarks", NULL },
	{ NULL, NULL, NULL }
2009-11-05 08:31:34 +08:00
};
2013-10-23 20:37:56 +08:00
/* Iterate over all benchmark collections: */
#define for_each_collection(coll) \
	for (coll = collections; coll->name; coll++)

/* Iterate over all benchmarks within a collection: */
#define for_each_bench(coll, bench) \
2014-03-13 06:40:51 +08:00
	for (bench = coll->benchmarks; bench && bench->name; bench++)
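The two macros above rely on each table ending with an entry whose name is NULL, so no separate length needs to be tracked. The standalone C sketch below shows the same sentinel-terminated idiom with a counting helper and a name lookup. All identifiers in it (demo_bench, demo_benchmarks, for_each_demo_bench, count_benchmarks, find_benchmark) are hypothetical and exist only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*demo_fn_t)(int argc, const char **argv);

/* Hypothetical stand-in for a benchmark table entry. */
struct demo_bench {
	const char *name;
	const char *summary;
	demo_fn_t fn;
};

static int demo_noop(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

/* The entry with a NULL name terminates the table. */
struct demo_bench demo_benchmarks[] = {
	{ "first",  "First demo benchmark",    demo_noop },
	{ "second", "Second demo benchmark",   demo_noop },
	{ "all",    "Run all demo benchmarks", NULL },
	{ NULL, NULL, NULL }
};

/* Walk a table until the sentinel; a NULL table yields no iterations. */
#define for_each_demo_bench(table, bench) \
	for (bench = (table); bench && bench->name; bench++)

/* Count the named entries, excluding the sentinel. */
size_t count_benchmarks(struct demo_bench *table)
{
	struct demo_bench *bench;
	size_t n = 0;

	for_each_demo_bench(table, bench)
		n++;
	return n;
}

/* Return the entry called name, or NULL if the table has none. */
struct demo_bench *find_benchmark(struct demo_bench *table, const char *name)
{
	struct demo_bench *bench;

	for_each_demo_bench(table, bench) {
		if (!strcmp(bench->name, name))
			return bench;
	}
	return NULL;
}
```

The extra `bench &&` test in the loop condition mirrors for_each_bench() and makes a NULL table (as used by the "all" collection entry) safe to iterate.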
2013-10-23 20:37:56 +08:00

static void dump_benchmarks(struct collection *coll)
2009-11-05 08:31:34 +08:00
{
2013-10-23 20:37:56 +08:00
	struct bench *bench;
2009-11-05 08:31:34 +08:00

2013-10-23 20:37:56 +08:00
	printf("\n # List of available benchmarks for collection '%s':\n\n", coll->name);
2009-11-05 08:31:34 +08:00

2013-10-23 20:37:56 +08:00
	for_each_bench(coll, bench)
		printf("%14s: %s\n", bench->name, bench->summary);
2009-11-05 08:31:34 +08:00

	printf("\n");
}
2010-05-18 03:22:41 +08:00
static const char *bench_format_str;
2013-10-23 20:37:56 +08:00

/* Output/formatting style, exported to benchmark modules: */
2009-11-10 07:20:00 +08:00
int bench_format = BENCH_FORMAT_DEFAULT;
2014-06-17 02:14:19 +08:00
unsigned int bench_repeat = 10; /* default number of times to repeat the run */
2009-11-10 07:20:00 +08:00

static const struct option bench_options[] = {
2015-10-19 16:04:22 +08:00
	OPT_STRING('f', "format", &bench_format_str, "default|simple", "Specify the output formatting style"),
2014-06-17 02:14:19 +08:00
	OPT_UINTEGER('r', "repeat", &bench_repeat, "Specify amount of times to repeat the run"),
2009-11-10 07:20:00 +08:00
	OPT_END()
};

static const char * const bench_usage[] = {
2013-10-23 20:37:56 +08:00
	"perf bench [<common options>] <collection> <benchmark> [<options>]",
2009-11-10 07:20:00 +08:00
	NULL
};

static void print_usage(void)
{
2013-10-23 20:37:56 +08:00
	struct collection *coll;
2009-11-10 07:20:00 +08:00
	int i;

	printf("Usage: \n");
	for (i = 0; bench_usage[i]; i++)
		printf("\t%s\n", bench_usage[i]);
	printf("\n");

2013-10-23 20:37:56 +08:00
	printf(" # List of all available benchmark collections:\n\n");
2009-11-10 07:20:00 +08:00

2013-10-23 20:37:56 +08:00
	for_each_collection(coll)
		printf("%14s: %s\n", coll->name, coll->summary);
2009-11-10 07:20:00 +08:00
	printf("\n");
}
2010-05-18 03:22:41 +08:00
static int bench_str2int(const char *str)
2009-11-10 07:20:00 +08:00
{
	if (!str)
		return BENCH_FORMAT_DEFAULT;

	if (!strcmp(str, BENCH_FORMAT_DEFAULT_STR))
		return BENCH_FORMAT_DEFAULT;
	else if (!strcmp(str, BENCH_FORMAT_SIMPLE_STR))
		return BENCH_FORMAT_SIMPLE;

	return BENCH_FORMAT_UNKNOWN;
}
2013-10-23 20:37:56 +08:00
/*
 * Run a specific benchmark but first rename the running task's ->comm[]
 * to something meaningful:
 */
static int run_bench(const char *coll_name, const char *bench_name, bench_fn_t fn,
2017-03-27 22:47:20 +08:00
		     int argc, const char **argv)
2009-12-13 16:01:59 +08:00
{
2013-10-23 20:37:56 +08:00
	int size;
	char *name;
	int ret;

	size = strlen(coll_name) + 1 + strlen(bench_name) + 1;

	name = zalloc(size);
	BUG_ON(!name);

	scnprintf(name, size, "%s-%s", coll_name, bench_name);

	prctl(PR_SET_NAME, name);
	argv[0] = name;

2017-03-27 22:47:20 +08:00
	ret = fn(argc, argv);
2013-10-23 20:37:56 +08:00

	free(name);

	return ret;
}
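The naming step in run_bench() can be tried outside the perf tree with only libc calls. The C sketch below builds the same "<collection>-<benchmark>" string and applies it with prctl(PR_SET_NAME). It substitutes calloc() and snprintf() for perf's zalloc() and scnprintf(), and make_task_name() and set_task_name() are hypothetical helper names. Per prctl(2), the kernel keeps at most 16 bytes of the name including the terminating NUL, so longer names are silently truncated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: allocate and return "<coll_name>-<bench_name>".
 * The caller frees the result; returns NULL on allocation failure.
 */
char *make_task_name(const char *coll_name, const char *bench_name)
{
	/* Both names, the '-' separator, and the terminating NUL. */
	size_t size = strlen(coll_name) + 1 + strlen(bench_name) + 1;
	char *name = calloc(1, size);

	if (!name)
		return NULL;
	snprintf(name, size, "%s-%s", coll_name, bench_name);
	return name;
}

/* Hypothetical helper: rename the calling thread; returns 0 on success. */
int set_task_name(const char *name)
{
	return prctl(PR_SET_NAME, name);
}
```

After a successful call, tools that display the task's comm, such as `ps -o comm`, show the new name instead of the original program name.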

static void run_collection(struct collection *coll)
{
	struct bench *bench;
2009-12-13 16:01:59 +08:00
	const char *argv[2];

	argv[1] = NULL;
	/*
	 * TODO:
2013-10-23 20:37:56 +08:00
	 *
	 * Preparing preset parameters for
2009-12-13 16:01:59 +08:00
	 * embedded, ordinary PC, HPC, etc...
2013-10-23 20:37:56 +08:00
	 * would be helpful.
2009-12-13 16:01:59 +08:00
	 */
2013-10-23 20:37:56 +08:00
	for_each_bench(coll, bench) {
		if (!bench->fn)
			break;
		printf("# Running %s/%s benchmark...\n", coll->name, bench->name);
2013-01-08 17:39:26 +08:00
		fflush(stdout);
2009-12-13 16:01:59 +08:00

2013-10-23 20:37:56 +08:00
		argv[1] = bench->name;
2017-03-27 22:47:20 +08:00
		run_bench(coll->name, bench->name, bench->fn, 1, argv);
2009-12-13 16:01:59 +08:00
		printf("\n");
	}
}
2013-10-23 20:37:56 +08:00
static void run_all_collections(void)
2009-12-13 16:01:59 +08:00
{
2013-10-23 20:37:56 +08:00
	struct collection *coll;

	for_each_collection(coll)
		run_collection(coll);
2009-12-13 16:01:59 +08:00
}
2017-03-27 22:47:20 +08:00
int cmd_bench(int argc, const char **argv)
2009-11-05 08:31:34 +08:00
{
2013-10-23 20:37:56 +08:00
	struct collection *coll;
	int ret = 0;
2009-11-05 08:31:34 +08:00

	if (argc < 2) {
2013-10-23 20:37:56 +08:00
		/* No collection specified. */
2009-11-10 07:20:00 +08:00
		print_usage();
		goto end;
	}
2009-11-05 08:31:34 +08:00

2009-11-10 07:20:00 +08:00
	argc = parse_options(argc, argv, bench_options, bench_usage,
			     PARSE_OPT_STOP_AT_NON_OPTION);

	bench_format = bench_str2int(bench_format_str);
	if (bench_format == BENCH_FORMAT_UNKNOWN) {
2013-10-23 20:37:56 +08:00
		printf("Unknown format descriptor: '%s'\n", bench_format_str);
2009-11-10 07:20:00 +08:00
		goto end;
	}
2009-11-05 08:31:34 +08:00

2014-06-17 02:14:19 +08:00
	if (bench_repeat == 0) {
		printf("Invalid repeat option: Must specify a positive value\n");
		goto end;
	}

2009-11-10 07:20:00 +08:00
	if (argc < 1) {
		print_usage();
2009-11-05 08:31:34 +08:00
		goto end;
	}
2009-12-13 16:01:59 +08:00
	if (!strcmp(argv[0], "all")) {
2013-10-23 20:37:56 +08:00
		run_all_collections();
2009-12-13 16:01:59 +08:00
		goto end;
	}

2013-10-23 20:37:56 +08:00
	for_each_collection(coll) {
		struct bench *bench;

		if (strcmp(coll->name, argv[0]))
2009-11-05 08:31:34 +08:00
			continue;

2009-11-10 07:20:00 +08:00
		if (argc < 2) {
2013-10-23 20:37:56 +08:00
			/* No bench specified. */
			dump_benchmarks(coll);
2009-11-05 08:31:34 +08:00
			goto end;
		}

2009-12-13 16:01:59 +08:00
		if (!strcmp(argv[1], "all")) {
2013-10-23 20:37:56 +08:00
			run_collection(coll);
2009-12-13 16:01:59 +08:00
			goto end;
		}

2013-10-23 20:37:56 +08:00
		for_each_bench(coll, bench) {
			if (strcmp(bench->name, argv[1]))
2009-11-05 08:31:34 +08:00
				continue;

2009-11-10 23:04:00 +08:00
			if (bench_format == BENCH_FORMAT_DEFAULT)
2013-10-23 20:37:56 +08:00
				printf("# Running '%s/%s' benchmark:\n", coll->name, bench->name);
2013-01-08 17:39:26 +08:00
			fflush(stdout);
2017-03-27 22:47:20 +08:00
			ret = run_bench(coll->name, bench->name, bench->fn, argc-1, argv+1);
2009-11-05 08:31:34 +08:00
			goto end;
		}

2009-11-10 07:20:00 +08:00
		if (!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
2013-10-23 20:37:56 +08:00
			dump_benchmarks(coll);
2009-11-05 08:31:34 +08:00
			goto end;
		}

2013-10-23 20:37:56 +08:00
		printf("Unknown benchmark: '%s' for collection '%s'\n", argv[1], argv[0]);
		ret = 1;
2009-11-05 08:31:34 +08:00
		goto end;
	}

2013-10-23 20:37:56 +08:00
	printf("Unknown collection: '%s'\n", argv[0]);
	ret = 1;
2009-11-05 08:31:34 +08:00

end:
2013-10-23 20:37:56 +08:00
	return ret;
2009-11-05 08:31:34 +08:00
}