Commit Graph

20 Commits

Author SHA1 Message Date
Bruce Merry e17fdaeaec perf bench: Fix order of arguments to memcpy_alloc_mem
This was causing the destination instead of the source to be filled.  As
a result, the source was typically all mapped to one zero page, and
hence very cacheable.

Signed-off-by: Bruce Merry <bmerry@ska.ac.za>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150115092022.GA11292@kryton
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-22 23:10:56 -03:00
Rabin Vincent 1182f88311 perf bench: Fix memcpy/memset output
The memcpy and memset benchmarks return bogus results when iterations >
0 because the iterations value is not taken into account when
calculating the final result:

 $ perf bench mem memset --only-prefault --length 1GB --iterations 1
 # Running 'mem/memset' benchmark:
 # Copying 1GB Bytes ...

       20.798669 GB/Sec (with prefault)
 $ perf bench mem memset --only-prefault --length 1GB --iterations 10
 # Running 'mem/memset' benchmark:
 # Copying 1GB Bytes ...

        2.086576 GB/Sec (with prefault)
 $ perf bench mem memset --only-prefault --length 1GB --iterations 100
 # Running 'mem/memset' benchmark:
 # Copying 1GB Bytes ...

      212.840917 MB/Sec (with prefault)

Fix this.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rabin Vincent <rabin@rab.in>
Cc: Rabin Vincent <rabinv@axis.com>
Link: http://lkml.kernel.org/r/1417535441-3965-3-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-12-09 09:14:08 -03:00
Rabin Vincent 5bce1a5772 perf bench: Merge memset into memcpy
The memset benchmark is largely copy-pasted from the memcpy benchmark.
Merge the two now that memcpy is made more generic.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rabin Vincent <rabinv@axis.com>
Link: http://lkml.kernel.org/r/1417535441-3965-2-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-12-09 09:14:05 -03:00
Rabin Vincent 308197b947 perf bench: Prepare memcpy for merge
The memset benchmark is largely copy-pasted from the memcpy benchmark.
Prepare the memcpy file for merge with memset by extracting out a
generic function.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rabin Vincent <rabinv@axis.com>
Link: http://lkml.kernel.org/r/1417535441-3965-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-12-09 09:14:00 -03:00
Yann Droneaud 57480d2cd9 perf tools: Enable close-on-exec flag on perf file descriptor
In commit a21b0b354d ('perf: Introduce a flag to enable
close-on-exec in perf_event_open()'), flag PERF_FLAG_FD_CLOEXEC
was added to perf_event_open(2) syscall to allows userspace
to atomically enable close-on-exec behavor when creating
the file descriptor.

This patch makes perf tools use the new flag if supported
by the kernel, so that the event file descriptors got
automatically closed if perf tool exec a sub-command.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/1404160127-7475-1-git-send-email-ydroneaud@opteya.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2014-07-18 09:09:34 +02:00
Davidlohr Bueso 424e963488 perf bench mem: The -o and -n options are mutually exclusive
-o, --only-prefault   Show only the result with page faults before mem*
 -n, --no-prefault     Show only the result without page faults before mem*

Makes no sense to call together. Applies to both memset and memcpy.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1402942467-10671-8-git-send-email-davidlohr@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-06-19 16:13:16 -03:00
Ingo Molnar 89fe808ae7 tools/perf: Standardize feature support define names to: HAVE_{FEATURE}_SUPPORT
Standardize all the feature flags based on the HAVE_{FEATURE}_SUPPORT naming convention:

		HAVE_ARCH_X86_64_SUPPORT
		HAVE_BACKTRACE_SUPPORT
		HAVE_CPLUS_DEMANGLE_SUPPORT
		HAVE_DWARF_SUPPORT
		HAVE_ELF_GETPHDRNUM_SUPPORT
		HAVE_GTK2_SUPPORT
		HAVE_GTK_INFO_BAR_SUPPORT
		HAVE_LIBAUDIT_SUPPORT
		HAVE_LIBELF_MMAP_SUPPORT
		HAVE_LIBELF_SUPPORT
		HAVE_LIBNUMA_SUPPORT
		HAVE_LIBUNWIND_SUPPORT
		HAVE_ON_EXIT_SUPPORT
		HAVE_PERF_REGS_SUPPORT
		HAVE_SLANG_SUPPORT
		HAVE_STRLCPY_SUPPORT

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-u3zvqejddfZhtrbYbfhi3spa@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 08:48:28 +02:00
Andi Kleen a198996c7a perf bench: Fix memcpy benchmark for large sizes
The glibc calloc() function has an optimization to not explicitely
memset() very large calloc allocations that just came from mmap(),
because they are known to be zero.

This could result in the perf memcpy benchmark reading only from
the zero page, which gives unrealistic results.

Always call memset explicitly on the source area to avoid this problem.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-pzz2qrdq9eymxda0y8yxdn33@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-07-22 12:41:56 -03:00
Kirill A. Shutemov 13966721a1 perf bench: Fix memory allocation fail check in mem{set,cpy} workloads
Addresses of allocated memory areas saved to '*src' and '*dst', so we
need to check them for NULL, not 'src' and 'dst'.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1370518503-4230-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-07-08 17:35:40 -03:00
Irina Tirdea 1d037ca164 perf tools: Use __maybe_used for unused variables
perf defines both __used and __unused variables to use for marking
unused variables. The variable __used is defined to
__attribute__((__unused__)), which contradicts the kernel definition to
__attribute__((__used__)) for new gcc versions. On Android, __used is
also defined in system headers and this leads to warnings like: warning:
'__used__' attribute ignored

__unused is not defined in the kernel and is not a standard definition.
If __unused is included everywhere instead of __used, this leads to
conflicts with glibc headers, since glibc has a variables with this name
in its headers.

The best approach is to use __maybe_unused, the definition used in the
kernel for __attribute__((unused)). In this way there is only one
definition in perf sources (instead of 2 definitions that point to the
same thing: __used and __unused) and it works on both Linux and Android.
This patch simply replaces all instances of __used and __unused with
__maybe_unused.

Signed-off-by: Irina Tirdea <irina.tirdea@intel.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com
[ committer note: fixed up conflict with a116e05 in builtin-sched.c ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-11 12:19:15 -03:00
Hitoshi Mitake 17d7a1123f perf bench: Fix confused variable namings and descriptions in mem subsystem
As Namhyung Kim pointed, there are confused namings and descriptions of words
"cycle" and "clock" in mem-memset.c and mem-memcpy.c.

With the option "-c" (or "--clock", now renamed as "--cycle"), mem subsystem
measures cost of memset() and memcpy() with cpu-cycles event.

But current mem subsystem source code contains lots of confused variable
namings and descriptions with "clock" (e.g. the variable use_clock). This is a
very bad style because there is another software event named "cpu-clock". This
patch replaces wrong usage of "clock" to "cycle".

v2: modified Documentation/perf-bench.txt for the descriptions of
--cycle option

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1341236777-18457-1-git-send-email-h.mitake@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-07-02 14:35:45 -03:00
Namhyung Kim 08942f6d5d perf bench: Documentation update
The current perf-bench documentation has a couple of typos and even
lacks entire description of mem subsystem. Fix it.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340172486-17805-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-06-27 13:17:48 -03:00
Namhyung Kim d30d4a080d perf tools: Remove unnecessary ctype.h inclusion
There are unnecessary #include <ctype.h> out there, and they might cause
a nasty build failure in some environment. As we already have most of
ctype macros in util.h, just get rid of them.

A few of exceptions are util/symbol.c which needs isupper() macro util.h
doesn't provide and perl scripting support code which includes ctype.h
internally.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1327827356-8786-4-git-send-email-namhyung@gmail.com
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-01-30 18:37:35 -02:00
Jan Beulich e3e877e79b perf bench: Allow passing an iteration count to "bench mem mem{cpy,set}"
"perf stat ... perf bench mem mem..." is pretty meaningless when using
small block sizes (as the overhead of the invocation of each test run
basically hides the actual test result in the noise). Repeating the
actually interesting function's invocation a number of times allows the
results to become meaningful.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F16D767020000780006D738@nat28.tlf.novell.com
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-01-24 20:26:10 -02:00
Hitoshi Mitake 49ce8fc651 perf bench: Print both of prefaulted and no prefaulted results by default
After applying this patch, perf bench mem memcpy prints
both of prefualted and without prefaulted score of memcpy().

New options --no-prefault and --only-prefault are added
to print single result, mainly for scripting usage.

Usage example:

 | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB
 | # Running mem/memcpy benchmark...
 | # Copying 500MB Bytes ...
 |
 |      634.969014 MB/Sec
 |        4.828062 GB/Sec (with prefault)
 | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --only-prefault
 | # Running mem/memcpy benchmark...
 | # Copying 500MB Bytes ...
 |
 |        4.705192 GB/Sec (with prefault)
 | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --no-prefault
 | # Running mem/memcpy benchmark...
 | # Copying 500MB Bytes ...
 |
 |      642.725568 MB/Sec

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: h.mitake@gmail.com
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ma Ling <ling.ma@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <1290668693-27068-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26 08:15:57 +01:00
Ian Munsie c055564217 perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR()
Parsing an option from the command line with OPT_BOOLEAN on a
bool data type would not work on a big-endian machine due to the
manner in which the boolean was being cast into an int and
incremented. For example, running 'perf probe --list' on a
PowerPC machine would fail to properly set the list_events bool
and would therefore print out the usage information and
terminate.

This patch makes OPT_BOOLEAN work as expected with a bool
datatype. For cases where the original OPT_BOOLEAN was
intentionally being used to increment an int each time it was
passed in on the command line, this patch introduces OPT_INCR
with the old behaviour of OPT_BOOLEAN (the verbose variable is
currently the only such example of this).

I have reviewed every use of OPT_BOOLEAN to verify that a true
C99 bool was passed. Where integers were used, I verified that
they were only being used for boolean logic and changed them to
bools to ensure that they would not be mistakenly used as ints.
The major exception was the verbose variable which now uses
OPT_INCR instead of OPT_BOOLEAN.

Signed-off-by: Ian Munsie <imunsie@au.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport
Cc: Git development list <git@vger.kernel.org>
Cc: Ian Munsie <imunsie@au1.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: WANG Cong <amwang@redhat.com>
Cc: Thiago Farina <tfransosi@gmail.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-14 11:26:44 +02:00
Arnaldo Carvalho de Melo e206d556c5 perf tools: Move the prototypes in util/string.h to util.h
So that we avoid conflict with libc's string.h header.

Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-03 10:19:26 -03:00
Arnaldo Carvalho de Melo 364794845c perf tools: Introduce zalloc() for the common calloc(1, N) case
This way we type less characters and it looks more like the
kzalloc kernel counterpart.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1259071517-3242-3-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-24 16:37:02 +01:00
Hitoshi Mitake 12eac0bf04 perf bench: Make the mem/memcpy tests more user-friendly
mem-memcpy.c uses perf event system calls to obtain CPU clocks.
And it suddenly dies with BUG_ON() when it running on Linux
doesn't support perf event.

Also fail at calloc() can occur easily when too large
length is passed. Fail of calloc() causes sudden death
with assert().

These behaviours are not friendly. So I fixed the treating of
errors.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1258688237-3797-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
[ v2: improved a few small details ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-22 09:41:06 +01:00
Hitoshi Mitake 827f3b4974 perf bench: Add memcpy() benchmark
'perf bench mem memcpy' is a benchmark suite for measuring memcpy()
performance.

Example on a Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz:

| % perf bench mem memcpy -l 1GB
| # Running mem/memcpy benchmark...
| # Copying 1MB Bytes from 0xb7d98008 to 0xb7e99008 ...
|
|     726.216412 MB/Sec

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1258471212-30281-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
[ v2: updated changelog, clarified history of builtin-bench.c ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-19 06:21:48 +01:00