Commit Graph

7709 Commits

Author SHA1 Message Date
Ian Rogers 8586d2744f perf metrics: Don't add all tool events for sharing
Tool events are added to the set of events for parsing so that having a
tool event in a metric doesn't inhibit event sharing of events between
metrics.

All tool events were added but this meant unused tool events would be
counted. Reduce this set of tool events to just those present in the
overall metric list.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:18:48 -03:00
Ian Rogers 9aa09230f0 perf metrics: Support all tool events
Previously duration_time was hard coded, which was ok until commit
b03b89b350 ("perf stat: Add user_time and system_time events")
added additional tool events. Do for all tool events what was previously
done just for duration_time.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:16:49 -03:00
Ian Rogers 79932d161f perf evsel: Add tool event helpers
Convert to and from a string. Fix evsel__tool_name() as array is
off-by-1.  Support more than just duration_time as a metric-id.

Fixes: 75eafc970b ("perf list: Print all available tool events")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:13:00 -03:00
Ian Rogers 545a96c90f perf evsel: Constify a few arrays
Remove public definition of evsel__tool_names(). Not used outside
util/evsel.c.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:12:14 -03:00
Ian Rogers 17b3867d97 Revert "perf stat: Support metrics with hybrid events"
This reverts commit 60344f1a9a.

Hybrid metrics place a PMU at the end of the parse string. This is also
where tool events are placed. The behavior of the parse string isn't
clear and so revert the change for now.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:09:44 -03:00
Ian Rogers 0255571a16 perf cpumap: Switch to using perf_cpu_map API
Switch some raw accesses to the cpu map to using the library API. This
can help with reference count checking. Some BPF cases switch from index
to CPU for consistency, this shouldn't matter as the CPU map is full.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20220503041757.2365696-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-05 14:07:27 -03:00
Ian Rogers 570c44a01b perf stat: Avoid printing cpus with no counters
perf_evlist's user_requested_cpus can contain CPUs not present in any
evsel's cpus, for example uncore counters. Avoid printing the prefix and
trailing \n until the first valid counter is encountered.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20220503041757.2365696-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-03 11:19:17 -03:00
Yang Jihong 4d27cf1d9d perf tools: Add missing headers needed by util/data.h
'struct perf_data' in util/data.h uses the "u64" data type, which is
defined in "linux/types.h".

If we only include util/data.h, the following compilation error occurs:

  util/data.h:38:3: error: unknown type name ‘u64’
     u64    version;
     ^~~

Solution: include "linux/types.h." to add the needed type definitions.

Fixes: 258031c017 ("perf header: Add DIR_FORMAT feature to describe directory data")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220429090539.212448-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-30 12:30:16 -03:00
Arnaldo Carvalho de Melo 3297e5547b Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes from perf/urgent.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-30 12:23:24 -03:00
Namhyung Kim a5d20d42a2 perf symbol: Remove arch__symbols__fixup_end()
Now the generic code can handle kallsyms fixup properly so no need to
keep the arch-functions anymore.

Fixes: 3cf6a32f3f ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:51:40 -03:00
Namhyung Kim 8799ebce84 perf symbol: Update symbols__fixup_end()
Now arch-specific functions all do the same thing.  When it fixes the
symbol address it needs to check the boundary between the kernel image
and modules.  For the last symbol in the previous region, it cannot
know the exact size as it's discarded already.  Thus it just uses a
small page size (4096) and rounds it up like the last symbol.

Fixes: 3cf6a32f3f ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:51:33 -03:00
Namhyung Kim 838425f2de perf symbol: Pass is_kallsyms to symbols__fixup_end()
The symbol fixup is necessary for symbols in kallsyms since they don't
have size info.  So we use the next symbol's address to calculate the
size.  Now it's also used for user binaries because sometimes they miss
size for hand-written asm functions.

There's a arch-specific function to handle kallsyms differently but
currently it cannot distinguish kallsyms from others.  Pass this
information explicitly to handle it properly.  Note that those arch
functions will be moved to the generic function so I didn't added it to
the arch-functions.

Fixes: 3cf6a32f3f ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:51:20 -03:00
Timothy Hayes 7599b70a3c perf arm-spe: Fix SPE events with phys addresses
This patch corrects a bug whereby SPE collection is invoked with
pa_enable=1 but synthesized events fail to show physical addresses.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220421165205.117662-3-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:39:28 -03:00
Timothy Hayes 4e13f6706d perf arm-spe: Fix addresses of synthesized SPE events
This patch corrects a bug whereby synthesized events from SPE
samples are missing virtual addresses.

Fixes: 54f7815efe ("perf arm-spe: Fill address info for samples")
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220421165205.117662-2-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:39:14 -03:00
Adrian Hunter de8fd13843 perf intel-pt: Fix timeless decoding with perf.data directory
Intel PT does not capture data in separate directories, so do not
use separate directory processing because it doesn't work for
timeless decoding. It also looks like it doesn't support one_mmap
handling.

Example:

  Before:

    # perf record --kcore -a -e intel_pt/tsc=0/k sleep 0.1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 1.799 MB perf.data ]
    # perf script --itrace=bep | head
    #

  After:

    # perf script --itrace=bep | head
    perf 21073 [000]              psb:  psb offs: 0                       ffffffffaa68faf4 native_write_msr+0x4 ([kernel.kallsyms])
    perf 21073 [000]              cbr:  cbr: 45 freq: 4505 MHz (161%)     ffffffffaa68faf4 native_write_msr+0x4 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:                 0 [unknown] ([unknown]) => ffffffffaa68faf6 native_write_msr+0x6 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa68faf8 native_write_msr+0x8 ([kernel.kallsyms]) => ffffffffaa61aab0 pt_config_start+0x60 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa61aabd pt_config_start+0x6d ([kernel.kallsyms]) => ffffffffaa61b8ad pt_event_start+0x27d ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa61b8bb pt_event_start+0x28b ([kernel.kallsyms]) => ffffffffaa61ba60 pt_event_add+0x40 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa61ba76 pt_event_add+0x56 ([kernel.kallsyms]) => ffffffffaa880e86 event_sched_in+0xc6 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa880e9b event_sched_in+0xdb ([kernel.kallsyms]) => ffffffffaa880ea5 event_sched_in+0xe5 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa880eba event_sched_in+0xfa ([kernel.kallsyms]) => ffffffffaa880f96 event_sched_in+0x1d6 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa880fc8 event_sched_in+0x208 ([kernel.kallsyms]) => ffffffffaa880ec0 event_sched_in+0x100 ([kernel.kallsyms])

Fixes: bb6be405c4 ("perf session: Load data directory files for analysis")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220428093109.274641-1-adrian.hunter@intel.com
Cc: Ian Rogers <irogers@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: linux-kernel@vger.kernel.org
2022-04-28 10:20:52 -03:00
Arnaldo Carvalho de Melo e0c1b8f9eb Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes, such as the llvm one for ubuntu:22.04.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-24 07:50:49 -03:00
Zhengjun Xing d7e3c39708 perf stat: Support hybrid --topdown option
Since for cpu_core or cpu_atom, they have different topdown events
groups.

For cpu_core, --topdown equals to:

"{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,
  cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,
  cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,
  cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"

For cpu_atom, --topdown equals to:

"{cpu_atom/topdown-retiring/,cpu_atom/topdown-bad-spec/,
 cpu_atom/topdown-fe-bound/,cpu_atom/topdown-be-bound/}"

To simplify the implementation, on hybrid, --topdown is used
together with --cputype. If without --cputype, it uses cpu_core
topdown events by default.

  # ./perf stat --topdown -a  sleep 1
  WARNING: default to use cpu_core topdown events

   Performance counter stats for 'system wide':

              retiring      bad speculation       frontend bound        backend bound     heavy operations     light operations    branch mispredict       machine clears        fetch latency      fetch bandwidth         memory bound           Core bound
                  4.1%                 0.0%                 5.1%                90.8%                 2.3%                 1.8%                 0.0%                 0.0%                 4.2%                 0.9%                 9.9%                81.0%

         1.002624229 seconds time elapsed

  # ./perf stat --topdown -a --cputype atom  sleep 1

   Performance counter stats for 'system wide':

              retiring      bad speculation       frontend bound        backend bound
                 13.5%                 0.1%                31.2%                55.2%

         1.002366987 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-3-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-24 07:50:18 -03:00
Guilherme Amadio d22588d73b perf clang: Fix header include for LLVM >= 14
The header TargetRegistry.h has moved in LLVM/clang 14.

Committer notes:

The problem as noticed when building in ubuntu:22.04:

    90    98.61 ubuntu:22.04                  : FAIL gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
      util/c++/clang.cpp:23:10: fatal error: llvm/Support/TargetRegistry.h: No such file or directory
         23 | #include "llvm/Support/TargetRegistry.h"
            |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      compilation terminated.

Fixed after applying this patch.

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Guilherme Amadio <amadio@gentoo.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://twitter.com/GuilhermeAmadio/status/1514970524232921088
Link: http://lore.kernel.org/lkml/Ylp0M/VYgHOxtcnF@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 18:39:34 -03:00
Zhengjun Xing 2c8e64514a perf stat: Merge event counts from all hybrid PMUs
For hybrid events, by default stat aggregates and reports the event counts
per pmu.

  # ./perf stat -e cycles -a  sleep 1

   Performance counter stats for 'system wide':

      14,066,877,268      cpu_core/cycles/
       6,814,443,147      cpu_atom/cycles/

         1.002760625 seconds time elapsed

Sometimes, it's also useful to aggregate event counts from all PMUs.
Create a new option '--hybrid-merge' to enable that behavior and report
the counts without PMUs.

  # ./perf stat -e cycles -a --hybrid-merge  sleep 1

   Performance counter stats for 'system wide':

      20,732,982,512      cycles

         1.002776793 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 14:23:35 -03:00
Zhengjun Xing 60344f1a9a perf stat: Support metrics with hybrid events
One metric such as 'Kernel_Utilization' may be from different PMUs and
consists of different events.

For core,
Kernel_Utilization = cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread

For atom,
Kernel_Utilization = cpu_clk_unhalted.core:k / cpu_clk_unhalted.core

The metric group string for core is:
'{cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.thread_p/metric-id=cpu_clk_unhalted.thread_p:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W#cpu_core'

The metric group string for atom is:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W#cpu_atom'

That means the group "{cpu_clk_unhalted.thread:k,cpu_clk_unhalted.thread}:W"
is from cpu_core PMU and the group "{cpu_clk_unhalted.core:k,cpu_clk_unhalted.core}"
is from cpu_atom PMU. And then next, check if the events in the group are
valid on that PMU. If one event is not valid on that PMU, the associated
group would be removed internally.

In this example, cpu_clk_unhalted.thread is valid on cpu_core and
cpu_clk_unhalted.core is valid on cpu_atom. So the checks for these two
groups are passed.

Before:

  # ./perf stat -M Kernel_Utilization -a sleep 1
WARNING: events in group from different hybrid PMUs!
WARNING: grouped events cpus do not match, disabling group:
  anon group { CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD, CPU_CLK_UNHALTED.THREAD }

 Performance counter stats for 'system wide':

        17,639,501      cpu_atom/CPU_CLK_UNHALTED.CORE/ #     1.00 Kernel_Utilization
        17,578,757      cpu_atom/CPU_CLK_UNHALTED.CORE:k/
     1,005,350,226 ns   duration_time
        43,012,352      cpu_core/CPU_CLK_UNHALTED.THREAD_P:k/ #     0.99 Kernel_Utilization
        17,608,010      cpu_atom/CPU_CLK_UNHALTED.THREAD_P:k/
        43,608,755      cpu_core/CPU_CLK_UNHALTED.THREAD/
        17,630,838      cpu_atom/CPU_CLK_UNHALTED.THREAD/
     1,005,350,226 ns   duration_time

       1.005350226 seconds time elapsed

After:

  # ./perf stat -M Kernel_Utilization -a sleep 1

 Performance counter stats for 'system wide':

        17,981,895      CPU_CLK_UNHALTED.CORE [cpu_atom] #     1.00 Kernel_Utilization
        17,925,405      CPU_CLK_UNHALTED.CORE:k [cpu_atom]
     1,004,811,366 ns   duration_time
        41,246,425      CPU_CLK_UNHALTED.THREAD_P:k [cpu_core] #     0.99 Kernel_Utilization
        41,819,129      CPU_CLK_UNHALTED.THREAD [cpu_core]
     1,004,811,366 ns   duration_time

       1.004811366 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 14:23:17 -03:00
Jiri Olsa 3a7ab60597 perf tools: Move libbpf init in libbpf_init function
Move the libbpf init code into a single function, so that we have a single
place doing that.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220422100025.1469207-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 14:02:15 -03:00
Florian Fischer 75eafc970b perf list: Print all available tool events
Introduce names for the new tool events 'user_time' and 'system_time'.

  $ perf list
  ...
  duration_time                                      [Tool event]
  user_time                                          [Tool event]
  system_time                                        [Tool event]
  ...

Committer testing:

Before:

  $ perf list | grep Tool
  duration_time                                      [Tool event]
  $

After:

  $ perf list | grep Tool
    duration_time                                    [Tool event]
    user_time                                        [Tool event]
    system_time                                      [Tool event]
  $

Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: http://lore.kernel.org/lkml/20220420174244.1741958-2-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 15:05:00 -03:00
Florian Fischer b03b89b350 perf stat: Add user_time and system_time events
It bothered me that during benchmarking using 'perf stat' (to collect
for example CPU cache events) I could not simultaneously retrieve the
times spend in user or kernel mode in a machine readable format.

When running 'perf stat' the output for humans contains the times
reported by rusage and wait4.

  $ perf stat -e cache-misses:u -- true

   Performance counter stats for 'true':

             4,206      cache-misses:u

       0.001113619 seconds time elapsed

       0.001175000 seconds user
       0.000000000 seconds sys

But 'perf stat's machine-readable format does not provide this information.

  $ perf stat -x, -e cache-misses:u -- true
  4282,,cache-misses:u,492859,100.00,,

I found no way to retrieve this information using the available events
while using machine-readable output.

This patch adds two new tool internal events 'user_time' and
'system_time', similarly to the already present 'duration_time' event.

Both events use the already collected rusage information obtained by
wait4 and tracked in the global ru_stats.

Examples presenting cache-misses and rusage information in both human
and machine-readable form:

  $ perf stat -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time .

   Performance counter stats for 'grep -q -r duration_time .':

        67,422,542 ns   duration_time:u
        50,517,000 ns   user_time:u
        16,839,000 ns   system_time:u
            30,937      cache-misses:u

       0.067422542 seconds time elapsed

       0.050517000 seconds user
       0.016839000 seconds sys

  $ perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time .
  72134524,ns,duration_time:u,72134524,100.00,,
  65225000,ns,user_time:u,65225000,100.00,,
  6865000,ns,system_time:u,6865000,100.00,,
  38705,,cache-misses:u,71189328,100.00,,

Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220420102354.468173-3-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 13:44:56 -03:00
Florian Fischer c735b0a521 perf stat: Introduce stats for the user and system rusage times
This is preparation for exporting rusage values as tool events.

Add new global stats tracking the values obtained via rusage.

For now only ru_utime and ru_stime are part of the tracked stats.

Both are stored as nanoseconds to be consistent with 'duration_time',
although the finest resolution the struct timeval data in rusage
provides are microseconds.

Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220420102354.468173-2-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 13:38:41 -03:00
Martin Liška c60664dea7 perf tools: Print warning when HAVE_DEBUGINFOD_SUPPORT is not set and user tries to use debuginfod support
When one requests debuginfod, either via --debuginfod option, or with a
perf-config value, complain when perf is not built with it.

Signed-off-by: Martin Liška <mliska@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/35bae747-3951-dc3d-a66b-abf4cebcd9cb@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 13:36:36 -03:00
Leo Yan fdefc3750e perf mem: Print memory operation type
The memory operation types are not only for load and store, for easier
reviewing the memory operation type, this patch prints out it.

Before:
  ls 14753 [011]  3678.072400:  1    l1d-miss:  88000182 L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1  l1d-access:  88000182 L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1  tlb-access:  88000182 L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1      memory:  88000182 L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])

After:

  ls 14753 [011]  3678.072400:  1    l1d-miss:  88000182 |OP LOAD|LVL L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1  l1d-access:  88000182 |OP LOAD|LVL L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1  tlb-access:  88000182 |OP LOAD|LVL L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1      memory:  88000182 |OP LOAD|LVL L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220417124524.901148-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 11:44:06 -03:00
Athira Rajeev 8cb7a188ac perf bench: Fix numa testcase to check if CPU used to bind task is online
Perf numa bench test fails with error:

Testcase:

  ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp  1 --no-data_rand_walk

Failure snippet:

<<>>
  Running 'numa/mem' benchmark:

  # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"

  perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
<<>>

The Testcases uses CPU's 0 and 8. In function "parse_setup_cpu_list",
There is check to see if cpu number is greater than max cpu's possible
in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus || bind_cpu_1 >=
g->p.nr_cpus) {".

But it could happen that system has say 48 CPU's, but only number of
online CPU's is 0-7. Other CPU's are offlined. Since "g->p.nr_cpus" is
48, so function will go ahead and set bit for CPU 8 also in cpumask (
td->bind_cpumask).

bind_to_cpumask function is called to set affinity using
sched_setaffinity and the cpumask. Since the CPU8 is not present, set
affinity will fail here with EINVAL.

Fix this issue by adding a check to make sure that, CPU's provided in
the input argument values are online before proceeding further and skip
the test. For this, include new helper function "is_cpu_online" in
"tools/perf/util/header.c".

Since "BIT(x)" definition will get included from header.h, remove
that from bench/numa.c

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220412164059.42654-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-14 09:13:41 -03:00
Lv Ruyi d73f5d14e0 perf stat: Fix error check return value of hashmap__new(), must use IS_ERR()
hashmap__new() returns ERR_PTR(-ENOMEM) when it fails, so we should use
IS_ERR() to check it in error handling path.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220413093302.2538128-1-lv.ruyi@zte.com.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-13 22:20:15 -03:00
Adrian Hunter f034fc50d3 perf tools: Fix misleading add event PMU debug message
Fix incorrect debug message:

   Attempting to add event pmu 'intel_pt' with '' that may result in
   non-fatal errors

which always appears with perf record -vv and intel_pt e.g.

    perf record -vv -e intel_pt//u uname

The message is incorrect because there will never be non-fatal errors.

Suppress the message if the PMU is 'selectable' i.e. meant to be
selected directly as an event.

Fixes: 4ac22b484d ("perf parse-events: Make add PMU verbose output clearer")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20220411061758.2458417-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-13 07:00:31 -03:00
Carsten Haitzler 41204da4c1 perf test: Shell - Limit to only run executable scripts in tests
'perf test''s shell runner will just run everything in the tests
directory (as long as it's not another directory or does not begin
with a dot), but sometimes you find files in there that are not shell
scripts - perf.data output for example if you do some testing and then
the next time you run perf test it tries to run these.

Check the files are executable so they are actually intended to be test
scripts and not just some "random junk" files there.

Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Link: http://lore.kernel.org/lkml/20220309122859.31487-1-carsten.haitzler@foss.arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-11 16:39:49 -03:00
Eelco Chaudron ae24e9b53d perf scripting python: Expose symbol offset and source information
This change adds the symbol offset to the data exported for each
call-chain entry. This can not be calculated from the script and
only the ip value, and no related mmap information.

In addition, also export the source file and line information, if
available, to avoid an external lookup if this information is needed.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/164554263724.752731.14651017093796049736.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-11 16:39:49 -03:00
Eric Lin 335f70faa2 perf jitdump: Add riscv64 support
This patch enables perf jitdump for riscv64 and was tested with V8 on
qemu rv64.

Qemu rv64:

  $ perf record -e cpu-clock -c 1000 -g -k mono ./d8_rv64 --perf-prof --no-write-protect-code-memory test.js
  $ perf inject -j -i perf.data -o perf.data.jitted
  $ perf report -i perf.data.jitted

Output:

  To display the perf.data header info, please use --header/--header-only options.

  Total Lost Samples: 0

  Samples: 87K of event 'cpu-clock'
  Event count (approx.): 87974000

  Children  Self   Command   Shared Object      Symbol

  ....
   0.28%    0.06%  d8_rv64   d8_rv64            [.] _ZN2v88internal6WasmJs7InstallEPNS0_7IsolateEb
   0.28%    0.00%  d8_rv64   d8_rv64            [.] _ZN2v88internal10ParserBaseINS0_6ParserEE22ParseLogicalExpressionEv
   0.28%    0.03%  d8_rv64   jitted-112-76.so   [.] Builtin:InterpreterEntryTrampoline
   0.12%    0.00%  d8_rv64   d8_rv64            [.] _ZN2v88internal19ContextDeserializer11DeserializeEPNS0_7IsolateENS0_6HandleINS0_13JSGlobalProxyEEENS_33DeserializeInternalFieldsCallbackE
   0.12%    0.01%  d8_rv64   jitted-112-651.so  [.] Builtin:CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit
  ....

Signed-off-by: Eric Lin <eric.lin@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: greentime.hu@sifive.com
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20220406142606.18464-2-eric.lin@sifive.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-11 16:37:26 -03:00
Ian Rogers 940a445a90 perf annotate: Drop objdump stderr to avoid getting stuck waiting for stdout output
If objdump writes to stderr it can block waiting for it to be read. As
perf doesn't read stderr then progress stops with perf waiting for
stdout output.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lexi Shao <shaolexi@huawei.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20220407230503.1265036-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 14:21:00 -03:00
Denis Nikitin bc21e74d47 perf session: Remap buf if there is no space for event
If a perf event doesn't fit into remaining buffer space return NULL to
remap buf and fetch the event again.

Keep the logic to error out on inadequate input from fuzzing.

This fixes perf failing on ChromeOS (with 32b userspace):

  $ perf report -v -i perf.data
  ...
  prefetch_event: head=0x1fffff8 event->header_size=0x30, mmap_size=0x2000000: fuzzed or compressed perf.data?
  Error:
  failed to process sample

Fixes: 57fc032ad6 ("perf session: Avoid infinite loop when seeing invalid header.size")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Denis Nikitin <denik@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220330031130.2152327-1-denik@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 14:20:59 -03:00
James Clark fa7095c5c3 perf unwind: Don't show unwind error messages when augmenting frame pointer stack
Commit Fixes: b9f6fbb3b2 ("perf arm64: Inject missing frames when
using 'perf record --call-graph=fp'") intended to add a 'best effort'
DWARF unwind that improved the frame pointer stack in most scenarios.

It's expected that the unwind will fail sometimes, but this shouldn't be
reported as an error. It only works when the return address can be
determined from the contents of the link register alone.

Fix the error shown when the unwinder requires extra registers by adding
a new flag that suppresses error messages. This flag is not set in the
normal --call-graph=dwarf unwind mode so that behavior is not changed.

Fixes: b9f6fbb3b2 ("perf arm64: Inject missing frames when using 'perf record --call-graph=fp'")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220406145651.1392529-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Arnaldo Carvalho de Melo 3a8a047586 perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13
Using -ffat-lto-objects in the python feature test when building with
clang-13 results in:

  clang-13: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument]
  error: command '/usr/sbin/clang' failed with exit code 1
  cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory
  make[2]: *** [Makefile.perf:639: /tmp/build/perf/python/perf.so] Error 1

Noticed when building on a docker.io/library/archlinux:base container.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Arnaldo Carvalho de Melo dd6e1fe91c perf python: Fix probing for some clang command line options
The clang compiler complains about some options even without a source
file being available, while others require one, so use the simple
tools/build/feature/test-hello.c file.

Then check for the "is not supported" string in its output, in addition
to the "unknown argument" already being looked for.

This was noticed when building with clang-13 where -ffat-lto-objects
isn't supported and since we were looking just for "unknown argument"
and not providing a source code to clang, was mistakenly assumed as
being available and not being filtered to set of command line options
provided to clang, leading to a build failure.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Haowen Bai f717d89a2b perf evlist: Directly return instead of using local ret variable
Addresses this coccinelle warning:

  ./tools/perf/util/evlist.c:1333:5-8: Unneeded variable: "err". Return
  "- ENOMEM" on line 1358

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/1648432532-23151-1-git-send-email-baihaowen@meizu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:35 -03:00
Ian Rogers 0df6ade711 perf evlist: Rename cpus to user_requested_cpus
evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
of all evsels.

For non-task targets, cpus is set to be cpus requested from the command
line, defaulting to all online cpus if no cpus are specified.

For an uncore event, all_cpus may be just CPU 0 or every online CPU.

This causes all_cpus to have fewer values than the cpus variable which
is confusing given the 'all' in the name.

To try to make the behavior clearer, rename cpus to user_requested_cpus
and add comments on the two struct variables.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:35 -03:00
Arnaldo Carvalho de Melo 4d4d00dd32 perf tools: Update copy of libbpf's hashmap.c
To pick the changes in:

  fba60b171a ("libbpf: Use IS_ERR_OR_NULL() in hashmap__free()")

That don't entail any changes in tools/perf.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
  diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h

Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.

Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mauricio Vásquez <mauricio@kinvolk.io>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkMb2SAIai2VeuUD@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:34 -03:00
Linus Torvalds 7b58b82b86 perf tools changes for v5.18: 1st batch
New features:
 
 perf ftrace:
 
 - Add -n/--use-nsec option to the 'latency' subcommand.
 
   Default: usecs:
 
   $ sudo perf ftrace latency -T dput -a sleep 1
   #   DURATION     |      COUNT | GRAPH                          |
        0 - 1    us |    2098375 | #############################  |
        1 - 2    us |         61 |                                |
        2 - 4    us |         33 |                                |
        4 - 8    us |         13 |                                |
        8 - 16   us |        124 |                                |
       16 - 32   us |        123 |                                |
       32 - 64   us |          1 |                                |
       64 - 128  us |          0 |                                |
      128 - 256  us |          1 |                                |
      256 - 512  us |          0 |                                |
 
   Better granularity with nsec:
 
   $ sudo perf ftrace latency -T dput -a -n sleep 1
   #   DURATION     |      COUNT | GRAPH                          |
        0 - 1    us |          0 |                                |
        1 - 2    ns |          0 |                                |
        2 - 4    ns |          0 |                                |
        4 - 8    ns |          0 |                                |
        8 - 16   ns |          0 |                                |
       16 - 32   ns |          0 |                                |
       32 - 64   ns |          0 |                                |
       64 - 128  ns |    1163434 | ##############                 |
      128 - 256  ns |     914102 | #############                  |
      256 - 512  ns |        884 |                                |
      512 - 1024 ns |        613 |                                |
        1 - 2    us |         31 |                                |
        2 - 4    us |         17 |                                |
        4 - 8    us |          7 |                                |
        8 - 16   us |        123 |                                |
       16 - 32   us |         83 |                                |
 
 perf lock:
 
 - Add -c/--combine-locks option to merge lock instances in the same class into
   a single entry.
 
   # perf lock report -c
                  Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns)
 
         rcu_read_lock   251225         0            0              0            0            0
    hrtimer_bases.lock    39450         0            0              0            0            0
   &sb->s_type->i_l...    10301         1          662            662          662          662
      ptlock_ptr(page)    10173         2          701           1402          760          642
   &(ei->i_block_re...     8732         0            0              0            0            0
          &xa->xa_lock     8088         0            0              0            0            0
           &base->lock     6705         0            0              0            0            0
           &p->pi_lock     5549         0            0              0            0            0
   &dentry->d_lockr...     5010         4         1274           5097         1844          789
             &ep->lock     3958         0            0              0            0            0
 
 - Add -F/--field option to customize the list of fields to output:
 
   $ perf lock report -F contended,wait_max -k avg_wait
                   Name contended max wait(ns) avg wait(ns)
 
         slock-AF_INET6         1        23543        23543
      &lruvec->lru_lock         5        18317        11254
         slock-AF_INET6         1        10379        10379
             rcu_node_1         1         2104         2104
    &dentry->d_lockr...         1         1844         1844
    &dentry->d_lockr...         1         1672         1672
       &newf->file_lock        15         2279         1025
    &dentry->d_lockr...         1          792          792
 
 - Add --synth=no option for record, as there is no need to symbolize,
   lock names comes from the tracepoints.
 
 perf record:
 
 - Threaded recording, opt-in, via the new --threads command line option.
 
 - Improve AMD IBS (Instruction-Based Sampling) error handling messages.
 
 perf script:
 
 - Add 'brstackinsnlen' field (use it with -F) for branch stacks.
 
 - Output branch sample type in 'perf script'.
 
 perf report:
 
 - Add "addr_from" and "addr_to" sort dimensions.
 
 - Print branch stack entry type in 'perf report --dump-raw-trace'
 
 - Fix symbolization for chrooted workloads.
 
 Hardware tracing:
 
 Intel PT:
 
 - Add CFE (Control Flow Event) and EVD (Event Data) packets support.
 
 - Add MODE.Exec IFLAG bit support.
 
 Explanation about these features from the "Intel® 64 and IA-32 architectures
 software developer’s manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C,
 3D, and 4" PDF at:
 
   https://cdrdv2.intel.com/v1/dl/getContent/671200
 
 At page 3951:
 
 <quote>
 32.2.4
 
 Event Trace is a capability that exposes details about the asynchronous
 events, when they are generated, and when their corresponding software
 event handler completes execution. These include:
 
 o Interrupts, including NMI and SMI, including the interrupt vector when
 defined.
 
 o Faults, exceptions including the fault vector.
 
 — Page faults additionally include the page fault address, when in context.
 
 o Event handler returns, including IRET and RSM.
 
 o VM exits and VM entries.¹
 
 — VM exits include the values written to the “exit reason” and “exit qualification” VMCS fields.
 INIT and SIPI events.
 
 o TSX aborts, including the abort status returned for the RTM instructions.
 
 o Shutdown.
 
 Additionally, it provides indication of the status of the Interrupt Flag
 (IF), to indicate when interrupts are masked.
 </quote>
 
 ARM CoreSight:
 
 - Use advertised caps/min_interval as default sample_period on ARM spe.
 
 - Update deduction of TRCCONFIGR register for branch broadcast on ARM's CoreSight ETM.
 
 Vendor Events (JSON):
 
 Intel:
 
 - Update events and metrics for:
 
     Alderlake, Broadwell, Broadwell DE, BroadwellX, CascadelakeX, Elkhartlake,
     Bonnell, Goldmont, GoldmontPlus, Westmere EP-DP, Haswell, HaswellX,
     Icelake, IcelakeX, Ivybridge, Ivytown, Jaketown, Knights Landing,
     Nehalem EP, Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX,
     Tigerlake, TremontX, Westmere EP-SP, Westmere EX.
 
 ARM:
 
 - Add support for HiSilicon CPA PMU aliasing.
 
 perf stat:
 
 - Fix forked applications enablement of counters.
 
 - The 'slots' should only be printed on a different order than the one specified
   on the command line when 'topdown' events are present, fix it.
 
 Miscellaneous:
 
 - Sync msr-index, cpufeatures header files with the kernel sources.
 
 - Stop using some deprecated libbpf APIs in 'perf trace'.
 
 - Fix some spelling mistakes.
 
 - Refactor the maps pointers usage to pave the way for using refcount debugging.
 
 - Only offer the --tui option on perf top, report and annotate when perf was
   built with libslang.
 
 - Don't mention --to-ctf in 'perf data --help' when not linking with the required
   library, libbabeltrace.
 
 - Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci.
 
 - Enhance the matching of sub-commands abbreviations:
 	'perf c2c rec' -> 'perf c2c record'
 	'perf c2c recport -> error
 
 - Set build-id using build-id header on new mmap records.
 
 - Fix generation of 'perf --version' string.
 
 perf test:
 
 - Add test for the arm_spe event.
 
 - Add test to check unwinding using fame-pointer (fp) mode on arm64.
 
 - Make metric testing more robust in 'perf test'.
 
 - Add error message for unsupported branch stack cases.
 
 libperf:
 
 - Add API for allocating new thread map array.
 
 - Fix typo in perf_evlist__open() failure error messages in libperf tests.
 
 perf c2c:
 
 - Replace bitmap_weight() with bitmap_empty() where appropriate.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYj8viwAKCRCyPKLppCJ+
 J8K3AQDpN45P4/TWJxVWhZlvYzJtWDSboXHZJfmBiEd4Xu2zbwD7BFW02f1ATHPr
 dGBFXxRQQufBIqfE+OQXG59Awp1m8wE=
 =1l8S
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "New features:

  perf ftrace:

   - Add -n/--use-nsec option to the 'latency' subcommand.

     Default: usecs:

     $ sudo perf ftrace latency -T dput -a sleep 1
     #   DURATION     |      COUNT | GRAPH                          |
          0 - 1    us |    2098375 | #############################  |
          1 - 2    us |         61 |                                |
          2 - 4    us |         33 |                                |
          4 - 8    us |         13 |                                |
          8 - 16   us |        124 |                                |
         16 - 32   us |        123 |                                |
         32 - 64   us |          1 |                                |
         64 - 128  us |          0 |                                |
        128 - 256  us |          1 |                                |
        256 - 512  us |          0 |                                |

     Better granularity with nsec:

     $ sudo perf ftrace latency -T dput -a -n sleep 1
     #   DURATION     |      COUNT | GRAPH                          |
          0 - 1    us |          0 |                                |
          1 - 2    ns |          0 |                                |
          2 - 4    ns |          0 |                                |
          4 - 8    ns |          0 |                                |
          8 - 16   ns |          0 |                                |
         16 - 32   ns |          0 |                                |
         32 - 64   ns |          0 |                                |
         64 - 128  ns |    1163434 | ##############                 |
        128 - 256  ns |     914102 | #############                  |
        256 - 512  ns |        884 |                                |
        512 - 1024 ns |        613 |                                |
          1 - 2    us |         31 |                                |
          2 - 4    us |         17 |                                |
          4 - 8    us |          7 |                                |
          8 - 16   us |        123 |                                |
         16 - 32   us |         83 |                                |

  perf lock:

   - Add -c/--combine-locks option to merge lock instances in the same
     class into a single entry.

     # perf lock report -c
                    Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns)

           rcu_read_lock   251225         0            0              0            0            0
      hrtimer_bases.lock    39450         0            0              0            0            0
     &sb->s_type->i_l...    10301         1          662            662          662          662
        ptlock_ptr(page)    10173         2          701           1402          760          642
     &(ei->i_block_re...     8732         0            0              0            0            0
            &xa->xa_lock     8088         0            0              0            0            0
             &base->lock     6705         0            0              0            0            0
             &p->pi_lock     5549         0            0              0            0            0
     &dentry->d_lockr...     5010         4         1274           5097         1844          789
               &ep->lock     3958         0            0              0            0            0

      - Add -F/--field option to customize the list of fields to output:

     $ perf lock report -F contended,wait_max -k avg_wait
                     Name contended max wait(ns) avg wait(ns)

           slock-AF_INET6         1        23543        23543
        &lruvec->lru_lock         5        18317        11254
           slock-AF_INET6         1        10379        10379
               rcu_node_1         1         2104         2104
      &dentry->d_lockr...         1         1844         1844
      &dentry->d_lockr...         1         1672         1672
         &newf->file_lock        15         2279         1025
      &dentry->d_lockr...         1          792          792

   - Add --synth=no option for record, as there is no need to symbolize,
     lock names comes from the tracepoints.

  perf record:

   - Threaded recording, opt-in, via the new --threads command line
     option.

   - Improve AMD IBS (Instruction-Based Sampling) error handling
     messages.

  perf script:

   - Add 'brstackinsnlen' field (use it with -F) for branch stacks.

   - Output branch sample type in 'perf script'.

  perf report:

   - Add "addr_from" and "addr_to" sort dimensions.

   - Print branch stack entry type in 'perf report --dump-raw-trace'

   - Fix symbolization for chrooted workloads.

  Hardware tracing:

  Intel PT:

   - Add CFE (Control Flow Event) and EVD (Event Data) packets support.

   - Add MODE.Exec IFLAG bit support.

     Explanation about these features from the "Intel® 64 and IA-32
     architectures software developer’s manual combined volumes: 1, 2A,
     2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at:

        https://cdrdv2.intel.com/v1/dl/getContent/671200

     At page 3951:
      "32.2.4

       Event Trace is a capability that exposes details about the
       asynchronous events, when they are generated, and when their
       corresponding software event handler completes execution. These
       include:

        o Interrupts, including NMI and SMI, including the interrupt
          vector when defined.

        o Faults, exceptions including the fault vector.

           - Page faults additionally include the page fault address,
             when in context.

        o Event handler returns, including IRET and RSM.

        o VM exits and VM entries.¹

           - VM exits include the values written to the “exit reason”
             and “exit qualification” VMCS fields. INIT and SIPI events.

        o TSX aborts, including the abort status returned for the RTM
          instructions.

        o Shutdown.

       Additionally, it provides indication of the status of the
       Interrupt Flag (IF), to indicate when interrupts are masked"

  ARM CoreSight:

   - Use advertised caps/min_interval as default sample_period on ARM
     spe.

   - Update deduction of TRCCONFIGR register for branch broadcast on
     ARM's CoreSight ETM.

  Vendor Events (JSON):

  Intel:

   - Update events and metrics for: Alderlake, Broadwell, Broadwell DE,
     BroadwellX, CascadelakeX, Elkhartlake, Bonnell, Goldmont,
     GoldmontPlus, Westmere EP-DP, Haswell, HaswellX, Icelake, IcelakeX,
     Ivybridge, Ivytown, Jaketown, Knights Landing, Nehalem EP,
     Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX,
     Tigerlake, TremontX, Westmere EP-SP, and Westmere EX.

  ARM:

   - Add support for HiSilicon CPA PMU aliasing.

  perf stat:

   - Fix forked applications enablement of counters.

   - The 'slots' should only be printed on a different order than the
     one specified on the command line when 'topdown' events are
     present, fix it.

  Miscellaneous:

   - Sync msr-index, cpufeatures header files with the kernel sources.

   - Stop using some deprecated libbpf APIs in 'perf trace'.

   - Fix some spelling mistakes.

   - Refactor the maps pointers usage to pave the way for using refcount
     debugging.

   - Only offer the --tui option on perf top, report and annotate when
     perf was built with libslang.

   - Don't mention --to-ctf in 'perf data --help' when not linking with
     the required library, libbabeltrace.

   - Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by
     array_size.cocci.

   - Enhance the matching of sub-commands abbreviations:
	'perf c2c rec' -> 'perf c2c record'
	'perf c2c recport -> error

   - Set build-id using build-id header on new mmap records.

   - Fix generation of 'perf --version' string.

  perf test:

   - Add test for the arm_spe event.

   - Add test to check unwinding using fame-pointer (fp) mode on arm64.

   - Make metric testing more robust in 'perf test'.

   - Add error message for unsupported branch stack cases.

  libperf:

   - Add API for allocating new thread map array.

   - Fix typo in perf_evlist__open() failure error messages in libperf
     tests.

  perf c2c:

   - Replace bitmap_weight() with bitmap_empty() where appropriate"

* tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (143 commits)
  perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages
  perf python: Add perf_env stubs that will be needed in evsel__open_strerror()
  perf tools: Enhance the matching of sub-commands abbreviations
  libperf tests: Fix typo in perf_evlist__open() failure error messages
  tools arm64: Import cputype.h
  perf lock: Add -F/--field option to control output
  perf lock: Extend struct lock_key to have print function
  perf lock: Add --synth=no option for record
  tools headers cpufeatures: Sync with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  perf stat: Fix forked applications enablement of counters
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf evsel: Make evsel__env() always return a valid env
  perf build-id: Fix spelling mistake "Cant" -> "Can't"
  perf header: Fix spelling mistake "could't" -> "couldn't"
  perf script: Add 'brstackinsnlen' for branch stacks
  perf parse-events: Move slots only with topdown
  perf ftrace latency: Update documentation
  perf ftrace latency: Add -n/--use-nsec option
  perf tools: Fix version kernel tag
  ...
2022-03-27 13:42:32 -07:00
Kim Phillips ab0809af0b perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages
Improve the error message returned on failed perf_event_open() on AMD
systems when using IBS (Instruction-Based Sampling).

Output of executing 'perf record -e ibs_op// true' as a non root user
BEFORE this patch (perf will add the 'u' modifier at the end to exclude
kernel/hypervisor sampling):

  The sys_perf_event_open() syscall returned with 22 (Invalid argument)for event (ibs_op//u).
  /bin/dmesg | grep -i perf may provide additional information.

Output after:

  AMD IBS can't exclude kernel events.  Try running at a higher privilege level.

Output of executing 'sudo perf record -e ibs_op// true' BEFORE this patch:

  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (ibs_op//).
  /bin/dmesg | grep -i perf may provide additional information.

Output after:

  Error:
  Invalid event (ibs_op//) in per-thread mode, enable system wide with '-a'.

Folowing the suggestion:

  $ sudo perf record -a -e ibs_op// true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.664 MB perf.data (194 samples) ]
  $

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: João Martins <joao.m.martins@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20220322221517.2510440-12-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26 10:55:58 -03:00
Arnaldo Carvalho de Melo b58230de3c perf python: Add perf_env stubs that will be needed in evsel__open_strerror()
The AMD IBS error message enhancements will use these, but we're not
using evsel__open_strerror() in the python binding so far.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26 10:55:57 -03:00
Linus Torvalds 169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Kim Phillips 7b830875d2 perf evsel: Make evsel__env() always return a valid env
It's possible to have an evsel and evsel->evlist populated without
an evsel->evlist->env, when, e.g., cmd_record is in its error path.

Future patches will add support for evsel__open_strerror to be able
to customize error messaging based on perf_env__{arch,cpuid}, so
let's have evsel__env return &perf_env instead of NULL in that case.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211004214114.188477-1-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-22 18:36:02 -03:00
Colin Ian King 011899cc00 perf build-id: Fix spelling mistake "Cant" -> "Can't"
There is a spelling mistake in a pr_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20220316232452.53062-1-colin.i.king@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-22 18:27:56 -03:00
Colin Ian King ccbc9df9ae perf header: Fix spelling mistake "could't" -> "couldn't"
There is a spelling mistake in a pr_debug2 message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20220316232212.52820-1-colin.i.king@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-22 18:27:13 -03:00
Arnaldo Carvalho de Melo 34fe4ccb77 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes that went thru perf/urgent and now are fixed by an
upcoming patch.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-22 17:52:10 -03:00
Namhyung Kim 84005bb614 perf ftrace latency: Add -n/--use-nsec option
Sometimes we want to see nano-second granularity.

  $ sudo perf ftrace latency -T dput -a sleep 1
  #   DURATION     |      COUNT | GRAPH                          |
       0 - 1    us |    2098375 | #############################  |
       1 - 2    us |         61 |                                |
       2 - 4    us |         33 |                                |
       4 - 8    us |         13 |                                |
       8 - 16   us |        124 |                                |
      16 - 32   us |        123 |                                |
      32 - 64   us |          1 |                                |
      64 - 128  us |          0 |                                |
     128 - 256  us |          1 |                                |
     256 - 512  us |          0 |                                |
     512 - 1024 us |          0 |                                |
       1 - 2    ms |          0 |                                |
       2 - 4    ms |          0 |                                |
       4 - 8    ms |          0 |                                |
       8 - 16   ms |          0 |                                |
      16 - 32   ms |          0 |                                |
      32 - 64   ms |          0 |                                |
      64 - 128  ms |          0 |                                |
     128 - 256  ms |          0 |                                |
     256 - 512  ms |          0 |                                |
     512 - 1024 ms |          0 |                                |
       1 - ...   s |          0 |                                |

  $ sudo perf ftrace latency -T dput -a -n sleep 1
  #   DURATION     |      COUNT | GRAPH                          |
       0 - 1    us |          0 |                                |
       1 - 2    ns |          0 |                                |
       2 - 4    ns |          0 |                                |
       4 - 8    ns |          0 |                                |
       8 - 16   ns |          0 |                                |
      16 - 32   ns |          0 |                                |
      32 - 64   ns |          0 |                                |
      64 - 128  ns |    1163434 | ##############                 |
     128 - 256  ns |     914102 | #############                  |
     256 - 512  ns |        884 |                                |
     512 - 1024 ns |        613 |                                |
       1 - 2    us |         31 |                                |
       2 - 4    us |         17 |                                |
       4 - 8    us |          7 |                                |
       8 - 16   us |        123 |                                |
      16 - 32   us |         83 |                                |
      32 - 64   us |          0 |                                |
      64 - 128  us |          0 |                                |
     128 - 256  us |          0 |                                |
     256 - 512  us |          0 |                                |
     512 - 1024 us |          0 |                                |
       1 - ...  ms |          0 |                                |

Committer testing:

Testing it with BPF:

  # perf ftrace latency -b -n -T dput -a sleep 1
  #   DURATION     |      COUNT | GRAPH                                          |
       0 - 1    us |          0 |                                                |
       1 - 2    ns |          0 |                                                |
       2 - 4    ns |          0 |                                                |
       4 - 8    ns |          0 |                                                |
       8 - 16   ns |          0 |                                                |
      16 - 32   ns |          0 |                                                |
      32 - 64   ns |          0 |                                                |
      64 - 128  ns |          0 |                                                |
     128 - 256  ns |     823489 | #############################################  |
     256 - 512  ns |       3232 |                                                |
     512 - 1024 ns |         51 |                                                |
       1 - 2    us |        172 |                                                |
       2 - 4    us |          9 |                                                |
       4 - 8    us |          0 |                                                |
       8 - 16   us |          2 |                                                |
      16 - 32   us |          0 |                                                |
      32 - 64   us |          0 |                                                |
      64 - 128  us |          0 |                                                |
     128 - 256  us |          0 |                                                |
     256 - 512  us |          0 |                                                |
     512 - 1024 us |          0 |                                                |
       1 - ...  ms |          0 |                                                |
  [root@quaco ~]# strace -e bpf perf ftrace latency -b -n -T dput -a sleep 1
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffe2bd574f0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 144) = 3
  bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 28) = 3
  bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 28) = 3
  bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\08\0\0\08\0\0\0\t\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=89, btf_log_size=0, btf_log_level=0}, 28) = 3
  bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\f\0\0\0\f\0\0\0\7\0\0\0\1\0\0\0\0\0\0\20"..., btf_log_buf=NULL, btf_size=43, btf_log_size=0, btf_log_level=0}, 28) = 3
  bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 28) = 3
  bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\5\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=77, btf_log_size=0, btf_log_level=0}, 28) = 3
  bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0(\0\0\0(\0\0\0\5\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=69, btf_log_size=0, btf_log_level=0}, 28) = -1 EINVAL (Invalid argument)
  bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0<\3\0\0<\3\0\0\362\3\0\0\0\0\0\0\0\0\0\2"..., btf_log_buf=NULL, btf_size=1862, btf_log_size=0, btf_log_level=0}, 28) = 3
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=4, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0, map_extra=0}, 72) = 4
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffe2bd571c0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="test", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 144) = 4
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10000, map_flags=0, inner_map_fd=0, map_name="functime", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0, map_extra=0}, 72) = 4
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=1, map_flags=0, inner_map_fd=0, map_name="cpu_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0, map_extra=0}, 72) = 5
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=1, map_flags=0, inner_map_fd=0, map_name="task_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0, map_extra=0}, 72) = 7
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERCPU_ARRAY, key_size=4, value_size=8, max_entries=22, map_flags=0, inner_map_fd=0, map_name="latency", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0, map_extra=0}, 72) = 8
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=32, max_entries=1, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0, map_extra=0}, 72) = 9
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=5, insns=0x7ffe2bd57220, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 144) = 10
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=16, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="func_lat.bss", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=33, btf_vmlinux_value_type_id=0, map_extra=0}, 72) = 9
  bpf(BPF_MAP_UPDATE_ELEM, {map_fd=9, key=0x7ffe2bd57330, value=0x7f9a5fc39000, flags=BPF_ANY}, 144) = 0
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=42, insns=0x113daf0, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 16, 13), prog_flags=0, prog_name="func_begin", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x113fb70, func_info_cnt=1, line_info_rec_size=16, line_info=0x113fb90, line_info_cnt=21, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 144) = 10
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=124, insns=0x113d360, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 16, 13), prog_flags=0, prog_name="func_end", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x113fcf0, func_info_cnt=1, line_info_rec_size=16, line_info=0x1139770, line_info_cnt=60, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 144) = 11
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=2, insns=0x7ffe2bd57150, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 144) = 13
  bpf(BPF_LINK_CREATE, {link_create={prog_fd=13, target_fd=-1, attach_type=BPF_PERF_EVENT, flags=0}}, 144) = -1 EBADF (Bad file descriptor)
  bpf(BPF_LINK_CREATE, {link_create={prog_fd=10, target_fd=12, attach_type=BPF_PERF_EVENT, flags=0}}, 144) = 13
  bpf(BPF_LINK_CREATE, {link_create={prog_fd=11, target_fd=14, attach_type=BPF_PERF_EVENT, flags=0}}, 144) = 15
  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=130075, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7ffe2bd57624, value=0x113fdd0, flags=BPF_ANY}, 144) = 0
  #   DURATION     |      COUNT | GRAPH                                          |
       0 - 1    us |          0 |                                                |
       1 - 2    ns |          0 |                                                |
       2 - 4    ns |          0 |                                                |
       4 - 8    ns |          0 |                                                |
       8 - 16   ns |          0 |                                                |
      16 - 32   ns |          0 |                                                |
      32 - 64   ns |          0 |                                                |
      64 - 128  ns |          0 |                                                |
     128 - 256  ns |      42519 | ###########################################    |
     256 - 512  ns |       2140 | ##                                             |
     512 - 1024 ns |         54 |                                                |
       1 - 2    us |         16 |                                                |
       2 - 4    us |         10 |                                                |
       4 - 8    us |          0 |                                                |
       8 - 16   us |          0 |                                                |
      16 - 32   us |          0 |                                                |
      32 - 64   us |          0 |                                                |
      64 - 128  us |          0 |                                                |
     128 - 256  us |          0 |                                                |
     256 - 512  us |          0 |                                                |
     512 - 1024 us |          0 |                                                |
       1 - ...  ms |          0 |                                                |
  +++ exited with 0 +++
  #

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Changbin Du <changbin.du@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220321234609.90455-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-22 17:43:46 -03:00
John Garry 7572733b84 perf tools: Fix version kernel tag
Generating the version kernel tag relies on "git describe" command to
get the latest Linus kernel tag.

However, when working from clones of Linus' git we may not have the latest
tag. For example, when working on Arnaldo's acme.git, we can have this:

  $ git branch
  perf/core
  $ head -n 5 ../../Makefile  | tail -n 4
  VERSION = 5
  PATCHLEVEL = 17
  SUBLEVEL = 0
  EXTRAVERSION = -rc3
  $ git describe --abbrev=0 --match "v[0-9].[0-9]*"
  v4.13-rc5

Indeed using tags is a problem as it relies on tags being pulled from
Linus' git (and pushed to the clone).

In commit a4147f0f91 ("perf tools: Fix perf version generation")
Robert introduced a change to use the kernelversion rule to generate the
kernel tag when no git tags are available.

However, as mentioned above, the tag we generate may be incorrect, so
just always use kernelversion to get the tag (apart from building perf
out of tree).

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Link: https://lore.kernel.org/r/1645449409-158238-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-22 17:12:40 -03:00
Linus Torvalds 95ab0e8768 Changes for this cycle were:
- Fix address filtering for Intel/PT,ARM/CoreSight
  - Enable Intel/PEBS format 5
  - Allow more fixed-function counters for x86
  - Intel/PT: Enable not recording Taken-Not-Taken packets
  - Add a few branch-types
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI4WdIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jdTA/7BADTYzFCbdwPzHt2mR8osv7k+pDvYxs9
 wxNjyi1X7N8cPkhqgIg9CfdhdyDOqo7+J4fG17f2qbwjNK7b2Fb1/U6ZoZaf+f8F
 W0e2LX5KZTXUhkA+TEjrXvYD9FmJaCPM/l2RQg8U7okBs2kb0H6QT2Yn21wd1roC
 WwI5KFiWSVS1IzpVLaXjDh+FJfJHd75ReMqJeus+QoVQ9NHeuI+t4DglSB1IBi54
 d/zeVXE/Y4dFTQOrU06S2HxcOEptvXZsPmVLvKab/veeGGyWiGPxQpvu6bXm6u3x
 0sV+dn67zut2m2pQlUZUucgGTSYIZTpOe+rNukTB9hJ4XeN4/1ohOOCrOuYM+63P
 lGFbN1v+LD7Wc6C2eEhw8G5GEL0qbwzFNQ06O3EOFi7C7GKn7WS/ET6XuuMOERFk
 uxEPb4pFtbBlJ0SriCprFJSd5NL3PORZlLIhv4hGH5hilLR1TFeKDuwZaM4noQxU
 dL3rKGLi9H+P46Eni9H28+0gDISbv1xL+WivHOFQNmhBqAZO52ZcF3J+dgBaR1B5
 pBxVTycFpZMjxSZnqTE0gMsFaLIpVGc+75Chns1rajR0mEtRtJUQUbYz4tK4zb0E
 dZR1p+VF6+DYmSRhiqeaTi9uz9oE8kMa8o/EcbFIg/9BgEnUwJXU20bjnar30xQ7
 9OIn7r9hjHI=
 =XPuo
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf event updates from Ingo Molnar:

 - Fix address filtering for Intel/PT,ARM/CoreSight

 - Enable Intel/PEBS format 5

 - Allow more fixed-function counters for x86

 - Intel/PT: Enable not recording Taken-Not-Taken packets

 - Add a few branch-types

* tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix the build on !CONFIG_PHYS_ADDR_T_64BIT
  perf: Add irq and exception return branch types
  perf/x86/intel/uncore: Make uncore_discovery clean for 64 bit addresses
  perf/x86/intel/pt: Add a capability and config bit for disabling TNTs
  perf/x86/intel/pt: Add a capability and config bit for event tracing
  perf/x86/intel: Increase max number of the fixed counters
  KVM: x86: use the KVM side max supported fixed counter
  perf/x86/intel: Enable PEBS format 5
  perf/core: Allow kernel address filter when not filtering the kernel
  perf/x86/intel/pt: Fix address filter config for 32-bit kernel
  perf/core: Fix address filter parser for multiple filters
  x86: Share definition of __is_canonical_address()
  perf/x86/intel/pt: Relax address filter validation
2022-03-22 13:06:49 -07:00
Linus Torvalds 356a1adca8 arm64 updates for 5.18
- Support for including MTE tags in ELF coredumps
 
 - Instruction encoder updates, including fixes to 64-bit immediate
   generation and support for the LSE atomic instructions
 
 - Improvements to kselftests for MTE and fpsimd
 
 - Symbol aliasing and linker script cleanups
 
 - Reduce instruction cache maintenance performed for user mappings
   created using contiguous PTEs
 
 - Support for the new "asymmetric" MTE mode, where stores are checked
   asynchronously but loads are checked synchronously
 
 - Support for the latest pointer authentication algorithm ("QARMA3")
 
 - Support for the DDR PMU present in the Marvell CN10K platform
 
 - Support for the CPU PMU present in the Apple M1 platform
 
 - Use the RNDR instruction for arch_get_random_{int,long}()
 
 - Update our copy of the Arm optimised string routines for str{n}cmp()
 
 - Fix signal frame generation for CPUs which have foolishly elected to
   avoid building in support for the fpsimd instructions
 
 - Workaround for Marvell GICv3 erratum #38545
 
 - Clarification to our Documentation (booting reqs. and MTE prctl())
 
 - Miscellanous cleanups and minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmIvta8QHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNAIhB/oDSva5FryAFExVuIB+mqRkbZO9kj6fy/5J
 ctN9LEVO2GI/U1TVAUWop1lXmP8Kbq5UCZOAuY8sz7dAZs7NRUWkwTrXVhaTpi6L
 oxCfu5Afu76d/TGgivNz+G7/ewIJRFj5zCPmHezLF9iiWPUkcAsP0XCp4a0iOjU4
 04O4d7TL/ap9ujEes+U0oEXHnyDTPrVB2OVE316FKD1fgztcjVJ2U+TxX5O4xitT
 PPIfeQCjQBq1B2OC1cptE3wpP+YEr9OZJbx+Ieweidy1CSInEy0nZ13tLoUnGPGU
 KPhsvO9daUCbhbd5IDRBuXmTi/sHU4NIB8LNEVzT1mUPnU8pCizv
 =ziGg
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:

 - Support for including MTE tags in ELF coredumps

 - Instruction encoder updates, including fixes to 64-bit immediate
   generation and support for the LSE atomic instructions

 - Improvements to kselftests for MTE and fpsimd

 - Symbol aliasing and linker script cleanups

 - Reduce instruction cache maintenance performed for user mappings
   created using contiguous PTEs

 - Support for the new "asymmetric" MTE mode, where stores are checked
   asynchronously but loads are checked synchronously

 - Support for the latest pointer authentication algorithm ("QARMA3")

 - Support for the DDR PMU present in the Marvell CN10K platform

 - Support for the CPU PMU present in the Apple M1 platform

 - Use the RNDR instruction for arch_get_random_{int,long}()

 - Update our copy of the Arm optimised string routines for str{n}cmp()

 - Fix signal frame generation for CPUs which have foolishly elected to
   avoid building in support for the fpsimd instructions

 - Workaround for Marvell GICv3 erratum #38545

 - Clarification to our Documentation (booting reqs. and MTE prctl())

 - Miscellanous cleanups and minor fixes

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
  docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
  arm64/mte: Remove asymmetric mode from the prctl() interface
  arm64: Add cavium_erratum_23154_cpus missing sentinel
  perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
  arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
  Documentation: vmcoreinfo: Fix htmldocs warning
  kasan: fix a missing header include of static_keys.h
  drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
  drivers/perf: arm_pmu: Handle 47 bit counters
  arm64: perf: Consistently make all event numbers as 16-bits
  arm64: perf: Expose some Armv9 common events under sysfs
  perf/marvell: cn10k DDR perf event core ownership
  perf/marvell: cn10k DDR perfmon event overflow handling
  perf/marvell: CN10k DDR performance monitor support
  dt-bindings: perf: marvell: cn10k ddr performance monitor
  arm64: clean up tools Makefile
  perf/arm-cmn: Update watchpoint format
  perf/arm-cmn: Hide XP PUB events for CMN-600
  arm64: drop unused includes of <linux/personality.h>
  arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
  ...
2022-03-21 10:46:39 -07:00
Ian Rogers 8b464eac97 perf evlist: Avoid iteration for empty evlist.
As seen with 'perf stat --null ..' and reported in:
https://lore.kernel.org/lkml/YjCLcpcX2peeQVCH@kernel.org/

v2. Avoids setting evsel in the empty list case as suggested by Jiri Olsa.

    Committer testing:

Before:

  $  perf stat --null sleep 1
  Segmentation fault (core dumped)
  $

After:

  $  perf stat --null sleep 1

   Performance counter stats for 'sleep 1':

         1.010340646 seconds time elapsed

         0.001420000 seconds user
         0.000000000 seconds sys
  $

Fixes: 472832d2c0 ("perf evlist: Refactor evlist__for_each_cpu()")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220317231643.550902-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-18 18:39:09 -03:00
Michael Petlan 3cf6a32f3f perf symbols: Fix symbol size calculation condition
Before this patch, the symbol end address fixup to be called, needed two
conditions being met:

  if (prev->end == prev->start && prev->end != curr->start)

Where
  "prev->end == prev->start" means that prev is zero-long
                             (and thus needs a fixup)
and
  "prev->end != curr->start" means that fixup hasn't been applied yet

However, this logic is incorrect in the following situation:

*curr  = {rb_node = {__rb_parent_color = 278218928,
  rb_right = 0x0, rb_left = 0x0},
  start = 0xc000000000062354,
  end = 0xc000000000062354, namelen = 40, type = 2 '\002',
  binding = 0 '\000', idle = 0 '\000', ignore = 0 '\000',
  inlined = 0 '\000', arch_sym = 0 '\000', annotate2 = false,
  name = 0x1159739e "kprobe_optinsn_page\t[__builtin__kprobes]"}

*prev = {rb_node = {__rb_parent_color = 278219041,
  rb_right = 0x109548b0, rb_left = 0x109547c0},
  start = 0xc000000000062354,
  end = 0xc000000000062354, namelen = 12, type = 2 '\002',
  binding = 1 '\001', idle = 0 '\000', ignore = 0 '\000',
  inlined = 0 '\000', arch_sym = 0 '\000', annotate2 = false,
  name = 0x1095486e "optinsn_slot"}

In this case, prev->start == prev->end == curr->start == curr->end,
thus the condition above thinks that "we need a fixup due to zero
length of prev symbol, but it has been probably done, since the
prev->end == curr->start", which is wrong.

After the patch, the execution path proceeds to arch__symbols__fixup_end
function which fixes up the size of prev symbol by adding page_size to
its end offset.

Fixes: 3b01a413c1 ("perf symbols: Improve kallsyms symbol end addr calculation")
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20220317135536.805-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-18 18:39:09 -03:00
Jakub Kicinski e243f39685 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17 13:56:58 -07:00
Arnaldo Carvalho de Melo 65eab2bc7d Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes that went thru perf/urgent.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-14 19:15:16 -03:00
Zhengjun Xing 91c9923a47 perf parse: Fix event parser error for hybrid systems
This bug happened on hybrid systems when both cpu_core and cpu_atom
have the same event name such as "UOPS_RETIRED.MS" while their event
terms are different, then during perf stat, the event for cpu_atom
will parse fail and then no output for cpu_atom.

UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/

It is because event terms in the "head" of parse_events_multi_pmu_add
will be changed to event terms for cpu_core after parsing UOPS_RETIRED.MS
for cpu_core, then when parsing the same event for cpu_atom, it still
uses the event terms for cpu_core, but event terms for cpu_atom are
different with cpu_core, the event parses for cpu_atom will fail. This
patch fixes it, the event terms should be parsed from the original
event.

This patch can work for the hybrid systems that have the same event
in more than 2 PMUs. It also can work in non-hybrid systems.

Before:

  # perf stat -v  -e  UOPS_RETIRED.MS  -a sleep 1

  Using CPUID GenuineIntel-6-97-1
  UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
  Control descriptor is not initialized
  UOPS_RETIRED.MS: 2737845 16068518485 16068518485

 Performance counter stats for 'system wide':

         2,737,845      cpu_core/UOPS_RETIRED.MS/

       1.002553850 seconds time elapsed

After:

  # perf stat -v  -e  UOPS_RETIRED.MS  -a sleep 1

  Using CPUID GenuineIntel-6-97-1
  UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
  UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/
  Control descriptor is not initialized
  UOPS_RETIRED.MS: 1977555 16076950711 16076950711
  UOPS_RETIRED.MS: 568684 8038694234 8038694234

 Performance counter stats for 'system wide':

         1,977,555      cpu_core/UOPS_RETIRED.MS/
           568,684      cpu_atom/UOPS_RETIRED.MS/

       1.004758259 seconds time elapsed

Fixes: fb0811535e ("perf parse-events: Allow config on kernel PMU events")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220307151627.30049-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12 11:04:55 -03:00
James Clark f693dac479 perf tools: Set build-id using build-id header on new mmap records
MMAP records that occur after the build-id header is parsed do not have
their build-id set even if the filename matches an entry from the
header. Set the build-id on these dsos as long as the MMAP record
doesn't have its own build-id set.

This fixes an issue with off target analysis where the local version of
a dso is loaded rather than one from ~/.debug via a build-id.

Reported-by: Denis Nikitin <denik@chromium.org>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20220304090956.2048712-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12 11:01:12 -03:00
Weiguo Li a7a72631f6 perf parse-events: Fix NULL check against wrong variable
We did a null check after "tmp->symbol = strdup(...)", but we checked
"list->symbol" other than "tmp->symbol".

Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/tencent_DF39269807EC9425E24787E6DB632441A405@qq.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12 10:47:52 -03:00
Guo Zhengkui eb31228b1d perf tools: Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci
Fix the following coccicheck warning:

tools/perf/util/trace-event-parse.c:209:35-36: WARNING: Use ARRAY_SIZE

ARRAY_SIZE(arr) is a macro provided in tools/include/linux/kernel.h,
which not only measures the size of the array, but also makes sure
that `arr` is really an array.

It has been tested with gcc (Debian 8.3.0-6) 8.3.0.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220307034008.4024-1-guozhengkui@vivo.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-07 14:54:54 -03:00
James Clark 66fd6c9d69 perf session: Print branch stack entry type in --dump-raw-trace
This can help with debugging issues. It only prints when -j save_type
is used otherwise an empty string is printed.

Before the change:

  101603801707130 0xa70 [0x630]: PERF_RECORD_SAMPLE(IP, 0x2): 1108/1108: 0xffff9c1df24c period: 10694 addr: 0
  ... branch stack: nr:64
  .....  0: 0000ffff9c26029c -> 0000ffff9c26f340 0 cycles  P   0
  .....  1: 0000ffff9c2601bc -> 0000ffff9c26f340 0 cycles  P   0

After the change:

  101603801707130 0xa70 [0x630]: PERF_RECORD_SAMPLE(IP, 0x2): 1108/1108: 0xffff9c1df24c period: 10694 addr: 0
  ... branch stack: nr:64
  .....  0: 0000ffff9c26029c -> 0000ffff9c26f340 0 cycles  P   0 CALL
  .....  1: 0000ffff9c2601bc -> 0000ffff9c26f340 0 cycles  P   0 IND_CALL

Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220307171917.2555829-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-07 14:51:19 -03:00
James Clark 8f431a2869 perf evsel: Add error message for unsupported branch stack cases
EOPNOTSUPP is a possible return value when branch stacks are requested
but they aren't enabled in the kernel or hardware. It's also returned if
they aren't supported on the specific event type. The currently printed
error message about sampling/overflow-interrupts is not correct in this
case.

Add a check for branch stacks before sample_period is checked because
sample_period is also set (to the default value) when using branch
stacks.

Before this change (when branch stacks aren't supported):

  perf record -j any
  Error:
  cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

After this change:

  perf record -j any
  Error:
  cycles: PMU Hardware or event type doesn't support branch stack sampling.

Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220307171917.2555829-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-07 14:49:09 -03:00
Jiri Olsa 4cee08fbd2 perf tools: Remove bpf_map__set_priv()/bpf_map__priv() usage
Both bpf_map__set_priv()/bpf_map__priv() are deprecated and will be
eventually removed.

Use hashmap to replace that functionality.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220224155238.714682-3-jolsa@kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ian Rogers <irogers@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: linux-perf-users@vger.kernel.org
2022-03-05 16:11:51 -03:00
Jiri Olsa a3bfc0d76f perf tools: Remove bpf_program__set_priv/bpf_program__priv usage
Both bpf_program__set_priv/bpf_program__priv are deprecated
and will be eventually removed.

Using hashmap to replace that functionality.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220224155238.714682-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-05 16:11:09 -03:00
Jakub Kicinski 80901bff81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/batman-adv/hard-interface.c
  commit 690bb6fb64 ("batman-adv: Request iflink once in batadv-on-batadv check")
  commit 6ee3c393ee ("batman-adv: Demote batadv-on-batadv skip error message")
https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/

net/smc/af_smc.c
  commit 4d08b7b57e ("net/smc: Fix cleanup when register ULP fails")
  commit 462791bbfa ("net/smc: add sysctl interface for SMC")
https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03 11:55:12 -08:00
Anshuman Khandual cedd3614e5 perf: Add irq and exception return branch types
This expands generic branch type classification by adding two more entries
there in i.e irq and exception return. Also updates the x86 implementation
to process X86_BR_IRET and X86_BR_IRQ records as appropriate. This changes
branch types reported to user space on x86 platform but it should not be a
problem. The possible scenarios and impacts are enumerated here.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1645681014-3346-1-git-send-email-anshuman.khandual@arm.com
2022-03-01 16:19:01 +01:00
Alexey Bayduraev 65e7c96326 perf data: Adding error message if perf_data__create_dir() fails
Add proper return codes for all cases of data directory creation failure
and add error message output based on these codes.

Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220222091417.11020-1-alexey.v.bayduraev@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-22 21:15:56 -03:00
Alexey Bayduraev 69560e366f perf data: Fix double free in perf_session__delete()
When perf_data__create_dir() fails, it calls close_dir(), but
perf_session__delete() also calls close_dir() and since dir.version and
dir.nr were initialized by perf_data__create_dir(), a double free occurs.

This patch moves the initialization of dir.version and dir.nr after
successful initialization of dir.files, that prevents double freeing.
This behavior is already implemented in perf_data__open_dir().

Fixes: 1455206311 ("perf data: Add perf_data__(create_dir|close_dir) functions")
Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220218152341.5197-2-alexey.v.bayduraev@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-22 17:34:16 -03:00
Mark Rutland be9aea7440 linkage: remove SYM_FUNC_{START,END}_ALIAS()
Now that all aliases are defined using SYM_FUNC_ALIAS(), remove the old
SYM_FUNC_{START,END}_ALIAS() macros.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220216162229.1076788-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22 16:21:34 +00:00
Mark Rutland e0891269a8 linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()
Currently aliasing an asm function requires adding START and END
annotations for each name, as per Documentation/asm-annotations.rst:

	SYM_FUNC_START_ALIAS(__memset)
	SYM_FUNC_START(memset)
	    ... asm insns ...
	SYM_FUNC_END(memset)
	SYM_FUNC_END_ALIAS(__memset)

This is more painful than necessary to maintain, especially where a
function has many aliases, some of which we may wish to define
conditionally. For example, arm64's memcpy/memmove implementation (which
uses some arch-specific SYM_*() helpers) has:

	SYM_FUNC_START_ALIAS(__memmove)
	SYM_FUNC_START_ALIAS_WEAK_PI(memmove)
	SYM_FUNC_START_ALIAS(__memcpy)
	SYM_FUNC_START_WEAK_PI(memcpy)
	    ... asm insns ...
	SYM_FUNC_END_PI(memcpy)
	EXPORT_SYMBOL(memcpy)
	SYM_FUNC_END_ALIAS(__memcpy)
	EXPORT_SYMBOL(__memcpy)
	SYM_FUNC_END_ALIAS_PI(memmove)
	EXPORT_SYMBOL(memmove)
	SYM_FUNC_END_ALIAS(__memmove)
	EXPORT_SYMBOL(__memmove)
	SYM_FUNC_START(name)

It would be much nicer if we could define the aliases *after* the
standard function definition. This would avoid the need to specify each
symbol name twice, and would make it easier to spot the canonical
function definition.

This patch adds new macros to allow us to do so, which allows the above
example to be rewritten more succinctly as:

	SYM_FUNC_START(__pi_memcpy)
	    ... asm insns ...
	SYM_FUNC_END(__pi_memcpy)

	SYM_FUNC_ALIAS(__memcpy, __pi_memcpy)
	EXPORT_SYMBOL(__memcpy)
	SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy)
	EXPORT_SYMBOL(memcpy)

	SYM_FUNC_ALIAS(__pi_memmove, __pi_memcpy)
	SYM_FUNC_ALIAS(__memmove, __pi_memmove)
	EXPORT_SYMBOL(__memmove)
	SYM_FUNC_ALIAS_WEAK(memmove, __memmove)
	EXPORT_SYMBOL(memmove)

The reduction in duplication will also make it possible to replace some
uses of WEAK with more accurate Kconfig guards, e.g.

	#ifndef CONFIG_KASAN
	SYM_FUNC_ALIAS(memmove, __memmove)
	EXPORT_SYMBOL(memmove)
	#endif

... which should make it easier to ensure that symbols are neither used
nor overidden unexpectedly.

The existing SYM_FUNC_START_ALIAS() and SYM_FUNC_START_LOCAL_ALIAS() are
marked as deprecated, and will be removed once existing users are moved
over to the new scheme.

The tools/perf/ copy of linkage.h is updated to match. A subsequent
patch will depend upon this when updating the x86 asm annotations.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220216162229.1076788-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22 16:21:33 +00:00
Zhengjun Xing 8a3d2ee0de perf evlist: Fix failed to use cpu list for uncore events
The 'perf record' and 'perf stat' commands have supported the option
'-C/--cpus' to count or collect only on the list of CPUs provided.

Commit 1d3351e631 ("perf tools: Enable on a list of CPUs for
hybrid") add it to be supported for hybrid. For hybrid support, it
checks the cpu list are available on hybrid PMU. But when we test only
uncore events(or events not in cpu_core and cpu_atom), there is a bug:

Before:

 # perf stat -C0  -e uncore_clock/clockticks/ sleep 1
   failed to use cpu list 0

In this case, for uncore event, its pmu_name is not cpu_core or
cpu_atom, so in evlist__fix_hybrid_cpus, perf_pmu__find_hybrid_pmu
should return NULL,both events_nr and unmatched_count should be 0 ,then
the cpu list check function evlist__fix_hybrid_cpus return -1 and the
error "failed to use cpu list 0" will happen. Bypass "events_nr=0" case
then the issue is fixed.

After:

 # perf stat -C0  -e uncore_clock/clockticks/ sleep 1

 Performance counter stats for 'CPU(s) 0':

       195,476,873      uncore_clock/clockticks/

       1.004518677 seconds time elapsed

When testing with at least one core event and uncore events, it has no
issue.

 # perf stat -C0  -e cpu_core/cpu-cycles/,uncore_clock/clockticks/ sleep 1

 Performance counter stats for 'CPU(s) 0':

         5,993,774      cpu_core/cpu-cycles/
       301,025,912      uncore_clock/clockticks/

       1.003964934 seconds time elapsed

Fixes: 1d3351e631 ("perf tools: Enable on a list of CPUs for hybrid")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: alexander.shishkin@intel.com
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220218093127.1844241-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-18 09:59:26 -03:00
Arnaldo Carvalho de Melo 859f7e4554 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes from perf/urgent that recently got merged.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-17 18:40:54 -03:00
Jakub Kicinski 6b5567b1b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 11:44:20 -08:00
Arnaldo Carvalho de Melo 31ded1535e perf bpf: Defer freeing string after possible strlen() on it
This was detected by the gcc in Fedora Rawhide's gcc:

  50    11.01 fedora:rawhide                : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
        inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
    util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
     1225 |                 *key_scan_pos += strlen(map_opt);
          |                                  ^~~~~~~~~~~~~~~
    util/bpf-loader.c:1223:9: note: call to 'free' here
     1223 |         free(map_name);
          |         ^~~~~~~~~~~~~~
    cc1: all warnings being treated as errors

So do the calculations on the pointer before freeing it.

Fixes: 04f9bf2bac ("perf bpf-loader: Add missing '*' for key_scan_pos")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-17 07:15:09 -03:00
James Clark 9de0736973 perf cs-etm: Fix corrupt inject files when only last branch option is enabled
'perf inject' with Coresight data generates files that cannot be opened
when only the last branch option is specified:

  perf inject -i perf.data --itrace=l -o inject.data
  perf script -i inject.data
  0x33faa8 [0x8]: failed to process type: 9 [Bad address]

This is because cs_etm__synth_instruction_sample() is called even when
the sample type for instructions hasn't been setup. Last branch records
are attached to instruction samples so it doesn't make sense to generate
them when --itrace=i isn't specified anyway.

This change disables all calls of cs_etm__synth_instruction_sample()
unless --itrace=i is specified, resulting in a file with no samples if
only --itrace=l is provided, rather than a bad file.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220210200620.1227232-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-16 13:49:25 -03:00
James Clark 0b31ea6613 perf cs-etm: No-op refactor of synth opt usage
sample_branches and sample_instructions are already saved in the
synth_opts struct. Other usages like synth_opts.last_branch don't save a
value, so make this more consistent by always going through synth_opts
and not saving duplicate values.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220210200620.1227232-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-16 13:49:25 -03:00
Stephane Eranian 052747700e perf report: Add "addr_from" and "addr_to" sort dimensions
With the existing symbol_from/symbol_to, branches captured in the same
function would be collapsed into a single function if the latencies
associated with the each branch (cycles) were all the same.  That is the
case on Intel Broadwell, for instance. Since Intel Skylake, the latency
is captured by hardware and therefore is used to disambiguate branches.

Add addr_from/addr_to sort dimensions to sort branches based on their
addresses and not the function there are in. The output is still the
function name but the offset within the function is provided to uniquely
identify each branch.  These new sort dimensions also help with annotate
because they create different entries in the histogram which, in turn,
generates proper branch annotations.

Here is an example using AMD's branch sampling:

  $ perf record -a -b -c 1000037 -e cpu/branch-brs/ test_prg

  $ perf report
  Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
  Overhead  Command          Source Shared Object  Source Symbol                                   Target Symbol                                   Basic Block Cycle
    99.65%  test_prg	   test_prg              [.] test_thread                                 [.] test_thread                                 -
     0.02%  test_prg         [kernel.vmlinux]      [k] asm_sysvec_apic_timer_interrupt             [k] error_entry                                 -

  $ perf report -F overhead,comm,dso,addr_from,addr_to
  Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
  Overhead  Command          Shared Object     Source Address          Target Address
     4.22%  test_prg         test_prg          [.] test_thread+0x3c    [.] test_thread+0x4
     4.13%  test_prg         test_prg          [.] test_thread+0x4     [.] test_thread+0x3a
     4.09%  test_prg         test_prg          [.] test_thread+0x3a    [.] test_thread+0x6
     4.08%  test_prg         test_prg          [.] test_thread+0x2     [.] test_thread+0x3c
     4.06%  test_prg         test_prg          [.] test_thread+0x3e    [.] test_thread+0x2
     3.87%  test_prg         test_prg          [.] test_thread+0x6     [.] test_thread+0x38
     3.84%  test_prg         test_prg          [.] test_thread         [.] test_thread+0x3e
     3.76%  test_prg         test_prg          [.] test_thread+0x1e    [.] test_thread
     3.76%  test_prg         test_prg          [.] test_thread+0x38    [.] test_thread+0x8
     3.56%  test_prg         test_prg          [.] test_thread+0x22    [.] test_thread+0x1e
     3.54%  test_prg         test_prg          [.] test_thread+0x8     [.] test_thread+0x36
     3.47%  test_prg         test_prg          [.] test_thread+0x1c    [.] test_thread+0x22
     3.45%  test_prg         test_prg          [.] test_thread+0x36    [.] test_thread+0xa
     3.28%  test_prg         test_prg          [.] test_thread+0x24    [.] test_thread+0x1c
     3.25%  test_prg         test_prg          [.] test_thread+0xa     [.] test_thread+0x34
     3.24%  test_prg         test_prg          [.] test_thread+0x1a    [.] test_thread+0x24
     3.20%  test_prg         test_prg          [.] test_thread+0x34    [.] test_thread+0xc
     3.04%  test_prg         test_prg          [.] test_thread+0x26    [.] test_thread+0x1a
     3.01%  test_prg         test_prg          [.] test_thread+0xc     [.] test_thread+0x32
     2.98%  test_prg         test_prg          [.] test_thread+0x18    [.] test_thread+0x26
     2.94%  test_prg         test_prg          [.] test_thread+0x32    [.] test_thread+0xe
     2.76%  test_prg         test_prg          [.] test_thread+0x28    [.] test_thread+0x18
     2.73%  test_prg         test_prg          [.] test_thread+0xe     [.] test_thread+0x30
     2.67%  test_prg         test_prg          [.] test_thread+0x30    [.] test_thread+0x10
     2.67%  test_prg         test_prg          [.] test_thread+0x16    [.] test_thread+0x28
     2.46%  test_prg         test_prg          [.] test_thread+0x10    [.] test_thread+0x2e
     2.44%  test_prg         test_prg          [.] test_thread+0x2a    [.] test_thread+0x16
     2.38%  test_prg         test_prg          [.] test_thread+0x14    [.] test_thread+0x2a
     2.32%  test_prg         test_prg          [.] test_thread+0x2e    [.] test_thread+0x12
     2.28%  test_prg         test_prg          [.] test_thread+0x12    [.] test_thread+0x2c
     2.16%  test_prg         test_prg          [.] test_thread+0x2c    [.] test_thread+0x14
     0.02%  test_prg         [kernel.vmlinux]  [k] asm_sysvec_apic_ti+0x5  [k] error_entry

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20220208211637.2221872-13-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-16 11:21:22 -03:00
Colin Ian King b47f18d85c perf tools: Fix spelling mistake "commpressor" -> "compressor"
There is a spelling mistake in a debug message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220214093547.44590-1-colin.i.king@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-16 10:54:28 -03:00
Ian Rogers 3402ae0a2e perf tui: Only support --tui with slang
Make the --tui command line flags dependent HAVE_SLANG_SUPPORT. This was
reported as confusing in:
https://lore.kernel.org/linux-perf-users/YevaTkzdXmFKdGpc@zx-spectrum.none/

Reported-by: xaizek <xaizek@posteo.net>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: xaizek <xaizek@posteo.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220123191849.3655855-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:15:29 -03:00
Adrian Hunter c096fff62d perf scripting python: Add all sample flags to DB export
Currently, the transaction flag (x) is kept separate from branch flags.
Instead of doing the same for the interrupt disabled flags (D and t), add
all flags so that new flags will not need to be handled separately in the
future.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-23-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:15:04 -03:00
Adrian Hunter e92403553b perf intel-pt: Force 'quick' mode when TNT (Taken/Not-Taken packet) is disabled
It is not possible to walk the executable code without TNT packets, so
force 'quick' mode when TNT is disabled, because 'quick' mode does not walk
the code.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-18-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:13:43 -03:00
Adrian Hunter 11f18e4773 perf intel-pt: Synthesize new D (Intr Disabled) and t (Intr Toggle) flags
Update sample flags to represent the state and changes to the interrupt
flag.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-17-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:13:23 -03:00
Adrian Hunter 069ca70e48 perf intel-pt: Synthesize iflag event
Synthesize an attribute event and sample events for changes to the
interrupt flag represented by the MODE.Exec packet.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-16-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:13:02 -03:00
Adrian Hunter ef3b2ba964 perf intel-pt: Synthesize CFE (Control Flow Event) / EVD (Event Data) event
Synthesize an attribute event and sample events for Intel PT Event Trace
events represented by CFE and EVD packets.

Committer notes:

Make 'struct perf_synth_intel_evd evd[]' evd[0] at the end of 'struct
perf_synth_intel_evt' as it is breaking the build with in many compilers
with (e.g. clang version 13.0.0 (Fedora 13.0.0-3.fc35)):

  util/intel-pt.c:2213:31: error: field 'cfe' with variable sized type 'struct perf_synth_intel_evt' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
                  struct perf_synth_intel_evt cfe;
                                              ^

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-15-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:12:13 -03:00
Adrian Hunter f2be829e72 perf intel-pt: Record Event Trace capability flag
The change to the MODE.Exec packet means processing must distinguish
between the old and new cases. Record the Event Trace capability flag to
make that possible.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-14-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:11:16 -03:00
Adrian Hunter 8ee9a9ab81 perf auxtrace: Add itrace option "I"
Add itrace option "I" to synthesize interrupt or similar (asynchronous)
events. This will be used for Intel PT Event Trace events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-13-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:10:30 -03:00
Adrian Hunter 1d0dc1ddf0 perf tools: Define new D (Intr Disable) and t (Intr Toggle) flags
Define 2 new flags to represent:

 - when interrupts are disabled (D)
 - when interrupt disabling toggles (t)

This gives 4 combinations:
		no flag, interrupts enabled
	t	interrupts were enabled but become disabled
	D	interrupts are disabled
	Dt	interrupts were disabled but become enabled

Committer notes:

Those are control flow flags, as per 'tools/perf/Documentation/perf-intel-pt.txt:

<quote>
An interesting field that is not printed by default is 'flags' which can be
displayed as follows:

        perf script --itrace=ibxwpe -F+flags

The flags are "bcrosyiABExgh" which stand for branch, call, return, conditional,
system, asynchronous, interrupt, transaction abort, trace begin, trace end,
in transaction, VM-entry, and VM-exit respectively.
</quote>

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-12-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:09:25 -03:00
Adrian Hunter 0d26ba8fec perf tools: Define Intel PT iflag synthesized event
Similar to other Intel PT synth events, define a structure to hold
information about a change to the interrupt flag.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:08:56 -03:00
Adrian Hunter edb4d8432b perf tools: Define Intel PT CFE (Control Flow Event) / EVD (Event Data) event
Similar to other Intel PT synth events, define structures to hold CFE
and EVD data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:08:27 -03:00
Adrian Hunter cf0c98e2ef perf intel-pt: decoder: Add MODE.Exec IFLAG processing
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which adds a bit to the existing
MODE.Exec packet to record the interrupt flag.

Previously, the MODE.Exec packet did not generate any events, so the
new processing required is practically the same as a new packet.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:08:03 -03:00
Adrian Hunter 3733a98bd2 perf intel-pt: decoder: Add CFE (Control Flow Event) and EVD (Event Data) processing
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which requires 2 new packets CFE
(Control Flow Event) and EVD (Event Data).

Each Event Trace event is represented by a CFE packet that is preceded
by zero or more EVD packets. It may be bound to a following FUP (Flow
Update) packet that provides the IP.

Event Trace exposes details about asynchronous events. The CFE packet
contains a type field to identify one of the following:

	 1	INTR		interrupt, fault, exception, NMI
	 2	IRET		interrupt return
	 3	SMI		system management interrupt
	 4	RSM		resume from system management mode
	 5	SIPI		startup interprocessor interrupt
	 6	INIT		INIT signal
	 7	VMENTRY		VM-Entry
	 8	VMEXIT		VM-Entry
	 9	VMEXIT_INTR	VM-Exit due to interrupt
	10	SHUTDOWN	Shutdown

For more details, refer to the Intel SDM, Intel Processor Trace chapter.

Add processing to the decoder for the new packets.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:07:23 -03:00
Adrian Hunter 68ff3cba17 perf intel-pt: decoder: Factor out clearing of FUP (Flow Update) event variables
Factor out clearing of FUP (Flow Update) event variables, to avoid code duplication.

Committer Notes:

From the Intel documentation:

<quote>
Flow Update Packets (FUP): FUPs provide the source IP addresses for
asynchronous events (interrupt and exceptions), as well as other cases
where the source address cannot be determined from the binary.
</quote>

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:06:58 -03:00
Adrian Hunter cd9111e670 perf intel-pt: decoder: Add config bit definitions
Tidy up config bit constants to use #define.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:06:32 -03:00
Adrian Hunter f7934477ce perf intel-pt: pkt-decoder: Add MODE.Exec IFLAG bit
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which adds a bit to the existing
MODE.Exec packet to record the interrupt flag. Amend the packet decoder and
packet decoder test accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:06:11 -03:00
Adrian Hunter 2750af50a3 perf intel-pt: pkt-decoder: Add CFE and EVD packets
As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new
Intel PT feature called Event Trace which requires 2 new packets CFE and
EVD. Add them to the packet decoder and packet decoder test.

Committer notes:

I got the "Intel® 64 and IA-32 architectures software developer’s manual
combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at:

  https://cdrdv2.intel.com/v1/dl/getContent/671200

And these new packets are described in page 3951:

<quote>
32.2.4

Event Trace is a capability that exposes details about the asynchronous
events, when they are generated, and when their corresponding software
event handler completes execution. These include:

o Interrupts, including NMI and SMI, including the interrupt vector when
defined.

o Faults, exceptions including the fault vector.

— Page faults additionally include the page fault address, when in context.

o Event handler returns, including IRET and RSM.

o VM exits and VM entries.¹

— VM exits include the values written to the “exit reason” and “exit qualification” VMCS fields.
INIT and SIPI events.

o TSX aborts, including the abort status returned for the RTM instructions.

o Shutdown.

Additionally, it provides indication of the status of the Interrupt Flag
(IF), to indicate when interrupts are masked.
</quote>

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:05:44 -03:00
Adrian Hunter 6816c25478 perf intel-pt: pkt-decoder: Remove misplaced linebreak
Minor whitespace fix up.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20220124084201.2699795-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-15 17:05:41 -03:00
Ian Rogers 59835f55ce perf map: Make map__contains_symbol() args const
Now unmap_ip is const, make contains symbol const.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: http://lore.kernel.org/lkml/20220211103415.2737789-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-14 16:59:55 -03:00
Ian Rogers 9d31d18bbb perf maps: Move maps code to own C file
The maps code has its own header, move the corresponding C function
definitions to their own C file. In the process tidy and minimize
includes.

Committer notes:

Add back the 'static' for maps__init() and maps__exit().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: http://lore.kernel.org/lkml/20220211103415.2737789-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-14 16:56:32 -03:00
Ian Rogers 0f1b914905 perf maps: Reduce scope of init and exit
Now purely accessed through new and delete, so reduce to file scope.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: http://lore.kernel.org/lkml/20220211103415.2737789-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-14 16:52:14 -03:00
Ian Rogers 1a97cee604 perf maps: Use a pointer for kmaps
struct maps is reference counted, using a pointer is more idiomatic.

Committer notes:

Delay:

   maps = machine__kernel_maps(&vmlinux);

To after:

  machine__init(&vmlinux, "", HOST_KERNEL_ID);

To avoid this on f34:

  In file included from /var/home/acme/git/perf/tools/perf/util/build-id.h:10,
                   from /var/home/acme/git/perf/tools/perf/util/dso.h:13,
                   from tests/vmlinux-kallsyms.c:8:
  In function ‘machine__kernel_maps’,
      inlined from ‘test__vmlinux_matches_kallsyms’ at tests/vmlinux-kallsyms.c:122:22:
  /var/home/acme/git/perf/tools/perf/util/machine.h:86:23: error: ‘vmlinux.kmaps’ is used uninitialized [-Werror=uninitialized]
     86 |         return machine->kmaps;
        |                ~~~~~~~^~~~~~~
  tests/vmlinux-kallsyms.c: In function ‘test__vmlinux_matches_kallsyms’:
  tests/vmlinux-kallsyms.c:121:34: note: ‘vmlinux’ declared here
    121 |         struct machine kallsyms, vmlinux;
        |                                  ^~~~~~~
  cc1: all warnings being treated as errors

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: http://lore.kernel.org/lkml/20220211103415.2737789-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-14 16:47:13 -03:00