Commit Graph

2240 Commits

Author SHA1 Message Date
Andrii Nakryiko d10ef2b825 libbpf: Unify low-level BPF_PROG_LOAD APIs into bpf_prog_load()
Add a new unified OPTS-based low-level API for program loading,
bpf_prog_load() ([0]).  bpf_prog_load() accepts few "mandatory"
parameters as input arguments (program type, name, license,
instructions) and all the other optional (as in not required to specify
for all types of BPF programs) fields into struct bpf_prog_load_opts.

This makes all the other non-extensible APIs variant for BPF_PROG_LOAD
obsolete and they are slated for deprecation in libbpf v0.7:
  - bpf_load_program();
  - bpf_load_program_xattr();
  - bpf_verify_program().

Implementation-wise, internal helper libbpf__bpf_prog_load is refactored
to become a public bpf_prog_load() API. struct bpf_prog_load_params used
internally is replaced by public struct bpf_prog_load_opts.

Unfortunately, while conceptually all this is pretty straightforward,
the biggest complication comes from the already existing bpf_prog_load()
*high-level* API, which has nothing to do with BPF_PROG_LOAD command.

We try really hard to have a new API named bpf_prog_load(), though,
because it maps naturally to BPF_PROG_LOAD command.

For that, we rename old bpf_prog_load() into bpf_prog_load_deprecated()
and mark it as COMPAT_VERSION() for shared library users compiled
against old version of libbpf. Statically linked users and shared lib
users compiled against new version of libbpf headers will get "rerouted"
to bpf_prog_deprecated() through a macro helper that decides whether to
use new or old bpf_prog_load() based on number of input arguments (see
___libbpf_overload in libbpf_common.h).

To test that existing
bpf_prog_load()-using code compiles and works as expected, I've compiled
and ran selftests as is. I had to remove (locally) selftest/bpf/Makefile
-Dbpf_prog_load=bpf_prog_test_load hack because it was conflicting with
the macro-based overload approach. I don't expect anyone else to do
something like this in practice, though. This is testing-specific way to
replace bpf_prog_load() calls with special testing variant of it, which
adds extra prog_flags value. After testing I kept this selftests hack,
but ensured that we use a new bpf_prog_load_deprecated name for this.

This patch also marks bpf_prog_load() and bpf_prog_load_xattr() as deprecated.
bpf_object interface has to be used for working with struct bpf_program.
Libbpf doesn't support loading just a bpf_program.

The silver lining is that when we get to libbpf 1.0 all these
complication will be gone and we'll have one clean bpf_prog_load()
low-level API with no backwards compatibility hackery surrounding it.

  [0] Closes: https://github.com/libbpf/libbpf/issues/284

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-4-andrii@kernel.org
2021-11-07 08:34:23 -08:00
Andrii Nakryiko 45493cbaf5 libbpf: Pass number of prog load attempts explicitly
Allow to control number of BPF_PROG_LOAD attempts from outside the
sys_bpf_prog_load() helper.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-3-andrii@kernel.org
2021-11-07 08:34:23 -08:00
Andrii Nakryiko be80e9cdbc libbpf: Rename DECLARE_LIBBPF_OPTS into LIBBPF_OPTS
It's confusing that libbpf-provided helper macro doesn't start with
LIBBPF. Also "declare" vs "define" is confusing terminology, I can never
remember and always have to look up previous examples.

Bypass both issues by renaming DECLARE_LIBBPF_OPTS into a short and
clean LIBBPF_OPTS. To avoid breaking existing code, provide:

  #define DECLARE_LIBBPF_OPTS LIBBPF_OPTS

in libbpf_legacy.h. We can decide later if we ever want to remove it or
we'll keep it forever because it doesn't add any maintainability burden.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20211103220845.2676888-2-andrii@kernel.org
2021-11-07 08:34:22 -08:00
Andrii Nakryiko b8b5cb55f5 libbpf: Fix non-C89 loop variable declaration in gen_loader.c
Fix the `int i` declaration inside the for statement. This is non-C89
compliant. See [0] for user report breaking BCC build.

  [0] https://github.com/libbpf/libbpf/issues/403

Fixes: 18f4fccbf3 ("libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20211105191055.3324874-1-andrii@kernel.org
2021-11-06 16:16:52 -07:00
Arnaldo Carvalho de Melo 7f9f879243 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up some tools/perf/ patches that went via tip/perf/core, such
as:

  tools/perf: Add mem_hops field in perf_mem_data_src structure

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-06 15:49:33 -03:00
Mehrdad Arshad Rad 64165ddf8e libbpf: Fix lookup_and_delete_elem_flags error reporting
Fix bpf_map_lookup_and_delete_elem_flags() to pass the return code through
libbpf_err_errno() as we do similarly in bpf_map_lookup_and_delete_elem().

Fixes: f12b654327 ("libbpf: Streamline error reporting for low-level APIs")
Signed-off-by: Mehrdad Arshad Rad <arshad.rad@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211104171354.11072-1-arshad.rad@gmail.com
2021-11-05 16:26:08 +01:00
Andrii Nakryiko be2f2d1680 libbpf: Deprecate bpf_program__load() API
Mark bpf_program__load() as deprecated ([0]) since v0.6. Also rename few
internal program loading bpf_object helper functions to have more
consistent naming.

  [0] Closes: https://github.com/libbpf/libbpf/issues/301

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211103051449.1884903-1-andrii@kernel.org
2021-11-03 14:29:56 -07:00
Andrii Nakryiko b7332d2820 libbpf: Improve ELF relo sanitization
Add few sanity checks for relocations to prevent div-by-zero and
out-of-bounds array accesses in libbpf.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211103173213.1376990-6-andrii@kernel.org
2021-11-03 13:25:37 -07:00
Andrii Nakryiko 0d6988e16a libbpf: Fix section counting logic
e_shnum does include section #0 and as such is exactly the number of ELF
sections that we need to allocate memory for to use section indices as
array indices. Fix the off-by-one error.

This is purely accounting fix, previously we were overallocating one
too many array items. But no correctness errors otherwise.

Fixes: 25bbbd7a44 ("libbpf: Remove assumptions about uniqueness of .rodata/.data/.bss maps")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211103173213.1376990-5-andrii@kernel.org
2021-11-03 13:25:37 -07:00
Andrii Nakryiko 62554d52e7 libbpf: Validate that .BTF and .BTF.ext sections contain data
.BTF and .BTF.ext ELF sections should have SHT_PROGBITS type and contain
data. If they are not, ELF is invalid or corrupted, so bail out.
Otherwise this can lead to data->d_buf being NULL and SIGSEGV later on.
Reported by oss-fuzz project.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211103173213.1376990-4-andrii@kernel.org
2021-11-03 13:25:37 -07:00
Andrii Nakryiko 88918dc12d libbpf: Improve sanity checking during BTF fix up
If BTF is corrupted DATASEC's variable type ID might be incorrect.
Prevent this easy to detect situation with extra NULL check.
Reported by oss-fuzz project.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211103173213.1376990-3-andrii@kernel.org
2021-11-03 13:25:37 -07:00
Andrii Nakryiko 833907876b libbpf: Detect corrupted ELF symbols section
Prevent divide-by-zero if ELF is corrupted and has zero sh_entsize.
Reported by oss-fuzz project.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211103173213.1376990-2-andrii@kernel.org
2021-11-03 13:25:37 -07:00
Dave Marchevsky f5aafbc2af libbpf: Deprecate bpf_program__get_prog_info_linear
As part of the road to libbpf 1.0, and discussed in libbpf issue tracker
[0], bpf_program__get_prog_info_linear and its associated structs and
helper functions should be deprecated. The functionality is too specific
to the needs of 'perf', and there's little/no out-of-tree usage to
preclude introduction of a more general helper in the future.

  [0] Closes: https://github.com/libbpf/libbpf/issues/313

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211101224357.2651181-5-davemarchevsky@fb.com
2021-11-03 11:25:36 -07:00
Jakub Kicinski b7b98f8689 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-11-01

We've added 181 non-merge commits during the last 28 day(s) which contain
a total of 280 files changed, 11791 insertions(+), 5879 deletions(-).

The main changes are:

1) Fix bpf verifier propagation of 64-bit bounds, from Alexei.

2) Parallelize bpf test_progs, from Yucong and Andrii.

3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus.

4) Improve bpf selftests on s390, from Ilya.

5) bloomfilter bpf map type, from Joanne.

6) Big improvements to JIT tests especially on Mips, from Johan.

7) Support kernel module function calls from bpf, from Kumar.

8) Support typeless and weak ksym in light skeleton, from Kumar.

9) Disallow unprivileged bpf by default, from Pawan.

10) BTF_KIND_DECL_TAG support, from Yonghong.

11) Various bpftool cleanups, from Quentin.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits)
  libbpf: Deprecate AF_XDP support
  kbuild: Unify options for BTF generation for vmlinux and modules
  selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
  bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
  bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
  selftests/bpf: Fix also no-alu32 strobemeta selftest
  bpf: Add missing map_delete_elem method to bloom filter map
  selftests/bpf: Add bloom map success test for userspace calls
  bpf: Add alignment padding for "map_extra" + consolidate holes
  bpf: Bloom filter map naming fixups
  selftests/bpf: Add test cases for struct_ops prog
  bpf: Add dummy BPF STRUCT_OPS for test purpose
  bpf: Factor out helpers for ctx access checking
  bpf: Factor out a helper to prepare trampoline for struct_ops prog
  selftests, bpf: Fix broken riscv build
  riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h
  tools, build: Add RISC-V to HOSTARCH parsing
  riscv, bpf: Increase the maximum number of iterations
  selftests, bpf: Add one test for sockmap with strparser
  selftests, bpf: Fix test_txmsg_ingress_parser error
  ...
====================

Link: https://lore.kernel.org/r/20211102013123.9005-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01 19:59:46 -07:00
Magnus Karlsson 0b170456e0 libbpf: Deprecate AF_XDP support
Deprecate AF_XDP support in libbpf ([0]). This has been moved to
libxdp as it is a better fit for that library. The AF_XDP support only
uses the public libbpf functions and can therefore just use libbpf as
a library from libxdp. The libxdp APIs are exactly the same so it
should just be linking with libxdp instead of libbpf for the AF_XDP
functionality. If not, please submit a bug report. Linking with both
libraries is supported but make sure you link in the correct order so
that the new functions in libxdp are used instead of the deprecated
ones in libbpf.

Libxdp can be found at https://github.com/xdp-project/xdp-tools.

  [0] Closes: https://github.com/libbpf/libbpf/issues/270

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20211029090111.4733-1-magnus.karlsson@gmail.com
2021-11-01 18:12:44 -07:00
Björn Töpel 589fed479b riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h
Add macros for 64-bit RISC-V PT_REGS to bpf_tracing.h.

Signed-off-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211028161057.520552-4-bjorn@kernel.org
2021-11-01 17:08:21 +01:00
Kumar Kartikeya Dwivedi 92274e24b0 libbpf: Use O_CLOEXEC uniformly when opening fds
There are some instances where we don't use O_CLOEXEC when opening an
fd, fix these up. Otherwise, it is possible that a parallel fork causes
these fds to leak into a child process on execve.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-6-memxor@gmail.com
2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi 549a632386 libbpf: Ensure that BPF syscall fds are never 0, 1, or 2
Add a simple wrapper for passing an fd and getting a new one >= 3 if it
is one of 0, 1, or 2. There are two primary reasons to make this change:
First, libbpf relies on the assumption a certain BPF fd is never 0 (e.g.
most recently noticed in [0]). Second, Alexei pointed out in [1] that
some environments reset stdin, stdout, and stderr if they notice an
invalid fd at these numbers. To protect against both these cases, switch
all internal BPF syscall wrappers in libbpf to always return an fd >= 3.
We only need to modify the syscall wrappers and not other code that
assumes a valid fd by doing >= 0, to avoid pointless churn, and because
it is still a valid assumption. The cost paid is two additional syscalls
if fd is in range [0, 2].

  [0]: e31eec77e4 ("bpf: selftests: Fix fd cleanup in get_branch_snapshot")
  [1]: https://lore.kernel.org/bpf/CAADnVQKVKY8o_3aU8Gzke443+uHa-eGoM0h7W4srChMXU1S4Bg@mail.gmail.com

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-5-memxor@gmail.com
2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi 585a357198 libbpf: Add weak ksym support to gen_loader
This extends existing ksym relocation code to also support relocating
weak ksyms. Care needs to be taken to zero out the src_reg (currently
BPF_PSEUOD_BTF_ID, always set for gen_loader by bpf_object__relocate_data)
when the BTF ID lookup fails at runtime.  This is not a problem for
libbpf as it only sets ext->is_set when BTF ID lookup succeeds (and only
proceeds in case of failure if ext->is_weak, leading to src_reg
remaining as 0 for weak unresolved ksym).

A pattern similar to emit_relo_kfunc_btf is followed of first storing
the default values and then jumping over actual stores in case of an
error. For src_reg adjustment, we also need to perform it when copying
the populated instruction, so depending on if copied insn[0].imm is 0 or
not, we decide to jump over the adjustment.

We cannot reach that point unless the ksym was weak and resolved and
zeroed out, as the emit_check_err will cause us to jump to cleanup
label, so we do not need to recheck whether the ksym is weak before
doing the adjustment after copying BTF ID and BTF FD.

This is consistent with how libbpf relocates weak ksym. Logging
statements are added to show the relocation result and aid debugging.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-4-memxor@gmail.com
2021-10-28 16:30:06 -07:00
Kumar Kartikeya Dwivedi c24941cd37 libbpf: Add typeless ksym support to gen_loader
This uses the bpf_kallsyms_lookup_name helper added in previous patches
to relocate typeless ksyms. The return value ENOENT can be ignored, and
the value written to 'res' can be directly stored to the insn, as it is
overwritten to 0 on lookup failure. For repeating symbols, we can simply
copy the previously populated bpf_insn.

Also, we need to take care to not close fds for typeless ksym_desc, so
reuse the 'off' member's space to add a marker for typeless ksym and use
that to skip them in cleanup_relos.

We add a emit_ksym_relo_log helper that avoids duplicating common
logging instructions between typeless and weak ksym (for future commit).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-3-memxor@gmail.com
2021-10-28 16:30:06 -07:00
Joanne Koong 47512102cd libbpf: Add "map_extra" as a per-map-type extra flag
This patch adds the libbpf infrastructure for supporting a
per-map-type "map_extra" field, whose definition will be
idiosyncratic depending on map type.

For example, for the bloom filter map, the lower 4 bits of
map_extra is used to denote the number of hash functions.

Please note that until libbpf 1.0 is here, the
"bpf_create_map_params" struct is used as a temporary
means for propagating the map_extra field to the kernel.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-3-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Joe Burton 689624f037 libbpf: Deprecate bpf_objects_list
Add a flag to `enum libbpf_strict_mode' to disable the global
`bpf_objects_list', preventing race conditions when concurrent threads
call bpf_object__open() or bpf_object__close().

bpf_object__next() will return NULL if this option is set.

Callers may achieve the same workflow by tracking bpf_objects in
application code.

  [0] Closes: https://github.com/libbpf/libbpf/issues/293

Signed-off-by: Joe Burton <jevburton@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026223528.413950-1-jevburton.kernel@gmail.com
2021-10-27 11:00:12 -07:00
Arnaldo Carvalho de Melo 3a55445f11 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up the fixes from upstream.

Fix simple conflict on session.c related to the file position fix that
went upstream and is touched by the active decomp changes in perf/core.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-26 11:03:02 -03:00
Ilya Leoshkevich 3930198dc9 libbpf: Use __BYTE_ORDER__
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-3-iii@linux.ibm.com
2021-10-25 20:39:41 -07:00
Ilya Leoshkevich 45f2bebc80 libbpf: Fix endianness detection in BPF_CORE_READ_BITFIELD_PROBED()
__BYTE_ORDER is supposed to be defined by a libc, and __BYTE_ORDER__ -
by a compiler. bpf_core_read.h checks __BYTE_ORDER == __LITTLE_ENDIAN,
which is true if neither are defined, leading to incorrect behavior on
big-endian hosts if libc headers are not included, which is often the
case.

Fixes: ee26dade0e ("libbpf: Add support for relocatable bitfields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-2-iii@linux.ibm.com
2021-10-25 20:39:41 -07:00
Andrii Nakryiko c4813e969a libbpf: Deprecate ambiguously-named bpf_program__size() API
The name of the API doesn't convey clearly that this size is in number
of bytes (there needed to be a separate comment to make this clear in
libbpf.h). Further, measuring the size of BPF program in bytes is not
exactly the best fit, because BPF programs always consist of 8-byte
instructions. As such, bpf_program__insn_cnt() is a better alternative
in pretty much any imaginable case.

So schedule bpf_program__size() deprecation starting from v0.7 and it
will be removed in libbpf 1.0.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-5-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Andrii Nakryiko e21d585cb3 libbpf: Deprecate multi-instance bpf_program APIs
Schedule deprecation of a set of APIs that are related to multi-instance
bpf_programs:
  - bpf_program__set_prep() ([0]);
  - bpf_program__{set,unset}_instance() ([1]);
  - bpf_program__nth_fd().

These APIs are obscure, very niche, and don't seem to be used much in
practice. bpf_program__set_prep() is pretty useless for anything but the
simplest BPF programs, as it doesn't allow to adjust BPF program load
attributes, among other things. In short, it already bitrotted and will
bitrot some more if not removed.

With bpf_program__insns() API, which gives access to post-processed BPF
program instructions of any given entry-point BPF program, it's now
possible to do whatever necessary adjustments were possible with
set_prep() API before, but also more. Given any such use case is
automatically an advanced use case, requiring users to stick to
low-level bpf_prog_load() APIs and managing their own prog FDs is
reasonable.

  [0] Closes: https://github.com/libbpf/libbpf/issues/299
  [1] Closes: https://github.com/libbpf/libbpf/issues/300

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-4-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Andrii Nakryiko 65a7fa2e4e libbpf: Add ability to fetch bpf_program's underlying instructions
Add APIs providing read-only access to bpf_program BPF instructions ([0]).
This is useful for diagnostics purposes, but it also allows a cleaner
support for cloning BPF programs after libbpf did all the FD resolution
and CO-RE relocations, subprog instructions appending, etc. Currently,
cloning BPF program is possible only through hijacking a half-broken
bpf_program__set_prep() API, which doesn't really work well for anything
but most primitive programs. For instance, set_prep() API doesn't allow
adjusting BPF program load parameters which are necessary for loading
fentry/fexit BPF programs (the case where BPF program cloning is
a necessity if doing some sort of mass-attachment functionality).

Given bpf_program__set_prep() API is set to be deprecated, having
a cleaner alternative is a must. libbpf internally already keeps track
of linear array of struct bpf_insn, so it's not hard to expose it. The
only gotcha is that libbpf previously freed instructions array during
bpf_object load time, which would make this API much less useful overall,
because in between bpf_object__open() and bpf_object__load() a lot of
changes to instructions are done by libbpf.

So this patch makes libbpf hold onto prog->insns array even after BPF
program loading. I think this is a small price for added functionality
and improved introspection of BPF program code.

See retsnoop PR ([1]) for how it can be used in practice and code
savings compared to relying on bpf_program__set_prep().

  [0] Closes: https://github.com/libbpf/libbpf/issues/298
  [1] https://github.com/anakryiko/retsnoop/pull/1

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-3-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Andrii Nakryiko de5d0dcef6 libbpf: Fix off-by-one bug in bpf_core_apply_relo()
Fix instruction index validity check which has off-by-one error.

Fixes: 3ee4f53355 ("libbpf: Split bpf_core_apply_relo() into bpf_program independent helper.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-2-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Andrii Nakryiko c825f5fee1 libbpf: Fix BTF header parsing checks
Original code assumed fixed and correct BTF header length. That's not
always the case, though, so fix this bug with a proper additional check.
And use actual header length instead of sizeof(struct btf_header) in
sanity checks.

Fixes: 8a138aed4a ("bpf: btf: Add BTF support to libbpf")
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211023003157.726961-2-andrii@kernel.org
2021-10-22 17:33:31 -07:00
Andrii Nakryiko 5245dafe3d libbpf: Fix overflow in BTF sanity checks
btf_header's str_off+str_len or type_off+type_len can overflow as they
are u32s. This will lead to bypassing the sanity checks during BTF
parsing, resulting in crashes afterwards. Fix by using 64-bit signed
integers for comparison.

Fixes: d812362450 ("libbpf: Fix BTF data layout checks and allow empty BTF")
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211023003157.726961-1-andrii@kernel.org
2021-10-22 17:33:31 -07:00
Stanislav Fomichev a77f879ba1 libbpf: Use func name when pinning programs with LIBBPF_STRICT_SEC_NAME
We can't use section name anymore because they are not unique
and pinning objects with multiple programs with the same
progtype/secname will fail.

  [0] Closes: https://github.com/libbpf/libbpf/issues/273

Fixes: 33a2c75c55 ("libbpf: add internal pin_name")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20211021214814.1236114-2-sdf@google.com
2021-10-22 16:53:11 -07:00
Hengqi Chen 6a886de070 libbpf: Add btf__type_cnt() and btf__raw_data() APIs
Add btf__type_cnt() and btf__raw_data() APIs and deprecate
btf__get_nr_type() and btf__get_raw_data() since the old APIs
don't follow the libbpf naming convention for getters which
omit 'get' in the name (see [0]). btf__raw_data() is just an
alias to the existing btf__get_raw_data(). btf__type_cnt()
now returns the number of all types of the BTF object
including 'void'.

  [0] Closes: https://github.com/libbpf/libbpf/issues/279

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211022130623.1548429-2-hengqi.chen@gmail.com
2021-10-22 16:09:14 -07:00
Mauricio Vásquez 1000298c76 libbpf: Fix memory leak in btf__dedup()
Free btf_dedup if btf_ensure_modifiable() returns error.

Fixes: 919d2b1dbb ("libbpf: Allow modification of BTF and add btf__add_str API")
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211022202035.48868-1-mauricio@kinvolk.io
2021-10-22 16:00:53 -07:00
Andrii Nakryiko fae1b05e6f libbpf: Fix the use of aligned attribute
Building libbpf sources out of kernel tree (in Github repo) we run into
compilation error due to unknown __aligned attribute. It must be coming
from some kernel header, which is not available to Github sources. Use
explicit __attribute__((aligned(16))) instead.

Fixes: 961632d541 ("libbpf: Fix dumping non-aligned __int128")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022192502.2975553-1-andrii@kernel.org
2021-10-22 14:24:36 -07:00
David S. Miller bdfa75ad70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Lots of simnple overlapping additions.

With a build fix from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-22 11:41:16 +01:00
Andrii Nakryiko 26071635ac libbpf: Simplify look up by name of internal maps
Map name that's assigned to internal maps (.rodata, .data, .bss, etc)
consist of a small prefix of bpf_object's name and ELF section name as
a suffix. This makes it hard for users to "guess" the name to use for
looking up by name with bpf_object__find_map_by_name() API.

One proposal was to drop object name prefix from the map name and just
use ".rodata", ".data", etc, names. One downside called out was that
when multiple BPF applications are active on the host, it will be hard
to distinguish between multiple instances of .rodata and know which BPF
object (app) they belong to. Having few first characters, while quite
limiting, still can give a bit of a clue, in general.

Note, though, that btf_value_type_id for such global data maps (ARRAY)
points to DATASEC type, which encodes full ELF name, so tools like
bpftool can take advantage of this fact to "recover" full original name
of the map. This is also the reason why for custom .data.* and .rodata.*
maps libbpf uses only their ELF names and doesn't prepend object name at
all.

Another downside of such approach is that it is not backwards compatible
and, among direct use of bpf_object__find_map_by_name() API, will break
any BPF skeleton generated using bpftool that was compiled with older
libbpf version.

Instead of causing all this pain, libbpf will still generate map name
using a combination of object name and ELF section name, but it will
allow looking such maps up by their natural names, which correspond to
their respective ELF section names. This means non-truncated ELF section
names longer than 15 characters are going to be expected and supported.

With such set up, we get the best of both worlds: leave small bits of
a clue about BPF application that instantiated such maps, as well as
making it easy for user apps to lookup such maps at runtime. In this
sense it closes corresponding libbpf 1.0 issue ([0]).

BPF skeletons will continue using full names for lookups.

  [0] Closes: https://github.com/libbpf/libbpf/issues/275

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211021014404.2635234-10-andrii@kernel.org
2021-10-21 17:10:10 -07:00
Andrii Nakryiko aed659170a libbpf: Support multiple .rodata.* and .data.* BPF maps
Add support for having multiple .rodata and .data data sections ([0]).
.rodata/.data are supported like the usual, but now also
.rodata.<whatever> and .data.<whatever> are also supported. Each such
section will get its own backing BPF_MAP_TYPE_ARRAY, just like
.rodata and .data.

Multiple .bss maps are not supported, as the whole '.bss' name is
confusing and might be deprecated soon, as well as user would need to
specify custom ELF section with SEC() attribute anyway, so might as well
stick to just .data.* and .rodata.* convention.

User-visible map name for such new maps is going to be just their ELF
section names.

  [0] https://github.com/libbpf/libbpf/issues/274

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211021014404.2635234-8-andrii@kernel.org
2021-10-21 17:10:10 -07:00
Andrii Nakryiko 25bbbd7a44 libbpf: Remove assumptions about uniqueness of .rodata/.data/.bss maps
Remove internal libbpf assumption that there can be only one .rodata,
.data, and .bss map per BPF object. To achieve that, extend and
generalize the scheme that was used for keeping track of relocation ELF
sections. Now each ELF section has a temporary extra index that keeps
track of logical type of ELF section (relocations, data, read-only data,
BSS). Switch relocation to this scheme, as well as .rodata/.data/.bss
handling.

We don't yet allow multiple .rodata, .data, and .bss sections, but no
libbpf internal code makes an assumption that there can be only one of
each and thus they can be explicitly referenced by a single index. Next
patches will actually allow multiple .rodata and .data sections.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211021014404.2635234-5-andrii@kernel.org
2021-10-21 17:10:10 -07:00
Andrii Nakryiko ad23b72384 libbpf: Use Elf64-specific types explicitly for dealing with ELF
Minimize the usage of class-agnostic gelf_xxx() APIs from libelf. These
APIs require copying ELF data structures into local GElf_xxx structs and
have a more cumbersome API. BPF ELF file is defined to be always 64-bit
ELF object, even when intended to be run on 32-bit host architectures,
so there is no need to do class-agnostic conversions everywhere. BPF
static linker implementation within libbpf has been using Elf64-specific
types since initial implementation.

Add two simple helpers, elf_sym_by_idx() and elf_rel_by_idx(), for more
succinct direct access to ELF symbol and relocation records within ELF
data itself and switch all the GElf_xxx usage into Elf64_xxx
equivalents. The only remaining place within libbpf.c that's still using
gelf API is gelf_getclass(), as there doesn't seem to be a direct way to
get underlying ELF bitness.

No functional changes intended.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211021014404.2635234-4-andrii@kernel.org
2021-10-21 17:10:10 -07:00
Andrii Nakryiko 29a30ff501 libbpf: Extract ELF processing state into separate struct
Name currently anonymous internal struct that keeps ELF-related state
for bpf_object. Just a bit of clean up, no functional changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211021014404.2635234-3-andrii@kernel.org
2021-10-21 17:10:10 -07:00
Andrii Nakryiko b96c07f3b5 libbpf: Deprecate btf__finalize_data() and move it into libbpf.c
There isn't a good use case where anyone but libbpf itself needs to call
btf__finalize_data(). It was implemented for internal use and it's not
clear why it was made into public API in the first place. To function, it
requires active ELF data, which is stored inside bpf_object for the
duration of opening phase only. But the only BTF that needs bpf_object's
ELF is that bpf_object's BTF itself, which libbpf fixes up automatically
during bpf_object__open() operation anyways. There is no need for any
additional fix up and no reasonable scenario where it's useful and
appropriate.

Thus, btf__finalize_data() is just an API atavism and is better removed.
So this patch marks it as deprecated immediately (v0.6+) and moves the
code from btf.c into libbpf.c where it's used in the context of
bpf_object opening phase. Such code co-location allows to make code
structure more straightforward and remove bpf_object__section_size() and
bpf_object__variable_offset() internal helpers from libbpf_internal.h,
making them static. Their naming is also adjusted to not create
a wrong illusion that they are some sort of method of bpf_object. They
are internal helpers and are called appropriately.

This is part of libbpf 1.0 effort ([0]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/276

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211021014404.2635234-2-andrii@kernel.org
2021-10-21 17:10:10 -07:00
Ilya Leoshkevich 632f96d265 libbpf: Fix ptr_is_aligned() usages
Currently ptr_is_aligned() takes size, and not alignment, as a
parameter, which may be overly pessimistic e.g. for __i128 on s390,
which must be only 8-byte aligned. Fix by using btf__align_of().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211021104658.624944-2-iii@linux.ibm.com
2021-10-21 15:50:04 -07:00
Ilya Leoshkevich 961632d541 libbpf: Fix dumping non-aligned __int128
Non-aligned integers are dumped as bitfields, which is supported for at
most 64-bit integers. Fix by using the same trick as
btf_dump_float_data(): copy non-aligned values to the local buffer.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211013160902.428340-4-iii@linux.ibm.com
2021-10-20 11:40:45 -07:00
Ilya Leoshkevich c9e982b879 libbpf: Fix dumping big-endian bitfields
On big-endian arches not only bytes, but also bits are numbered in
reverse order (see e.g. S/390 ELF ABI Supplement, but this is also true
for other big-endian arches as well).

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211013160902.428340-3-iii@linux.ibm.com
2021-10-20 11:40:01 -07:00
Dave Marchevsky ebc7b50a38 libbpf: Migrate internal use of bpf_program__get_prog_info_linear
In preparation for bpf_program__get_prog_info_linear deprecation, move
the single use in libbpf to call bpf_obj_get_info_by_fd directly.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211011082031.4148337-2-davemarchevsky@fb.com
2021-10-20 10:35:49 -07:00
Adrian Hunter 6175047358 perf tools: Add support for PERF_RECORD_AUX_OUTPUT_HW_ID
The PERF_RECORD_AUX_OUTPUT_HW_ID event provides a way to match AUX output
data like Intel PT PEBS-via-PT back to the event that it came from, by
providing a hardware ID that is present in the AUX output.

Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20210907163903.11820-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-20 11:22:27 -03:00
Ian Rogers 92ec3cc94c tools lib: Adopt list_sort() from the kernel sources
Add list_sort.[ch] from the main kernel tree. The linux/bug.h #include
is removed due to conflicting definitions. Add check-headers and modify
perf build accordingly.

MANIFEST and python-ext-sources fixes suggested by Arnaldo.

Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Denys Zagorui <dzagorui@cisco.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nicholas Fraser <nfraser@codeweavers.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: ShihCheng Tu <mrtoastcheng@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20211015172132.1162559-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-20 10:30:59 -03:00
Quentin Monnet d51b6b2287 libbpf: Remove Makefile warnings on out-of-sync netlink.h/if_link.h
Although relying on some definitions from the netlink.h and if_link.h
headers copied into tools/include/uapi/linux/, libbpf does not need
those headers to stay entirely up-to-date with their original versions,
and the warnings emitted by the Makefile when it detects a difference
are usually just noise. Let's remove those warnings.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211010002528.9772-1-quentin@isovalent.com
2021-10-19 16:33:10 -07:00
Yonghong Song 223f903e9c bpf: Rename BTF_KIND_TAG to BTF_KIND_DECL_TAG
Patch set [1] introduced BTF_KIND_TAG to allow tagging
declarations for struct/union, struct/union field, var, func
and func arguments and these tags will be encoded into
dwarf. They are also encoded to btf by llvm for the bpf target.

After BTF_KIND_TAG is introduced, we intended to use it
for kernel __user attributes. But kernel __user is actually
a type attribute. Upstream and internal discussion showed
it is not a good idea to mix declaration attribute and
type attribute. So we proposed to introduce btf_type_tag
as a type attribute and existing btf_tag renamed to
btf_decl_tag ([2]).

This patch renamed BTF_KIND_TAG to BTF_KIND_DECL_TAG and some
other declarations with *_tag to *_decl_tag to make it clear
the tag is for declaration. In the future, BTF_KIND_TYPE_TAG
might be introduced per [3].

 [1] https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
 [2] https://reviews.llvm.org/D111588
 [3] https://reviews.llvm.org/D111199

Fixes: b5ea834dde ("bpf: Support for new btf kind BTF_KIND_TAG")
Fixes: 5b84bd1036 ("libbpf: Add support for BTF_KIND_TAG")
Fixes: 5c07f2fec0 ("bpftool: Add support for BTF_KIND_TAG")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com
2021-10-18 18:35:36 -07:00
Shunsuke Nakamura 3ff6d64e68 libperf tests: Fix test_stat_cpu
The `cpu` argument of perf_evsel__read() must specify the cpu index.

perf_cpu_map__for_each_cpu() is for iterating the cpu number (not index)
and is thus not appropriate for use with perf_evsel__read().

So, if there is an offline CPU, the cpu number specified in the argument
may point out of range because the cpu number and the cpu index are
different.

Fix test_stat_cpu().

Testing it:

  # make tests -C tools/lib/perf/
  make: Entering directory '/home/nakamura/kernel_src/linux-5.15-rc4_fix/tools/lib/perf'
  running static:
  - running tests/test-cpumap.c...OK
  - running tests/test-threadmap.c...OK
  - running tests/test-evlist.c...OK
  - running tests/test-evsel.c...OK
  running dynamic:
  - running tests/test-cpumap.c...OK
  - running tests/test-threadmap.c...OK
  - running tests/test-evlist.c...OK
  - running tests/test-evsel.c...OK
  make: Leaving directory '/home/nakamura/kernel_src/linux-5.15-rc4_fix/tools/lib/perf'

Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20211011083704.4108720-1-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-14 15:41:35 -03:00
Shunsuke Nakamura f304c8d949 libperf test evsel: Fix build error on !x86 architectures
In test_stat_user_read, following build error occurs except i386 and
x86_64 architectures:

tests/test-evsel.c:129:31: error: variable 'pc' set but not used [-Werror=unused-but-set-variable]
  struct perf_event_mmap_page *pc;

Fix build error.

Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211006095703.477826-1-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-14 15:41:35 -03:00
Hou Tao ccaf12d621 libbpf: Support detecting and attaching of writable tracepoint program
Program on writable tracepoint is BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
but its attachment is the same as BPF_PROG_TYPE_RAW_TRACEPOINT.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211004094857.30868-3-hotforest@gmail.com
2021-10-08 13:22:57 -07:00
Quentin Monnet b79c2ce3ba libbpf: Skip re-installing headers file if source is older than target
The "install_headers" target in libbpf's Makefile would unconditionally
export all API headers to the target directory. When those headers are
installed to compile another application, this means that make always
finds newer dependencies for the source files relying on those headers,
and deduces that the targets should be rebuilt.

Avoid that by making "install_headers" depend on the source header
files, and (re-)install them only when necessary.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211007194438.34443-2-quentin@isovalent.com
2021-10-08 11:47:34 -07:00
Riccardo Mancini 73e40c9bd4 libperf cpumap: Use binary search in perf_cpu_map__idx() as array are sorted
Since 7074674e73 ("perf cpumap: Maintain cpumaps ordered and
without dups") perf_cpu_map elements are sorted in ascending order.

This patch improves perf_cpu_map__idx() by using a binary search.

Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/f1543c15797169c21e8b205a4a6751159180580d.1629490974.git.rickyman7@gmail.com
[ Removed 'else' after if + return, declared variables where needed ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-08 11:29:16 -03:00
Jakub Kicinski 9fe1155233 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07 15:24:06 -07:00
Hengqi Chen 2088a3a71d libbpf: Deprecate bpf_{map,program}__{prev,next} APIs since v0.7
Deprecate bpf_{map,program}__{prev,next} APIs. Replace them with
a new set of APIs named bpf_object__{prev,next}_{program,map} which
follow the libbpf API naming convention ([0]). No functionality changes.

  [0] Closes: https://github.com/libbpf/libbpf/issues/296

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211003165844.4054931-2-hengqi.chen@gmail.com
2021-10-06 12:34:02 -07:00
Hengqi Chen 4a404a7e8a libbpf: Deprecate bpf_object__unload() API since v0.6
BPF objects are not reloadable after unload. Users are expected to use
bpf_object__close() to unload and free up resources in one operation.
No need to expose bpf_object__unload() as a public API, deprecate it
([0]).  Add bpf_object__unload() as an alias to internal
bpf_object_unload() and replace all bpf_object__unload() uses to avoid
compilation errors.

  [0] Closes: https://github.com/libbpf/libbpf/issues/290

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211002161000.3854559-1-hengqi.chen@gmail.com
2021-10-06 12:34:02 -07:00
Quentin Monnet 929bef4677 bpf: Use $(pound) instead of \# in Makefiles
Recent-ish versions of make do no longer consider number signs ("#") as
comment symbols when they are inserted inside of a macro reference or in
a function invocation. In such cases, the symbols should not be escaped.

There are a few occurrences of "\#" in libbpf's and samples' Makefiles.
In the former, the backslash is harmless, because grep associates no
particular meaning to the escaped symbol and reads it as a regular "#".
In samples' Makefile, recent versions of make will pass the backslash
down to the compiler, making the probe fail all the time and resulting
in the display of a warning about "make headers_install" being required,
even after headers have been installed.

A similar issue has been addressed at some other locations by commit
9564a8cf42 ("Kbuild: fix # escaping in .cmd files for future Make").
Let's address it for libbpf's and samples' Makefiles in the same
fashion, by using a "$(pound)" variable (pulled from
tools/scripts/Makefile.include for libbpf, or re-defined for the
samples).

Reference for the change in make:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57

Fixes: 2f38304127 ("libbpf: Make libbpf_version.h non-auto-generated")
Fixes: 07c3bbdb1a ("samples: bpf: print a warning about headers_install")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211006111049.20708-1-quentin@isovalent.com
2021-10-06 12:34:02 -07:00
Andrii Nakryiko 7ca6112159 libbpf: Add API that copies all BTF types from one BTF object to another
Add a bulk copying api, btf__add_btf(), that speeds up and simplifies
appending entire contents of one BTF object to another one, taking care
of copying BTF type data, adjusting resulting BTF type IDs according to
their new locations in the destination BTF object, as well as copying
and deduplicating all the referenced strings and updating all the string
offsets in new BTF types as appropriate.

This API is intended to be used from tools that are generating and
otherwise manipulating BTFs generically, such as pahole. In pahole's
case, this API is useful for speeding up parallelized BTF encoding, as
it allows pahole to offload all the intricacies of BTF type copying to
libbpf and handle the parallelization aspects of the process.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/bpf/20211006051107.17921-2-andrii@kernel.org
2021-10-06 15:35:46 +02:00
Kumar Kartikeya Dwivedi 18f4fccbf3 libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations
This change updates the BPF syscall loader to relocate BTF_KIND_FUNC
relocations, with support for weak kfunc relocations. The general idea
is to move map_fds to loader map, and also use the data for storing
kfunc BTF fds. Since both reuse the fd_array parameter, they need to be
kept together.

For map_fds, we reserve MAX_USED_MAPS slots in a region, and for kfunc,
we reserve MAX_KFUNC_DESCS. This is done so that insn->off has more
chances of being <= INT16_MAX than treating data map as a sparse array
and adding fd as needed.

When the MAX_KFUNC_DESCS limit is reached, we fall back to the sparse
array model, so that as long as it does remain <= INT16_MAX, we pass an
index relative to the start of fd_array.

We store all ksyms in an array where we try to avoid calling the
bpf_btf_find_by_name_kind helper, and also reuse the BTF fd that was
already stored. This also speeds up the loading process compared to
emitting calls in all cases, in later tests.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-9-memxor@gmail.com
2021-10-05 17:07:42 -07:00
Kumar Kartikeya Dwivedi 466b2e1397 libbpf: Resolve invalid weak kfunc calls with imm = 0, off = 0
Preserve these calls as it allows verifier to succeed in loading the
program if they are determined to be unreachable after dead code
elimination during program load. If not, the verifier will fail at
runtime. This is done for ext->is_weak symbols similar to the case for
variable ksyms.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-8-memxor@gmail.com
2021-10-05 17:07:42 -07:00
Kumar Kartikeya Dwivedi 9dbe601563 libbpf: Support kernel module function calls
This patch adds libbpf support for kernel module function call support.
The fd_array parameter is used during BPF program load to pass module
BTFs referenced by the program. insn->off is set to index into this
array, but starts from 1, because insn->off as 0 is reserved for
btf_vmlinux.

We try to use existing insn->off for a module, since the kernel limits
the maximum distinct module BTFs for kfuncs to 256, and also because
index must never exceed the maximum allowed value that can fit in
insn->off (INT16_MAX). In the future, if kernel interprets signed offset
as unsigned for kfunc calls, this limit can be increased to UINT16_MAX.

Also introduce a btf__find_by_name_kind_own helper to start searching
from module BTF's start id when we know that the BTF ID is not present
in vmlinux BTF (in find_ksym_btf_id).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211002011757.311265-7-memxor@gmail.com
2021-10-05 17:07:42 -07:00
Jakub Kicinski 6b7b0c3091 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2021-10-02

We've added 85 non-merge commits during the last 15 day(s) which contain
a total of 132 files changed, 13779 insertions(+), 6724 deletions(-).

The main changes are:

1) Massive update on test_bpf.ko coverage for JITs as preparatory work for
   an upcoming MIPS eBPF JIT, from Johan Almbladh.

2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool,
   with driver support for i40e and ice from Magnus Karlsson.

3) Add legacy uprobe support to libbpf to complement recently merged legacy
   kprobe support, from Andrii Nakryiko.

4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky.

5) Support saving the register state in verifier when spilling <8byte bounded
   scalar to the stack, from Martin Lau.

6) Add libbpf opt-in for stricter BPF program section name handling as part
   of libbpf 1.0 effort, from Andrii Nakryiko.

7) Add a document to help clarifying BPF licensing, from Alexei Starovoitov.

8) Fix skel_internal.h to propagate errno if the loader indicates an internal
   error, from Kumar Kartikeya Dwivedi.

9) Fix build warnings with -Wcast-function-type so that the option can later
   be enabled by default for the kernel, from Kees Cook.

10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it
    otherwise errors out when encountering them, from Toke Høiland-Jørgensen.

11) Teach libbpf to recognize specialized maps (such as for perf RB) and
    internally remove BTF type IDs when creating them, from Hengqi Chen.

12) Various fixes and improvements to BPF selftests.
====================

Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 19:58:02 -07:00
Hengqi Chen f731052325 libbpf: Support uniform BTF-defined key/value specification across all BPF maps
A bunch of BPF maps do not support specifying BTF types for key and value.
This is non-uniform and inconvenient[0]. Currently, libbpf uses a retry
logic which removes BTF type IDs when BPF map creation failed. Instead
of retrying, this commit recognizes those specialized maps and removes
BTF type IDs when creating BPF map.

  [0] Closes: https://github.com/libbpf/libbpf/issues/355

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210930161456.3444544-2-hengqi.chen@gmail.com
2021-10-01 15:31:50 -07:00
Andrii Nakryiko b0e875bac0 libbpf: Fix memory leak in strset
Free struct strset itself, not just its internal parts.

Fixes: 90d76d3ece ("libbpf: Extract internal set-of-strings datastructure APIs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211001185910.86492-1-andrii@kernel.org
2021-10-01 22:54:38 +02:00
Jakub Kicinski dd9a887b35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/phy/bcm7xxx.c
  d88fd1b546 ("net: phy: bcm7xxx: Fixed indirect MMD operations")
  f68d08c437 ("net: phy: bcm7xxx: Add EPHY entry for 72165")

net/sched/sch_api.c
  b193e15ac6 ("net: prevent user from passing illegal stab size")
  69508d4333 ("net_sched: Use struct_size() and flex_array_size() helpers")

Both cases trivial - adjacent code additions.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30 14:49:21 -07:00
Kumar Kartikeya Dwivedi 4729445b47 libbpf: Fix segfault in light skeleton for objects without BTF
When fed an empty BPF object, bpftool gen skeleton -L crashes at
btf__set_fd() since it assumes presence of obj->btf, however for
the sequence below clang adds no .BTF section (hence no BTF).

Reproducer:

  $ touch a.bpf.c
  $ clang -O2 -g -target bpf -c a.bpf.c
  $ bpftool gen skeleton -L a.bpf.o
  /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
  /* THIS FILE IS AUTOGENERATED! */

  struct a_bpf {
	struct bpf_loader_ctx ctx;
  Segmentation fault (core dumped)

The same occurs for files compiled without BTF info, i.e. without
clang's -g flag.

Fixes: 6723474373 (libbpf: Generate loader program out of BPF ELF file.)
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210930061634.1840768-1-memxor@gmail.com
2021-09-30 23:19:58 +02:00
Kumar Kartikeya Dwivedi e68ac00827 libbpf: Fix skel_internal.h to set errno on loader retval < 0
When the loader indicates an internal error (result of a checked bpf
system call), it returns the result in attr.test.retval. However, tests
that rely on ASSERT_OK_PTR on NULL (returned from light skeleton) may
miss that NULL denotes an error if errno is set to 0. This would result
in skel pointer being NULL, while ASSERT_OK_PTR returning 1, leading to
a SEGV on dereference of skel, because libbpf_get_error relies on the
assumption that errno is always set in case of error for ptr == NULL.

In particular, this was observed for the ksyms_module test. When
executed using `./test_progs -t ksyms`, prior tests manipulated errno
and the test didn't crash when it failed at ksyms_module load, while
using `./test_progs -t ksyms_module` crashed due to errno being
untouched.

Fixes: 6723474373 (libbpf: Generate loader program out of BPF ELF file.)
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210927145941.1383001-11-memxor@gmail.com
2021-09-29 20:42:32 -07:00
Toke Høiland-Jørgensen 161ecd5379 libbpf: Properly ignore STT_SECTION symbols in legacy map definitions
The previous patch to ignore STT_SECTION symbols only added the ignore
condition in one of them. This fails if there's more than one map
definition in the 'maps' section, because the subsequent modulus check will
fail, resulting in error messages like:

libbpf: elf: unable to determine legacy map definition size in ./xdpdump_xdp.o

Fix this by also ignoring STT_SECTION in the first loop.

Fixes: c3e8c44a90 ("libbpf: Ignore STT_SECTION symbols in 'maps' section")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210929213837.832449-1-toke@redhat.com
2021-09-29 15:50:32 -07:00
Alexei Starovoitov 66fe332417 libbpf: Make gen_loader data aligned.
Align gen_loader data to 8 byte boundary to make sure union bpf_attr,
bpf_insns and other structs are aligned.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210927145941.1383001-9-memxor@gmail.com
2021-09-29 13:27:19 -07:00
Andrii Nakryiko 7c80c87ad5 selftests/bpf: Switch sk_lookup selftests to strict SEC("sk_lookup") use
Update "sk_lookup/" definition to be a stand-alone type specifier,
with backwards-compatible prefix match logic in non-libbpf-1.0 mode.

Currently in selftests all the "sk_lookup/<whatever>" uses just use
<whatever> for duplicated unique name encoding, which is redundant as
BPF program's name (C function name) uniquely and descriptively
identifies the intended use for such BPF programs.

With libbpf's SEC_DEF("sk_lookup") definition updated, switch existing
sk_lookup programs to use "unqualified" SEC("sk_lookup") section names,
with no random text after it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-11-andrii@kernel.org
2021-09-28 13:51:20 -07:00
Andrii Nakryiko dd94d45cf0 libbpf: Add opt-in strict BPF program section name handling logic
Implement strict ELF section name handling for BPF programs. It utilizes
`libbpf_set_strict_mode()` framework and adds new flag: LIBBPF_STRICT_SEC_NAME.

If this flag is set, libbpf will enforce exact section name matching for
a lot of program types that previously allowed just partial prefix
match. E.g., if previously SEC("xdp_whatever_i_want") was allowed, now
in strict mode only SEC("xdp") will be accepted, which makes SEC("")
definitions cleaner and more structured. SEC() now won't be used as yet
another way to uniquely encode BPF program identifier (for that
C function name is better and is guaranteed to be unique within
bpf_object). Now SEC() is strictly BPF program type and, depending on
program type, extra load/attach parameter specification.

Libbpf completely supports multiple BPF programs in the same ELF
section, so multiple BPF programs of the same type/specification easily
co-exist together within the same bpf_object scope.

Additionally, a new (for now internal) convention is introduced: section
name that can be a stand-alone exact BPF program type specificator, but
also could have extra parameters after '/' delimiter. An example of such
section is "struct_ops", which can be specified by itself, but also
allows to specify the intended operation to be attached to, e.g.,
"struct_ops/dctcp_init". Note, that "struct_ops_some_op" is not allowed.
Such section definition is specified as "struct_ops+".

This change is part of libbpf 1.0 effort ([0], [1]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/271
  [1] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handling

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-10-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko d41ea045a6 libbpf: Complete SEC() table unification for BPF_APROG_SEC/BPF_EAPROG_SEC
Complete SEC() table refactoring towards unified form by rewriting
BPF_APROG_SEC and BPF_EAPROG_SEC definitions with
SEC_DEF(SEC_ATTACHABLE_OPT) (for optional expected_attach_type) and
SEC_DEF(SEC_ATTACHABLE) (mandatory expected_attach_type), respectively.
Drop BPF_APROG_SEC, BPF_EAPROG_SEC, and BPF_PROG_SEC_IMPL macros after
that, leaving SEC_DEF() macro as the only one used.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-9-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko 15ea31fadd libbpf: Refactor ELF section handler definitions
Refactor ELF section handler definitions table to use a set of flags and
unified SEC_DEF() macro. This allows for more succinct and table-like
set of definitions, and allows to more easily extend the logic without
adding more verbosity (this is utilized in later patches in the series).

This approach is also making libbpf-internal program pre-load callback
not rely on bpf_sec_def definition, which demonstrates that future
pluggable ELF section handlers will be able to achieve similar level of
integration without libbpf having to expose extra types and APIs.

For starters, update SEC_DEF() definitions and make them more succinct.
Also convert BPF_PROG_SEC() and BPF_APROG_COMPAT() definitions to
a common SEC_DEF() use.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-8-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko 13d35a0cf1 libbpf: Reduce reliance of attach_fns on sec_def internals
Move closer to not relying on bpf_sec_def internals that won't be part
of public API, when pluggable SEC() handlers will be allowed. Drop
pre-calculated prefix length, and in various helpers don't rely on this
prefix length availability. Also minimize reliance on knowing
bpf_sec_def's prefix for few places where section prefix shortcuts are
supported (e.g., tp vs tracepoint, raw_tp vs raw_tracepoint).

Given checking some string for having a given string-constant prefix is
such a common operation and so annoying to be done with pure C code, add
a small macro helper, str_has_pfx(), and reuse it throughout libbpf.c
where prefix comparison is performed. With __builtin_constant_p() it's
possible to have a convenient helper that checks some string for having
a given prefix, where prefix is either string literal (or compile-time
known string due to compiler optimization) or just a runtime string
pointer, which is quite convenient and saves a lot of typing and string
literal duplication.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-7-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko 12d9466d8b libbpf: Refactor internal sec_def handling to enable pluggability
Refactor internals of libbpf to allow adding custom SEC() handling logic
easily from outside of libbpf. To that effect, each SEC()-handling
registration sets mandatory program type/expected attach type for
a given prefix and can provide three callbacks called at different
points of BPF program lifetime:

  - init callback for right after bpf_program is initialized and
  prog_type/expected_attach_type is set. This happens during
  bpf_object__open() step, close to the very end of constructing
  bpf_object, so all the libbpf APIs for querying and updating
  bpf_program properties should be available;

  - pre-load callback is called right before BPF_PROG_LOAD command is
  called in the kernel. This callbacks has ability to set both
  bpf_program properties, as well as program load attributes, overriding
  and augmenting the standard libbpf handling of them;

  - optional auto-attach callback, which makes a given SEC() handler
  support auto-attachment of a BPF program through bpf_program__attach()
  API and/or BPF skeletons <skel>__attach() method.

Each callbacks gets a `long cookie` parameter passed in, which is
specified during SEC() handling. This can be used by callbacks to lookup
whatever additional information is necessary.

This is not yet completely ready to be exposed to the outside world,
mainly due to non-public nature of struct bpf_prog_load_params. Instead
of making it part of public API, we'll wait until the planned low-level
libbpf API improvements for BPF_PROG_LOAD and other typical bpf()
syscall APIs, at which point we'll have a public, probably OPTS-based,
way to fully specify BPF program load parameters, which will be used as
an interface for custom pre-load callbacks.

But this change itself is already a good first step to unify the BPF
program hanling logic even within the libbpf itself. As one example, all
the extra per-program type handling (sleepable bit, attach_btf_id
resolution, unsetting optional expected attach type) is now more obvious
and is gathered in one place.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-6-andrii@kernel.org
2021-09-28 13:51:19 -07:00
Andrii Nakryiko 9673268f03 libbpf: Add "tc" SEC_DEF which is a better name for "classifier"
As argued in [0], add "tc" ELF section definition for SCHED_CLS BPF
program type. "classifier" is a misleading terminology and should be
migrated away from.

  [0] https://lore.kernel.org/bpf/270e27b1-e5be-5b1c-b343-51bd644d0747@iogearbox.net/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-2-andrii@kernel.org
2021-09-28 13:51:19 -07:00
David S. Miller 4ccb9f03fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-09-28

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 14 day(s) which contain
a total of 11 files changed, 139 insertions(+), 53 deletions(-).

The main changes are:

1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk.

2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh.

3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann.

4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi.

5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer.

6) Fix return value handling for struct_ops BPF programs, from Hou Tao.

7) Various fixes to BPF selftests, from Jiri Benc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
,
2021-09-28 13:52:46 +01:00
Kumar Kartikeya Dwivedi bcfd367c28 libbpf: Fix segfault in static linker for objects without BTF
When a BPF object is compiled without BTF info (without -g),
trying to link such objects using bpftool causes a SIGSEGV due to
btf__get_nr_types accessing obj->btf which is NULL. Fix this by
checking for the NULL pointer, and return error.

Reproducer:
$ cat a.bpf.c
extern int foo(void);
int bar(void) { return foo(); }
$ cat b.bpf.c
int foo(void) { return 0; }
$ clang -O2 -target bpf -c a.bpf.c
$ clang -O2 -target bpf -c b.bpf.c
$ bpftool gen obj out a.bpf.o b.bpf.o
Segmentation fault (core dumped)

After fix:
$ bpftool gen obj out a.bpf.o b.bpf.o
libbpf: failed to find BTF info for object 'a.bpf.o'
Error: failed to link 'a.bpf.o': Unknown error -22 (-22)

Fixes: a46349227c (libbpf: Add linker extern resolution support for functions and global variables)
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210924023725.70228-1-memxor@gmail.com
2021-09-28 09:29:03 +02:00
Toke Høiland-Jørgensen c3e8c44a90 libbpf: Ignore STT_SECTION symbols in 'maps' section
When parsing legacy map definitions, libbpf would error out when
encountering an STT_SECTION symbol. This becomes a problem because some
versions of binutils will produce SECTION symbols for every section when
processing an ELF file, so BPF files run through 'strip' will end up with
such symbols, making libbpf refuse to load them.

There's not really any reason why erroring out is strictly necessary, so
change libbpf to just ignore SECTION symbols when parsing the ELF.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210927205810.715656-1-toke@redhat.com
2021-09-27 21:29:37 -07:00
Jakub Kicinski 2fcd14d0f7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/mptcp/protocol.c
  977d293e23 ("mptcp: ensure tx skbs always have the MPTCP ext")
  efe686ffce ("mptcp: ensure tx skbs always have the MPTCP ext")

same patch merged in both trees, keep net-next.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23 11:19:49 -07:00
Andrii Nakryiko cc10623c68 libbpf: Add legacy uprobe attaching support
Similarly to recently added legacy kprobe attach interface support
through tracefs, support attaching uprobes using the legacy interface if
host kernel doesn't support newer FD-based interface.

For uprobes event name consists of "libbpf_" prefix, PID, sanitized
binary path and offset within that binary. Structuraly the code is
aligned with kprobe logic refactoring in previous patch. struct
bpf_link_perf is re-used and all the same legacy_probe_name and
legacy_is_retprobe fields are used to ensure proper cleanup on
bpf_link__destroy().

Users should be aware, though, that on old kernels which don't support
FD-based interface for kprobe/uprobe attachment, if the application
crashes before bpf_link__destroy() is called, uprobe legacy
events will be left in tracefs. This is the same limitation as with
legacy kprobe interfaces.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-5-andrii@kernel.org
2021-09-21 19:40:09 -07:00
Andrii Nakryiko 46ed5fc33d libbpf: Refactor and simplify legacy kprobe code
Refactor legacy kprobe handling code to follow the same logic as uprobe
legacy logic added in the next patchs:
  - add append_to_file() helper that makes it simpler to work with
    tracefs file-based interface for creating and deleting probes;
  - move out probe/event name generation outside of the code that
    adds/removes it, which simplifies bookkeeping significantly;
  - change the probe name format to start with "libbpf_" prefix and
    include offset within kernel function;
  - switch 'unsigned long' to 'size_t' for specifying kprobe offsets,
    which is consistent with how uprobes define that, simplifies
    printf()-ing internally, and also avoids unnecessary complications on
    architectures where sizeof(long) != sizeof(void *).

This patch also implicitly fixes the problem with invalid open() error
handling present in poke_kprobe_events(), which (the function) this
patch removes.

Fixes: ca304b40c2 ("libbpf: Introduce legacy kprobe events support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-4-andrii@kernel.org
2021-09-21 19:40:09 -07:00
Andrii Nakryiko 303a257223 libbpf: Fix memory leak in legacy kprobe attach logic
In some error scenarios legacy_probe string won't be free()'d. Fix this.
This was reported by Coverity static analysis.

Fixes: ca304b40c2 ("libbpf: Introduce legacy kprobe events support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-2-andrii@kernel.org
2021-09-21 19:40:08 -07:00
Grant Seltzer 97c140d94e libbpf: Add doc comments in libbpf.h
This adds comments above functions in libbpf.h which document
their uses. These comments are of a format that doxygen and sphinx
can pick up and render. These are rendered by libbpf.readthedocs.org

These doc comments are for:
- bpf_object__find_map_by_name()
- bpf_map__fd()
- bpf_map__is_internal()
- libbpf_get_error()
- libbpf_num_possible_cpus()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210918031457.36204-1-grantseltzer@gmail.com
2021-09-20 17:23:31 -07:00
Ian Rogers aba5daeb64 libperf evsel: Make use of FD robust.
FD uses xyarray__entry that may return NULL if an index is out of
bounds. If NULL is returned then a segv happens as FD unconditionally
dereferences the pointer. This was happening in a case of with perf
iostat as shown below. The fix is to make FD an "int*" rather than an
int and handle the NULL case as either invalid input or a closed fd.

  $ sudo gdb --args perf stat --iostat  list
  ...
  Breakpoint 1, perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50
  50      {
  (gdb) bt
   #0  perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50
   #1  0x000055555585c188 in evsel__open_cpu (evsel=0x5555560951a0, cpus=0x555556093410,
      threads=0x555556086fb0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
   #2  0x000055555585cfb2 in evsel__open (evsel=0x5555560951a0, cpus=0x0, threads=0x555556086fb0)
      at util/evsel.c:2045
   #3  0x000055555585d0db in evsel__open_per_thread (evsel=0x5555560951a0, threads=0x555556086fb0)
      at util/evsel.c:2065
   #4  0x00005555558ece64 in create_perf_stat_counter (evsel=0x5555560951a0,
      config=0x555555c34700 <stat_config>, target=0x555555c2f1c0 <target>, cpu=0) at util/stat.c:590
   #5  0x000055555578e927 in __run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0)
      at builtin-stat.c:833
   #6  0x000055555578f3c6 in run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0)
      at builtin-stat.c:1048
   #7  0x0000555555792ee5 in cmd_stat (argc=1, argv=0x7fffffffe4a0) at builtin-stat.c:2534
   #8  0x0000555555835ed3 in run_builtin (p=0x555555c3f540 <commands+288>, argc=3,
      argv=0x7fffffffe4a0) at perf.c:313
   #9  0x0000555555836154 in handle_internal_command (argc=3, argv=0x7fffffffe4a0) at perf.c:365
   #10 0x000055555583629f in run_argv (argcp=0x7fffffffe2ec, argv=0x7fffffffe2e0) at perf.c:409
   #11 0x0000555555836692 in main (argc=3, argv=0x7fffffffe4a0) at perf.c:539
  ...
  (gdb) c
  Continuing.
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_iio_0/event=0x83,umask=0x04,ch_mask=0xF,fc_mask=0x07/).
  /bin/dmesg | grep -i perf may provide additional information.

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555559b03ea in perf_evsel__close_fd_cpu (evsel=0x5555560951a0, cpu=1) at evsel.c:166
  166                     if (FD(evsel, cpu, thread) >= 0)

v3. fixes a bug in perf_evsel__run_ioctl where the sense of a branch was
    backward.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210918054440.2350466-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-18 17:43:06 -03:00
Dave Marchevsky 6c66b0e7c9 libbpf: Use static const fmt string in __bpf_printk
The __bpf_printk convenience macro was using a 'char' fmt string holder
as it predates support for globals in libbpf. Move to more efficient
'static const char', but provide a fallback to the old way via
BPF_NO_GLOBAL_DATA so users on old kernels can still use the macro.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-6-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Dave Marchevsky c2758baa97 libbpf: Modify bpf_printk to choose helper based on arg count
Instead of being a thin wrapper which calls into bpf_trace_printk,
libbpf's bpf_printk convenience macro now chooses between
bpf_trace_printk and bpf_trace_vprintk. If the arg count (excluding
format string) is >3, use bpf_trace_vprintk, otherwise use the older
helper.

The motivation behind this added complexity - instead of migrating
entirely to bpf_trace_vprintk - is to maintain good developer experience
for users compiling against new libbpf but running on older kernels.
Users who are passing <=3 args to bpf_printk will see no change in their
bytecode.

__bpf_vprintk functions similarly to BPF_SEQ_PRINTF and BPF_SNPRINTF
macros elsewhere in the file - it allows use of bpf_trace_vprintk
without manual conversion of varargs to u64 array. Previous
implementation of bpf_printk macro is moved to __bpf_printk for use by
the new implementation.

This does change behavior of bpf_printk calls with >3 args in the "new
libbpf, old kernels" scenario. Before this patch, attempting to use 4
args to bpf_printk results in a compile-time error. After this patch,
using bpf_printk with 4 args results in a trace_vprintk helper call
being emitted and a load-time failure on older kernels.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-5-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Andrii Nakryiko 942025c9f3 libbpf: Constify all high-level program attach APIs
Attach APIs shouldn't need to modify bpf_program/bpf_map structs, so
change all struct bpf_program and struct bpf_map pointers to const
pointers. This is completely backwards compatible with no functional
change.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-8-andrii@kernel.org
2021-09-17 09:05:41 -07:00
Andrii Nakryiko 91b555d73e libbpf: Schedule open_opts.attach_prog_fd deprecation since v0.7
bpf_object_open_opts.attach_prog_fd makes a pretty strong assumption
that bpf_object contains either only single freplace BPF program or all
of BPF programs in BPF object are freplaces intended to replace
different subprograms of the same target BPF program. This seems both
a bit confusing, too assuming, and limiting.

We've had bpf_program__set_attach_target() API which allows more
fine-grained control over this, on a per-program level. As such, mark
open_opts.attach_prog_fd as deprecated starting from v0.7, so that we
have one more universal way of setting freplace targets. With previous
change to allow NULL attach_func_name argument, and especially combined
with BPF skeleton, arguable bpf_program__set_attach_target() is a more
convenient and explicit API as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-7-andrii@kernel.org
2021-09-17 09:05:41 -07:00
Andrii Nakryiko 2d5ec1c66e libbpf: Allow skipping attach_func_name in bpf_program__set_attach_target()
Allow to use bpf_program__set_attach_target to only set target attach
program FD, while letting libbpf to use target attach function name from
SEC() definition. This might be useful for some scenarios where
bpf_object contains multiple related freplace BPF programs intended to
replace different sub-programs in target BPF program. In such case all
programs will have the same attach_prog_fd, but different
attach_func_name. It's convenient to specify such target function names
declaratively in SEC() definitions, but attach_prog_fd is a dynamic
runtime setting.

To simplify such scenario, allow bpf_program__set_attach_target() to
delay BTF ID resolution till the BPF program load time by providing NULL
attach_func_name. In that case the behavior will be similar to using
bpf_object_open_opts.attach_prog_fd (which is marked deprecated since
v0.7), but has the benefit of allowing more control by user in what is
attached to what. Such setup allows having BPF programs attached to
different target attach_prog_fd with target functions still declaratively
recorded in BPF source code in SEC() definitions.

Selftests changes in the next patch should make this more obvious.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-5-andrii@kernel.org
2021-09-17 09:05:00 -07:00
Andrii Nakryiko 277641859e libbpf: Deprecated bpf_object_open_opts.relaxed_core_relocs
It's relevant and hasn't been doing anything for a long while now.
Deprecated it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-4-andrii@kernel.org
2021-09-17 09:04:12 -07:00
Andrii Nakryiko f11f86a393 libbpf: Use pre-setup sec_def in libbpf_find_attach_btf_id()
Don't perform another search for sec_def inside
libbpf_find_attach_btf_id(), as each recognized bpf_program already has
prog->sec_def set.

Also remove unnecessary NULL check for prog->sec_name, as it can never
be NULL.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210916015836.1248906-2-andrii@kernel.org
2021-09-17 09:04:12 -07:00
Grant Seltzer 69cd823956 libbpf: Add sphinx code documentation comments
This adds comments above five functions in btf.h which document
their uses. These comments are of a format that doxygen and sphinx
can pick up and render. These are rendered by libbpf.readthedocs.org

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210915021951.117186-1-grantseltzer@gmail.com
2021-09-15 13:16:02 -07:00
Yonghong Song 5b84bd1036 libbpf: Add support for BTF_KIND_TAG
Add BTF_KIND_TAG support for parsing and dedup.
Also added sanitization for BTF_KIND_TAG. If BTF_KIND_TAG is not
supported in the kernel, sanitize it to INTs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223025.246687-1-yhs@fb.com
2021-09-14 18:45:52 -07:00
Yonghong Song 30025e8bd8 libbpf: Rename btf_{hash,equal}_int to btf_{hash,equal}_int_tag
This patch renames functions btf_{hash,equal}_int() to
btf_{hash,equal}_int_tag() so they can be reused for
BTF_KIND_TAG support. There is no functionality change for
this patch.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210914223020.245829-1-yhs@fb.com
2021-09-14 18:45:52 -07:00
Andrii Nakryiko b6291a6f30 libbpf: Minimize explicit iterator of section definition array
Remove almost all the code that explicitly iterated BPF program section
definitions in favor of using find_sec_def(). The only remaining user of
section_defs is libbpf_get_type_names that has to iterate all of them to
construct its result.

Having one internal API entry point for section definitions will
simplify further refactorings around libbpf's program section
definitions parsing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-5-andrii@kernel.org
2021-09-14 15:49:24 -07:00
Andrii Nakryiko 5532dfd42e libbpf: Simplify BPF program auto-attach code
Remove the need to explicitly pass bpf_sec_def for auto-attachable BPF
programs, as it is already recorded at bpf_object__open() time for all
recognized type of BPF programs. This further reduces number of explicit
calls to find_sec_def(), simplifying further refactorings.

No functional changes are done by this patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-4-andrii@kernel.org
2021-09-14 15:49:24 -07:00
Andrii Nakryiko 91b4d1d1d5 libbpf: Ensure BPF prog types are set before relocations
Refactor bpf_object__open() sequencing to perform BPF program type
detection based on SEC() definitions before we get to relocations
collection. This allows to have more information about BPF program by
the time we get to, say, struct_ops relocation gathering. This,
subsequently, simplifies struct_ops logic and removes the need to
perform extra find_sec_def() resolution.

With this patch libbpf will require all struct_ops BPF programs to be
marked with SEC("struct_ops") or SEC("struct_ops/xxx") annotations.
Real-world applications are already doing that through something like
selftests's BPF_STRUCT_OPS() macro. This change streamlines libbpf's
internal handling of SEC() definitions and is in the sprit of
upcoming libbpf-1.0 section strictness changes ([0]).

  [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handling

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914014733.2768-3-andrii@kernel.org
2021-09-14 15:49:24 -07:00
Rafael David Tinoco ca304b40c2 libbpf: Introduce legacy kprobe events support
Allow kprobe tracepoint events creation through legacy interface, as the
kprobe dynamic PMUs support, used by default, was only created in v4.17.

Store legacy kprobe name in struct bpf_perf_link, instead of creating
a new "subclass" off of bpf_perf_link. This is ok as it's just two new
fields, which are also going to be reused for legacy uprobe support in
follow up patches.

Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210912064844.3181742-1-rafaeldtinoco@gmail.com
2021-09-14 14:44:45 -07:00
Andrii Nakryiko 2f38304127 libbpf: Make libbpf_version.h non-auto-generated
Turn previously auto-generated libbpf_version.h header into a normal
header file. This prevents various tricky Makefile integration issues,
simplifies the overall build process, but also allows to further extend
it with some more versioning-related APIs in the future.

To prevent accidental out-of-sync versions as defined by libbpf.map and
libbpf_version.h, Makefile checks their consistency at build time.

Simultaneously with this change bump libbpf.map to v0.6.

Also undo adding libbpf's output directory into include path for
kernel/bpf/preload, bpftool, and resolve_btfids, which is not necessary
because libbpf_version.h is just a normal header like any other.

Fixes: 0b46b75505 ("libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210913222309.3220849-1-andrii@kernel.org
2021-09-13 15:36:47 -07:00
Quentin Monnet 0b46b75505 libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations
Introduce a macro LIBBPF_DEPRECATED_SINCE(major, minor, message) to prepare
the deprecation of two API functions. This macro marks functions as deprecated
when libbpf's version reaches the values passed as an argument.

As part of this change libbpf_version.h header is added with recorded major
(LIBBPF_MAJOR_VERSION) and minor (LIBBPF_MINOR_VERSION) libbpf version macros.
They are now part of libbpf public API and can be relied upon by user code.
libbpf_version.h is installed system-wide along other libbpf public headers.

Due to this new build-time auto-generated header, in-kernel applications
relying on libbpf (resolve_btfids, bpftool, bpf_preload) are updated to
include libbpf's output directory as part of a list of include search paths.
Better fix would be to use libbpf's make_install target to install public API
headers, but that clean up is left out as a future improvement. The build
changes were tested by building kernel (with KBUILD_OUTPUT and O= specified
explicitly), bpftool, libbpf, selftests/bpf, and resolve_btfids builds. No
problems were detected.

Note that because of the constraints of the C preprocessor we have to write
a few lines of macro magic for each version used to prepare deprecation (0.6
for now).

Also, use LIBBPF_DEPRECATED_SINCE() to schedule deprecation of
btf__get_from_id() and btf__load(), which are replaced by
btf__load_from_kernel_by_id() and btf__load_into_kernel(), respectively,
starting from future libbpf v0.6. This is part of libbpf 1.0 effort ([0]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/278

Co-developed-by: Quentin Monnet <quentin@isovalent.com>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210908213226.1871016-1-andrii@kernel.org
2021-09-09 23:28:05 +02:00
Andrii Nakryiko 006a5099fc libbpf: Fix build with latest gcc/binutils with LTO
After updating to binutils 2.35, the build began to fail with an
assembler error. A bug was opened on the Red Hat Bugzilla a few days
later for the same issue.

Work around the problem by using the new `symver` attribute (introduced
in GCC 10) as needed instead of assembler directives.

This addresses Red Hat ([0]) and OpenSUSE ([1]) bug reports, as well as libbpf
issue ([2]).

  [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1863059
  [1]: https://bugzilla.opensuse.org/show_bug.cgi?id=1188749
  [2]: Closes: https://github.com/libbpf/libbpf/issues/338

Co-developed-by: Patrick McCarty <patrick.mccarty@intel.com>
Co-developed-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Patrick McCarty <patrick.mccarty@intel.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210907221023.2660953-1-andrii@kernel.org
2021-09-07 19:32:04 -07:00
Matt Smith 08a6f22ef6 libbpf: Change bpf_object_skeleton data field to const pointer
This change was necessary to enforce the implied contract
that bpf_object_skeleton->data should not be mutated.  The data
will be cast to `void *` during assignment to handle the case
where a user is compiling with older libbpf headers to avoid
a compiler warning of `const void *` data being cast to `void *`

Signed-off-by: Matt Smith <alastorze@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210901194439.3853238-2-alastorze@fb.com
2021-09-07 17:33:49 -07:00
Toke Høiland-Jørgensen 03e601f48b libbpf: Don't crash on object files with no symbol tables
If libbpf encounters an ELF file that has been stripped of its symbol
table, it will crash in bpf_object__add_programs() when trying to
dereference the obj->efile.symbols pointer.

Fix this by erroring out of bpf_object__elf_collect() if it is not able
able to find the symbol table.

v2:
  - Move check into bpf_object__elf_collect() and add nice error message

Fixes: 6245947c1b ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210901114812.204720-1-toke@redhat.com
2021-09-07 17:18:48 -07:00
Linus Torvalds 27151f1778 perf tools changes for v5.15:
New features:
 
 - Improvements for the flamegraph python script, including:
 
   - Display perf.data header
   - Display PIDs of user stacks
   - Added option to change color scheme
   - Default to blue/green color scheme to improve accessibility
   - Correctly identify kernel stacks when debuginfo is available
 
 - Improvements for 'perf bench futex':
   - Add --mlockall parameter
   - Add --broadcast and --pi to the 'requeue' sub benchmark
 
 - Add support for PMU aliases.
 
 - Introduce an ARM Coresight ETE decoder.
 
 - Add a 'perf bench' entry for evlist open/close operations, to help quantify
   improvements with multithreading 'perf record'.
 
 - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf script's
   python scripting.
 
 - Add a 'perf test' entry for PMU aliases.
 
 - Add a 'perf test' entry for 'perf record/perf report/perf script' pipe mode.
 
 Fixes:
 
 - perf script dlfilter (API for filtering via dynamically loaded shared object
   introduced in v5.14) fixes and a 'perf test' entry for it.
 
 - Fix get_current_dir_name() compilation on Android.
 
 - Fix issues with asciidoc and double dashes uses.
 
 - Fix memory leaks in the BTF handling code.
 
 - Fix leftover problems in the Documentation from the infrastructure originally
   lifted from the git codebase.
 
 - Fix *probe_vfs_getname.sh 'perf test' failures.
 
 - Handle fd gaps in 'perf test's test__dso_data_reopen().
 
 - Make sure to show disasembly warnings for 'perf annotate --stdio'.
 
 - Fix output from pipe to file and vice-versa in 'perf record/report/script'.
 
 - Correct 'perf data -h' output.
 
 - Fix wrong comm in system-wide mode with 'perf record --delay'.
 
 - Do not allow --for-each-cgroup without cpu in 'perf stat'
 
 - Make 'perf test --skip' work on shell tests.
 
 - Fix libperf's verbose printing.
 
 Misc improvements:
 
 - Preparatory patches for multithreading varios 'perf record' phases
   (synthesizing, opening, recording, etc).
 
 - Add sparse context/locking annotations in compiler-types.h, also to help with
   the multithreading effort.
 
 - Optimize the generation of the arch specific erno tables used in 'perf trace'.
 
 - Optimize libperf's perf_cpu_map__max().
 
 - Improve ARM's CoreSight warnings.
 
 - Report collisions in AUX records.
 
 - Improve warnings for the LLVM 'perf test' entry.
 
 - Improve the PMU events 'perf test' codebase.
 
 - perf test: Do not compare overheads in the zstd comp test
 
 - Better support annotation on ARM.
 
 - Update 'perf trace's cmd string table to decode sys_bpf() first arg.
 
 Vendor events:
 
 - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and Elhart Lake.
 
 - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake servers.
 
 Hardware tracing:
 
 - Improvements for the ARM hardware tracing auxtrace support.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYTO8OAAKCRCyPKLppCJ+
 J/N0AQCTAfxKsVyh9vjIY5vwHvNkYMDM4Wvs7TmlfXbVgKLP3AEA0wYlTyar/teI
 NC6jQpipxpWASFUJbLSmU8qn/XFXwQo=
 =4r0w
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Improvements for the flamegraph python script, including:
       - Display perf.data header
       - Display PIDs of user stacks
       - Added option to change color scheme
       - Default to blue/green color scheme to improve accessibility
       - Correctly identify kernel stacks when debuginfo is available

   - Improvements for 'perf bench futex':
       - Add --mlockall parameter
       - Add --broadcast and --pi to the 'requeue' sub benchmark

   - Add support for PMU aliases.

   - Introduce an ARM Coresight ETE decoder.

   - Add a 'perf bench' entry for evlist open/close operations, to help
     quantify improvements with multithreading 'perf record'.

   - Allow reporting the [un]throttle PERF_RECORD_ meta event in 'perf
     script's python scripting.

   - Add a 'perf test' entry for PMU aliases.

   - Add a 'perf test' entry for 'perf record/perf report/perf script'
     pipe mode.

  Fixes:

   - perf script dlfilter (API for filtering via dynamically loaded
     shared object introduced in v5.14) fixes and a 'perf test' entry
     for it.

   - Fix get_current_dir_name() compilation on Android.

   - Fix issues with asciidoc and double dashes uses.

   - Fix memory leaks in the BTF handling code.

   - Fix leftover problems in the Documentation from the infrastructure
     originally lifted from the git codebase.

   - Fix *probe_vfs_getname.sh 'perf test' failures.

   - Handle fd gaps in 'perf test's test__dso_data_reopen().

   - Make sure to show disasembly warnings for 'perf annotate --stdio'.

   - Fix output from pipe to file and vice-versa in 'perf
     record/report/script'.

   - Correct 'perf data -h' output.

   - Fix wrong comm in system-wide mode with 'perf record --delay'.

   - Do not allow --for-each-cgroup without cpu in 'perf stat'

   - Make 'perf test --skip' work on shell tests.

   - Fix libperf's verbose printing.

  Misc improvements:

   - Preparatory patches for multithreading various 'perf record' phases
     (synthesizing, opening, recording, etc).

   - Add sparse context/locking annotations in compiler-types.h, also to
     help with the multithreading effort.

   - Optimize the generation of the arch specific erno tables used in
     'perf trace'.

   - Optimize libperf's perf_cpu_map__max().

   - Improve ARM's CoreSight warnings.

   - Report collisions in AUX records.

   - Improve warnings for the LLVM 'perf test' entry.

   - Improve the PMU events 'perf test' codebase.

   - perf test: Do not compare overheads in the zstd comp test

   - Better support annotation on ARM.

   - Update 'perf trace's cmd string table to decode sys_bpf() first
     arg.

  Vendor events:

   - Add JSON events and metrics for Intel's Ice Lake, Tiger Lake and
     Elhart Lake.

   - Update JSON eventsand metrics for Intel's Cascade Lake and Sky Lake
     servers.

  Hardware tracing:

   - Improvements for the ARM hardware tracing auxtrace support"

* tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (130 commits)
  perf tests: Add test for PMU aliases
  perf pmu: Add PMU alias support
  perf session: Report collisions in AUX records
  perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event
  perf build: Report failure for testing feature libopencsd
  perf cs-etm: Show a warning for an unknown magic number
  perf cs-etm: Print the decoder name
  perf cs-etm: Create ETE decoder
  perf cs-etm: Update OpenCSD decoder for ETE
  perf cs-etm: Fix typo
  perf cs-etm: Save TRCDEVARCH register
  perf cs-etm: Refactor out ETMv4 header saving
  perf cs-etm: Initialise architecture based on TRCIDR1
  perf cs-etm: Refactor initialisation of decoder params.
  tools build: Fix feature detect clean for out of source builds
  perf evlist: Add evlist__for_each_entry_from() macro
  perf evsel: Handle precise_ip fallback in evsel__open_cpu()
  perf evsel: Move bpf_counter__install_pe() to success path in evsel__open_cpu()
  perf evsel: Move test_attr__open() to success path in evsel__open_cpu()
  perf evsel: Move ignore_missing_thread() to fallback code
  ...
2021-09-05 11:56:18 -07:00
Riccardo Mancini 6e93bc534f libperf cpumap: Take into advantage it is sorted to optimize perf_cpu_map__max()
From commit 7074674e73 ("perf cpumap: Maintain cpumaps ordered and
without dups"), perf_cpu_map elements are sorted in ascending order.

This patch improves the perf_cpu_map__max function by returning the last
element.

Committer notes:

Do it as a ternary to keep it in just one return line, add a comment
explaining it is sorted and what functions does it.

Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/fb79f02e7b86ea8044d563adb1e9890c906f982f.1629490974.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-31 16:17:02 -03:00
Riccardo Mancini b75f299d69 libsubcmd: add OPT_UINTEGER_OPTARG option type
This patch adds OPT_UINTEGER_OPTARG, which is the same as OPT_UINTEGER,
but also makes it possible to use the option without any value, setting
the variable to a default value, d.

Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/c46749b3dff796729078352ff164d363457a3587.1629490974.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-31 15:44:05 -03:00
Arnaldo Carvalho de Melo c635813fef Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-30 10:05:46 -03:00
Shunsuke Nakamura 37c3193fa4 libperf tests: Fix verbose printing
libperf's verbose printing checks the -v option every time the macro _T_ START
is called.

Since there are currently four libperf tests registered, the macro _T_ START is
called four times, but verbose printing after the second time is not output.

Resets the index of the element processed by getopt() and fix verbose printing
so that it prints in all tests.

Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210820093908.734503-3-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-24 15:05:35 -03:00
Andrii Nakryiko 5e3b8356de libbpf: Add uprobe ref counter offset support for USDT semaphores
When attaching to uprobes through perf subsystem, it's possible to specify
offset of a so-called USDT semaphore, which is just a reference counted u16,
used by kernel to keep track of how many tracers are attached to a given
location. Support for this feature was added in [0], so just wire this through
uprobe_opts. This is important to enable implementing USDT attachment and
tracing through libbpf's bpf_program__attach_uprobe_opts() API.

  [0] a6ca88b241 ("trace_uprobe: support reference counter in fd-based uprobe")

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-16-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko 47faff3717 libbpf: Add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs
Wire through bpf_cookie for all attach APIs that use perf_event_open under the
hood:
  - for kprobes, extend existing bpf_kprobe_opts with bpf_cookie field;
  - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
    pass bpf_cookie through opts.

For kernel that don't support BPF_LINK_CREATE for perf_events, and thus
bpf_cookie is not supported either, return error and log warning for user.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-12-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko 3ec84f4b16 libbpf: Add bpf_cookie support to bpf_link_create() API
Add ability to specify bpf_cookie value when creating BPF perf link with
bpf_link_create() low-level API.

Given BPF_LINK_CREATE command is growing and keeps getting new fields that are
specific to the type of BPF_LINK, extend libbpf side of bpf_link_create() API
and corresponding OPTS struct to accomodate such changes. Add extra checks to
prevent using incompatible/unexpected combinations of fields.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-11-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko 668ace0ea5 libbpf: Use BPF perf link when supported by kernel
Detect kernel support for BPF perf link and prefer it when attaching to
perf_event, tracepoint, kprobe/uprobe. Underlying perf_event FD will be kept
open until BPF link is destroyed, at which point both perf_event FD and BPF
link FD will be closed.

This preserves current behavior in which perf_event FD is open for the
duration of bpf_link's lifetime and user is able to "disconnect" bpf_link from
underlying FD (with bpf_link__disconnect()), so that bpf_link__destroy()
doesn't close underlying perf_event FD.When BPF perf link is used, disconnect
will keep both perf_event and bpf_link FDs open, so it will be up to
(advanced) user to close them. This approach is demonstrated in bpf_cookie.c
selftests, added in this patch set.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-10-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko d88b71d4a9 libbpf: Remove unused bpf_link's destroy operation, but add dealloc
bpf_link->destroy() isn't used by any code, so remove it. Instead, add ability
to override deallocation procedure, with default doing plain free(link). This
is necessary for cases when we want to "subclass" struct bpf_link to keep
extra information, as is the case in the next patch adding struct
bpf_link_perf.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-9-andrii@kernel.org
2021-08-17 00:45:08 +02:00
Andrii Nakryiko 61c7aa5020 libbpf: Re-build libbpf.so when libbpf.map changes
Ensure libbpf.so is re-built whenever libbpf.map is modified.  Without this,
changes to libbpf.map are not detected and versioned symbols mismatch error
will be reported until `make clean && make` is used, which is a suboptimal
developer experience.

Fixes: 306b267cb3 ("libbpf: Verify versioned symbols")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-8-andrii@kernel.org
2021-08-17 00:45:07 +02:00
Hao Luo 2211c825e7 libbpf: Support weak typed ksyms.
Currently weak typeless ksyms have default value zero, when they don't
exist in the kernel. However, weak typed ksyms are rejected by libbpf
if they can not be resolved. This means that if a bpf object contains
the declaration of a nonexistent weak typed ksym, it will be rejected
even if there is no program that references the symbol.

Nonexistent weak typed ksyms can also default to zero just like
typeless ones. This allows programs that access weak typed ksyms to be
accepted by verifier, if the accesses are guarded. For example,

extern const int bpf_link_fops3 __ksym __weak;

/* then in BPF program */

if (&bpf_link_fops3) {
   /* use bpf_link_fops3 */
}

If actual use of nonexistent typed ksym is not guarded properly,
verifier would see that register is not PTR_TO_BTF_ID and wouldn't
allow to use it for direct memory reads or passing it to BPF helpers.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210812003819.2439037-1-haoluo@google.com
2021-08-13 15:56:28 -07:00
Jakub Kicinski f4083a752a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f2580 ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda65 ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557d ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a376 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 06:41:22 -07:00
Jin Yao 2696d6e59c libperf: Add perf_cpu_map__default_new()
libperf already has a static function called 'cpu_map__default_new()'.

Add a new API perf_cpu_map__default_new() to export the function.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https //lore.kernel.org/r/20210723063433.7318-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-11 16:03:36 -03:00
Daniel Xu c34c338a40 libbpf: Do not close un-owned FD 0 on errors
Before this patch, btf_new() was liable to close an arbitrary FD 0 if
BTF parsing failed. This was because:

* btf->fd was initialized to 0 through the calloc()
* btf__free() (in the `done` label) closed any FDs >= 0
* btf->fd is left at 0 if parsing fails

This issue was discovered on a system using libbpf v0.3 (without
BTF_KIND_FLOAT support) but with a kernel that had BTF_KIND_FLOAT types
in BTF. Thus, parsing fails.

While this patch technically doesn't fix any issues b/c upstream libbpf
has BTF_KIND_FLOAT support, it'll help prevent issues in the future if
more BTF types are added. It also allow the fix to be backported to
older libbpf's.

Fixes: 3289959b97 ("libbpf: Support BTF loading and raw data output in both endianness")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/5969bb991adedb03c6ae93e051fd2a00d293cf25.1627513670.git.dxu@dxuuu.xyz
2021-08-07 01:39:15 +02:00
Robin Gögge 78d14bda86 libbpf: Fix probe for BPF_PROG_TYPE_CGROUP_SOCKOPT
This patch fixes the probe for BPF_PROG_TYPE_CGROUP_SOCKOPT,
so the probe reports accurate results when used by e.g.
bpftool.

Fixes: 4cdbfb59c4 ("libbpf: support sockopt hooks")
Signed-off-by: Robin Gögge <r.goegge@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20210728225825.2357586-1-r.goegge@gmail.com
2021-08-07 01:38:52 +02:00
Hengqi Chen a710eed386 libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
Add two new APIs: btf__load_vmlinux_btf and btf__load_module_btf.
btf__load_vmlinux_btf is just an alias to the existing API named
libbpf_find_kernel_btf, rename to be more precisely and consistent
with existing BTF APIs. btf__load_module_btf can be used to load
module BTF, add it for completeness. These two APIs are useful for
implementing tracing tools and introspection tools. This is part
of the effort towards libbpf 1.0 ([0]).

  [0] Closes: https://github.com/libbpf/libbpf/issues/280

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210730114012.494408-1-hengqi.chen@gmail.com
2021-07-30 12:21:29 -07:00
Quentin Monnet 61fc51b1d3 libbpf: Add split BTF support for btf__load_from_kernel_by_id()
Add a new API function btf__load_from_kernel_by_id_split(), which takes
a pointer to a base BTF object in order to support split BTF objects
when retrieving BTF information from the kernel.

Reference: https://github.com/libbpf/libbpf/issues/314

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210729162028.29512-8-quentin@isovalent.com
2021-07-29 17:23:59 -07:00
Quentin Monnet 6cc93e2f2c libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
Rename function btf__get_from_id() as btf__load_from_kernel_by_id() to
better indicate what the function does. Change the new function so that,
instead of requiring a pointer to the pointer to update and returning
with an error code, it takes a single argument (the id of the BTF
object) and returns the corresponding pointer. This is more in line with
the existing constructors.

The other tools calling the (soon-to-be) deprecated btf__get_from_id()
function will be updated in a future commit.

References:

- https://github.com/libbpf/libbpf/issues/278
- https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#btfh-apis

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210729162028.29512-4-quentin@isovalent.com
2021-07-29 17:09:19 -07:00
Quentin Monnet 3c7e585906 libbpf: Rename btf__load() as btf__load_into_kernel()
As part of the effort to move towards a v1.0 for libbpf, rename
btf__load() function, used to "upload" BTF information into the kernel,
as btf__load_into_kernel(). This new name better reflects what the
function does.

References:

- https://github.com/libbpf/libbpf/issues/278
- https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#btfh-apis

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210729162028.29512-3-quentin@isovalent.com
2021-07-29 17:03:41 -07:00
Quentin Monnet 6d2d73cdd6 libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
Variable "err" is initialised to -EINVAL so that this error code is
returned when something goes wrong in libbpf_find_prog_btf_id().
However, a recent change in the function made use of the variable in
such a way that it is set to 0 if retrieving linear information on the
program is successful, and this 0 value remains if we error out on
failures at later stages.

Let's fix this by setting err to -EINVAL later in the function.

Fixes: e9fc3ce99b ("libbpf: Streamline error reporting for high-level APIs")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210729162028.29512-2-quentin@isovalent.com
2021-07-29 17:03:41 -07:00
Martynas Pumputis 043c5bb3c4 libbpf: Fix race when pinning maps in parallel
When loading in parallel multiple programs which use the same to-be
pinned map, it is possible that two instances of the loader will call
bpf_object__create_maps() at the same time. If the map doesn't exist
when both instances call bpf_object__reuse_map(), then one of the
instances will fail with EEXIST when calling bpf_map__pin().

Fix the race by retrying reusing a map if bpf_map__pin() returns
EEXIST. The fix is similar to the one in iproute2: e4c4685fd6e4 ("bpf:
Fix race condition with map pinning").

Before retrying the pinning, we don't do any special cleaning of an
internal map state. The closer code inspection revealed that it's not
required:

    - bpf_object__create_map(): map->inner_map is destroyed after a
      successful call, map->fd is closed if pinning fails.
    - bpf_object__populate_internal_map(): created map elements is
      destroyed upon close(map->fd).
    - init_map_slots(): slots are freed after their initialization.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210726152001.34845-1-m@lambda.lt
2021-07-27 14:32:03 -07:00
Jason Wang c139e40a51 libbpf: Fix comment typo
Remove the repeated word 'the' in line 48.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210727115928.74600-1-wangborong@cdjrlc.com
2021-07-27 14:18:47 -07:00
Alexei Starovoitov b0588390db libbpf: Split CO-RE logic into relo_core.c.
Move CO-RE logic into separate file.
The internal interface between libbpf and CO-RE is through
bpf_core_apply_relo_insn() function and few structs defined in relo_core.h.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721000822.40958-5-alexei.starovoitov@gmail.com
2021-07-26 12:29:14 -07:00
Alexei Starovoitov 301ba4d710 libbpf: Move CO-RE types into relo_core.h.
In order to make a clean split of CO-RE logic move its types
into independent header file.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721000822.40958-4-alexei.starovoitov@gmail.com
2021-07-26 12:19:41 -07:00
Alexei Starovoitov 3ee4f53355 libbpf: Split bpf_core_apply_relo() into bpf_program independent helper.
bpf_core_apply_relo() doesn't need to know bpf_program internals
and hashmap details.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721000822.40958-3-alexei.starovoitov@gmail.com
2021-07-26 12:13:05 -07:00
Alexei Starovoitov 6e43b28607 libbpf: Cleanup the layering between CORE and bpf_program.
CO-RE processing functions don't need to know 'struct bpf_program' details.
Cleanup the layering to eventually be able to move CO-RE logic into a separate file.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721000822.40958-2-alexei.starovoitov@gmail.com
2021-07-26 12:11:23 -07:00
Evgeniy Litvinenko e244d34d0e libbpf: Add bpf_map__pin_path function
Add bpf_map__pin_path, so that the inconsistently named
bpf_map__get_pin_path can be deprecated later. This is part of the
effort towards libbpf v1.0: https://github.com/libbpf/libbpf/issues/307

Also, add a selftest for the new function.

Signed-off-by: Evgeniy Litvinenko <evgeniyl@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210723221511.803683-1-evgeniyl@fb.com
2021-07-23 16:57:03 -07:00
Jiri Olsa da97553ec6 libbpf: Export bpf_program__attach_kprobe_opts function
Export bpf_program__attach_kprobe_opts as a public API.

Rename bpf_program_attach_kprobe_opts to bpf_kprobe_opts and turn it into OPTS
struct.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20210721215810.889975-4-jolsa@kernel.org
2021-07-22 20:09:16 -07:00
Jiri Olsa e3f9bc35ea libbpf: Allow decimal offset for kprobes
Allow to specify decimal offset in SEC macro, like:
  SEC("kprobe/bpf_fentry_test7+5")

Add selftest for that.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20210721215810.889975-3-jolsa@kernel.org
2021-07-22 20:09:16 -07:00
Jiri Olsa 1f71a468a7 libbpf: Fix func leak in attach_kprobe
Add missing free() for func pointer in attach_kprobe function.

Fixes: a2488b5f48 ("libbpf: Allow specification of "kprobe/function+offset"")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20210721215810.889975-2-jolsa@kernel.org
2021-07-22 20:08:39 -07:00
Alan Maguire 720c29fca9 libbpf: Propagate errors when retrieving enum value for typed data display
When retrieving the enum value associated with typed data during
"is data zero?" checking in btf_dump_type_data_check_zero(), the
return value of btf_dump_get_enum_value() is not passed to the caller
if the function returns a non-zero (error) value.  Currently, 0
is returned if the function returns an error.  We should instead
propagate the error to the caller.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626770993-11073-4-git-send-email-alan.maguire@oracle.com
2021-07-20 13:49:25 -07:00
Alan Maguire a1d3cc3c5e libbpf: Avoid use of __int128 in typed dump display
__int128 is not supported for some 32-bit platforms (arm and i386).
__int128 was used in carrying out computations on bitfields which
aid display, but the same calculations could be done with __u64
with the small effect of not supporting 128-bit bitfields.

With these changes, a big-endian issue with casting 128-bit integers
to 64-bit for enum bitfields is solved also, as we now use 64-bit
integers for bitfield calculations.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626770993-11073-2-git-send-email-alan.maguire@oracle.com
2021-07-20 13:49:25 -07:00
Martynas Pumputis a21ab4c59e libbpf: Fix removal of inner map in bpf_object__create_map
If creating an outer map of a BTF-defined map-in-map fails (via
bpf_object__create_map()), then the previously created its inner map
won't be destroyed.

Fix this by ensuring that the destroy routines are not bypassed in the
case of a failure.

Fixes: 646f02ffdd ("libbpf: Add BTF-defined map-in-map support")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210719173838.423148-2-m@lambda.lt
2021-07-19 15:48:20 -07:00
Alan Maguire add192f81a libbpf: Btf typed dump does not need to allocate dump data
By using the stack for this small structure, we avoid the need
for freeing memory in error paths.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626475617-25984-4-git-send-email-alan.maguire@oracle.com
2021-07-16 17:29:56 -07:00
Alan Maguire 04eb4dff6a libbpf: Fix compilation errors on ppc64le for btf dump typed data
__s64 can be defined as either long or long long, depending on the
architecture. On ppc64le it's defined as long, giving this error:

 In file included from btf_dump.c:22:
btf_dump.c: In function 'btf_dump_type_data_check_overflow':
libbpf_internal.h:111:22: error: format '%lld' expects argument of
type 'long long int', but argument 3 has type '__s64' {aka 'long int'}
[-Werror=format=]
  111 |  libbpf_print(level, "libbpf: " fmt, ##__VA_ARGS__); \
      |                      ^~~~~~~~~~
libbpf_internal.h:114:27: note: in expansion of macro '__pr'
  114 | #define pr_warn(fmt, ...) __pr(LIBBPF_WARN, fmt, ##__VA_ARGS__)
      |                           ^~~~
btf_dump.c:1992:3: note: in expansion of macro 'pr_warn'
 1992 |   pr_warn("unexpected size [%lld] for id [%u]\n",
      |   ^~~~~~~
btf_dump.c:1992:32: note: format string is defined here
 1992 |   pr_warn("unexpected size [%lld] for id [%u]\n",
      |                             ~~~^
      |                                |
      |                                long long int
      |                             %ld

Cast to size_t and use %zu instead.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626475617-25984-3-git-send-email-alan.maguire@oracle.com
2021-07-16 17:25:57 -07:00
Alan Maguire 8d44c3578b libbpf: Clarify/fix unaligned data issues for btf typed dump
If data is packed, data structures can store it outside of usual
boundaries.  For example a 4-byte int can be stored on a unaligned
boundary in a case like this:

struct s {
	char f1;
	int f2;
} __attribute((packed));

...the int is stored at an offset of one byte.  Some platforms have
problems dereferencing data that is not aligned with its size, and
code exists to handle most cases of this for BTF typed data display.
However pointer display was missed, and a simple function to test if
"ptr_is_aligned(data, data_sz)" would help clarify this code.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626475617-25984-2-git-send-email-alan.maguire@oracle.com
2021-07-16 17:24:55 -07:00
Alan Maguire 920d16af9b libbpf: BTF dumper support for typed data
Add a BTF dumper for typed data, so that the user can dump a typed
version of the data provided.

The API is

int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
                             void *data, size_t data_sz,
                             const struct btf_dump_type_data_opts *opts);

...where the id is the BTF id of the data pointed to by the "void *"
argument; for example the BTF id of "struct sk_buff" for a
"struct skb *" data pointer.  Options supported are

 - a starting indent level (indent_lvl)
 - a user-specified indent string which will be printed once per
   indent level; if NULL, tab is chosen but any string <= 32 chars
   can be provided.
 - a set of boolean options to control dump display, similar to those
   used for BPF helper bpf_snprintf_btf().  Options are
        - compact : omit newlines and other indentation
        - skip_names: omit member names
        - emit_zeroes: show zero-value members

Default output format is identical to that dumped by bpf_snprintf_btf(),
for example a "struct sk_buff" representation would look like this:

struct sk_buff){
	(union){
		(struct){
			.next = (struct sk_buff *)0xffffffffffffffff,
			.prev = (struct sk_buff *)0xffffffffffffffff,
		(union){
			.dev = (struct net_device *)0xffffffffffffffff,
			.dev_scratch = (long unsigned int)18446744073709551615,
		},
	},
...

If the data structure is larger than the *data_sz*
number of bytes that are available in *data*, as much
of the data as possible will be dumped and -E2BIG will
be returned.  This is useful as tracers will sometimes
not be able to capture all of the data associated with
a type; for example a "struct task_struct" is ~16k.
Being able to specify that only a subset is available is
important for such cases.  On success, the amount of data
dumped is returned.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626362126-27775-2-git-send-email-alan.maguire@oracle.com
2021-07-16 13:46:59 -07:00
Shuyi Cheng 18353c87e0 libbpf: Fix the possible memory leak on error
If the strdup() fails then we need to call bpf_object__close(obj) to
avoid a resource leak.

Fixes: 166750bc1d ("libbpf: Support libbpf-provided extern variables")
Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626180159-112996-3-git-send-email-chengshuyi@linux.alibaba.com
2021-07-16 13:22:47 -07:00
Shuyi Cheng 1373ff5995 libbpf: Introduce 'btf_custom_path' to 'bpf_obj_open_opts'
btf_custom_path allows developers to load custom BTF which libbpf will
subsequently use for CO-RE relocation instead of vmlinux BTF.

Having btf_custom_path in bpf_object_open_opts one can directly use the
skeleton's <objname>_bpf__open_opts() API to pass in the btf_custom_path
parameter, as opposed to using bpf_object__load_xattr() which is slated to be
deprecated ([0]).

This work continues previous work started by another developer ([1]).

  [0] https://lore.kernel.org/bpf/CAEf4BzbJZLjNoiK8_VfeVg_Vrg=9iYFv+po-38SMe=UzwDKJ=Q@mail.gmail.com/#t
  [1] https://yhbt.net/lore/all/CAEf4Bzbgw49w2PtowsrzKQNcxD4fZRE6AKByX-5-dMo-+oWHHA@mail.gmail.com/

Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626180159-112996-2-git-send-email-chengshuyi@linux.alibaba.com
2021-07-16 13:22:47 -07:00
David S. Miller 82a1ffe57e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-07-15

The following pull-request contains BPF updates for your *net-next* tree.

We've added 45 non-merge commits during the last 15 day(s) which contain
a total of 52 files changed, 3122 insertions(+), 384 deletions(-).

The main changes are:

1) Introduce bpf timers, from Alexei.

2) Add sockmap support for unix datagram socket, from Cong.

3) Fix potential memleak and UAF in the verifier, from He.

4) Add bpf_get_func_ip helper, from Jiri.

5) Improvements to generic XDP mode, from Kumar.

6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-15 22:40:10 -07:00
Alan Maguire a2488b5f48 libbpf: Allow specification of "kprobe/function+offset"
kprobes can be placed on most instructions in a function, not
just entry, and ftrace and bpftrace support the function+offset
notification for probe placement.  Adding parsing of func_name
into func+offset to bpf_program__attach_kprobe() allows the
user to specify

SEC("kprobe/bpf_fentry_test5+0x6")

...for example, and the offset can be passed to perf_event_open_probe()
to support kprobe attachment.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-8-jolsa@kernel.org
2021-07-15 17:59:14 -07:00
Jiri Olsa ac0ed48829 libbpf: Add bpf_program__attach_kprobe_opts function
Adding bpf_program__attach_kprobe_opts that does the same
as bpf_program__attach_kprobe, but takes opts argument.

Currently opts struct holds just retprobe bool, but we will
add new field in following patch.

The function is not exported, so there's no need to add
size to the struct bpf_program_attach_kprobe_opts for now.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-7-jolsa@kernel.org
2021-07-15 17:59:14 -07:00
Linus Torvalds 8096acd744 Networking fixes for 5.14-rc2, including fixes from bpf and netfilter.
Current release - regressions:
 
  - sock: fix parameter order in sock_setsockopt()
 
 Current release - new code bugs:
 
  - netfilter: nft_last:
      - fix incorrect arithmetic when restoring last used
      - honor NFTA_LAST_SET on restoration
 
 Previous releases - regressions:
 
  - udp: properly flush normal packet at GRO time
 
  - sfc: ensure correct number of XDP queues; don't allow enabling the
         feature if there isn't sufficient resources to Tx from any CPU
 
  - dsa: sja1105: fix address learning getting disabled on the CPU port
 
  - mptcp: addresses a rmem accounting issue that could keep packets
         in subflow receive buffers longer than necessary, delaying
 	MPTCP-level ACKs
 
  - ip_tunnel: fix mtu calculation for ETHER tunnel devices
 
  - do not reuse skbs allocated from skbuff_fclone_cache in the napi
    skb cache, we'd try to return them to the wrong slab cache
 
  - tcp: consistently disable header prediction for mptcp
 
 Previous releases - always broken:
 
  - bpf: fix subprog poke descriptor tracking use-after-free
 
  - ipv6:
       - allocate enough headroom in ip6_finish_output2() in case
         iptables TEE is used
       - tcp: drop silly ICMPv6 packet too big messages to avoid
         expensive and pointless lookups (which may serve as a DDOS
 	vector)
       - make sure fwmark is copied in SYNACK packets
       - fix 'disable_policy' for forwarded packets (align with IPv4)
 
  - netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
 
  - netfilter: conntrack: do not mark RST in the reply direction coming
       after SYN packet for an out-of-sync entry
 
  - mptcp: cleanly handle error conditions with MP_JOIN and syncookies
 
  - mptcp: fix double free when rejecting a join due to port mismatch
 
  - validate lwtstate->data before returning from skb_tunnel_info()
 
  - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
 
  - mt76: mt7921: continue to probe driver when fw already downloaded
 
  - bonding: fix multiple issues with offloading IPsec to (thru?) bond
 
  - stmmac: ptp: fix issues around Qbv support and setting time back
 
  - bcmgenet: always clear wake-up based on energy detection
 
 Misc:
 
  - sctp: move 198 addresses from unusable to private scope
 
  - ptp: support virtual clocks and timestamping
 
  - openvswitch: optimize operation for key comparison
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S
 Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb
 HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw
 wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX
 GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0
 vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu
 rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo
 peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V
 xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc
 FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw
 4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ
 eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs=
 =QFnb
 -----END PGP SIGNATURE-----

Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski.
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - sock: fix parameter order in sock_setsockopt()

  Current release - new code bugs:

   - netfilter: nft_last:
       - fix incorrect arithmetic when restoring last used
       - honor NFTA_LAST_SET on restoration

  Previous releases - regressions:

   - udp: properly flush normal packet at GRO time

   - sfc: ensure correct number of XDP queues; don't allow enabling the
     feature if there isn't sufficient resources to Tx from any CPU

   - dsa: sja1105: fix address learning getting disabled on the CPU port

   - mptcp: addresses a rmem accounting issue that could keep packets in
     subflow receive buffers longer than necessary, delaying MPTCP-level
     ACKs

   - ip_tunnel: fix mtu calculation for ETHER tunnel devices

   - do not reuse skbs allocated from skbuff_fclone_cache in the napi
     skb cache, we'd try to return them to the wrong slab cache

   - tcp: consistently disable header prediction for mptcp

  Previous releases - always broken:

   - bpf: fix subprog poke descriptor tracking use-after-free

   - ipv6:
       - allocate enough headroom in ip6_finish_output2() in case
         iptables TEE is used
       - tcp: drop silly ICMPv6 packet too big messages to avoid
         expensive and pointless lookups (which may serve as a DDOS
         vector)
       - make sure fwmark is copied in SYNACK packets
       - fix 'disable_policy' for forwarded packets (align with IPv4)

   - netfilter: conntrack:
       - do not renew entry stuck in tcp SYN_SENT state
       - do not mark RST in the reply direction coming after SYN packet
         for an out-of-sync entry

   - mptcp: cleanly handle error conditions with MP_JOIN and syncookies

   - mptcp: fix double free when rejecting a join due to port mismatch

   - validate lwtstate->data before returning from skb_tunnel_info()

   - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path

   - mt76: mt7921: continue to probe driver when fw already downloaded

   - bonding: fix multiple issues with offloading IPsec to (thru?) bond

   - stmmac: ptp: fix issues around Qbv support and setting time back

   - bcmgenet: always clear wake-up based on energy detection

  Misc:

   - sctp: move 198 addresses from unusable to private scope

   - ptp: support virtual clocks and timestamping

   - openvswitch: optimize operation for key comparison"

* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
  net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
  sfc: add logs explaining XDP_TX/REDIRECT is not available
  sfc: ensure correct number of XDP queues
  sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
  net: fddi: fix UAF in fza_probe
  net: dsa: sja1105: fix address learning getting disabled on the CPU port
  net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
  net: Use nlmsg_unicast() instead of netlink_unicast()
  octeontx2-pf: Fix uninitialized boolean variable pps
  ipv6: allocate enough headroom in ip6_finish_output2()
  net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
  net: bridge: multicast: fix MRD advertisement router port marking race
  net: bridge: multicast: fix PIM hello router port marking race
  net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
  dsa: fix for_each_child.cocci warnings
  virtio_net: check virtqueue_add_sgs() return value
  mptcp: properly account bulk freed memory
  selftests: mptcp: fix case multiple subflows limited by server
  mptcp: avoid processing packet if a subflow reset
  mptcp: fix syncookie process if mptcp can not_accept new subflow
  ...
2021-07-14 09:24:32 -07:00
Martynas Pumputis 97eb31384a libbpf: Fix reuse of pinned map on older kernel
When loading a BPF program with a pinned map, the loader checks whether
the pinned map can be reused, i.e. their properties match. To derive
such of the pinned map, the loader invokes BPF_OBJ_GET_INFO_BY_FD and
then does the comparison.

Unfortunately, on < 4.12 kernels the BPF_OBJ_GET_INFO_BY_FD is not
available, so loading the program fails with the following error:

	libbpf: failed to get map info for map FD 5: Invalid argument
	libbpf: couldn't reuse pinned map at
		'/sys/fs/bpf/tc/globals/cilium_call_policy': parameter
		mismatch"
	libbpf: map 'cilium_call_policy': error reusing pinned map
	libbpf: map 'cilium_call_policy': failed to create:
		Invalid argument(-22)
	libbpf: failed to load object 'bpf_overlay.o'

To fix this, fallback to derivation of the map properties via
/proc/$PID/fdinfo/$MAP_FD if BPF_OBJ_GET_INFO_BY_FD fails with EINVAL,
which can be used as an indicator that the kernel doesn't support
the latter.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210712125552.58705-1-m@lambda.lt
2021-07-12 16:15:27 -07:00
Jiri Olsa afd4ad01ff libperf: Add tests for perf_evlist__set_leader()
Add a test for the newly added perf_evlist__set_leader() function.

Committer testing:

  $ cd tools/lib/perf/
  $ sudo make tests
  [sudo] password for acme:
  running static:
  - running tests/test-cpumap.c...OK
  - running tests/test-threadmap.c...OK
  - running tests/test-evlist.c...OK
  - running tests/test-evsel.c...OK
  running dynamic:
  - running tests/test-cpumap.c...OK
  - running tests/test-threadmap.c...OK
  - running tests/test-evlist.c...OK
  - running tests/test-evsel.c...OK
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 15:34:41 -03:00
Arnaldo Carvalho de Melo e2c18168c3 libperf: Remove BUG_ON() from library code in get_group_fd()
We shouldn't just panic, return a value that doesn't clash with what
perf_evsel__open() was already returning in case of error, i.e. errno
when sys_perf_event_open() fails.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Link: http://lore.kernel.org/lkml/YOiOA5zOtVH9IBbE@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 15:33:39 -03:00
Jiri Olsa 3fd35de168 libperf: Add group support to perf_evsel__open()
Add support to set group_fd in perf_evsel__open() and make it follow the
group setup.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:49:47 -03:00
Jiri Olsa 2e6263ab54 libperf: Adopt evlist__set_leader() from tools/perf as perf_evlist__set_leader()
Move the implementation of evlist__set_leader() to a new libperf
perf_evlist__set_leader() function with the same functionality make it a
libperf exported API.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:32 -03:00
Jiri Olsa 3a683120d8 libperf: Move 'nr_groups' from tools/perf to evlist::nr_groups
Move evsel::nr_groups to perf_evsel::nr_groups, so we can move the group
interface to libperf.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:32 -03:00
Jiri Olsa fba7c86601 libperf: Move 'leader' from tools/perf to perf_evsel::leader
Move evsel::leader to perf_evsel::leader, so we can move the group
interface to libperf.

Also add several evsel helpers to ease up the transition:

  struct evsel *evsel__leader(struct evsel *evsel);
  - get leader evsel

  bool evsel__has_leader(struct evsel *evsel, struct evsel *leader);
  - true if evsel has leader as leader

  bool evsel__is_leader(struct evsel *evsel);
  - true if evsel is itw own leader

  void evsel__set_leader(struct evsel *evsel, struct evsel *leader);
  - set leader for evsel

Committer notes:

Fix this when building with 'make BUILD_BPF_SKEL=1'

  tools/perf/util/bpf_counter.c

  -       if (evsel->leader->core.nr_members > 1) {
  +       if (evsel->core.leader->nr_members > 1) {

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:31 -03:00
Jiri Olsa 38fe0e0156 libperf: Move 'idx' from tools/perf to perf_evsel::idx
Move evsel::idx to perf_evsel::idx, so we can move the group interface
to libperf.

Committer notes:

Fixup evsel->idx usage in tools/perf/util/bpf_counter_cgroup.c, that
appeared in my tree in my local tree.

Also fixed up these:

$ find tools/perf/ -name "*.[ch]" | xargs grep 'evsel->idx'
tools/perf/ui/gtk/annotate.c:                      evsel->idx + i);
tools/perf/ui/gtk/annotate.c:                   evsel->idx);
$

That running 'make -C tools/perf build-test' caught.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:28 -03:00
Jiri Olsa 3d970601da libperf: Change tests to single static and shared binaries
Make tests to be two binaries 'tests_static' and 'tests_shared', so the
maintenance is easier.

Adding tests under libperf build system, so we define all the flags just
once.

Adding make-tests tule to just compile tests without running them.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Link: http://lore.kernel.org/lkml/20210706151704.73662-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-07 11:41:58 -03:00
Toke Høiland-Jørgensen af0efa050c libbpf: Restore errno return for functions that were already returning it
The update to streamline libbpf error reporting intended to change all
functions to return the errno as a negative return value if
LIBBPF_STRICT_DIRECT_ERRS is set. However, if the flag is *not* set, the
return value changes for the two functions that were already returning a
negative errno unconditionally: bpf_link__unpin() and perf_buffer__poll().

This is a user-visible API change that breaks applications; so let's revert
these two functions back to unconditionally returning a negative errno
value.

Fixes: e9fc3ce99b ("libbpf: Streamline error reporting for high-level APIs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210706122355.236082-1-toke@redhat.com
2021-07-06 21:13:08 -07:00
Linus Torvalds 406254918b perf tools changes for v5.14:
Tools:
 
 - Add cgroup support for 'perf top' (-G).
 
 - Add support for KVM MSRs in 'perf kvm stat'
 
 - Support probes on init functions in 'perf probe', to support the
   bootconfig format.
 
 - Improve error reporting in 'perf probe'.
 
 - No need to synthesize BUILD_ID records in 'perf inject' if the MMAP2
   records have build ids already.
 
 - Allow toggling source code ('s' hotkey) in 'perf annotate' in all
   lines.
 
 - Add itrace options support to 'perf annotate'.
 
 - Support to custom DSO filters for 'perf script'.
 
 Hardware enablement:
 
 - Support the HYBRID_TOPOLOGY and HYBRID_CPU_PMU_CAPS features in the
   perf.data file header.
 
 - Support PMU prefix for mem-load and mem-store events, to support
   hybrid (BIG little) CPUs such as Intel's Alderlake.
 
 - Support hybrid CPUs in 'perf mem' and 'perf c2c'.
 
 Hardware tracing:
 
 - Intel PT now supports tracing KVM guests.
 
 - Timestamp improvements for ARM's Coresight.
 
 Build:
 
 - Add 'make -C tools/perf build-test' entries for libopencsd/CORESIGHT=1
   and libbpf/LIBBPF_DYNAMIC=1.
 
 - Use bison's --file-prefix-map option to avoid storing full paths when
   using O= in the perf build.
 
 Tests:
 
 - Improve the 'perf test' entries for libpfm4 and BPF counters.
 
 Misc:
 
 - Sync msr-index.h, mount.h, kvm headers with the kernel originals.
 
 - Add vendor events and metrics for Intel's Icelake Server & Client.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYN5WhAAKCRCyPKLppCJ+
 J3b9APwO5iDjSMvVgKT84njXo1EqURMz6nmV3kkjBkaMo0KK2wEAvXysIEgwx1cu
 hakfFw63ztxVQctcWShOzP7jnJOOwwg=
 =PpGz
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.14-2021-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "Tools:

   - Add cgroup support for 'perf top' (-G).

   - Add support for KVM MSRs in 'perf kvm stat'

   - Support probes on init functions in 'perf probe', to support the
     bootconfig format.

   - Improve error reporting in 'perf probe'.

   - No need to synthesize BUILD_ID records in 'perf inject' if the
     MMAP2 records have build ids already.

   - Allow toggling source code ('s' hotkey) in 'perf annotate' in all
     lines.

   - Add itrace options support to 'perf annotate'.

   - Support to custom DSO filters for 'perf script'.

  Hardware enablement:

   - Support the HYBRID_TOPOLOGY and HYBRID_CPU_PMU_CAPS features in the
     perf.data file header.

   - Support PMU prefix for mem-load and mem-store events, to support
     hybrid (BIG little) CPUs such as Intel's Alderlake.

   - Support hybrid CPUs in 'perf mem' and 'perf c2c'.

  Hardware tracing:

   - Intel PT now supports tracing KVM guests.

   - Timestamp improvements for ARM's Coresight.

  Build:

   - Add 'make -C tools/perf build-test' entries for
     libopencsd/CORESIGHT=1 and libbpf/LIBBPF_DYNAMIC=1.

   - Use bison's --file-prefix-map option to avoid storing full paths
     when using O= in the perf build.

  Tests:

   - Improve the 'perf test' entries for libpfm4 and BPF counters.

  Misc:

   - Sync msr-index.h, mount.h, kvm headers with the kernel originals.

   - Add vendor events and metrics for Intel's Icelake Server & Client"

* tag 'perf-tools-for-v5.14-2021-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (123 commits)
  perf session: Add missing evlist__delete when deleting a session
  perf annotate: Allow 's' on source code lines
  perf dlfilter: Add object_code() to perf_dlfilter_fns
  perf dlfilter: Add attr() to perf_dlfilter_fns
  perf dlfilter: Add srcline() to perf_dlfilter_fns
  perf dlfilter: Add insn() to perf_dlfilter_fns
  perf dlfilter: Add resolve_address() to perf_dlfilter_fns
  perf build: Install perf_dlfilter.h
  perf script: Add option to pass arguments to dlfilters
  perf script: Add option to list dlfilters
  perf script: Add dlfilter__filter_event_early()
  perf script: Add API for filtering via dynamically loaded shared object
  perf llvm: Return -ENOMEM when asprintf() fails
  perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
  tools headers UAPI: Synch KVM's svm.h header with the kernel
  tools kvm headers arm64: Update KVM headers from the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools include UAPI: Update linux/mount.h copy
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  ...
2021-07-02 12:24:31 -07:00
Linus Torvalds dbe69e4337 Networking changes for 5.14.
Core:
 
  - BPF:
    - add syscall program type and libbpf support for generating
      instructions and bindings for in-kernel BPF loaders (BPF loaders
      for BPF), this is a stepping stone for signed BPF programs
    - infrastructure to migrate TCP child sockets from one listener
      to another in the same reuseport group/map to improve flexibility
      of service hand-off/restart
    - add broadcast support to XDP redirect
 
  - allow bypass of the lockless qdisc to improving performance
    (for pktgen: +23% with one thread, +44% with 2 threads)
 
  - add a simpler version of "DO_ONCE()" which does not require
    jump labels, intended for slow-path usage
 
  - virtio/vsock: introduce SOCK_SEQPACKET support
 
  - add getsocketopt to retrieve netns cookie
 
  - ip: treat lowest address of a IPv4 subnet as ordinary unicast address
        allowing reclaiming of precious IPv4 addresses
 
  - ipv6: use prandom_u32() for ID generation
 
  - ip: add support for more flexible field selection for hashing
        across multi-path routes (w/ offload to mlxsw)
 
  - icmp: add support for extended RFC 8335 PROBE (ping)
 
  - seg6: add support for SRv6 End.DT46 behavior
 
  - mptcp:
     - DSS checksum support (RFC 8684) to detect middlebox meddling
     - support Connection-time 'C' flag
     - time stamping support
 
  - sctp: packetization Layer Path MTU Discovery (RFC 8899)
 
  - xfrm: speed up state addition with seq set
 
  - WiFi:
     - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
     - aggregation handling improvements for some drivers
     - minstrel improvements for no-ack frames
     - deferred rate control for TXQs to improve reaction times
     - switch from round robin to virtual time-based airtime scheduler
 
  - add trace points:
     - tcp checksum errors
     - openvswitch - action execution, upcalls
     - socket errors via sk_error_report
 
 Device APIs:
 
  - devlink: add rate API for hierarchical control of max egress rate
             of virtual devices (VFs, SFs etc.)
 
  - don't require RCU read lock to be held around BPF hooks
    in NAPI context
 
  - page_pool: generic buffer recycling
 
 New hardware/drivers:
 
  - mobile:
     - iosm: PCIe Driver for Intel M.2 Modem
     - support for Qualcomm MSM8998 (ipa)
 
  - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
 
  - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
 
  - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
 
  - NXP SJA1110 Automotive Ethernet 10-port switch
 
  - Qualcomm QCA8327 switch support (qca8k)
 
  - Mikrotik 10/25G NIC (atl1c)
 
 Driver changes:
 
  - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP
    (our first foray into MAC/PHY description via ACPI)
 
  - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
 
  - Mellanox/Nvidia NIC (mlx5)
    - NIC VF offload of L2 bridging
    - support IRQ distribution to Sub-functions
 
  - Marvell (prestera):
     - add flower and match all
     - devlink trap
     - link aggregation
 
  - Netronome (nfp): connection tracking offload
 
  - Intel 1GE (igc): add AF_XDP support
 
  - Marvell DPU (octeontx2): ingress ratelimit offload
 
  - Google vNIC (gve): new ring/descriptor format support
 
  - Qualcomm mobile (rmnet & ipa): inline checksum offload support
 
  - MediaTek WiFi (mt76)
     - mt7915 MSI support
     - mt7915 Tx status reporting
     - mt7915 thermal sensors support
     - mt7921 decapsulation offload
     - mt7921 enable runtime pm and deep sleep
 
  - Realtek WiFi (rtw88)
     - beacon filter support
     - Tx antenna path diversity support
     - firmware crash information via devcoredump
 
  - Qualcomm 60GHz WiFi (wcn36xx)
     - Wake-on-WLAN support with magic packets and GTK rekeying
 
  - Micrel PHY (ksz886x/ksz8081): add cable test support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDb+fUACgkQMUZtbf5S
 Irs2Jg//aqN0Q8CgIvYCVhPxQw1tY7pTAbgyqgBZ01vwjyvtIOgJiWzSfFEU84mX
 M8fcpFX5eTKrOyJ9S6UFfQ/JG114n3hjAxFFT4Hxk2gC1Tg0vHuFQTDHcUl28bUE
 mTm61e1YpdorILnv2k5JVQ/wu0vs5QKDrjcYcrcPnh+j93wvnPOgAfDBV95nZzjS
 OTt4q2fR8GzLcSYWWsclMbDNkzyTG50RW/0Yd6aGjr5QGvXfrMeXfUJNz533PMf/
 w5lNyjRKv+x9mdTZJzU0+msNUrZgUdRz7W8Ey8lD3hJZRE+D6/uU7FtsE8Mi3+uc
 HWxeZUyzA3YF1MfVl/eesbxyPT7S/OkLzk4O5B35FbqP0YltaP+bOjq1/nM3ce1/
 io9Dx9pIl/2JANUgRCAtLi8Z2dkvRoqTaBxZ/nPudCCljFwDwl6joTMJ7Ow22i5Y
 5aIkcXFmZq4LbJDiHvbTlqT7yiuaEvu2UK/23bSIg/K3nF4eAmkY9Y1EgiMf60OF
 78Ttw0wk2tUegwaS5MZnCniKBKDyl9gM2F6rbZ/IxQRR2LTXFc1B6gC+ynUxgXfh
 Ub8O++6qGYGYZ0XvQH4pzco79p3qQWBTK5beIp2eu6BOAjBVIXq4AibUfoQLACsu
 hX7jMPYd0kc3WFgUnKgQP8EnjFSwbf4XiaE7fIXvWBY8hzCw2h4=
 =LvtX
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - BPF:
      - add syscall program type and libbpf support for generating
        instructions and bindings for in-kernel BPF loaders (BPF loaders
        for BPF), this is a stepping stone for signed BPF programs
      - infrastructure to migrate TCP child sockets from one listener to
        another in the same reuseport group/map to improve flexibility
        of service hand-off/restart
      - add broadcast support to XDP redirect

   - allow bypass of the lockless qdisc to improving performance (for
     pktgen: +23% with one thread, +44% with 2 threads)

   - add a simpler version of "DO_ONCE()" which does not require jump
     labels, intended for slow-path usage

   - virtio/vsock: introduce SOCK_SEQPACKET support

   - add getsocketopt to retrieve netns cookie

   - ip: treat lowest address of a IPv4 subnet as ordinary unicast
     address allowing reclaiming of precious IPv4 addresses

   - ipv6: use prandom_u32() for ID generation

   - ip: add support for more flexible field selection for hashing
     across multi-path routes (w/ offload to mlxsw)

   - icmp: add support for extended RFC 8335 PROBE (ping)

   - seg6: add support for SRv6 End.DT46 behavior

   - mptcp:
      - DSS checksum support (RFC 8684) to detect middlebox meddling
      - support Connection-time 'C' flag
      - time stamping support

   - sctp: packetization Layer Path MTU Discovery (RFC 8899)

   - xfrm: speed up state addition with seq set

   - WiFi:
      - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
      - aggregation handling improvements for some drivers
      - minstrel improvements for no-ack frames
      - deferred rate control for TXQs to improve reaction times
      - switch from round robin to virtual time-based airtime scheduler

   - add trace points:
      - tcp checksum errors
      - openvswitch - action execution, upcalls
      - socket errors via sk_error_report

  Device APIs:

   - devlink: add rate API for hierarchical control of max egress rate
     of virtual devices (VFs, SFs etc.)

   - don't require RCU read lock to be held around BPF hooks in NAPI
     context

   - page_pool: generic buffer recycling

  New hardware/drivers:

   - mobile:
      - iosm: PCIe Driver for Intel M.2 Modem
      - support for Qualcomm MSM8998 (ipa)

   - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices

   - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches

   - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)

   - NXP SJA1110 Automotive Ethernet 10-port switch

   - Qualcomm QCA8327 switch support (qca8k)

   - Mikrotik 10/25G NIC (atl1c)

  Driver changes:

   - ACPI support for some MDIO, MAC and PHY devices from Marvell and
     NXP (our first foray into MAC/PHY description via ACPI)

   - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx

   - Mellanox/Nvidia NIC (mlx5)
      - NIC VF offload of L2 bridging
      - support IRQ distribution to Sub-functions

   - Marvell (prestera):
      - add flower and match all
      - devlink trap
      - link aggregation

   - Netronome (nfp): connection tracking offload

   - Intel 1GE (igc): add AF_XDP support

   - Marvell DPU (octeontx2): ingress ratelimit offload

   - Google vNIC (gve): new ring/descriptor format support

   - Qualcomm mobile (rmnet & ipa): inline checksum offload support

   - MediaTek WiFi (mt76)
      - mt7915 MSI support
      - mt7915 Tx status reporting
      - mt7915 thermal sensors support
      - mt7921 decapsulation offload
      - mt7921 enable runtime pm and deep sleep

   - Realtek WiFi (rtw88)
      - beacon filter support
      - Tx antenna path diversity support
      - firmware crash information via devcoredump

   - Qualcomm WiFi (wcn36xx)
      - Wake-on-WLAN support with magic packets and GTK rekeying

   - Micrel PHY (ksz886x/ksz8081): add cable test support"

* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
  tcp: change ICSK_CA_PRIV_SIZE definition
  tcp_yeah: check struct yeah size at compile time
  gve: DQO: Fix off by one in gve_rx_dqo()
  stmmac: intel: set PCI_D3hot in suspend
  stmmac: intel: Enable PHY WOL option in EHL
  net: stmmac: option to enable PHY WOL with PMT enabled
  net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
  net: use netdev_info in ndo_dflt_fdb_{add,del}
  ptp: Set lookup cookie when creating a PTP PPS source.
  net: sock: add trace for socket errors
  net: sock: introduce sk_error_report
  net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
  net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
  net: dsa: include fdb entries pointing to bridge in the host fdb list
  net: dsa: include bridge addresses which are local in the host fdb list
  net: dsa: sync static FDB entries on foreign interfaces to hardware
  net: dsa: install the host MDB and FDB entries in the master's RX filter
  net: dsa: reference count the FDB addresses at the cross-chip notifier level
  net: dsa: introduce a separate cross-chip notifier type for host FDBs
  net: dsa: reference count the MDB entries at the cross-chip notifier level
  ...
2021-06-30 15:51:09 -07:00
Alexey Bayduraev f20510d552 tools lib: Adopt bitmap_intersects() operation from the kernel sources
Adopt bitmap_intersects() routine that tests whether bitmaps bitmap1 and
bitmap2 intersects. This routine will be used during thread masks
initialization.

Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Link: http://lore.kernel.org/lkml/f75aa738d8ff8f9cffd7532d671f3ef3deb97a7c.1625065643.git.alexey.v.bayduraev@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-30 15:28:00 -03:00
Linus Torvalds 36824f198c ARM:
- Add MTE support in guests, complete with tag save/restore interface
 
 - Reduce the impact of CMOs by moving them in the page-table code
 
 - Allow device block mappings at stage-2
 
 - Reduce the footprint of the vmemmap in protected mode
 
 - Support the vGIC on dumb systems such as the Apple M1
 
 - Add selftest infrastructure to support multiple configuration
   and apply that to PMU/non-PMU setups
 
 - Add selftests for the debug architecture
 
 - The usual crop of PMU fixes
 
 PPC:
 
 - Support for the H_RPT_INVALIDATE hypercall
 
 - Conversion of Book3S entry/exit to C
 
 - Bug fixes
 
 S390:
 
 - new HW facilities for guests
 
 - make inline assembly more robust with KASAN and co
 
 x86:
 
 - Allow userspace to handle emulation errors (unknown instructions)
 
 - Lazy allocation of the rmap (host physical -> guest physical address)
 
 - Support for virtualizing TSC scaling on VMX machines
 
 - Optimizations to avoid shattering huge pages at the beginning of live migration
 
 - Support for initializing the PDPTRs without loading them from memory
 
 - Many TLB flushing cleanups
 
 - Refuse to load if two-stage paging is available but NX is not (this has
   been a requirement in practice for over a year)
 
 - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
   CR0/CR4/EFER, using the MMU mode everywhere once it is computed
   from the CPU registers
 
 - Use PM notifier to notify the guest about host suspend or hibernate
 
 - Support for passing arguments to Hyper-V hypercalls using XMM registers
 
 - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on
   AMD processors
 
 - Hide Hyper-V hypercalls that are not included in the guest CPUID
 
 - Fixes for live migration of virtual machines that use the Hyper-V
   "enlightened VMCS" optimization of nested virtualization
 
 - Bugfixes (not many)
 
 Generic:
 
 - Support for retrieving statistics without debugfs
 
 - Cleanups for the KVM selftests API
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV9UYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIRgf/XX8fKLh24RnTOs2ldIu2AfRGVrT4
 QMrr8MxhmtukBAszk2xKvBt8/6gkUjdaIC3xqEnVjxaDaUvZaEtP7CQlF5JV45rn
 iv1zyxUKucXrnIOr+gCioIT7qBlh207zV35ArKioP9Y83cWx9uAs22pfr6g+7RxO
 h8bJZlJbSG6IGr3voANCIb9UyjU1V/l8iEHqRwhmr/A5rARPfD7g8lfMEQeGkzX6
 +/UydX2fumB3tl8e2iMQj6vLVdSOsCkehvpHK+Z33EpkKhan7GwZ2sZ05WmXV/nY
 QLAYfD10KegoNWl5Ay4GTp4hEAIYVrRJCLC+wnLdc0U8udbfCuTC31LK4w==
 =NcRh
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This covers all architectures (except MIPS) so I don't expect any
  other feature pull requests this merge window.

  ARM:

   - Add MTE support in guests, complete with tag save/restore interface

   - Reduce the impact of CMOs by moving them in the page-table code

   - Allow device block mappings at stage-2

   - Reduce the footprint of the vmemmap in protected mode

   - Support the vGIC on dumb systems such as the Apple M1

   - Add selftest infrastructure to support multiple configuration and
     apply that to PMU/non-PMU setups

   - Add selftests for the debug architecture

   - The usual crop of PMU fixes

  PPC:

   - Support for the H_RPT_INVALIDATE hypercall

   - Conversion of Book3S entry/exit to C

   - Bug fixes

  S390:

   - new HW facilities for guests

   - make inline assembly more robust with KASAN and co

  x86:

   - Allow userspace to handle emulation errors (unknown instructions)

   - Lazy allocation of the rmap (host physical -> guest physical
     address)

   - Support for virtualizing TSC scaling on VMX machines

   - Optimizations to avoid shattering huge pages at the beginning of
     live migration

   - Support for initializing the PDPTRs without loading them from
     memory

   - Many TLB flushing cleanups

   - Refuse to load if two-stage paging is available but NX is not (this
     has been a requirement in practice for over a year)

   - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
     CR0/CR4/EFER, using the MMU mode everywhere once it is computed
     from the CPU registers

   - Use PM notifier to notify the guest about host suspend or hibernate

   - Support for passing arguments to Hyper-V hypercalls using XMM
     registers

   - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap
     on AMD processors

   - Hide Hyper-V hypercalls that are not included in the guest CPUID

   - Fixes for live migration of virtual machines that use the Hyper-V
     "enlightened VMCS" optimization of nested virtualization

   - Bugfixes (not many)

  Generic:

   - Support for retrieving statistics without debugfs

   - Cleanups for the KVM selftests API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits)
  KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
  kvm: x86: disable the narrow guest module parameter on unload
  selftests: kvm: Allows userspace to handle emulation errors.
  kvm: x86: Allow userspace to handle emulation errors
  KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
  KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
  KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
  KVM: x86: Enhance comments for MMU roles and nested transition trickiness
  KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
  KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
  KVM: x86/mmu: Use MMU's role to determine PTTYPE
  KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
  KVM: x86/mmu: Add a helper to calculate root from role_regs
  KVM: x86/mmu: Add helper to update paging metadata
  KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
  KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
  KVM: x86/mmu: Get nested MMU's root level from the MMU's role
  ...
2021-06-28 15:40:51 -07:00
David S. Miller e1289cfb63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-28

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).

The main changes are:

1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.

2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.

3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.

4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.

5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.

6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.

7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.

8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.

9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.

10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.

11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.

12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28 15:28:03 -07:00
Paolo Bonzini b8917b4ae4 KVM/arm64 updates for v5.14.
- Add MTE support in guests, complete with tag save/restore interface
 - Reduce the impact of CMOs by moving them in the page-table code
 - Allow device block mappings at stage-2
 - Reduce the footprint of the vmemmap in protected mode
 - Support the vGIC on dumb systems such as the Apple M1
 - Add selftest infrastructure to support multiple configuration
   and apply that to PMU/non-PMU setups
 - Add selftests for the debug architecture
 - The usual crop of PMU fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmDV2bEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDEr8P/ivwROx5NwGcHGmU5RfUCT3aFqhtVHHwD/lu
 jPcgoO61kz9TelOu6QRaVuK+mVHxcq3iP4R8nPq/QCkUlEXTmK2xkyhXhGXSYpH4
 6jM8+BbC3eG7iAxx6H0UM4JTl4Riwat6ZZtXpWEWs9TKqOHOQYFpMkxSttwVZ1CZ
 SjbtFvXLEdzKn6PzUWnKdBNMV/mHsdAtohZit9oJOc4ttc8072XxETQ4TFQ+MSvA
 j9zY9QPmWzgcZnotqRRu9sbTGO2vxtXuUtY3sjdD8+C9OgSe9qvpnNjymcmfwaMu
 1fBkfh65oaO4ItJBdGOUOoEcFqwN5imPiI7CB/O+ZYkO9sBCuTUPSQwPkyiwXb9r
 bUkTaQw2nZiNWsqR1x07fQ2sGYbMp5mnmgmqiV4MUWkLmFp9LZATCWYTTn24cBNS
 6SjVP6/8S0r3EhLnYjH0Pn1we5PooU1EF6RlCAd3ewYoo+9fPnwjNYwIWH5i5wB7
 +tnei44NACAw9cfbos+BYQQ/dY15OSFzLzIMomlabB7OpXOdDg3H6tJnPbFwWwXb
 9nF8XdHqxeDVVVrDCAx1BSodSXm9xqgnQM2RDGTUnpVcAfqAr3MXX6VsyKQDzj8T
 QXF9qOVCBAABv6BXAvSQ6mvMJZDUVbUPEPhf7kXzF46JsRd6A7wWoU/OnMGHQ/w7
 wjvH8HVy
 =fWBV
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
  and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
2021-06-25 11:24:24 -04:00
Sean Christopherson 167f8a5cae KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans
Rename "nxe" to "efer_nx" so that future macro magic can use the pattern
<reg>_<bit> for all CR0, CR4, and EFER bits that included in the role.
Using "efer_nx" also makes it clear that the role bit reflects EFER.NX,
not the NX bit in the corresponding PTE.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-25-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 18:00:41 -04:00
Kumar Kartikeya Dwivedi ee62a5c6bb libbpf: Switch to void * casting in netlink helpers
Netlink helpers I added in 8bbb77b7c7 ("libbpf: Add various netlink
helpers") used char * casts everywhere, and there were a few more that
existed from before.

Convert all of them to void * cast, as it is treated equivalently by
clang/gcc for the purposes of pointer arithmetic and to follow the
convention elsewhere in the kernel/libbpf.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210619041454.417577-2-memxor@gmail.com
2021-06-22 17:04:02 +02:00
Kumar Kartikeya Dwivedi 0ae64fb6b6 libbpf: Add request buffer type for netlink messages
Coverity complains about OOB writes to nlmsghdr. There is no OOB as we
write to the trailing buffer, but static analyzers and compilers may
rightfully be confused as the nlmsghdr pointer has subobject provenance
(and hence subobject bounds).

Fix this by using an explicit request structure containing the nlmsghdr,
struct tcmsg/ifinfomsg, and attribute buffer.

Also switch nh_tail (renamed to req_tail) to cast req * to char * so
that it can be understood as arithmetic on pointer to the representation
array (hence having same bound as request structure), which should
further appease analyzers.

As a bonus, callers don't have to pass sizeof(req) all the time now, as
size is implicitly obtained using the pointer. While at it, also reduce
the size of attribute buffer to 128 bytes (132 for ifinfomsg using
functions due to the padding).

Summary of problem:

  Even though C standard allows interconvertibility of pointer to first
  member and pointer to struct, for the purposes of alias analysis it
  would still consider the first as having pointer value "pointer to T"
  where T is type of first member hence having subobject bounds,
  allowing analyzers within reason to complain when object is accessed
  beyond the size of pointed to object.

  The only exception to this rule may be when a char * is formed to a
  member subobject. It is not possible for the compiler to be able to
  tell the intent of the programmer that it is a pointer to member
  object or the underlying representation array of the containing
  object, so such diagnosis is suppressed.

Fixes: 715c5ce454 ("libbpf: Add low level TC-BPF management API")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210619041454.417577-1-memxor@gmail.com
2021-06-22 17:03:52 +02:00
Jonathan Edwards 5c10a3dbe9 libbpf: Add extra BPF_PROG_TYPE check to bpf_object__probe_loading
eBPF has been backported for RHEL 7 w/ kernel 3.10-940+ [0]. However only
the following program types are supported [1]:

  BPF_PROG_TYPE_KPROBE
  BPF_PROG_TYPE_TRACEPOINT
  BPF_PROG_TYPE_PERF_EVENT

For libbpf this causes an EINVAL return during the bpf_object__probe_loading
call which only checks to see if programs of type BPF_PROG_TYPE_SOCKET_FILTER
can load.

The following will try BPF_PROG_TYPE_TRACEPOINT as a fallback attempt before
erroring out. BPF_PROG_TYPE_KPROBE was not a good candidate because on some
kernels it requires knowledge of the LINUX_VERSION_CODE.

  [0] https://www.redhat.com/en/blog/introduction-ebpf-red-hat-enterprise-linux-7
  [1] https://access.redhat.com/articles/3550581

Signed-off-by: Jonathan Edwards <jonathan.edwards@165gc.onmicrosoft.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210619151007.GA6963@165gc.onmicrosoft.com
2021-06-21 17:21:07 +02:00
Jakub Kicinski adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Grant Seltzer f42cfb469f bpf: Add documentation for libbpf including API autogen
This patch is meant to start the initiative to document libbpf.
It includes .rst files which are text documentation describing building,
API naming convention, as well as an index to generated API documentation.

In this approach the generated API documentation is enabled by the kernels
existing kernel documentation system which uses sphinx. The resulting docs
would then be synced to kernel.org/doc

You can test this by running `make htmldocs` and serving the html in
Documentation/output. Since libbpf does not yet have comments in kernel
doc format, see kernel.org/doc/html/latest/doc-guide/kernel-doc.html for
an example so you can test this.

The advantage of this approach is to use the existing sphinx
infrastructure that the kernel has, and have libbpf docs in
the same place as everything else.

The current plan is to have the libbpf mirror sync the generated docs
and version them based on the libbpf releases which are cut on github.

This patch includes the addition of libbpf_api.rst which pulls comment
documentation from header files in libbpf under tools/lib/bpf/. The comment
docs would be of the standard kernel doc format.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210618140459.9887-2-grantseltzer@gmail.com
2021-06-18 23:06:06 +02:00
David S. Miller a52171ae7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-17

The following pull-request contains BPF updates for your *net-next* tree.

We've added 50 non-merge commits during the last 25 day(s) which contain
a total of 148 files changed, 4779 insertions(+), 1248 deletions(-).

The main changes are:

1) BPF infrastructure to migrate TCP child sockets from a listener to another
   in the same reuseport group/map, from Kuniyuki Iwashima.

2) Add a provably sound, faster and more precise algorithm for tnum_mul() as
   noted in https://arxiv.org/abs/2105.05398, from Harishankar Vishwanathan.

3) Streamline error reporting changes in libbpf as planned out in the
   'libbpf: the road to v1.0' effort, from Andrii Nakryiko.

4) Add broadcast support to xdp_redirect_map(), from Hangbin Liu.

5) Extends bpf_map_lookup_and_delete_elem() functionality to 4 more map
   types, that is, {LRU_,PERCPU_,LRU_PERCPU_,}HASH, from Denis Salopek.

6) Support new LLVM relocations in libbpf to make them more linker friendly,
   also add a doc to describe the BPF backend relocations, from Yonghong Song.

7) Silence long standing KUBSAN complaints on register-based shifts in
   interpreter, from Daniel Borkmann and Eric Biggers.

8) Add dummy PT_REGS macros in libbpf to fail BPF program compilation when
   target arch cannot be determined, from Lorenz Bauer.

9) Extend AF_XDP to support large umems with 1M+ pages, from Magnus Karlsson.

10) Fix two minor libbpf tc BPF API issues, from Kumar Kartikeya Dwivedi.

11) Move libbpf BPF_SEQ_PRINTF/BPF_SNPRINTF macros that can be used by BPF
    programs to bpf_helpers.h header, from Florent Revest.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17 11:54:56 -07:00
Lorenz Bauer 4a638d581a libbpf: Fail compilation if target arch is missing
bpf2go is the Go equivalent of libbpf skeleton. The convention is that
the compiled BPF is checked into the repository to facilitate distributing
BPF as part of Go packages. To make this portable, bpf2go by default
generates both bpfel and bpfeb variants of the C.

Using bpf_tracing.h is inherently non-portable since the fields of
struct pt_regs differ between platforms, so CO-RE can't help us here.
The only way of working around this is to compile for each target
platform independently. bpf2go can't do this by default since there
are too many platforms.

Define the various PT_... macros when no target can be determined and
turn them into compilation failures. This works because bpf2go always
compiles for bpf targets, so the compiler fallback doesn't kick in.
Conditionally define __BPF_MISSING_TARGET so that we can inject a
more appropriate error message at build time. The user can then
choose which platform to target explicitly.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210616083635.11434-1-lmb@cloudflare.com
2021-06-16 20:15:30 -07:00
Kuniyuki Iwashima 50501271e7 libbpf: Set expected_attach_type for BPF_PROG_TYPE_SK_REUSEPORT.
This commit introduces a new section (sk_reuseport/migrate) and sets
expected_attach_type to two each section in BPF_PROG_TYPE_SK_REUSEPORT
program.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210612123224.12525-11-kuniyu@amazon.co.jp
2021-06-15 18:01:06 +02:00
Kumar Kartikeya Dwivedi bbf29d3a2e libbpf: Set NLM_F_EXCL when creating qdisc
This got lost during the refactoring across versions. We always use
NLM_F_EXCL when creating some TC object, so reflect what the function
says and set the flag.

Fixes: 715c5ce454 ("libbpf: Add low level TC-BPF management API")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210612023502.1283837-3-memxor@gmail.com
2021-06-15 14:00:30 +02:00
Kumar Kartikeya Dwivedi 4e164f8716 libbpf: Remove unneeded check for flags during tc detach
Coverity complained about this being unreachable code. It is right
because we already enforce flags to be unset, so a check validating
the flag value is redundant.

Fixes: 715c5ce454 ("libbpf: Add low level TC-BPF management API")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210612023502.1283837-2-memxor@gmail.com
2021-06-15 13:58:56 +02:00
Wang Hai 3b3af91cb6 libbpf: Simplify the return expression of bpf_object__init_maps function
There is no need for special treatment of the 'ret == 0' case.
This patch simplifies the return expression.

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210609115651.3392580-1-wanghai38@huawei.com
2021-06-11 15:30:52 -07:00
Michal Suchanek edc0571c5f libbpf: Fix pr_warn type warnings on 32bit
The printed value is ptrdiff_t and is formatted wiht %ld. This works on
64bit but produces a warning on 32bit. Fix the format specifier to %td.

Fixes: 6723474373 ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210604112448.32297-1-msuchanek@suse.de
2021-06-08 22:02:10 +02:00
Kev Jackson 11fc79fc9f libbpf: Fixes incorrect rx_ring_setup_done
When calling xsk_socket__create_shared(), the logic at line 1097 marks a
boolean flag true within the xsk_umem structure to track setup progress
in order to support multiple calls to the function.  However, instead of
marking umem->tx_ring_setup_done, the code incorrectly sets
umem->rx_ring_setup_done.  This leads to improper behaviour when
creating and destroying xsk and umem structures.

Multiple calls to this function is documented as supported.

Fixes: ca7a83e248 ("libbpf: Only create rx and tx XDP rings when necessary")
Signed-off-by: Kev Jackson <foamdino@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/YL4aU4f3Aaik7CN0@linux-dev
2021-06-07 17:44:03 -07:00
Andrii Nakryiko 7d8a819dd3 libbpf: Install skel_internal.h header used from light skeletons
Light skeleton code assumes skel_internal.h header to be installed system-wide
by libbpf package. Make sure it is actually installed.

Fixes: 6723474373 ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-4-andrii@kernel.org
2021-06-03 15:50:11 +02:00
Andrii Nakryiko 232c9e8bd5 libbpf: Refactor header installation portions of Makefile
As we gradually get more headers that have to be installed, it's quite
annoying to copy/paste long $(call) commands. So extract that logic and do
a simple $(foreach) over the list of headers.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-3-andrii@kernel.org
2021-06-03 15:50:11 +02:00
Andrii Nakryiko 16cac00606 libbpf: Move few APIs from 0.4 to 0.5 version
Official libbpf 0.4 release doesn't include three APIs that were tentatively
put into 0.4 section. Fix libbpf.map and move these three APIs:

  - bpf_map__initial_value;
  - bpf_map_lookup_and_delete_elem_flags;
  - bpf_object__gen_loader.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210603004026.2698513-2-andrii@kernel.org
2021-06-03 15:50:05 +02:00
Jakub Kicinski 5ada57a9a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 09:55:10 -07:00
Florent Revest d6a6a55518 libbpf: Move BPF_SEQ_PRINTF and BPF_SNPRINTF to bpf_helpers.h
These macros are convenient wrappers around the bpf_seq_printf and
bpf_snprintf helpers. They are currently provided by bpf_tracing.h which
targets low level tracing primitives. bpf_helpers.h is a better fit.

The __bpf_narg and __bpf_apply are needed in both files and provided
twice. __bpf_empty isn't used anywhere and is removed from bpf_tracing.h

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210526164643.2881368-1-revest@chromium.org
2021-05-26 10:45:41 -07:00
Andrii Nakryiko e9fc3ce99b libbpf: Streamline error reporting for high-level APIs
Implement changes to error reporting for high-level libbpf APIs to make them
less surprising and less error-prone to users:
  - in all the cases when error happens, errno is set to an appropriate error
    value;
  - in libbpf 1.0 mode, all pointer-returning APIs return NULL on error and
    error code is communicated through errno; this applies both to APIs that
    already returned NULL before (so now they communicate more detailed error
    codes), as well as for many APIs that used ERR_PTR() macro and encoded
    error numbers as fake pointers.
  - in legacy (default) mode, those APIs that were returning ERR_PTR(err),
    continue doing so, but still set errno.

With these changes, errno can be always used to extract actual error,
regardless of legacy or libbpf 1.0 modes. This is utilized internally in
libbpf in places where libbpf uses it's own high-level APIs.
libbpf_get_error() is adapted to handle both cases completely transparently to
end-users (and is used by libbpf consistently as well).

More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-5-andrii@kernel.org
2021-05-25 17:32:35 -07:00
Andrii Nakryiko f12b654327 libbpf: Streamline error reporting for low-level APIs
Ensure that low-level APIs behave uniformly across the libbpf as follows:
  - in case of an error, errno is always set to the correct error code;
  - when libbpf 1.0 mode is enabled with LIBBPF_STRICT_DIRECT_ERRS option to
    libbpf_set_strict_mode(), return -Exxx error value directly, instead of -1;
  - by default, until libbpf 1.0 is released, keep returning -1 directly.

More context, justification, and discussion can be found in "Libbpf: the road
to v1.0" document ([0]).

  [0] https://docs.google.com/document/d/1UyjTZuPFWiPFyKk1tV5an11_iaRuec6U-ZESZ54nNTY

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-4-andrii@kernel.org
2021-05-25 17:32:35 -07:00
Andrii Nakryiko 5981881d21 libbpf: Add libbpf_set_strict_mode() API to turn on libbpf 1.0 behaviors
Add libbpf_set_strict_mode() API that allows application to simulate libbpf
1.0 breaking changes before libbpf 1.0 is released. This will help users
migrate gradually and with confidence.

For now only ALL or NONE options are available, subsequent patches will add
more flags. This patch is preliminary for selftests/bpf changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210525035935.1461796-2-andrii@kernel.org
2021-05-25 17:32:34 -07:00
Yonghong Song 9f0c317f6a libbpf: Add support for new llvm bpf relocations
LLVM patch https://reviews.llvm.org/D102712
narrowed the scope of existing R_BPF_64_64
and R_BPF_64_32 relocations, and added three
new relocations, R_BPF_64_ABS64, R_BPF_64_ABS32
and R_BPF_64_NODYLD32. The main motivation is
to make relocations linker friendly.

This change, unfortunately, breaks libbpf build,
and we will see errors like below:
  libbpf: ELF relo #0 in section #6 has unexpected type 2 in
     /home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o
  Error: failed to link
     '/home/yhs/work/bpf-next/tools/testing/selftests/bpf/bpf_tcp_nogpl.o':
     Unknown error -22 (-22)
The new relocation R_BPF_64_ABS64 is generated
and libbpf linker sanity check doesn't understand it.
Relocation section '.rel.struct_ops' at offset 0x1410 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000018  0000000700000002 R_BPF_64_ABS64         0000000000000000 nogpltcp_init

Look at the selftests/bpf/bpf_tcp_nogpl.c,
  void BPF_STRUCT_OPS(nogpltcp_init, struct sock *sk)
  {
  }

  SEC(".struct_ops")
  struct tcp_congestion_ops bpf_nogpltcp = {
          .init           = (void *)nogpltcp_init,
          .name           = "bpf_nogpltcp",
  };
The new llvm relocation scheme categorizes 'nogpltcp_init' reference
as R_BPF_64_ABS64 instead of R_BPF_64_64 which is used to specify
ld_imm64 relocation in the new scheme.

Let us fix the linker sanity checking by including
R_BPF_64_ABS64 and R_BPF_64_ABS32. There is no need to
check R_BPF_64_NODYLD32 which is used for .BTF and .BTF.ext.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210522162341.3687617-1-yhs@fb.com
2021-05-24 21:03:05 -07:00
Denis Salopek d59b9f2d1b bpf: Extend libbpf with bpf_map_lookup_and_delete_elem_flags
Add bpf_map_lookup_and_delete_elem_flags() libbpf API in order to use
the BPF_F_LOCK flag with the map_lookup_and_delete_elem() function.

Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/15b05dafe46c7e0750d110f233977372029d1f62.1620763117.git.denis.salopek@sartura.hr
2021-05-24 13:30:51 -07:00
Stanislav Fomichev f9bceaa59c libbpf: Skip bpf_object__probe_loading for light skeleton
I'm getting the following error when running 'gen skeleton -L' as
regular user:

libbpf: Error in bpf_object__probe_loading():Operation not permitted(1).
Couldn't load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value.

Fixes: 6723474373 ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210521030653.2626513-1-sdf@google.com
2021-05-24 12:22:09 -07:00
Alexei Starovoitov 5d67f34959 bpf: Add cmd alias BPF_PROG_RUN
Add BPF_PROG_RUN command as an alias to BPF_RPOG_TEST_RUN to better
indicate the full range of use cases done by the command.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519014032.20908-1-alexei.starovoitov@gmail.com
2021-05-19 15:35:12 +02:00
Alexei Starovoitov 7723256bf2 libbpf: Introduce bpf_map__initial_value().
Introduce bpf_map__initial_value() to read initial contents
of mmaped data/rodata/bss maps.
Note that bpf_map__set_initial_value() doesn't allow modifying
kconfig map while bpf_map__initial_value() allows reading
its values.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-17-alexei.starovoitov@gmail.com
2021-05-19 00:40:56 +02:00
Alexei Starovoitov 30f51aedab libbpf: Cleanup temp FDs when intermediate sys_bpf fails.
Fix loader program to close temporary FDs when intermediate
sys_bpf command fails.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-16-alexei.starovoitov@gmail.com
2021-05-19 00:40:44 +02:00
Alexei Starovoitov 6723474373 libbpf: Generate loader program out of BPF ELF file.
The BPF program loading process performed by libbpf is quite complex
and consists of the following steps:
"open" phase:
- parse elf file and remember relocations, sections
- collect externs and ksyms including their btf_ids in prog's BTF
- patch BTF datasec (since llvm couldn't do it)
- init maps (old style map_def, BTF based, global data map, kconfig map)
- collect relocations against progs and maps
"load" phase:
- probe kernel features
- load vmlinux BTF
- resolve externs (kconfig and ksym)
- load program BTF
- init struct_ops
- create maps
- apply CO-RE relocations
- patch ld_imm64 insns with src_reg=PSEUDO_MAP, PSEUDO_MAP_VALUE, PSEUDO_BTF_ID
- reposition subprograms and adjust call insns
- sanitize and load progs

During this process libbpf does sys_bpf() calls to load BTF, create maps,
populate maps and finally load programs.
Instead of actually doing the syscalls generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of single map with:
- union bpf_attr(s)
- BTF bytes
- map value bytes
- insns bytes
and single bpf program that passes bpf_attr(s) and data into bpf_sys_bpf() helper.
Executing such "loader program" via bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done which will result
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and majority of libbpf dependency from
program loading process.

kconfig, typeless ksym, struct_ops and CO-RE are not supported yet.

The order of relocate_data and relocate_calls had to change, so that
bpf_gen__prog_load() can see all relocations for a given program with
correct insn_idx-es.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-15-alexei.starovoitov@gmail.com
2021-05-19 00:39:40 +02:00
Alexei Starovoitov e2fa0156a4 libbpf: Preliminary support for fd_idx
Prep libbpf to use FD_IDX kernel feature when generating loader program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-14-alexei.starovoitov@gmail.com
2021-05-19 00:33:41 +02:00
Alexei Starovoitov 9ca1f56aba libbpf: Add bpf_object pointer to kernel_supports().
Add a pointer to 'struct bpf_object' to kernel_supports() helper.
It will be used in the next patch.
No functional changes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-13-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Alexei Starovoitov b126882672 libbpf: Change the order of data and text relocations.
In order to be able to generate loader program in the later
patches change the order of data and text relocations.
Also improve the test to include data relos.

If the kernel supports "FD array" the map_fd relocations can be processed
before text relos since generated loader program won't need to manually
patch ld_imm64 insns with map_fd.
But ksym and kfunc relocations can only be processed after all calls
are relocated, since loader program will consist of a sequence
of calls to bpf_btf_find_by_name_kind() followed by patching of btf_id
and btf_obj_fd into corresponding ld_imm64 insns. The locations of those
ld_imm64 insns are specified in relocations.
Hence process all data relocations (maps, ksym, kfunc) together after call relos.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-12-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Alexei Starovoitov 5452fc9a17 libbpf: Support for syscall program type
Trivial support for syscall program type.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-5-alexei.starovoitov@gmail.com
2021-05-19 00:33:40 +02:00
Kumar Kartikeya Dwivedi 715c5ce454 libbpf: Add low level TC-BPF management API
This adds functions that wrap the netlink API used for adding, manipulating,
and removing traffic control filters.

The API summary:

A bpf_tc_hook represents a location where a TC-BPF filter can be attached.
This means that creating a hook leads to creation of the backing qdisc,
while destruction either removes all filters attached to a hook, or destroys
qdisc if requested explicitly (as discussed below).

The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
query, and detach tc filters. All functions return 0 on success, and a
negative error code on failure.

bpf_tc_hook_create - Create a hook
Parameters:
	@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
		proper enum constant. Note that parent must be unset when
		attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
		that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
		valid value for attach_point.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

bpf_tc_hook_destroy - Destroy a hook
Parameters:
	@hook - Cannot be NULL. The behaviour depends on value of
		attach_point. If BPF_TC_INGRESS, all filters attached to
		the ingress hook will be detached. If BPF_TC_EGRESS, all
		filters attached to the egress hook will be detached. If
		BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
		deleted, also detaching all filters. As before, parent must
		be unset for these attach_points, and set for BPF_TC_CUSTOM.

		It is advised that if the qdisc is operated on by many programs,
		then the program at least check that there are no other existing
		filters before deleting the clsact qdisc. An example is shown
		below:

		DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
				    .attach_point = BPF_TC_INGRESS);
		/* set opts as NULL, as we're not really interested in
		 * getting any info for a particular filter, but just
	 	 * detecting its presence.
		 */
		r = bpf_tc_query(&hook, NULL);
		if (r == -ENOENT) {
			/* no filters */
			hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
			return bpf_tc_hook_destroy(&hook);
		} else {
			/* failed or r == 0, the latter means filters do exist */
			return r;
		}

		Note that there is a small race between checking for no
		filters and deleting the qdisc. This is currently unavoidable.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

bpf_tc_attach - Attach a filter to a hook
Parameters:
	@hook - Cannot be NULL. Represents the hook the filter will be
		attached to. Requirements for ifindex and attach_point are
		same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
		is also supported.  In that case, parent must be set to the
		handle where the filter will be attached (using BPF_TC_PARENT).
		E.g. to set parent to 1:16 like in tc command line, the
		equivalent would be BPF_TC_PARENT(1, 16).

	@opts - Cannot be NULL. The following opts are optional:
		* handle   - The handle of the filter
		* priority - The priority of the filter
			     Must be >= 0 and <= UINT16_MAX
		Note that when left unset, they will be auto-allocated by
		the kernel. The following opts must be set:
		* prog_fd - The fd of the loaded SCHED_CLS prog
		The following opts must be unset:
		* prog_id - The ID of the BPF prog
		The following opts are optional:
		* flags - Currently only BPF_TC_F_REPLACE is allowed. It
			  allows replacing an existing filter instead of
			  failing with -EEXIST.
		The following opts will be filled by bpf_tc_attach on a
		successful attach operation if they are unset:
		* handle   - The handle of the attached filter
		* priority - The priority of the attached filter
		* prog_id  - The ID of the attached SCHED_CLS prog
		This way, the user can know what the auto allocated values
		for optional opts like handle and priority are for the newly
		attached filter, if they were unset.

		Note that some other attributes are set to fixed default
		values listed below (this holds for all bpf_tc_* APIs):
		protocol as ETH_P_ALL, direct action mode, chain index of 0,
		and class ID of 0 (this can be set by writing to the
		skb->tc_classid field from the BPF program).

bpf_tc_detach
Parameters:
	@hook - Cannot be NULL. Represents the hook the filter will be
		detached from. Requirements are same as described above
		in bpf_tc_attach.

	@opts - Cannot be NULL. The following opts must be set:
		* handle, priority
		The following opts must be unset:
		* prog_fd, prog_id, flags

bpf_tc_query
Parameters:
	@hook - Cannot be NULL. Represents the hook where the filter lookup will
		be performed. Requirements are same as described above in
		bpf_tc_attach().

	@opts - Cannot be NULL. The following opts must be set:
		* handle, priority
		The following opts must be unset:
		* prog_fd, prog_id, flags
		The following fields will be filled by bpf_tc_query upon a
		successful lookup:
		* prog_id

Some usage examples (using BPF skeleton infrastructure):

BPF program (test_tc_bpf.c):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("classifier")
	int cls(struct __sk_buff *skb)
	{
		return 0;
	}

Userspace loader:

	struct test_tc_bpf *skel = NULL;
	int fd, r;

	skel = test_tc_bpf__open_and_load();
	if (!skel)
		return -ENOMEM;

	fd = bpf_program__fd(skel->progs.cls);

	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
			    if_nametoindex("lo"), .attach_point =
			    BPF_TC_INGRESS);
	/* Create clsact qdisc */
	r = bpf_tc_hook_create(&hook);
	if (r < 0)
		goto end;

	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
	r = bpf_tc_attach(&hook, &opts);
	if (r < 0)
		goto end;
	/* Print the auto allocated handle and priority */
	printf("Handle=%u", opts.handle);
	printf("Priority=%u", opts.priority);

	opts.prog_fd = opts.prog_id = 0;
	bpf_tc_detach(&hook, &opts);
end:
	test_tc_bpf__destroy(skel);

This is equivalent to doing the following using tc command line:
  # tc qdisc add dev lo clsact
  # tc filter add dev lo ingress bpf obj foo.o sec classifier da
  # tc filter del dev lo ingress handle <h> prio <p> bpf
... where the handle and priority can be found using:
  # tc filter show dev lo ingress

Another example replacing a filter (extending prior example):

	/* We can also choose both (or one), let's try replacing an
	 * existing filter.
	 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
			    opts.handle, .priority = opts.priority,
			    .prog_fd = fd);
	r = bpf_tc_attach(&hook, &replace_opts);
	if (r == -EEXIST) {
		/* Expected, now use BPF_TC_F_REPLACE to replace it */
		replace_opts.flags = BPF_TC_F_REPLACE;
		return bpf_tc_attach(&hook, &replace_opts);
	} else if (r < 0) {
		return r;
	}
	/* There must be no existing filter with these
	 * attributes, so cleanup and return an error.
	 */
	replace_opts.prog_fd = replace_opts.prog_id = 0;
	bpf_tc_detach(&hook, &replace_opts);
	return -1;

To obtain info of a particular filter:

	/* Find info for filter with handle 1 and priority 50 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
			    .priority = 50);
	r = bpf_tc_query(&hook, &info_opts);
	if (r == -ENOENT)
		printf("Filter not found");
	else if (r < 0)
		return r;
	printf("Prog ID: %u", info_opts.prog_id);
	return 0;

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design
[ Daniel: also did major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
2021-05-17 17:48:45 +02:00
Kumar Kartikeya Dwivedi 8bbb77b7c7 libbpf: Add various netlink helpers
This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c. It also adds a libbpf_netlink_send_recv() that
is useful to wrap send + recv handling in a generic way. Subsequent patch
will also use this function for sending and receiving a netlink response.
The libbpf_nl_get_link() helper has been removed instead, moving socket
creation into the newly named libbpf_netlink_send_recv().

Every nested attribute's closure must happen using the helper
nlattr_end_nested(), which sets its length properly. NLA_F_NESTED is
enforced using nlattr_begin_nested() helper. Other simple attributes
can be added directly.

The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:

  struct {
	struct nlmsghdr nh;
	struct tcmsg t;
	char buf[4096];
  } req;

Then, maxsz should be sizeof(req).

This change also converts the open coded attribute preparation with these
helpers. Note that the only failure the internal call to nlattr_add()
could result in the nested helper would be -EMSGSIZE, hence that is what
we return to our caller.

The libbpf_netlink_send_recv() call takes care of opening the socket,
sending the netlink message, receiving the response, potentially invoking
callbacks, and return errors if any, and then finally close the socket.
This allows users to avoid identical socket setup code in different places.
The only user of libbpf_nl_get_link() has been converted to make use of it.
__bpf_set_link_xdp_fd_replace() has also been refactored to use it.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
[ Daniel: major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-2-memxor@gmail.com
2021-05-17 17:48:36 +02:00
Andrii Nakryiko 513f485ca5 libbpf: Reject static entry-point BPF programs
Detect use of static entry-point BPF programs (those with SEC() markings) and
emit error message. This is similar to
c1cccec9c6 ("libbpf: Reject static maps") but for BPF programs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210514195534.1440970-1-andrii@kernel.org
2021-05-14 16:11:09 -07:00
Andrii Nakryiko c1cccec9c6 libbpf: Reject static maps
Static maps never really worked with libbpf, because all such maps were always
silently resolved to the very first map. Detect static maps (both legacy and
BTF-defined) and report user-friendly error.

Tested locally by switching few maps (legacy and BTF-defined) in selftests to
static ones and verifying that now libbpf rejects them loudly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210513233643.194711-2-andrii@kernel.org
2021-05-13 17:23:58 -07:00
David S. Miller df6f823703 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-05-11

The following pull-request contains BPF updates for your *net* tree.

We've added 13 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 817 insertions(+), 382 deletions(-).

The main changes are:

1) Fix multiple ringbuf bugs in particular to prevent writable mmap of
   read-only pages, from Andrii Nakryiko & Thadeu Lima de Souza Cascardo.

2) Fix verifier alu32 known-const subregister bound tracking for bitwise
   operations and/or/xor, from Daniel Borkmann.

3) Reject trampoline attachment for functions with variable arguments,
   and also add a deny list of other forbidden functions, from Jiri Olsa.

4) Fix nested bpf_bprintf_prepare() calls used by various helpers by
   switching to per-CPU buffers, from Florent Revest.

5) Fix kernel compilation with BTF debug info on ppc64 due to pahole
   missing TCP-CC functions like cubictcp_init, from Martin KaFai Lau.

6) Add a kconfig entry to provide an option to disallow unprivileged
   BPF by default, from Daniel Borkmann.

7) Fix libbpf compilation for older libelf when GELF_ST_VISIBILITY()
   macro is not available, from Arnaldo Carvalho de Melo.

8) Migrate test_tc_redirect to test_progs framework as prep work
   for upcoming skb_change_head() fix & selftest, from Jussi Maki.

9) Fix a libbpf segfault in add_dummy_ksym_var() if BTF is not
   present, from Ian Rogers.

10) Fix tx_only micro-benchmark in xdpsock BPF sample with proper frame
    size, from Magnus Karlsson.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11 16:05:56 -07:00
Andrii Nakryiko e5670fa029 libbpf: Treat STV_INTERNAL same as STV_HIDDEN for functions
Do the same global -> static BTF update for global functions with STV_INTERNAL
visibility to turn on static BPF verification mode.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-7-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko 247b8634e6 libbpf: Fix ELF symbol visibility update logic
Fix silly bug in updating ELF symbol's visibility.

Fixes: a46349227c ("libbpf: Add linker extern resolution support for functions and global variables")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-6-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Andrii Nakryiko fdbf5ddeb8 libbpf: Add per-file linker opts
For better future extensibility add per-file linker options. Currently
the set of available options is empty. This changes bpf_linker__add_file()
API, but it's not a breaking change as bpf_linker APIs hasn't been released
yet.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-3-andrii@kernel.org
2021-05-11 15:07:17 -07:00
Arnaldo Carvalho de Melo 67e7ec0bd4 libbpf: Provide GELF_ST_VISIBILITY() define for older libelf
Where that macro isn't available.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/YJaspEh0qZr4LYOc@kernel.org
2021-05-11 23:07:33 +02:00
Linus Torvalds fc858a5231 Networking fixes for 5.13-rc1, including fixes from bpf, can
and netfilter trees. Self-contained fixes, nothing risky.
 
 Current release - new code bugs:
 
  - dsa: ksz: fix a few bugs found by static-checker in the new driver
 
  - stmmac: fix frame preemption handshake not triggering after
            interface restart
 
 Previous releases - regressions:
 
  - make nla_strcmp handle more then one trailing null character
 
  - fix stack OOB reads while fragmenting IPv4 packets in openvswitch
    and net/sched
 
  - sctp: do asoc update earlier in sctp_sf_do_dupcook_a
 
  - sctp: delay auto_asconf init until binding the first addr
 
  - stmmac: clear receive all(RA) bit when promiscuous mode is off
 
  - can: mcp251x: fix resume from sleep before interface was brought up
 
 Previous releases - always broken:
 
  - bpf: fix leakage of uninitialized bpf stack under speculation
 
  - bpf: fix masking negation logic upon negative dst register
 
  - netfilter: don't assume that skb_header_pointer() will never fail
 
  - only allow init netns to set default tcp cong to a restricted algo
 
  - xsk: fix xp_aligned_validate_desc() when len == chunk_size to
         avoid false positive errors
 
  - ethtool: fix missing NLM_F_MULTI flag when dumping
 
  - can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
 
  - sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
 
  - bridge: fix NULL-deref caused by a races between assigning
            rx_handler_data and setting the IFF_BRIDGE_PORT bit
 
 Latecomer:
 
  - seg6: add counters support for SRv6 Behaviors
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCV3YoACgkQMUZtbf5S
 IrsQ2w//Q8/qbl6wGTKUfu6DZHYUU5j5sTwiHR823PKKSgXI+okWMN0KUlZszOsz
 qnPkH6GuojRooOE1s8PFLSlt9axKhQ0y7uzMTrWYafQ+JZTtgg9/MiPxQ8fdiE5i
 uOG1ngttZ+1jlE5tMPL4GAOSegg3rWVDclzqnJTdsPPOco3MWj6SL9xN0LDPxCEL
 BDysRqL/UiOIoh4v6IXQRx2UWjsNGu4biM1po+Jfumnd9T0zKoEpzu6UN6yPShbx
 284LihZSQtughCbhGqkErBOxfjZcvpFOQrqmjEvI+Z/eYg4InfWZemt8Sa92/alE
 yAFjK76MUTaUxaAO/gk8XauhvkYOzJJwKpqhbOmlaM7oj55QdzT5/8JxMxVoA6hV
 pscHOixk15GVse49PdPV8v47cyTLc/Xi69i+/uUdNVVfuORL1wft1w1xbd0S6Pbe
 7Gqax21S7zxcDsrUli7cFheYiqtbQAL0anlIUz8tUOZFz0VQ/zPuFd4rUYZ/o38V
 Mrevdk3t6CXNxS4CRXyUW4UejYB1O6Qw12sUue31e3h73d6LiN3NAiN5Qp7SEk1/
 fvk+jfOf8vvmtimYvcUK2i0D+vqj4Ec/qRIE/XXuUDBcp22tPL9uWMfWavwTdAj1
 Se4SzksTWF+NM0lO0ItonMyPh3ZXcSLhIv/gHrZwEKuWkXCGO4M=
 =JmWS
 -----END PGP SIGNATURE-----

Merge tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.13-rc1, including fixes from bpf, can and
  netfilter trees. Self-contained fixes, nothing risky.

  Current release - new code bugs:

   - dsa: ksz: fix a few bugs found by static-checker in the new driver

   - stmmac: fix frame preemption handshake not triggering after
     interface restart

  Previous releases - regressions:

   - make nla_strcmp handle more then one trailing null character

   - fix stack OOB reads while fragmenting IPv4 packets in openvswitch
     and net/sched

   - sctp: do asoc update earlier in sctp_sf_do_dupcook_a

   - sctp: delay auto_asconf init until binding the first addr

   - stmmac: clear receive all(RA) bit when promiscuous mode is off

   - can: mcp251x: fix resume from sleep before interface was brought up

  Previous releases - always broken:

   - bpf: fix leakage of uninitialized bpf stack under speculation

   - bpf: fix masking negation logic upon negative dst register

   - netfilter: don't assume that skb_header_pointer() will never fail

   - only allow init netns to set default tcp cong to a restricted algo

   - xsk: fix xp_aligned_validate_desc() when len == chunk_size to avoid
     false positive errors

   - ethtool: fix missing NLM_F_MULTI flag when dumping

   - can: m_can: m_can_tx_work_queue(): fix tx_skb race condition

   - sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b

   - bridge: fix NULL-deref caused by a races between assigning
     rx_handler_data and setting the IFF_BRIDGE_PORT bit

  Latecomer:

   - seg6: add counters support for SRv6 Behaviors"

* tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
  atm: firestream: Use fallthrough pseudo-keyword
  net: stmmac: Do not enable RX FIFO overflow interrupts
  mptcp: fix splat when closing unaccepted socket
  i40e: Remove LLDP frame filters
  i40e: Fix PHY type identifiers for 2.5G and 5G adapters
  i40e: fix the restart auto-negotiation after FEC modified
  i40e: Fix use-after-free in i40e_client_subtask()
  i40e: fix broken XDP support
  netfilter: nftables: avoid potential overflows on 32bit arches
  netfilter: nftables: avoid overflows in nft_hash_buckets()
  tcp: Specify cmsgbuf is user pointer for receive zerocopy.
  mlxsw: spectrum_mr: Update egress RIF list before route's action
  net: ipa: fix inter-EE IRQ register definitions
  can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
  can: mcp251x: fix resume from sleep before interface was brought up
  can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
  can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
  netfilter: nftables: Fix a memleak from userdata error path in new objects
  netfilter: remove BUG_ON() after skb_header_pointer()
  netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
  ...
2021-05-08 08:31:46 -07:00
Linus Torvalds a48b0872e6 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "This is everything else from -mm for this merge window.

  90 patches.

  Subsystems affected by this patch series: mm (cleanups and slub),
  alpha, procfs, sysctl, misc, core-kernel, bitmap, lib, compat,
  checkpatch, epoll, isofs, nilfs2, hpfs, exit, fork, kexec, gcov,
  panic, delayacct, gdb, resource, selftests, async, initramfs, ipc,
  drivers/char, and spelling"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (90 commits)
  mm: fix typos in comments
  mm: fix typos in comments
  treewide: remove editor modelines and cruft
  ipc/sem.c: spelling fix
  fs: fat: fix spelling typo of values
  kernel/sys.c: fix typo
  kernel/up.c: fix typo
  kernel/user_namespace.c: fix typos
  kernel/umh.c: fix some spelling mistakes
  include/linux/pgtable.h: few spelling fixes
  mm/slab.c: fix spelling mistake "disired" -> "desired"
  scripts/spelling.txt: add "overflw"
  scripts/spelling.txt: Add "diabled" typo
  scripts/spelling.txt: add "overlfow"
  arm: print alloc free paths for address in registers
  mm/vmalloc: remove vwrite()
  mm: remove xlate_dev_kmem_ptr()
  drivers/char: remove /dev/kmem for good
  mm: fix some typos and code style problems
  ipc/sem.c: mundane typo fixes
  ...
2021-05-07 00:34:51 -07:00
Yury Norov eaae7841ba tools: sync lib/find_bit implementation
Add fast paths to find_*_bit() functions as per kernel implementation.

Link: https://lkml.kernel.org/r/20210401003153.97325-12-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06 19:24:12 -07:00
Yury Norov ea81c1ef44 tools: sync find_next_bit implementation
Sync the implementation with recent kernel changes.

Link: https://lkml.kernel.org/r/20210401003153.97325-9-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06 19:24:12 -07:00
Yury Norov e5b9252d90 tools: bitmap: sync function declarations with the kernel
Some functions in tools/include/linux/bitmap.h declare nbits as int.  In
the kernel nbits is declared as unsigned int.

Link: https://lkml.kernel.org/r/20210401003153.97325-3-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-06 19:24:11 -07:00
Ian Rogers 9683e5775c libbpf: Add NULL check to add_dummy_ksym_var
Avoids a segv if btf isn't present. Seen on the call path
__bpf_object__open calling bpf_object__collect_externs.

Fixes: 5bd022ec01 (libbpf: Support extern kernel function)
Suggested-by: Stanislav Fomichev <sdf@google.com>
Suggested-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210504234910.976501-1-irogers@google.com
2021-05-05 11:29:33 -07:00
Brendan Jackman 2a30f94406 libbpf: Fix signed overflow in ringbuf_process_ring
One of our benchmarks running in (Google-internal) CI pushes data
through the ringbuf faster htan than userspace is able to consume
it. In this case it seems we're actually able to get >INT_MAX entries
in a single ring_buffer__consume() call. ASAN detected that cnt
overflows in this case.

Fix by using 64-bit counter internally and then capping the result to
INT_MAX before converting to the int return type. Do the same for
the ring_buffer__poll().

Fixes: bf99c936f9 (libbpf: Add BPF ring buffer support)
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210429130510.1621665-1-jackmanb@google.com
2021-05-03 09:54:12 -07:00
Linus Torvalds 10a3efd0fe perf tools changes for v5.13: 1st batch
perf stat:
 
 - Add support for hybrid PMUs to support systems such as Intel Alderlake
   and its BIG/little core/atom cpus.
 
 - Introduce 'bperf' to share hardware PMCs with BPF.
 
 - New --iostat option to collect and present IO stats on Intel hardware.
 
   This functionality is based on recently introduced sysfs attributes
   for Intel® Xeon® Scalable processor family (code name Skylake-SP):
 
     commit bb42b3d397 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping")
 
   It is intended to provide four I/O performance metrics in MB per each
   PCIe root port:
 
    - Inbound Read: I/O devices below root port read from the host memory
    - Inbound Write: I/O devices below root port write to the host memory
    - Outbound Read: CPU reads from I/O devices below root port
    - Outbound Write: CPU writes to I/O devices below root port
 
 - Align CSV output for summary.
 
 - Clarify --null use cases: Assess raw overhead of 'perf stat' or
   measure just wall clock time.
 
 - Improve readability of shadow stats.
 
 perf record:
 
 - Change the COMM when starting tha workload so that --exclude-perf
   doesn't seem to be not honoured.
 
 - Improve 'Workload failed' message printing events + what was exec'ed.
 
 - Fix cross-arch support for TIME_CONV.
 
 perf report:
 
 - Add option to disable raw event ordering.
 
 - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'.
 
 - Improvements to --stat output, that shows information about PERF_RECORD_ events.
 
 - Preserve identifier id in OCaml demangler.
 
 perf annotate:
 
 - Show full source location with 'l' hotkey in the 'perf annotate' TUI.
 
 - Add line number like in TUI and source location at EOL to the 'perf annotate' --stdio mode.
 
 - Add --demangle and --demangle-kernel to 'perf annotate'.
 
 - Allow configuring annotate.demangle{,_kernel} in 'perf config'.
 
 - Fix sample events lost in stdio mode.
 
 perf data:
 
 - Allow converting a perf.data file to JSON.
 
 libperf:
 
 - Add support for user space counter access.
 
 - Update topdown documentation to permit rdpmc calls.
 
 perf test:
 
 - Add 'perf test' for 'perf stat' CSV output.
 
 - Add 'perf test' entries to test the hybrid PMU support.
 
 - Cleanup 'perf test daemon' if its 'perf test' is interrupted.
 
 - Handle metric reuse in pmu-events parsing 'perf test' entry.
 
 - Add test for PE executable support.
 
 - Add timeout for wait for daemon start in its 'perf test' entries.
 
 Build:
 
 - Enable libtraceevent dynamic linking.
 
 - Improve feature detection output.
 
 - Fix caching of feature checks caching.
 
 - First round of updates for tools copies of kernel headers.
 
 - Enable warnings when compiling BPF programs.
 
 Vendor specific events:
 
 Intel:
 
 - Add missing skylake & icelake model numbers.
 
 arm64:
 
 - Add Hisi hip08 L1, L2 and L3 metrics.
 
 - Add Fujitsu A64FX PMU events.
 
 PowerPC:
 
 - Initial JSON/events list for power10 platform.
 
 - Remove unsupported power9 metrics.
 
 AMD:
 
 - Add Zen3 events.
 
 - Fix broken L2 Cache Hits from L2 HWPF metric.
 
 - Use lowercases for all the eventcodes and umasks.
 
 Hardware tracing:
 
 arm64:
 
 - Update CoreSight ETM metadata format.
 
 - Fix bitmap for CS-ETM option.
 
 - Support PID tracing in config.
 
 - Detect pid in VMID for kernel running at EL2.
 
 Arch specific:
 
 MIPS:
 
 - Support MIPS unwinding and dwarf-regs.
 
 - Generate mips syscalls_n64.c syscall table.
 
 PowerPC:
 
 - Add support for PERF_SAMPLE_WEIGH_STRUCT on PowerPC.
 
 - Support pipeline stage cycles for powerpc.
 
 libbeauty:
 
 - Fix fsconfig generator.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYIshAwAKCRCyPKLppCJ+
 J8oWAP9c1POclDQ7AZDe5/t/InZYSQKJFIku1sE1SNCSOupy7wEAuPBtaN7wDaRj
 BFBibfUGd4MNzLPvMMHneIhSY3DgJwg=
 =FLLr
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "perf stat:

   - Add support for hybrid PMUs to support systems such as Intel
     Alderlake and its BIG/little core/atom cpus.

   - Introduce 'bperf' to share hardware PMCs with BPF.

   - New --iostat option to collect and present IO stats on Intel
     hardware.

     This functionality is based on recently introduced sysfs attributes
     for Intel® Xeon® Scalable processor family (code name Skylake-SP)
     in commit bb42b3d397 ("perf/x86/intel/uncore: Expose an Uncore
     unit to IIO PMON mapping")

     It is intended to provide four I/O performance metrics in MB per
     each PCIe root port:

       - Inbound Read: I/O devices below root port read from the host memory
       - Inbound Write: I/O devices below root port write to the host memory
       - Outbound Read: CPU reads from I/O devices below root port
       - Outbound Write: CPU writes to I/O devices below root port

   - Align CSV output for summary.

   - Clarify --null use cases: Assess raw overhead of 'perf stat' or
     measure just wall clock time.

   - Improve readability of shadow stats.

  perf record:

   - Change the COMM when starting tha workload so that --exclude-perf
     doesn't seem to be not honoured.

   - Improve 'Workload failed' message printing events + what was
     exec'ed.

   - Fix cross-arch support for TIME_CONV.

  perf report:

   - Add option to disable raw event ordering.

   - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'.

   - Improvements to --stat output, that shows information about
     PERF_RECORD_ events.

   - Preserve identifier id in OCaml demangler.

  perf annotate:

   - Show full source location with 'l' hotkey in the 'perf annotate'
     TUI.

   - Add line number like in TUI and source location at EOL to the 'perf
     annotate' --stdio mode.

   - Add --demangle and --demangle-kernel to 'perf annotate'.

   - Allow configuring annotate.demangle{,_kernel} in 'perf config'.

   - Fix sample events lost in stdio mode.

  perf data:

   - Allow converting a perf.data file to JSON.

  libperf:

   - Add support for user space counter access.

   - Update topdown documentation to permit rdpmc calls.

  perf test:

   - Add 'perf test' for 'perf stat' CSV output.

   - Add 'perf test' entries to test the hybrid PMU support.

   - Cleanup 'perf test daemon' if its 'perf test' is interrupted.

   - Handle metric reuse in pmu-events parsing 'perf test' entry.

   - Add test for PE executable support.

   - Add timeout for wait for daemon start in its 'perf test' entries.

  Build:

   - Enable libtraceevent dynamic linking.

   - Improve feature detection output.

   - Fix caching of feature checks caching.

   - First round of updates for tools copies of kernel headers.

   - Enable warnings when compiling BPF programs.

  Vendor specific events:

   - Intel:
      - Add missing skylake & icelake model numbers.

   - arm64:
      - Add Hisi hip08 L1, L2 and L3 metrics.
      - Add Fujitsu A64FX PMU events.

   - PowerPC:
      - Initial JSON/events list for power10 platform.
      - Remove unsupported power9 metrics.

   - AMD:
      - Add Zen3 events.
      - Fix broken L2 Cache Hits from L2 HWPF metric.
      - Use lowercases for all the eventcodes and umasks.

  Hardware tracing:

   - arm64:
      - Update CoreSight ETM metadata format.
      - Fix bitmap for CS-ETM option.
      - Support PID tracing in config.
      - Detect pid in VMID for kernel running at EL2.

  Arch specific updates:

   - MIPS:
      - Support MIPS unwinding and dwarf-regs.
      - Generate mips syscalls_n64.c syscall table.

   - PowerPC:
      - Add support for PERF_SAMPLE_WEIGH_STRUCT on PowerPC.
      - Support pipeline stage cycles for powerpc.

  libbeauty:

   - Fix fsconfig generator"

* tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (132 commits)
  perf build: Defer printing detected features to the end of all feature checks
  tools build: Allow deferring printing the results of feature detection
  perf build: Regenerate the FEATURE_DUMP file after extra feature checks
  perf session: Dump PERF_RECORD_TIME_CONV event
  perf session: Add swap operation for event TIME_CONV
  perf jit: Let convert_timestamp() to be backwards-compatible
  perf tools: Change fields type in perf_record_time_conv
  perf tools: Enable libtraceevent dynamic linking
  perf Documentation: Document intel-hybrid support
  perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
  perf tests: Support 'Convert perf time to TSC' test for hybrid
  perf tests: Support 'Session topology' test for hybrid
  perf tests: Support 'Parse and process metrics' test for hybrid
  perf tests: Support 'Track with sched_switch' test for hybrid
  perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  perf tests: Add hybrid cases for 'Parse event definition strings' test
  perf record: Uniquify hybrid event name
  perf stat: Warn group events from different hybrid PMU
  perf stat: Filter out unmatched aggregation for hybrid event
  ...
2021-05-01 12:22:38 -07:00
Leo Yan aa616f5a8a perf jit: Let convert_timestamp() to be backwards-compatible
Commit d110162caf ("perf tsc: Support cap_user_time_short for
event TIME_CONV") supports the extended parameters for event TIME_CONV,
but it broke the backwards compatibility, so any perf data file with old
event format fails to convert timestamp.

This patch introduces a helper event_contains() to check if an event
contains a specific member or not.  For the backwards-compatibility, if
the event size confirms the extended parameters are supported in the
event TIME_CONV, then copies these parameters.

Committer notes:

To make this compiler backwards compatible add this patch:

  -       struct perf_tsc_conversion tc = { 0 };
  +       struct perf_tsc_conversion tc = { .time_shift = 0, };

Fixes: d110162caf ("perf tsc: Support cap_user_time_short for event TIME_CONV")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steve MacLean <Steve.MacLean@Microsoft.com>
Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
Link: https://lore.kernel.org/r/20210428120915.7123-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-29 10:31:00 -03:00
Leo Yan e1d380ea8b perf tools: Change fields type in perf_record_time_conv
C standard claims "An object declared as type _Bool is large enough to
store the values 0 and 1", bool type size can be 1 byte or larger than
1 byte.  Thus it's uncertian for bool type size with different
compilers.

This patch changes the bool type in structure perf_record_time_conv to
__u8 type, and pads extra bytes for 8-byte alignment; this can give
reliable structure size.

Fixes: d110162caf ("perf tsc: Support cap_user_time_short for event TIME_CONV")
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steve MacLean <Steve.MacLean@Microsoft.com>
Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
Link: https://lore.kernel.org/r/20210428120915.7123-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-29 10:31:00 -03:00
Song Liu ec8149fba6 perf util: Move bpf_perf definitions to a libperf header
By following the same protocol, other tools can share hardware PMCs with
perf. Move perf_event_attr_map_entry and BPF_PERF_DEFAULT_ATTR_MAP_PATH to
bpf_perf.h for other tools to use.

Signed-off-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: kernel-team@fb.com
Link: https://lore.kernel.org/r/20210425214333.1090950-2-song@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-29 10:30:58 -03:00
Andrii Nakryiko 0f20615d64 selftests/bpf: Fix BPF_CORE_READ_BITFIELD() macro
Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable
bitfields. Missing breaks in a switch caused 8-byte reads always. This can
confuse libbpf because it does strict checks that memory load size corresponds
to the original size of the field, which in this case quite often would be
wrong.

After fixing that, we run into another problem, which quite subtle, so worth
documenting here. The issue is in Clang optimization and CO-RE relocation
interactions. Without that asm volatile construct (also known as
barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and
will apply BYTE_OFFSET 4 times for each switch case arm. This will result in
the same error from libbpf about mismatch of memory load size and original
field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *),
*(u32 *), and *(u64 *) memory loads, three of which will fail. Using
barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to
calculate p, after which value of p is used without relocation in each of
switch case arms, doing appropiately-sized memory load.

Here's the list of relevant relocations and pieces of generated BPF code
before and after this patch for test_core_reloc_bitfields_direct selftests.

BEFORE
=====
 #45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32
 #46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
 #47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
 #48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
 #49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32

     157:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
     159:       7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1
     160:       b7 02 00 00 04 00 00 00 r2 = 4
; BYTE_SIZE relocation here                 ^^^
     161:       66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63>
     162:       16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65>
     163:       16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66>
     164:       05 00 12 00 00 00 00 00 goto +18 <LBB0_69>

0000000000000528 <LBB0_66>:
     165:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
     167:       69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
     168:       05 00 0e 00 00 00 00 00 goto +14 <LBB0_69>

0000000000000548 <LBB0_63>:
     169:       16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67>
     170:       16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68>
     171:       05 00 0b 00 00 00 00 00 goto +11 <LBB0_69>

0000000000000560 <LBB0_68>:
     172:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
     174:       79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
     175:       05 00 07 00 00 00 00 00 goto +7 <LBB0_69>

0000000000000580 <LBB0_65>:
     176:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
     178:       71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8)
; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
     179:       05 00 03 00 00 00 00 00 goto +3 <LBB0_69>

00000000000005a0 <LBB0_67>:
     180:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
     182:       61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8)
; BYTE_OFFSET relo here w/ RIGHT size        ^^^^^^^^^^^^^^^^

00000000000005b8 <LBB0_69>:
     183:       67 01 00 00 20 00 00 00 r1 <<= 32
     184:       b7 02 00 00 00 00 00 00 r2 = 0
     185:       16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71>
     186:       c7 01 00 00 20 00 00 00 r1 s>>= 32
     187:       05 00 01 00 00 00 00 00 goto +1 <LBB0_72>

00000000000005e0 <LBB0_71>:
     188:       77 01 00 00 20 00 00 00 r1 >>= 32

AFTER
=====

 #30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
 #31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32

     129:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
     131:       7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1
     132:       b7 01 00 00 08 00 00 00 r1 = 8
; BYTE_OFFSET relo here                     ^^^
; no size check for non-memory dereferencing instructions
     133:       0f 12 00 00 00 00 00 00 r2 += r1
     134:       b7 03 00 00 04 00 00 00 r3 = 4
; BYTE_SIZE relocation here                 ^^^
     135:       66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63>
     136:       16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65>
     137:       16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66>
     138:       05 00 0a 00 00 00 00 00 goto +10 <LBB0_69>

0000000000000458 <LBB0_66>:
     139:       69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0)
; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
     140:       05 00 08 00 00 00 00 00 goto +8 <LBB0_69>

0000000000000468 <LBB0_63>:
     141:       16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67>
     142:       16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68>
     143:       05 00 05 00 00 00 00 00 goto +5 <LBB0_69>

0000000000000480 <LBB0_68>:
     144:       79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0)
; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
     145:       05 00 03 00 00 00 00 00 goto +3 <LBB0_69>

0000000000000490 <LBB0_65>:
     146:       71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0)
; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
     147:       05 00 01 00 00 00 00 00 goto +1 <LBB0_69>

00000000000004a0 <LBB0_67>:
     148:       61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0)
; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^

00000000000004a8 <LBB0_69>:
     149:       67 01 00 00 20 00 00 00 r1 <<= 32
     150:       b7 02 00 00 00 00 00 00 r2 = 0
     151:       16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71>
     152:       c7 01 00 00 20 00 00 00 r1 s>>= 32
     153:       05 00 01 00 00 00 00 00 goto +1 <LBB0_72>

00000000000004d0 <LBB0_71>:
     154:       77 01 00 00 20 00 00 00 r1 >>= 323

Fixes: ee26dade0e ("libbpf: Add support for relocatable bitfields")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-4-andrii@kernel.org
2021-04-26 18:37:13 -07:00
Andrii Nakryiko 6709a914c8 libbpf: Support BTF_KIND_FLOAT during type compatibility checks in CO-RE
Add BTF_KIND_FLOAT support when doing CO-RE field type compatibility check.
Without this, relocations against float/double fields will fail.

Also adjust one error message to emit instruction index instead of less
convenient instruction byte offset.

Fixes: 22541a9eeb ("libbpf: Add BTF_KIND_FLOAT support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-3-andrii@kernel.org
2021-04-26 18:37:13 -07:00
Arnaldo Carvalho de Melo 26bda3ca19 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-26 09:35:41 -03:00
David S. Miller 5f6c2f536d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-04-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 69 non-merge commits during the last 22 day(s) which contain
a total of 69 files changed, 3141 insertions(+), 866 deletions(-).

The main changes are:

1) Add BPF static linker support for extern resolution of global, from Andrii.

2) Refine retval for bpf_get_task_stack helper, from Dave.

3) Add a bpf_snprintf helper, from Florent.

4) A bunch of miscellaneous improvements from many developers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-25 18:02:32 -07:00
Andrii Nakryiko 0a342457b3 libbpf: Support extern resolution for BTF-defined maps in .maps section
Add extra logic to handle map externs (only BTF-defined maps are supported for
linking). Re-use the map parsing logic used during bpf_object__open(). Map
externs are currently restricted to always match complete map definition. So
all the specified attributes will be compared (down to pining, map_flags,
numa_node, etc). In the future this restriction might be relaxed with no
backwards compatibility issues. If any attribute is mismatched between extern
and actual map definition, linker will report an error, pointing out which one
mismatches.

The original intent was to allow for extern to specify attributes that matters
(to user) to enforce. E.g., if you specify just key information and omit
value, then any value fits. Similarly, it should have been possible to enforce
map_flags, pinning, and any other possible map attribute. Unfortunately, that
means that multiple externs can be only partially overlapping with each other,
which means linker would need to combine their type definitions to end up with
the most restrictive and fullest map definition. This requires an extra amount
of BTF manipulation which at this time was deemed unnecessary and would
require further extending generic BTF writer APIs. So that is left for future
follow ups, if there will be demand for that. But the idea seems intresting
and useful, so I want to document it here.

Weak definitions are also supported, but are pretty strict as well, just
like externs: all weak map definitions have to match exactly. In the follow up
patches this most probably will be relaxed, with __weak map definitions being
able to differ between each other (with non-weak definition always winning, of
course).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-13-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko a46349227c libbpf: Add linker extern resolution support for functions and global variables
Add BPF static linker logic to resolve extern variables and functions across
multiple linked together BPF object files.

For that, linker maintains a separate list of struct glob_sym structures,
which keeps track of few pieces of metadata (is it extern or resolved global,
is it a weak symbol, which ELF section it belongs to, etc) and ties together
BTF type info and ELF symbol information and keeps them in sync.

With adding support for extern variables/funcs, it's now possible for some
sections to contain both extern and non-extern definitions. This means that
some sections may start out as ephemeral (if only externs are present and thus
there is not corresponding ELF section), but will be "upgraded" to actual ELF
section as symbols are resolved or new non-extern definitions are appended.

Additional care is taken to not duplicate extern entries in sections like
.kconfig and .ksyms.

Given libbpf requires BTF type to always be present for .kconfig/.ksym
externs, linker extends this requirement to all the externs, even those that
are supposed to be resolved during static linking and which won't be visible
to libbpf. With BTF information always present, static linker will check not
just ELF symbol matches, but entire BTF type signature match as well. That
logic is stricter that BPF CO-RE checks. It probably should be re-used by
.ksym resolution logic in libbpf as well, but that's left for follow up
patches.

To make it unnecessary to rewrite ELF symbols and minimize BTF type
rewriting/removal, ELF symbols that correspond to externs initially will be
updated in place once they are resolved. Similarly for BTF type info, VAR/FUNC
and var_secinfo's (sec_vars in struct bpf_linker) are staying stable, but
types they point to might get replaced when extern is resolved. This might
leave some left-over types (even though we try to minimize this for common
cases of having extern funcs with not argument names vs concrete function with
names properly specified). That can be addresses later with a generic BTF
garbage collection. That's left for a follow up as well.

Given BTF type appending phase is separate from ELF symbol
appending/resolution, special struct glob_sym->underlying_btf_id variable is
used to communicate resolution and rewrite decisions. 0 means
underlying_btf_id needs to be appended (it's not yet in final linker->btf), <0
values are used for temporary storage of source BTF type ID (not yet
rewritten), so -glob_sym->underlying_btf_id is BTF type id in obj-btf. But by
the end of linker_append_btf() phase, that underlying_btf_id will be remapped
and will always be > 0. This is the uglies part of the whole process, but
keeps the other parts much simpler due to stability of sec_var and VAR/FUNC
types, as well as ELF symbol, so please keep that in mind while reviewing.

BTF-defined maps require some extra custom logic and is addressed separate in
the next patch, so that to keep this one smaller and easier to review.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-12-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko 83a157279f libbpf: Tighten BTF type ID rewriting with error checking
It should never fail, but if it does, it's better to know about this rather
than end up with nonsensical type IDs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-11-andrii@kernel.org
2021-04-23 14:05:27 -07:00
Andrii Nakryiko 386b1d241e libbpf: Extend sanity checking ELF symbols with externs validation
Add logic to validate extern symbols, plus some other minor extra checks, like
ELF symbol #0 validation, general symbol visibility and binding validations.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-10-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko 42869d2852 libbpf: Make few internal helpers available outside of libbpf.c
Make skip_mods_and_typedefs(), btf_kind_str(), and btf_func_linkage() helpers
available outside of libbpf.c, to be used by static linker code.

Also do few cleanups (error code fixes, comment clean up, etc) that don't
deserve their own commit.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-9-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko beaa3711ad libbpf: Factor out symtab and relos sanity checks
Factor out logic for sanity checking SHT_SYMTAB and SHT_REL sections into
separate sections. They are already quite extensive and are suffering from too
deep indentation. Subsequent changes will extend SYMTAB sanity checking
further, so it's better to factor each into a separate function.

No functional changes are intended.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-8-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko c7ef5ec957 libbpf: Refactor BTF map definition parsing
Refactor BTF-defined maps parsing logic to allow it to be nicely reused by BPF
static linker. Further, at least for BPF static linker, it's important to know
which attributes of a BPF map were defined explicitly, so provide a bit set
for each known portion of BTF map definition. This allows BPF static linker to
do a simple check when dealing with extern map declarations.

The same capabilities allow to distinguish attributes explicitly set to zero
(e.g., __uint(max_entries, 0)) vs the case of not specifying it at all (no
max_entries attribute at all). Libbpf is currently not utilizing that, but it
could be useful for backwards compatibility reasons later.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-7-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko 6245947c1b libbpf: Allow gaps in BPF program sections to support overriden weak functions
Currently libbpf is very strict about parsing BPF program instruction
sections. No gaps are allowed between sequential BPF programs within a given
ELF section. Libbpf enforced that by keeping track of the next section offset
that should start a new BPF (sub)program and cross-checks that by searching
for a corresponding STT_FUNC ELF symbol.

But this is too restrictive once we allow to have weak BPF programs and link
together two or more BPF object files. In such case, some weak BPF programs
might be "overridden" by either non-weak BPF program with the same name and
signature, or even by another weak BPF program that just happened to be linked
first. That, in turn, leaves BPF instructions of the "lost" BPF (sub)program
intact, but there is no corresponding ELF symbol, because no one is going to
be referencing it.

Libbpf already correctly handles such cases in the sense that it won't append
such dead code to actual BPF programs loaded into kernel. So the only change
that needs to be done is to relax the logic of parsing BPF instruction
sections. Instead of assuming next BPF (sub)program section offset, iterate
available STT_FUNC ELF symbols to discover all available BPF subprograms and
programs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-6-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko aea28a602f libbpf: Mark BPF subprogs with hidden visibility as static for BPF verifier
Define __hidden helper macro in bpf_helpers.h, which is a short-hand for
__attribute__((visibility("hidden"))). Add libbpf support to mark BPF
subprograms marked with __hidden as static in BTF information to enforce BPF
verifier's static function validation algorithm, which takes more information
(caller's context) into account during a subprogram validation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-5-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Andrii Nakryiko 0fec7a3cee libbpf: Suppress compiler warning when using SEC() macro with externs
When used on externs SEC() macro will trigger compilation warning about
inapplicable `__attribute__((used))`. That's expected for extern declarations,
so suppress it with the corresponding _Pragma.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210423181348.1801389-4-andrii@kernel.org
2021-04-23 14:05:26 -07:00
Rob Herring 818869489b libperf xyarray: Add bounds checks to xyarray__entry()
xyarray__entry() is missing any bounds checking yet often the x and y
parameters come from external callers. Add bounds checks and an
unchecked __xyarray__entry().

Committer notes:

Make the 'x' and 'y' arguments to the new xyarray__entry() that does
bounds check to be of type 'size_t', so that we cover also the case
where 'x' and 'y' could be negative, which is needed anyway as having
them as 'int' breaks the build with:

  /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h: In function ‘xyarray__entry’:
  /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h:28:8: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
     28 |  if (x >= xy->max_x || y >= xy->max_y)
        |        ^~
  /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h:28:26: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
     28 |  if (x >= xy->max_x || y >= xy->max_y)
        |                          ^~
  cc1: all warnings being treated as errors

Signed-off-by: Rob Herring <robh@kernel.org>
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210414195758.4078803-1-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:11:33 -03:00
Rob Herring 47d01e7b99 libperf: Add support for user space counter access
x86 and arm64 can both support direct access of event counters in
userspace. The access sequence is less than trivial and currently exists
in perf test code (tools/perf/arch/x86/tests/rdpmc.c) with copies in
projects such as PAPI and libpfm4.

In order to support userspace access, an event must be mmapped first
with perf_evsel__mmap(). Then subsequent calls to perf_evsel__read()
will use the fast path (assuming the arch supports it).

Committer notes:

Added a '__maybe_unused' attribute to the read_perf_counter() argument
to fix the build on arches other than x86_64 and arm.

Committer testing:

  Building and running the libperf tests in verbose mode (V=1) now shows
  those "loop = N, count = N" extra lines, testing user space counter
  access.

  # make V=1 -C tools/lib/perf tests
  make: Entering directory '/home/acme/git/perf/tools/lib/perf'
  make -f /home/acme/git/perf/tools/build/Makefile.build dir=. obj=libperf
  make -C /home/acme/git/perf/tools/lib/api/ O= libapi.a
  make -f /home/acme/git/perf/tools/build/Makefile.build dir=./fd obj=libapi
  make -f /home/acme/git/perf/tools/build/Makefile.build dir=./fs obj=libapi
  make -C tests
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o test-cpumap-a test-cpumap.c ../libperf.a /home/acme/git/perf/tools/lib/api/libapi.a
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o test-threadmap-a test-threadmap.c ../libperf.a /home/acme/git/perf/tools/lib/api/libapi.a
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o test-evlist-a test-evlist.c ../libperf.a /home/acme/git/perf/tools/lib/api/libapi.a
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o test-evsel-a test-evsel.c ../libperf.a /home/acme/git/perf/tools/lib/api/libapi.a
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -L.. -o test-cpumap-so test-cpumap.c /home/acme/git/perf/tools/lib/api/libapi.a -lperf
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -L.. -o test-threadmap-so test-threadmap.c /home/acme/git/perf/tools/lib/api/libapi.a -lperf
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -L.. -o test-evlist-so test-evlist.c /home/acme/git/perf/tools/lib/api/libapi.a -lperf
  gcc -I/home/acme/git/perf/tools/lib/perf/include -I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -L.. -o test-evsel-so test-evsel.c /home/acme/git/perf/tools/lib/api/libapi.a -lperf
  make -C tests run
  running static:
  - running test-cpumap.c...OK
  - running test-threadmap.c...OK
  - running test-evlist.c...OK
  - running test-evsel.c...
  	loop = 65536, count = 333926
  	loop = 131072, count = 655781
  	loop = 262144, count = 1311141
  	loop = 524288, count = 2630126
  	loop = 1048576, count = 5256955
  	loop = 65536, count = 524594
  	loop = 131072, count = 1058916
  	loop = 262144, count = 2097458
  	loop = 524288, count = 4205429
  	loop = 1048576, count = 8406606
  OK
  running dynamic:
  - running test-cpumap.c...OK
  - running test-threadmap.c...OK
  - running test-evlist.c...OK
  - running test-evsel.c...
  	loop = 65536, count = 328102
  	loop = 131072, count = 655782
  	loop = 262144, count = 1317494
  	loop = 524288, count = 2627851
  	loop = 1048576, count = 5255187
  	loop = 65536, count = 524601
  	loop = 131072, count = 1048923
  	loop = 262144, count = 2107917
  	loop = 524288, count = 4194606
  	loop = 1048576, count = 8409322
  OK
  make: Leaving directory '/home/acme/git/perf/tools/lib/perf'
  #

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210414155412.3697605-4-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:10:45 -03:00
Florent Revest 58c2b1f5e0 libbpf: Introduce a BPF_SNPRINTF helper macro
Similarly to BPF_SEQ_PRINTF, this macro turns variadic arguments into an
array of u64, making it more natural to call the bpf_snprintf helper.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-6-revest@chromium.org
2021-04-19 15:27:37 -07:00
Florent Revest 83cd92b464 libbpf: Initialize the bpf_seq_printf parameters array field by field
When initializing the __param array with a one liner, if all args are
const, the initial array value will be placed in the rodata section but
because libbpf does not support relocation in the rodata section, any
pointer in this array will stay NULL.

Fixes: c09add2fbc ("tools/libbpf: Add bpf_iter support")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210419155243.1632274-5-revest@chromium.org
2021-04-19 15:27:37 -07:00
Jakub Kicinski 8203c7ce4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
 - keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
 - fix build after move to net_generic

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-17 11:08:07 -07:00
Alexei Starovoitov d3d93e34bd libbpf: Remove unused field.
relo->processed is set, but not used. Remove it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210415141817.53136-1-alexei.starovoitov@gmail.com
2021-04-15 15:34:16 -07:00
Rob Herring d3003d9e68 libperf tests: Add support for verbose printing
Add __T_VERBOSE() so tests can add verbose output. The verbose output is
enabled with the '-v' command line option. Running 'make tests V=1' will
enable the '-v' option when running the tests.

It'll be used in the next patch, for a user space counter access test.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210414155412.3697605-3-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-15 16:40:15 -03:00
Rob Herring 6cd70754f2 libperf: Add evsel mmap support
In order to support usersapce access, an event must be mmapped. While
there's already mmap support for evlist, the usecase is a bit different
than the self monitoring with userspace access. So let's add new
perf_evsel__mmap()/perf_evsel_munmap() functions to mmap/munmap an
evsel. This allows implementing userspace access as a fastpath for
perf_evsel__read().

The mmapped address is returned by perf_evsel__mmap_base() which
primarily for users/tests to check if userspace access is enabled.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210414155412.3697605-2-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-15 16:38:51 -03:00
Jakub Kicinski 8859a44ea0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

MAINTAINERS
 - keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
 - simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
 - trivial
include/linux/ethtool.h
 - trivial, fix kdoc while at it
include/linux/skmsg.h
 - move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
 - add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
 - trivial

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 20:48:35 -07:00
Andrii Nakryiko b3278099b2 libbpf: Add bpf_map__inner_map API
The API gives access to inner map for map in map types (array or
hash of map). It will be used to dynamically set max_entries in it.

Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210408061310.95877-7-yauheni.kaliuta@redhat.com
2021-04-08 23:54:48 -07:00
Ciara Loftus afd0be7299 libbpf: Fix potential NULL pointer dereference
Wait until after the UMEM is checked for null to dereference it.

Fixes: 43f1bc1eff ("libbpf: Restore umem state after socket create failure")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210408052009.7844-1-ciara.loftus@intel.com
2021-04-08 23:36:46 +02:00
Hengqi Chen 1e1032b0c4 libbpf: Fix KERNEL_VERSION macro
Add missing ')' for KERNEL_VERSION macro.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210405040119.802188-1-hengqi.chen@gmail.com
2021-04-05 07:14:13 -07:00
Yang Yingliang f07669df4c libbpf: Remove redundant semi-colon
Remove redundant semi-colon in finalize_btf_ext().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210402012634.1965453-1-yangyingliang@huawei.com
2021-04-03 01:49:38 +02:00
Ciara Loftus ca7a83e248 libbpf: Only create rx and tx XDP rings when necessary
Prior to this commit xsk_socket__create(_shared) always attempted to create
the rx and tx rings for the socket. However this causes an issue when the
socket being setup is that which shares the fd with the UMEM. If a
previous call to this function failed with this socket after the rings were
set up, a subsequent call would always fail because the rings are not torn
down after the first call and when we try to set them up again we encounter
an error because they already exist. Solve this by remembering whether the
rings were set up by introducing new bools to struct xsk_umem which
represent the ring setup status and using them to determine whether or
not to set up the rings.

Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331061218.1647-4-ciara.loftus@intel.com
2021-04-01 14:45:43 -07:00
Ciara Loftus 43f1bc1eff libbpf: Restore umem state after socket create failure
If the call to xsk_socket__create fails, the user may want to retry the
socket creation using the same umem. Ensure that the umem is in the
same state on exit if the call fails by:
1. ensuring the umem _save pointers are unmodified.
2. not unmapping the set of umem rings that were set up with the umem
during xsk_umem__create, since those maps existed before the call to
xsk_socket__create and should remain in tact even in the event of
failure.

Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331061218.1647-3-ciara.loftus@intel.com
2021-04-01 14:45:43 -07:00
Ciara Loftus df66201631 libbpf: Ensure umem pointer is non-NULL before dereferencing
Calls to xsk_socket__create dereference the umem to access the
fill_save and comp_save pointers. Make sure the umem is non-NULL
before doing this.

Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20210331061218.1647-2-ciara.loftus@intel.com
2021-04-01 14:45:43 -07:00
Maciej Fijalkowski 10397994d3 libbpf: xsk: Use bpf_link
Currently, if there are multiple xdpsock instances running on a single
interface and in case one of the instances is terminated, the rest of
them are left in an inoperable state due to the fact of unloaded XDP
prog from interface.

Consider the scenario below:

// load xdp prog and xskmap and add entry to xskmap at idx 10
$ sudo ./xdpsock -i ens801f0 -t -q 10

// add entry to xskmap at idx 11
$ sudo ./xdpsock -i ens801f0 -t -q 11

terminate one of the processes and another one is unable to work due to
the fact that the XDP prog was unloaded from interface.

To address that, step away from setting bpf prog in favour of bpf_link.
This means that refcounting of BPF resources will be done automatically
by bpf_link itself.

Provide backward compatibility by checking if underlying system is
bpf_link capable. Do this by looking up/creating bpf_link on loopback
device. If it failed in any way, stick with netlink-based XDP prog.
therwise, use bpf_link-based logic.

When setting up BPF resources during xsk socket creation, check whether
bpf_link for a given ifindex already exists via set of calls to
bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
and comparing the ifindexes from bpf_link and xsk socket.

For case where resources exist but they are not AF_XDP related, bail out
and ask user to remove existing prog and then retry.

Lastly, do a bit of refactoring within __xsk_setup_xdp_prog and pull out
existing code branches based on prog_id value onto separate functions
that are responsible for resource initialization if prog_id was 0 and
for lookup existing resources for non-zero prog_id as that implies that
XDP program is present on the underlying net device. This in turn makes
it easier to follow, especially the teardown part of both branches.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210329224316.17793-7-maciej.fijalkowski@intel.com
2021-03-30 09:24:38 -07:00
Andrii Nakryiko 05d817031f libbpf: Fix memory leak when emitting final btf_ext
Free temporary allocated memory used to construct finalized .BTF.ext data.
Found by Coverity static analysis on libbpf's Github repo.

Fixes: 8fd27bf69b ("libbpf: Add BPF static linker BTF and BTF.ext support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210327042502.969745-1-andrii@kernel.org
2021-03-30 07:38:36 -07:00
Martin KaFai Lau 5bd022ec01 libbpf: Support extern kernel function
This patch is to make libbpf able to handle the following extern
kernel function declaration and do the needed relocations before
loading the bpf program to the kernel.

extern int foo(struct sock *) __attribute__((section(".ksyms")))

In the collect extern phase, needed changes is made to
bpf_object__collect_externs() and find_extern_btf_id() to collect
extern function in ".ksyms" section.  The func in the BTF datasec also
needs to be replaced by an int var.  The idea is similar to the existing
handling in extern var.  In case the BTF may not have a var, a dummy ksym
var is added at the beginning of bpf_object__collect_externs()
if there is func under ksyms datasec.  It will also change the
func linkage from extern to global which the kernel can support.
It also assigns a param name if it does not have one.

In the collect relo phase, it will record the kernel function
call as RELO_EXTERN_FUNC.

bpf_object__resolve_ksym_func_btf_id() is added to find the func
btf_id of the running kernel.

During actual relocation, it will patch the BPF_CALL instruction with
src_reg = BPF_PSEUDO_FUNC_CALL and insn->imm set to the running
kernel func's btf_id.

The required LLVM patch: https://reviews.llvm.org/D93563

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015234.1548923-1-kafai@fb.com
2021-03-26 20:41:51 -07:00
Martin KaFai Lau aa0b8d43e9 libbpf: Record extern sym relocation first
This patch records the extern sym relocs first before recording
subprog relocs.  The later patch will have relocs for extern
kernel function call which is also using BPF_JMP | BPF_CALL.
It will be easier to handle the extern symbols first in
the later patch.

is_call_insn() helper is added.  The existing is_ldimm64() helper
is renamed to is_ldimm64_insn() for consistency.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015227.1548623-1-kafai@fb.com
2021-03-26 20:41:51 -07:00
Martin KaFai Lau 0c091e5c2d libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR
This patch renames RELO_EXTERN to RELO_EXTERN_VAR.
It is to avoid the confusion with a later patch adding
RELO_EXTERN_FUNC.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015221.1547722-1-kafai@fb.com
2021-03-26 20:41:51 -07:00
Martin KaFai Lau 774e132e83 libbpf: Refactor codes for finding btf id of a kernel symbol
This patch refactors code, that finds kernel btf_id by kind
and symbol name, to a new function find_ksym_btf_id().

It also adds a new helper __btf_kind_str() to return
a string by the numeric kind value.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015214.1547069-1-kafai@fb.com
2021-03-26 20:41:51 -07:00
Martin KaFai Lau 933d1aa324 libbpf: Refactor bpf_object__resolve_ksyms_btf_id
This patch refactors most of the logic from
bpf_object__resolve_ksyms_btf_id() into a new function
bpf_object__resolve_ksym_var_btf_id().
It is to get ready for a later patch adding
bpf_object__resolve_ksym_func_btf_id() which resolves
a kernel function to the running kernel btf_id.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015207.1546749-1-kafai@fb.com
2021-03-26 20:41:51 -07:00
Andrii Nakryiko 36e7985160 libbpf: Preserve empty DATASEC BTFs during static linking
Ensure that BPF static linker preserves all DATASEC BTF types, even if some of
them might not have any variable information at all. This may happen if the
compiler promotes local initialized variable contents into .rodata section and
there are no global or static functions in the program.

For example,

  $ cat t.c
  struct t { char a; char b; char c; };
  void bar(struct t*);
  void find() {
     struct t tmp = {1, 2, 3};
     bar(&tmp);
  }

  $ clang -target bpf -O2 -g -S t.c
         .long   104                             # BTF_KIND_DATASEC(id = 8)
         .long   251658240                       # 0xf000000
         .long   0

         .ascii  ".rodata"                       # string offset=104

  $ clang -target bpf -O2 -g -c t.c
  $ readelf -S t.o | grep data
     [ 4] .rodata           PROGBITS         0000000000000000  00000090

Fixes: 8fd27bf69b ("libbpf: Add BPF static linker BTF and BTF.ext support")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210326043036.3081011-1-andrii@kernel.org
2021-03-26 17:45:17 +01:00
Pedro Tammela 6032ebb54c libbpf: Fix bail out from 'ringbuf_process_ring()' on error
The current code bails out with negative and positive returns.
If the callback returns a positive return code, 'ring_buffer__consume()'
and 'ring_buffer__poll()' will return a spurious number of records
consumed, but mostly important will continue the processing loop.

This patch makes positive returns from the callback a no-op.

Fixes: bf99c936f9 ("libbpf: Add BPF ring buffer support")
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210325150115.138750-1-pctammela@mojatatu.com
2021-03-25 21:13:24 -07:00
Rafael David Tinoco 155f556d64 libbpf: Add bpf object kern_version attribute setter
Unfortunately some distros don't have their kernel version defined
accurately in <linux/version.h> due to different long term support
reasons.

It is important to have a way to override the bpf kern_version
attribute during runtime: some old kernels might still check for
kern_version attribute during bpf_prog_load().

Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210323040952.2118241-1-rafaeldtinoco@ubuntu.com
2021-03-25 19:23:27 -07:00
Andrii Nakryiko a46410d5e4 libbpf: Constify few bpf_program getters
bpf_program__get_type() and bpf_program__get_expected_attach_type() shouldn't
modify given bpf_program, so mark input parameter as const struct bpf_program.
This eliminates unnecessary compilation warnings or explicit casts in user
programs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210324172941.2609884-1-andrii@kernel.org
2021-03-26 01:17:04 +01:00
David S. Miller 241949e488 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-03-24

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 15 day(s) which contain
a total of 65 files changed, 3200 insertions(+), 738 deletions(-).

The main changes are:

1) Static linking of multiple BPF ELF files, from Andrii.

2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo.

3) Spelling fixes from various folks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:30:46 -07:00
David S. Miller efd13b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 15:31:22 -07:00
Andrii Nakryiko 78b226d481 libbpf: Skip BTF fixup if object file has no BTF
Skip BTF fixup step when input object file is missing BTF altogether.

Fixes: 8fd27bf69b ("libbpf: Add BPF static linker BTF and BTF.ext support")
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20210319205909.1748642-3-andrii@kernel.org
2021-03-22 18:58:25 -07:00
Jean-Philippe Brucker 901ee1d750 libbpf: Fix BTF dump of pointer-to-array-of-struct
The vmlinux.h generated from BTF is invalid when building
drivers/phy/ti/phy-gmii-sel.c with clang:

vmlinux.h:61702:27: error: array type has incomplete element type ‘struct reg_field’
61702 |  const struct reg_field (*regfields)[3];
      |                           ^~~~~~~~~

bpftool generates a forward declaration for this struct regfield, which
compilers aren't happy about. Here's a simplified reproducer:

	struct inner {
		int val;
	};
	struct outer {
		struct inner (*ptr_to_array)[2];
	} A;

After build with clang -> bpftool btf dump c -> clang/gcc:
./def-clang.h:11:23: error: array has incomplete element type 'struct inner'
        struct inner (*ptr_to_array)[2];

Member ptr_to_array of struct outer is a pointer to an array of struct
inner. In the DWARF generated by clang, struct outer appears before
struct inner, so when converting BTF of struct outer into C, bpftool
issues a forward declaration to struct inner. With GCC the DWARF info is
reversed so struct inner gets fully defined.

That forward declaration is not sufficient when compilers handle an
array of the struct, even when it's only used through a pointer. Note
that we can trigger the same issue with an intermediate typedef:

	struct inner {
	        int val;
	};
	typedef struct inner inner2_t[2];
	struct outer {
	        inner2_t *ptr_to_array;
	} A;

Becomes:

	struct inner;
	typedef struct inner inner2_t[2];

And causes:

./def-clang.h:10:30: error: array has incomplete element type 'struct inner'
	typedef struct inner inner2_t[2];

To fix this, clear through_ptr whenever we encounter an intermediate
array, to make the inner struct part of a strong link and force full
declaration.

Fixes: 351131b51c ("libbpf: add btf_dump API for BTF-to-C conversion")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210319112554.794552-2-jean-philippe@linaro.org
2021-03-19 14:14:44 -07:00
KP Singh ea24b19562 libbpf: Add explicit padding to btf_dump_emit_type_decl_opts
Similar to
https://lore.kernel.org/bpf/20210313210920.1959628-2-andrii@kernel.org/

When DECLARE_LIBBPF_OPTS is used with inline field initialization, e.g:

  DECLARE_LIBBPF_OPTS(btf_dump_emit_type_decl_opts, opts,
    .field_name = var_ident,
    .indent_level = 2,
    .strip_mods = strip_mods,
  );

and compiled in debug mode, the compiler generates code which
leaves the padding uninitialized and triggers errors within libbpf APIs
which require strict zero initialization of OPTS structs.

Adding anonymous padding field fixes the issue.

Fixes: 9f81654eeb ("libbpf: Expose BTF-to-C type declaration emitting API")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210319192117.2310658-1-kpsingh@kernel.org
2021-03-19 14:03:39 -07:00
Andrii Nakryiko 8fd27bf69b libbpf: Add BPF static linker BTF and BTF.ext support
Add .BTF and .BTF.ext static linking logic.

When multiple BPF object files are linked together, their respective .BTF and
.BTF.ext sections are merged together. BTF types are not just concatenated,
but also deduplicated. .BTF.ext data is grouped by type (func info, line info,
core_relos) and target section names, and then all the records are
concatenated together, preserving their relative order. All the BTF type ID
references and string offsets are updated as necessary, to take into account
possibly deduplicated strings and types.

BTF DATASEC types are handled specially. Their respective var_secinfos are
accumulated separately in special per-section data and then final DATASEC
types are emitted at the very end during bpf_linker__finalize() operation,
just before emitting final ELF output file.

BTF data can also provide "section annotations" for some extern variables.
Such concept is missing in ELF, but BTF will have DATASEC types for such
special extern datasections (e.g., .kconfig, .ksyms). Such sections are called
"ephemeral" internally. Internally linker will keep metadata for each such
section, collecting variables information, but those sections won't be emitted
into the final ELF file.

Also, given LLVM/Clang during compilation emits BTF DATASECS that are
incomplete, missing section size and variable offsets for static variables,
BPF static linker will initially fix up such DATASECs, using ELF symbols data.
The final DATASECs will preserve section sizes and all variable offsets. This
is handled correctly by libbpf already, so won't cause any new issues. On the
other hand, it's actually a nice property to have a complete BTF data without
runtime adjustments done during bpf_object__open() by libbpf. In that sense,
BPF static linker is also a BTF normalizer.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-8-andrii@kernel.org
2021-03-18 16:14:22 -07:00
Andrii Nakryiko faf6ed321c libbpf: Add BPF static linker APIs
Introduce BPF static linker APIs to libbpf. BPF static linker allows to
perform static linking of multiple BPF object files into a single combined
resulting object file, preserving all the BPF programs, maps, global
variables, etc.

Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are
concatenated together. Similarly, code sections are also concatenated. All the
symbols and ELF relocations are also concatenated in their respective ELF
sections and are adjusted accordingly to the new object file layout.

Static variables and functions are handled correctly as well, adjusting BPF
instructions offsets to reflect new variable/function offset within the
combined ELF section. Such relocations are referencing STT_SECTION symbols and
that stays intact.

Data sections in different files can have different alignment requirements, so
that is taken care of as well, adjusting sizes and offsets as necessary to
satisfy both old and new alignment requirements.

DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG
section, which is ignored by libbpf in bpf_object__open() anyways. So, in
a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty
nice property, especially if resulting .o file is then used to generate BPF
skeleton.

Original string sections are ignored and instead we construct our own set of
unique strings using libbpf-internal `struct strset` API.

To reduce the size of the patch, all the .BTF and .BTF.ext processing was
moved into a separate patch.

The high-level API consists of just 4 functions:
  - bpf_linker__new() creates an instance of BPF static linker. It accepts
    output filename and (currently empty) options struct;
  - bpf_linker__add_file() takes input filename and appends it to the already
    processed ELF data; it can be called multiple times, one for each BPF
    ELF object file that needs to be linked in;
  - bpf_linker__finalize() needs to be called to dump final ELF contents into
    the output file, specified when bpf_linker was created; after
    bpf_linker__finalize() is called, no more bpf_linker__add_file() and
    bpf_linker__finalize() calls are allowed, they will return error;
  - regardless of whether bpf_linker__finalize() was called or not,
    bpf_linker__free() will free up all the used resources.

Currently, BPF static linker doesn't resolve cross-object file references
(extern variables and/or functions). This will be added in the follow up patch
set.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
2021-03-18 16:14:22 -07:00
Andrii Nakryiko 9af44bc5d4 libbpf: Add generic BTF type shallow copy API
Add btf__add_type() API that performs shallow copy of a given BTF type from
the source BTF into the destination BTF. All the information and type IDs are
preserved, but all the strings encountered are added into the destination BTF
and corresponding offsets are rewritten. BTF type IDs are assumed to be
correct or such that will be (somehow) modified afterwards.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-6-andrii@kernel.org
2021-03-18 16:14:22 -07:00
Andrii Nakryiko 90d76d3ece libbpf: Extract internal set-of-strings datastructure APIs
Extract BTF logic for maintaining a set of strings data structure, used for
BTF strings section construction in writable mode, into separate re-usable
API. This data structure is going to be used by bpf_linker to maintains ELF
STRTAB section, which has the same layout as BTF strings section.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-5-andrii@kernel.org
2021-03-18 16:14:22 -07:00
Andrii Nakryiko 3b029e06f6 libbpf: Rename internal memory-management helpers
Rename btf_add_mem() and btf_ensure_mem() helpers that abstract away details
of dynamically resizable memory to use libbpf_ prefix, as they are not
BTF-specific. No functional changes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-4-andrii@kernel.org
2021-03-18 16:14:22 -07:00
Andrii Nakryiko f36e99a45d libbpf: Generalize BTF and BTF.ext type ID and strings iteration
Extract and generalize the logic to iterate BTF type ID and string offset
fields within BTF types and .BTF.ext data. Expose this internally in libbpf
for re-use by bpf_linker.

Additionally, complete strings deduplication handling for BTF.ext (e.g., CO-RE
access strings), which was previously missing. There previously was no
case of deduplicating .BTF.ext data, but bpf_linker is going to use it.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-3-andrii@kernel.org
2021-03-18 16:14:22 -07:00
Andrii Nakryiko e14ef4bf01 libbpf: Expose btf_type_by_id() internally
btf_type_by_id() is internal-only convenience API returning non-const pointer
to struct btf_type. Expose it outside of btf.c for re-use.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-2-andrii@kernel.org
2021-03-18 16:14:22 -07:00
Andrii Nakryiko 9ae2c26e43 libbpf: provide NULL and KERNEL_VERSION macros in bpf_helpers.h
Given that vmlinux.h is not compatible with headers like stddef.h, NULL poses
an annoying problem: it is defined as #define, so is not captured in BTF, so
is not emitted into vmlinux.h. This leads to users either sticking to explicit
0, or defining their own NULL (as progs/skb_pkt_end.c does).

But it's easy for bpf_helpers.h to provide (conditionally) NULL definition.
Similarly, KERNEL_VERSION is another commonly missed macro that came up
multiple times. So this patch adds both of them, along with offsetof(), that
also is typically defined in stddef.h, just like NULL.

This might cause compilation warning for existing BPF applications defining
their own NULL and/or KERNEL_VERSION already:

  progs/skb_pkt_end.c:7:9: warning: 'NULL' macro redefined [-Wmacro-redefined]
  #define NULL 0
          ^
  /tmp/linux/tools/testing/selftests/bpf/tools/include/vmlinux.h:4:9: note: previous definition is here
  #define NULL ((void *)0)
	  ^

It is trivial to fix, though, so long-term benefits outweight temporary
inconveniences.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20210317200510.1354627-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17 18:48:05 -07:00
Kumar Kartikeya Dwivedi 58bfd95b55 libbpf: Use SOCK_CLOEXEC when opening the netlink socket
Otherwise, there exists a small window between the opening and closing
of the socket fd where it may leak into processes launched by some other
thread.

Fixes: 949abbe884 ("libbpf: add function to setup XDP")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210317115857.6536-1-memxor@gmail.com
2021-03-18 00:50:21 +01:00
Namhyung Kim 8f3f5792f2 libbpf: Fix error path in bpf_object__elf_init()
When it failed to get section names, it should call into
bpf_object__elf_finish() like others.

Fixes: 88a8212028 ("libbpf: Factor out common ELF operations and improve logging")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210317145414.884817-1-namhyung@kernel.org
2021-03-18 00:42:21 +01:00
Andrii Nakryiko dde7b3f5f2 libbpf: Add explicit padding to bpf_xdp_set_link_opts
Adding such anonymous padding fixes the issue with uninitialized portions of
bpf_xdp_set_link_opts when using LIBBPF_DECLARE_OPTS macro with inline field
initialization:

DECLARE_LIBBPF_OPTS(bpf_xdp_set_link_opts, opts, .old_fd = -1);

When such code is compiled in debug mode, compiler is generating code that
leaves padding bytes uninitialized, which triggers error inside libbpf APIs
that do strict zero initialization checks for OPTS structs.

Adding anonymous padding field fixes the issue.

Fixes: bd5ca3ef93 ("libbpf: Add function to set link XDP fd while specifying old program")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210313210920.1959628-2-andrii@kernel.org
2021-03-16 12:26:49 -07:00
Pedro Tammela 0205e9de42 libbpf: Avoid inline hint definition from 'linux/stddef.h'
Linux headers might pull 'linux/stddef.h' which defines
'__always_inline' as the following:

   #ifndef __always_inline
   #define __always_inline inline
   #endif

This becomes an issue if the program picks up the 'linux/stddef.h'
definition as the macro now just hints inline to clang.

This change now enforces the proper definition for BPF programs
regardless of the include order.

Signed-off-by: Pedro Tammela <pctammela@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210314173839.457768-1-pctammela@gmail.com
2021-03-15 22:11:17 -07:00
David S. Miller 547fd08377 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-03-10

The following pull-request contains BPF updates for your *net* tree.

We've added 8 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 136 insertions(+), 17 deletions(-).

The main changes are:

1) Reject bogus use of vmlinux BTF as map/prog creation BTF, from Alexei Starovoitov.

2) Fix allocation failure splat in x86 JIT for large progs. Also fix overwriting
   percpu cgroup storage from tracing programs when nested, from Yonghong Song.

3) Fix rx queue retrieval in XDP for multi-queue veth, from Maciej Fijalkowski.

4) Fix bpf_check_mtu() helper API before freeze to have mtu_len as custom skb/xdp
   L3 input length, from Jesper Dangaard Brouer.

5) Fix inode_storage's lookup_elem return value upon having bad fd, from Tal Lossos.

6) Fix bpftool and libbpf cross-build on MacOS, from Georgi Valkov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:14:56 -08:00
Björn Töpel 7e8bbe24cb libbpf: xsk: Move barriers from libbpf_util.h to xsk.h
The only user of libbpf_util.h is xsk.h. Move the barriers to xsk.h,
and remove libbpf_util.h. The barriers are used as an implementation
detail, and should not be considered part of the stable API.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210310080929.641212-3-bjorn.topel@gmail.com
2021-03-10 13:45:16 -08:00
Björn Töpel 2882c48bf8 libbpf: xsk: Remove linux/compiler.h header
In commit 291471dd15 ("libbpf, xsk: Add libbpf_smp_store_release
libbpf_smp_load_acquire") linux/compiler.h was added as a dependency
to xsk.h, which is the user-facing API. This makes it harder for
userspace application to consume the library. Here the header
inclusion is removed, and instead {READ,WRITE}_ONCE() is added
explicitly.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210310080929.641212-2-bjorn.topel@gmail.com
2021-03-10 13:38:07 -08:00
David S. Miller c1acda9807 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-03-09

The following pull-request contains BPF updates for your *net-next* tree.

We've added 90 non-merge commits during the last 17 day(s) which contain
a total of 114 files changed, 5158 insertions(+), 1288 deletions(-).

The main changes are:

1) Faster bpf_redirect_map(), from Björn.

2) skmsg cleanup, from Cong.

3) Support for floating point types in BTF, from Ilya.

4) Documentation for sys_bpf commands, from Joe.

5) Support for sk_lookup in bpf_prog_test_run, form Lorenz.

6) Enable task local storage for tracing programs, from Song.

7) bpf_for_each_map_elem() helper, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-09 18:07:05 -08:00
Linus Torvalds 05a59d7979 Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.

 2) TX skb error handling fix in mt76 driver, also from Felix.

 3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.

 4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
    From Cong Wang.

 5) Use correct printf format for dma_addr_t in ath11k, from Geert
    Uytterhoeven.

 6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
    Hsieh.

 7) Don't report truncated frames to mac80211 in mt76 driver, from
    Lorenzop Bianconi.

 8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.

 9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.

10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
    Arjun Roy.

11) Ignore routes with deleted nexthop object in mlxsw, from Ido
    Schimmel.

12) Need to undo tcp early demux lookup sometimes in nf_nat, from
    Florian Westphal.

13) Fix gro aggregation for udp encaps with zero csum, from Daniel
    Borkmann.

14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
    Donenfeld.

15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.

16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.

17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
    Oltean.

18) Make sure textsearch copntrol block is large enough, from Wilem de
    Bruijn.

19) Revert MAC changes to r8152 leading to instability, from Hates Wang.

20) Advance iov in 9p even for empty reads, from Jissheng Zhang.

21) Double hook unregister in nftables, from PabloNeira Ayuso.

22) Fix memleak in ixgbe, fropm Dinghao Liu.

23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.

24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
    Tang.

25) Fix DOI refcount bugs in cipso, from Paul Moore.

26) One too many irqsave in ibmvnic, from Junlin Yang.

27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
    Balazs Nemeth.

* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
  s390/qeth: fix notification for pending buffers during teardown
  s390/qeth: schedule TX NAPI on QAOB completion
  s390/qeth: improve completion of pending TX buffers
  s390/qeth: fix memory leak after failed TX Buffer allocation
  net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
  net: check if protocol extracted by virtio_net_hdr_set_proto is correct
  net: dsa: xrs700x: check if partner is same as port in hsr join
  net: lapbether: Remove netif_start_queue / netif_stop_queue
  atm: idt77252: fix null-ptr-dereference
  atm: uPD98402: fix incorrect allocation
  atm: fix a typo in the struct description
  net: qrtr: fix error return code of qrtr_sendmsg()
  mptcp: fix length of ADD_ADDR with port sub-option
  net: bonding: fix error return code of bond_neigh_init()
  net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
  net: enetc: set MAC RX FIFO to recommended value
  net: davicom: Use platform_get_irq_optional()
  net: davicom: Fix regulator not turned off on driver removal
  net: davicom: Fix regulator not turned off on failed probe
  net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
  ...
2021-03-09 17:15:56 -08:00
Georgi Valkov e7fb6465d4 libbpf: Fix INSTALL flag order
It was reported ([0]) that having optional -m flag between source and
destination arguments in install command breaks bpftools cross-build
on MacOS. Move -m to the front to fix this issue.

  [0] https://github.com/openwrt/openwrt/pull/3959

Fixes: 7110d80d53 ("libbpf: Makefile set specified permission mode")
Signed-off-by: Georgi Valkov <gvalkov@abv.bg>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210308183038.613432-1-andrii@kernel.org
2021-03-08 19:43:27 +01:00
Jean-Philippe Brucker a6aac408c5 libbpf: Fix arm64 build
The macro for libbpf_smp_store_release() doesn't build on arm64, fix it.

Fixes: 291471dd15 ("libbpf, xsk: Add libbpf_smp_store_release libbpf_smp_load_acquire")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210308182521.155536-1-jean-philippe@linaro.org
2021-03-08 10:35:33 -08:00
Björn Töpel 291471dd15 libbpf, xsk: Add libbpf_smp_store_release libbpf_smp_load_acquire
Now that the AF_XDP rings have load-acquire/store-release semantics,
move libbpf to that as well.

The library-internal libbpf_smp_{load_acquire,store_release} are only
valid for 32-bit words on ARM64.

Also, remove the barriers that are no longer in use.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210305094113.413544-3-bjorn.topel@gmail.com
2021-03-08 08:52:28 -08:00
Namhyung Kim e2a99c9a9a libperf: Add perf_evlist__reset_id_hash()
Add the perf_evlist__reset_id_hash() function as an internal function so
that it can be called by perf to reset the hash table.  This is
necessary for 'perf stat' to run the workload multiple times.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210225035148.778569-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:54:31 -03:00
Joe Stringer 923a932c98 scripts/bpf: Abstract eBPF API target parameter
Abstract out the target parameter so that upcoming commits, more than
just the existing "helpers" target can be called to generate specific
portions of docs from the eBPF UAPI headers.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-10-joe@cilium.io
2021-03-04 18:39:45 -08:00
Ilya Leoshkevich 22541a9eeb libbpf: Add BTF_KIND_FLOAT support
The logic follows that of BTF_KIND_INT most of the time. Sanitization
replaces BTF_KIND_FLOATs with equally-sized empty BTF_KIND_STRUCTs on
older kernels, for example, the following:

    [4] FLOAT 'float' size=4

becomes the following:

    [4] STRUCT '(anon)' size=4 vlen=0

With dwarves patch [1] and this patch, the older kernels, which were
failing with the floating-point-related errors, will now start working
correctly.

[1] https://github.com/iii-i/dwarves/commit/btf-kind-float-v2

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226202256.116518-4-iii@linux.ibm.com
2021-03-04 17:58:15 -08:00
Ilya Leoshkevich 1b1ce92b24 libbpf: Fix whitespace in btf_add_composite() comment
Remove trailing space.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-3-iii@linux.ibm.com
2021-03-04 17:58:15 -08:00
Maciej Fijalkowski 2b2aedabc4 libbpf: Clear map_info before each bpf_obj_get_info_by_fd
xsk_lookup_bpf_maps, based on prog_fd, looks whether current prog has a
reference to XSKMAP. BPF prog can include insns that work on various BPF
maps and this is covered by iterating through map_ids.

The bpf_map_info that is passed to bpf_obj_get_info_by_fd for filling
needs to be cleared at each iteration, so that it doesn't contain any
outdated fields and that is currently missing in the function of
interest.

To fix that, zero-init map_info via memset before each
bpf_obj_get_info_by_fd call.

Also, since the area of this code is touched, in general strcmp is
considered harmful, so let's convert it to strncmp and provide the
size of the array name for current map_info.

While at it, do s/continue/break/ once we have found the xsks_map to
terminate the search.

Fixes: 5750902a6e ("libbpf: proper XSKMAP cleanup")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210303185636.18070-4-maciej.fijalkowski@intel.com
2021-03-04 15:53:37 +01:00
Yonghong Song 53eddb5e04 libbpf: Support subprog address relocation
A new relocation RELO_SUBPROG_ADDR is added to capture
subprog addresses loaded with ld_imm64 insns. Such ld_imm64
insns are marked with BPF_PSEUDO_FUNC and will be passed to
kernel. For bpf_for_each_map_elem() case, kernel will
check that the to-be-used subprog address must be a static
function and replace it with proper actual jited func address.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204930.3885367-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Yonghong Song b8f871fa32 libbpf: Move function is_ldimm64() earlier in libbpf.c
Move function is_ldimm64() close to the beginning of libbpf.c,
so it can be reused by later code and the next patch as well.
There is no functionality change.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204929.3885295-1-yhs@fb.com
2021-02-26 13:23:52 -08:00
Linus Torvalds 3a36281a17 New features:
- Support instruction latency in 'perf report', with both memory latency
   (weight) and instruction latency information, users can locate expensive load
   instructions and understand time spent in different stages.
 
 - Extend 'perf c2c' to display the number of loads which were blocked by data
   or address conflict.
 
 - Add 'perf stat' support for L2 topdown events in systems such as Intel's
   Sapphire rapids server.
 
 - Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a sort key, for instance:
 
     perf report --stdio --sort=comm,symbol,code_page_size
 
 - New 'perf daemon' command to run long running sessions while providing a way to control
   the enablement of events without restarting a traditional 'perf record' session.
 
 - Enable counting events for BPF programs in 'perf stat' just like for other
   targets (tid, cgroup, cpu, etc), e.g.:
 
       # perf stat -e ref-cycles,cycles -b 254 -I 1000
          1.487903822            115,200      ref-cycles
          1.487903822             86,012      cycles
          2.489147029             80,560      ref-cycles
          2.489147029             73,784      cycles
       ^C#
 
   The example above counts 'cycles' and 'ref-cycles' of BPF program of id 254.
   It is similar to bpftool-prog-profile command, but more flexible.
 
 - Support the new layout for PERF_RECORD_MMAP2 to carry the DSO build-id using infrastructure
   generalised from the eBPF subsystem, removing the need for traversing the perf.data file
   to collect build-ids at the end of 'perf record' sessions and helping with long running
   sessions where binaries can get replaced in updates, leading to possible mis-resolution
   of symbols.
 
 - Support filtering by hex address in 'perf script'.
 
 - Support DSO filter in 'perf script', like in other perf tools.
 
 - Add namespaces support to 'perf inject'
 
 - Add support for SDT (Dtrace Style Markers) events on ARM64.
 
 perf record:
 
 - Fix handling of eventfd() when draining a buffer in 'perf record'.
 
 - Improvements to the generation of metadata events for pre-existing threads (mmaps, comm, etc),
   speeding up the work done at the start of system wide or per CPU 'perf record' sessions.
 
 Hardware tracing:
 
 - Initial support for tracing KVM with Intel PT.
 
 - Intel PT fixes for IPC
 
 - Support Intel PT PSB (synchronization packets) events.
 
 - Automatically group aux-output events to overcome --filter syntax.
 
 - Enable PERF_SAMPLE_DATA_SRC on ARMs SPE.
 
 - Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0.
 
 perf annotate TUI:
 
 - Fix handling of 'k' ("show line number") hotkey
 
 - Fix jump parsing for C++ code.
 
 perf probe:
 
 - Add protection to avoid endless loop.
 
 cgroups:
 
 - Avoid reading cgroup mountpoint multiple times, caching it.
 
 - Fix handling of cgroup v1/v2 in mixed hierarchy.
 
 Symbol resolving:
 
 - Add OCaml symbol demangling.
 
 - Further fixes for handling PE executables when using perf with Wine and .exe/.dll files.
 
 - Fix 'perf unwind' DSO handling.
 
 - Resolve symbols against debug file first, to deal with artifacts related to LTO.
 
 - Fix gap between kernel end and module start on powerpc.
 
 Reporting tools:
 
 - The DSO filter shouldn't show samples in unresolved maps.
 
 - Improve debuginfod support in various tools.
 
 build ids:
 
 - Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test' entry for that case.
 
 perf test:
 
 - Support for PERF_SAMPLE_WEIGHT_STRUCT.
 
 - Add test case for PERF_SAMPLE_CODE_PAGE_SIZE.
 
 - Shell based tests for 'perf daemon's commands ('start', 'stop, 'reconfig', 'list', etc).
 
 - ARM cs-etm 'perf test' fixes.
 
 - Add parse-metric memory bandwidth testcase.
 
 Compiler related:
 
 - Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used with -fpatchable-function-entry.
 
 - Fix ARM64 build with gcc 11's -Wformat-overflow.
 
 - Fix unaligned access in sample parsing test.
 
 - Fix printf conversion specifier for IP addresses on arm64, s390 and powerpc.
 
 Arch specific:
 
 - Support exposing Performance Monitor Counter SPRs as part of extended regs on powerpc.
 
 - Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn DDR, fix imx8mm ones.
 
 - Fix common and uarch events for ARM64's A76 and Ampere eMag
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYDANTQAKCRCyPKLppCJ+
 J4veAQCISY1BPHscUTRYhq9cwU/Zs0ImtX7zDT4jxaP39JkduAD/eSqYavAJrtQh
 HDyEiTgZ7CQSp5eCbXkzrnet4n3G9QE=
 =H/Jk
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Support instruction latency in 'perf report', with both memory
     latency (weight) and instruction latency information, users can
     locate expensive load instructions and understand time spent in
     different stages.

   - Extend 'perf c2c' to display the number of loads which were blocked
     by data or address conflict.

   - Add 'perf stat' support for L2 topdown events in systems such as
     Intel's Sapphire rapids server.

   - Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a
     sort key, for instance:

        perf report --stdio --sort=comm,symbol,code_page_size

   - New 'perf daemon' command to run long running sessions while
     providing a way to control the enablement of events without
     restarting a traditional 'perf record' session.

   - Enable counting events for BPF programs in 'perf stat' just like
     for other targets (tid, cgroup, cpu, etc), e.g.:

        # perf stat -e ref-cycles,cycles -b 254 -I 1000
           1.487903822            115,200      ref-cycles
           1.487903822             86,012      cycles
           2.489147029             80,560      ref-cycles
           2.489147029             73,784      cycles
        ^C

     The example above counts 'cycles' and 'ref-cycles' of BPF program
     of id 254. It is similar to bpftool-prog-profile command, but more
     flexible.

   - Support the new layout for PERF_RECORD_MMAP2 to carry the DSO
     build-id using infrastructure generalised from the eBPF subsystem,
     removing the need for traversing the perf.data file to collect
     build-ids at the end of 'perf record' sessions and helping with
     long running sessions where binaries can get replaced in updates,
     leading to possible mis-resolution of symbols.

   - Support filtering by hex address in 'perf script'.

   - Support DSO filter in 'perf script', like in other perf tools.

   - Add namespaces support to 'perf inject'

   - Add support for SDT (Dtrace Style Markers) events on ARM64.

  perf record:

   - Fix handling of eventfd() when draining a buffer in 'perf record'.

   - Improvements to the generation of metadata events for pre-existing
     threads (mmaps, comm, etc), speeding up the work done at the start
     of system wide or per CPU 'perf record' sessions.

  Hardware tracing:

   - Initial support for tracing KVM with Intel PT.

   - Intel PT fixes for IPC

   - Support Intel PT PSB (synchronization packets) events.

   - Automatically group aux-output events to overcome --filter syntax.

   - Enable PERF_SAMPLE_DATA_SRC on ARMs SPE.

   - Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0.

  perf annotate TUI:

   - Fix handling of 'k' ("show line number") hotkey

   - Fix jump parsing for C++ code.

  perf probe:

   - Add protection to avoid endless loop.

  cgroups:

   - Avoid reading cgroup mountpoint multiple times, caching it.

   - Fix handling of cgroup v1/v2 in mixed hierarchy.

  Symbol resolving:

   - Add OCaml symbol demangling.

   - Further fixes for handling PE executables when using perf with Wine
     and .exe/.dll files.

   - Fix 'perf unwind' DSO handling.

   - Resolve symbols against debug file first, to deal with artifacts
     related to LTO.

   - Fix gap between kernel end and module start on powerpc.

  Reporting tools:

   - The DSO filter shouldn't show samples in unresolved maps.

   - Improve debuginfod support in various tools.

  build ids:

   - Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test'
     entry for that case.

  perf test:

   - Support for PERF_SAMPLE_WEIGHT_STRUCT.

   - Add test case for PERF_SAMPLE_CODE_PAGE_SIZE.

   - Shell based tests for 'perf daemon's commands ('start', 'stop,
     'reconfig', 'list', etc).

   - ARM cs-etm 'perf test' fixes.

   - Add parse-metric memory bandwidth testcase.

  Compiler related:

   - Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used
     with -fpatchable-function-entry.

   - Fix ARM64 build with gcc 11's -Wformat-overflow.

   - Fix unaligned access in sample parsing test.

   - Fix printf conversion specifier for IP addresses on arm64, s390 and
     powerpc.

  Arch specific:

   - Support exposing Performance Monitor Counter SPRs as part of
     extended regs on powerpc.

   - Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn
     DDR, fix imx8mm ones.

   - Fix common and uarch events for ARM64's A76 and Ampere eMag"

* tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (148 commits)
  perf buildid-cache: Don't skip 16-byte build-ids
  perf buildid-cache: Add test for 16-byte build-id
  perf symbol: Remove redundant libbfd checks
  perf test: Output the sub testing result in cs-etm
  perf test: Suppress logs in cs-etm testing
  perf tools: Fix arm64 build error with gcc-11
  perf intel-pt: Add documentation for tracing virtual machines
  perf intel-pt: Split VM-Entry and VM-Exit branches
  perf intel-pt: Adjust sample flags for VM-Exit
  perf intel-pt: Allow for a guest kernel address filter
  perf intel-pt: Support decoding of guest kernel
  perf machine: Factor out machine__idle_thread()
  perf machine: Factor out machines__find_guest()
  perf intel-pt: Amend decoder to track the NR flag
  perf intel-pt: Retain the last PIP packet payload as is
  perf intel_pt: Add vmlaunch and vmresume as branches
  perf script: Add branch types for VM-Entry and VM-Exit
  perf auxtrace: Automatically group aux-output events
  perf test: Fix unaligned access in sample parsing test
  perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing
  ...
2021-02-22 13:59:43 -08:00
Namhyung Kim 48859e5293 tools api fs: Cache cgroupfs mount point
Currently it parses the /proc file everytime it opens a file in the
cgroupfs.  Save the last result to avoid it (assuming it won't be
changed between the accesses).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201216090556.813996-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-17 15:09:08 -03:00
Namhyung Kim 6fd99b7f62 tools api fs: Diet cgroupfs_find_mountpoint()
Reduce the number of buffers and hopefully make it more efficient. :)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201216090556.813996-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-17 15:08:32 -03:00
Namhyung Kim 27ab1c1c06 tools api fs: Prefer cgroup v1 path in cgroupfs_find_mountpoint()
The cgroupfs_find_mountpoint() looks up the /proc/mounts file to find
a directory for the given cgroup subsystem.  It keeps both cgroup v1
and v2 path since there's a possibility of the mixed hierarchly.

But we can simply use v1 path if it's found as it will override the v2
hierarchy.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201216090556.813996-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-17 15:06:22 -03:00
David S. Miller b8af417e4d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-02-16

The following pull-request contains BPF updates for your *net-next* tree.

There's a small merge conflict between 7eeba1706e ("tcp: Add receive timestamp
support for receive zerocopy.") from net-next tree and 9cacf81f81 ("bpf: Remove
extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows:

  [...]
                lock_sock(sk);
                err = tcp_zerocopy_receive(sk, &zc, &tss);
                err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
                                                          &zc, &len, err);
                release_sock(sk);
  [...]

We've added 116 non-merge commits during the last 27 day(s) which contain
a total of 156 files changed, 5662 insertions(+), 1489 deletions(-).

The main changes are:

1) Adds support of pointers to types with known size among global function
   args to overcome the limit on max # of allowed args, from Dmitrii Banshchikov.

2) Add bpf_iter for task_vma which can be used to generate information similar
   to /proc/pid/maps, from Song Liu.

3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow
   rewriting bind user ports from BPF side below the ip_unprivileged_port_start
   range, both from Stanislav Fomichev.

4) Prevent recursion on fentry/fexit & sleepable programs and allow map-in-map
   as well as per-cpu maps for the latter, from Alexei Starovoitov.

5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer
   for sleepable programs, both from KP Singh.

6) Extend verifier to enable variable offset read/write access to the BPF
   program stack, from Andrei Matei.

7) Improve tc & XDP MTU handling and add a new bpf_check_mtu() helper to
   query device MTU from programs, from Jesper Dangaard Brouer.

8) Allow bpf_get_socket_cookie() helper also be called from [sleepable] BPF
   tracing programs, from Florent Revest.

9) Extend x86 JIT to pad JMPs with NOPs for helping image to converge when
   otherwise too many passes are required, from Gary Lin.

10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function
    verification both related to zero-extension handling, from Ilya Leoshkevich.

11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa.

12) Batch of AF_XDP selftest cleanups and small performance improvement
    for libbpf's xsk map redirect for newer kernels, from Björn Töpel.

13) Follow-up BPF doc and verifier improvements around atomics with
    BPF_FETCH, from Brendan Jackman.

14) Permit zero-sized data sections e.g. if ELF .rodata section contains
    read-only data from local variables, from Yonghong Song.

15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16 13:14:06 -08:00
Martin KaFai Lau d2836dddc9 libbpf: Ignore non function pointer member in struct_ops
When libbpf initializes the kernel's struct_ops in
"bpf_map__init_kern_struct_ops()", it enforces all
pointer types must be a function pointer and rejects
others.  It turns out to be too strict.  For example,
when directly using "struct tcp_congestion_ops" from vmlinux.h,
it has a "struct module *owner" member and it is set to NULL
in a bpf_tcp_cc.o.

Instead, it only needs to ensure the member is a function
pointer if it has been set (relocated) to a bpf-prog.
This patch moves the "btf_is_func_proto(kern_mtype)" check
after the existing "if (!prog) { continue; }".  The original debug
message in "if (!prog) { continue; }" is also removed since it is
no longer valid.  Beside, there is a later debug message to tell
which function pointer is set.

The "btf_is_func_proto(mtype)" has already been guaranteed
in "bpf_object__collect_st_ops_relos()" which has been run
before "bpf_map__init_kern_struct_ops()".  Thus, this check
is removed.

v2:
- Remove outdated debug message (Andrii)
  Remove because there is a later debug message to tell
  which function pointer is set.
- Following mtype->type is no longer needed. Remove:
  "skip_mods_and_typedefs(btf, mtype->type, &mtype_id)"
- Do "if (!prog)" test before skip_mods_and_typedefs.

Fixes: 590a008882 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212021030.266932-1-kafai@fb.com
2021-02-12 11:49:36 -08:00
Stanislav Fomichev 1e0aa3fb05 libbpf: Use AF_LOCAL instead of AF_INET in xsk.c
We have the environments where usage of AF_INET is prohibited
(cgroup/sock_create returns EPERM for AF_INET). Let's use
AF_LOCAL instead of AF_INET, it should perfectly work with SIOCETHTOOL.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210209221826.922940-1-sdf@google.com
2021-02-12 11:36:48 -08:00
Andrii Nakryiko 5f10c1aac8 libbpf: Stop using feature-detection Makefiles
Libbpf's Makefile relies on Linux tools infrastructure's feature detection
framework, but libbpf's needs are very modest: it detects the presence of
libelf and libz, both of which are mandatory. So it doesn't benefit much from
the framework, but pays significant costs in terms of maintainability and
debugging experience, when something goes wrong. The other feature detector,
testing for the presernce of minimal BPF API in system headers is long
obsolete as well, providing no value.

So stop using feature detection and just assume the presence of libelf and
libz during build time. Worst case, user will get a clear and actionable
linker error, e.g.:

  /usr/bin/ld: cannot find -lelf

On the other hand, we completely bypass recurring issues various users
reported over time with false negatives of feature detection (libelf or libz
not being detected, while they are actually present in the system).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/bpf/20210203203445.3356114-1-andrii@kernel.org
2021-02-04 01:22:00 +01:00
Jakub Kicinski c358f95205 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/dev.c
  b552766c87 ("can: dev: prevent potential information leak in can_fill_info()")
  3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")
  0a042c6ec9 ("can: dev: move netlink related code into seperate file")

  Code move.

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
  57ac4a31c4 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
  214baf2287 ("net/mlx5e: Support HTB offload")

  Adjacent code changes

net/switchdev/switchdev.c
  20776b465c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
  ffb68fc58e ("net: switchdev: remove the transaction structure from port object notifiers")
  bae33f2b5a ("net: switchdev: remove the transaction structure from port attributes")

  Transaction parameter gets dropped otherwise keep the fix.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-28 17:09:31 -08:00
Arnaldo Carvalho de Melo 70f0ba9f24 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-27 16:48:04 -03:00
Björn Töpel 78ed404591 libbpf, xsk: Select AF_XDP BPF program based on kernel version
Add detection for kernel version, and adapt the BPF program based on
kernel support. This way, users will get the best possible performance
from the BPF program.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Marek Majtyka  <alardam@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210122105351.11751-4-bjorn.topel@gmail.com
2021-01-25 23:57:59 +01:00
Jiri Olsa 6095d5a271 libbpf: Use string table index from index table if needed
For very large ELF objects (with many sections), we could
get special value SHN_XINDEX (65535) for elf object's string
table index - e_shstrndx.

Call elf_getshdrstrndx to get the proper string table index,
instead of reading it directly from ELF header.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210121202203.9346-4-jolsa@kernel.org
2021-01-21 15:38:01 -08:00
Adrian Hunter fc705fecf3 perf evlist: Fix id index for heterogeneous systems
perf_evlist__set_sid_idx() updates perf_sample_id with the evlist map
index, CPU number and TID. It is passed indexes to the evsel's cpu and
thread maps, but references the evlist's maps instead. That results in
using incorrect CPU numbers on heterogeneous systems. Fix it by using
evsel maps.

The id index (PERF_RECORD_ID_INDEX) is used by AUX area tracing when in
sampling mode. Having an incorrect CPU number causes the trace data to
be attributed to the wrong CPU, and can result in decoder errors because
the trace data is then associated with the wrong process.

Committer notes:

Keep the class prefix convention in the function name, switching from
perf_evlist__set_sid_idx() to perf_evsel__set_sid_idx().

Fixes: 3c659eedad ("perf tools: Add id index")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210121125446.11287-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21 17:25:33 -03:00
Jakub Kicinski 0fe2f273ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/can/dev.c
  commit 03f16c5075 ("can: dev: can_restart: fix use after free bug")
  commit 3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")

  Code move.

drivers/net/dsa/b53/b53_common.c
 commit 8e4052c32d ("net: dsa: b53: fix an off by one in checking "vlan->vid"")
 commit b7a9e0da2d ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects")

 Field rename.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20 12:16:11 -08:00
Linus Torvalds 75439bc439 Networking fixes for 5.11-rc5, including fixes from bpf, wireless,
and can trees.
 
 Current release - regressions:
 
  - nfc: nci: fix the wrong NCI_CORE_INIT parameters
 
 Current release - new code bugs:
 
  - bpf: allow empty module BTFs
 
 Previous releases - regressions:
 
  - bpf: fix signed_{sub,add32}_overflows type handling
 
  - tcp: do not mess with cloned skbs in tcp_add_backlog()
 
  - bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach
 
  - bpf: don't leak memory in bpf getsockopt when optlen == 0
 
  - tcp: fix potential use-after-free due to double kfree()
 
  - mac80211: fix encryption issues with WEP
 
  - devlink: use right genl user_ptr when handling port param get/set
 
  - ipv6: set multicast flag on the multicast route
 
  - tcp: fix TCP_USER_TIMEOUT with zero window
 
 Previous releases - always broken:
 
  - bpf: local storage helpers should check nullness of owner ptr passed
 
  - mac80211: fix incorrect strlen of .write in debugfs
 
  - cls_flower: call nla_ok() before nla_next()
 
  - skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmAIa+UACgkQMUZtbf5S
 IruZTQ/+O263ZyI0C5S1uCbHPCsAyjZyxECWDNfQ3tRzTfvldoRRP4YbC1ekSoXu
 8Y9GKDDLMI2pYkNlCqfMhrFaop8sudosntOZDSeRm/2TkkQFnkM/bxAlz++7Rnwx
 vHu1Xo2t2bKJxooSw8gLJ5iZNTbkw/M5iA3qR9kP+BG1yDP7By4P/Y4ziFphffad
 gPlfLQaU8nRVuDBYYrGIX0GoMg05IH1zt2/MxvN4ReXuex/9tq2TrU8jxHiwT2ja
 K1DHR+g2VVZf55TWrL9Yw8V5Rr+F7bxf6i+yer9hWWhENXgoTv6QkndAnTFOcoat
 VQh44GzoNoL1dAHD8kyUOOxJCyjItJJe58Evcwjnls4o+5BC2aDNQADwrSyz3sHe
 l9iNMSMEylymu7Xu+cJw2kjOq/BK6TdjaGSxwm1M2ErPehf36eJuc4FkaJz3RO55
 nkYMfm0+5rYWSsR5CTTJp8r2urCAT4SSx1iLoZknUXE6qa5AcMSNhIjGbw6pUp4q
 RDBtAKqiV0l37vdUag4Z+QgjPA0cH9E4aMQKYmD9dop20Zuzp4ug38qR32aEFC6q
 Qfb0VBMKgwu6OWjuWARbwYktVQNcoelKiGnsGnORJ5S9cyc1N4HeKEnb5Hw8ky5q
 4FBpNMfx3Ief14iNkh65KrzA+uyZBjqEG+joTSzn+9R7Lof60QA=
 =KyY7
 -----END PGP SIGNATURE-----

Merge tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and
  can trees.

  Current release - regressions:

   - nfc: nci: fix the wrong NCI_CORE_INIT parameters

  Current release - new code bugs:

   - bpf: allow empty module BTFs

  Previous releases - regressions:

   - bpf: fix signed_{sub,add32}_overflows type handling

   - tcp: do not mess with cloned skbs in tcp_add_backlog()

   - bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach

   - bpf: don't leak memory in bpf getsockopt when optlen == 0

   - tcp: fix potential use-after-free due to double kfree()

   - mac80211: fix encryption issues with WEP

   - devlink: use right genl user_ptr when handling port param get/set

   - ipv6: set multicast flag on the multicast route

   - tcp: fix TCP_USER_TIMEOUT with zero window

  Previous releases - always broken:

   - bpf: local storage helpers should check nullness of owner ptr passed

   - mac80211: fix incorrect strlen of .write in debugfs

   - cls_flower: call nla_ok() before nla_next()

   - skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too"

* tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net: systemport: free dev before on error path
  net: usb: cdc_ncm: don't spew notifications
  net: mscc: ocelot: Fix multicast to the CPU port
  tcp: Fix potential use-after-free due to double kfree()
  bpf: Fix signed_{sub,add32}_overflows type handling
  can: peak_usb: fix use after free bugs
  can: vxcan: vxcan_xmit: fix use after free bug
  can: dev: can_restart: fix use after free bug
  tcp: fix TCP socket rehash stats mis-accounting
  net: dsa: b53: fix an off by one in checking "vlan->vid"
  tcp: do not mess with cloned skbs in tcp_add_backlog()
  selftests: net: fib_tests: remove duplicate log test
  net: nfc: nci: fix the wrong NCI_CORE_INIT parameters
  sh_eth: Fix power down vs. is_opened flag ordering
  net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
  netfilter: rpfilter: mask ecn bits before fib lookup
  udp: mask TOS bits in udp_v4_early_demux()
  xsk: Clear pool even for inactive queues
  bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
  sh_eth: Make PHY access aware of Runtime PM to fix reboot crash
  ...
2021-01-20 11:52:21 -08:00
Arnaldo Carvalho de Melo cd07e536b0 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:35:31 -03:00
Ian Rogers 66dd86b2a2 libperf tests: Fail when failing to get a tracepoint id
Permissions are necessary to get a tracepoint id. Fail the test when the
read fails.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114180250.3853825-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 17:28:27 -03:00
Ian Rogers bba2ea17ef libperf tests: If a test fails return non-zero
If a test fails return -1 rather than 0. This is consistent with the
return value in test-cpumap.c

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114180250.3853825-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 17:28:27 -03:00
Ian Rogers be82fddca8 libperf tests: Avoid uninitialized variable warning
The variable 'bf' is read (for a write call) without being initialized
triggering a memory sanitizer warning. Use 'bf' in the read and switch
the write to reading from a string.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114212304.4018119-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 17:28:27 -03:00
Ian Rogers ce5a518e9d bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
Add inline to __always_inline making it match the linux/compiler.h.
Adding this avoids an unused function warning on bpf_tail_call_static
when compining with -Wall.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210113223609.3358812-1-irogers@google.com
2021-01-13 19:11:14 -08:00
Andrii Nakryiko 284d2587ea libbpf: Support kernel module ksym externs
Add support for searching for ksym externs not just in vmlinux BTF, but across
all module BTFs, similarly to how it's done for CO-RE relocations. Kernels
that expose module BTFs through sysfs are assumed to support new ldimm64
instruction extension with BTF FD provided in insn[1].imm field, so no extra
feature detection is performed.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-7-andrii@kernel.org
2021-01-12 17:24:30 -08:00
Andrii Nakryiko b8d52264df libbpf: Allow loading empty BTFs
Empty BTFs do come up (e.g., simple kernel modules with no new types and
strings, compared to the vmlinux BTF) and there is nothing technically wrong
with them. So remove unnecessary check preventing loading empty BTFs.

Fixes: d812362450 ("libbpf: Fix BTF data layout checks and allow empty BTF")
Reported-by: Christopher William Snowhill <chris@kode54.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210110070341.1380086-2-andrii@kernel.org
2021-01-12 21:12:05 +01:00
Andrii Nakryiko e22d7f05e4 libbpf: Clarify kernel type use with USER variants of CORE reading macros
Add comments clarifying that USER variants of CO-RE reading macro are still
only going to work with kernel types, defined in kernel or kernel module BTF.
This should help preventing invalid use of those macro to read user-defined
types (which doesn't work with CO-RE).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210108194408.3468860-1-andrii@kernel.org
2021-01-08 14:06:03 -08:00
Andrii Nakryiko a4b09a9ef9 libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family
BPF_CORE_READ(), in addition to handling CO-RE relocations, also allows much
nicer way to read data structures with nested pointers. Instead of writing
a sequence of bpf_probe_read() calls to follow links, one can just write
BPF_CORE_READ(a, b, c, d) to effectively do a->b->c->d read. This is a welcome
ability when porting BCC code, which (in most cases) allows exactly the
intuitive a->b->c->d variant.

This patch adds non-CO-RE variants of BPF_CORE_READ() family of macros for
cases where CO-RE is not supported (e.g., old kernels). In such cases, the
property of shortening a sequence of bpf_probe_read()s to a simple
BPF_PROBE_READ(a, b, c, d) invocation is still desirable, especially when
porting BCC code to libbpf. Yet, no CO-RE relocation is going to be emitted.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201218235614.2284956-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08 13:39:24 -08:00
Andrii Nakryiko 792001f4f7 libbpf: Add user-space variants of BPF_CORE_READ() family of macros
Add BPF_CORE_READ_USER(), BPF_CORE_READ_USER_STR() and their _INTO()
variations to allow reading CO-RE-relocatable kernel data structures from the
user-space. One of such cases is reading input arguments of syscalls, while
reaping the benefits of CO-RE relocations w.r.t. handling 32/64 bit
conversions and handling missing/new fields in UAPI data structs.

Suggested-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201218235614.2284956-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08 13:39:24 -08:00
Jiri Olsa 1ca6e80254 perf tools: Store build id when available in PERF_RECORD_MMAP2 metadata events
When processing a PERF_RECORD_MMAP2 metadata event, check on the build
id misc bit: PERF_RECORD_MISC_MMAP_BUILD_ID and if it is set, store the
build id in mmap's dso object.

Also adding the build id data to struct perf_record_mmap2 event
definition.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 10:01:55 -03:00
Arnaldo Carvalho de Melo 281a94b0f2 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes and check what UAPI headers need to be synched.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:37:24 -03:00
Jakub Kicinski a6b5e026e6 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-12-14

1) Expose bpf_sk_storage_*() helpers to iterator programs, from Florent Revest.

2) Add AF_XDP selftests based on veth devs to BPF selftests, from Weqaar Janjua.

3) Support for finding BTF based kernel attach targets through libbpf's
   bpf_program__set_attach_target() API, from Andrii Nakryiko.

4) Permit pointers on stack for helper calls in the verifier, from Yonghong Song.

5) Fix overflows in hash map elem size after rlimit removal, from Eric Dumazet.

6) Get rid of direct invocation of llc in BPF selftests, from Andrew Delgadillo.

7) Fix xsk_recvmsg() to reorder socket state check before access, from Björn Töpel.

8) Add new libbpf API helper to retrieve ring buffer epoll fd, from Brendan Jackman.

9) Batch of minor BPF selftest improvements all over the place, from Florian Lehner,
   KP Singh, Jiri Olsa and various others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits)
  selftests/bpf: Add a test for ptr_to_map_value on stack for helper access
  bpf: Permits pointers on stack for helper calls
  libbpf: Expose libbpf ring_buffer epoll_fd
  selftests/bpf: Add set_attach_target() API selftest for module target
  libbpf: Support modules in bpf_program__set_attach_target() API
  selftests/bpf: Silence ima_setup.sh when not running in verbose mode.
  selftests/bpf: Drop the need for LLVM's llc
  selftests/bpf: fix bpf_testmod.ko recompilation logic
  samples/bpf: Fix possible hang in xdpsock with multiple threads
  selftests/bpf: Make selftest compilation work on clang 11
  selftests/bpf: Xsk selftests - adding xdpxceiver to .gitignore
  selftests/bpf: Drop tcp-{client,server}.py from Makefile
  selftests/bpf: Xsk selftests - Bi-directional Sockets - SKB, DRV
  selftests/bpf: Xsk selftests - Socket Teardown - SKB, DRV
  selftests/bpf: Xsk selftests - DRV POLL, NOPOLL
  selftests/bpf: Xsk selftests - SKB POLL, NOPOLL
  selftests/bpf: Xsk selftests framework
  bpf: Only provide bpf_sock_from_file with CONFIG_NET
  bpf: Return -ENOTSUPP when attaching to non-kernel BTF
  xsk: Validate socket state in xsk_recvmsg, prior touching socket members
  ...
====================

Link: https://lore.kernel.org/r/20201214214316.20642-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 15:34:36 -08:00
Brendan Jackman a4d2a7ad86 libbpf: Expose libbpf ring_buffer epoll_fd
This provides a convenient perf ringbuf -> libbpf ringbuf migration
path for users of external polling systems. It is analogous to
perf_buffer__epoll_fd.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201214113812.305274-1-jackmanb@google.com
2020-12-14 10:04:55 -08:00
Andrii Nakryiko fe62de310e libbpf: Support modules in bpf_program__set_attach_target() API
Support finding kernel targets in kernel modules when using
bpf_program__set_attach_target() API. This brings it up to par with what
libbpf supports when doing declarative SEC()-based target determination.

Some minor internal refactoring was needed to make sure vmlinux BTF can be
loaded before bpf_object's load phase.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201211215825.3646154-2-andrii@kernel.org
2020-12-14 16:39:42 +01:00
Jakub Kicinski 46d5e62dd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11 22:29:38 -08:00
Jakub Kicinski a1dd1d8697 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-12-03

The main changes are:

1) Support BTF in kernel modules, from Andrii.

2) Introduce preferred busy-polling, from Björn.

3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh.

4) Memcg-based memory accounting for bpf objects, from Roman.

5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits)
  selftests/bpf: Fix invalid use of strncat in test_sockmap
  libbpf: Use memcpy instead of strncpy to please GCC
  selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module
  selftests/bpf: Add tp_btf CO-RE reloc test for modules
  libbpf: Support attachment of BPF tracing programs to kernel modules
  libbpf: Factor out low-level BPF program loading helper
  bpf: Allow to specify kernel module BTFs when attaching BPF programs
  bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier
  selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF
  selftests/bpf: Add support for marking sub-tests as skipped
  selftests/bpf: Add bpf_testmod kernel module for testing
  libbpf: Add kernel module BTF support for CO-RE relocations
  libbpf: Refactor CO-RE relocs to not assume a single BTF object
  libbpf: Add internal helper to load BTF data by FD
  bpf: Keep module's btf_data_size intact after load
  bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
  selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP
  bpf: Adds support for setting window clamp
  samples/bpf: Fix spelling mistake "recieving" -> "receiving"
  bpf: Fix cold build of test_progs-no_alu32
  ...
====================

Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04 07:48:12 -08:00
Andrii Nakryiko 3015b500ae libbpf: Use memcpy instead of strncpy to please GCC
Some versions of GCC are really nit-picky about strncpy() use. Use memcpy(),
as they are pretty much equivalent for the case of fixed length strings.

Fixes: e459f49b43 ("libbpf: Separate XDP program load with xsk socket creation")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203235440.2302137-1-andrii@kernel.org
2020-12-03 18:07:05 -08:00
Andrii Nakryiko 91abb4a6d7 libbpf: Support attachment of BPF tracing programs to kernel modules
Teach libbpf to search for BTF types in kernel modules for tracing BPF
programs. This allows attachment of raw_tp/fentry/fexit/fmod_ret/etc BPF
program types to tracepoints and functions in kernel modules.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-13-andrii@kernel.org
2020-12-03 17:38:21 -08:00
Andrii Nakryiko 6aef10a481 libbpf: Factor out low-level BPF program loading helper
Refactor low-level API for BPF program loading to not rely on public API
types. This allows painless extension without constant efforts to cleverly not
break backwards compatibility.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-12-andrii@kernel.org
2020-12-03 17:38:21 -08:00
Andrii Nakryiko 4f33a53d56 libbpf: Add kernel module BTF support for CO-RE relocations
Teach libbpf to search for candidate types for CO-RE relocations across kernel
modules BTFs, in addition to vmlinux BTF. If at least one candidate type is
found in vmlinux BTF, kernel module BTFs are not iterated. If vmlinux BTF has
no matching candidates, then find all kernel module BTFs and search for all
matching candidates across all of them.

Kernel's support for module BTFs are inferred from the support for BTF name
pointer in BPF UAPI.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-6-andrii@kernel.org
2020-12-03 17:38:20 -08:00
Andrii Nakryiko 0f7515ca7c libbpf: Refactor CO-RE relocs to not assume a single BTF object
Refactor CO-RE relocation candidate search to not expect a single BTF, rather
return all candidate types with their corresponding BTF objects. This will
allow to extend CO-RE relocations to accommodate kernel module BTFs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-5-andrii@kernel.org
2020-12-03 17:38:20 -08:00
Andrii Nakryiko a19f93cfaf libbpf: Add internal helper to load BTF data by FD
Add a btf_get_from_fd() helper, which constructs struct btf from in-kernel BTF
data by FD. This is used for loading module BTFs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201203204634.1325171-4-andrii@kernel.org
2020-12-03 17:38:20 -08:00
Stanislav Fomichev d6d418bd8f libbpf: Cap retries in sys_bpf_prog_load
I've seen a situation, where a process that's under pprof constantly
generates SIGPROF which prevents program loading indefinitely.
The right thing to do probably is to disable signals in the upper
layers while loading, but it still would be nice to get some error from
libbpf instead of an endless loop.

Let's add some small retry limit to the program loading:
try loading the program 5 (arbitrary) times and give up.

v2:
* 10 -> 5 retires (Andrii Nakryiko)

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201202231332.3923644-1-sdf@google.com
2020-12-03 12:01:18 -08:00
Toke Høiland-Jørgensen 9cf309c56f libbpf: Sanitise map names before pinning
When we added sanitising of map names before loading programs to libbpf, we
still allowed periods in the name. While the kernel will accept these for
the map names themselves, they are not allowed in file names when pinning
maps. This means that bpf_object__pin_maps() will fail if called on an
object that contains internal maps (such as sections .rodata).

Fix this by replacing periods with underscores when constructing map pin
paths. This only affects the paths generated by libbpf when
bpf_object__pin_maps() is called with a path argument. Any pin paths set
by bpf_map__set_pin_path() are unaffected, and it will still be up to the
caller to avoid invalid characters in those.

Fixes: 113e6b7e15 ("libbpf: Sanitise internal map names so they are not rejected by the kernel")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201203093306.107676-1-toke@redhat.com
2020-12-03 11:57:42 -08:00
Andrei Matei 80b2b5c3a7 libbpf: Fail early when loading programs with unspecified type
Before this patch, a program with unspecified type
(BPF_PROG_TYPE_UNSPEC) would be passed to the BPF syscall, only to have
the kernel reject it with an opaque invalid argument error. This patch
makes libbpf reject such programs with a nicer error message - in
particular libbpf now tries to diagnose bad ELF section names at both
open time and load time.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201203043410.59699-1-andreimatei1@gmail.com
2020-12-03 11:37:05 -08:00
Mariusz Dudek e459f49b43 libbpf: Separate XDP program load with xsk socket creation
Add support for separation of eBPF program load and xsk socket
creation.

This is needed for use-case when you want to privide as little
privileges as possible to the data plane application that will
handle xsk socket creation and incoming traffic.

With this patch the data entity container can be run with only
CAP_NET_RAW capability to fulfill its purpose of creating xsk
socket and handling packages. In case your umem is larger or
equal process limit for MEMLOCK you need either increase the
limit or CAP_IPC_LOCK capability.

To resolve privileges issue two APIs are introduced:

- xsk_setup_xdp_prog - loads the built in XDP program. It can
also return xsks_map_fd which is needed by unprivileged process
to update xsks_map with AF_XDP socket "fd"

- xsk_socket__update_xskmap - inserts an AF_XDP socket into an xskmap
for a particular xsk_socket

Signed-off-by: Mariusz Dudek <mariuszx.dudek@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20201203090546.11976-2-mariuszx.dudek@intel.com
2020-12-03 10:37:59 -08:00
Andrii Nakryiko 0cfdcd6378 libbpf: Add base BTF accessor
Add ability to get base BTF. It can be also used to check if BTF is split BTF.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201202065244.530571-3-andrii@kernel.org
2020-12-03 10:17:26 -08:00
Andrii Nakryiko f6a8250ea1 libbpf: Fix ring_buffer__poll() to return number of consumed samples
Fix ring_buffer__poll() to return the number of non-discarded records
consumed, just like its documentation states. It's also consistent with
ring_buffer__consume() return. Fix up selftests with wrong expected results.

Fixes: bf99c936f9 ("libbpf: Add BPF ring buffer support")
Fixes: cb1c9ddd55 ("selftests/bpf: Add BPF ringbuf selftests")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201130223336.904192-1-andrii@kernel.org
2020-12-01 20:21:45 -08:00
Arnaldo Carvalho de Melo 1f195e557d Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 08:56:55 -03:00
Magnus Karlsson 105c4e75fe libbpf: Replace size_t with __u32 in xsk interfaces
Replace size_t with __u32 in the xsk interfaces that contain this.
There is no reason to have size_t since the internal variable that
is manipulated is a __u32. The following APIs are affected:

__u32 xsk_ring_prod__reserve(struct xsk_ring_prod *prod, __u32 nb, __u32 *idx)
void xsk_ring_prod__submit(struct xsk_ring_prod *prod, __u32 nb)
__u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __u32 *idx)
void xsk_ring_cons__cancel(struct xsk_ring_cons *cons, __u32 nb)
void xsk_ring_cons__release(struct xsk_ring_cons *cons, __u32 nb)

The "nb" variable and the return values have been changed from size_t
to __u32.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1606383455-8243-1-git-send-email-magnus.karlsson@gmail.com
2020-11-27 21:58:38 +01:00
Jiri Olsa b3e453272d tools lib: Adopt memchr_inv() from kernel
We'll use it to check for undefined/zero data.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:34:52 -03:00
Li RongQing db13db9f67 libbpf: Add support for canceling cached_cons advance
Add a new function for returning descriptors the user received
after an xsk_ring_cons__peek call. After the application has
gotten a number of descriptors from a ring, it might not be able
to or want to process them all for various reasons. Therefore,
it would be useful to have an interface for returning or
cancelling a number of them so that they are returned to the ring.

This patch adds a new function called xsk_ring_cons__cancel that
performs this operation on nb descriptors counted from the end of
the batch of descriptors that was received through the peek call.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ Magnus Karlsson: rewrote changelog ]
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/1606202474-8119-1-git-send-email-lirongqing@baidu.com
2020-11-25 13:18:47 +01:00
Jakub Kicinski 56495a2442 Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 19:08:46 -08:00
Jiri Olsa 1fd6cee127 libbpf: Fix VERSIONED_SYM_COUNT number parsing
We remove "other info" from "readelf -s --wide" output when
parsing GLOBAL_SYM_COUNT variable, which was added in [1].
But we don't do that for VERSIONED_SYM_COUNT and it's failing
the check_abi target on powerpc Fedora 33.

The extra "other info" wasn't problem for VERSIONED_SYM_COUNT
parsing until commit [2] added awk in the pipe, which assumes
that the last column is symbol, but it can be "other info".

Adding "other info" removal for VERSIONED_SYM_COUNT the same
way as we did for GLOBAL_SYM_COUNT parsing.

[1] aa915931ac ("libbpf: Fix readelf output parsing for Fedora")
[2] 746f534a48 ("tools/libbpf: Avoid counting local symbols in ABI check")

Fixes: 746f534a48 ("tools/libbpf: Avoid counting local symbols in ABI check")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201118211350.1493421-1-jolsa@kernel.org
2020-11-19 08:45:12 -08:00
Alan Maguire de91e631bd libbpf: bpf__find_by_name[_kind] should use btf__get_nr_types()
When operating on split BTF, btf__find_by_name[_kind] will not
iterate over all types since they use btf->nr_types to show
the number of types to iterate over. For split BTF this is
the number of types _on top of base BTF_, so it will
underestimate the number of types to iterate over, especially
for vmlinux + module BTF, where the latter is much smaller.

Use btf__get_nr_types() instead.

Fixes: ba451366bf ("libbpf: Implement basic split BTF support")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1605437195-2175-1-git-send-email-alan.maguire@oracle.com
2020-11-16 20:51:34 -08:00
Jakub Kicinski 07cbce2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-11-14

1) Add BTF generation for kernel modules and extend BTF infra in kernel
   e.g. support for split BTF loading and validation, from Andrii Nakryiko.

2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
   on inlined branch conditions, from Alexei Starovoitov.

3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.

4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
   infra, from Martin KaFai Lau.

5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
   XDP_REDIRECT path, from Lorenzo Bianconi.

6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
   context, from Song Liu.

7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
   for different target archs on same source tree, from Jean-Philippe Brucker.

8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.

9) Move functionality from test_tcpbpf_user into the test_progs framework so it
   can run in BPF CI, from Alexander Duyck.

10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.

Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.

  [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/
      https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
  net: mlx5: Add xdp tx return bulking support
  net: mvpp2: Add xdp tx return bulking support
  net: mvneta: Add xdp tx return bulking support
  net: page_pool: Add bulk support for ptr_ring
  net: xdp: Introduce bulking for xdp tx return path
  bpf: Expose bpf_d_path helper to sleepable LSM hooks
  bpf: Augment the set of sleepable LSM hooks
  bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Rename some functions in bpf_sk_storage
  bpf: Folding omem_charge() into sk_storage_charge()
  selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
  selftests/bpf: Add skb_pkt_end test
  bpf: Support for pointers beyond pkt_end.
  tools/bpf: Always run the *-clean recipes
  tools/bpf: Add bootstrap/ to .gitignore
  bpf: Fix NULL dereference in bpf_task_storage
  tools/bpftool: Fix build slowdown
  tools/runqslower: Build bpftool using HOSTCC
  tools/runqslower: Enable out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 09:13:41 -08:00
Andrii Nakryiko 197afc6314 libbpf: Don't attempt to load unused subprog as an entry-point BPF program
If BPF code contains unused BPF subprogram and there are no other subprogram
calls (which can realistically happen in real-world applications given
sufficiently smart Clang code optimizations), libbpf will erroneously assume
that subprograms are entry-point programs and will attempt to load them with
UNSPEC program type.

Fix by not relying on subcall instructions and rather detect it based on the
structure of BPF object's sections.

Fixes: 9a94f277c4 ("tools: libbpf: restore the ability to load programs from .text section")
Reported-by: Dmitrii Banshchikov <dbanschikov@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201107000251.256821-1-andrii@kernel.org
2020-11-09 22:15:23 +01:00
KP Singh 8885274d22 libbpf: Add support for task local storage
Updates the bpf_probe_map_type API to also support
BPF_MAP_TYPE_TASK_STORAGE similar to other local storage maps.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201106103747.2780972-4-kpsingh@chromium.org
2020-11-06 08:08:37 -08:00
Andrii Nakryiko 6b6e6b1d09 libbpf: Accomodate DWARF/compiler bug with duplicated identical arrays
In some cases compiler seems to generate distinct DWARF types for identical
arrays within the same CU. That seems like a bug, but it's already out there
and breaks type graph equivalence checks, so accommodate it anyway by checking
for identical arrays, regardless of their type ID.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-10-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko f86524efcf libbpf: Support BTF dedup of split BTFs
Add support for deduplication split BTFs. When deduplicating split BTF, base
BTF is considered to be immutable and can't be modified or adjusted. 99% of
BTF deduplication logic is left intact (module some type numbering adjustments).
There are only two differences.

First, each type in base BTF gets hashed (expect VAR and DATASEC, of course,
those are always considered to be self-canonical instances) and added into
a table of canonical table candidates. Hashing is a shallow, fast operation,
so mostly eliminates the overhead of having entire base BTF to be a part of
BTF dedup.

Second difference is very critical and subtle. While deduplicating split BTF
types, it is possible to discover that one of immutable base BTF BTF_KIND_FWD
types can and should be resolved to a full STRUCT/UNION type from the split
BTF part.  This is, obviously, can't happen because we can't modify the base
BTF types anymore. So because of that, any type in split BTF that directly or
indirectly references that newly-to-be-resolved FWD type can't be considered
to be equivalent to the corresponding canonical types in base BTF, because
that would result in a loss of type resolution information. So in such case,
split BTF types will be deduplicated separately and will cause some
duplication of type information, which is unavoidable.

With those two changes, the rest of the algorithm manages to deduplicate split
BTF correctly, pointing all the duplicates to their canonical counter-parts in
base BTF, but also is deduplicating whatever unique types are present in split
BTF on their own.

Also, theoretically, split BTF after deduplication could end up with either
empty type section or empty string section. This is handled by libbpf
correctly in one of previous patches in the series.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-9-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko d812362450 libbpf: Fix BTF data layout checks and allow empty BTF
Make data section layout checks stricter, disallowing overlap of types and
strings data.

Additionally, allow BTFs with no type data. There is nothing inherently wrong
with having BTF with no types (put potentially with some strings). This could
be a situation with kernel module BTFs, if module doesn't introduce any new
type information.

Also fix invalid offset alignment check for btf->hdr->type_off.

Fixes: 8a138aed4a ("bpf: btf: Add BTF support to libbpf")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-8-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko ba451366bf libbpf: Implement basic split BTF support
Support split BTF operation, in which one BTF (base BTF) provides basic set of
types and strings, while another one (split BTF) builds on top of base's types
and strings and adds its own new types and strings. From API standpoint, the
fact that the split BTF is built on top of the base BTF is transparent.

Type numeration is transparent. If the base BTF had last type ID #N, then all
types in the split BTF start at type ID N+1. Any type in split BTF can
reference base BTF types, but not vice versa. Programmatically construction of
a split BTF on top of a base BTF is supported: one can create an empty split
BTF with btf__new_empty_split() and pass base BTF as an input, or pass raw
binary data to btf__new_split(), or use btf__parse_xxx_split() variants to get
initial set of split types/strings from the ELF file with .BTF section.

String offsets are similarly transparent and are a logical continuation of
base BTF's strings. When building BTF programmatically and adding a new string
(explicitly with btf__add_str() or implicitly through appending new
types/members), string-to-be-added would first be looked up from the base
BTF's string section and re-used if it's there. If not, it will be looked up
and/or added to the split BTF string section. Similarly to type IDs, types in
split BTF can refer to strings from base BTF absolutely transparently (but not
vice versa, of course, because base BTF doesn't "know" about existence of
split BTF).

Internal type index is slightly adjusted to be zero-indexed, ignoring a fake
[0] VOID type. This allows to handle split/base BTF type lookups transparently
by using btf->start_id type ID offset, which is always 1 for base/non-split
BTF and equals btf__get_nr_types(base_btf) + 1 for the split BTF.

BTF deduplication is not yet supported for split BTF and support for it will
be added in separate patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-5-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko 88a82c2a9a libbpf: Unify and speed up BTF string deduplication
Revamp BTF dedup's string deduplication to match the approach of writable BTF
string management. This allows to transfer deduplicated strings index back to
BTF object after deduplication without expensive extra memory copying and hash
map re-construction. It also simplifies the code and speeds it up, because
hashmap-based string deduplication is faster than sort + unique approach.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-4-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Andrii Nakryiko c81ed6d81e libbpf: Factor out common operations in BTF writing APIs
Factor out commiting of appended type data. Also extract fetching the very
last type in the BTF (to append members to). These two operations are common
across many APIs and will be easier to refactor with split BTF, if they are
extracted into a single place.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201105043402.2530976-2-andrii@kernel.org
2020-11-05 18:37:30 -08:00
Magnus Karlsson 25cf73b9ff libbpf: Fix possible use after free in xsk_socket__delete
Fix a possible use after free in xsk_socket__delete that will happen
if xsk_put_ctx() frees the ctx. To fix, save the umem reference taken
from the context and just use that instead.

Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1604396490-12129-3-git-send-email-magnus.karlsson@gmail.com
2020-11-04 21:37:29 +01:00
Magnus Karlsson f78331f74c libbpf: Fix null dereference in xsk_socket__delete
Fix a possible null pointer dereference in xsk_socket__delete that
will occur if a null pointer is fed into the function.

Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1604396490-12129-2-git-send-email-magnus.karlsson@gmail.com
2020-11-04 21:37:28 +01:00
Ian Rogers 7a078d2d18 libbpf, hashmap: Fix undefined behavior in hash_bits
If bits is 0, the case when the map is empty, then the >> is the size of
the register which is undefined behavior - on x86 it is the same as a
shift by 0.

Fix by handling the 0 case explicitly and guarding calls to hash_bits for
empty maps in hashmap__for_each_key_entry and hashmap__for_each_entry_safe.

Fixes: e3b9242240 ("libbpf: add resizable non-thread safe internal hashmap")
Suggested-by: Andrii Nakryiko <andriin@fb.com>,
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201029223707.494059-1-irogers@google.com
2020-11-02 23:33:51 +01:00
Linus Torvalds 3cb12d27ff Fixes for 5.10-rc1 from the networking tree:
Cross-tree/merge window issues:
 
  - rtl8150: don't incorrectly assign random MAC addresses; fix late
    in the 5.9 cycle started depending on a return code from
    a function which changed with the 5.10 PR from the usb subsystem
 
 Current release - regressions:
 
  - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
    crashes at probe when control vq was not negotiated/available
 
 Previous releases - regressions:
 
  - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
    bus, only first device would be probed correctly
 
  - nexthop: Fix performance regression in nexthop deletion by
    effectively switching from recently added synchronize_rcu()
    to synchronize_rcu_expedited()
 
  - netsec: ignore 'phy-mode' device property on ACPI systems;
    the property is not populated correctly by the firmware,
    but firmware configures the PHY so just keep boot settings
 
 Previous releases - always broken:
 
  - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
    bulk transfers getting "stuck"
 
  - icmp: randomize the global rate limiter to prevent attackers from
    getting useful signal
 
  - r8169: fix operation under forced interrupt threading, make the
    driver always use hard irqs, even on RT, given the handler is
    light and only wants to schedule napi (and do so through
    a _irqoff() variant, preferably)
 
  - bpf: Enforce pointer id generation for all may-be-null register
    type to avoid pointers erroneously getting marked as null-checked
 
  - tipc: re-configure queue limit for broadcast link
 
  - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
    tunnels
 
  - fix various issues in chelsio inline tls driver
 
 Misc:
 
  - bpf: improve just-added bpf_redirect_neigh() helper api to support
    supplying nexthop by the caller - in case BPF program has already
    done a lookup we can avoid doing another one
 
  - remove unnecessary break statements
 
  - make MCTCP not select IPV6, but rather depend on it
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+R+5UACgkQMUZtbf5S
 Irt9KxAAiYme2aSvMOni0NQsOgQ5mVsy7tk0/4dyRqkAx0ggrfGcFuhgZYNm8ZKY
 KoQsQyn30Wb/2wAp1vX2I4Fod67rFyBfQg/8iWiEAu47X7Bj1lpPPJexSPKhF9/X
 e0TuGxZtoaDuV9C3Su/FOjRmnShGSFQu1SCyJThshwaGsFL3YQ0Ut07VRgRF8x05
 A5fy2SVVIw0JOQgV1oH0GP5oEK3c50oGnaXt8emm56PxVIfAYY0oq69hQUzrfMFP
 zV9R0XbnbCIibT8R3lEghjtXavtQTzK5rYDKazTeOyDU87M+yuykNYj7MhgDwl9Q
 UdJkH2OpMlJylEH3asUjz/+ObMhXfOuj/ZS3INtO5omBJx7x76egDZPMQe4wlpcC
 NT5EZMS7kBdQL8xXDob7hXsvFpuEErSUGruYTHp4H52A9ke1dRTH2kQszcKk87V3
 s+aVVPtJ5bHzF3oGEvfwP0DFLTF6WvjD0Ts0LmTY2DhpE//tFWV37j60Ni5XU21X
 fCPooihQbLOsq9D8zc0ydEvCg2LLWMXM5ovCkqfIAJzbGVYhnxJSryZwpOlKDS0y
 LiUmLcTZDoNR/szx0aJhVHdUUVgXDX/GsllHoc1w7ZvDRMJn40K+xnaF3dSMwtIl
 imhfc5pPi6fdBgjB0cFYRPfhwiwlPMQ4YFsOq9JvynJzmt6P5FQ=
 =ceke
 -----END PGP SIGNATURE-----

Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Cross-tree/merge window issues:

   - rtl8150: don't incorrectly assign random MAC addresses; fix late in
     the 5.9 cycle started depending on a return code from a function
     which changed with the 5.10 PR from the usb subsystem

  Current release regressions:

   - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
     crashes at probe when control vq was not negotiated/available

  Previous release regressions:

   - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
     bus, only first device would be probed correctly

   - nexthop: Fix performance regression in nexthop deletion by
     effectively switching from recently added synchronize_rcu() to
     synchronize_rcu_expedited()

   - netsec: ignore 'phy-mode' device property on ACPI systems; the
     property is not populated correctly by the firmware, but firmware
     configures the PHY so just keep boot settings

  Previous releases - always broken:

   - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
     bulk transfers getting "stuck"

   - icmp: randomize the global rate limiter to prevent attackers from
     getting useful signal

   - r8169: fix operation under forced interrupt threading, make the
     driver always use hard irqs, even on RT, given the handler is light
     and only wants to schedule napi (and do so through a _irqoff()
     variant, preferably)

   - bpf: Enforce pointer id generation for all may-be-null register
     type to avoid pointers erroneously getting marked as null-checked

   - tipc: re-configure queue limit for broadcast link

   - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
     tunnels

   - fix various issues in chelsio inline tls driver

  Misc:

   - bpf: improve just-added bpf_redirect_neigh() helper api to support
     supplying nexthop by the caller - in case BPF program has already
     done a lookup we can avoid doing another one

   - remove unnecessary break statements

   - make MCTCP not select IPV6, but rather depend on it"

* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  tcp: fix to update snd_wl1 in bulk receiver fast path
  net: Properly typecast int values to set sk_max_pacing_rate
  netfilter: nf_fwd_netdev: clear timestamp in forwarding path
  ibmvnic: save changed mac address to adapter->mac_addr
  selftests: mptcp: depends on built-in IPv6
  Revert "virtio-net: ethtool configurable RXCSUM"
  rtnetlink: fix data overflow in rtnl_calcit()
  net: ethernet: mtk-star-emac: select REGMAP_MMIO
  net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
  net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
  bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
  bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
  bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
  mptcp: depends on IPV6 but not as a module
  sfc: move initialisation of efx->filter_sem to efx_init_struct()
  mpls: load mpls_gso after mpls_iptunnel
  net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
  net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
  net: dsa: bcm_sf2: make const array static, makes object smaller
  mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
  ...
2020-10-23 12:05:49 -07:00
Daniel Borkmann 3652c9a1b1 bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
Yaniv reported a compilation error after pulling latest libbpf:

  [...]
  ../libbpf/src/root/usr/include/bpf/bpf_helpers.h:99:10: error:
  unknown register name 'r0' in asm
                     : "r0", "r1", "r2", "r3", "r4", "r5");
  [...]

The issue got triggered given Yaniv was compiling tracing programs with native
target (e.g. x86) instead of BPF target, hence no BTF generated vmlinux.h nor
CO-RE used, and later llc with -march=bpf was invoked to compile from LLVM IR
to BPF object file. Given that clang was expecting x86 inline asm and not BPF
one the error complained that these regs don't exist on the former.

Guard bpf_tail_call_static() with defined(__bpf__) where BPF inline asm is valid
to use. BPF tracing programs on more modern kernels use BPF target anyway and
thus the bpf_tail_call_static() function will be available for them. BPF inline
asm is supported since clang 7 (clang <= 6 otherwise throws same above error),
and __bpf_unreachable() since clang 8, therefore include the latter condition
in order to prevent compilation errors for older clang versions. Given even an
old Ubuntu 18.04 LTS has official LLVM packages all the way up to llvm-10, I did
not bother to special case the __bpf_unreachable() inside bpf_tail_call_static()
further.

Also, undo the sockex3_kern's use of bpf_tail_call_static() sample given they
still have the old hacky way to even compile networking progs with native instead
of BPF target so bpf_tail_call_static() won't be defined there anymore.

Fixes: 0e9f6841f6 ("bpf, libbpf: Add bpf_tail_call_static helper for bpf programs")
Reported-by: Yaniv Agman <yanivagman@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Tested-by: Yaniv Agman <yanivagman@gmail.com>
Link: https://lore.kernel.org/bpf/CAMy7=ZUk08w5Gc2Z-EKi4JFtuUCaZYmE4yzhJjrExXpYKR4L8w@mail.gmail.com
Link: https://lore.kernel.org/bpf/20201021203257.26223-1-daniel@iogearbox.net
2020-10-22 01:46:52 +02:00
Linus Torvalds 9d9af1007b perf tools changes for v5.10: 1st batch
- cgroup improvements for 'perf stat', allowing for compact specification of events
   and cgroups in the command line.
 
 - Support per thread topdown metrics in 'perf stat'.
 
 - Support sample-read topdown metric group in 'perf record'
 
 - Show start of latency in addition to its start in 'perf sched latency'.
 
 - Add min, max to 'perf script' futex-contention output, in addition to avg.
 
 - Allow usage of 'perf_event_attr->exclusive' attribute via the new ':e' event
   modifier.
 
 - Add 'snapshot' command to 'perf record --control', using it with Intel PT.
 
 - Support FIFO file names as alternative options to 'perf record --control'.
 
 - Introduce branch history "streams", to compare 'perf record' runs with
   'perf diff' based on branch records and report hot streams.
 
 - Support PE executable symbol tables using libbfd, to profile, for instance, wine binaries.
 
 - Add filter support for option 'perf ftrace -F/--funcs'.
 
 - Allow configuring the 'disassembler_style' 'perf annotate' knob via 'perf config'
 
 - Update CascadelakeX and SkylakeX JSON vendor events files.
 
 - Add support for parsing perchip/percore JSON vendor events.
 
 - Add power9 hv_24x7 core level metric events.
 
 - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD zen1.
 
 - Enable Family 19h users by matching Zen2 AMD vendor events.
 
 - Use debuginfod in 'perf probe' when required debug files not found locally.
 
 - Display negative tid in non-sample events in 'perf script'.
 
 - Make GTK2 support opt-in
 
 - Add build test with GTK+
 
 - Add missing -lzstd to the fast path feature detection
 
 - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for use in 'perf trace'.
 
 - Show python test script in verbose mode.
 
 - Fix uncore metric expressions
 
 - Msan uninitialized use fixes.
 
 - Use condition variables in 'perf bench numa'
 
 - Autodetect python3 binary in systems without python2.
 
 - Support md5 build ids in addition to sha1.
 
 - Add build id 'perf test' regression test.
 
 - Fix printable strings in python3 scripts.
 
 - Fix off by ones in 'perf trace' in arches using libaudit.
 
 - Fix JSON event code for events referencing std arch events.
 
 - Introduce 'perf test' shell script for Arm CoreSight testing.
 
 - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata
   event and in 'perf test tsc'.
 
 - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores", fixes
   and documentation update.
 
 - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and debuginfo files.
 
 - Do not print 'Metric Groups:' unnecessarily in 'perf list'
 
 - Refcounting fixes in the event parsing code.
 
 - Add expand cgroup event 'perf test' entry.
 
 - Fix out of bounds CPU map access when handling armv8_pmu events in 'perf stat'.
 
 - Add build-id injection 'perf bench' benchmark.
 
 - Enter namespace when reading build-id in 'perf inject'.
 
 - Do not load map/dso when injecting build-id speeding up the 'perf inject' process.
 
 - Add --buildid-all option to avoid processing all samples, just the mmap metadata events.
 
 - Add feature test to check if libbfd has buildid support
 
 - Add 'perf test' entry for PE binary format support.
 
 - Fix typos in power8 PMU vendor events JSON files.
 
 - Hide libtraceevent non API functions.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 
 Test results:
 
 The first ones are container based builds of tools/perf with and without libelf
 support.  Where clang is available, it is also used to build perf with/without
 libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang
 when clang and its devel libraries are installed.
 
 The objtool and samples/bpf/ builds are disabled now that I'm switching from
 using the sources in a local volume to fetching them from a http server to
 build it inside the container, to make it easier to build in a container cluster.
 Those will come back later.
 
 Several are cross builds, the ones with -x-ARCH and the android one, and those
 may not have all the features built, due to lack of multi-arch devel packages,
 available and being used so far on just a few, like
 debian:experimental-x-{arm64,mipsel}.
 
 The 'perf test' one will perform a variety of tests exercising
 tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
 with a variety of command line event specifications to then intercept the
 sys_perf_event syscall to check that the perf_event_attr fields are set up as
 expected, among a variety of other unit tests.
 
 Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
 with a variety of feature sets, exercising the build with an incomplete set of
 features as well as with a complete one. It is planned to have it run on each
 of the containers mentioned above, using some container orchestration
 infrastructure. Get in contact if interested in helping having this in place.
 
   $ grep "model name" -m1 /proc/cpuinfo
   model name: AMD Ryzen 9 3900X 12-Core Processor
   $ export PERF_TARBALL=http://192.168.122.1/perf/perf-5.9.0-rc7.tar.xz
   $ dm
   Thu 15 Oct 2020 01:10:56 PM -03
    1    67.40 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
    2    69.01 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
    3    70.79 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
    4    79.89 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
    5    80.88 alpine:3.8                    : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
    6    83.88 alpine:3.9                    : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
    7   107.87 alpine:3.10                   : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
    8   115.43 alpine:3.11                   : Ok   gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
    9   106.80 alpine:3.12                   : Ok   gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 10.0.0 (https://gitlab.alpinelinux.org/alpine/aports.git 7445adce501f8473efdb93b17b5eaf2f1445ed4c)
   10   114.06 alpine:edge                   : Ok   gcc (Alpine 10.2.0) 10.2.0, Alpine clang version 10.0.1
   11    70.42 alt:p8                        : Ok   x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final)
   12    98.70 alt:p9                        : Ok   x86_64-alt-linux-gcc (GCC) 8.4.1 20200305 (ALT p9 8.4.1-alt0.p9.1), clang version 10.0.0
   13    80.37 alt:sisyphus                  : Ok   x86_64-alt-linux-gcc (GCC) 9.3.1 20200518 (ALT Sisyphus 9.3.1-alt1), clang version 10.0.1
   14    64.12 amazonlinux:1                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
   15    97.64 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-9), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
   16    22.70 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   17    22.72 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   18    26.70 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
   19    31.86 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
   20   113.19 centos:8                      : Ok   gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), clang version 9.0.1 (Red Hat 9.0.1-2.module_el8.2.0+309+0c7b6b03)
   21    57.23 clearlinux:latest             : Ok   gcc (Clear Linux OS for Intel Architecture) 10.2.1 20200908 releases/gcc-10.2.0-203-g127d693955, clang version 10.0.1
   22    64.98 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
   23    76.08 debian:9                      : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
   24    74.49 debian:10                     : Ok   gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
   25    78.50 debian:experimental           : Ok   gcc (Debian 10.2.0-15) 10.2.0, Debian clang version 11.0.0-2
   26    33.30 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 10.2.0-3) 10.2.0
   27    30.96 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 9.3.0-8) 9.3.0
   28    32.63 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0
   29    30.12 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
   30    30.99 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
   31    68.60 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
   32    78.92 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
   33    26.15 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
   34    80.13 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
   35    90.68 fedora:26                     : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
   36    90.45 fedora:27                     : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
   37   100.88 fedora:28                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
   38   105.99 fedora:29                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
   39   111.05 fedora:30                     : Ok   gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc30)
   40    29.96 fedora:30-x-ARC-glibc         : Ok   arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
   41    27.02 fedora:30-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
   42   110.47 fedora:31                     : Ok   gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 9.0.1 (Fedora 9.0.1-2.fc31)
   43    88.78 fedora:32                     : Ok   gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), clang version 10.0.0 (Fedora 10.0.0-2.fc32)
   44    15.92 fedora:rawhide                : FAIL gcc (GCC) 10.2.1 20200916 (Red Hat 10.2.1-4), clang version 11.0.0 (Fedora 11.0.0-0.4.rc3.fc34)
   45    33.58 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 9.3.0-r1 p3) 9.3.0
   46    65.32 mageia:5                      : Ok   gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
   47    81.35 mageia:6                      : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
   48   103.94 mageia:7                      : Ok   gcc (Mageia 8.4.0-1.mga7) 8.4.0, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
   49    91.62 manjaro:latest                : Ok   gcc (GCC) 10.2.0, clang version 10.0.1
   50   219.87 openmandriva:cooker           : Ok   gcc (GCC) 10.2.0 20200723 (OpenMandriva), OpenMandriva 11.0.0-0.20200909.1 clang version 11.0.0 (/builddir/build/BUILD/llvm-project-release-11.x/clang 5cb8ffbab42358a7cdb0a67acfadb84df0779579)
   51   111.76 opensuse:15.0                 : Ok   gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 5.0.1 (tags/RELEASE_501/final 312548)
   52   118.03 opensuse:15.1                 : Ok   gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
   53   107.91 opensuse:15.2                 : Ok   gcc (SUSE Linux) 7.5.0, clang version 9.0.1
   54   102.34 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 10.2.1 20200825 [revision c0746a1beb1ba073c7981eb09f55b3d993b32e5c], clang version 10.0.1
   55    25.33 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
   56    30.45 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44.0.3)
   57   104.65 oraclelinux:8                 : Ok   gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5.0.3), clang version 9.0.1 (Red Hat 9.0.1-2.0.1.module+el8.2.0+5599+9ed9ef6d)
   58    26.04 ubuntu:12.04                  : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
   59    29.49 ubuntu:14.04                  : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
   60    72.95 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
   61    26.03 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   62    25.15 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   63    24.88 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   64    25.72 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   65    25.39 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   66    25.34 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   67    84.84 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
   68    27.15 ubuntu:18.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
   69    26.68 ubuntu:18.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
   70    22.38 ubuntu:18.04-x-m68k           : Ok   m68k-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   71    26.35 ubuntu:18.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   72    28.58 ubuntu:18.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   73    28.18 ubuntu:18.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   74   178.55 ubuntu:18.04-x-riscv64        : Ok   riscv64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   75    24.58 ubuntu:18.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   76    26.89 ubuntu:18.04-x-sh4            : Ok   sh4-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   77    24.81 ubuntu:18.04-x-sparc64        : Ok   sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   78    68.90 ubuntu:19.10                  : Ok   gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 8.0.1-3build1 (tags/RELEASE_801/final)
   79    69.31 ubuntu:20.04                  : Ok   gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, clang version 10.0.0-4ubuntu1
   80    30.00 ubuntu:20.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 10-20200411-0ubuntu1) 10.0.1 20200411 (experimental) [master revision bb87d5cc77d:75961caccb7:f883c46b4877f637e0fa5025b4d6b5c9040ec566]
   81    70.34 ubuntu:20.10                  : Ok   gcc (Ubuntu 10.2.0-5ubuntu2) 10.2.0, Ubuntu clang version 10.0.1-1
   $
 
   # uname -a
   Linux five 5.9.0+ #1 SMP Thu Oct 15 09:06:41 -03 2020 x86_64 x86_64 x86_64 GNU/Linux
   # git log --oneline -1
   744aec4df2 perf c2c: Update documentation for metrics reorganization
   # perf version --build-options
   perf version 5.9.rc7.g744aec4df2c5
                    dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
       dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
                    glibc: [ on  ]  # HAVE_GLIBC_SUPPORT
            syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
                   libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
                   libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
                  libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
   numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
                  libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
                libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
                 libslang: [ on  ]  # HAVE_SLANG_SUPPORT
                libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
                libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
       libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
                     zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
                     lzma: [ on  ]  # HAVE_LZMA_SUPPORT
                get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
                      bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
                      aio: [ on  ]  # HAVE_AIO_SUPPORT
                     zstd: [ on  ]  # HAVE_ZSTD_SUPPORT
   # perf test
    1: vmlinux symtab matches kallsyms                                 : Ok
    2: Detect openat syscall event                                     : Ok
    3: Detect openat syscall event on all cpus                         : Ok
    4: Read samples using the mmap interface                           : Ok
    5: Test data source output                                         : Ok
    6: Parse event definition strings                                  : Ok
    7: Simple expression parser                                        : Ok
    8: PERF_RECORD_* events & perf_sample fields                       : Ok
    9: Parse perf pmu format                                           : Ok
   10: PMU events                                                      :
   10.1: PMU event table sanity                                        : Ok
   10.2: PMU event map aliases                                         : Ok
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
   11: DSO data read                                                   : Ok
   12: DSO data cache                                                  : Ok
   13: DSO data reopen                                                 : Ok
   14: Roundtrip evsel->name                                           : Ok
   15: Parse sched tracepoints fields                                  : Ok
   16: syscalls:sys_enter_openat event fields                          : Ok
   17: Setup struct perf_event_attr                                    : Ok
   18: Match and link multiple hists                                   : Ok
   19: 'import perf' in python                                         : Ok
   20: Breakpoint overflow signal handler                              : Ok
   21: Breakpoint overflow sampling                                    : Ok
   22: Breakpoint accounting                                           : Ok
   23: Watchpoint                                                      :
   23.1: Read Only Watchpoint                                          : Skip
   23.2: Write Only Watchpoint                                         : Ok
   23.3: Read / Write Watchpoint                                       : Ok
   23.4: Modify Watchpoint                                             : Ok
   24: Number of exit events of a simple workload                      : Ok
   25: Software clock events period values                             : Ok
   26: Object code reading                                             : Ok
   27: Sample parsing                                                  : Ok
   28: Use a dummy software event to keep tracking                     : Ok
   29: Parse with no sample_id_all bit set                             : Ok
   30: Filter hist entries                                             : Ok
   31: Lookup mmap thread                                              : Ok
   32: Share thread maps                                               : Ok
   33: Sort output of hist entries                                     : Ok
   34: Cumulate child hist entries                                     : Ok
   35: Track with sched_switch                                         : Ok
   36: Filter fds with revents mask in a fdarray                       : Ok
   37: Add fd to a fdarray, making it autogrow                         : Ok
   38: kmod_path__parse                                                : Ok
   39: Thread map                                                      : Ok
   40: LLVM search and compile                                         :
   40.1: Basic BPF llvm compile                                        : Ok
   40.2: kbuild searching                                              : Ok
   40.3: Compile source for BPF prologue generation                    : Ok
   40.4: Compile source for BPF relocation                             : Ok
   41: Session topology                                                : Ok
   42: BPF filter                                                      :
   42.1: Basic BPF filtering                                           : Ok
   42.2: BPF pinning                                                   : Ok
   42.3: BPF prologue generation                                       : Ok
   42.4: BPF relocation checker                                        : Ok
   43: Synthesize thread map                                           : Ok
   44: Remove thread map                                               : Ok
   45: Synthesize cpu map                                              : Ok
   46: Synthesize stat config                                          : Ok
   47: Synthesize stat                                                 : Ok
   48: Synthesize stat round                                           : Ok
   49: Synthesize attr update                                          : Ok
   50: Event times                                                     : Ok
   51: Read backward ring buffer                                       : Ok
   52: Print cpu map                                                   : Ok
   53: Merge cpu map                                                   : Ok
   54: Probe SDT events                                                : Ok
   55: is_printable_array                                              : Ok
   56: Print bitmap                                                    : Ok
   57: perf hooks                                                      : Ok
   58: builtin clang support                                           : Skip (not compiled in)
   59: unit_number__scnprintf                                          : Ok
   60: mem2node                                                        : Ok
   61: time utils                                                      : Ok
   62: Test jit_write_elf                                              : Ok
   63: Test libpfm4 support                                            : Skip (not compiled in)
   64: Test api io                                                     : Ok
   65: maps__merge_in                                                  : Ok
   66: Demangle Java                                                   : Ok
   67: Parse and process metrics                                       : Ok
   68: PE file support                                                 : Ok
   69: Event expansion for cgroups                                     : Ok
   70: x86 rdpmc                                                       : Ok
   71: Convert perf time to TSC                                        : Ok
   72: DWARF unwind                                                    : Ok
   73: x86 instruction decoder - new instructions                      : Ok
   74: Intel PT packet decoder                                         : Ok
   75: x86 bp modify                                                   : Ok
   76: probe libc's inet_pton & backtrace it with ping                 : Ok
   77: Check Arm CoreSight trace data recording and synthesized samples: Skip
   78: Use vfs_getname probe to get syscall args filenames             : Ok
   79: Check open filename arg using perf trace + vfs_getname          : Ok
   80: Zstd perf.data compression/decompression                        : Ok
   81: Add vfs_getname probe to get syscall args filenames             : Ok
   82: build id cache operations                                       : Ok
   #
 
   $ git log --oneline -1
   744aec4df2 (HEAD -> perf/core, quaco/perf/core) perf c2c: Update documentation for metrics reorganization
   $ make -C tools/perf build-test
   make: Entering directory '/home/acme/git/perf/tools/perf'
   - tarpkg: ./tests/perf-targz-src-pkg .
             make_install_bin_O: make install-bin
                  make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
   make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
                 make_no_newt_O: make NO_NEWT=1
            make_no_libbionic_O: make NO_LIBBIONIC=1
                  make_no_sdt_O: make NO_SDT=1
                   make_debug_O: make DEBUG=1
                  make_perf_o_O: make perf.o
               make_no_libbpf_O: make NO_LIBBPF=1
         make_no_libbpf_DEBUG_O: make NO_LIBBPF=1 DEBUG=1
               make_clean_all_O: make clean all
                    make_tags_O: make tags
         make_with_babeltrace_O: make LIBBABELTRACE=1
          make_with_clangllvm_O: make LIBCLANGLLVM=1
              make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
               make_no_libelf_O: make NO_LIBELF=1
            make_no_libcrypto_O: make NO_LIBCRYPTO=1
            make_with_libpfm4_O: make LIBPFM4=1
            make_no_libunwind_O: make NO_LIBUNWIND=1
              make_util_map_o_O: make util/map.o
                make_no_slang_O: make NO_SLANG=1
               make_with_gtk2_O: make GTK2=1
                   make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
        make_util_pmu_bison_o_O: make util/pmu-bison.o
            make_no_backtrace_O: make NO_BACKTRACE=1
             make_no_demangle_O: make NO_DEMANGLE=1
                    make_help_O: make help
                    make_pure_O: make
                 make_no_gtk2_O: make NO_GTK2=1
          make_install_prefix_O: make install prefix=/tmp/krava
              make_no_libnuma_O: make NO_LIBNUMA=1
            make_no_libpython_O: make NO_LIBPYTHON=1
    make_install_prefix_slash_O: make install prefix=/tmp/krava/
             make_no_libaudit_O: make NO_LIBAUDIT=1
             make_no_auxtrace_O: make NO_AUXTRACE=1
                 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 NO_SYSCALL_TABLE=1
                 make_install_O: make install
                     make_doc_O: make doc
              make_no_libperl_O: make NO_LIBPERL=1
          make_no_syscall_tbl_O: make NO_SYSCALL_TABLE=1
   OK
   make: Leaving directory '/home/acme/git/perf/tools/perf'
   $
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCX4iuzgAKCRCyPKLppCJ+
 J1khAP4iMQMFCMpNsBaL6KLtj3aTOhrooYuhbNL3kajqYVyW/QD8Dws35k6m2+tB
 tcOMJykFjPkQ4I13zsxKyugeJuUzSQw=
 =KdSj
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:

 - cgroup improvements for 'perf stat', allowing for compact
   specification of events and cgroups in the command line.

 - Support per thread topdown metrics in 'perf stat'.

 - Support sample-read topdown metric group in 'perf record'

 - Show start of latency in addition to its start in 'perf sched
   latency'.

 - Add min, max to 'perf script' futex-contention output, in addition to
   avg.

 - Allow usage of 'perf_event_attr->exclusive' attribute via the new
   ':e' event modifier.

 - Add 'snapshot' command to 'perf record --control', using it with
   Intel PT.

 - Support FIFO file names as alternative options to 'perf record
   --control'.

 - Introduce branch history "streams", to compare 'perf record' runs
   with 'perf diff' based on branch records and report hot streams.

 - Support PE executable symbol tables using libbfd, to profile, for
   instance, wine binaries.

 - Add filter support for option 'perf ftrace -F/--funcs'.

 - Allow configuring the 'disassembler_style' 'perf annotate' knob via
   'perf config'

 - Update CascadelakeX and SkylakeX JSON vendor events files.

 - Add support for parsing perchip/percore JSON vendor events.

 - Add power9 hv_24x7 core level metric events.

 - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD
   zen1.

 - Enable Family 19h users by matching Zen2 AMD vendor events.

 - Use debuginfod in 'perf probe' when required debug files not found
   locally.

 - Display negative tid in non-sample events in 'perf script'.

 - Make GTK2 support opt-in

 - Add build test with GTK+

 - Add missing -lzstd to the fast path feature detection

 - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for
   use in 'perf trace'.

 - Show python test script in verbose mode.

 - Fix uncore metric expressions

 - Msan uninitialized use fixes.

 - Use condition variables in 'perf bench numa'

 - Autodetect python3 binary in systems without python2.

 - Support md5 build ids in addition to sha1.

 - Add build id 'perf test' regression test.

 - Fix printable strings in python3 scripts.

 - Fix off by ones in 'perf trace' in arches using libaudit.

 - Fix JSON event code for events referencing std arch events.

 - Introduce 'perf test' shell script for Arm CoreSight testing.

 - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata
   event and in 'perf test tsc'.

 - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores",
   fixes and documentation update.

 - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and
   debuginfo files.

 - Do not print 'Metric Groups:' unnecessarily in 'perf list'

 - Refcounting fixes in the event parsing code.

 - Add expand cgroup event 'perf test' entry.

 - Fix out of bounds CPU map access when handling armv8_pmu events in
   'perf stat'.

 - Add build-id injection 'perf bench' benchmark.

 - Enter namespace when reading build-id in 'perf inject'.

 - Do not load map/dso when injecting build-id speeding up the 'perf
   inject' process.

 - Add --buildid-all option to avoid processing all samples, just the
   mmap metadata events.

 - Add feature test to check if libbfd has buildid support

 - Add 'perf test' entry for PE binary format support.

 - Fix typos in power8 PMU vendor events JSON files.

 - Hide libtraceevent non API functions.

* tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (113 commits)
  perf c2c: Update documentation for metrics reorganization
  perf c2c: Add metrics "RMT Load Hit"
  perf c2c: Correct LLC load hit metrics
  perf c2c: Change header for LLC local hit
  perf c2c: Use more explicit headers for HITM
  perf c2c: Change header from "LLC Load Hitm" to "Load Hitm"
  perf c2c: Organize metrics based on memory hierarchy
  perf c2c: Display "Total Stores" as a standalone metrics
  perf c2c: Display the total numbers continuously
  perf bench: Use condition variables in numa.
  perf jevents: Fix event code for events referencing std arch events
  perf diff: Support hot streams comparison
  perf streams: Report hot streams
  perf streams: Calculate the sum of total streams hits
  perf streams: Link stream pair
  perf streams: Compare two streams
  perf streams: Get the evsel_streams by evsel_idx
  perf streams: Introduce branch history "streams"
  perf intel-pt: Improve PT documentation slightly
  perf tools: Add support for exclusive groups/events
  ...
2020-10-17 11:47:46 -07:00
Linus Torvalds 9ff9b0d392 networking changes for the 5.10 merge window
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
 traversal in common container configs and improving TCP back-pressure.
 Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
 
 Expand netlink policy support and improve policy export to user space.
 (Ge)netlink core performs request validation according to declared
 policies. Expand the expressiveness of those policies (min/max length
 and bitmasks). Allow dumping policies for particular commands.
 This is used for feature discovery by user space (instead of kernel
 version parsing or trial and error).
 
 Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
 
 Allow more than 255 IPv4 multicast interfaces.
 
 Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
 packets of TCPv6.
 
 In Multi-patch TCP (MPTCP) support concurrent transmission of data
 on multiple subflows in a load balancing scenario. Enhance advertising
 addresses via the RM_ADDR/ADD_ADDR options.
 
 Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
 
 Allow more calls to same peer in RxRPC.
 
 Support two new Controller Area Network (CAN) protocols -
 CAN-FD and ISO 15765-2:2016.
 
 Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
 kernel problem.
 
 Add TC actions for implementing MPLS L2 VPNs.
 
 Improve nexthop code - e.g. handle various corner cases when nexthop
 objects are removed from groups better, skip unnecessary notifications
 and make it easier to offload nexthops into HW by converting
 to a blocking notifier.
 
 Support adding and consuming TCP header options by BPF programs,
 opening the doors for easy experimental and deployment-specific
 TCP option use.
 
 Reorganize TCP congestion control (CC) initialization to simplify life
 of TCP CC implemented in BPF.
 
 Add support for shipping BPF programs with the kernel and loading them
 early on boot via the User Mode Driver mechanism, hence reusing all the
 user space infra we have.
 
 Support sleepable BPF programs, initially targeting LSM and tracing.
 
 Add bpf_d_path() helper for returning full path for given 'struct path'.
 
 Make bpf_tail_call compatible with bpf-to-bpf calls.
 
 Allow BPF programs to call map_update_elem on sockmaps.
 
 Add BPF Type Format (BTF) support for type and enum discovery, as
 well as support for using BTF within the kernel itself (current use
 is for pretty printing structures).
 
 Support listing and getting information about bpf_links via the bpf
 syscall.
 
 Enhance kernel interfaces around NIC firmware update. Allow specifying
 overwrite mask to control if settings etc. are reset during update;
 report expected max time operation may take to users; support firmware
 activation without machine reboot incl. limits of how much impact
 reset may have (e.g. dropping link or not).
 
 Extend ethtool configuration interface to report IEEE-standard
 counters, to limit the need for per-vendor logic in user space.
 
 Adopt or extend devlink use for debug, monitoring, fw update
 in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
 mv88e6xxx, dpaa2-eth).
 
 In mlxsw expose critical and emergency SFP module temperature alarms.
 Refactor port buffer handling to make the defaults more suitable and
 support setting these values explicitly via the DCBNL interface.
 
 Add XDP support for Intel's igb driver.
 
 Support offloading TC flower classification and filtering rules to
 mscc_ocelot switches.
 
 Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
 fixed interval period pulse generator and one-step timestamping in
 dpaa-eth.
 
 Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
 offload.
 
 Add Lynx PHY/PCS MDIO module, and convert various drivers which have
 this HW to use it. Convert mvpp2 to split PCS.
 
 Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
 7-port Mediatek MT7531 IP.
 
 Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
 and wcn3680 support in wcn36xx.
 
 Improve performance for packets which don't require much offloads
 on recent Mellanox NICs by 20% by making multiple packets share
 a descriptor entry.
 
 Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
 subtree to drivers/net. Move MDIO drivers out of the phy directory.
 
 Clean up a lot of W=1 warnings, reportedly the actively developed
 subsections of networking drivers should now build W=1 warning free.
 
 Make sure drivers don't use in_interrupt() to dynamically adapt their
 code. Convert tasklets to use new tasklet_setup API (sadly this
 conversion is not yet complete).
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
 IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
 PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
 Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
 r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
 iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
 2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
 mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
 wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
 owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
 HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
 81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
 =bc1U
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
2020-10-15 18:42:13 -07:00
Linus Torvalds 9e51183e94 linux-kselftest-fixes-5.10-rc1
This kselftest fixes update consists of a selftests harness fix to
 flush stdout before forking to avoid parent and child printing
 duplicates messages. This is evident when test output is redirected
 to a file.
 
 The second fix is a tools/ wide change to avoid comma separated statements
 from Joe Perches. This fix spans tools/lib, tools/power/cpupower, and
 selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl+GAdIACgkQCwJExA0N
 QxyraBAAwsPISUpSA34WLuUwNCddI3ttNW2R63ZrdKSy7QlreM02zG9qyEDPwFil
 GLlgXfUE8QOI7rfiqSwr1elzS07bDdel6UcxTuhuy5KPs2+yieGGZ5lllsVY6gJu
 kF5m7setmdmHQr76HCyyGddwdpCTpz7sP3BJzmYn2iAWAQMwtZBXOEgmnf2yiskX
 SHF/f3Bvrnm+BtbzZEa+ysHpL72AlpKrGuLQAnNOCp/DomKEtRACTNxIzKFeO++r
 uelbHO/MzdaGmrCxy3J/RWz3llQVnj6aafZFaqAV7ReWi/OOYTsV48pAHRm8TTv8
 1LvVP48b7aCUc7QWu+d8SBSDfJQANI4tgcP0TI/hUboIuhU8bVkZAUF69txdFgzb
 DopwQVybQq5yEqmPg1RzvccbDojxXq72BvZyqPBo8WmKHWOQXCo9A1owudmcqtob
 WqTr1eAVAd6Rc2vcjkhnzYxQcb8A093ZfP1fyAZ5HQSH5No//4FP9pWBwMzpncKQ
 GdHmMNBns6v1muWMBj6bgT4GA1sN765Kzt1StYSC257v6gtA8+xHo/PUfojZJxy9
 bieAzuqE8n68IKKz4/Rk2JvfFBnaxDZyQUITOCrcoWJRk5apJc3T5+goq+Bep5Na
 SOFbb0JvrGLBjX3bChmLIYVa7zQkupBgwWU8NPM1tYxce+pBS30=
 =jgRu
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-fixes-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:

 - a selftests harness fix to flush stdout before forking to avoid
   parent and child printing duplicates messages. This is evident when
   test output is redirected to a file.

 - a tools/ wide change to avoid comma separated statements from Joe
   Perches. This fix spans tools/lib, tools/power/cpupower, and
   selftests.

* tag 'linux-kselftest-fixes-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  tools: Avoid comma separated statements
  selftests/harness: Flush stdout before forking
2020-10-14 14:23:51 -07:00
Jiri Olsa b0a323c7f0 perf tools: Add size to 'struct perf_record_header_build_id'
We do not store size with build ids in perf data, but there's enough
space to do it. Adding misc bit PERF_RECORD_MISC_BUILD_ID_SIZE to mark
build id event with size.

With this fix the dso with md5 build id will have correct build id data
and will be usable for debuginfod processing if needed (coming in
following patches).

Committer notes:

Use %zu with size_t to fix this error on 32-bit arches:

  util/header.c: In function '__event_process_build_id':
  util/header.c:2105:3: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' [-Werror=format=]
     pr_debug("build id event received for %s: %s [%lu]\n",
     ^

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 11:28:12 -03:00
Arnaldo Carvalho de Melo dbaa1b3d9a Merge branch 'perf/urgent' into perf/core
To pick fixes that missed v5.9.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-13 13:02:20 -03:00
Tzvetomir Stoyanov (VMware) a41c32105c tools lib traceevent: Hide non API functions
There are internal library functions, which are not declared as a static.
They are used inside the library from different files. Hide them from
the library users, as they are not part of the API.
These functions are made hidden and are renamed without the prefix "tep_":
 tep_free_plugin_paths
 tep_peek_char
 tep_buffer_init
 tep_get_input_buf_ptr
 tep_get_input_buf
 tep_read_token
 tep_free_token
 tep_free_event
 tep_free_format_field
 __tep_parse_format

Link: https://lore.kernel.org/linux-trace-devel/e4afdd82deb5e023d53231bb13e08dca78085fb0.camel@decadent.org.uk/
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200930110733.280534-1-tz.stoyanov@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-13 11:47:38 -03:00
Jakub Kicinski ccdf7fae3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-10-12

The main changes are:

1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.

2) libbpf relocation support for different size load/store, from Andrii.

3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.

4) BPF support for per-cpu variables, form Hao.

5) sockmap improvements, from John.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-12 16:16:50 -07:00
Andrii Nakryiko 2b7d88c2b5 libbpf: Allow specifying both ELF and raw BTF for CO-RE BTF override
Use generalized BTF parsing logic, making it possible to parse BTF both from
ELF file, as well as a raw BTF dump. This makes it easier to write custom
tests with manually generated BTFs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201008001025.292064-4-andrii@kernel.org
2020-10-07 18:50:27 -07:00
Andrii Nakryiko a66345bcbd libbpf: Support safe subset of load/store instruction resizing with CO-RE
Add support for patching instructions of the following form:
  - rX = *(T *)(rY + <off>);
  - *(T *)(rX + <off>) = rY;
  - *(T *)(rX + <off>) = <imm>, where T is one of {u8, u16, u32, u64}.

For such instructions, if the actual kernel field recorded in CO-RE relocation
has a different size than the one recorded locally (e.g., from vmlinux.h),
then libbpf will adjust T to an appropriate 1-, 2-, 4-, or 8-byte loads.

In general, such transformation is not always correct and could lead to
invalid final value being loaded or stored. But two classes of cases are
always safe:
  - if both local and target (kernel) types are unsigned integers, but of
  different sizes, then it's OK to adjust load/store instruction according to
  the necessary memory size. Zero-extending nature of such instructions and
  unsignedness make sure that the final value is always correct;
  - pointer size mismatch between BPF target architecture (which is always
  64-bit) and 32-bit host kernel architecture can be similarly resolved
  automatically, because pointer is essentially an unsigned integer. Loading
  32-bit pointer into 64-bit BPF register with zero extension will leave
  correct pointer in the register.

Both cases are necessary to support CO-RE on 32-bit kernels, as `unsigned
long` in vmlinux.h generated from 32-bit kernel is 32-bit, but when compiled
with BPF program for BPF target it will be treated by compiler as 64-bit
integer. Similarly, pointers in vmlinux.h are 32-bit for kernel, but treated
as 64-bit values by compiler for BPF target. Both problems are now resolved by
libbpf for direct memory reads.

But similar transformations are useful in general when kernel fields are
"resized" from, e.g., unsigned int to unsigned long (or vice versa).

Now, similar transformations for signed integers are not safe to perform as
they will result in incorrect sign extension of the value. If such situation
is detected, libbpf will emit helpful message and will poison the instruction.
Not failing immediately means that it's possible to guard the instruction
based on kernel version (or other conditions) and make sure it's not
reachable.

If there is a need to read signed integers that change sizes between different
kernels, it's possible to use BPF_CORE_READ_BITFIELD() macro, which works both
with bitfields and non-bitfield integers of any signedness and handles
sign-extension properly. Also, bpf_core_read() with proper size and/or use of
bpf_core_field_size() relocation could allow to deal with such complicated
situations explicitly, if not so conventiently as direct memory reads.

Selftests added in a separate patch in progs/test_core_autosize.c demonstrate
both direct memory and probed use cases.

BPF_CORE_READ() is not changed and it won't deal with such situations as
automatically as direct memory reads due to the signedness integer
limitations, which are much harder to detect and control with compiler macro
magic. So it's encouraged to utilize direct memory reads as much as possible.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201008001025.292064-3-andrii@kernel.org
2020-10-07 18:50:27 -07:00
Andrii Nakryiko 47f7cf6325 libbpf: Skip CO-RE relocations for not loaded BPF programs
Bypass CO-RE relocations step for BPF programs that are not going to be
loaded. This allows to have BPF programs compiled in and disabled dynamically
if kernel is not supposed to provide enough relocation information. In such
case, there won't be unnecessary warnings about failed relocations.

Fixes: d929758101 ("libbpf: Support disabling auto-loading BPF programs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201008001025.292064-2-andrii@kernel.org
2020-10-07 18:50:27 -07:00
Magnus Karlsson 80348d8867 libbpf: Fix compatibility problem in xsk_socket__create
Fix a compatibility problem when the old XDP_SHARED_UMEM mode is used
together with the xsk_socket__create() call. In the old XDP_SHARED_UMEM
mode, only sharing of the same device and queue id was allowed, and
in this mode, the fill ring and completion ring were shared between
the AF_XDP sockets.

Therefore, it was perfectly fine to call the xsk_socket__create() API
for each socket and not use the new xsk_socket__create_shared() API.
This behavior was ruined by the commit introducing XDP_SHARED_UMEM
support between different devices and/or queue ids. This patch restores
the ability to use xsk_socket__create in these circumstances so that
backward compatibility is not broken.

Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1602070946-11154-1-git-send-email-magnus.karlsson@gmail.com
2020-10-07 22:28:43 +02:00
Namhyung Kim bef69bd7cf perf stat: Fix out of bounds CPU map access when handling armv8_pmu events
It was reported that 'perf stat' crashed when using with armv8_pmu (CPU)
events with the task mode.  As 'perf stat' uses an empty cpu map for
task mode but armv8_pmu has its own cpu mask, it has confused which map
it should use when accessing file descriptors and this causes segfaults:

  (gdb) bt
  #0  0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=<optimized out>,
      cpu=<optimized out>) at evsel.c:122
  #1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
  #2  0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
  #3  0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
      argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
      at builtin-stat.c:929
  #4  0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
      argc=1) at builtin-stat.c:947
  #5  cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
  #6  0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 <commands+288>,
      argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
  #7  0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
      argv=argv@entry=0xfffffaea2f90) at perf.c:364
  #8  0x0000000000435378 in run_argv (argcp=<synthetic pointer>,
      argv=<synthetic pointer>) at perf.c:408
  #9  main (argc=4, argv=0xfffffaea2f90) at perf.c:538

To fix this, I simply used the given cpu map unless the evsel actually
is not a system-wide event (like uncore events).

Fixes: 7736627b86 ("perf stat: Use affinity for closing file descriptors")
Reported-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201007081311.1831003-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-07 09:57:58 -03:00
Luigi Rizzo 8cee9107e7 bpf, libbpf: Use valid btf in bpf_program__set_attach_target
bpf_program__set_attach_target(prog, fd, ...) will always fail when
fd = 0 (attach to a kernel symbol) because obj->btf_vmlinux is NULL
and there is no way to set it (at the moment btf_vmlinux is meant
to be temporary storage for use in bpf_object__load_xattr()).

Fix this by using libbpf_find_vmlinux_btf_id().

At some point we may want to opportunistically cache btf_vmlinux
so it can be reused with multiple programs.

Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Petar Penkov <ppenkov@google.com>
Link: https://lore.kernel.org/bpf/20201005224528.389097-1-lrizzo@google.com
2020-10-06 11:36:10 -07:00
Hangbin Liu 2c193d32ca libbpf: Check if pin_path was set even map fd exist
Say a user reuse map fd after creating a map manually and set the
pin_path, then load the object via libbpf.

In libbpf bpf_object__create_maps(), bpf_object__reuse_map() will
return 0 if there is no pinned map in map->pin_path. Then after
checking if map fd exist, we should also check if pin_path was set
and do bpf_map__pin() instead of continue the loop.

Fix it by creating map if fd not exist and continue checking pin_path
after that.

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201006021345.3817033-3-liuhangbin@gmail.com
2020-10-06 11:10:20 -07:00
Hangbin Liu a0f2b7acb4 libbpf: Close map fd if init map slots failed
Previously we forgot to close the map fd if bpf_map_update_elem()
failed during map slot init, which will leak map fd.

Let's move map slot initialization to new function init_map_slots() to
simplify the code. And close the map fd if init slot failed.

Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201006021345.3817033-2-liuhangbin@gmail.com
2020-10-06 11:10:20 -07:00
David S. Miller 8b0308fe31 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Rejecting non-native endian BTF overlapped with the addition
of support for it.

The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05 18:40:01 -07:00
Hao Luo d370bbe121 bpf/libbpf: BTF support for typed ksyms
If a ksym is defined with a type, libbpf will try to find the ksym's btf
information from kernel btf. If a valid btf entry for the ksym is found,
libbpf can pass in the found btf id to the verifier, which validates the
ksym's type and value.

Typeless ksyms (i.e. those defined as 'void') will not have such btf_id,
but it has the symbol's address (read from kallsyms) and its value is
treated as a raw pointer.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-3-haoluo@google.com
2020-10-02 14:59:25 -07:00
Joe Perches aa803771a8 tools: Avoid comma separated statements
Use semicolons and braces.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-10-02 10:36:36 -06:00
Andrii Nakryiko 9c6c5c48d7 libbpf: Make btf_dump work with modifiable BTF
Ensure that btf_dump can accommodate new BTF types being appended to BTF
instance after struct btf_dump was created. This came up during attemp to
use btf_dump for raw type dumping in selftests, but given changes are not
excessive, it's good to not have any gotchas in API usage, so I decided to
support such use case in general.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929232843.1249318-2-andriin@fb.com
2020-09-30 12:30:22 -07:00
Daniel Borkmann 0e9f6841f6 bpf, libbpf: Add bpf_tail_call_static helper for bpf programs
Port of tail_call_static() helper function from Cilium's BPF code base [0]
to libbpf, so others can easily consume it as well. We've been using this
in production code for some time now. The main idea is that we guarantee
that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
JITed BPF insns with direct jumps instead of having to fall back to using
expensive retpolines. By using inline asm, we guarantee that the compiler
won't merge the call from different paths with potentially different
content of r2/r3.

We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable())
in different places as a neat trick to trigger compilation errors when
compiler does not remove code at compilation time. This works for the BPF
back end as it does not implement the __builtin_trap().

  [0] f5537c2602

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net
2020-09-30 11:50:35 -07:00
Andrii Nakryiko b0efc216f5 libbpf: Compile in PIC mode only for shared library case
Libbpf compiles .o's for static and shared library modes separately, so no
need to specify -fPIC for both. Keep it only for shared library mode.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-3-andriin@fb.com
2020-09-29 17:05:31 -07:00
Andrii Nakryiko 0a62291d69 libbpf: Compile libbpf under -O2 level by default and catch extra warnings
For some reason compiler doesn't complain about uninitialized variable, fixed
in previous patch, if libbpf is compiled without -O2 optimization level. So do
compile it with -O2 and never let similar issue slip by again. -Wall is added
unconditionally, so no need to specify it again.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-2-andriin@fb.com
2020-09-29 17:05:31 -07:00
Andrii Nakryiko 3343391345 libbpf: Fix uninitialized variable in btf_parse_type_sec
Fix obvious unitialized variable use that wasn't reported by compiler. libbpf
Makefile changes to catch such errors are added separately.

Fixes: 3289959b97 ("libbpf: Support BTF loading and raw data output in both endianness")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-1-andriin@fb.com
2020-09-29 17:05:31 -07:00
Toke Høiland-Jørgensen a535909142 libbpf: Add support for freplace attachment in bpf_link_create
This adds support for supplying a target btf ID for the bpf_link_create()
operation, and adds a new bpf_program__attach_freplace() high-level API for
attaching freplace functions with a target.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355387.48470.18026176785351166890.stgit@toke.dk
2020-09-29 13:09:24 -07:00
Andrii Nakryiko 3289959b97 libbpf: Support BTF loading and raw data output in both endianness
Teach BTF to recognized wrong endianness and transparently convert it
internally to host endianness. Original endianness of BTF will be preserved
and used during btf__get_raw_data() to convert resulting raw data to the same
endianness and a source raw_data. This means that little-endian host can parse
big-endian BTF with no issues, all the type data will be presented to the
client application in native endianness, but when it's time for emitting BTF
to persist it in a file (e.g., after BTF deduplication), original non-native
endianness will be preserved and stored.

It's possible to query original endianness of BTF data with new
btf__endianness() API. It's also possible to override desired output
endianness with btf__set_endianness(), so that if application needs to load,
say, big-endian BTF and store it as little-endian BTF, it's possible to
manually override this. If btf__set_endianness() was used to change
endianness, btf__endianness() will reflect overridden endianness.

Given there are no known use cases for supporting cross-endianness for
.BTF.ext, loading .BTF.ext in non-native endianness is not supported.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929043046.1324350-3-andriin@fb.com
2020-09-29 12:21:23 -07:00
Andrii Nakryiko 9141f75a32 selftests/bpf: Test BTF writing APIs
Add selftests for BTF writer APIs.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200929020533.711288-4-andriin@fb.com
2020-09-28 19:17:13 -07:00
Andrii Nakryiko f86ed050bc libbpf: Add btf__str_by_offset() as a more generic variant of name_by_offset
BTF strings are used not just for names, they can be arbitrary strings used
for CO-RE relocations, line/func infos, etc. Thus "name_by_offset" terminology
is too specific and might be misleading. Instead, introduce
btf__str_by_offset() API which uses generic string terminology.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200929020533.711288-3-andriin@fb.com
2020-09-28 19:17:13 -07:00
Andrii Nakryiko 4a3b33f857 libbpf: Add BTF writing APIs
Add APIs for appending new BTF types at the end of BTF object.

Each BTF kind has either one API of the form btf__add_<kind>(). For types
that have variable amount of additional items (struct/union, enum, func_proto,
datasec), additional API is provided to emit each such item. E.g., for
emitting a struct, one would use the following sequence of API calls:

btf__add_struct(...);
btf__add_field(...);
...
btf__add_field(...);

Each btf__add_field() will ensure that the last BTF type is of STRUCT or
UNION kind and will automatically increment that type's vlen field.

All the strings are provided as C strings (const char *), not a string offset.
This significantly improves usability of BTF writer APIs. All such strings
will be automatically appended to string section or existing string will be
re-used, if such string was already added previously.

Each API attempts to do all the reasonable validations, like enforcing
non-empty names for entities with required names, proper value bounds, various
bit offset restrictions, etc.

Type ID validation is minimal because it's possible to emit a type that refers
to type that will be emitted later, so libbpf has no way to enforce such
cases. User must be careful to properly emit all the necessary types and
specify type IDs that will be valid in the finally generated BTF.

Each of btf__add_<kind>() APIs return new type ID on success or negative
value on error. APIs like btf__add_field() that emit additional items
return zero on success and negative value on error.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200929020533.711288-2-andriin@fb.com
2020-09-28 19:17:13 -07:00
Andrii Nakryiko a871b04310 libbpf: Add btf__new_empty() to create an empty BTF object
Add an ability to create an empty BTF object from scratch. This is going to be
used by pahole for BTF encoding. And also by selftest for convenient creation
of BTF objects.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-7-andriin@fb.com
2020-09-28 17:27:32 -07:00
Andrii Nakryiko 919d2b1dbb libbpf: Allow modification of BTF and add btf__add_str API
Allow internal BTF representation to switch from default read-only mode, in
which raw BTF data is a single non-modifiable block of memory with BTF header,
types, and strings layed out sequentially and contiguously in memory, into
a writable representation with types and strings data split out into separate
memory regions, that can be dynamically expanded.

Such writable internal representation is transparent to users of libbpf APIs,
but allows to append new types and strings at the end of BTF, which is
a typical use case when generating BTF programmatically. All the basic
guarantees of BTF types and strings layout is preserved, i.e., user can get
`struct btf_type *` pointer and read it directly. Such btf_type pointers might
be invalidated if BTF is modified, so some care is required in such mixed
read/write scenarios.

Switch from read-only to writable configuration happens automatically the
first time when user attempts to modify BTF by either adding a new type or new
string. It is still possible to get raw BTF data, which is a single piece of
memory that can be persisted in ELF section or into a file as raw BTF. Such
raw data memory is also still owned by BTF and will be freed either when BTF
object is freed or if another modification to BTF happens, as any modification
invalidates BTF raw representation.

This patch adds the first two BTF manipulation APIs: btf__add_str(), which
allows to add arbitrary strings to BTF string section, and btf__find_str()
which allows to find existing string offset, but not add it if it's missing.
All the added strings are automatically deduplicated. This is achieved by
maintaining an additional string lookup index for all unique strings. Such
index is built when BTF is switched to modifiable mode. If at that time BTF
strings section contained duplicate strings, they are not de-duplicated. This
is done specifically to not modify the existing content of BTF (types, their
string offsets, etc), which can cause confusion and is especially important
property if there is struct btf_ext associated with struct btf. By following
this "imperfect deduplication" process, btf_ext is kept consitent and correct.
If deduplication of strings is necessary, it can be forced by doing BTF
deduplication, at which point all the strings will be eagerly deduplicated and
all string offsets both in struct btf and struct btf_ext will be updated.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-6-andriin@fb.com
2020-09-28 17:27:31 -07:00
Andrii Nakryiko 7d9c71e10b libbpf: Extract generic string hashing function for reuse
Calculating a hash of zero-terminated string is a common need when using
hashmap, so extract it for reuse.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-5-andriin@fb.com
2020-09-28 17:27:31 -07:00
Andrii Nakryiko 192f5a1fe6 libbpf: Generalize common logic for managing dynamically-sized arrays
Managing dynamically-sized array is a common, but not trivial functionality,
which significant amount of logic and code to implement properly. So instead
of re-implementing it all the time, extract it into a helper function ans
reuse.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-4-andriin@fb.com
2020-09-28 17:27:31 -07:00
Andrii Nakryiko b86042478f libbpf: Remove assumption of single contiguous memory for BTF data
Refactor internals of struct btf to remove assumptions that BTF header, type
data, and string data are layed out contiguously in a memory in a single
memory allocation. Now we have three separate pointers pointing to the start
of each respective are: header, types, strings. In the next patches, these
pointers will be re-assigned to point to independently allocated memory areas,
if BTF needs to be modified.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-3-andriin@fb.com
2020-09-28 17:27:31 -07:00
Andrii Nakryiko 740e69c3c5 libbpf: Refactor internals of BTF type index
Refactor implementation of internal BTF type index to not use direct pointers.
Instead it uses offset relative to the start of types data section. This
allows for types data to be reallocatable, enabling implementation of
modifiable BTF.

As now getting type by ID has an extra indirection step, convert all internal
type lookups to a new helper btf_type_id(), that returns non-const pointer to
a type by its ID.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-2-andriin@fb.com
2020-09-28 17:27:31 -07:00
Song Liu 88f7fe7233 libbpf: Support test run of raw tracepoint programs
Add bpf_prog_test_run_opts() with support of new fields in bpf_attr.test,
namely, flags and cpu. Also extend _opts operations to support outputs via
opts.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-3-songliubraving@fb.com
2020-09-28 21:52:36 +02:00
Arnaldo Carvalho de Melo 717d182e41 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes and get v5.10 development in sync with the main kernel
sources.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 15:44:52 -03:00
Andrii Nakryiko 87f92ac4c1 libbpf: Fix XDP program load regression for old kernels
Fix regression in libbpf, introduced by XDP link change, which causes XDP
programs to fail to be loaded into kernel due to specified BPF_XDP
expected_attach_type. While kernel doesn't enforce expected_attach_type for
BPF_PROG_TYPE_XDP, some old kernels already support XDP program, but they
don't yet recognize expected_attach_type field in bpf_attr, so setting it to
non-zero value causes program load to fail.

Luckily, libbpf already has a mechanism to deal with such cases, so just make
expected_attach_type optional for XDP programs.

Fixes: dc8698cac7 ("libbpf: Add support for BPF XDP link")
Reported-by: Nikita Shirokov <tehnerd@tehnerd.com>
Reported-by: Udip Pant <udippant@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200924171705.3803628-1-andriin@fb.com
2020-09-24 10:33:02 -07:00
David S. Miller 6d772f328d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-09-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 95 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).

The main changes are:

1) Full multi function support in libbpf, from Andrii.

2) Refactoring of function argument checks, from Lorenz.

3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.

4) Program metadata support, from YiFei.

5) bpf iterator optimizations, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:11:11 -07:00
David S. Miller 3ab0a7a0c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Two minor conflicts:

1) net/ipv4/route.c, adding a new local variable while
   moving another local variable and removing it's
   initial assignment.

2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
   One pretty prints the port mode differently, whilst another
   changes the driver to try and obtain the port mode from
   the port node rather than the switch node.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 16:45:34 -07:00
Leo Yan d110162caf perf tsc: Support cap_user_time_short for event TIME_CONV
The synthesized event TIME_CONV doesn't contain the complete parameters
for counters, this will lead to wrong conversion between counter cycles
and timestamp.

This patch extends event TIME_CONV to record flags 'cap_user_time_zero'
which is used to indicate the counter parameters are valid or not, if
not will directly return 0 for timestamp calculation.  And record the
flag 'cap_user_time_short' and its relevant fields 'time_cycles' and
'time_mask' for cycle calibration.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kemeng Shi <shikemeng@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Gasson <nick.gasson@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steve Maclean <steve.maclean@microsoft.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zou Wei <zou_wei@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200914115311.2201-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-22 13:46:40 -03:00
Tony Ambardar 1245008122 libbpf: Fix native endian assumption when parsing BTF
Code in btf__parse_raw() fails to detect raw BTF of non-native endianness
and assumes it must be ELF data, which then fails to parse as ELF and
yields a misleading error message:

  root:/# bpftool btf dump file /sys/kernel/btf/vmlinux
  libbpf: failed to get EHDR from /sys/kernel/btf/vmlinux

For example, this could occur after cross-compiling a BTF-enabled kernel
for a target with non-native endianness, which is currently unsupported.

Check for correct endianness and emit a clearer error message:

  root:/# bpftool btf dump file /sys/kernel/btf/vmlinux
  libbpf: non-native BTF endianness is not supported

Fixes: 94a1fedd63 ("libbpf: Add btf__parse_raw() and generic btf__parse() APIs")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/90f81508ecc57bc0da318e0fe0f45cfe49b17ea7.1600417359.git.Tony.Ambardar@gmail.com
2020-09-21 21:50:49 +02:00
David S. Miller d5d325eae7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-09-15

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 19 day(s) which contain
a total of 10 files changed, 47 insertions(+), 38 deletions(-).

The main changes are:

1) docs/bpf fixes, from Andrii.

2) ld_abs fix, from Daniel.

3) socket casting helpers fix, from Martin.

4) hash iterator fixes, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15 19:26:21 -07:00
YiFei Zhu 5d23328dcc libbpf: Add BPF_PROG_BIND_MAP syscall and use it on .rodata section
The patch adds a simple wrapper bpf_prog_bind_map around the syscall.
When the libbpf tries to load a program, it will probe the kernel for
the support of this syscall and unconditionally bind .rodata section
to the program.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Link: https://lore.kernel.org/bpf/20200915234543.3220146-4-sdf@google.com
2020-09-15 18:28:27 -07:00