linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Jakub Sitnicki	0b9ad56b1e	selftests/bpf: Use SOCKMAP for server sockets in bpf_sk_assign test Update bpf_sk_assign test to fetch the server socket from SOCKMAP, now that map lookup from BPF in SOCKMAP is enabled. This way the test TC BPF program doesn't need to know what address server socket is bound to. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429181154.479310-4-jakub@cloudflare.com	2020-04-29 23:31:00 +02:00
Jakub Sitnicki	34a2cc6eee	selftests/bpf: Test that lookup on SOCKMAP/SOCKHASH is allowed Now that bpf_map_lookup_elem() is white-listed for SOCKMAP/SOCKHASH, replace the tests which check that verifier prevents lookup on these map types with ones that ensure that lookup operation is permitted, but only with a release of acquired socket reference. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429181154.479310-3-jakub@cloudflare.com	2020-04-29 23:30:59 +02:00
Jakub Sitnicki	64d85290d7	bpf: Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH White-list map lookup for SOCKMAP/SOCKHASH from BPF. Lookup returns a pointer to a full socket and acquires a reference if necessary. To support it we need to extend the verifier to know that: (1) register storing the lookup result holds a pointer to socket, if lookup was done on SOCKMAP/SOCKHASH, and that (2) map lookup on SOCKMAP/SOCKHASH is a reference acquiring operation, which needs a corresponding reference release with bpf_sk_release. On sock_map side, lookup handlers exposed via bpf_map_ops now bump sk_refcnt if socket is reference counted. In turn, bpf_sk_select_reuseport, the only in-kernel user of SOCKMAP/SOCKHASH ops->map_lookup_elem, was updated to release the reference. Sockets fetched from a map can be used in the same way as ones returned by BPF socket lookup helpers, such as bpf_sk_lookup_tcp. In particular, they can be used with bpf_sk_assign to direct packets toward a socket on TC ingress path. Suggested-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429181154.479310-2-jakub@cloudflare.com	2020-04-29 23:30:59 +02:00
Quentin Monnet	0b3b9ca3d1	tools: bpftool: Make libcap dependency optional The new libcap dependency is not used for an essential feature of bpftool, and we could imagine building the tool without checks on CAP_SYS_ADMIN by disabling probing features as an unprivileged user. Make it so, in order to avoid a hard dependency on libcap, and to ease packaging/embedding of bpftool. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429144506.8999-4-quentin@isovalent.com	2020-04-29 23:25:11 +02:00
Quentin Monnet	cf9bf71452	tools: bpftool: Allow unprivileged users to probe features There is demand for a way to identify what BPF helper functions are available to unprivileged users. To do so, allow unprivileged users to run "bpftool feature probe" to list BPF-related features. This will only show features accessible to those users, and may not reflect the full list of features available (to administrators) on the system. To avoid the case where bpftool is inadvertently run as non-root and would list only a subset of the features supported by the system when it would be expected to list all of them, running as unprivileged is gated behind the "unprivileged" keyword passed to the command line. When used by a privileged user, this keyword allows to drop the CAP_SYS_ADMIN and to list the features available to unprivileged users. Note that this adds a dependency on libcap for compiling bpftool. Note that there is no particular reason why the probes were restricted to root, other than the fact I did not need them for unprivileged and did not bother with the additional checks at the time probes were added. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429144506.8999-3-quentin@isovalent.com	2020-04-29 23:25:11 +02:00
Quentin Monnet	e3450b79df	tools: bpftool: For "feature probe" define "full_mode" bool as global The "full_mode" variable used for switching between full or partial feature probing (i.e. with or without probing helpers that will log warnings in kernel logs) was piped from the main do_probe() function down to probe_helpers_for_progtype(), where it is needed. Define it as a global variable: the calls will be more readable, and if other similar flags were to be used in the future, we could use global variables as well instead of extending again the list of arguments with new flags. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429144506.8999-2-quentin@isovalent.com	2020-04-29 23:25:11 +02:00
Alexei Starovoitov	fd9c40c575	Merge branch 'test_progs-asan' Andrii Nakryiko says: ==================== Add necessary infra to build selftests with ASAN (or any other sanitizer). Fix a bunch of found memory leaks and other memory access issues. v1->v2: - don't add ASAN flavor, but allow extra flags for build (Alexei); - fix few more found issues, which somehow were missed first time. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-04-28 19:51:33 -07:00
Andrii Nakryiko	e4e8f4d047	selftests/bpf: Add runqslower binary to .gitignore With recent changes, runqslower is being copied into selftests/bpf root directory. So add it into .gitignore. Fixes: `b26d1e2b60` ("selftests/bpf: Copy runqslower to OUTPUT directory") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Veronika Kabatova <vkabatov@redhat.com> Link: https://lore.kernel.org/bpf/20200429012111.277390-12-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	8d30e80a04	selftests/bpf: Fix bpf_link leak in ns_current_pid_tgid selftest If condition is inverted, but it's also just not necessary. Fixes: `1c1052e014` ("tools/testing/selftests/bpf: Add self-tests for new helper bpf_get_ns_current_pid_tgid.") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Carlos Neira <cneirabustos@gmail.com> Link: https://lore.kernel.org/bpf/20200429012111.277390-11-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	36d0b6159f	selftests/bpf: Disable ASAN instrumentation for mmap()'ed memory read AddressSanitizer assumes that all memory dereferences are done against memory allocated by sanitizer's malloc()/free() code and not touched by anyone else. Seems like this doesn't hold for perf buffer memory. Disable instrumentation on perf buffer callback function. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429012111.277390-10-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	3521ffa2ee	libbpf: Fix huge memory leak in libbpf_find_vmlinux_btf_id() BTF object wasn't freed. Fixes: `a6ed02cac6` ("libbpf: Load btf_vmlinux only once per object.") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200429012111.277390-9-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	13c908495e	selftests/bpf: Fix invalid memory reads in core_relo selftest Another one found by AddressSanitizer. input_len is bigger than actually initialized data size. Fixes: `c7566a6969` ("selftests/bpf: Add field existence CO-RE relocs tests") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429012111.277390-8-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	9f56bb531a	selftests/bpf: Fix memory leak in extract_build_id() getline() allocates string, which has to be freed. Fixes: `81f77fd0de` ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200429012111.277390-7-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	f25d5416d6	selftests/bpf: Fix memory leak in test selector Free test selector substrings, which were strdup()'ed. Fixes: `b65053cd94` ("selftests/bpf: Add whitelist/blacklist of test names to test_progs") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429012111.277390-6-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	229bf8bf4d	libbpf: Fix memory leak and possible double-free in hashmap__clear Fix memory leak in hashmap_clear() not freeing hashmap_entry structs for each of the remaining entries. Also NULL-out bucket list to prevent possible double-free between hashmap__clear() and hashmap__free(). Running test_progs-asan flavor clearly showed this problem. Reported-by: Alston Tang <alston64@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429012111.277390-5-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	42fce2cfb4	selftests/bpf: Convert test_hashmap into test_progs test Fold stand-alone test_hashmap test into test_progs. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429012111.277390-4-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	02995dd4bb	selftests/bpf: Add SAN_CFLAGS param to selftests build to allow sanitizers Add ability to specify extra compiler flags with SAN_CFLAGS for compilation of all user-space C files. This allows to build all of selftest programs with, e.g., custom sanitizer flags, without requiring support for such sanitizers from anyone compiling selftest/bpf. As an example, to compile everything with AddressSanitizer, one would do: $ make clean && make SAN_CFLAGS="-fsanitize=address" For AddressSanitizer to work, one needs appropriate libasan shared library installed in the system, with version of libasan matching what GCC links against. E.g., GCC8 needs libasan5, while GCC7 uses libasan4. For CentOS 7, to build everything successfully one would need to: $ sudo yum install devtoolset-8-gcc devtoolset-libasan-devel $ scl enable devtoolset-8 bash # set up environment For Arch Linux to run selftests, one would need to install gcc-libs package to get libasan.so.5: $ sudo pacman -S gcc-libs N.B. EXTRA_CFLAGS name wasn't used, because it's also used by libbpf's Makefile and this causes a few issues: 1. default "-g -Wall" flags are overridden; 2. compiling shared library with AddressSanitizer generates a bunch of symbols like: "_GLOBAL__sub_D_00099_0_btf_dump.c", "_GLOBAL__sub_D_00099_0_bpf.c", etc, which screws up versioned symbols check. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Julia Kartseva <hex@fb.com> Link: https://lore.kernel.org/bpf/20200429012111.277390-3-andriin@fb.com	2020-04-28 19:48:05 -07:00
Andrii Nakryiko	76148faa16	selftests/bpf: Ensure test flavors use correct skeletons Ensure that test runner flavors include their own skeletons from <flavor>/ directory. Previously, skeletons generated for no-flavor test_progs were used. Apart from fixing correctness, this also makes it possible to compile only flavors individually: $ make clean && make test_progs-no_alu32 ... now succeeds ... Fixes: `74b5a5968f` ("selftests/bpf: Replace test_progs and test_maps w/ general rule") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429012111.277390-2-andriin@fb.com	2020-04-28 19:48:04 -07:00
Alexei Starovoitov	3271e8f3f6	Merge branch 'BTF-map-in-map' Andrii Nakryiko says: ==================== This patch set teaches libbpf how to declare and initialize ARRAY_OF_MAPS and HASH_OF_MAPS maps. See patch #3 for all the details. Patch #1 refactors parsing BTF definition of map to re-use it cleanly for inner map definition parsing. Patch #2 refactors map creation and destruction logic for reuse. It also fixes existing bug with not closing successfully created maps when bpf_object map creation overall fails. Patch #3 adds support for an extension of BTF-defined map syntax, as well as parsing, recording, and use of relocations to allow declaratively initialize outer maps with references to inner maps. v1->v2: - rename __inner to __array (Alexei). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-04-28 17:35:48 -07:00
Andrii Nakryiko	646f02ffdd	libbpf: Add BTF-defined map-in-map support As discussed at LPC 2019 ([0]), this patch brings (a quite belated) support for declarative BTF-defined map-in-map support in libbpf. It allows to define ARRAY_OF_MAPS and HASH_OF_MAPS BPF maps without any user-space initialization code involved. Additionally, it allows to initialize outer map's slots with references to respective inner maps at load time, also completely declaratively. Despite a weak type system of C, the way BTF-defined map-in-map definition works, it's actually quite hard to accidentally initialize outer map with incompatible inner maps. This being C, of course, it's still possible, but even that would be caught at load time and error returned with helpful debug log pointing exactly to the slot that failed to be initialized. As an example, here's a rather advanced HASH_OF_MAPS declaration and initialization example, filling slots #0 and #4 with two inner maps: #include <bpf/bpf_helpers.h> struct inner_map { __uint(type, BPF_MAP_TYPE_ARRAY); __uint(max_entries, 1); __type(key, int); __type(value, int); } inner_map1 SEC(".maps"), inner_map2 SEC(".maps"); struct outer_hash { __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS); __uint(max_entries, 5); __uint(key_size, sizeof(int)); __array(values, struct inner_map); } outer_hash SEC(".maps") = { .values = { [0] = &inner_map2, [4] = &inner_map1, }, }; Here's the relevant part of libbpf debug log showing pretty clearly what's going on with map-in-map initialization: libbpf: .maps relo #0: for 6 value 0 rel.r_offset 96 name 260 ('inner_map1') libbpf: .maps relo #0: map 'outer_arr' slot [0] points to map 'inner_map1' libbpf: .maps relo #1: for 7 value 32 rel.r_offset 112 name 249 ('inner_map2') libbpf: .maps relo #1: map 'outer_arr' slot [2] points to map 'inner_map2' libbpf: .maps relo #2: for 7 value 32 rel.r_offset 144 name 249 ('inner_map2') libbpf: .maps relo #2: map 'outer_hash' slot [0] points to map 'inner_map2' libbpf: .maps relo #3: 
for 6 value 0 rel.r_offset 176 name 260 ('inner_map1') libbpf: .maps relo #3: map 'outer_hash' slot [4] points to map 'inner_map1' libbpf: map 'inner_map1': created successfully, fd=4 libbpf: map 'inner_map2': created successfully, fd=5 libbpf: map 'outer_hash': created successfully, fd=7 libbpf: map 'outer_hash': slot [0] set to map 'inner_map2' fd=5 libbpf: map 'outer_hash': slot [4] set to map 'inner_map1' fd=4 Notice from the log above that fd=6 (not logged explicitly) is used for inner "prototype" map, necessary for creation of outer map. It is destroyed immediately after outer map is created. See also included selftest with some extra comments explaining extra details of usage. Additionally, similar initialization syntax and libbpf functionality can be used to do initialization of BPF_PROG_ARRAY with references to BPF sub-programs. This can be done in follow up patches, if there will be a demand for this. [0] https://linuxplumbersconf.org/event/4/contributions/448/ Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200429002739.48006-4-andriin@fb.com	2020-04-28 17:35:03 -07:00
Andrii Nakryiko	2d39d7c56f	libbpf: Refactor map creation logic and fix cleanup leak Factor out map creation and destruction logic to simplify code and especially error handling. Also fix map FD leak in case of partially successful map creation during bpf_object load operation. Fixes: `57a00f4164` ("libbpf: Add auto-pinning of maps when loading BPF objects") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200429002739.48006-3-andriin@fb.com	2020-04-28 17:35:03 -07:00
Andrii Nakryiko	41017e56af	libbpf: Refactor BTF-defined map definition parsing logic Factor out BTF map definition logic into stand-alone routine for easier reuse for map-in-map case. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429002739.48006-2-andriin@fb.com	2020-04-28 17:35:03 -07:00
Alexei Starovoitov	1f427a8077	Merge branch 'bpf_link-observability' Andrii Nakryiko says: ==================== This patch series adds various observability APIs to bpf_link: - each bpf_link now gets ID, similar to bpf_map and bpf_prog, by which user-space can iterate over all existing bpf_links and create limited FD from ID; - allows to get extra object information with bpf_link general and type-specific information; - implements `bpftool link show` command which lists all active bpf_links in the system; - implements `bpftool link pin` allowing to pin bpf_link by ID or from other pinned path. v2->v3: - improve spin locking around bpf_link ID (Alexei); - simplify bpf_link_info handling and fix compilation error on sh arch; v1->v2: - simplified `bpftool link show` implementation (Quentin); - fixed formatting of bpftool-link.rst (Quentin); - fixed attach type printing logic (Quentin); rfc->v1: - dropped read-only bpf_links (Alexei); - fixed bug in bpf_link_cleanup() not removing ID; - fixed bpftool link pinning search logic; - added bash-completion and man page. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-04-28 17:28:05 -07:00
Andrii Nakryiko	5d085ad2e6	bpftool: Add link bash completions Extend bpftool's bash-completion script to handle new link command and its sub-commands. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200429001614.1544-11-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	7464d013cc	bpftool: Add bpftool-link manpage Add bpftool-link manpage with information and examples of link-related commands. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200429001614.1544-10-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	c5481f9a95	bpftool: Add bpf_link show and pin support Add `bpftool link show` and `bpftool link pin` commands. Example plain output for `link show` (with showing pinned paths): [vmuser@archvm bpf]$ sudo ~/local/linux/tools/bpf/bpftool/bpftool -f link 1: tracing prog 12 prog_type tracing attach_type fentry pinned /sys/fs/bpf/my_test_link pinned /sys/fs/bpf/my_test_link2 2: tracing prog 13 prog_type tracing attach_type fentry 3: tracing prog 14 prog_type tracing attach_type fentry 4: tracing prog 15 prog_type tracing attach_type fentry 5: tracing prog 16 prog_type tracing attach_type fentry 6: tracing prog 17 prog_type tracing attach_type fentry 7: raw_tracepoint prog 21 tp 'sys_enter' 8: cgroup prog 25 cgroup_id 584 attach_type egress 9: cgroup prog 25 cgroup_id 599 attach_type egress 10: cgroup prog 25 cgroup_id 614 attach_type egress 11: cgroup prog 25 cgroup_id 629 attach_type egress Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200429001614.1544-9-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	50325b1761	bpftool: Expose attach_type-to-string array to non-cgroup code Move attach_type_strings into main.h for access in non-cgroup code. bpf_attach_type is used for non-cgroup attach types quite widely now. So also complete missing string translations for non-cgroup attach types. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200429001614.1544-8-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	2c2837b09e	selftests/bpf: Test bpf_link's get_next_id, get_fd_by_id, and get_obj_info Extend bpf_obj_id selftest to verify bpf_link's observability APIs. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429001614.1544-7-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	0dbc866832	libbpf: Add low-level APIs for new bpf_link commands Add low-level API calls for bpf_link_get_next_id() and bpf_link_get_fd_by_id(). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429001614.1544-6-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	f2e10bff16	bpf: Add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command. Also enhance show_fdinfo to potentially include bpf_link type-specific information (similarly to obj_info). Also introduce enum bpf_link_type stored in bpf_link itself and expose it in UAPI. bpf_link_tracing also now will store and return bpf_attach_type. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429001614.1544-5-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	2d602c8cf4	bpf: Support GET_FD_BY_ID and GET_NEXT_ID for bpf_link Add support to look up bpf_link by ID and iterate over all existing bpf_links in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is done as the very last step in finalizing bpf_link, together with installing FD. This approach allows users of bpf_link in kernel code to not worry about races between user-space and kernel code that hasn't finished attaching and initializing bpf_link. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429001614.1544-4-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	a3b80e1078	bpf: Allocate ID for bpf_link Generate ID for each bpf_link using IDR, similarly to bpf_map and bpf_prog. bpf_link creation, initialization, attachment, and exposing to user-space through FD and ID is a complicated multi-step process, abstract it away through bpf_link_primer and bpf_link_prime(), bpf_link_settle(), and bpf_link_cleanup() internal API. They guarantee that until bpf_link is properly attached, user-space won't be able to access partially-initialized bpf_link either from FD or ID. All this allows to simplify bpf_link attachment and error handling code. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429001614.1544-3-andriin@fb.com	2020-04-28 17:27:08 -07:00
Andrii Nakryiko	f9d041271c	bpf: Refactor bpf_link update handling Make bpf_link update support more generic by making it into another bpf_link_ops methods. This allows generic syscall handling code to be agnostic to various conditionally compiled features (e.g., the case of CONFIG_CGROUP_BPF). This also allows to keep link type-specific code to remain static within respective code base. Refactor existing bpf_cgroup_link code and take advantage of this. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429001614.1544-2-andriin@fb.com	2020-04-28 17:27:07 -07:00
Alexei Starovoitov	9b329d0dbe	selftests/bpf: fix test_sysctl_prog with alu32 Similar to commit `b7a0d65d80` ("bpf, testing: Workaround a verifier failure for test_progs") fix test_sysctl_prog.c as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-04-28 15:31:59 -07:00
Zou Wei	a6bbdf2e75	libbpf: Remove unneeded semicolon in btf_dump_emit_type Fixes the following coccicheck warning: tools/lib/bpf/btf_dump.c:661:4-5: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1588064829-70613-1-git-send-email-zou_wei@huawei.com	2020-04-28 21:47:47 +02:00
Veronika Kabatova	b26d1e2b60	selftests/bpf: Copy runqslower to OUTPUT directory $(OUTPUT)/runqslower makefile target doesn't actually create runqslower binary in the $(OUTPUT) directory. As lib.mk expects all TEST_GEN_PROGS_EXTENDED (which runqslower is a part of) to be present in the OUTPUT directory, this results in an error when running e.g. `make install`: rsync: link_stat "tools/testing/selftests/bpf/runqslower" failed: No such file or directory (2) Copy the binary into the OUTPUT directory after building it to fix the error. Fixes: `3a0d3092a4` ("selftests/bpf: Build runqslower from selftests") Signed-off-by: Veronika Kabatova <vkabatov@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200428173742.2988395-1-vkabatov@redhat.com	2020-04-28 21:27:20 +02:00
Daniel Borkmann	0b54142e4b	Merge branch 'work.sysctl' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull in Christoph Hellwig's series that changes the sysctl's ->proc_handler methods to take kernel pointers instead. It gets rid of the set_fs address space overrides used by BPF. As per discussion, pull in the feature branch into bpf-next as it relates to BPF sysctl progs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200427071508.GV23230@ZenIV.linux.org.uk/T/	2020-04-28 21:23:38 +02:00
Christoph Hellwig	8c1b2bf16d	bpf, cgroup: Remove unused exports Except for a few of the networking hooks called from modular ipv4 or ipv6 code, all of the hooks are only called from code that is guaranteed to be built-in. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20200424064338.538313-2-hch@lst.de	2020-04-27 22:20:22 +02:00
Mao Wenan	e411eb257b	libbpf: Return err if bpf_object__load failed bpf_object__load() has various return codes; when it fails to load an object, it must return err instead of -EINVAL. Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200426063635.130680-3-maowenan@huawei.com	2020-04-27 14:43:20 +02:00
Christoph Hellwig	32927393dc	sysctl: pass kernel pointers to ->proc_handler Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handlers just pass through the data to one of the common handlers, a lot of the changes are mechanical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-27 02:07:40 -04:00
Christoph Hellwig	f461d2dcd5	sysctl: avoid forward declarations Move the sysctl tables to the end of the file to avoid lots of pointless forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-27 02:07:26 -04:00
Christoph Hellwig	2374c09b1c	sysctl: remove all extern declaration from sysctl.c Extern declarations in .c files are bad style and can lead to mismatches. Use existing definitions in headers where they exist, and otherwise move the external declarations to suitable header files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-27 02:06:53 -04:00
Christoph Hellwig	26363af564	mm: remove watermark_boost_factor_sysctl_handler watermark_boost_factor_sysctl_handler is just a pointless wrapper for proc_dointvec_minmax, so remove it and use proc_dointvec_minmax directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-27 02:06:52 -04:00
Alexei Starovoitov	f131bd3eee	Merge branch 'cloudflare-prog' Lorenz Bauer says: ==================== We've been developing an in-house L4 load balancer based on XDP and TC for a while. Following Alexei's call for more up-to-date examples of production BPF in the kernel tree [1], Cloudflare is making this available under dual GPL-2.0 or BSD 3-clause terms. The code requires at least v5.3 to function correctly. 1: https://lore.kernel.org/bpf/20200326210719.den5isqxntnoqhmv@ast-mbp/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-04-26 10:00:43 -07:00
Lorenz Bauer	234589012b	selftests/bpf: Add cls_redirect classifier cls_redirect is a TC clsact based replacement for the glb-redirect iptables module available at [1]. It enables what GitHub calls "second chance" flows [2], similarly proposed by the Beamer paper [3]. In contrast to glb-redirect, it also supports migrating UDP flows as long as connected sockets are used. cls_redirect is in production at Cloudflare, as part of our own L4 load balancer. We have modified the encapsulation format slightly from glb-redirect: glbgue_chained_routing.private_data_type has been repurposed to form a version field and several flags. Both have been arranged in a way that a private_data_type value of zero matches the current glb-redirect behaviour. This means that cls_redirect will understand packets in glb-redirect format, but not vice versa. The test suite only covers basic features. For example, cls_redirect will correctly forward path MTU discovery packets, but this is not exercised. It is also possible to switch the encapsulation format to GRE on the last hop, which is also not tested. There are two major distinctions from glb-redirect: first, cls_redirect relies on receiving encapsulated packets directly from a router. This is because we don't have access to the neighbour tables from BPF, yet. See forward_to_next_hop for details. Second, cls_redirect performs decapsulation instead of using separate ipip and sit tunnel devices. This avoids issues with the sit tunnel [4] and makes deploying the classifier easier: decapsulated packets appear on the same interface, so existing firewall rules continue to work as expected. The code base started its life on v4.19, so there are most likely still holdovers from old workarounds. In no particular order: - The function buf_off is required to defeat a clang optimization that leads to the verifier rejecting the program due to pointer arithmetic in the wrong order. 
- The function pkt_parse_ipv6 is force inlined, because it would otherwise be rejected due to returning a pointer to stack memory. - The functions fill_tuple and classify_tcp contain kludges, because we've run out of function arguments. - The logic in general is rather nested, due to verifier restrictions. I think this is either because the verifier loses track of constants on the stack, or because it can't track enum like variables. 1: https://github.com/github/glb-director/tree/master/src/glb-redirect 2: https://github.com/github/glb-director/blob/master/docs/development/second-chance-design.md 3: https://www.usenix.org/conference/nsdi18/presentation/olteanu 4: https://github.com/github/glb-director/issues/64 Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200424185556.7358-2-lmb@cloudflare.com	2020-04-26 10:00:36 -07:00
Andrii Nakryiko	6f8a57ccf8	bpf: Make verifier log more relevant by default To make BPF verifier verbose log more relevant and easier to use to debug verification failures, "pop" parts of log that were successfully verified. This has the effect of leaving only verifier logs that correspond to code branches that lead to verification failure, which in practice should result in much shorter and more relevant verifier log dumps. This behavior is made the default behavior and can be overridden to do exhaustive logging by specifying BPF_LOG_LEVEL2 log level. Using BPF_LOG_LEVEL2 to disable this behavior is not ideal, because in some cases it's good to have BPF_LOG_LEVEL2 per-instruction register dump verbosity, but still have only relevant verifier branches logged. But for this patch, I didn't want to add any new flags. It might be worthwhile to just rethink how BPF verifier logging is performed and requested and streamline it a bit. But this trimming of successfully verified branches seems to be useful and a good default behavior. To test this, I modified runqslower slightly to introduce read of uninitialized stack variable. Log (truncated in the middle to save many lines out of this commit message) BEFORE this change: ; int handle__sched_switch(u64 *ctx) 0: (bf) r6 = r1 ; struct task_struct *prev = (struct task_struct *)ctx[1]; 1: (79) r1 = *(u64 *)(r6 +8) func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct' 2: (b7) r2 = 0 ; struct event event = {}; 3: (7b) *(u64 *)(r10 -24) = r2 last_idx 3 first_idx 0 regs=4 stack=0 before 2: (b7) r2 = 0 4: (7b) *(u64 *)(r10 -32) = r2 5: (7b) *(u64 *)(r10 -40) = r2 6: (7b) *(u64 *)(r10 -48) = r2 ; if (prev->state == TASK_RUNNING) [ ... instruction dump from insn #7 through #50 are cut out ... 
] 51: (b7) r2 = 16 52: (85) call bpf_get_current_comm#16 last_idx 52 first_idx 42 regs=4 stack=0 before 51: (b7) r2 = 16 ; bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, 53: (bf) r1 = r6 54: (18) r2 = 0xffff8881f3868800 56: (18) r3 = 0xffffffff 58: (bf) r4 = r7 59: (b7) r5 = 32 60: (85) call bpf_perf_event_output#25 last_idx 60 first_idx 53 regs=20 stack=0 before 59: (b7) r5 = 32 61: (bf) r2 = r10 ; event.pid = pid; 62: (07) r2 += -16 ; bpf_map_delete_elem(&start, &pid); 63: (18) r1 = 0xffff8881f3868000 65: (85) call bpf_map_delete_elem#3 ; } 66: (b7) r0 = 0 67: (95) exit from 44 to 66: safe from 34 to 66: safe from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000 ; bpf_map_update_elem(&start, &pid, &ts, 0); 28: (bf) r2 = r10 ; 29: (07) r2 += -16 ; tsp = bpf_map_lookup_elem(&start, &pid); 30: (18) r1 = 0xffff8881f3868000 32: (85) call bpf_map_lookup_elem#1 invalid indirect read from stack off -16+0 size 4 processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4 Notice how there is a successful code path from instruction 0 through 67, few successfully verified jumps (44->66, 34->66), and only after that 11->28 jump plus error on instruction #32. 
AFTER this change (full verifier log, no truncation): ; int handle__sched_switch(u64 *ctx) 0: (bf) r6 = r1 ; struct task_struct *prev = (struct task_struct *)ctx[1]; 1: (79) r1 = *(u64 *)(r6 +8) func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct' 2: (b7) r2 = 0 ; struct event event = {}; 3: (7b) *(u64 *)(r10 -24) = r2 last_idx 3 first_idx 0 regs=4 stack=0 before 2: (b7) r2 = 0 4: (7b) *(u64 *)(r10 -32) = r2 5: (7b) *(u64 *)(r10 -40) = r2 6: (7b) *(u64 *)(r10 -48) = r2 ; if (prev->state == TASK_RUNNING) 7: (79) r2 = *(u64 *)(r1 +16) ; if (prev->state == TASK_RUNNING) 8: (55) if r2 != 0x0 goto pc+19 R1_w=ptr_task_struct(id=0,off=0,imm=0) R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000 ; trace_enqueue(prev->tgid, prev->pid); 9: (61) r1 = *(u32 *)(r1 +1184) 10: (63) *(u32 *)(r10 -4) = r1 ; if (!pid || (targ_pid && targ_pid != pid)) 11: (15) if r1 == 0x0 goto pc+16 from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000 ; bpf_map_update_elem(&start, &pid, &ts, 0); 28: (bf) r2 = r10 ; 29: (07) r2 += -16 ; tsp = bpf_map_lookup_elem(&start, &pid); 30: (18) r1 = 0xffff8881db3ce800 32: (85) call bpf_map_lookup_elem#1 invalid indirect read from stack off -16+0 size 4 processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4 Notice how in this case, there are 0-11 instructions + jump from 11 to 28 is recorded + 28-32 instructions with error on insn #32. test_verifier test runner was updated to specify BPF_LOG_LEVEL2 for VERBOSE_ACCEPT expected result due to potentially "incomplete" success verbose log at BPF_LOG_LEVEL1. On success, verbose log will only have a summary of number of processed instructions, etc, but no branch tracing log. Having just a last successful branch tracing seemed weird and confusing. 
Having small and clean summary log in success case seems quite logical and nice, though. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200423195850.1259827-1-andriin@fb.com	2020-04-26 09:47:37 -07:00
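The "pop" behaviour described in the commit above can be pictured with a small userspace model. This is only a sketch, not the kernel's code: `struct toy_log`, `toy_log_append`, `toy_log_mark` and `toy_log_rewind` are hypothetical names invented for illustration. The idea it shows is that the log position is remembered at a branch point, and output written after that point is discarded once the branch verifies successfully, so only the failing path stays in the buffer.

```c
#include <stdio.h>
#include <string.h>

#define TOY_LOG_CAP 256

/* Hypothetical toy log; NOT the kernel's verifier log structure. */
struct toy_log {
	char buf[TOY_LOG_CAP];
	size_t len;	/* bytes in use, excluding the terminating NUL */
};

/* Append a line, truncating silently if the buffer is full. */
static void toy_log_append(struct toy_log *log, const char *line)
{
	size_t room = TOY_LOG_CAP - 1 - log->len;
	size_t n = strlen(line);

	if (n > room)
		n = room;
	memcpy(log->buf + log->len, line, n);
	log->len += n;
	log->buf[log->len] = '\0';
}

/* Remember where a branch starts so its output can be dropped later. */
static size_t toy_log_mark(const struct toy_log *log)
{
	return log->len;
}

/* Drop everything logged since mark; used when a branch verified fine. */
static void toy_log_rewind(struct toy_log *log, size_t mark)
{
	if (mark < log->len) {
		log->len = mark;
		log->buf[mark] = '\0';
	}
}
```

A caller would zero-initialise a `struct toy_log`, append the common prefix, take a mark at the branch point, append the lines of the first branch, rewind to the mark if that branch succeeded, and then append the lines of the next branch. What remains mirrors the shape of the AFTER log: the shared prefix followed only by the path that fails.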
Maciej Żenczykowski	71d1921477	bpf: add bpf_ktime_get_boot_ns() On a device like a cellphone which is constantly suspending and resuming, CLOCK_MONOTONIC is not particularly useful for keeping track of or reacting to external network events. Instead you want to use CLOCK_BOOTTIME. Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns() based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-04-26 09:43:05 -07:00
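The difference between the two clocks named in this commit can be observed from userspace on Linux. The sketch below is not BPF code and does not call the new helper; it reads the same two clock IDs through `clock_gettime()`. The helper names `clock_ns` and `suspended_ns` are hypothetical, chosen for illustration. It assumes a Linux system where `CLOCK_BOOTTIME` is available.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Read a clock as nanoseconds; returns 0 if clock_gettime() fails. */
static uint64_t clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Approximate time spent suspended since boot: CLOCK_BOOTTIME keeps
 * counting across suspend, CLOCK_MONOTONIC does not.
 */
static uint64_t suspended_ns(void)
{
	uint64_t mono = clock_ns(CLOCK_MONOTONIC);
	uint64_t boot = clock_ns(CLOCK_BOOTTIME);

	return boot >= mono ? boot - mono : 0;
}
```

On a machine that has never suspended, `suspended_ns()` should return a value close to zero; on a phone or laptop that sleeps often, the gap grows with every suspend, which is the reason the commit gives for timestamping network events with the boot-time clock.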
Tobias Klauser	0a05861f80	xsk: Fix typo in xsk_umem_consume_tx and xsk_generic_xmit comments s/backpreassure/backpressure/ Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20200421232927.21082-1-tklauser@distanz.ch	2020-04-26 09:41:31 -07:00
Maciej Żenczykowski	082b57e3eb	net: bpf: Make bpf_ktime_get_ns() available to non GPL programs The entire implementation is in kernel/bpf/helpers.c: BPF_CALL_0(bpf_ktime_get_ns) { /* NMI safe access to clock monotonic */ return ktime_get_mono_fast_ns(); } const struct bpf_func_proto bpf_ktime_get_ns_proto = { .func = bpf_ktime_get_ns, .gpl_only = false, .ret_type = RET_INTEGER, }; and this was presumably marked GPL due to kernel/time/timekeeping.c: EXPORT_SYMBOL_GPL(ktime_get_mono_fast_ns); and while that may make sense for kernel modules (although even that is doubtful), there is currently AFAICT no other source of time available to ebpf. Furthermore this is really just equivalent to clock_gettime(CLOCK_MONOTONIC) which is exposed to userspace (via vdso even to make it performant)... As such, I see no reason to keep the GPL restriction. (In the future I'd like to have access to time from Apache licensed ebpf code) Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-04-26 09:04:14 -07:00
Lorenzo Colitti	6f3f65d80d	net: bpf: Allow TC programs to call BPF_FUNC_skb_change_head This allows TC eBPF programs to modify and forward (redirect) packets from interfaces without ethernet headers (for example cellular) to interfaces with (for example ethernet/wifi). The lack of this appears to simply be an oversight. Tested: in active use in Android R on 4.14+ devices for ipv6 cellular to wifi tethering offload. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-04-26 09:00:50 -07:00


916413 Commits
