OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Yonghong Song	15172a46fa	bpf: net: Refactor bpf_iter target registration Currently bpf_iter_reg_target takes parameters from target and allocates memory to save them. This is really not necessary, esp. in the future we may grow information passed from targets to bpf_iter manager. The patch refactors the code so target reg_info becomes static and bpf_iter manager can just take a reference to it. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200513180219.2949605-1-yhs@fb.com	2020-05-13 12:30:50 -07:00
Yonghong Song	2e3ed68bfc	bpf: Add comments to interpret bpf_prog return values Add a short comment in bpf_iter_run_prog() function to explain how bpf_prog return value is converted to seq_ops->show() return value: bpf_prog return seq_ops()->show() return 0 0 1 -EAGAIN When show() return value is -EAGAIN, the current bpf_seq_read() will end. If the current seq_file buffer is empty, -EAGAIN will return to user space. Otherwise, the buffer will be copied to user space. In both cases, the next bpf_seq_read() call will try to show the same object which returned -EAGAIN previously. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180218.2949517-1-yhs@fb.com	2020-05-13 12:30:50 -07:00
Yonghong Song	21aef70ead	bpf: Change btf_iter func proto prefix to "bpf_iter_" This is to be consistent with tracing and lsm programs which have prefix "bpf_trace_" and "bpf_lsm_" respectively. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180216.2949387-1-yhs@fb.com	2020-05-13 12:30:49 -07:00
Yonghong Song	99aaf53e2f	tools/bpf: selftests : Explain bpf_iter test failures with llvm 10.0.0 Commit `6879c042e1` ("tools/bpf: selftests: Add bpf_iter selftests") added self tests for bpf_iter feature. But two subtests ipv6_route and netlink needs llvm latest 10.x release branch or trunk due to a bug in llvm BPF backend. This patch added the file README.rst to document these two failures so people using llvm 10.0.0 can be aware of them. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200513180215.2949237-1-yhs@fb.com	2020-05-13 12:30:49 -07:00
Alexei Starovoitov	0aa0372f92	Merge branch 'benchmark-runner' Andrii Nakryiko says: ==================== Add generic benchmark runner framework which simplifies writing various performance benchmarks in a consistent fashion. This framework will be used in follow up patches to test performance of perf buffer and ring buffer as well. Patch #1 extracts parse_num_list to be re-used between test_progs and bench. Patch #2 adds generic runner implementation and atomic counter benchmarks to validate benchmark runner's behavior. Patch #3 implements test_overhead benchmark as part of bench runner. It also add fmod_ret BPF program type to a set of benchmarks. Patch #4 tests faster alternatives to set_task_comm() approach, tested in test_overhead, in search for minimal-overhead way to trigger BPF program execution from user-space on demand. v2->v3: - added --prod-affinity and --cons-affinity (Yonghong); - removed ringbuf-related options leftovers (Yonghong); - added more benchmarking results for test_overhead performance discrepancies; v1->v2: - moved benchmarks into benchs/ subdir (John); - added benchmark "suite" scripts (John); - few small clean ups, change defaults, etc. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-05-13 12:20:02 -07:00
Andrii Nakryiko	c5d420c32c	selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com	2020-05-13 12:19:38 -07:00
Andrii Nakryiko	4eaf0b5c5e	selftest/bpf: Fmod_ret prog and implement test_overhead as part of bench Add fmod_ret BPF program to existing test_overhead selftest. Also re-implement user-space benchmarking part into benchmark runner to compare results. Results with ./bench are consistently somewhat lower than test_overhead's, but relative performance of various types of BPF programs stay consisten (e.g., kretprobe is noticeably slower). This slowdown seems to be coming from the fact that test_overhead is single-threaded, while benchmark always spins off at least one thread for producer. This has been confirmed by hacking multi-threaded test_overhead variant and also single-threaded bench variant. Resutls are below. run_bench_rename.sh script from benchs/ subdirectory was used to produce results for ./bench. Single-threaded implementations =============================== /* bench: single-threaded, atomics / base : 4.622 ± 0.049M/s kprobe : 3.673 ± 0.052M/s kretprobe : 2.625 ± 0.052M/s rawtp : 4.369 ± 0.089M/s fentry : 4.201 ± 0.558M/s fexit : 4.309 ± 0.148M/s fmodret : 4.314 ± 0.203M/s / selftest: single-threaded, no atomics / task_rename base 4555K events per sec task_rename kprobe 3643K events per sec task_rename kretprobe 2506K events per sec task_rename raw_tp 4303K events per sec task_rename fentry 4307K events per sec task_rename fexit 4010K events per sec task_rename fmod_ret 3984K events per sec Multi-threaded implementations ============================== / bench: multi-threaded w/ atomics / base : 3.910 ± 0.023M/s kprobe : 3.048 ± 0.037M/s kretprobe : 2.300 ± 0.015M/s rawtp : 3.687 ± 0.034M/s fentry : 3.740 ± 0.087M/s fexit : 3.510 ± 0.009M/s fmodret : 3.485 ± 0.050M/s / selftest: multi-threaded w/ atomics / task_rename base 3872K events per sec task_rename kprobe 3068K events per sec task_rename kretprobe 2350K events per sec task_rename raw_tp 3731K events per sec task_rename fentry 3639K events per sec task_rename fexit 3558K events per sec task_rename fmod_ret 3511K events per sec / selftest: multi-threaded, no atomics */ task_rename base 3945K events per sec task_rename kprobe 3298K events per sec task_rename kretprobe 2451K events per sec task_rename raw_tp 3718K events per sec task_rename fentry 3782K events per sec task_rename fexit 3543K events per sec task_rename fmod_ret 3526K events per sec Note that the fact that ./bench benchmark always uses atomic increments for counting, while test_overhead doesn't, doesn't influence test results all that much. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-4-andriin@fb.com	2020-05-13 12:19:38 -07:00
Andrii Nakryiko	8e7c2a023a	selftests/bpf: Add benchmark runner infrastructure While working on BPF ringbuf implementation, testing, and benchmarking, I've developed a pretty generic and modular benchmark runner, which seems to be generically useful, as I've already used it for one more purpose (testing fastest way to trigger BPF program, to minimize overhead of in-kernel code). This patch adds generic part of benchmark runner and sets up Makefile for extending it with more sets of benchmarks. Benchmarker itself operates by spinning up specified number of producer and consumer threads, setting up interval timer sending SIGALARM signal to application once a second. Every second, current snapshot with hits/drops counters are collected and stored in an array. Drops are useful for producer/consumer benchmarks in which producer might overwhelm consumers. Once test finishes after given amount of warm-up and testing seconds, mean and stddev are calculated (ignoring warm-up results) and is printed out to stdout. This setup seems to give consistent and accurate results. To validate behavior, I added two atomic counting tests: global and local. For global one, all the producer threads are atomically incrementing same counter as fast as possible. This, of course, leads to huge drop of performance once there is more than one producer thread due to CPUs fighting for the same memory location. Local counting, on the other hand, maintains one counter per each producer thread, incremented independently. Once per second, all counters are read and added together to form final "counting throughput" measurement. As expected, such setup demonstrates linear scalability with number of producers (as long as there are enough physical CPU cores, of course). See example output below. Also, this setup can nicely demonstrate disastrous effects of false sharing, if care is not taken to take those per-producer counters apart into independent cache lines. Demo output shows global counter first with 1 producer, then with 4. Both total and per-producer performance significantly drop. The last run is local counter with 4 producers, demonstrating near-perfect scalability. $ ./bench -a -w1 -d2 -p1 count-global Setting up benchmark 'count-global'... Benchmark 'count-global' started. Iter 0 ( 24.822us): hits 148.179M/s (148.179M/prod), drops 0.000M/s Iter 1 ( 37.939us): hits 149.308M/s (149.308M/prod), drops 0.000M/s Iter 2 (-10.774us): hits 150.717M/s (150.717M/prod), drops 0.000M/s Iter 3 ( 3.807us): hits 151.435M/s (151.435M/prod), drops 0.000M/s Summary: hits 150.488 ± 1.079M/s (150.488M/prod), drops 0.000 ± 0.000M/s $ ./bench -a -w1 -d2 -p4 count-global Setting up benchmark 'count-global'... Benchmark 'count-global' started. Iter 0 ( 60.659us): hits 53.910M/s ( 13.477M/prod), drops 0.000M/s Iter 1 (-17.658us): hits 53.722M/s ( 13.431M/prod), drops 0.000M/s Iter 2 ( 5.865us): hits 53.495M/s ( 13.374M/prod), drops 0.000M/s Iter 3 ( 0.104us): hits 53.606M/s ( 13.402M/prod), drops 0.000M/s Summary: hits 53.608 ± 0.113M/s ( 13.402M/prod), drops 0.000 ± 0.000M/s $ ./bench -a -w1 -d2 -p4 count-local Setting up benchmark 'count-local'... Benchmark 'count-local' started. Iter 0 ( 23.388us): hits 640.450M/s (160.113M/prod), drops 0.000M/s Iter 1 ( 2.291us): hits 605.661M/s (151.415M/prod), drops 0.000M/s Iter 2 ( -6.415us): hits 607.092M/s (151.773M/prod), drops 0.000M/s Iter 3 ( -1.361us): hits 601.796M/s (150.449M/prod), drops 0.000M/s Summary: hits 604.849 ± 2.739M/s (151.212M/prod), drops 0.000 ± 0.000M/s Benchmark runner supports setting thread affinity for producer and consumer threads. You can use -a flag for default CPU selection scheme, where first consumer gets CPU #0, next one gets CPU #1, and so on. Then producer threads pick up next CPU and increment one-by-one as well. But user can also specify a set of CPUs independently for producers and consumers with --prod-affinity 1,2-10,15 and --cons-affinity <set-of-cpus>. The latter allows to force producers and consumers to share same set of CPUs, if necessary. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-3-andriin@fb.com	2020-05-13 12:19:38 -07:00
Andrii Nakryiko	cd49291ce1	selftests/bpf: Extract parse_num_list into generic testing_helpers.c Add testing_helpers.c, which will contain generic helpers for test runners and tests needing some common generic functionality, like parsing a set of numbers. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-2-andriin@fb.com	2020-05-13 12:19:38 -07:00
Eelco Chaudron	fd9eef1a13	libbpf: Fix probe code to return EPERM if encountered When the probe code was failing for any reason ENOTSUP was returned, even if this was due to not having enough lock space. This patch fixes this by returning EPERM to the user application, so it can respond and increase the RLIMIT_MEMLOCK size. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/158927424896.2342.10402475603585742943.stgit@ebuild	2020-05-13 10:29:54 +02:00
Yauheni Kaliuta	309b81f0fd	selftests/bpf: Install generated test progs Before commit `74b5a5968f` ("selftests/bpf: Replace test_progs and test_maps w/ general rule") selftests/bpf used generic install target from selftests/lib.mk to install generated bpf test progs by mentioning them in TEST_GEN_FILES variable. Take that functionality back. Fixes: `74b5a5968f` ("selftests/bpf: Replace test_progs and test_maps w/ general rule") Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513021722.7787-1-yauheni.kaliuta@redhat.com	2020-05-13 10:25:41 +02:00
Quentin Monnet	ff20460e94	tools, bpf: Synchronise BPF UAPI header with tools Synchronise the bpf.h header under tools, to report the fixes recently brought to the documentation for the BPF helpers. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200511161536.29853-5-quentin@isovalent.com	2020-05-11 21:20:56 +02:00
Quentin Monnet	ab8d78093d	bpf: Minor fixes to BPF helpers documentation Minor improvements to the documentation for BPF helpers: * Fix formatting for the description of "bpf_socket" for bpf_getsockopt() and bpf_setsockopt(), thus suppressing two warnings from rst2man about "Unexpected indentation". * Fix formatting for return values for bpf_sk_assign() and seq_file helpers. * Fix and harmonise formatting, in particular for function/struct names. * Remove blank lines before "Return:" sections. * Replace tabs found in the middle of text lines. * Fix typos. * Add a note to the footer (in Python script) about "bpftool feature probe", including for listing features available to unprivileged users, and add a reference to bpftool man page. Thanks to Florian for reporting two typos (duplicated words). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200511161536.29853-4-quentin@isovalent.com	2020-05-11 21:20:53 +02:00
Quentin Monnet	c8caa0bb4b	tools, bpftool: Minor fixes for documentation Bring minor improvements to bpftool documentation. Fix or harmonise formatting, update map types (including in interactive help), improve description for "map create", fix a build warning due to a missing line after the double-colon for the "bpftool prog profile" example, complete/harmonise/sort the list of related bpftool man pages in footers. v2: - Remove (instead of changing) mark-up on "value" in bpftool-map.rst, when it does not refer to something passed on the command line. - Fix an additional typo ("hexadeximal") in the same file. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200511161536.29853-3-quentin@isovalent.com	2020-05-11 21:20:50 +02:00
Quentin Monnet	6e7e034e88	tools, bpftool: Poison and replace kernel integer typedefs Replace the use of kernel-only integer typedefs (u8, u32, etc.) by their user space counterpart (__u8, __u32, etc.). Similarly to what libbpf does, poison the typedefs to avoid introducing them again in the future. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200511161536.29853-2-quentin@isovalent.com	2020-05-11 21:20:46 +02:00
Gustavo A. R. Silva	385bbf7b11	bpf, libbpf: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200507185057.GA13981@embeddedor	2020-05-11 16:56:47 +02:00
Song Liu	b4563facdc	bpf, runqslower: include proper uapi/bpf.h runqslower doesn't specify include path for uapi/bpf.h. This causes the following warning: In file included from runqslower.c:10: .../tools/testing/selftests/bpf/tools/include/bpf/bpf.h:234:38: warning: 'enum bpf_stats_type' declared inside parameter list will not be visible outside of this definition or declaration 234 \| LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type); Fix this by adding -I tools/includ/uapi to the Makefile. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-05-09 18:01:33 -07:00
Alexei Starovoitov	180139dca8	Merge branch 'bpf_iter' Yonghong Song says: ==================== Motivation: The current way to dump kernel data structures mostly: 1. /proc system 2. various specific tools like "ss" which requires kernel support. 3. drgn The dropback for the first two is that whenever you want to dump more, you need change the kernel. For example, Martin wants to dump socket local storage with "ss". Kernel change is needed for it to work ([1]). This is also the direct motivation for this work. drgn ([2]) solves this proble nicely and no kernel change is not needed. But since drgn is not able to verify the validity of a particular pointer value, it might present the wrong results in rare cases. In this patch set, we introduce bpf iterator. Initial kernel changes are still needed for interested kernel data, but a later data structure change will not require kernel changes any more. bpf program itself can adapt to new data structure changes. This will give certain flexibility with guaranteed correctness. In this patch set, kernel seq_ops is used to facilitate iterating through kernel data, similar to current /proc and many other lossless kernel dumping facilities. In the future, different iterators can be implemented to trade off losslessness for other criteria e.g. no repeated object visits, etc. User Interface: 1. Similar to prog/map/link, the iterator can be pinned into a path within a bpffs mount point. 2. The bpftool command can pin an iterator to a file bpftool iter pin <bpf_prog.o> <path> 3. Use `cat <path>` to dump the contents. Use `rm -f <path>` to remove the pinned iterator. 4. The anonymous iterator can be created as well. Please see patch #19 andd #20 for bpf programs and bpf iterator output examples. Note that certain iterators are namespace aware. For example, task and task_file targets only iterate through current pid namespace. ipv6_route and netlink will iterate through current net namespace. Please see individual patches for implementation details. Performance: The bpf iterator provides in-kernel aggregation abilities for kernel data. This can greatly improve performance compared to e.g., iterating all process directories under /proc. For example, I did an experiment on my VM with an application forking different number of tasks and each forked process opening various number of files. The following is the result with the latency with unit of microseconds: # of forked tasks # of open files # of bpf_prog calls # latency (us) 100 100 11503 7586 1000 1000 1013203 709513 10000 100 1130203 764519 The number of bpf_prog calls may be more than forked tasks multipled by open files since there are other tasks running on the system. The bpf program is a do-nothing program. One millions of bpf calls takes less than one second. Although the initial motivation is from Martin's sk_local_storage, this patch didn't implement tcp6 sockets and sk_local_storage. The /proc/net/tcp6 involves three types of sockets, timewait, request and tcp6 sockets. Some kind of type casting or other mechanism is needed to handle all these socket types in one bpf program. This will be addressed in future work. Currently, we do not support kernel data generated under module. This requires some BTF work. More work for more iterators, e.g., tcp, udp, bpf_map elements, etc. Changelog: v3 -> v4: - in bpf_seq_read(), if start() failed with an error, return that error to user space (Andrii) - in bpf_seq_printf(), if reading kernel memory failed for %s and %p{i,I}{4,6}, set buffer to empty string or address 0. Documented this behavior in uapi header (Andrii) - fix a few error handling issues for bpftool (Andrii) - A few other minor fixes and cosmetic changes. v2 -> v3: - add bpf_iter_unreg_target() to unregister a target, used in the error path of the __init functions. - handle err != 0 before handling overflow (Andrii) - reference count "task" for task_file target (Andrii) - remove some redundancy for bpf_map/task/task_file targets - add bpf_iter_unreg_target() in ip6_route_cleanup() - Handling "%%" format in bpf_seq_printf() (Andrii) - implement auto-attach for bpf_iter in libbpf (Andrii) - add macros offsetof and container_of in bpf_helpers.h (Andrii) - add tests for auto-attach and program-return-1 cases - some other minor fixes v1 -> v2: - removed target_feature, using callback functions instead - checking target to ensure program specified btf_id supported (Martin) - link_create change with new changes from Andrii - better handling of btf_iter vs. seq_file private data (Martin, Andrii) - implemented bpf_seq_read() (Andrii, Alexei) - percpu buffer for bpf_seq_printf() (Andrii) - better syntax for BPF_SEQ_PRINTF macro (Andrii) - bpftool fixes (Quentin) - a lot of other fixes RFC v2 -> v1: - rename bpfdump to bpf_iter - use bpffs instead of a new file system - use bpf_link to streamline and simplify iterator creation. References: [1]: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com [2]: https://github.com/osandov/drgn ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-05-09 17:05:33 -07:00
Yonghong Song	6879c042e1	tools/bpf: selftests: Add bpf_iter selftests The added test includes the following subtests: - test verifier change for btf_id_or_null - test load/create_iter/read for ipv6_route/netlink/bpf_map/task/task_file - test anon bpf iterator - test anon bpf iterator reading one char at a time - test file bpf iterator - test overflow (single bpf program output not overflow) - test overflow (single bpf program output overflows) - test bpf prog returning 1 The ipv6_route tests the following verifier change - access fields in the variable length array of the structure. The netlink load tests the following verifier change - put a btf_id ptr value in a stack and accessible to tracing/iter programs. The anon bpf iterator also tests link auto attach through skeleton. $ test_progs -n 2 #2/1 btf_id_or_null:OK #2/2 ipv6_route:OK #2/3 netlink:OK #2/4 bpf_map:OK #2/5 task:OK #2/6 task_file:OK #2/7 anon:OK #2/8 anon-read-one-char:OK #2/9 file:OK #2/10 overflow:OK #2/11 overflow-e2big:OK #2/12 prog-ret-1:OK #2 bpf_iter:OK Summary: 1/12 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175923.2477637-1-yhs@fb.com	2020-05-09 17:05:27 -07:00
Yonghong Song	acf6163174	tools/bpf: selftests: Add iter progs for bpf_map/task/task_file The implementation is arbitrary, just to show how the bpf programs can be written for bpf_map/task/task_file. They can be costomized for specific needs. For example, for bpf_map, the iterator prints out: $ cat /sys/fs/bpf/my_bpf_map id refcnt usercnt locked_vm 3 2 0 20 6 2 0 20 9 2 0 20 12 2 0 20 13 2 0 20 16 2 0 20 19 2 0 20 %%% END %%% For task, the iterator prints out: $ cat /sys/fs/bpf/my_task tgid gid 1 1 2 2 .... 1944 1944 1948 1948 1949 1949 1953 1953 === END === For task/file, the iterator prints out: $ cat /sys/fs/bpf/my_task_file tgid gid fd file 1 1 0 ffffffff95c97600 1 1 1 ffffffff95c97600 1 1 2 ffffffff95c97600 .... 1895 1895 255 ffffffff95c8fe00 1932 1932 0 ffffffff95c8fe00 1932 1932 1 ffffffff95c8fe00 1932 1932 2 ffffffff95c8fe00 1932 1932 3 ffffffff95c185c0 This is able to print out all open files (fd and file->f_op), so user can compare f_op against a particular kernel file operations to find what it is. For example, from /proc/kallsyms, we can find ffffffff95c185c0 r eventfd_fops so we will know tgid 1932 fd 3 is an eventfd file descriptor. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175922.2477576-1-yhs@fb.com	2020-05-09 17:05:27 -07:00
Yonghong Song	7c128a6bbd	tools/bpf: selftests: Add iterator programs for ipv6_route and netlink Two bpf programs are added in this patch for netlink and ipv6_route target. On my VM, I am able to achieve identical results compared to /proc/net/netlink and /proc/net/ipv6_route. $ cat /proc/net/netlink sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode 000000002c42d58b 0 0 00000000 0 0 0 2 0 7 00000000a4e8b5e1 0 1 00000551 0 0 0 2 0 18719 00000000e1b1c195 4 0 00000000 0 0 0 2 0 16422 000000007e6b29f9 6 0 00000000 0 0 0 2 0 16424 .... 00000000159a170d 15 1862 00000002 0 0 0 2 0 1886 000000009aca4bc9 15 3918224839 00000002 0 0 0 2 0 19076 00000000d0ab31d2 15 1 00000002 0 0 0 2 0 18683 000000008398fb08 16 0 00000000 0 0 0 2 0 27 $ cat /sys/fs/bpf/my_netlink sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode 000000002c42d58b 0 0 00000000 0 0 0 2 0 7 00000000a4e8b5e1 0 1 00000551 0 0 0 2 0 18719 00000000e1b1c195 4 0 00000000 0 0 0 2 0 16422 000000007e6b29f9 6 0 00000000 0 0 0 2 0 16424 .... 00000000159a170d 15 1862 00000002 0 0 0 2 0 1886 000000009aca4bc9 15 3918224839 00000002 0 0 0 2 0 19076 00000000d0ab31d2 15 1 00000002 0 0 0 2 0 18683 000000008398fb08 16 0 00000000 0 0 0 2 0 27 $ cat /proc/net/ipv6_route fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo $ cat /sys/fs/bpf/my_ipv6_route fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@fb.com	2020-05-09 17:05:27 -07:00
Yonghong Song	9406b485de	tools/bpftool: Add bpf_iter support for bptool Currently, only one command is supported bpftool iter pin <bpf_prog.o> <path> It will pin the trace/iter bpf program in the object file <bpf_prog.o> to the <path> where <path> should be on a bpffs mount. For example, $ bpftool iter pin ./bpf_iter_ipv6_route.o \ /sys/fs/bpf/my_route User can then do a `cat` to print out the results: $ cat /sys/fs/bpf/my_route fe800000000000000000000000000000 40 00000000000000000000000000000000 ... 00000000000000000000000000000000 00 00000000000000000000000000000000 ... 00000000000000000000000000000001 80 00000000000000000000000000000000 ... fe800000000000008c0162fffebdfd57 80 00000000000000000000000000000000 ... ff000000000000000000000000000000 08 00000000000000000000000000000000 ... 00000000000000000000000000000000 00 00000000000000000000000000000000 ... The implementation for ipv6_route iterator is in one of subsequent patches. This patch also added BPF_LINK_TYPE_ITER to link query. In the future, we may add additional parameters to pin command by parameterizing the bpf iterator. For example, a map_id or pid may be added to let bpf program only traverses a single map or task, similar to kernel seq_file single_open(). We may also add introspection command for targets/iterators by leveraging the bpf_iter itself. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200509175920.2477247-1-yhs@fb.com	2020-05-09 17:05:27 -07:00
Yonghong Song	5fbc220862	tools/libpf: Add offsetof/container_of macro in bpf_helpers.h These two helpers will be used later in bpf_iter bpf program bpf_iter_netlink.c. Put them in bpf_helpers.h since they could be useful in other cases. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175919.2477104-1-yhs@fb.com	2020-05-09 17:05:27 -07:00
Yonghong Song	c09add2fbc	tools/libbpf: Add bpf_iter support Two new libbpf APIs are added to support bpf_iter: - bpf_program__attach_iter Given a bpf program and additional parameters, which is none now, returns a bpf_link. - bpf_iter_create syscall level API to create a bpf iterator. The macro BPF_SEQ_PRINTF are also introduced. The format looks like: BPF_SEQ_PRINTF(seq, "task id %d\n", pid); This macro can help bpf program writers with nicer bpf_seq_printf syntax similar to the kernel one. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175917.2476936-1-yhs@fb.com	2020-05-09 17:05:27 -07:00
Yonghong Song	9c5f8a1008	bpf: Support variable length array in tracing programs In /proc/net/ipv6_route, we have struct fib6_info { struct fib6_table fib6_table; ... struct fib6_nh fib6_nh[0]; } struct fib6_nh { struct fib_nh_common nh_common; struct rt6_info rt6i_pcpu; struct rt6_exception_bucket rt6i_exception_bucket; }; struct fib_nh_common { ... u8 nhc_gw_family; ... } The access: struct fib6_nh *fib6_nh = &rt->fib6_nh; ... fib6_nh->nh_common.nhc_gw_family ... This patch ensures such an access is handled properly. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175916.2476853-1-yhs@fb.com	2020-05-09 17:05:27 -07:00
Yonghong Song	1d68f22b3d	bpf: Handle spilled PTR_TO_BTF_ID properly when checking stack_boundary This specifically to handle the case like below: // ptr below is a socket ptr identified by PTR_TO_BTF_ID u64 param[2] = { ptr, val }; bpf_seq_printf(seq, fmt, sizeof(fmt), param, sizeof(param)); In this case, the 16 bytes stack for "param" contains: 8 bytes for ptr with spilled PTR_TO_BTF_ID 8 bytes for val as STACK_MISC The current verifier will complain the ptr should not be visible to the helper. ... 16: (7b) (u64 )(r10 -64) = r2 18: (7b) (u64 )(r10 -56) = r1 19: (bf) r4 = r10 ; 20: (07) r4 += -64 ; BPF_SEQ_PRINTF(seq, fmt1, (long)s, s->sk_protocol); 21: (bf) r1 = r6 22: (18) r2 = 0xffffa8d00018605a 24: (b4) w3 = 10 25: (b4) w5 = 16 26: (85) call bpf_seq_printf#125 R0=inv(id=0) R1_w=ptr_seq_file(id=0,off=0,imm=0) R2_w=map_value(id=0,off=90,ks=4,vs=144,imm=0) R3_w=inv10 R4_w=fp-64 R5_w=inv16 R6=ptr_seq_file(id=0,off=0,imm=0) R7=ptr_netlink_sock(id=0,off=0,imm=0) R10=fp0 fp-56_w=mmmmmmmm fp-64_w=ptr_ last_idx 26 first_idx 13 regs=8 stack=0 before 25: (b4) w5 = 16 regs=8 stack=0 before 24: (b4) w3 = 10 invalid indirect read from stack off -64+0 size 16 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175915.2476783-1-yhs@fb.com	2020-05-09 17:05:27 -07:00
Yonghong Song	492e639f0c	bpf: Add bpf_seq_printf and bpf_seq_write helpers Two helpers bpf_seq_printf and bpf_seq_write, are added for writing data to the seq_file buffer. bpf_seq_printf supports common format string flag/width/type fields so at least I can get identical results for netlink and ipv6_route targets. For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW specifically indicates a write failure due to overflow, which means the object will be repeated in the next bpf invocation if object collection stays the same. Note that if the object collection is changed, depending how collection traversal is done, even if the object still in the collection, it may not be visited. For bpf_seq_printf, format %s, %p{i,I}{4,6} needs to read kernel memory. Reading kernel memory may fail in the following two cases: - invalid kernel address, or - valid kernel address but requiring a major fault If reading kernel memory failed, the %s string will be an empty string and %p{i,I}{4,6} will be all 0. Not returning error to bpf program is consistent with what bpf_trace_printk() does for now. bpf_seq_printf may return -EBUSY meaning that internal percpu buffer for memory copy of strings or other pointees is not available. Bpf program can return 1 to indicate it wants the same object to be repeated. Right now, this should not happen on no-RT kernels since migrate_disable(), which guards bpf prog call, calls preempt_disable(). Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	b121b341e5	bpf: Add PTR_TO_BTF_ID_OR_NULL support Add bpf_reg_type PTR_TO_BTF_ID_OR_NULL support. For tracing/iter program, the bpf program context definition, e.g., for previous bpf_map target, looks like struct bpf_iter__bpf_map { struct bpf_iter_meta meta; struct bpf_map map; }; The kernel guarantees that meta is not NULL, but map pointer maybe NULL. The NULL map indicates that all objects have been traversed, so bpf program can take proper action, e.g., do final aggregation and/or send final report to user space. Add btf_id_or_null_non0_off to prog->aux structure, to indicate that if the context access offset is not 0, set to PTR_TO_BTF_ID_OR_NULL instead of PTR_TO_BTF_ID. This bit is set for tracing/iter program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175912.2476576-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	eaaacd2391	bpf: Add task and task/file iterator targets Only the tasks belonging to "current" pid namespace are enumerated. For task/file target, the bpf program will have access to struct task_struct task u32 fd struct file file where fd/file is an open file for the task. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175911.2476407-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	138d0be35b	net: bpf: Add netlink and ipv6_route bpf_iter targets This patch added netlink and ipv6_route targets, using the same seq_ops (except show() and minor changes for stop()) for /proc/net/{netlink,ipv6_route}. The net namespace for these targets are the current net namespace at file open stage, similar to /proc/net/{netlink,ipv6_route} reference counting the net namespace at seq_file open stage. Since module is not supported for now, ipv6_route is supported only if the IPV6 is built-in, i.e., not compiled as a module. The restriction can be lifted once module is properly supported for bpf_iter. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	6086d29def	bpf: Add bpf_map iterator Implement seq_file operations to traverse all bpf_maps. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175909.2476096-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	e5158d987b	bpf: Implement common macros/helpers for target iterators Macro DEFINE_BPF_ITER_FUNC is implemented so target can define an init function to capture the BTF type which represents the target. The bpf_iter_meta is a structure holding meta data, common to all targets in the bpf program. Additional marker functions are called before or after bpf_seq_read() show()/next()/stop() callback functions to help calculate precise seq_num and whether call bpf_prog inside stop(). Two functions, bpf_iter_get_info() and bpf_iter_run_prog(), are implemented so target can get needed information from bpf_iter infrastructure and can run the program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175907.2475956-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	367ec3e483	bpf: Create file bpf iterator To produce a file bpf iterator, the fd must be corresponding to a link_fd assocciated with a trace/iter program. When the pinned file is opened, a seq_file will be generated. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175906.2475893-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	ac51d99bf8	bpf: Create anonymous bpf iterator A new bpf command BPF_ITER_CREATE is added. The anonymous bpf iterator is seq_file based. The seq_file private data are referenced by targets. The bpf_iter infrastructure allocated additional space at seq_file->private before the space used by targets to store some meta data, e.g., prog: prog to run session_id: an unique id for each opened seq_file seq_num: how many times bpf programs are queried in this session done_stop: an internal state to decide whether bpf program should be called in seq_ops->stop() or not The seq_num will start from 0 for valid objects. The bpf program may see the same seq_num more than once if - seq_file buffer overflow happens and the same object is retried by bpf_seq_read(), or - the bpf program explicitly requests a retry of the same object Since module is not supported for bpf_iter, all target registeration happens at __init time, so there is no need to change bpf_iter_unreg_target() as it is used mostly in error path of the init function at which time no bpf iterators have been created yet. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	fd4f12bc38	bpf: Implement bpf_seq_read() for bpf iterator bpf iterator uses seq_file to provide a lossless way to transfer data to user space. But we want to call bpf program after all objects have been traversed, and bpf program may write additional data to the seq_file buffer. The current seq_read() does not work for this use case. Besides allowing stop() function to write to the buffer, the bpf_seq_read() also fixed the buffer size to one page. If any single call of show() or stop() will emit data more than one page to cause overflow, -E2BIG error code will be returned to user space. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175904.2475468-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	2057c92bc9	bpf: Support bpf tracing/iter programs for BPF_LINK_UPDATE Added BPF_LINK_UPDATE support for tracing/iter programs. This way, a file based bpf iterator, which holds a reference to the link, can have its bpf program updated without creating new files. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175902.2475262-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	de4e05cac4	bpf: Support bpf tracing/iter programs for BPF_LINK_CREATE Given a bpf program, the step to create an anonymous bpf iterator is: - create a bpf_iter_link, which combines bpf program and the target. In the future, there could be more information recorded in the link. A link_fd will be returned to the user space. - create an anonymous bpf iterator with the given link_fd. The bpf_iter_link can be pinned to bpffs mount file system to create a file based bpf iterator as well. The benefit to use of bpf_iter_link: - using bpf link simplifies design and implementation as bpf link is used for other tracing bpf programs. - for file based bpf iterator, bpf_iter_link provides a standard way to replace underlying bpf programs. - for both anonymous and free based iterators, bpf link query capability can be leveraged. The patch added support of tracing/iter programs for BPF_LINK_CREATE. A new link type BPF_LINK_TYPE_ITER is added to facilitate link querying. Currently, only prog_id is needed, so there is no additional in-kernel show_fdinfo() and fill_link_info() hook is needed for BPF_LINK_TYPE_ITER link. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	15d83c4d7c	bpf: Allow loading of a bpf_iter program A bpf_iter program is a tracing program with attach type BPF_TRACE_ITER. The load attribute attach_btf_id is used by the verifier against a particular kernel function, which represents a target, e.g., __bpf_iter__bpf_map for target bpf_map which is implemented later. The program return value must be 0 or 1 for now. 0 : successful, except potential seq_file buffer overflow which is handled by seq_file reader. 1 : request to restart the same object In the future, other return values may be used for filtering or teminating the iterator. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com	2020-05-09 17:05:26 -07:00
Yonghong Song	ae24345da5	bpf: Implement an interface to register bpf_iter targets The target can call bpf_iter_reg_target() to register itself. The needed information: target: target name seq_ops: the seq_file operations for the target init_seq_private target callback to initialize seq_priv during file open fini_seq_private target callback to clean up seq_priv during file release seq_priv_size: the private_data size needed by the seq_file operations The target name represents a target which provides a seq_ops for iterating objects. The target can provide two callback functions, init_seq_private and fini_seq_private, called during file open/release time. For example, /proc/net/{tcp6, ipv6_route, netlink, ...}, net name space needs to be setup properly during file open and released properly during file release. Function bpf_iter_unreg_target() is also implemented to unregister a particular target. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175859.2474669-1-yhs@fb.com	2020-05-09 17:05:25 -07:00
Stanislav Fomichev	8086fbaf49	bpf: Allow any port in bpf_bind helper We want to have a tighter control on what ports we bind to in the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means connect() becomes slightly more expensive. The expensive part comes from the fact that we now need to call inet_csk_get_port() that verifies that the port is not used and allocates an entry in the hash table for it. Since we can't rely on "snum \|\| !bind_address_no_port" to prevent us from calling POST_BIND hook anymore, let's add another bind flag to indicate that the call site is BPF program. v5: * fix wrong AF_INET (should be AF_INET6) in the bpf program for v6 v3: * More bpf_bind documentation refinements (Martin KaFai Lau) * Add UDP tests as well (Martin KaFai Lau) * Don't start the thread, just do socket+bind+listen (Martin KaFai Lau) v2: * Update documentation (Andrey Ignatov) * Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-5-sdf@google.com	2020-05-09 00:48:20 +02:00
Stanislav Fomichev	cb0721c7e2	net: Refactor arguments of inet{,6}_bind The intent is to add an additional bind parameter in the next commit. Instead of adding another argument, let's convert all existing flag arguments into an extendable bit field. No functional changes. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-4-sdf@google.com	2020-05-09 00:48:20 +02:00
Stanislav Fomichev	488a23b89d	selftests/bpf: Move existing common networking parts into network_helpers 1. Move pkt_v4 and pkt_v6 into network_helpers and adjust the users. 2. Copy-paste spin_lock_thread into two tests that use it. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-3-sdf@google.com	2020-05-09 00:48:20 +02:00
Stanislav Fomichev	33181bb8e8	selftests/bpf: Generalize helpers to control background listener Move the following routines that let us start a background listener thread and connect to a server by fd to the test_prog: * start_server - socket+bind+listen * connect_to_fd - connect to the server identified by fd These will be used in the next commit. Also, extend these helpers to support AF_INET6 and accept the family as an argument. v5: * drop pthread.h (Martin KaFai Lau) * add SO_SNDTIMEO (Martin KaFai Lau) v4: * export extra helper to start server without a thread (Martin KaFai Lau) * tcp_rtt is no longer starting background thread (Martin KaFai Lau) v2: * put helpers into network_helpers.c (Andrii Nakryiko) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-2-sdf@google.com	2020-05-09 00:48:20 +02:00
Jason Yan	2b6c6f0716	bpf, i386: Remove unneeded conversion to bool The '==' expression itself is bool, no need to convert it to bool again. This fixes the following coccicheck warning: arch/x86/net/bpf_jit_comp32.c:1478:50-55: WARNING: conversion to bool not needed here arch/x86/net/bpf_jit_comp32.c:1479:50-55: WARNING: conversion to bool not needed here Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200506140352.37154-1-yanaijie@huawei.com	2020-05-07 16:29:14 +02:00
Alexei Starovoitov	f87b87a1c9	CAP_PERFMON for BPF -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6zMIUTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoTEOEACWUbZlswquZqSAr6pESTjAbjSPZ58s QfmiY218aXvyvwJeINfJduYIjtVjPL9F0qbqmm5Sh4CFflgF9QRUZLxsyuu6K7uY Bsy7hjHWUCO79anxgjie1rqhCxT/orW39E0nlDW72AlrebVwPRc4PmERsV9bl/9z Z7M7aJzza60938K8qN24A63Q4wb3uCfygUYDGkeaN46jmwlHLj8Qwu/L2pVv2/3O 7FHEHYFK0UuvVw6byRpbPoHDbBEGApYszRlEUMRtkk/7zsvdIQJGHQpPMD5PkGY1 kS6x1a7sdnA7++A7Oin7Uq+0y7sgVNeJyPO9o+u9wZgePZL4t87YZlwIL5NlyXIL JHDrz0DAckSYEAYuPnFrXtYW2AQ9TZxVuIHGsJdKW8KOdkHkYUqtXwLuj+0xf8v/ szeJR+l6mOJzDHML7W7KzZdl+AJB/+GE55cLaRPs0bBFyLpUs/vL8BjkoDiDm3/i okGIWQkh8PwpujiS/mHDHqoXuVpVHYAcHD0X+zuLIUVCzKf71Kq7y2fIiHyTR4Wo +x5aeOFWHYOC2DwFdUQ1EiOUCtLbORqq1CDsnIE4KCtsfx1K6IHmqI8X77D8CTEp oSPz0kd8kgBfHwPhCepQ7DnBA1cJTiRbs6/++frU0S5R2HhIOGeLN6hrhrMjNhYD 07jorSqL6hjjog== =+J1X -----END PGP SIGNATURE----- Merge tag 'perf-for-bpf-2020-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into bpf-next CAP_PERFMON for BPF	2020-05-06 17:12:44 -07:00
Daniel Borkmann	a085a1eeea	Merge branch 'bpf-rv64-jit' Luke Nelson says: ==================== This patch series introduces a set of optimizations to the BPF JIT on RV64. The optimizations are related to the verifier zero-extension optimization and BPF_JMP BPF_K. We tested the optimizations on a QEMU riscv64 virt machine, using lib/test_bpf and test_verifier, and formally verified their correctness using Serval. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2020-05-06 09:48:43 +02:00
Luke Nelson	073ca6a036	bpf, riscv: Optimize BPF_JSET BPF_K using andi on RV64 This patch optimizes BPF_JSET BPF_K by using a RISC-V andi instruction when the BPF immediate fits in 12 bits, instead of first loading the immediate to a temporary register. Examples of generated code with and without this optimization: BPF_JMP_IMM(BPF_JSET, R1, 2, 1) without optimization: 20: li t1,2 24: and t1,a0,t1 28: bnez t1,0x30 BPF_JMP_IMM(BPF_JSET, R1, 2, 1) with optimization: 20: andi t1,a0,2 24: bnez t1,0x2c BPF_JMP32_IMM(BPF_JSET, R1, 2, 1) without optimization: 20: li t1,2 24: mv t2,a0 28: slli t2,t2,0x20 2c: srli t2,t2,0x20 30: slli t1,t1,0x20 34: srli t1,t1,0x20 38: and t1,t2,t1 3c: bnez t1,0x44 BPF_JMP32_IMM(BPF_JSET, R1, 2, 1) with optimization: 20: andi t1,a0,2 24: bnez t1,0x2c In these examples, because the upper 32 bits of the sign-extended immediate are 0, BPF_JMP BPF_JSET and BPF_JMP32 BPF_JSET are equivalent and therefore the JIT produces identical code for them. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200506000320.28965-5-luke.r.nels@gmail.com	2020-05-06 09:48:15 +02:00
Luke Nelson	ca349a6a10	bpf, riscv: Optimize BPF_JMP BPF_K when imm == 0 on RV64 This patch adds an optimization to BPF_JMP (32- and 64-bit) BPF_K for when the BPF immediate is zero. When the immediate is zero, the code can directly use the RISC-V zero register instead of loading a zero immediate to a temporary register first. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200506000320.28965-4-luke.r.nels@gmail.com	2020-05-06 09:48:15 +02:00
Luke Nelson	21a099abb7	bpf, riscv: Optimize FROM_LE using verifier_zext on RV64 This patch adds two optimizations for BPF_ALU BPF_END BPF_FROM_LE in the RV64 BPF JIT. First, it enables the verifier zero-extension optimization to avoid zero extension when imm == 32. Second, it avoids generating code for imm == 64, since it is equivalent to a no-op. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200506000320.28965-3-luke.r.nels@gmail.com	2020-05-06 09:48:15 +02:00
Luke Nelson	0224b2acea	bpf, riscv: Enable missing verifier_zext optimizations on RV64 Commit `66d0d5a854` ("riscv: bpf: eliminate zero extension code-gen") added support for the verifier zero-extension optimization on RV64 and commit `46dd3d7d28` ("bpf, riscv: Enable zext optimization for more RV64G ALU ops") enabled it for more instruction cases. However, BPF_LSH BPF_X and BPF_{LSH,RSH,ARSH} BPF_K are still missing the optimization. This patch enables the zero-extension optimization for these remaining cases. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200506000320.28965-2-luke.r.nels@gmail.com	2020-05-06 09:48:15 +02:00

1 2 3 4 5 ...

916936 Commits All Branches Search

916936 Commits

All Branches