OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Nikolay Aleksandrov	34d7ecb3d4	selftests: net: bridge: update IGMP/MLD membership interval value When I fixed IGMPv3/MLDv2 to use the bridge's multicast_membership_interval value which is chosen by user-space instead of calculating it based on multicast_query_interval and multicast_query_response_interval I forgot to update the selftests relying on that behaviour. Now we have to manually set the expected GMI value to perform the tests correctly and get proper results (similar to IGMPv2 behaviour). Fixes: `fac3cb82a5` ("net: bridge: mcast: use multicast_membership_interval for IGMPv3") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-29 13:58:21 +01:00
Shuah Khan	e300a85db1	selftests/net: update .gitignore with newly added tests Update .gitignore with newly added tests: tools/testing/selftests/net/af_unix/test_unix_oob tools/testing/selftests/net/gro tools/testing/selftests/net/ioam6_parser tools/testing/selftests/net/toeplitz Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-29 13:14:58 +01:00
Petr Machata	2b11e24eba	selftests: mlxsw: Test port shaper TBF can be used as a root qdisc, in which case it is supposed to configure port shaper. Add a test that verifies that this is so by installing a root TBF with a ETS or PRIO below it, and then expecting individual bands to all be shaped according to the root TBF configuration. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-28 19:47:50 -07:00
Petr Machata	3d5290ea1d	selftests: mlxsw: Test offloadability of root TBF TBF can be used as a root qdisc, with the usual ETS/RED/TBF hierarchy below it. This use should now be offloaded. Add a test that verifies that it is. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-28 19:47:49 -07:00
David Yang	9c7516d669	tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer The coccinelle check report: ./tools/testing/selftests/vm/split_huge_page_test.c:344:36-42: ERROR: application of sizeof to pointer Use "strlen" to fix it. Link: https://lkml.kernel.org/r/20211012030116.184027-1-davidcomponentone@gmail.com Signed-off-by: David Yang <davidcomponentone@gmail.com> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: Zi Yan <ziy@nvidia.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-10-28 17:18:55 -07:00
Kumar Kartikeya Dwivedi	efadf2ad17	selftests/bpf: Fix memory leak in test_ima The allocated ring buffer is never freed, do so in the cleanup path. Fixes: `f446b570ac` ("bpf/selftests: Update the IMA test to use BPF ring buffer") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211028063501.2239335-9-memxor@gmail.com	2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi	c3fc706e94	selftests/bpf: Fix fd cleanup in sk_lookup test Similar to the fix in commit: `e31eec77e4` ("bpf: selftests: Fix fd cleanup in get_branch_snapshot") We use designated initializer to set fds to -1 without breaking on future changes to MAX_SERVER constant denoting the array size. The particular close(0) occurs on non-reuseport tests, so it can be seen with -n 115/{2,3} but not 115/4. This can cause problems with future tests if they depend on BTF fd never being acquired as fd 0, breaking internal libbpf assumptions. Fixes: `0ab5539f85` ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211028063501.2239335-8-memxor@gmail.com	2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi	087cba799c	selftests/bpf: Add weak/typeless ksym test for light skeleton Also, avoid using CO-RE features, as lskel doesn't support CO-RE, yet. Include both light and libbpf skeleton in same file to test both of them together. In `c48e51c8b0` ("bpf: selftests: Add selftests for module kfunc support"), I added support for generating both lskel and libbpf skel for a BPF object, however the name parameter for bpftool caused collisions when included in same file together. This meant that every test needed a separate file for a libbpf/light skeleton separation instead of subtests. Change that by appending a "_lskel" suffix to the name for files using light skeleton, and convert all existing users. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211028063501.2239335-7-memxor@gmail.com	2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi	92274e24b0	libbpf: Use O_CLOEXEC uniformly when opening fds There are some instances where we don't use O_CLOEXEC when opening an fd, fix these up. Otherwise, it is possible that a parallel fork causes these fds to leak into a child process on execve. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211028063501.2239335-6-memxor@gmail.com	2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi	549a632386	libbpf: Ensure that BPF syscall fds are never 0, 1, or 2 Add a simple wrapper for passing an fd and getting a new one >= 3 if it is one of 0, 1, or 2. There are two primary reasons to make this change: First, libbpf relies on the assumption a certain BPF fd is never 0 (e.g. most recently noticed in [0]). Second, Alexei pointed out in [1] that some environments reset stdin, stdout, and stderr if they notice an invalid fd at these numbers. To protect against both these cases, switch all internal BPF syscall wrappers in libbpf to always return an fd >= 3. We only need to modify the syscall wrappers and not other code that assumes a valid fd by doing >= 0, to avoid pointless churn, and because it is still a valid assumption. The cost paid is two additional syscalls if fd is in range [0, 2]. [0]: `e31eec77e4` ("bpf: selftests: Fix fd cleanup in get_branch_snapshot") [1]: https://lore.kernel.org/bpf/CAADnVQKVKY8o_3aU8Gzke443+uHa-eGoM0h7W4srChMXU1S4Bg@mail.gmail.com Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211028063501.2239335-5-memxor@gmail.com	2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi	585a357198	libbpf: Add weak ksym support to gen_loader This extends existing ksym relocation code to also support relocating weak ksyms. Care needs to be taken to zero out the src_reg (currently BPF_PSEUOD_BTF_ID, always set for gen_loader by bpf_object__relocate_data) when the BTF ID lookup fails at runtime. This is not a problem for libbpf as it only sets ext->is_set when BTF ID lookup succeeds (and only proceeds in case of failure if ext->is_weak, leading to src_reg remaining as 0 for weak unresolved ksym). A pattern similar to emit_relo_kfunc_btf is followed of first storing the default values and then jumping over actual stores in case of an error. For src_reg adjustment, we also need to perform it when copying the populated instruction, so depending on if copied insn[0].imm is 0 or not, we decide to jump over the adjustment. We cannot reach that point unless the ksym was weak and resolved and zeroed out, as the emit_check_err will cause us to jump to cleanup label, so we do not need to recheck whether the ksym is weak before doing the adjustment after copying BTF ID and BTF FD. This is consistent with how libbpf relocates weak ksym. Logging statements are added to show the relocation result and aid debugging. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211028063501.2239335-4-memxor@gmail.com	2021-10-28 16:30:06 -07:00
Kumar Kartikeya Dwivedi	c24941cd37	libbpf: Add typeless ksym support to gen_loader This uses the bpf_kallsyms_lookup_name helper added in previous patches to relocate typeless ksyms. The return value ENOENT can be ignored, and the value written to 'res' can be directly stored to the insn, as it is overwritten to 0 on lookup failure. For repeating symbols, we can simply copy the previously populated bpf_insn. Also, we need to take care to not close fds for typeless ksym_desc, so reuse the 'off' member's space to add a marker for typeless ksym and use that to skip them in cleanup_relos. We add a emit_ksym_relo_log helper that avoids duplicating common logging instructions between typeless and weak ksym (for future commit). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211028063501.2239335-3-memxor@gmail.com	2021-10-28 16:30:06 -07:00
Kumar Kartikeya Dwivedi	d6aef08a87	bpf: Add bpf_kallsyms_lookup_name helper This helper allows us to get the address of a kernel symbol from inside a BPF_PROG_TYPE_SYSCALL prog (used by gen_loader), so that we can relocate typeless ksym vars. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211028063501.2239335-2-memxor@gmail.com	2021-10-28 16:30:06 -07:00
Peter Zijlstra	134ab5bd18	objtool,x86: Replace alternatives with .retpoline_sites Instead of writing complete alternatives, simply provide a list of all the retpoline thunk calls. Then the kernel is free to do with them as it pleases. Simpler code all-round. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120309.850007165@infradead.org	2021-10-28 23:25:25 +02:00
Peter Zijlstra	c509331b41	objtool: Shrink struct instruction Any one instruction can only ever call a single function, therefore insn->mcount_loc_node is superfluous and can use insn->call_node. This shrinks struct instruction, which is by far the most numerous structure objtool creates. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120309.785456706@infradead.org	2021-10-28 23:25:25 +02:00
Peter Zijlstra	dd003edeff	objtool: Explicitly avoid self modifying code in .altinstr_replacement Assume ALTERNATIVE()s know what they're doing and do not change, or cause to change, instructions in .altinstr_replacement sections. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120309.722511775@infradead.org	2021-10-28 23:25:25 +02:00
Peter Zijlstra	1739c66eb7	objtool: Classify symbols In order to avoid calling str*cmp() on symbol names, over and over, do them all once upfront and store the result. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211026120309.658539311@infradead.org	2021-10-28 23:25:24 +02:00
Joanne Koong	f44bc543a0	bpf/benchs: Add benchmarks for comparing hashmap lookups w/ vs. w/out bloom filter This patch adds benchmark tests for comparing the performance of hashmap lookups without the bloom filter vs. hashmap lookups with the bloom filter. Checking the bloom filter first for whether the element exists should overall enable a higher throughput for hashmap lookups, since if the element does not exist in the bloom filter, we can avoid a costly lookup in the hashmap. On average, using 5 hash functions in the bloom filter tended to perform the best across the widest range of different entry sizes. The benchmark results using 5 hash functions (running on 8 threads on a machine with one numa node, and taking the average of 3 runs) were roughly as follows: value_size = 4 bytes - 10k entries: 30% faster 50k entries: 40% faster 100k entries: 40% faster 500k entres: 70% faster 1 million entries: 90% faster 5 million entries: 140% faster value_size = 8 bytes - 10k entries: 30% faster 50k entries: 40% faster 100k entries: 50% faster 500k entres: 80% faster 1 million entries: 100% faster 5 million entries: 150% faster value_size = 16 bytes - 10k entries: 20% faster 50k entries: 30% faster 100k entries: 35% faster 500k entres: 65% faster 1 million entries: 85% faster 5 million entries: 110% faster value_size = 40 bytes - 10k entries: 5% faster 50k entries: 15% faster 100k entries: 20% faster 500k entres: 65% faster 1 million entries: 75% faster 5 million entries: 120% faster Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211027234504.30744-6-joannekoong@fb.com	2021-10-28 13:22:49 -07:00
Joanne Koong	57fd1c63c9	bpf/benchs: Add benchmark tests for bloom filter throughput + false positive This patch adds benchmark tests for the throughput (for lookups + updates) and the false positive rate of bloom filter lookups, as well as some minor refactoring of the bash script for running the benchmarks. These benchmarks show that as the number of hash functions increases, the throughput and the false positive rate of the bloom filter decreases. >From the benchmark data, the approximate average false-positive rates are roughly as follows: 1 hash function = ~30% 2 hash functions = ~15% 3 hash functions = ~5% 4 hash functions = ~2.5% 5 hash functions = ~1% 6 hash functions = ~0.5% 7 hash functions = ~0.35% 8 hash functions = ~0.15% 9 hash functions = ~0.1% 10 hash functions = ~0% For reference data, the benchmarks run on one thread on a machine with one numa node for 1 to 5 hash functions for 8-byte and 64-byte values are as follows: 1 hash function: 50k entries 8-byte value Lookups - 51.1 M/s operations Updates - 33.6 M/s operations False positive rate: 24.15% 64-byte value Lookups - 15.7 M/s operations Updates - 15.1 M/s operations False positive rate: 24.2% 100k entries 8-byte value Lookups - 51.0 M/s operations Updates - 33.4 M/s operations False positive rate: 24.04% 64-byte value Lookups - 15.6 M/s operations Updates - 14.6 M/s operations False positive rate: 24.06% 500k entries 8-byte value Lookups - 50.5 M/s operations Updates - 33.1 M/s operations False positive rate: 27.45% 64-byte value Lookups - 15.6 M/s operations Updates - 14.2 M/s operations False positive rate: 27.42% 1 mil entries 8-byte value Lookups - 49.7 M/s operations Updates - 32.9 M/s operations False positive rate: 27.45% 64-byte value Lookups - 15.4 M/s operations Updates - 13.7 M/s operations False positive rate: 27.58% 2.5 mil entries 8-byte value Lookups - 47.2 M/s operations Updates - 31.8 M/s operations False positive rate: 30.94% 64-byte value Lookups - 15.3 M/s operations Updates - 13.2 M/s operations False positive rate: 30.95% 5 mil entries 8-byte value Lookups - 41.1 M/s operations Updates - 28.1 M/s operations False positive rate: 31.01% 64-byte value Lookups - 13.3 M/s operations Updates - 11.4 M/s operations False positive rate: 30.98% 2 hash functions: 50k entries 8-byte value Lookups - 34.1 M/s operations Updates - 20.1 M/s operations False positive rate: 9.13% 64-byte value Lookups - 8.4 M/s operations Updates - 7.9 M/s operations False positive rate: 9.21% 100k entries 8-byte value Lookups - 33.7 M/s operations Updates - 18.9 M/s operations False positive rate: 9.13% 64-byte value Lookups - 8.4 M/s operations Updates - 7.7 M/s operations False positive rate: 9.19% 500k entries 8-byte value Lookups - 32.7 M/s operations Updates - 18.1 M/s operations False positive rate: 12.61% 64-byte value Lookups - 8.4 M/s operations Updates - 7.5 M/s operations False positive rate: 12.61% 1 mil entries 8-byte value Lookups - 30.6 M/s operations Updates - 18.9 M/s operations False positive rate: 12.54% 64-byte value Lookups - 8.0 M/s operations Updates - 7.0 M/s operations False positive rate: 12.52% 2.5 mil entries 8-byte value Lookups - 25.3 M/s operations Updates - 16.7 M/s operations False positive rate: 16.77% 64-byte value Lookups - 7.9 M/s operations Updates - 6.5 M/s operations False positive rate: 16.88% 5 mil entries 8-byte value Lookups - 20.8 M/s operations Updates - 14.7 M/s operations False positive rate: 16.78% 64-byte value Lookups - 7.0 M/s operations Updates - 6.0 M/s operations False positive rate: 16.78% 3 hash functions: 50k entries 8-byte value Lookups - 25.1 M/s operations Updates - 14.6 M/s operations False positive rate: 7.65% 64-byte value Lookups - 5.8 M/s operations Updates - 5.5 M/s operations False positive rate: 7.58% 100k entries 8-byte value Lookups - 24.7 M/s operations Updates - 14.1 M/s operations False positive rate: 7.71% 64-byte value Lookups - 5.8 M/s operations Updates - 5.3 M/s operations False positive rate: 7.62% 500k entries 8-byte value Lookups - 22.9 M/s operations Updates - 13.9 M/s operations False positive rate: 2.62% 64-byte value Lookups - 5.6 M/s operations Updates - 4.8 M/s operations False positive rate: 2.7% 1 mil entries 8-byte value Lookups - 19.8 M/s operations Updates - 12.6 M/s operations False positive rate: 2.60% 64-byte value Lookups - 5.3 M/s operations Updates - 4.4 M/s operations False positive rate: 2.69% 2.5 mil entries 8-byte value Lookups - 16.2 M/s operations Updates - 10.7 M/s operations False positive rate: 4.49% 64-byte value Lookups - 4.9 M/s operations Updates - 4.1 M/s operations False positive rate: 4.41% 5 mil entries 8-byte value Lookups - 18.8 M/s operations Updates - 9.2 M/s operations False positive rate: 4.45% 64-byte value Lookups - 5.2 M/s operations Updates - 3.9 M/s operations False positive rate: 4.54% 4 hash functions: 50k entries 8-byte value Lookups - 19.7 M/s operations Updates - 11.1 M/s operations False positive rate: 1.01% 64-byte value Lookups - 4.4 M/s operations Updates - 4.0 M/s operations False positive rate: 1.00% 100k entries 8-byte value Lookups - 19.5 M/s operations Updates - 10.9 M/s operations False positive rate: 1.00% 64-byte value Lookups - 4.3 M/s operations Updates - 3.9 M/s operations False positive rate: 0.97% 500k entries 8-byte value Lookups - 18.2 M/s operations Updates - 10.6 M/s operations False positive rate: 2.05% 64-byte value Lookups - 4.3 M/s operations Updates - 3.7 M/s operations False positive rate: 2.05% 1 mil entries 8-byte value Lookups - 15.5 M/s operations Updates - 9.6 M/s operations False positive rate: 1.99% 64-byte value Lookups - 4.0 M/s operations Updates - 3.4 M/s operations False positive rate: 1.99% 2.5 mil entries 8-byte value Lookups - 13.8 M/s operations Updates - 7.7 M/s operations False positive rate: 3.91% 64-byte value Lookups - 3.7 M/s operations Updates - 3.6 M/s operations False positive rate: 3.78% 5 mil entries 8-byte value Lookups - 13.0 M/s operations Updates - 6.9 M/s operations False positive rate: 3.93% 64-byte value Lookups - 3.5 M/s operations Updates - 3.7 M/s operations False positive rate: 3.39% 5 hash functions: 50k entries 8-byte value Lookups - 16.4 M/s operations Updates - 9.1 M/s operations False positive rate: 0.78% 64-byte value Lookups - 3.5 M/s operations Updates - 3.2 M/s operations False positive rate: 0.77% 100k entries 8-byte value Lookups - 16.3 M/s operations Updates - 9.0 M/s operations False positive rate: 0.79% 64-byte value Lookups - 3.5 M/s operations Updates - 3.2 M/s operations False positive rate: 0.78% 500k entries 8-byte value Lookups - 15.1 M/s operations Updates - 8.8 M/s operations False positive rate: 1.82% 64-byte value Lookups - 3.4 M/s operations Updates - 3.0 M/s operations False positive rate: 1.78% 1 mil entries 8-byte value Lookups - 13.2 M/s operations Updates - 7.8 M/s operations False positive rate: 1.81% 64-byte value Lookups - 3.2 M/s operations Updates - 2.8 M/s operations False positive rate: 1.80% 2.5 mil entries 8-byte value Lookups - 10.5 M/s operations Updates - 5.9 M/s operations False positive rate: 0.29% 64-byte value Lookups - 3.2 M/s operations Updates - 2.4 M/s operations False positive rate: 0.28% 5 mil entries 8-byte value Lookups - 9.6 M/s operations Updates - 5.7 M/s operations False positive rate: 0.30% 64-byte value Lookups - 3.2 M/s operations Updates - 2.7 M/s operations False positive rate: 0.30% Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211027234504.30744-5-joannekoong@fb.com	2021-10-28 13:22:49 -07:00
Joanne Koong	ed9109ad64	selftests/bpf: Add bloom filter map test cases This patch adds test cases for bpf bloom filter maps. They include tests checking against invalid operations by userspace, tests for using the bloom filter map as an inner map, and a bpf program that queries the bloom filter map for values added by a userspace program. Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211027234504.30744-4-joannekoong@fb.com	2021-10-28 13:22:49 -07:00
Joanne Koong	47512102cd	libbpf: Add "map_extra" as a per-map-type extra flag This patch adds the libbpf infrastructure for supporting a per-map-type "map_extra" field, whose definition will be idiosyncratic depending on map type. For example, for the bloom filter map, the lower 4 bits of map_extra is used to denote the number of hash functions. Please note that until libbpf 1.0 is here, the "bpf_create_map_params" struct is used as a temporary means for propagating the map_extra field to the kernel. Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211027234504.30744-3-joannekoong@fb.com	2021-10-28 13:22:49 -07:00
Joanne Koong	9330986c03	bpf: Add bloom filter map implementation This patch adds the kernel-side changes for the implementation of a bpf bloom filter map. The bloom filter map supports peek (determining whether an element is present in the map) and push (adding an element to the map) operations.These operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_UPDATE_ELEM -> push The bloom filter map does not have keys, only values. In light of this, the bloom filter map's API matches that of queue stack maps: user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM which correspond internally to bpf_map_peek_elem/bpf_map_push_elem, and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem APIs to query or add an element to the bloom filter map. When the bloom filter map is created, it must be created with a key_size of 0. For updates, the user will pass in the element to add to the map as the value, with a NULL key. For lookups, the user will pass in the element to query in the map as the value, with a NULL key. In the verifier layer, this requires us to modify the argument type of a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in the syscall layer, we need to copy over the user value so that in bpf_map_peek_elem, we know which specific value to query. A few things to please take note of: * If there are any concurrent lookups + updates, the user is responsible for synchronizing this to ensure no false negative lookups occur. * The number of hashes to use for the bloom filter is configurable from userspace. If no number is specified, the default used will be 5 hash functions. The benchmarks later in this patchset can help compare the performance of using different number of hashes on different entry sizes. In general, using more hashes decreases both the false positive rate and the speed of a lookup. * Deleting an element in the bloom filter map is not supported. * The bloom filter map may be used as an inner map. * The "max_entries" size that is specified at map creation time is used to approximate a reasonable bitmap size for the bloom filter, and is not otherwise strictly enforced. If the user wishes to insert more entries into the bloom filter than "max_entries", they may do so but they should be aware that this may lead to a higher false positive rate. Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com	2021-10-28 13:22:49 -07:00
Jakub Kicinski	7df621a3ee	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/net/sock.h `7b50ecfcc6` ("net: Rename ->stream_memory_read to ->sock_is_readable") `4c1e34c0db` ("vsock: Enable y2038 safe timeval for timeout") drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c `0daa55d033` ("octeontx2-af: cn10k: debugfs for dumping LMTST map table") `e77bcdd1f6` ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.") Adjacent code addition in both cases, keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-28 10:43:58 -07:00
Chang S. Bae	101c669d16	selftests/x86/amx: Add context switch test XSAVE state is thread-local. The kernel switches between thread state at context switch time. Generally, running a selftest for a while will naturally expose it to some context switching and and will test the XSAVE code. Instead of just hoping that the tests get context-switched at random times, force context-switches on purpose. Spawn off a few userspace threads and force context-switches between them. Ensure that the kernel correctly context switches each thread's unique AMX state. [ dhansen: bunches of cleanups ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211026122525.6EFD5758@davehans-spike.ostc.intel.com	2021-10-28 14:35:27 +02:00
Chang S. Bae	6a3e0651b4	selftests/x86/amx: Add test cases for AMX state management AMX TILEDATA is a very large XSAVE feature. It could have caused nasty XSAVE buffer space waste in two places: * Signal stacks * Kernel task_struct->fpu buffers To avoid this waste, neither of these buffers have AMX state by default. The non-default features are called "dynamic" features. There is an arch_prctl(ARCH_REQ_XCOMP_PERM) which allows a task to declare that it wants to use AMX or other "dynamic" XSAVE features. This arch_prctl() ensures that sufficient sigaltstack space is available before it will succeed. It also expands the task_struct buffer. Functions of this test: * Test arch_prctl(ARCH_REQ_XCOMP_PERM). Ensure that it checks for proper sigaltstack sizing and that the sizing is enforced for future sigaltstack calls. * Ensure that ARCH_REQ_XCOMP_PERM is inherited across fork() * Ensure that TILEDATA use before the prctl() is fatal * Ensure that TILEDATA is cleared across fork() Note: Generally, compiler support is needed to do something with AMX. Instead, directly load AMX state from userspace with a plain XSAVE. Do not depend on the compiler. [ dhansen: bunches of cleanups ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211026122524.7BEDAA95@davehans-spike.ostc.intel.com	2021-10-28 14:35:27 +02:00
Yucong Sun	e1ef62a4dd	selftests/bpf: Adding a namespace reset for tc_redirect This patch delete ns_src/ns_dst/ns_redir namespaces before recreating them, making the test more robust. Signed-off-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211025223345.2136168-5-fallentree@fb.com	2021-10-27 11:59:02 -07:00
Yucong Sun	9e7240fb2d	selftests/bpf: Fix attach_probe in parallel mode This patch makes attach_probe uses its own method as attach point, avoiding conflict with other tests like bpf_cookie. Signed-off-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211025223345.2136168-4-fallentree@fb.com	2021-10-27 11:59:02 -07:00
Yucong Sun	547208a386	selfetests/bpf: Update vmtest.sh defaults Increase memory to 4G, 8 SMP core with host cpu passthrough. This make it run faster in parallel mode and more likely to succeed. Signed-off-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211025223345.2136168-2-fallentree@fb.com	2021-10-27 11:58:44 -07:00
Joe Burton	689624f037	libbpf: Deprecate bpf_objects_list Add a flag to `enum libbpf_strict_mode' to disable the global `bpf_objects_list', preventing race conditions when concurrent threads call bpf_object__open() or bpf_object__close(). bpf_object__next() will return NULL if this option is set. Callers may achieve the same workflow by tracking bpf_objects in application code. [0] Closes: https://github.com/libbpf/libbpf/issues/293 Signed-off-by: Joe Burton <jevburton@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211026223528.413950-1-jevburton.kernel@gmail.com	2021-10-27 11:00:12 -07:00
Masami Hiramatsu	25b9513872	selftests/ftrace: Stop tracing while reading the trace file by default Stop tracing while reading the trace file by default, to prevent the test results while checking it and to avoid taking a long time to check the result. If there is any testcase which wants to test the tracing while reading the trace file, please override this setting inside the test case. This also recovers the pause-on-trace when clean it up. Link: https://lkml.kernel.org/r/163529053143.690749.15365238954175942026.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-10-26 20:27:19 -04:00
Jakub Kicinski	440ffcdd9d	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2021-10-26 We've added 12 non-merge commits during the last 7 day(s) which contain a total of 23 files changed, 118 insertions(+), 98 deletions(-). The main changes are: 1) Fix potential race window in BPF tail call compatibility check, from Toke Høiland-Jørgensen. 2) Fix memory leak in cgroup fs due to missing cgroup_bpf_offline(), from Quanyang Wang. 3) Fix file descriptor reference counting in generic_map_update_batch(), from Xu Kuohai. 4) Fix bpf_jit_limit knob to the max supported limit by the arch's JIT, from Lorenz Bauer. 5) Fix BPF sockmap ->poll callbacks for UDP and AF_UNIX sockets, from Cong Wang and Yucong Sun. 6) Fix BPF sockmap concurrency issue in TCP on non-blocking sendmsg calls, from Liu Jian. 7) Fix build failure of INODE_STORAGE and TASK_STORAGE maps on !CONFIG_NET, from Tejun Heo. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix potential race in tail call compatibility check bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET selftests/bpf: Use recv_timeout() instead of retries net: Implement ->sock_is_readable() for UDP and AF_UNIX skmsg: Extract and reuse sk_msg_is_readable() net: Rename ->stream_memory_read to ->sock_is_readable tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function cgroup: Fix memory leak caused by missing cgroup_bpf_offline bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch() bpf: Prevent increasing bpf_jit_limit above max bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT bpf: Define bpf_jit_alloc_exec_limit for riscv JIT ==================== Link: https://lore.kernel.org/r/20211026201920.11296-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-26 14:38:55 -07:00
Yucong Sun	67b821502d	selftests/bpf: Use recv_timeout() instead of retries We use non-blocking sockets in those tests, retrying for EAGAIN is ugly because there is no upper bound for the packet arrival time, at least in theory. After we fix poll() on sockmap sockets, now we can switch to select()+recv(). Signed-off-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-5-xiyou.wangcong@gmail.com	2021-10-26 12:29:33 -07:00
Danielle Ratson	c24dbf3d4f	selftests: mlxsw: Remove deprecated test cases After adding the previous patches, the constraint that all the router interface MAC addresses have the same prefix is no longer relevant. Remove the test cases that validated that this constraint is honored. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-26 13:35:58 +01:00
Danielle Ratson	20d446db61	selftests: Add an occupancy test for RIF MAC profiles When all the RIF MAC profiles are in use, test that it is possible to change the MAC of a netdev (i.e., a RIF) when its MAC profile is not shared with other RIFs. Test that replacement fails when the MAC profile is shared. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-26 13:35:58 +01:00
Danielle Ratson	a10b7bacde	selftests: mlxsw: Add forwarding test for RIF MAC profiles Verify that MAC profile changes are indeed applied and that packets are forwarded with the correct source MAC. Output example: $ ./rif_mac_profiles.sh TEST: h1->h2: new mac profile [ OK ] TEST: h2->h1: new mac profile [ OK ] TEST: h1->h2: edit mac profile [ OK ] TEST: h2->h1: edit mac profile [ OK ] Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-26 13:35:58 +01:00
Danielle Ratson	152f98e7c5	selftests: mlxsw: Add a scale test for RIF MAC profiles Query the maximum number of supported RIF MAC profiles using devlink-resource and verify that all available MAC profiles can be utilized and that an error is generated when user space tries to exceed this number. Output example in Spectrum-2: $ TESTS='rif_mac_profile' ./resource_scale.sh TEST: 'rif_mac_profile' 4 [ OK ] TEST: 'rif_mac_profile' overflow 5 [ OK ] Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-26 13:35:58 +01:00
Song Liu	20d1b54a52	selftests/bpf: Guess function end for test_get_branch_snapshot Function in modules could appear in /proc/kallsyms in random order. ffffffffa02608a0 t bpf_testmod_loop_test ffffffffa02600c0 t __traceiter_bpf_testmod_test_writable_bare ffffffffa0263b60 d __tracepoint_bpf_testmod_test_write_bare ffffffffa02608c0 T bpf_testmod_test_read ffffffffa0260d08 t __SCT__tp_func_bpf_testmod_test_writable_bare ffffffffa0263300 d __SCK__tp_func_bpf_testmod_test_read ffffffffa0260680 T bpf_testmod_test_write ffffffffa0260860 t bpf_testmod_test_mod_kfunc Therefore, we cannot reliably use kallsyms_find_next() to find the end of a function. Replace it with a simple guess (start + 128). This is good enough for this test. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022234814.318457-1-songliubraving@fb.com	2021-10-25 21:43:05 -07:00
Song Liu	b4e8707276	selftests/bpf: Skip all serial_test_get_branch_snapshot in vm Skipping the second half of the test is not enough to silent the warning in dmesg. Skip the whole test before we can either properly silent the warning in kernel, or fix LBR snapshot for VM. Fixes: `025bd7c753` ("selftests/bpf: Add test for bpf_get_branch_snapshot") Fixes: `aa67fdb464` ("selftests/bpf: Skip the second half of get_branch_snapshot in vm") Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211026000733.477714-1-songliubraving@fb.com	2021-10-25 20:41:00 -07:00
Ilya Leoshkevich	2e2c6d3fb3	selftests/bpf: Fix test_core_reloc_mods on big-endian machines This is the same as commit `d164dd9a5c` ("selftests/bpf: Fix test_core_autosize on big-endian machines"), but for test_core_reloc_mods. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211026010831.748682-7-iii@linux.ibm.com	2021-10-25 20:39:42 -07:00
Ilya Leoshkevich	3e7ed9cebb	selftests/seccomp: Use __BYTE_ORDER__ Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined __BYTE_ORDER for consistency. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211026010831.748682-6-iii@linux.ibm.com	2021-10-25 20:39:42 -07:00
Ilya Leoshkevich	06fca841fb	selftests/bpf: Use __BYTE_ORDER__ Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined __BYTE_ORDER for consistency. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211026010831.748682-4-iii@linux.ibm.com	2021-10-25 20:39:42 -07:00
Ilya Leoshkevich	3930198dc9	libbpf: Use __BYTE_ORDER__ Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined __BYTE_ORDER for consistency. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211026010831.748682-3-iii@linux.ibm.com	2021-10-25 20:39:41 -07:00
Ilya Leoshkevich	45f2bebc80	libbpf: Fix endianness detection in BPF_CORE_READ_BITFIELD_PROBED() __BYTE_ORDER is supposed to be defined by a libc, and __BYTE_ORDER__ - by a compiler. bpf_core_read.h checks __BYTE_ORDER == __LITTLE_ENDIAN, which is true if neither are defined, leading to incorrect behavior on big-endian hosts if libc headers are not included, which is often the case. Fixes: `ee26dade0e` ("libbpf: Add support for relocatable bitfields") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211026010831.748682-2-iii@linux.ibm.com	2021-10-25 20:39:41 -07:00
Viktor Rosendahl	f604de20c0	tools/latency-collector: Use correct size when writing queue_full_warning queue_full_warning is a pointer, so it is wrong to use sizeof to calculate the number of characters of the string it points to. The effect is that we only print out the first few characters of the warning string. The correct way is to use strlen(). We don't need to add 1 to the strlen() because we don't want to write the terminating null character to stdout. Link: https://lkml.kernel.org/r/20211019160701.15587-1-Viktor.Rosendahl@bmw.de Link: https://lore.kernel.org/r/8fd4bb65ef3da67feac9ce3258cdbe9824752cf1.1629198502.git.jing.yangyang@zte.com.cn Link: https://lore.kernel.org/r/20211012025424.180781-1-davidcomponentone@gmail.com Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Viktor Rosendahl <Viktor.Rosendahl@bmw.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-10-25 22:27:19 -04:00
Andrii Nakryiko	c4813e969a	libbpf: Deprecate ambiguously-named bpf_program__size() API The name of the API doesn't convey clearly that this size is in number of bytes (there needed to be a separate comment to make this clear in libbpf.h). Further, measuring the size of BPF program in bytes is not exactly the best fit, because BPF programs always consist of 8-byte instructions. As such, bpf_program__insn_cnt() is a better alternative in pretty much any imaginable case. So schedule bpf_program__size() deprecation starting from v0.7 and it will be removed in libbpf 1.0. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211025224531.1088894-5-andrii@kernel.org	2021-10-25 18:37:21 -07:00
Andrii Nakryiko	e21d585cb3	libbpf: Deprecate multi-instance bpf_program APIs Schedule deprecation of a set of APIs that are related to multi-instance bpf_programs: - bpf_program__set_prep() ([0]); - bpf_program__{set,unset}_instance() ([1]); - bpf_program__nth_fd(). These APIs are obscure, very niche, and don't seem to be used much in practice. bpf_program__set_prep() is pretty useless for anything but the simplest BPF programs, as it doesn't allow to adjust BPF program load attributes, among other things. In short, it already bitrotted and will bitrot some more if not removed. With bpf_program__insns() API, which gives access to post-processed BPF program instructions of any given entry-point BPF program, it's now possible to do whatever necessary adjustments were possible with set_prep() API before, but also more. Given any such use case is automatically an advanced use case, requiring users to stick to low-level bpf_prog_load() APIs and managing their own prog FDs is reasonable. [0] Closes: https://github.com/libbpf/libbpf/issues/299 [1] Closes: https://github.com/libbpf/libbpf/issues/300 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211025224531.1088894-4-andrii@kernel.org	2021-10-25 18:37:21 -07:00
Andrii Nakryiko	65a7fa2e4e	libbpf: Add ability to fetch bpf_program's underlying instructions Add APIs providing read-only access to bpf_program BPF instructions ([0]). This is useful for diagnostics purposes, but it also allows a cleaner support for cloning BPF programs after libbpf did all the FD resolution and CO-RE relocations, subprog instructions appending, etc. Currently, cloning BPF program is possible only through hijacking a half-broken bpf_program__set_prep() API, which doesn't really work well for anything but most primitive programs. For instance, set_prep() API doesn't allow adjusting BPF program load parameters which are necessary for loading fentry/fexit BPF programs (the case where BPF program cloning is a necessity if doing some sort of mass-attachment functionality). Given bpf_program__set_prep() API is set to be deprecated, having a cleaner alternative is a must. libbpf internally already keeps track of linear array of struct bpf_insn, so it's not hard to expose it. The only gotcha is that libbpf previously freed instructions array during bpf_object load time, which would make this API much less useful overall, because in between bpf_object__open() and bpf_object__load() a lot of changes to instructions are done by libbpf. So this patch makes libbpf hold onto prog->insns array even after BPF program loading. I think this is a small price for added functionality and improved introspection of BPF program code. See retsnoop PR ([1]) for how it can be used in practice and code savings compared to relying on bpf_program__set_prep(). [0] Closes: https://github.com/libbpf/libbpf/issues/298 [1] https://github.com/anakryiko/retsnoop/pull/1 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211025224531.1088894-3-andrii@kernel.org	2021-10-25 18:37:21 -07:00
Andrii Nakryiko	de5d0dcef6	libbpf: Fix off-by-one bug in bpf_core_apply_relo() Fix instruction index validity check which has off-by-one error. Fixes: `3ee4f53355` ("libbpf: Split bpf_core_apply_relo() into bpf_program independent helper.") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211025224531.1088894-2-andrii@kernel.org	2021-10-25 18:37:21 -07:00
Quentin Monnet	d6699f8e0f	bpftool: Switch to libbpf's hashmap for PIDs/names references In order to show PIDs and names for processes holding references to BPF programs, maps, links, or BTF objects, bpftool creates hash maps to store all relevant information. This commit is part of a set that transitions from the kernel's hash map implementation to the one coming with libbpf. The motivation is to make bpftool less dependent of kernel headers, to ease the path to a potential out-of-tree mirror, like libbpf has. This is the third and final step of the transition, in which we convert the hash maps used for storing the information about the processes holding references to BPF objects (programs, maps, links, BTF), and at last we drop the inclusion of tools/include/linux/hashtable.h. Note: Checkpatch complains about the use of __weak declarations, and the missing empty lines after the bunch of empty function declarations when compiling without the BPF skeletons (none of these were introduced in this patch). We want to keep things as they are, and the reports should be safe to ignore. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-6-quentin@isovalent.com	2021-10-25 17:31:39 -07:00
Quentin Monnet	2828d0d75b	bpftool: Switch to libbpf's hashmap for programs/maps in BTF listing In order to show BPF programs and maps using BTF objects when the latter are being listed, bpftool creates hash maps to store all relevant items. This commit is part of a set that transitions from the kernel's hash map implementation to the one coming with libbpf. The motivation is to make bpftool less dependent of kernel headers, to ease the path to a potential out-of-tree mirror, like libbpf has. This commit focuses on the two hash maps used by bpftool when listing BTF objects to store references to programs and maps, and convert them to the libbpf's implementation. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-5-quentin@isovalent.com	2021-10-25 17:31:39 -07:00
Quentin Monnet	8f184732b6	bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects In order to show pinned paths for BPF programs, maps, or links when listing them with the "-f" option, bpftool creates hash maps to store all relevant paths under the bpffs. So far, it would rely on the kernel implementation (from tools/include/linux/hashtable.h). We can make bpftool rely on libbpf's implementation instead. The motivation is to make bpftool less dependent of kernel headers, to ease the path to a potential out-of-tree mirror, like libbpf has. This commit is the first step of the conversion: the hash maps for pinned paths for programs, maps, and links are converted to libbpf's hashmap.{c,h}. Other hash maps used for the PIDs of process holding references to BPF objects are left unchanged for now. On the build side, this requires adding a dependency to a second header internal to libbpf, and making it a dependency for the bootstrap bpftool version as well. The rest of the changes are a rather straightforward conversion. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-4-quentin@isovalent.com	2021-10-25 17:31:38 -07:00
Quentin Monnet	46241271d1	bpftool: Do not expose and init hash maps for pinned path in main.c BPF programs, maps, and links, can all be listed with their pinned paths by bpftool, when the "-f" option is provided. To do so, bpftool builds hash maps containing all pinned paths for each kind of objects. These three hash maps are always initialised in main.c, and exposed through main.h. There appear to be no particular reason to do so: we can just as well make them static to the files that need them (prog.c, map.c, and link.c respectively), and initialise them only when we want to show objects and the "-f" switch is provided. This may prevent unnecessary memory allocations if the implementation of the hash maps was to change in the future. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-3-quentin@isovalent.com	2021-10-25 17:31:38 -07:00
Quentin Monnet	8b6c46241c	bpftool: Remove Makefile dep. on $(LIBBPF) for $(LIBBPF_INTERNAL_HDRS) The dependency is only useful to make sure that the $(LIBBPF_HDRS_DIR) directory is created before we try to install locally the required libbpf internal header. Let's create this directory properly instead. This is in preparation of making $(LIBBPF_INTERNAL_HDRS) a dependency to the bootstrap bpftool version, in which case we want no dependency on $(LIBBPF). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211023205154.6710-2-quentin@isovalent.com	2021-10-25 17:31:38 -07:00
Andrii Nakryiko	3762a39ce8	selftests/bpf: Split out bpf_verif_scale selftests into multiple tests Instead of using subtests in bpf_verif_scale selftest, turn each scale sub-test into its own test. Each subtest is compltely independent and just reuses a bit of common test running logic, so the conversion is trivial. For convenience, keep all of BPF verifier scale tests in one file. This conversion shaves off a significant amount of time when running test_progs in parallel mode. E.g., just running scale tests (-t verif_scale): BEFORE ====== Summary: 24/0 PASSED, 0 SKIPPED, 0 FAILED real 0m22.894s user 0m0.012s sys 0m22.797s AFTER ===== Summary: 24/0 PASSED, 0 SKIPPED, 0 FAILED real 0m12.044s user 0m0.024s sys 0m27.869s Ten second saving right there. test_progs -j is not yet ready to be turned on by default, unfortunately, and some tests fail almost every time, but this is a good improvement nevertheless. Ignoring few failures, here is sequential vs parallel run times when running all tests now: SEQUENTIAL ========== Summary: 206/953 PASSED, 4 SKIPPED, 0 FAILED real 1m5.625s user 0m4.211s sys 0m31.650s PARALLEL ======== Summary: 204/952 PASSED, 4 SKIPPED, 2 FAILED real 0m35.550s user 0m4.998s sys 0m39.890s Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022223228.99920-5-andrii@kernel.org	2021-10-25 14:45:46 -07:00
Andrii Nakryiko	2c0f51ac32	selftests/bpf: Mark tc_redirect selftest as serial It seems to cause a lot of harm to kprobe/tracepoint selftests. Yucong mentioned before that it does manipulate sysfs, which might be the reason. So let's mark it as serial, though ideally it would be less intrusive on the system at test. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022223228.99920-4-andrii@kernel.org	2021-10-25 14:45:46 -07:00
Andrii Nakryiko	8ea688e7f4	selftests/bpf: Support multiple tests per file Revamp how test discovery works for test_progs and allow multiple test entries per file. Any global void function with no arguments and serial_test_ or test_ prefix is considered a test. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022223228.99920-3-andrii@kernel.org	2021-10-25 14:45:46 -07:00
Andrii Nakryiko	6972dc3b87	selftests/bpf: Normalize selftest entry points Ensure that all test entry points are global void functions with no input arguments. Mark few subtest entry points as static. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022223228.99920-2-andrii@kernel.org	2021-10-25 14:45:45 -07:00
Ido Schimmel	e860419684	selftests: mlxsw: Reduce test run time Instead of iterating over all the available trap policers, only perform the tests with three policers: The first, the last and the one in the middle of the range. On a Spectrum-3 system, this reduces the run time from almost an hour to a few minutes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 14:10:11 +01:00
Ido Schimmel	535ac9a5fb	selftests: mlxsw: Use permanent neighbours instead of reachable ones The nexthop objects tests configure dummy reachable neighbours so that the nexthops will have a MAC address and be programmed to the device. Since these are dummy reachable neighbours, they can be transitioned by the kernel to a failed state if they are around for too long. This can happen, for example, if the "TIMEOUT" variable is configured with a too high value. Make the tests more robust by configuring the neighbours as permanent, so that the tests do not depend on the configured timeout value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 14:10:11 +01:00
Petr Machata	b8bfafe434	selftests: mlxsw: Add helpers for skipping selftests A number of mlxsw-specific selftests currently detect whether they are run on a compatible machine, and bail out silently when not. These tests are however done in a somewhat impenetrable manner by directly comparing PCI IDs against a blacklist or a whitelist, and bailing out silently if the machine is not compatible. Instead, add a helper, mlxsw_only_on_spectrum(), which allows specifying the supported machines in a human-readable manner. If the current machine is incompatible, the helper emits a SKIP message and returns an error code, based on which the caller can gracefully bail out in a suitable way. This allows a more readable conditions such as: mlxsw_only_on_spectrum 2+ \|\| return Convert all existing open-coded guards to the new helper. Also add two new guards to do_mark_test() and do_drop_test(), which are supported only on Spectrum-2+, but the corresponding check was not there. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 14:10:11 +01:00
Vladimir Oltean	eccd0a80dc	selftests: net: dsa: add a stress test for unlocked FDB operations This test is a bit strange in that it is perhaps more manual than others: it does not transmit a clear OK/FAIL verdict, because user space does not have synchronous feedback from the kernel. If a hardware access fails, it is in deferred context. Nonetheless, on sja1105 I have used it successfully to find and solve a concurrency issue, so it can be used as a starting point for other driver maintainers too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 12:59:42 +01:00
Vladimir Oltean	d70b51f284	selftests: lib: forwarding: allow tests to not require mz and jq These programs are useful, but not all selftests require them. Additionally, on embedded boards without package management (things like buildroot), installing mausezahn or jq is not always as trivial as downloading a package from the web. So it is actually a bit annoying to require programs that are not used. Introduce options that can be set by scripts to not enforce these dependencies. For compatibility, default to "yes". Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Cc: Guillaume Nault <gnault@redhat.com> Cc: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 12:59:42 +01:00
David S. Miller	2d7e73f09f	Revert "Merge branch 'dsa-rtnl'" This reverts commit `965e6b262f`, reversing changes made to `4d98bb0d7e`.	2021-10-25 12:59:25 +01:00
Vladimir Oltean	edc90d1585	selftests: net: dsa: add a stress test for unlocked FDB operations This test is a bit strange in that it is perhaps more manual than others: it does not transmit a clear OK/FAIL verdict, because user space does not have synchronous feedback from the kernel. If a hardware access fails, it is in deferred context. Nonetheless, on sja1105 I have used it successfully to find and solve a concurrency issue, so it can be used as a starting point for other driver maintainers too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:47:45 +01:00
Vladimir Oltean	016748961b	selftests: lib: forwarding: allow tests to not require mz and jq These programs are useful, but not all selftests require them. Additionally, on embedded boards without package management (things like buildroot), installing mausezahn or jq is not always as trivial as downloading a package from the web. So it is actually a bit annoying to require programs that are not used. Introduce options that can be set by scripts to not enforce these dependencies. For compatibility, default to "yes". Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Cc: Guillaume Nault <gnault@redhat.com> Cc: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:47:45 +01:00
Andrii Nakryiko	c825f5fee1	libbpf: Fix BTF header parsing checks Original code assumed fixed and correct BTF header length. That's not always the case, though, so fix this bug with a proper additional check. And use actual header length instead of sizeof(struct btf_header) in sanity checks. Fixes: `8a138aed4a` ("bpf: btf: Add BTF support to libbpf") Reported-by: Evgeny Vereshchagin <evvers@ya.ru> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211023003157.726961-2-andrii@kernel.org	2021-10-22 17:33:31 -07:00
Andrii Nakryiko	5245dafe3d	libbpf: Fix overflow in BTF sanity checks btf_header's str_off+str_len or type_off+type_len can overflow as they are u32s. This will lead to bypassing the sanity checks during BTF parsing, resulting in crashes afterwards. Fix by using 64-bit signed integers for comparison. Fixes: `d812362450` ("libbpf: Fix BTF data layout checks and allow empty BTF") Reported-by: Evgeny Vereshchagin <evvers@ya.ru> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211023003157.726961-1-andrii@kernel.org	2021-10-22 17:33:31 -07:00
Yonghong Song	8c18ea2d2c	selftests/bpf: Add BTF_KIND_DECL_TAG typedef example in tag.c Change value type in progs/tag.c to a typedef with a btf_decl_tag. With `bpftool btf dump file tag.o`, we have ... [14] TYPEDEF 'value_t' type_id=17 [15] DECL_TAG 'tag1' type_id=14 component_idx=-1 [16] DECL_TAG 'tag2' type_id=14 component_idx=-1 [17] STRUCT '(anon)' size=8 vlen=2 'a' type_id=2 bits_offset=0 'b' type_id=2 bits_offset=32 ... The btf_tag selftest also succeeded: $ ./test_progs -t tag #21 btf_tag:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211021195643.4020315-1-yhs@fb.com	2021-10-22 17:04:44 -07:00
Yonghong Song	557c8c4804	selftests/bpf: Test deduplication for BTF_KIND_DECL_TAG typedef Add unit tests for deduplication of BTF_KIND_DECL_TAG to typedef types. Also changed a few comments from "tag" to "decl_tag" to match BTF_KIND_DECL_TAG enum value name. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211021195638.4019770-1-yhs@fb.com	2021-10-22 17:04:44 -07:00
Yonghong Song	9d19a12b02	selftests/bpf: Add BTF_KIND_DECL_TAG typedef unit tests Test good and bad variants of typedef BTF_KIND_DECL_TAG encoding. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211021195633.4019472-1-yhs@fb.com	2021-10-22 17:04:43 -07:00
Stanislav Fomichev	d1321207b1	selftests/bpf: Fix flow dissector tests - update custom loader to search by name, not section name - update bpftool commands to use proper pin path Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211021214814.1236114-4-sdf@google.com	2021-10-22 16:53:38 -07:00
Stanislav Fomichev	a77f879ba1	libbpf: Use func name when pinning programs with LIBBPF_STRICT_SEC_NAME We can't use section name anymore because they are not unique and pinning objects with multiple programs with the same progtype/secname will fail. [0] Closes: https://github.com/libbpf/libbpf/issues/273 Fixes: `33a2c75c55` ("libbpf: add internal pin_name") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20211021214814.1236114-2-sdf@google.com	2021-10-22 16:53:11 -07:00
Quentin Monnet	e89ef634f8	bpftool: Avoid leaking the JSON writer prepared for program metadata Bpftool creates a new JSON object for writing program metadata in plain text mode, regardless of metadata being present or not. Then this writer is freed if any metadata has been found and printed, but it leaks otherwise. We cannot destroy the object unconditionally, because the destructor prints an undesirable line break. Instead, make sure the writer is created only after we have found program metadata to print. Found with valgrind. Fixes: `aff52e685e` ("bpftool: Support dumping metadata") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022094743.11052-1-quentin@isovalent.com	2021-10-22 16:44:56 -07:00
Hengqi Chen	487ef148cf	selftests/bpf: Switch to new btf__type_cnt/btf__raw_data APIs Replace the calls to btf__get_nr_types/btf__get_raw_data in selftests with new APIs btf__type_cnt/btf__raw_data. The old APIs will be deprecated in libbpf v0.7+. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022130623.1548429-6-hengqi.chen@gmail.com	2021-10-22 16:09:14 -07:00
Hengqi Chen	58fc155b0e	bpftool: Switch to new btf__type_cnt API Replace the call to btf__get_nr_types with new API btf__type_cnt. The old API will be deprecated in libbpf v0.7+. No functionality change. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022130623.1548429-5-hengqi.chen@gmail.com	2021-10-22 16:09:14 -07:00
Hengqi Chen	2d8f09fafc	tools/resolve_btfids: Switch to new btf__type_cnt API Replace the call to btf__get_nr_types with new API btf__type_cnt. The old API will be deprecated in libbpf v0.7+. No functionality change. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022130623.1548429-4-hengqi.chen@gmail.com	2021-10-22 16:09:14 -07:00
Hengqi Chen	2502e74bb5	perf bpf: Switch to new btf__raw_data API Replace the call to btf__get_raw_data with new API btf__raw_data. The old APIs will be deprecated in libbpf v0.7+. No functionality change. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022130623.1548429-3-hengqi.chen@gmail.com	2021-10-22 16:09:14 -07:00
Hengqi Chen	6a886de070	libbpf: Add btf__type_cnt() and btf__raw_data() APIs Add btf__type_cnt() and btf__raw_data() APIs and deprecate btf__get_nr_type() and btf__get_raw_data() since the old APIs don't follow the libbpf naming convention for getters which omit 'get' in the name (see [0]). btf__raw_data() is just an alias to the existing btf__get_raw_data(). btf__type_cnt() now returns the number of all types of the BTF object including 'void'. [0] Closes: https://github.com/libbpf/libbpf/issues/279 Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022130623.1548429-2-hengqi.chen@gmail.com	2021-10-22 16:09:14 -07:00
Mauricio Vásquez	1000298c76	libbpf: Fix memory leak in btf__dedup() Free btf_dedup if btf_ensure_modifiable() returns error. Fixes: `919d2b1dbb` ("libbpf: Allow modification of BTF and add btf__add_str API") Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211022202035.48868-1-mauricio@kinvolk.io	2021-10-22 16:00:53 -07:00
Andrii Nakryiko	57385ae31f	selftests/bpf: Make perf_buffer selftests work on 4.9 kernel again Recent change to use tp/syscalls/sys_enter_nanosleep for perf_buffer selftests causes this selftest to fail on 4.9 kernel in libbpf CI ([0]): libbpf: prog 'handle_sys_enter': failed to attach to perf_event FD 6: Invalid argument libbpf: prog 'handle_sys_enter': failed to attach to tracepoint 'syscalls/sys_enter_nanosleep': Invalid argument It's not exactly clear why, because perf_event itself is created for this tracepoint, but I can't even compile 4.9 kernel locally, so it's hard to figure this out. If anyone has better luck and would like to help investigating this, I'd really appreciate this. For now, unblock CI by switching back to raw_syscalls/sys_enter, but reduce amount of unnecessary samples emitted by filter by process ID. Use explicit ARRAY map for that to make it work on 4.9 as well, because global data isn't yet supported there. Fixes: `aa274f98b2` ("selftests/bpf: Fix possible/online index mismatch in perf_buffer test") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022201342.3490692-1-andrii@kernel.org	2021-10-22 14:26:33 -07:00
Andrii Nakryiko	fae1b05e6f	libbpf: Fix the use of aligned attribute Building libbpf sources out of kernel tree (in Github repo) we run into compilation error due to unknown __aligned attribute. It must be coming from some kernel header, which is not available to Github sources. Use explicit __attribute__((aligned(16))) instead. Fixes: `961632d541` ("libbpf: Fix dumping non-aligned __int128") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211022192502.2975553-1-andrii@kernel.org	2021-10-22 14:24:36 -07:00
Florian Westphal	1f83b835a3	fcnal-test: kill hanging ping/nettest binaries on cleanup On my box I see a bunch of ping/nettest processes hanging around after fcntal-test.sh is done. Clean those up before netns deletion. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211021140247.29691-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-22 14:03:18 -07:00
David S. Miller	bdfa75ad70	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-22 11:41:16 +01:00
Linus Torvalds	6c2c712767	Networking fixes for 5.15-rc7, including fixes from netfilter, and can. Current release - regressions: - revert "vrf: reset skb conntrack connection on VRF rcv", there are valid uses for previous behavior - can: m_can: fix iomap_read_fifo() and iomap_write_fifo() Current release - new code bugs: - mlx5: e-switch, return correct error code on group creation failure Previous releases - regressions: - sctp: fix transport encap_port update in sctp_vtag_verify - stmmac: fix E2E delay mechanism (in PTP timestamping) Previous releases - always broken: - netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr - netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of init - netfilter: ipvs: make global sysctl read-only in non-init netns - tcp: md5: fix selection between vrf and non-vrf keys - ipv6: count rx stats on the orig netdev when forwarding - bridge: mcast: use multicast_membership_interval for IGMPv3 - can: - j1939: fix UAF for rx_kref of j1939_priv abort sessions on receiving bad messages - isotp: fix TX buffer concurrent access in isotp_sendmsg() fix return error on FC timeout on TX path - ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited - hns3: schedule the polling again when allocation fails, prevent stalls - drivers: add missing of_node_put() when aborting for_each_available_child_of_node() - ptp: fix possible memory leak and UAF in ptp_clock_register() - e1000e: fix packet loss in burst mode on Tiger Lake and later - mlx5e: ipsec: fix more checksum offload issues Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmFxgHwACgkQMUZtbf5S IrvFgw//T73aR3B2Xvz5/1rtglfmtcUqFQsyGDXGD5HnfbAbsRcz8vcQ/mTsExl7 +mJY/ZuQefsD7UQDyg3GNhbgf1+pEjHC81ryeNsfET7+JxgYLD3NEYSBYUqIFZUo gStAStGBG+ClQUaqlkGFyyf6GrqwpmxZKRr6F9fUsufQ14m9tvcT/QPcrXL4q7qX Fz644yUe/IvKnuJDHJVZsc8UXR9NTPCyCNJT9kVewwPMIMEc/xMOg5QONLZT0TlC Zk4XJIqlBBEQWrN/QwrGXm82aO+3gQZyD5K9AvpczgcBjOr6FJOmN6zkQrqNNWaC 2wPAfWi7DALPtOZR6lCxoeWfLRfdn1ZOn5x2z5xrtAXCV2FTaMg8in9TzJ57qmcb /l43QzcNGSj1ytyny8pqgdsX2MSqs0O5VSG4egMtz7TeU/rs7uAx2IVHbPT8CHop PvhVHeUeu9lGu+FUK8piQbb5aVpbA9qlOj/rXNrHDIxdA9McQgVs+tljNG4X5KtX L7BR84wNg98HtIINVx6RjYz9lOpG1qBuw5RCiqiAaN1RBY7lYAhMaAE6U3azjgC+ AIz/MacNuAz/oTuutQB6/0WZDDJhy4WEy3TrDLlpQNz6yIrpKFN+ftyF6DuVUSMH PmtZ4E/DLooQL5KwuoDdYDH1gSMlggBejeGHTFJ+RUMuvRePZQ8= =Hwqr -----END PGP SIGNATURE----- Merge tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, and can. We'll have one more fix for a socket accounting regression, it's still getting polished. Otherwise things look fine. Current release - regressions: - revert "vrf: reset skb conntrack connection on VRF rcv", there are valid uses for previous behavior - can: m_can: fix iomap_read_fifo() and iomap_write_fifo() Current release - new code bugs: - mlx5: e-switch, return correct error code on group creation failure Previous releases - regressions: - sctp: fix transport encap_port update in sctp_vtag_verify - stmmac: fix E2E delay mechanism (in PTP timestamping) Previous releases - always broken: - netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr - netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of init - netfilter: ipvs: make global sysctl read-only in non-init netns - tcp: md5: fix selection between vrf and non-vrf keys - ipv6: count rx stats on the orig netdev when forwarding - bridge: mcast: use multicast_membership_interval for IGMPv3 - can: - j1939: fix UAF for rx_kref of j1939_priv abort sessions on receiving bad messages - isotp: fix TX buffer concurrent access in isotp_sendmsg() fix return error on FC timeout on TX path - ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited - hns3: schedule the polling again when allocation fails, prevent stalls - drivers: add missing of_node_put() when aborting for_each_available_child_of_node() - ptp: fix possible memory leak and UAF in ptp_clock_register() - e1000e: fix packet loss in burst mode on Tiger Lake and later - mlx5e: ipsec: fix more checksum offload issues" * tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits) usbnet: sanity check for maxpacket net: enetc: make sure all traffic classes can send large frames net: enetc: fix ethtool counter name for PM0_TERR ptp: free 'vclock_index' in ptp_clock_release() sfc: Don't use netif_info before net_device setup sfc: Export fibre-specific supported link modes net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags net/mlx5e: IPsec: Fix a misuse of the software parser's fields net/mlx5e: Fix vlan data lost during suspend flow net/mlx5: E-switch, Return correct error code on group creation failure net/mlx5: Lag, change multipath and bonding to be mutually exclusive ice: Add missing E810 device ids igc: Update I226_K device ID e1000e: Fix packet loss on Tiger Lake and later e1000e: Separate TGP board type from SPT ptp: Fix possible memory leak in ptp_clock_register() net: stmmac: Fix E2E delay mechanism nfc: st95hf: Make spi remove() callback return zero net: hns3: disable sriov before unload hclge layer net: hns3: fix vf reset workqueue cannot exit ...	2021-10-21 15:36:50 -10:00
Andrii Nakryiko	4f2511e199	selftests/bpf: Switch to ".bss"/".rodata"/".data" lookups for internal maps Utilize libbpf's feature of allowing to lookup internal maps by their ELF section names. No need to guess or calculate the exact truncated prefix taken from the object name. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-11-andrii@kernel.org	2021-10-21 17:10:11 -07:00
Andrii Nakryiko	26071635ac	libbpf: Simplify look up by name of internal maps Map name that's assigned to internal maps (.rodata, .data, .bss, etc) consist of a small prefix of bpf_object's name and ELF section name as a suffix. This makes it hard for users to "guess" the name to use for looking up by name with bpf_object__find_map_by_name() API. One proposal was to drop object name prefix from the map name and just use ".rodata", ".data", etc, names. One downside called out was that when multiple BPF applications are active on the host, it will be hard to distinguish between multiple instances of .rodata and know which BPF object (app) they belong to. Having few first characters, while quite limiting, still can give a bit of a clue, in general. Note, though, that btf_value_type_id for such global data maps (ARRAY) points to DATASEC type, which encodes full ELF name, so tools like bpftool can take advantage of this fact to "recover" full original name of the map. This is also the reason why for custom .data.* and .rodata.* maps libbpf uses only their ELF names and doesn't prepend object name at all. Another downside of such approach is that it is not backwards compatible and, among direct use of bpf_object__find_map_by_name() API, will break any BPF skeleton generated using bpftool that was compiled with older libbpf version. Instead of causing all this pain, libbpf will still generate map name using a combination of object name and ELF section name, but it will allow looking such maps up by their natural names, which correspond to their respective ELF section names. This means non-truncated ELF section names longer than 15 characters are going to be expected and supported. With such set up, we get the best of both worlds: leave small bits of a clue about BPF application that instantiated such maps, as well as making it easy for user apps to lookup such maps at runtime. In this sense it closes corresponding libbpf 1.0 issue ([0]). BPF skeletons will continue using full names for lookups. [0] Closes: https://github.com/libbpf/libbpf/issues/275 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-10-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Andrii Nakryiko	30c5bd9647	selftests/bpf: Demonstrate use of custom .rodata/.data sections Enhance existing selftests to demonstrate the use of custom .data/.rodata sections. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-9-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Andrii Nakryiko	aed659170a	libbpf: Support multiple .rodata.* and .data.* BPF maps Add support for having multiple .rodata and .data data sections ([0]). .rodata/.data are supported like the usual, but now also .rodata.<whatever> and .data.<whatever> are also supported. Each such section will get its own backing BPF_MAP_TYPE_ARRAY, just like .rodata and .data. Multiple .bss maps are not supported, as the whole '.bss' name is confusing and might be deprecated soon, as well as user would need to specify custom ELF section with SEC() attribute anyway, so might as well stick to just .data.* and .rodata.* convention. User-visible map name for such new maps is going to be just their ELF section names. [0] https://github.com/libbpf/libbpf/issues/274 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-8-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Andrii Nakryiko	ef9356d392	bpftool: Improve skeleton generation for data maps without DATASEC type It can happen that some data sections (e.g., .rodata.cst16, containing compiler populated string constants) won't have a corresponding BTF DATASEC type. Now that libbpf supports .rodata.* and .data.* sections, situation like that will cause invalid BPF skeleton to be generated that won't compile successfully, as some parts of skeleton would assume memory-mapped struct definitions for each special data section. Fix this by generating empty struct definitions for such data sections. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-7-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Andrii Nakryiko	8654b4d35e	bpftool: Support multiple .rodata/.data internal maps in skeleton Remove the assumption about only single instance of each of .rodata and .data internal maps. Nothing changes for '.rodata' and '.data' maps, but new '.rodata.something' map will get 'rodata_something' section in BPF skeleton for them (as well as having struct bpf_map * field in maps section with the same field name). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-6-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Andrii Nakryiko	25bbbd7a44	libbpf: Remove assumptions about uniqueness of .rodata/.data/.bss maps Remove internal libbpf assumption that there can be only one .rodata, .data, and .bss map per BPF object. To achieve that, extend and generalize the scheme that was used for keeping track of relocation ELF sections. Now each ELF section has a temporary extra index that keeps track of logical type of ELF section (relocations, data, read-only data, BSS). Switch relocation to this scheme, as well as .rodata/.data/.bss handling. We don't yet allow multiple .rodata, .data, and .bss sections, but no libbpf internal code makes an assumption that there can be only one of each and thus they can be explicitly referenced by a single index. Next patches will actually allow multiple .rodata and .data sections. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-5-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Andrii Nakryiko	ad23b72384	libbpf: Use Elf64-specific types explicitly for dealing with ELF Minimize the usage of class-agnostic gelf_xxx() APIs from libelf. These APIs require copying ELF data structures into local GElf_xxx structs and have a more cumbersome API. BPF ELF file is defined to be always 64-bit ELF object, even when intended to be run on 32-bit host architectures, so there is no need to do class-agnostic conversions everywhere. BPF static linker implementation within libbpf has been using Elf64-specific types since initial implementation. Add two simple helpers, elf_sym_by_idx() and elf_rel_by_idx(), for more succinct direct access to ELF symbol and relocation records within ELF data itself and switch all the GElf_xxx usage into Elf64_xxx equivalents. The only remaining place within libbpf.c that's still using gelf API is gelf_getclass(), as there doesn't seem to be a direct way to get underlying ELF bitness. No functional changes intended. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-4-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Andrii Nakryiko	29a30ff501	libbpf: Extract ELF processing state into separate struct Name currently anonymous internal struct that keeps ELF-related state for bpf_object. Just a bit of clean up, no functional changes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-3-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Andrii Nakryiko	b96c07f3b5	libbpf: Deprecate btf__finalize_data() and move it into libbpf.c There isn't a good use case where anyone but libbpf itself needs to call btf__finalize_data(). It was implemented for internal use and it's not clear why it was made into public API in the first place. To function, it requires active ELF data, which is stored inside bpf_object for the duration of opening phase only. But the only BTF that needs bpf_object's ELF is that bpf_object's BTF itself, which libbpf fixes up automatically during bpf_object__open() operation anyways. There is no need for any additional fix up and no reasonable scenario where it's useful and appropriate. Thus, btf__finalize_data() is just an API atavism and is better removed. So this patch marks it as deprecated immediately (v0.6+) and moves the code from btf.c into libbpf.c where it's used in the context of bpf_object opening phase. Such code co-location allows to make code structure more straightforward and remove bpf_object__section_size() and bpf_object__variable_offset() internal helpers from libbpf_internal.h, making them static. Their naming is also adjusted to not create a wrong illusion that they are some sort of method of bpf_object. They are internal helpers and are called appropriately. This is part of libbpf 1.0 effort ([0]). [0] Closes: https://github.com/libbpf/libbpf/issues/276 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021014404.2635234-2-andrii@kernel.org	2021-10-21 17:10:10 -07:00
Jiri Olsa	99d099757a	selftests/bpf: Use nanosleep tracepoint in perf buffer test The perf buffer tests triggers trace with nanosleep syscall, but monitors all syscalls, which results in lot of data in the buffer and makes it harder to debug. Let's lower the trace traffic and monitor just nanosleep syscall. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-4-jolsa@kernel.org	2021-10-21 15:59:31 -07:00
Jiri Olsa	aa274f98b2	selftests/bpf: Fix possible/online index mismatch in perf_buffer test The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer serial_test_perf_buffer:PASS:nr_cpus 0 nsec serial_test_perf_buffer:PASS:nr_on_cpus 0 nsec serial_test_perf_buffer:PASS:skel_load 0 nsec serial_test_perf_buffer:PASS:attach_kprobe 0 nsec serial_test_perf_buffer:PASS:perf_buf__new 0 nsec serial_test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU #4 serial_test_perf_buffer:PASS:perf_buffer__poll 0 nsec serial_test_perf_buffer:PASS:seen_cpu_cnt 0 nsec serial_test_perf_buffer:PASS:buf_cnt 0 nsec ... serial_test_perf_buffer:PASS:fd_check 0 nsec serial_test_perf_buffer:PASS:drain_buf 0 nsec serial_test_perf_buffer:PASS:consume_buf 0 nsec serial_test_perf_buffer:FAIL:cpu_seen cpu 5 not seen #88 perf_buffer:FAIL Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED If the offline cpu is from the middle of the possible set, we get mismatch with possible and online cpu buffers. The perf buffer test calls perf_buffer__consume_buffer for all 'possible' cpus, but the library holds only 'online' cpu buffers and perf_buffer__consume_buffer returns them based on index. Adding extra (online) index to keep track of online buffers, we need the original (possible) index to trigger trace on proper cpu. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-3-jolsa@kernel.org	2021-10-21 15:59:28 -07:00
Jiri Olsa	d4121376ac	selftests/bpf: Fix perf_buffer test on system with offline cpus The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU #24 skipping offline CPU #25 skipping offline CPU #26 skipping offline CPU #27 skipping offline CPU #28 skipping offline CPU #29 skipping offline CPU #30 skipping offline CPU #31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-2-jolsa@kernel.org	2021-10-21 15:59:20 -07:00
Dave Marchevsky	e1b9023fc7	selftests/bpf: Add verif_stats test verified_insns field was added to response of bpf_obj_get_info_by_fd call on a prog. Confirm that it's being populated by loading a simple program and asking for its info. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211020074818.1017682-3-davemarchevsky@fb.com	2021-10-21 15:51:47 -07:00
Dave Marchevsky	aba64c7da9	bpf: Add verified_insns to bpf_prog_info and fdinfo This stat is currently printed in the verifier log and not stored anywhere. To ease consumption of this data, add a field to bpf_prog_aux so it can be exposed via BPF_OBJ_GET_INFO_BY_FD and fdinfo. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211020074818.1017682-2-davemarchevsky@fb.com	2021-10-21 15:51:47 -07:00
Ilya Leoshkevich	632f96d265	libbpf: Fix ptr_is_aligned() usages Currently ptr_is_aligned() takes size, and not alignment, as a parameter, which may be overly pessimistic e.g. for __i128 on s390, which must be only 8-byte aligned. Fix by using btf__align_of(). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211021104658.624944-2-iii@linux.ibm.com	2021-10-21 15:50:04 -07:00

1 2 3 4 5 ...

27242 Commits