Commit Graph

33584 Commits

Author SHA1 Message Date
Jakub Kicinski 2d104c390f bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY9RqJgAKCRDbK58LschI
 gw2IAP9G5uhFO5abBzYLupp6SY3T5j97MUvPwLfFqUEt7EXmuwEA2lCUEWeW0KtR
 QX+QmzCa6iHxrW7WzP4DUYLue//FJQY=
 =yYqA
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-28

We've added 124 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 6386 insertions(+), 1827 deletions(-).

The main changes are:

1) Implement XDP hints via kfuncs with initial support for RX hash and
   timestamp metadata kfuncs, from Stanislav Fomichev and
   Toke Høiland-Jørgensen.
   Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk

2) Extend libbpf's bpf_tracing.h support for tracing arguments of
   kprobes/uprobes and syscall as a special case, from Andrii Nakryiko.

3) Significantly reduce the search time for module symbols by livepatch
   and BPF, from Jiri Olsa and Zhen Lei.

4) Enable cpumasks to be used as kptrs, which is useful for tracing
   programs tracking which tasks end up running on which CPUs
   in different time intervals, from David Vernet.

5) Fix several issues in the dynptr processing such as stack slot liveness
   propagation, missing checks for PTR_TO_STACK variable offset, etc,
   from Kumar Kartikeya Dwivedi.

6) Various performance improvements, fixes, and introduction of more
   than just one XDP program to XSK selftests, from Magnus Karlsson.

7) Big batch to BPF samples to reduce deprecated functionality,
   from Daniel T. Lee.

8) Enable struct_ops programs to be sleepable in verifier,
   from David Vernet.

9) Reduce pr_warn() noise on BTF mismatches when they are expected under
   the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien.

10) Describe modulo and division by zero behavior of the BPF runtime
    in BPF's instruction specification document, from Dave Thaler.

11) Several improvements to libbpf API documentation in libbpf.h,
    from Grant Seltzer.

12) Improve resolve_btfids header dependencies related to subcmd and add
    proper support for HOSTCC, from Ian Rogers.

13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room()
    helper along with BPF selftests, from Ziyang Xuan.

14) Simplify the parsing logic of structure parameters for BPF trampoline
    in the x86-64 JIT compiler, from Pu Lehui.

15) Get BTF working for kernels with CONFIG_RUST enabled by excluding
    Rust compilation units with pahole, from Martin Rodriguez Reboredo.

16) Get bpf_setsockopt() working for kTLS on top of TCP sockets,
    from Kui-Feng Lee.

17) Disable stack protection for BPF objects in bpftool given BPF backends
    don't support it, from Holger Hoffstätte.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits)
  selftest/bpf: Make crashes more debuggable in test_progs
  libbpf: Add documentation to map pinning API functions
  libbpf: Fix malformed documentation formatting
  selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
  selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
  bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().
  bpf/selftests: Verify struct_ops prog sleepable behavior
  bpf: Pass const struct bpf_prog * to .check_member
  libbpf: Support sleepable struct_ops.s section
  bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
  selftests/bpf: Fix vmtest static compilation error
  tools/resolve_btfids: Alter how HOSTCC is forced
  tools/resolve_btfids: Install subcmd headers
  bpf/docs: Document the nocast aliasing behavior of ___init
  bpf/docs: Document how nested trusted fields may be defined
  bpf/docs: Document cpumask kfuncs in a new file
  selftests/bpf: Add selftest suite for cpumask kfuncs
  selftests/bpf: Add nested trust selftests suite
  bpf: Enable cpumasks to be queried and used as kptrs
  bpf: Disallow NULLable pointers for trusted kfuncs
  ...
====================

Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28 00:00:14 -08:00
Jakub Kicinski b568d3072a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/intel/ice/ice_main.c
  418e53401e ("ice: move devlink port creation/deletion")
  643ef23bd9 ("ice: Introduce local var for readability")
https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/

drivers/net/ethernet/engleder/tsnep_main.c
  3d53aaef43 ("tsnep: Fix TX queue stop/wake for multiple queues")
  25faa6a4c5 ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/

net/netfilter/nf_conntrack_proto_sctp.c
  13bd9b31a9 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
  a44b765148 ("netfilter: conntrack: unify established states for SCTP paths")
  f71cb8f45d ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27 22:56:18 -08:00
Stanislav Fomichev 16809afdcb selftest/bpf: Make crashes more debuggable in test_progs
Reset stdio before printing verbose log of the SIGSEGV'ed test.
Otherwise, it's hard to understand what's going on in the cases like [0].

With the following patch applied:

	--- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
	+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
	@@ -392,6 +392,11 @@ void test_xdp_metadata(void)
	 		       "generate freplace packet"))
	 		goto out;

	+
	+	ASSERT_EQ(1, 2, "oops");
	+	int *x = 0;
	+	*x = 1; /* die */
	+
	 	while (!retries--) {
	 		if (bpf_obj2->bss->called)
	 			break;

Before:

 #281     xdp_metadata:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x1f)[0x55c919d98bcf]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bf90)[0x7f36aea5df90]
./test_progs(test_xdp_metadata+0x1db0)[0x55c919d8c6d0]
./test_progs(+0x23b438)[0x55c919d9a438]
./test_progs(main+0x534)[0x55c919d99454]
/lib/x86_64-linux-gnu/libc.so.6(+0x2718a)[0x7f36aea4918a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f36aea49245]
./test_progs(_start+0x21)[0x55c919b82ef1]

After:

test_xdp_metadata:PASS:ip netns add xdp_metadata 0 nsec
open_netns:PASS:malloc token 0 nsec
open_netns:PASS:open /proc/self/ns/net 0 nsec
open_netns:PASS:open netns fd 0 nsec
open_netns:PASS:setns 0 nsec
..
test_xdp_metadata:FAIL:oops unexpected oops: actual 1 != expected 2
 #281     xdp_metadata:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x1f)[0x562714a76bcf]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bf90)[0x7fa663f9cf90]
./test_progs(test_xdp_metadata+0x1db0)[0x562714a6a6d0]
./test_progs(+0x23b438)[0x562714a78438]
./test_progs(main+0x534)[0x562714a77454]
/lib/x86_64-linux-gnu/libc.so.6(+0x2718a)[0x7fa663f8818a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7fa663f88245]
./test_progs(_start+0x21)[0x562714860ef1]

0: https://github.com/kernel-patches/bpf/actions/runs/4019879316/jobs/6907358876

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230127215705.1254316-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-27 15:22:00 -08:00
Grant Seltzer e4ce876f10 libbpf: Add documentation to map pinning API functions
This adds documentation for the following API functions:

- bpf_map__set_pin_path()
- bpf_map__pin_path()
- bpf_map__is_pinned()
- bpf_map__pin()
- bpf_map__unpin()
- bpf_object__pin_maps()
- bpf_object__unpin_maps()

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024225.520685-1-grantseltzer@gmail.com
2023-01-27 23:03:19 +01:00
Grant Seltzer 139df64d26 libbpf: Fix malformed documentation formatting
This fixes the doxygen format documentation above the
user_ring_buffer__* APIs. There has to be a newline
before the @brief, otherwise doxygen won't render them
for libbpf.readthedocs.org.

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230126024749.522278-1-grantseltzer@gmail.com
2023-01-27 23:02:03 +01:00
Stanislav Fomichev a5f3a3f7c1 selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
The existing timestamping_enable() is a no-op because it applies
to the socket-related path that we are not verifying here
anymore. (but still leaving the code around hoping we can
have xdp->skb path verified here as well)

  poll: 1 (0)
  xsk_ring_cons__peek: 1
  0xf64788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
  rx_hash: 3697961069
  rx_timestamp:  1674657672142214773 (sec:1674657672.1422)
  XDP RX-time:   1674657709561774876 (sec:1674657709.5618) delta sec:37.4196
  AF_XDP time:   1674657709561871034 (sec:1674657709.5619) delta
sec:0.0001 (96.158 usec)
  0xf64788: complete idx=8 addr=8000

Also, maybe something to archive here, see [0] for Jesper's note
about NIC vs host clock delta.

0: https://lore.kernel.org/bpf/f3a116dc-1b14-3432-ad20-a36179ef0608@redhat.com/

v2:
- Restore original value (Martin)

Fixes: 297a3f1241 ("selftests/bpf: Simple program to dump XDP RX metadata")
Reported-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230126225030.510629-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-26 22:10:31 -08:00
Jakub Kicinski 3a43ded081 tools: ynl: store ops in ordered dict to avoid random ordering
When rendering code we should walk the ops in the order in which
they are declared in the spec. This is both more intuitive and
prevents code from jumping around when hashing in the dict changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-26 16:32:41 -08:00
Jakub Kicinski b49c34e217 tools: ynl: rename ops_list -> msg_list
ops_list contains all the operations, but the main iteration use
case is to walk only ops which define attrs. Rename ops_list to
msg_list, because now it looks like the contents are the same,
just the format is different. While at it convert from tuple
to just keys, none of the users care about the name of the op.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-26 16:32:41 -08:00
Jakub Kicinski 66fa34b9c2 tools: ynl: support kdocs for flags in code generation
Lorenzo reports that after switching from enum to flags netdev
family lost ability to render kdoc (and the enum contents got
generally garbled).

Combine the flags and enum handling in uAPI handling.

Reported-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-26 16:32:41 -08:00
Matthieu Baerts 8dbdf24f4e selftests: mptcp: userspace: avoid read errors
During the cleanup phase, the server pids were killed with a SIGTERM
directly, not using a SIGUSR1 first to quit safely. As a result, this
test was often ending with two error messages:

  read: Connection reset by peer

While at it, use a for-loop to terminate all the PIDs the same way.

Also the different files are now removed after having killed the PIDs
using them. It makes more sense to do that in this order.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26 13:33:30 +01:00
Matthieu Baerts 10d4273411 selftests: mptcp: userspace: print error details if any
Before, only '[FAIL]' was printed in case of error during the validation
phase.

Now, in case of failure, the variable name, its value and expected one
are displayed to help understand what was wrong.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26 13:33:30 +01:00
Matthieu Baerts 1c0b0ee264 selftests: mptcp: userspace: refactor asserts
Instead of having a long list of conditions to check, it is possible to
give a list of variable names to compare with their 'e_XXX' version.

This will ease the introduction of the following commit which will print
which condition has failed (if any).

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26 13:33:30 +01:00
Matthieu Baerts f790ae03db selftests: mptcp: userspace: print titles
This script is running a few tests after having setup the environment.

Printing titles helps understand what is being tested.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26 13:33:30 +01:00
Paolo Abeni ad3493746e selftests: mptcp: add test-cases for mixed v4/v6 subflows
Note that we can't guess the listener family anymore based on the client
target address: always use IPv6.

The fullmesh flag with endpoints from different families is also
validated here.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-26 13:33:30 +01:00
Jakub Sitnicki ae5439658c selftests/net: Cover the IP_LOCAL_PORT_RANGE socket option
Exercise IP_LOCAL_PORT_RANGE socket option in various scenarios:

1. pass invalid values to setsockopt
2. pass a range outside of the per-netns port range
3. configure a single-port range
4. exhaust a configured multi-port range
5. check interaction with late-bind (IP_BIND_ADDRESS_NO_PORT)
6. set then get the per-socket port range

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25 22:45:00 -08:00
Kui-Feng Lee d1246f9360 selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
Ensures that whenever bpf_setsockopt() is called with the SOL_TCP
option on a ktls enabled socket, the call will be accepted by the
system. The provided test makes sure of this by performing an
examination when the server side socket is in the CLOSE_WAIT state. At
this stage, ktls is still enabled on the server socket and can be used
to test if bpf_setsockopt() works correctly with linux.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230125201608.908230-3-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-25 15:10:34 -08:00
David Vernet 7dd880592a bpf/selftests: Verify struct_ops prog sleepable behavior
In a set of prior changes, we added the ability for struct_ops programs
to be sleepable. This patch enhances the dummy_st_ops selftest suite to
validate this behavior by adding a new sleepable struct_ops entry to
dummy_st_ops.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 10:25:57 -08:00
David Vernet 913b2255c3 libbpf: Support sleepable struct_ops.s section
In a prior change, the verifier was updated to support sleepable
BPF_PROG_TYPE_STRUCT_OPS programs. A caller could set the program as
sleepable with bpf_program__set_flags(), but it would be more ergonomic
and more in-line with other sleepable program types if we supported
suffixing a struct_ops section name with .s to indicate that it's
sleepable.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 10:25:57 -08:00
David Vernet 1e12d3ef47 bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
BPF struct_ops programs currently cannot be marked as sleepable. This
need not be the case -- struct_ops programs can be sleepable, and e.g.
invoke kfuncs that export the KF_SLEEPABLE flag. So as to allow future
struct_ops programs to invoke such kfuncs, this patch updates the
verifier to allow struct_ops programs to be sleepable. A follow-on patch
will add support to libbpf for specifying struct_ops.s as a sleepable
struct_ops program, and then another patch will add testcases to the
dummy_st_ops selftest suite which test sleepable struct_ops behavior.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 10:25:57 -08:00
Daniel T. Lee 2514a31241 selftests/bpf: Fix vmtest static compilation error
As stated in README.rst, in order to resolve errors with linker errors,
'LDLIBS=-static' should be used. Most problems will be solved by this
option, but in the case of urandom_read, this won't fix the problem. So
the Makefile is currently implemented to strip the 'static' option when
compiling the urandom_read. However, stripping this static option isn't
configured properly on $(LDLIBS) correctly, which is now causing errors
on static compilation.

    # LDLIBS=-static ./vmtest.sh
    ld.lld: error: attempted static link of dynamic object liburandom_read.so
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    make: *** [Makefile:190: /linux/tools/testing/selftests/bpf/urandom_read] Error 1
    make: *** Waiting for unfinished jobs....

This commit fixes this problem by configuring the strip with $(LDLIBS).

Fixes: 68084a1364 ("selftests/bpf: Fix building bpf selftests statically")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230125100440.21734-1-danieltimlee@gmail.com
2023-01-25 18:31:32 +01:00
Ian Rogers 13e07691a1 tools/resolve_btfids: Alter how HOSTCC is forced
HOSTCC is always wanted when building. Setting CC to HOSTCC happens
after tools/scripts/Makefile.include is included, meaning flags are
set assuming say CC is gcc, but then it can be later set to HOSTCC
which may be clang. tools/scripts/Makefile.include is needed for host
set up and common macros in objtool's Makefile. Rather than override
CC to HOSTCC, just pass CC as HOSTCC to Makefile.build, the libsubcmd
builds and the linkage step. This means the Makefiles don't see things
like CC changing and tool flag determination, and similar, work
properly.

Also, clear the passed subdir as otherwise an outer build may break by
inadvertently passing an inappropriate value.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230124064324.672022-2-irogers@google.com
2023-01-25 18:28:14 +01:00
Ian Rogers af03299d85 tools/resolve_btfids: Install subcmd headers
Previously tools/lib/subcmd was added to the include path, switch to
installing the headers and then including from that directory. This
avoids dependencies on headers internal to tools/lib/subcmd. Add the
missing subcmd directory to the affected #include.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230124064324.672022-1-irogers@google.com
2023-01-25 18:28:08 +01:00
David Vernet 7b6abcfa15 selftests/bpf: Add selftest suite for cpumask kfuncs
A recent patch added a new set of kfuncs for allocating, freeing,
manipulating, and querying cpumasks. This patch adds a new 'cpumask'
selftest suite which verifies their behavior.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 07:57:50 -08:00
David Vernet a6541f4d28 selftests/bpf: Add nested trust selftests suite
Now that defining trusted fields in a struct is supported, we should add
selftests to verify the behavior. This patch adds a few such testcases.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 07:57:50 -08:00
David Vernet caf713c338 bpf: Disallow NULLable pointers for trusted kfuncs
KF_TRUSTED_ARGS kfuncs currently have a subtle and insidious bug in
validating pointers to scalars. Say that you have a kfunc like the
following, which takes an array as the first argument:

bool bpf_cpumask_empty(const struct cpumask *cpumask)
{
	return cpumask_empty(cpumask);
}

...
BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_TRUSTED_ARGS)
...

If a BPF program were to invoke the kfunc with a NULL argument, it would
crash the kernel. The reason is that struct cpumask is defined as a
bitmap, which is itself defined as an array, and is accessed as a memory
address by bitmap operations. So when the verifier analyzes the
register, it interprets it as a pointer to a scalar struct, which is an
array of size 8. check_mem_reg() then sees that the register is NULL and
returns 0, and the kfunc crashes when it passes it down to the cpumask
wrappers.

To fix this, this patch adds a check for KF_ARG_PTR_TO_MEM which
verifies that the register doesn't contain a possibly-NULL pointer if
the kfunc is KF_TRUSTED_ARGS.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 07:57:49 -08:00
Jakub Kicinski e4b48ed460 tools: ynl: add a completely generic client
Add a CLI sample which can take in arbitrary request
in JSON format, convert it to Netlink and do the inverse
for output.

It's meant as a development tool primarily and perhaps
for selftests which need to tickle netlink in a special way.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-24 10:58:11 +01:00
Jakub Kicinski be5bea1cc0 net: add basic C code generators for Netlink
Code generators to turn Netlink specs into C code.
I'm definitely not proud of it.

The main generator is in Python, there's a bash script
to regen all code-gen'ed files in tree after making
spec changes.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-24 10:58:11 +01:00
Davide Caratti ca22da2fbd act_mirred: use the backlog for nested calls to mirred ingress
William reports kernel soft-lockups on some OVS topologies when TC mirred
egress->ingress action is hit by local TCP traffic [1].
The same can also be reproduced with SCTP (thanks Xin for verifying), when
client and server reach themselves through mirred egress to ingress, and
one of the two peers sends a "heartbeat" packet (from within a timer).

Enqueueing to backlog proved to fix this soft lockup; however, as Cong
noticed [2], we should preserve - when possible - the current mirred
behavior that counts as "overlimits" any eventual packet drop subsequent to
the mirred forwarding action [3]. A compromise solution might use the
backlog only when tcf_mirred_act() has a nest level greater than one:
change tcf_mirred_forward() accordingly.

Also, add a kselftest that can reproduce the lockup and verifies TC mirred
ability to account for further packet drops after TC mirred egress->ingress
(when the nest level is 1).

 [1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
 [2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/
 [3] such behavior is not guaranteed: for example, if RPS or skb RX
     timestamping is enabled on the mirred target device, the kernel
     can defer receiving the skb and return NET_RX_SUCCESS inside
     tcf_mirred_forward().

Reported-by: William Zhao <wizhao@redhat.com>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-24 10:30:54 +01:00
Eric Dumazet 057fb03160 selftests: net: tcp_mmap: populate pages in send path
In commit 72653ae530 ("selftests: net: tcp_mmap:
Use huge pages in send path") I made a change to use hugepages
for the buffer used by the client (tx path)

Today, I understood that the cause for poor zerocopy
performance was that after a mmap() for a 512KB memory
zone, kernel uses a single zeropage, mapped 128 times.

This was really the reason for poor tx path performance
in zero copy mode, because this zero page refcount is
under high pressure, especially when TCP ACK packets
are processed on another cpu.

We need either to force a COW on all the memory range,
or use MAP_POPULATE so that a zero page is not abused.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230120181136.3764521-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:24:29 -08:00
Andrii Nakryiko a4d325ae46 libbpf: Clean up now not needed __PT_PARM{1-6}_SYSCALL_REG defaults
Each architecture supports at least 6 syscall argument registers, so now
that specs for each architecture is defined in bpf_tracing.h, remove
unnecessary macro overrides, which previously were required to keep
existing BPF_KSYSCALL() uses compiling and working.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-26-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko 92dc5cdfc1 selftests/bpf: Add 6-argument syscall tracing test
Turns out splice() is one of the syscalls that's using current maximum
number of arguments (six). This is perfect for testing, so extend
bpf_syscall_macro selftest to also trace splice() syscall, using
BPF_KSYSCALL() macro. This makes sure all the syscall argument register
definitions are correct.

Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-25-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko 12a299f0b5 libbpf: Define loongarch syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-24-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko 2cf802737f libbpf: Define arc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-23-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko a0426216a3 libbpf: Define riscv syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com> # RISC-V
Link: https://lore.kernel.org/bpf/20230120200914.3008030-22-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko 377c15b1a2 libbpf: Define sparc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-21-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko c1cc01a2d1 libbpf: Define powerpc syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Note that 7th arg is supported on 32-bit powerpc architecture, by not on
powerpc64.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-20-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko cfd0bbe915 libbpf: Define mips syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-19-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko 3488ea0584 libbpf: Define arm64 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
We need PT_REGS_PARM1_[CORE_]SYSCALL macros overrides, similarly to
s390x, due to orig_x0 not being present in UAPI's pt_regs, so we need to
utilize BPF CO-RE and custom pt_regs___arm64 definition.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-18-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko 3a95c42d65 libbpf: Define arm syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-17-andrii@kernel.org
2023-01-23 20:53:01 +01:00
Andrii Nakryiko e82b96a3a9 libbpf: Define s390x syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Note that we need custom overrides for PT_REGS_PARM1_[CORE_]SYSCALL
macros due to the need to use BPF CO-RE and custom local pt_regs
definitions to fetch orig_gpr2, storing 1st argument.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-16-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko ff00f9cbd2 libbpf: Define i386 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-15-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko d21fbceedd libbpf: Define x86-64 syscall regs spec in bpf_tracing.h
Define explicit table of registers used for syscall argument passing.
Remove now unnecessary overrides of PT_REGS_PARM5_[CORE_]SYSCALL macros.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-14-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko 8ccabeef91 libbpf: Improve syscall tracing support in bpf_tracing.h
Set up generic support in bpf_tracing.h for up to 7 syscall arguments
tracing with BPF_KSYSCALL, which seems to be the limit according to
syscall(2) manpage. Also change the way that syscall convention is
specified to be more explicit. Subsequent patches will adjust and define
proper per-architecture syscall conventions.

__PT_PARM1_SYSCALL_REG through __PT_PARM6_SYSCALL_REG is added
temporarily to keep everything working before each architecture has
syscall reg tables defined. They will be removed afterwards.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Link: https://lore.kernel.org/bpf/20230120200914.3008030-13-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko bc72742beb selftests/bpf: Validate arch-specific argument registers limits
Update uprobe_autoattach selftest to validate architecture-specific
argument passing through registers. Use new BPF_UPROBE and
BPF_URETPROBE, and construct both BPF-side and user-space side in such
a way that for different architectures we are fetching and checking
different number of arguments, matching architecture-specific limit of
how many registers are available for argument passing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com> # arm64
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x
Link: https://lore.kernel.org/bpf/20230120200914.3008030-12-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko ac4afd6e6f libbpf: Add BPF_UPROBE and BPF_URETPROBE macro aliases
Add BPF_UPROBE and BPF_URETPROBE macros, aliased to BPF_KPROBE and
BPF_KRETPROBE, respectively. This makes uprobe-based BPF program code
much less confusing, especially to people new to tracing, at no cost in
terms of maintainability. We'll use this macro in selftests in
subsequent patch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-11-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko 55ff00d539 libbpf: Complete LoongArch (loongarch) spec in bpf_tracing.h
Add PARM6 through PARM8 definitions. Add kernel docs link describing ABI
for LoongArch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-10-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko 0ac0865679 libbpf: Fix and complete ARC spec in bpf_tracing.h
Add PARM6 through PARM8 definitions. Also fix frame pointer (FP)
register definition. Also leave a link to where to find ABI spec.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-9-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko b13ed8ca7f libbpf: Complete riscv arch spec in bpf_tracing.h
Add PARM6 through PARM8 definitions for RISC V (riscv) arch. Leave the
link for ABI doc for future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com> # RISC-V
Link: https://lore.kernel.org/bpf/20230120200914.3008030-8-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko 7f60f5d85e libbpf: Complete sparc spec in bpf_tracing.h
Add PARM6 definition for sparc architecture. Leave a link to calling
convention documentation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-7-andrii@kernel.org
2023-01-23 20:53:00 +01:00
Andrii Nakryiko 2eb2be30b8 libbpf: Complete powerpc spec in bpf_tracing.h
Add definitions of PARM6 through PARM8 for powerpc architecture. Add
also a link to a functiona call sequence documentation for future reference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230120200914.3008030-6-andrii@kernel.org
2023-01-23 20:52:59 +01:00