2019-05-29 01:10:09 +08:00
// SPDX-License-Identifier: GPL-2.0-only
2017-03-31 12:45:38 +08:00
/* Copyright (c) 2017 Facebook
 */
#include <linux/bpf.h>
2021-10-02 09:17:57 +08:00
#include <linux/btf.h>
2021-03-25 09:52:52 +08:00
#include <linux/btf_ids.h>
2017-03-31 12:45:38 +08:00
#include <linux/slab.h>
2022-01-15 00:39:46 +08:00
#include <linux/init.h>
2017-03-31 12:45:38 +08:00
#include <linux/vmalloc.h>
#include <linux/etherdevice.h>
#include <linux/filter.h>
2021-08-10 07:51:51 +08:00
#include <linux/rcupdate_trace.h>
2017-03-31 12:45:38 +08:00
#include <linux/sched/signal.h>
bpf: Introduce bpf sk local storage
After allowing a bpf prog to
- directly read the skb->sk ptr
- get the fullsock bpf_sock by "bpf_sk_fullsock()"
- get the bpf_tcp_sock by "bpf_tcp_sock()"
- get the listener sock by "bpf_get_listener_sock()"
- avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock"
into different bpf running context.
this patch is another effort to make bpf's network programming
more intuitive to do (together with memory and performance benefit).
When bpf prog needs to store data for a sk, the current practice is to
define a map with the usual 4-tuples (src/dst ip/port) as the key.
If multiple bpf progs require to store different sk data, multiple maps
have to be defined. Hence, wasting memory to store the duplicated
keys (i.e. 4 tuples here) in each of the bpf map.
[ The smallest key could be the sk pointer itself which requires
some enhancement in the verifier and it is a separate topic. ]
Also, the bpf prog needs to clean up the elem when sk is freed.
Otherwise, the bpf map will become full and un-usable quickly.
The sk-free tracking currently could be done during sk state
transition (e.g. BPF_SOCK_OPS_STATE_CB).
The size of the map needs to be predefined which then usually ended-up
with an over-provisioned map in production. Even the map was re-sizable,
while the sk naturally come and go away already, this potential re-size
operation is arguably redundant if the data can be directly connected
to the sk itself instead of proxy-ing through a bpf map.
This patch introduces sk->sk_bpf_storage to provide local storage space
at sk for bpf prog to use. The space will be allocated when the first bpf
prog has created data for this particular sk.
The design optimizes the bpf prog's lookup (and then optionally followed by
an inline update). bpf_spin_lock should be used if the inline update needs
to be protected.
BPF_MAP_TYPE_SK_STORAGE:
-----------------------
To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in
this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can
be created to fit different bpf progs' needs. The map enforces
BTF to allow printing the sk-local-storage during a system-wise
sk dump (e.g. "ss -ta") in the future.
The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete
a "sk-local-storage" data from a particular sk.
Think of the map as a meta-data (or "type") of a "sk-local-storage". This
particular "type" of "sk-local-storage" data can then be stored in any sk.
The main purposes of this map are mostly:
1. Define the size of a "sk-local-storage" type.
2. Provide a similar syscall userspace API as the map (e.g. lookup/update,
map-id, map-btf...etc.)
3. Keep track of all sk's storages of this "type" and clean them up
when the map is freed.
sk->sk_bpf_storage:
------------------
The main lookup/update/delete is done on sk->sk_bpf_storage (which
is a "struct bpf_sk_storage"). When doing a lookup,
the "map" pointer is now used as the "key" to search on the
sk_storage->list. The "map" pointer is actually serving
as the "type" of the "sk-local-storage" that is being
requested.
To allow very fast lookup, it should be as fast as looking up an
array at a stable-offset. At the same time, it is not ideal to
set a hard limit on the number of sk-local-storage "type" that the
system can have. Hence, this patch takes a cache approach.
The last search result from sk_storage->list is cached in
sk_storage->cache[] which is a stable sized array. Each
"sk-local-storage" type has a stable offset to the cache[] array.
In the future, a map's flag could be introduced to do cache
opt-out/enforcement if it became necessary.
The cache size is 16 (i.e. 16 types of "sk-local-storage").
Programs can share map. On the program side, having a few bpf_progs
running in the networking hotpath is already a lot. The bpf_prog
should have already consolidated the existing sock-key-ed map usage
to minimize the map lookup penalty. 16 has enough runway to grow.
All sk-local-storage data will be removed from sk->sk_bpf_storage
during sk destruction.
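The cache-then-list lookup described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the `model_*` names and the struct layouts are hypothetical and are not the kernel's definitions; only the cache size of 16 and the idea of using the map pointer as the key come from the text.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CACHE_SIZE 16	/* cache size quoted in the text */

/* Hypothetical stand-in for a map: it only carries its cache slot. */
struct model_map {
	unsigned int cache_idx;	/* stable offset of this "type" in cache[] */
};

/* Hypothetical stand-in for one stored element of a socket. */
struct model_elem {
	const struct model_map *map;	/* the "key": which type this elem is */
	struct model_elem *next;	/* link on the per-socket list */
	unsigned int data;
};

/* Hypothetical stand-in for the per-socket storage. */
struct model_sk_storage {
	struct model_elem *cache[MODEL_CACHE_SIZE];
	struct model_elem *list;
};

/* Fast path: look at the map's cache slot. Slow path: walk the list,
 * comparing map pointers, and remember a hit in the cache slot.
 */
static struct model_elem *model_lookup(struct model_sk_storage *st,
				       const struct model_map *map)
{
	struct model_elem *e;

	assert(map->cache_idx < MODEL_CACHE_SIZE);

	e = st->cache[map->cache_idx];
	if (e && e->map == map)
		return e;

	for (e = st->list; e; e = e->next) {
		if (e->map == map) {
			st->cache[map->cache_idx] = e;
			return e;
		}
	}
	return NULL;
}
```

The cached entry is re-checked against the map pointer because, in this model, two maps may be given the same slot once more than 16 types exist.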
bpf_sk_storage_get() and bpf_sk_storage_delete():
------------------------------------------------
Instead of using bpf_map_(lookup|update|delete)_elem(),
the bpf prog needs to use the new helper bpf_sk_storage_get() and
bpf_sk_storage_delete(). The verifier can then enforce the
ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to
"create" new elem if one does not exist in the sk. It is done by
the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be
provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE.
The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together,
it has eliminated the potential use cases for an equivalent
bpf_map_update_elem() API (for bpf_prog) in this patch.
Misc notes:
----------
1. map_get_next_key is not supported. From the userspace syscall
perspective, the map has the socket fd as the key while the map
can be shared by pinned-file or map-id.
Since btf is enforced, the existing "ss" could be enhanced to pretty
print the local-storage.
Supporting a kernel defined btf with 4 tuples as the return key could
be explored later also.
2. The sk->sk_lock cannot be acquired. Atomic operations are used instead.
e.g. cmpxchg is done on the sk->sk_bpf_storage ptr.
Please refer to the source code comments for the details in
synchronization cases and considerations.
3. The mem is charged to the sk->sk_omem_alloc as the sk filter does.
Benchmark:
---------
Here is the benchmark data collected by turning on
the "kernel.bpf_stats_enabled" sysctl.
Two bpf progs are tested:
One bpf prog with the usual bpf hashmap (max_entries = 8192) with the
sk ptr as the key. (verifier is modified to support sk ptr as the key.
That should have shortened the key lookup time.)
Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE.
Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for
each egress skb and then bump the cnt. netperf is used to drive
data with 4096 connected UDP sockets.
BPF_MAP_TYPE_HASH with a modified verifier (152ns per bpf run)
27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633
loaded_at 2019-04-15T13:46:39-0700 uid 0
xlated 344B jited 258B memlock 4096B map_ids 16
btf_id 5
BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run)
30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739
loaded_at 2019-04-15T13:47:54-0700 uid 0
xlated 168B jited 156B memlock 4096B map_ids 17
btf_id 6
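The per-run figures quoted above follow from dividing `run_time_ns` by `run_cnt` in each program listing. A minimal sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Average nanoseconds per program run, rounded down. */
static uint64_t ns_per_run(uint64_t run_time_ns, uint64_t run_cnt)
{
	assert(run_cnt != 0);
	return run_time_ns / run_cnt;
}
```

With the numbers from the two listings, integer division gives 152 for the hashmap program and 65 for the sk-storage program; the unrounded second value is about 65.5, which matches the quoted 66ns once rounded up.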
Here is a high-level picture of how the objects are organized:
  sk
 ┌──────┐
 │      │
 │      │
 │      │
 │*sk_bpf_storage─────▶ bpf_sk_storage
 └──────┘                    ┌───────┐
                 ┌───────────┤ list  │
                 │           │       │
                 │           │       │
                 │           │       │
                 │           └───────┘
                 │
                 │     elem
                 │  ┌────────┐
                 ├─▶│ snode  │
                 │  ├────────┤
                 │  │  data  │          bpf_map
                 │  ├────────┤        ┌─────────┐
                 │  │map_node│◀─┬─────┤  list   │
                 │  └────────┘  │     │         │
                 │              │     │         │
                 │     elem     │     │         │
                 │  ┌────────┐  │     └─────────┘
                 └─▶│ snode  │  │
                    ├────────┤  │
   bpf_map          │  data  │  │
 ┌─────────┐        ├────────┤  │
 │  list   ├───────▶│map_node│  │
 │         │        └────────┘  │
 │         │                    │
 │         │           elem     │
 └─────────┘        ┌────────┐  │
                 ┌─▶│ snode  │  │
                 │  ├────────┤  │
                 │  │  data  │  │
                 │  ├────────┤  │
                 │  │map_node│◀─┘
                 │  └────────┘
                 │
                 │
                 │          ┌───────┐
  sk             └──────────│ list  │
 ┌──────┐                   │       │
 │      │                   │       │
 │      │                   │       │
 │      │                   └───────┘
 │*sk_bpf_storage───────▶bpf_sk_storage
 └──────┘
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-27 07:39:39 +08:00
#include <net/bpf_sk_storage.h>
2018-10-20 00:57:58 +08:00
#include <net/sock.h>
#include <net/tcp.h>
2021-03-03 18:18:13 +08:00
#include <net/net_namespace.h>
bpf: Add "live packet" mode for XDP in BPF_PROG_RUN
This adds support for running XDP programs through BPF_PROG_RUN in a mode
that enables live packet processing of the resulting frames. Previous uses
of BPF_PROG_RUN for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.
The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
ifindex and RXQ number as part of the context object being passed to the
kernel. This patch reuses that code, but adds a new mode with different
semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
flag.
When running BPF_PROG_RUN in this mode, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out that interface. XDP_TX is translated into an
XDP_REDIRECT operation to the same interface, since the real XDP_TX action
is only possible from within the network drivers themselves, not from the
process context where BPF_PROG_RUN is executed.
Internally, this new mode of operation creates a page pool instance while
setting up the test run, and feeds pages from that into the XDP program.
The setup cost of this is amortised over the number of repetitions
specified by userspace.
To support the performance testing use case, we further optimise the setup
step so that all pages in the pool are pre-initialised with the packet
data, and pre-computed context and xdp_frame objects stored at the start of
each page. This makes it possible to entirely avoid touching the page
content on each XDP program invocation, and enables sending up to 9
Mpps/core on my test box.
Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.
Enabling the new flag is only allowed when not setting ctx_out and data_out
in the test specification, since using it means frames will be redirected
somewhere else, so they can't be returned.
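The return-code handling described above can be sketched as a small decision function. This is a userspace model, not the kernel code: every `model_*` name is hypothetical, the redirect target is passed in as a plain parameter, and treating all other verdicts as "release the page" is an assumption that the text does not spell out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the XDP return codes named in the text. */
enum model_verdict {
	MODEL_XDP_DROP,
	MODEL_XDP_PASS,
	MODEL_XDP_TX,
	MODEL_XDP_REDIRECT,
};

enum model_outcome {
	MODEL_RELEASE_PAGE,	/* assumption: anything else frees the frame */
	MODEL_INJECT_INTO_STACK,
	MODEL_SEND_TO_IFINDEX,
};

struct model_result {
	enum model_outcome outcome;
	int ifindex;	/* device involved; 0 when not applicable */
};

static struct model_result model_apply_verdict(enum model_verdict v,
					       int ingress_ifindex,
					       int redirect_ifindex)
{
	struct model_result r = { MODEL_RELEASE_PAGE, 0 };

	assert(ingress_ifindex > 0);

	switch (v) {
	case MODEL_XDP_PASS:
		/* Frame enters the stack as if received on the ingress device. */
		r.outcome = MODEL_INJECT_INTO_STACK;
		r.ifindex = ingress_ifindex;
		break;
	case MODEL_XDP_TX:
		/* Rewritten as a redirect back to the ingress device. */
		r.outcome = MODEL_SEND_TO_IFINDEX;
		r.ifindex = ingress_ifindex;
		break;
	case MODEL_XDP_REDIRECT:
		r.outcome = MODEL_SEND_TO_IFINDEX;
		r.ifindex = redirect_ifindex;
		break;
	case MODEL_XDP_DROP:
	default:
		break;
	}
	return r;
}
```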
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
2022-03-09 18:53:42 +08:00
#include <net/page_pool.h>
2020-03-05 03:18:53 +08:00
#include <linux/error-injection.h>
2020-09-26 04:54:29 +08:00
#include <linux/smp.h>
2021-03-03 18:18:13 +08:00
#include <linux/sock_diag.h>
2023-04-22 01:02:59 +08:00
#include <linux/netfilter.h>
2021-07-08 06:16:55 +08:00
#include <net/xdp.h>
2023-04-22 01:02:59 +08:00
#include <net/netfilter/nf_bpf_link.h>
2017-03-31 12:45:38 +08:00
2019-04-27 02:49:51 +08:00
#define CREATE_TRACE_POINTS
#include <trace/events/bpf_test_run.h>
2021-03-03 18:18:12 +08:00
struct bpf_test_timer {
	enum { NO_PREEMPT, NO_MIGRATE } mode;
	u32 i;
	u64 time_start, time_spent;
};

static void bpf_test_timer_enter(struct bpf_test_timer *t)
	__acquires(rcu)
{
	rcu_read_lock();
	if (t->mode == NO_PREEMPT)
		preempt_disable();
	else
		migrate_disable();

	t->time_start = ktime_get_ns();
}

static void bpf_test_timer_leave(struct bpf_test_timer *t)
	__releases(rcu)
{
	t->time_start = 0;

	if (t->mode == NO_PREEMPT)
		preempt_enable();
	else
		migrate_enable();
	rcu_read_unlock();
}
2022-03-09 18:53:42 +08:00
static bool bpf_test_timer_continue(struct bpf_test_timer *t, int iterations,
				    u32 repeat, int *err, u32 *duration)
2021-03-03 18:18:12 +08:00
	__must_hold(rcu)
{
2022-03-09 18:53:42 +08:00
	t->i += iterations;
2021-03-03 18:18:12 +08:00
	if (t->i >= repeat) {
		/* We're done. */
		t->time_spent += ktime_get_ns() - t->time_start;
		do_div(t->time_spent, t->i);
		*duration = t->time_spent > U32_MAX ? U32_MAX : (u32)t->time_spent;
		*err = 0;
		goto reset;
	}

	if (signal_pending(current)) {
		/* During iteration: we've been cancelled, abort. */
		*err = -EINTR;
		goto reset;
	}

	if (need_resched()) {
		/* During iteration: we need to reschedule between runs. */
		t->time_spent += ktime_get_ns() - t->time_start;
		bpf_test_timer_leave(t);
		cond_resched();
		bpf_test_timer_enter(t);
	}

	/* Do another round. */
	return true;

reset:
	t->i = 0;
	return false;
}
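The "we're done" branch above reports the average time per iteration and saturates it to 32 bits. A minimal userspace sketch of just that arithmetic, assuming plain C integer types in place of the kernel's `u32`/`u64` and `do_div()`; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Average time per iteration in ns, saturated to what fits in 32 bits. */
static uint32_t model_avg_duration(uint64_t time_spent_ns, uint32_t iterations)
{
	uint64_t avg;

	assert(iterations != 0);
	avg = time_spent_ns / iterations;
	return avg > UINT32_MAX ? UINT32_MAX : (uint32_t)avg;
}
```

The saturation matters because the accumulated time is 64-bit while the value handed back through `*duration` is only 32-bit.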
2022-03-09 18:53:42 +08:00
/* We put this struct at the head of each page with a context and frame
 * initialised when the page is allocated, so we don't have to do this on each
 * repetition of the test run.
 */
struct xdp_page_head {
	struct xdp_buff orig_ctx;
	struct xdp_buff ctx;
bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
&xdp_buff and &xdp_frame are bound in a way that
xdp_buff->data_hard_start == xdp_frame
It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:
for (u32 i = 0; i < 0xdead; i++) {
xdpf = xdp_convert_buff_to_frame(&xdp);
xdp_convert_frame_to_buff(xdpf, &xdp);
}
shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.
Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.
(was found while testing XDP traffic generator on ice, which calls
xdp_convert_frame_to_buff() for each XDP frame)
Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-25 00:36:07 +08:00
	union {
		/* ::data_hard_start starts here */
		DECLARE_FLEX_ARRAY(struct xdp_frame, frame);
		DECLARE_FLEX_ARRAY(u8, data);
	};
2022-03-09 18:53:42 +08:00
};
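The effect of overlaying the frame with the start of the data area can be illustrated with plain offset arithmetic. This is a userspace sketch under stated assumptions: the `model_*` types are hypothetical byte-array stand-ins, the 56-byte context size is arbitrary, and the 40-byte frame size is chosen only to match the figure quoted in the fix's commit message.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4k pages */

struct model_ctx { unsigned char opaque[56]; };		/* stand-in for struct xdp_buff */
struct model_frame { unsigned char opaque[40]; };	/* stand-in for struct xdp_frame */

/* Layout before the fix: the frame sits in the head, in front of the data. */
struct model_head_old {
	struct model_ctx orig_ctx;
	struct model_ctx ctx;
	struct model_frame frame;
	unsigned char data[];
};

/* Layout after the fix: the head ends where the frame and the data begin. */
struct model_head_new {
	struct model_ctx orig_ctx;
	struct model_ctx ctx;
};

/* Offset of the frame and of data_hard_start within the page, old layout. */
static size_t model_old_frame_off(void)
{
	return offsetof(struct model_head_old, frame);
}

static size_t model_old_hard_start_off(void)
{
	return offsetof(struct model_head_old, data);
}

/* New layout: frame and data_hard_start share one offset. */
static size_t model_new_hard_start_off(void)
{
	return sizeof(struct model_head_new);
}

/* Bytes left in the page from data_hard_start onwards. */
static size_t model_frame_sz(size_t hard_start_off)
{
	assert(hard_start_off <= MODEL_PAGE_SIZE);
	return MODEL_PAGE_SIZE - hard_start_off;
}
```

In the old layout `data_hard_start` lands one frame size past the frame itself; in the new one both coincide, and the usable frame size grows by exactly the size of the frame stand-in.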

struct xdp_test_data {
	struct xdp_buff *orig_ctx;
	struct xdp_rxq_info rxq;
	struct net_device *dev;
	struct page_pool *pp;
	struct xdp_frame **frames;
	struct sk_buff **skbs;
2022-04-10 05:30:53 +08:00
	struct xdp_mem_info mem;
2022-03-09 18:53:42 +08:00
	u32 batch_size;
	u32 frame_cnt;
};
2023-02-25 00:36:07 +08:00
/* tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c:%MAX_PKT_SIZE
 * must be updated accordingly when this gets changed, otherwise BPF selftests
 * will fail.
 */
2022-03-11 06:56:20 +08:00
#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head))
bpf: Add "live packet" mode for XDP in BPF_PROG_RUN
This adds support for running XDP programs through BPF_PROG_RUN in a mode
that enables live packet processing of the resulting frames. Previous uses
of BPF_PROG_RUN for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.
The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
ifindex and RXQ number as part of the context object being passed to the
kernel. This patch reuses that code, but adds a new mode with different
semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
flag.
When running BPF_PROG_RUN in this mode, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out that interface. XDP_TX is translated into an
XDP_REDIRECT operation to the same interface, since the real XDP_TX action
is only possible from within the network drivers themselves, not from the
process context where BPF_PROG_RUN is executed.
Internally, this new mode of operation creates a page pool instance while
setting up the test run, and feeds pages from that into the XDP program.
The setup cost of this is amortised over the number of repetitions
specified by userspace.
To support the performance testing use case, we further optimise the setup
step so that all pages in the pool are pre-initialised with the packet
data, and pre-computed context and xdp_frame objects stored at the start of
each page. This makes it possible to entirely avoid touching the page
content on each XDP program invocation, and enables sending up to 9
Mpps/core on my test box.
Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.
Enabling the new flag is only allowed when not setting ctx_out and data_out
in the test specification, since using it means frames will be redirected
somewhere else, so they can't be returned.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
2022-03-09 18:53:42 +08:00
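The caveat about recycled pages can be illustrated with a toy model. This is a sketch under loud assumptions: `mock_pool`, `mock_prog` and `run_prog` are hypothetical names, the pool hands pages out round-robin (the real page pool makes no such ordering promise), and the "program" simply increments the first packet byte.

```c
#include <assert.h>
#include <string.h>

#define POOL_PAGES 4
#define PKT_LEN 8

/* Hypothetical pool: every page is filled with the packet once, at setup. */
struct mock_pool {
	unsigned char page[POOL_PAGES][PKT_LEN];
	unsigned int next;
};

static void pool_init(struct mock_pool *pool, const unsigned char *pkt)
{
	unsigned int i;

	for (i = 0; i < POOL_PAGES; i++)
		memcpy(pool->page[i], pkt, PKT_LEN);
	pool->next = 0;
}

/* Stand-in for an XDP program that rewrites the packet. */
static void mock_prog(unsigned char *pkt)
{
	pkt[0]++;
}

/* Runs the program 'repeat' times without re-initialising pages and
 * returns the first byte the program saw on its last run.
 */
static unsigned char run_prog(struct mock_pool *pool, unsigned int repeat)
{
	unsigned char seen = 0;
	unsigned int i;

	assert(repeat > 0);
	for (i = 0; i < repeat; i++) {
		unsigned char *pkt = pool->page[pool->next];

		pool->next = (pool->next + 1) % POOL_PAGES;
		seen = pkt[0];
		mock_prog(pkt);
	}
	return seen;
}
```

In this model the first `POOL_PAGES` runs each see the original byte. The next run gets a recycled page and sees the value left behind by the earlier invocation, which is the behaviour the commit message warns about.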
#define TEST_XDP_MAX_BATCH 256

static void xdp_test_run_init_page(struct page *page, void *arg)
{
	struct xdp_page_head *head = phys_to_virt(page_to_phys(page));
	struct xdp_buff *new_ctx, *orig_ctx;
	u32 headroom = XDP_PACKET_HEADROOM;
	struct xdp_test_data *xdp = arg;
	size_t frm_len, meta_len;
	struct xdp_frame *frm;
	void *data;

	orig_ctx = xdp->orig_ctx;
	frm_len = orig_ctx->data_end - orig_ctx->data_meta;
	meta_len = orig_ctx->data - orig_ctx->data_meta;
	headroom -= meta_len;

	new_ctx = &head->ctx;
	frm = head->frame;
	data = head->data;
	memcpy(data + headroom, orig_ctx->data_meta, frm_len);

	xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq);
	xdp_prepare_buff(new_ctx, data, headroom, frm_len, true);
	new_ctx->data = new_ctx->data_meta + meta_len;

	xdp_update_frame_from_buff(new_ctx, frm);
	frm->mem = new_ctx->rxq->mem;

	memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
}
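The pointer arithmetic in `xdp_test_run_init_page()` is easier to follow with concrete numbers. This is a minimal userspace sketch, assuming a packet headroom of 256 bytes (the value of `XDP_PACKET_HEADROOM`) and a hypothetical `struct layout` that records offsets from `data_hard_start` instead of real pointers.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_HEADROOM 256	/* assumed value of XDP_PACKET_HEADROOM */

/* Hypothetical: offsets relative to data_hard_start. */
struct layout {
	size_t data_meta;	/* start of metadata */
	size_t data;		/* start of packet payload */
	size_t data_end;	/* one past the last packet byte */
};

static struct layout place_packet(size_t meta_len, size_t pkt_len)
{
	size_t frm_len = meta_len + pkt_len;	/* data_end - data_meta */
	size_t headroom;
	struct layout l;

	assert(meta_len <= MOCK_HEADROOM);
	/* Metadata is carved out of the headroom so that the payload
	 * itself still starts at the usual headroom offset.
	 */
	headroom = MOCK_HEADROOM - meta_len;

	l.data_meta = headroom;
	l.data = l.data_meta + meta_len;
	l.data_end = headroom + frm_len;
	return l;
}
```

For example, `place_packet(8, 64)` puts the metadata at offset 248, the payload at 256 and the end at 320. With no metadata, `data_meta` and `data` both sit at 256.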

static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_ctx)
{
	struct page_pool *pp;
	int err = -ENOMEM;
	struct page_pool_params pp_params = {
		.order = 0,
		.flags = 0,
		.pool_size = xdp->batch_size,
		.nid = NUMA_NO_NODE,
		.init_callback = xdp_test_run_init_page,
		.init_arg = xdp,
	};

	xdp->frames = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL);
	if (!xdp->frames)
		return -ENOMEM;

	xdp->skbs = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL);
	if (!xdp->skbs)
		goto err_skbs;

	pp = page_pool_create(&pp_params);
	if (IS_ERR(pp)) {
		err = PTR_ERR(pp);
		goto err_pp;
	}

	/* will copy 'mem.id' into pp->xdp_mem_id */
2022-04-10 05:30:53 +08:00
	err = xdp_reg_mem_model(&xdp->mem, MEM_TYPE_PAGE_POOL, pp);
	if (err)
		goto err_mmodel;

	xdp->pp = pp;

	/* We create a 'fake' RXQ referencing the original dev, but with an
	 * xdp_mem_info pointing to our page_pool
	 */
	xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0);
	xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL;
	xdp->rxq.mem.id = pp->xdp_mem_id;
	xdp->dev = orig_ctx->rxq->dev;
	xdp->orig_ctx = orig_ctx;

	return 0;

err_mmodel:
	page_pool_destroy(pp);
err_pp:
2022-03-10 17:28:27 +08:00
	kvfree(xdp->skbs);
err_skbs:
2022-03-10 17:28:27 +08:00
	kvfree(xdp->frames);
	return err;
}

static void xdp_test_run_teardown(struct xdp_test_data *xdp)
{
2022-04-10 05:30:53 +08:00
	xdp_unreg_mem_model(&xdp->mem);
	page_pool_destroy(xdp->pp);
	/* Both arrays come from kvmalloc_array(), so release them with kvfree(). */
	kvfree(xdp->frames);
	kvfree(xdp->skbs);
}

2023-03-17 01:50:50 +08:00
static bool frame_was_changed(const struct xdp_page_head *head)
{
	/* xdp_scrub_frame() zeroes the data pointer, flags is the last field,
	 * i.e. has the highest chances to be overwritten. If those two are
	 * untouched, it's most likely safe to skip the context reset.
	 */
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZDhSiwAKCRDbK58LschI
g8cbAQCH4xrquOeDmYyGXFQGchHZAIj++tKg8ABU4+hYeJtrlwEA6D4W6wjoSZRk
mLSptZ9qro8yZA86BvyPvlBT1h9ELQA=
=StAc
-----END PGP SIGNATURE-----
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-04-13
We've added 260 non-merge commits during the last 36 day(s) which contain
a total of 356 files changed, 21786 insertions(+), 11275 deletions(-).
The main changes are:
1) Rework BPF verifier log behavior and implement it as a rotating log
by default with the option to retain old-style fixed log behavior,
from Andrii Nakryiko.
2) Adds support for using {FOU,GUE} encap with an ipip device operating
in collect_md mode and add a set of BPF kfuncs for controlling encap
params, from Christian Ehrig.
3) Allow BPF programs to detect at load time whether a particular kfunc
exists or not, and also add support for this in light skeleton,
from Alexei Starovoitov.
4) Optimize hashmap lookups when key size is multiple of 4,
from Anton Protopopov.
5) Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps, from David Vernet.
6) Add support for stashing local BPF kptr into a map value via
bpf_kptr_xchg(). This is useful e.g. for rbtree node creation
for new cgroups, from Dave Marchevsky.
7) Fix BTF handling of is_int_ptr to skip modifiers to work around
tracing issues where a program cannot be attached, from Feng Zhou.
8) Migrate a big portion of test_verifier unit tests over to
test_progs -a verifier_* via inline asm to ease {read,debug}ability,
from Eduard Zingerman.
9) Several updates to the instruction-set.rst documentation
which is subject to future IETF standardization
(https://lwn.net/Articles/926882/), from Dave Thaler.
10) Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register
known bits information propagation, from Daniel Borkmann.
11) Add skb bitfield compaction work related to BPF with the overall goal
to make more of the sk_buff bits optional, from Jakub Kicinski.
12) BPF selftest cleanups for build id extraction which stand on its own
from the upcoming integration work of build id into struct file object,
from Jiri Olsa.
13) Add fixes and optimizations for xsk descriptor validation and several
selftest improvements for xsk sockets, from Kal Conley.
14) Add BPF links for struct_ops and enable switching implementations
of BPF TCP cong-ctls under a given name by replacing backing
struct_ops map, from Kui-Feng Lee.
15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable
offset stack read as earlier Spectre checks cover this,
from Luis Gerhorst.
16) Fix issues in copy_from_user_nofault() for BPF and other tracers
to resemble copy_from_user_nmi() from safety PoV, from Florian Lehner
and Alexei Starovoitov.
17) Add --json-summary option to test_progs in order for CI tooling to
ease parsing of test results, from Manu Bretelle.
18) Batch of improvements and refactoring to prep for upcoming
bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator,
from Martin KaFai Lau.
19) Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations,
from Quentin Monnet.
20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting
the module name from BTF of the target and searching kallsyms of
the correct module, from Viktor Malik.
21) Improve BPF verifier handling of '<const> <cond> <non_const>'
to better detect whether in particular jmp32 branches are taken,
from Yonghong Song.
22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock.
A built-in cc or one from a kernel module is already able to write
to app_limited, from Yixin Shen.
Conflicts:
Documentation/bpf/bpf_devel_QA.rst
b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
0f10f647f455 ("bpf, docs: Use internal linking for link to netdev subsystem doc")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/
include/net/ip_tunnels.h
bc9d003dc48c3 ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts")
ac931d4cdec3d ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices")
https://lore.kernel.org/all/20230413161235.4093777-1-broonie@kernel.org/
net/bpf/test_run.c
e5995bc7e2ba ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
294635a8165a ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES")
https://lore.kernel.org/all/20230320102619.05b80a98@canb.auug.org.au/
====================
Link: https://lore.kernel.org/r/20230413191525.7295-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-14 07:11:22 +08:00
	return head->frame->data != head->orig_ctx.data ||
	       head->frame->flags != head->orig_ctx.flags;
2023-03-17 01:50:50 +08:00
}

static bool ctx_was_changed(struct xdp_page_head *head)
{
	return head->orig_ctx.data != head->ctx.data ||
		head->orig_ctx.data_meta != head->ctx.data_meta ||
		head->orig_ctx.data_end != head->ctx.data_end;
}
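The comparison in `ctx_was_changed()` lets the per-run reset skip its writes when a program left the packet pointers alone. This is a reduced userspace sketch of that compare-then-restore pattern, assuming hypothetical `mock_ctx` and `mock_head` types that hold only the three compared pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced context: only the three pointers that get compared. */
struct mock_ctx {
	unsigned char *data;
	unsigned char *data_meta;
	unsigned char *data_end;
};

struct mock_head {
	struct mock_ctx orig_ctx;	/* pristine copy taken at page init */
	struct mock_ctx ctx;		/* copy handed to the program */
};

static bool mock_ctx_was_changed(const struct mock_head *head)
{
	return head->orig_ctx.data != head->ctx.data ||
	       head->orig_ctx.data_meta != head->ctx.data_meta ||
	       head->orig_ctx.data_end != head->ctx.data_end;
}

/* Restores the pointers only when needed; returns true if it wrote anything. */
static bool mock_reset_ctx(struct mock_head *head)
{
	assert(head != NULL);
	if (!mock_ctx_was_changed(head))
		return false;

	head->ctx = head->orig_ctx;
	return true;
}
```

A program that moves `data` (as an adjust-head style helper would) makes `mock_reset_ctx()` restore the saved pointers. An untouched context costs three pointer comparisons and no stores.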

static void reset_ctx(struct xdp_page_head *head)
{
2023-03-17 01:50:50 +08:00
	if (likely(!frame_was_changed(head) && !ctx_was_changed(head)))
|
|
|
return;
|
|
|
|
|
|
|
|
head->ctx.data = head->orig_ctx.data;
|
|
|
|
head->ctx.data_meta = head->orig_ctx.data_meta;
|
|
|
|
head->ctx.data_end = head->orig_ctx.data_end;
|
bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
&xdp_buff and &xdp_frame are bound in a way that
xdp_buff->data_hard_start == xdp_frame
It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:
for (u32 i = 0; i < 0xdead; i++) {
xdpf = xdp_convert_buff_to_frame(&xdp);
xdp_convert_frame_to_buff(xdpf, &xdp);
}
shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.
Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.
(was found while testing XDP traffic generator on ice, which calls
xdp_convert_frame_to_buff() for each XDP frame)
Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-25 00:36:07 +08:00
|
|
|
xdp_update_frame_from_buff(&head->ctx, head->frame);
|
2022-03-09 18:53:42 +08:00
|
|
|
}
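The fast path in reset_ctx() above can be illustrated in user space. This is a minimal sketch, not the kernel code: struct demo_ctx, struct demo_head and the demo_* helpers are hypothetical stand-ins for struct xdp_buff and struct xdp_page_head, and the sketch leaves out both the frame_was_changed() check and the xdp_update_frame_from_buff() call. It only shows the idea of keeping a pristine copy of the packet bounds and restoring the working copy when the two differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct xdp_buff: only the three
 * packet-bound pointers that the reset logic compares.
 */
struct demo_ctx {
	unsigned char *data;
	unsigned char *data_meta;
	unsigned char *data_end;
};

/* Hypothetical stand-in for struct xdp_page_head. */
struct demo_head {
	struct demo_ctx orig_ctx;	/* pristine copy, written once at setup */
	struct demo_ctx ctx;		/* working copy handed to the program */
	unsigned char data[64];		/* packet buffer (size is arbitrary here) */
};

/* Set up both copies so that they describe the same packet bounds. */
static void demo_init(struct demo_head *head, size_t headroom, size_t len)
{
	head->orig_ctx.data = head->data + headroom;
	head->orig_ctx.data_meta = head->orig_ctx.data;
	head->orig_ctx.data_end = head->orig_ctx.data + len;
	head->ctx = head->orig_ctx;
}

/* Same comparison as ctx_was_changed(), on the simplified types. */
static bool demo_ctx_was_changed(const struct demo_head *head)
{
	return head->orig_ctx.data != head->ctx.data ||
	       head->orig_ctx.data_meta != head->ctx.data_meta ||
	       head->orig_ctx.data_end != head->ctx.data_end;
}

/* Restore the working copy; returns true only if a reset was needed. */
static bool demo_reset_ctx(struct demo_head *head)
{
	if (!demo_ctx_was_changed(head))
		return false;	/* fast path: nothing to restore */

	head->ctx = head->orig_ctx;
	return true;
}
```

Because pages are recycled without being re-initialised, the common case is expected to be the early return, which is why the kernel version wraps its check in likely().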
|
|
|
|
|
|
|
|
static int xdp_recv_frames(struct xdp_frame **frames, int nframes,
|
|
|
|
struct sk_buff **skbs,
|
|
|
|
struct net_device *dev)
|
|
|
|
{
|
|
|
|
gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
|
|
|
|
int i, n;
|
|
|
|
LIST_HEAD(list);
|
|
|
|
|
2023-02-09 14:06:42 +08:00
|
|
|
n = kmem_cache_alloc_bulk(skbuff_cache, gfp, nframes, (void **)skbs);
|
2022-03-09 18:53:42 +08:00
|
|
|
if (unlikely(n == 0)) {
|
|
|
|
for (i = 0; i < nframes; i++)
|
|
|
|
xdp_return_frame(frames[i]);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0; i < nframes; i++) {
|
|
|
|
struct xdp_frame *xdpf = frames[i];
|
|
|
|
struct sk_buff *skb = skbs[i];
|
|
|
|
|
|
|
|
skb = __xdp_build_skb_from_frame(xdpf, skb, dev);
|
|
|
|
if (!skb) {
|
|
|
|
xdp_return_frame(xdpf);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
list_add_tail(&skb->list, &list);
|
|
|
|
}
|
|
|
|
netif_receive_skb_list(&list);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog,
|
|
|
|
u32 repeat)
|
|
|
|
{
|
|
|
|
struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
|
|
|
|
int err = 0, act, ret, i, nframes = 0, batch_sz;
|
|
|
|
struct xdp_frame **frames = xdp->frames;
|
|
|
|
struct xdp_page_head *head;
|
|
|
|
struct xdp_frame *frm;
|
|
|
|
bool redirect = false;
|
|
|
|
struct xdp_buff *ctx;
|
|
|
|
struct page *page;
|
|
|
|
|
|
|
|
batch_sz = min_t(u32, repeat, xdp->batch_size);
|
|
|
|
|
|
|
|
local_bh_disable();
|
|
|
|
xdp_set_return_frame_no_direct();
|
|
|
|
|
|
|
|
for (i = 0; i < batch_sz; i++) {
|
|
|
|
page = page_pool_dev_alloc_pages(xdp->pp);
|
|
|
|
if (!page) {
|
|
|
|
err = -ENOMEM;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
head = phys_to_virt(page_to_phys(page));
|
|
|
|
reset_ctx(head);
|
|
|
|
ctx = &head->ctx;
|
2023-02-25 00:36:07 +08:00
|
|
|
frm = head->frame;
|
2022-03-09 18:53:42 +08:00
|
|
|
xdp->frame_cnt++;
|
|
|
|
|
|
|
|
act = bpf_prog_run_xdp(prog, ctx);
|
|
|
|
|
|
|
|
/* if program changed pkt bounds we need to update the xdp_frame */
|
|
|
|
if (unlikely(ctx_was_changed(head))) {
|
|
|
|
ret = xdp_update_frame_from_buff(ctx, frm);
|
|
|
|
if (ret) {
|
|
|
|
xdp_return_buff(ctx);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (act) {
|
|
|
|
case XDP_TX:
|
|
|
|
/* we can't do a real XDP_TX since we're not in the
|
|
|
|
* driver, so turn it into a REDIRECT back to the same
|
|
|
|
* index
|
|
|
|
*/
|
|
|
|
ri->tgt_index = xdp->dev->ifindex;
|
|
|
|
ri->map_id = INT_MAX;
|
|
|
|
ri->map_type = BPF_MAP_TYPE_UNSPEC;
|
|
|
|
fallthrough;
|
|
|
|
case XDP_REDIRECT:
|
|
|
|
redirect = true;
|
|
|
|
ret = xdp_do_redirect_frame(xdp->dev, ctx, frm, prog);
|
|
|
|
if (ret)
|
|
|
|
xdp_return_buff(ctx);
|
|
|
|
break;
|
|
|
|
case XDP_PASS:
|
|
|
|
frames[nframes++] = frm;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
bpf_warn_invalid_xdp_action(NULL, prog, act);
|
|
|
|
fallthrough;
|
|
|
|
case XDP_DROP:
|
|
|
|
xdp_return_buff(ctx);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
if (redirect)
|
|
|
|
xdp_do_flush();
|
|
|
|
if (nframes) {
|
|
|
|
ret = xdp_recv_frames(frames, nframes, xdp->skbs, xdp->dev);
|
|
|
|
if (ret)
|
|
|
|
err = ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
xdp_clear_return_frame_no_direct();
|
|
|
|
local_bh_enable();
|
|
|
|
return err;
|
|
|
|
}
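The verdict handling in xdp_test_run_batch() above relies on two switch fallthroughs: XDP_TX is rewritten into a redirect to the test device's own interface, and unknown verdicts are treated as drops. The following user-space sketch mirrors only that control flow. All demo_* names, the DEMO_* enum and the counters are hypothetical, and nothing here touches real frames or struct bpf_redirect_info.

```c
#include <assert.h>

/* Hypothetical verdicts; the values are not those of enum xdp_action. */
enum demo_action { DEMO_DROP, DEMO_PASS, DEMO_TX, DEMO_REDIRECT };

/* Hypothetical counters standing in for the real side effects. */
struct demo_stats {
	int redirected;		/* frames handed to the redirect path */
	int passed;		/* frames queued for the network stack */
	int returned;		/* frames given back to the page pool */
	int last_tgt_index;	/* interface index of the last redirect */
};

/* tgt_index models the redirect target a program would have chosen;
 * own_ifindex models the interface the test run is bound to.
 */
static void demo_dispatch(struct demo_stats *st, int act,
			  int own_ifindex, int tgt_index)
{
	switch (act) {
	case DEMO_TX:
		/* no driver context, so TX becomes a redirect to ourselves */
		tgt_index = own_ifindex;
		/* fall through */
	case DEMO_REDIRECT:
		st->redirected++;
		st->last_tgt_index = tgt_index;
		break;
	case DEMO_PASS:
		st->passed++;
		break;
	default:
		/* unknown verdict: the kernel warns here, then drops */
	case DEMO_DROP:
		st->returned++;
		break;
	}
}
```

Placing default directly above the drop case keeps a single release path for every frame that is neither passed nor redirected.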
|
|
|
|
|
|
|
|
static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx,
|
|
|
|
u32 repeat, u32 batch_size, u32 *time)
|
|
|
|
|
|
|
|
{
|
|
|
|
struct xdp_test_data xdp = { .batch_size = batch_size };
|
|
|
|
struct bpf_test_timer t = { .mode = NO_MIGRATE };
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!repeat)
|
|
|
|
repeat = 1;
|
|
|
|
|
|
|
|
ret = xdp_test_run_setup(&xdp, ctx);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
bpf_test_timer_enter(&t);
|
|
|
|
do {
|
|
|
|
xdp.frame_cnt = 0;
|
|
|
|
ret = xdp_test_run_batch(&xdp, prog, repeat - t.i);
|
|
|
|
if (unlikely(ret < 0))
|
|
|
|
break;
|
|
|
|
} while (bpf_test_timer_continue(&t, xdp.frame_cnt, repeat, &ret, time));
|
|
|
|
bpf_test_timer_leave(&t);
|
|
|
|
|
|
|
|
xdp_test_run_teardown(&xdp);
|
|
|
|
return ret;
|
|
|
|
}
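The repeat and batch arithmetic shared by bpf_test_run_xdp_live() and xdp_test_run_batch() can be sketched in isolation. This is an illustrative user-space model under stated assumptions: demo_run_batches() and demo_min() are hypothetical names, the timer and signal handling done by bpf_test_timer_continue() is reduced to a plain counter comparison, and the zero batch_size guard is a choice made for this sketch only.

```c
#include <assert.h>

static unsigned int demo_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Run `repeat` iterations in chunks of at most `batch_size`.
 * Returns the number of batches used and stores the total number of
 * iterations performed in *total.
 */
static unsigned int demo_run_batches(unsigned int repeat,
				     unsigned int batch_size,
				     unsigned int *total)
{
	unsigned int done = 0, batches = 0;

	if (!repeat)
		repeat = 1;	/* same default as the code above */
	if (!batch_size)
		batch_size = 1;	/* sketch-only guard against looping forever */

	do {
		/* each batch handles what is left, capped at batch_size */
		unsigned int n = demo_min(repeat - done, batch_size);

		done += n;
		batches++;
	} while (done < repeat);

	*total = done;
	return batches;
}
```

For example, 10 repetitions with a batch size of 4 are served by three batches of 4, 4 and 2 iterations.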
|
|
|
|
|
2019-02-13 07:42:38 +08:00
|
|
|
static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
|
2019-12-14 01:51:10 +08:00
|
|
|
u32 *retval, u32 *time, bool xdp)
|
2017-03-31 12:45:38 +08:00
|
|
|
{
|
bpf: Add ambient BPF runtime context stored in current
b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
helper") fixed the problem with cgroup-local storage use in BPF by
pre-allocating per-CPU array of 8 cgroup storage pointers to accommodate
possible BPF program preemptions and nested executions.
While this seems to work good in practice, it introduces new and unnecessary
failure mode in which not all BPF programs might be executed if we fail to
find an unused slot for cgroup storage, however unlikely it is. It might also
not be so unlikely when/if we allow sleepable cgroup BPF programs in the
future.
Further, the way that cgroup storage is implemented as ambiently-available
property during entire BPF program execution is a convenient way to pass extra
information to BPF program and helpers without requiring user code to pass
around extra arguments explicitly. So it would be good to have a generic
solution that can allow implementing this without arbitrary restrictions.
Ideally, such solution would work for both preemptable and sleepable BPF
programs in exactly the same way.
This patch introduces such solution, bpf_run_ctx. It adds one pointer field
(bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of
macros in such a way that it always stays valid throughout BPF program
execution. BPF program preemption is handled by remembering previous
current->bpf_ctx value locally while executing nested BPF program and
restoring old value after nested BPF program finishes. This is handled by two
helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are
supposed to be used before and after BPF program runs, respectively.
Restoring old value of the pointer handles preemption, while bpf_run_ctx
pointer being a property of current task_struct naturally solves this problem
for sleepable BPF programs by "following" BPF program execution as it is
scheduled in and out of CPU. It would even allow CPU migration of BPF
programs, even though it's not currently allowed by BPF infra.
This patch cleans up cgroup local storage handling as a first application. The
design itself is generic, though, with bpf_run_ctx being an empty struct that
is supposed to be embedded into a specific struct for a given BPF program type
(bpf_cg_run_ctx in this case). Follow up patches are planned that will expand
this mechanism for other uses within tracing BPF programs.
To verify that this change doesn't revert the fix to the original cgroup
storage issue, I ran the same repro as in the original report ([0]) and didn't
get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with
bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work).
[0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/
Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org
2021-07-13 07:06:15 +08:00
|
|
|
struct bpf_prog_array_item item = {.prog = prog};
|
|
|
|
struct bpf_run_ctx *old_ctx;
|
|
|
|
struct bpf_cg_run_ctx run_ctx;
|
2021-03-03 18:18:12 +08:00
|
|
|
struct bpf_test_timer t = { NO_MIGRATE };
|
2018-09-28 22:45:36 +08:00
|
|
|
enum bpf_cgroup_storage_type stype;
|
2021-03-03 18:18:12 +08:00
|
|
|
int ret;
|
2017-03-31 12:45:38 +08:00
|
|
|
|
2018-09-28 22:45:36 +08:00
|
|
|
for_each_cgroup_storage_type(stype) {
|
2021-07-13 07:06:15 +08:00
|
|
|
item.cgroup_storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
|
|
|
|
if (IS_ERR(item.cgroup_storage[stype])) {
|
|
|
|
item.cgroup_storage[stype] = NULL;
|
2018-09-28 22:45:36 +08:00
|
|
|
for_each_cgroup_storage_type(stype)
|
2021-07-13 07:06:15 +08:00
|
|
|
bpf_cgroup_storage_free(item.cgroup_storage[stype]);
|
2018-09-28 22:45:36 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
}
|
2018-08-03 05:27:27 +08:00
|
|
|
|
2017-03-31 12:45:38 +08:00
|
|
|
if (!repeat)
|
|
|
|
repeat = 1;
|
2019-02-13 07:42:38 +08:00
|
|
|
|
2021-03-03 18:18:12 +08:00
|
|
|
bpf_test_timer_enter(&t);
|
2021-07-13 07:06:15 +08:00
|
|
|
old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
|
2021-03-03 18:18:12 +08:00
|
|
|
do {
|
bpf: Add ambient BPF runtime context stored in current
b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
helper") fixed the problem with cgroup-local storage use in BPF by
pre-allocating per-CPU array of 8 cgroup storage pointers to accommodate
possible BPF program preemptions and nested executions.
While this seems to work good in practice, it introduces new and unnecessary
failure mode in which not all BPF programs might be executed if we fail to
find an unused slot for cgroup storage, however unlikely it is. It might also
not be so unlikely when/if we allow sleepable cgroup BPF programs in the
future.
Further, the way that cgroup storage is implemented as ambiently-available
property during entire BPF program execution is a convenient way to pass extra
information to BPF program and helpers without requiring user code to pass
around extra arguments explicitly. So it would be good to have a generic
solution that can allow implementing this without arbitrary restrictions.
Ideally, such a solution would work for both preemptable and sleepable BPF
programs in exactly the same way.
This patch introduces such a solution, bpf_run_ctx. It adds one pointer field
(bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of
macros in such a way that it always stays valid throughout BPF program
execution. BPF program preemption is handled by remembering previous
current->bpf_ctx value locally while executing nested BPF program and
restoring old value after nested BPF program finishes. This is handled by two
helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are
supposed to be used before and after BPF program runs, respectively.
Restoring old value of the pointer handles preemption, while bpf_run_ctx
pointer being a property of current task_struct naturally solves this problem
for sleepable BPF programs by "following" BPF program execution as it is
scheduled in and out of CPU. It would even allow CPU migration of BPF
programs, even though it's not currently allowed by BPF infra.
This patch cleans up cgroup local storage handling as a first application. The
design itself is generic, though, with bpf_run_ctx being an empty struct that
is supposed to be embedded into a specific struct for a given BPF program type
(bpf_cg_run_ctx in this case). Follow up patches are planned that will expand
this mechanism for other uses within tracing BPF programs.
To verify that this change doesn't revert the fix to the original cgroup
storage issue, I ran the same repro as in the original report ([0]) and didn't
get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with
bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work).
[0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/
Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org
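The save-and-restore pattern described in this commit message can be modelled in plain user-space C. The following is a minimal sketch, not the kernel implementation: `struct run_ctx`, `struct cg_run_ctx`, `current_ctx`, `set_run_ctx()`, `reset_run_ctx()`, `to_cg_ctx()`, `current_item()` and `run_nested()` are hypothetical stand-ins for `bpf_run_ctx`, `bpf_cg_run_ctx`, `current->bpf_ctx` and the two helpers named above. A single global pointer replaces the per-task field, and a placeholder member replaces the empty struct, which standard C does not allow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the generic context (the kernel struct is empty). */
struct run_ctx {
	int placeholder;
};

/* Hypothetical model of a program-type specific context embedding it. */
struct cg_run_ctx {
	struct run_ctx run_ctx;	/* embedded generic part */
	int prog_item;		/* illustrative per-program payload */
};

/* Stand-in for current->bpf_ctx; the kernel keeps one per task. */
struct run_ctx *current_ctx;

/* Install a new context and hand back the previous one. */
struct run_ctx *set_run_ctx(struct run_ctx *new_ctx)
{
	struct run_ctx *old = current_ctx;

	current_ctx = new_ctx;
	return old;
}

/* Restore whatever context was active before set_run_ctx(). */
void reset_run_ctx(struct run_ctx *old_ctx)
{
	current_ctx = old_ctx;
}

/* Recover the outer struct from the embedded member (container_of style). */
struct cg_run_ctx *to_cg_ctx(struct run_ctx *ctx)
{
	return (struct cg_run_ctx *)((char *)ctx -
				     offsetof(struct cg_run_ctx, run_ctx));
}

/* What a helper would see as the ambient per-program payload. */
int current_item(void)
{
	assert(current_ctx != NULL);
	return to_cg_ctx(current_ctx)->prog_item;
}

/* Model a nested program run: the outer context is saved on the stack. */
int run_nested(int item)
{
	struct cg_run_ctx inner = { .prog_item = item };
	struct run_ctx *old = set_run_ctx(&inner.run_ctx);
	int seen = current_item();

	reset_run_ctx(old);
	return seen;
}
```

Because the previous pointer lives in a local variable of the caller, any depth of nesting unwinds correctly without a fixed-size slot array, which is the failure mode the commit message sets out to remove.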
2021-07-13 07:06:15 +08:00
|
|
|
run_ctx.prog_item = &item;
|
2023-02-17 08:41:47 +08:00
|
|
|
local_bh_disable();
|
2019-12-14 01:51:10 +08:00
|
|
|
if (xdp)
|
|
|
|
*retval = bpf_prog_run_xdp(prog, ctx);
|
|
|
|
else
|
2021-08-15 15:05:54 +08:00
|
|
|
*retval = bpf_prog_run(prog, ctx);
|
2023-02-17 08:41:47 +08:00
|
|
|
local_bh_enable();
|
bpf: Add "live packet" mode for XDP in BPF_PROG_RUN
This adds support for running XDP programs through BPF_PROG_RUN in a mode
that enables live packet processing of the resulting frames. Previous uses
of BPF_PROG_RUN for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.
The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
ifindex and RXQ number as part of the context object being passed to the
kernel. This patch reuses that code, but adds a new mode with different
semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
flag.
When running BPF_PROG_RUN in this mode, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out that interface. XDP_TX is translated into an
XDP_REDIRECT operation to the same interface, since the real XDP_TX action
is only possible from within the network drivers themselves, not from the
process context where BPF_PROG_RUN is executed.
Internally, this new mode of operation creates a page pool instance while
setting up the test run, and feeds pages from that into the XDP program.
The setup cost of this is amortised over the number of repetitions
specified by userspace.
To support the performance testing use case, we further optimise the setup
step so that all pages in the pool are pre-initialised with the packet
data, and pre-computed context and xdp_frame objects stored at the start of
each page. This makes it possible to entirely avoid touching the page
content on each XDP program invocation, and enables sending up to 9
Mpps/core on my test box.
Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.
Enabling the new flag is only allowed when not setting ctx_out and data_out
in the test specification, since using it means frames will be redirected
somewhere else, so they can't be returned.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
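The two rules stated in this commit message (how verdicts are handled in live mode, and when the flag may be used) can be sketched as a small user-space model. Everything here is hypothetical and only mirrors the prose: `enum verdict`, `enum live_action`, `struct test_spec`, `live_spec_valid()` and `live_action_for()` are illustrative names, not kernel definitions. Treating every other verdict as a drop is an assumption, since the message does not describe those cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the XDP return codes discussed above. */
enum verdict { V_ABORTED, V_DROP, V_PASS, V_TX, V_REDIRECT };

/* What the live-frames runner would do with a frame (illustrative). */
enum live_action { ACT_DROP, ACT_INJECT_STACK, ACT_REDIRECT };

/* Hypothetical subset of the test specification passed by userspace. */
struct test_spec {
	bool live_frames;	/* models BPF_F_TEST_XDP_LIVE_FRAMES being set */
	const void *data_out;
	const void *ctx_out;
};

/* Live mode cannot return frames, so output buffers must be unset. */
bool live_spec_valid(const struct test_spec *spec)
{
	assert(spec != NULL);
	if (!spec->live_frames)
		return true;
	return spec->data_out == NULL && spec->ctx_out == NULL;
}

/* Map a program verdict to the action taken in live mode. */
enum live_action live_action_for(enum verdict v)
{
	switch (v) {
	case V_PASS:
		return ACT_INJECT_STACK;	/* as if received on the interface */
	case V_TX:				/* handled as a redirect to the same interface */
	case V_REDIRECT:
		return ACT_REDIRECT;
	default:
		return ACT_DROP;		/* assumption: anything else drops the frame */
	}
}
```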
2022-03-09 18:53:42 +08:00
|
|
|
} while (bpf_test_timer_continue(&t, 1, repeat, &ret, time));
|
2021-07-13 07:06:15 +08:00
|
|
|
bpf_reset_run_ctx(old_ctx);
|
2021-03-03 18:18:12 +08:00
|
|
|
bpf_test_timer_leave(&t);
|
2017-03-31 12:45:38 +08:00
|
|
|
|
2018-09-28 22:45:36 +08:00
|
|
|
for_each_cgroup_storage_type(stype)
|
2021-07-13 07:06:15 +08:00
|
|
|
bpf_cgroup_storage_free(item.cgroup_storage[stype]);
|
2018-08-03 05:27:27 +08:00
|
|
|
|
2019-02-13 07:42:38 +08:00
|
|
|
return ret;
|
2017-03-31 12:45:38 +08:00
|
|
|
}
|
|
|
|
|
2017-05-02 23:36:33 +08:00
|
|
|
static int bpf_test_finish(const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr, const void *data,
|
2022-01-21 18:09:59 +08:00
|
|
|
struct skb_shared_info *sinfo, u32 size,
|
|
|
|
u32 retval, u32 duration)
|
2017-03-31 12:45:38 +08:00
|
|
|
{
|
2017-05-02 23:36:33 +08:00
|
|
|
void __user *data_out = u64_to_user_ptr(kattr->test.data_out);
|
2017-03-31 12:45:38 +08:00
|
|
|
int err = -EFAULT;
|
2018-12-03 19:31:23 +08:00
|
|
|
u32 copy_size = size;
|
2017-03-31 12:45:38 +08:00
|
|
|
|
2018-12-03 19:31:23 +08:00
|
|
|
/* Clamp copy if the user has provided a size hint, but copy the full
|
|
|
|
* buffer if not to retain old behaviour.
|
|
|
|
*/
|
|
|
|
if (kattr->test.data_size_out &&
|
|
|
|
copy_size > kattr->test.data_size_out) {
|
|
|
|
copy_size = kattr->test.data_size_out;
|
|
|
|
err = -ENOSPC;
|
|
|
|
}
|
|
|
|
|
2022-01-21 18:09:59 +08:00
|
|
|
if (data_out) {
|
|
|
|
int len = sinfo ? copy_size - sinfo->xdp_frags_size : copy_size;
|
|
|
|
|
2022-03-01 07:23:32 +08:00
|
|
|
if (len < 0) {
|
|
|
|
err = -ENOSPC;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2022-01-21 18:09:59 +08:00
|
|
|
if (copy_to_user(data_out, data, len))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (sinfo) {
|
2022-02-05 07:58:49 +08:00
|
|
|
int i, offset = len;
|
|
|
|
u32 data_len;
|
2022-01-21 18:09:59 +08:00
|
|
|
|
|
|
|
for (i = 0; i < sinfo->nr_frags; i++) {
|
|
|
|
skb_frag_t *frag = &sinfo->frags[i];
|
|
|
|
|
|
|
|
if (offset >= copy_size) {
|
|
|
|
err = -ENOSPC;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2022-02-05 07:58:49 +08:00
|
|
|
data_len = min_t(u32, copy_size - offset,
|
2022-01-21 18:09:59 +08:00
|
|
|
skb_frag_size(frag));
|
|
|
|
|
|
|
|
if (copy_to_user(data_out + offset,
|
|
|
|
skb_frag_address(frag),
|
|
|
|
data_len))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
offset += data_len;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-03-31 12:45:38 +08:00
|
|
|
if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size)))
|
|
|
|
goto out;
|
|
|
|
if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
|
|
|
|
goto out;
|
|
|
|
if (copy_to_user(&uattr->test.duration, &duration, sizeof(duration)))
|
|
|
|
goto out;
|
2018-12-03 19:31:23 +08:00
|
|
|
if (err != -ENOSPC)
|
|
|
|
err = 0;
|
2017-03-31 12:45:38 +08:00
|
|
|
out:
|
2019-04-27 02:49:51 +08:00
|
|
|
trace_bpf_test_finish(&err);
|
2017-03-31 12:45:38 +08:00
|
|
|
return err;
|
|
|
|
}
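The size-hint behaviour of bpf_test_finish() above can be illustrated with a simplified user-space model. This is a sketch under stated assumptions: `finish_copy()` is a hypothetical function, `memcpy()` stands in for copy_to_user() and so cannot fail, and the fragment loop is left out. It keeps the two properties visible in the code: the copy is clamped to the caller's hint, and the full size is still reported even when -ENOSPC is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model: copy up to size_hint bytes of src into dst (all of
 * it when size_hint is 0), report the full size through size_out, and
 * return -ENOSPC if the output had to be truncated.
 */
int finish_copy(void *dst, uint32_t size_hint, const void *src,
		uint32_t size, uint32_t *size_out)
{
	uint32_t copy_size = size;
	int err = 0;

	assert(size_out != NULL);

	/* Clamp only when the caller supplied a non-zero hint. */
	if (size_hint && copy_size > size_hint) {
		copy_size = size_hint;
		err = -ENOSPC;
	}

	if (dst)
		memcpy(dst, src, copy_size);

	/* The caller always learns how large the full result was. */
	*size_out = size;
	return err;
}
```

A caller that gets -ENOSPC can therefore read the reported size and retry with a buffer that is large enough.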
|
|
|
|
|
2019-11-15 02:57:08 +08:00
|
|
|
/* Integer types of various sizes and pointer combinations cover variety of
|
|
|
|
* architecture dependent calling conventions. 7+ can be supported in the
|
|
|
|
* future.
|
|
|
|
*/
|
2020-03-28 04:47:13 +08:00
|
|
|
__diag_push();
|
2022-03-05 06:46:44 +08:00
|
|
|
__diag_ignore_all("-Wmissing-prototypes",
|
|
|
|
"Global functions as their definitions will be in vmlinux BTF");
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc int bpf_fentry_test1(int a)
|
2019-11-15 02:57:08 +08:00
|
|
|
{
|
|
|
|
return a + 1;
|
|
|
|
}
|
2022-01-15 00:39:53 +08:00
|
|
|
EXPORT_SYMBOL_GPL(bpf_fentry_test1);
|
2019-11-15 02:57:08 +08:00
|
|
|
|
|
|
|
int noinline bpf_fentry_test2(int a, u64 b)
|
|
|
|
{
|
|
|
|
return a + b;
|
|
|
|
}
|
|
|
|
|
|
|
|
int noinline bpf_fentry_test3(char a, int b, u64 c)
|
|
|
|
{
|
|
|
|
return a + b + c;
|
|
|
|
}
|
|
|
|
|
|
|
|
int noinline bpf_fentry_test4(void *a, char b, int c, u64 d)
|
|
|
|
{
|
|
|
|
return (long)a + b + c + d;
|
|
|
|
}
|
|
|
|
|
|
|
|
int noinline bpf_fentry_test5(u64 a, void *b, short c, int d, u64 e)
|
|
|
|
{
|
|
|
|
return a + (long)b + c + d + e;
|
|
|
|
}
|
|
|
|
|
|
|
|
int noinline bpf_fentry_test6(u64 a, void *b, short c, int d, void *e, u64 f)
|
|
|
|
{
|
|
|
|
return a + (long)b + c + d + (long)e + f;
|
|
|
|
}
|
|
|
|
|
bpf: Add tests for PTR_TO_BTF_ID vs. null comparison
Add two tests for PTR_TO_BTF_ID vs. null ptr comparison,
one for PTR_TO_BTF_ID in the ctx structure and the
other for PTR_TO_BTF_ID after one level pointer chasing.
In both cases, the test ensures condition is not
removed.
For example, for this test
struct bpf_fentry_test_t {
struct bpf_fentry_test_t *a;
};
int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
{
if (arg == 0)
test7_result = 1;
return 0;
}
Before the previous verifier change, we have xlated codes:
int test7(long long unsigned int * ctx):
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
0: (79) r1 = *(u64 *)(r1 +0)
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
1: (b4) w0 = 0
2: (95) exit
After the previous verifier change, we have:
int test7(long long unsigned int * ctx):
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
0: (79) r1 = *(u64 *)(r1 +0)
; if (arg == 0)
1: (55) if r1 != 0x0 goto pc+4
; test7_result = 1;
2: (18) r1 = map[id:6][0]+48
4: (b7) r2 = 1
5: (7b) *(u64 *)(r1 +0) = r2
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
6: (b4) w0 = 0
7: (95) exit
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200630171241.2523875-1-yhs@fb.com
2020-07-01 01:12:41 +08:00
|
|
|
struct bpf_fentry_test_t {
|
|
|
|
struct bpf_fentry_test_t *a;
|
|
|
|
};
|
|
|
|
|
|
|
|
int noinline bpf_fentry_test7(struct bpf_fentry_test_t *arg)
|
|
|
|
{
|
|
|
|
return (long)arg;
|
|
|
|
}
|
|
|
|
|
|
|
|
int noinline bpf_fentry_test8(struct bpf_fentry_test_t *arg)
|
|
|
|
{
|
|
|
|
return (long)arg->a;
|
|
|
|
}
|
|
|
|
|
2023-04-10 16:59:08 +08:00
|
|
|
__bpf_kfunc u32 bpf_fentry_test9(u32 *a)
|
|
|
|
{
|
|
|
|
return *a;
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc int bpf_modify_return_test(int a, int *b)
|
2020-03-05 03:18:53 +08:00
|
|
|
{
|
|
|
|
*b += 1;
|
|
|
|
return a + *b;
|
|
|
|
}
|
2021-03-25 09:52:52 +08:00
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc u64 bpf_kfunc_call_test1(struct sock *sk, u32 a, u64 b, u32 c, u64 d)
|
2021-03-25 09:52:52 +08:00
|
|
|
{
|
|
|
|
return a + b + c + d;
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc int bpf_kfunc_call_test2(struct sock *sk, u32 a, u32 b)
|
2021-03-25 09:52:52 +08:00
|
|
|
{
|
|
|
|
return a + b;
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc struct sock *bpf_kfunc_call_test3(struct sock *sk)
|
2021-03-25 09:52:52 +08:00
|
|
|
{
|
|
|
|
return sk;
|
|
|
|
}
|
|
|
|
|
2023-01-28 08:06:33 +08:00
|
|
|
long noinline bpf_kfunc_call_test4(signed char a, short b, int c, long d)
|
|
|
|
{
|
|
|
|
/* Provoke the compiler to assume that the caller has sign-extended a,
|
|
|
|
* b and c on platforms where this is required (e.g. s390x).
|
|
|
|
*/
|
|
|
|
return (long)a + (long)b + (long)c + d;
|
|
|
|
}
|
|
|
|
|
2023-03-10 15:41:00 +08:00
|
|
|
int noinline bpf_fentry_shadow_test(int a)
|
|
|
|
{
|
|
|
|
return a + 1;
|
|
|
|
}
|
|
|
|
|
2022-04-25 05:49:01 +08:00
|
|
|
struct prog_test_member1 {
|
|
|
|
int a;
|
|
|
|
};
|
|
|
|
|
2022-03-05 06:46:45 +08:00
|
|
|
struct prog_test_member {
|
2022-04-25 05:49:01 +08:00
|
|
|
struct prog_test_member1 m;
|
|
|
|
int c;
|
2022-03-05 06:46:45 +08:00
|
|
|
};
|
|
|
|
|
2022-01-15 00:39:52 +08:00
|
|
|
struct prog_test_ref_kfunc {
|
|
|
|
int a;
|
|
|
|
int b;
|
2022-03-05 06:46:45 +08:00
|
|
|
struct prog_test_member memb;
|
2022-01-15 00:39:52 +08:00
|
|
|
struct prog_test_ref_kfunc *next;
|
2022-05-12 03:46:52 +08:00
|
|
|
refcount_t cnt;
|
2022-01-15 00:39:52 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static struct prog_test_ref_kfunc prog_test_struct = {
|
|
|
|
.a = 42,
|
|
|
|
.b = 108,
|
|
|
|
.next = &prog_test_struct,
|
2022-05-12 03:46:52 +08:00
|
|
|
.cnt = REFCOUNT_INIT(1),
|
2022-01-15 00:39:52 +08:00
|
|
|
};
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc struct prog_test_ref_kfunc *
|
2022-01-15 00:39:52 +08:00
|
|
|
bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr)
|
|
|
|
{
|
2022-05-12 03:46:52 +08:00
|
|
|
refcount_inc(&prog_test_struct.cnt);
|
2022-01-15 00:39:52 +08:00
|
|
|
return &prog_test_struct;
|
|
|
|
}
|
|
|
|
|
bpf: Treat KF_RELEASE kfuncs as KF_TRUSTED_ARGS
KF_RELEASE kfuncs are not currently treated as having KF_TRUSTED_ARGS,
even though they have a superset of the requirements of KF_TRUSTED_ARGS.
Like KF_TRUSTED_ARGS, KF_RELEASE kfuncs require a 0-offset argument, and
don't allow NULL-able arguments. Unlike KF_TRUSTED_ARGS which require
_either_ an argument with ref_obj_id > 0, _or_ (ref->type &
BPF_REG_TRUSTED_MODIFIERS) (and no unsafe modifiers allowed), KF_RELEASE
only allows for ref_obj_id > 0. Because KF_RELEASE today doesn't
automatically imply KF_TRUSTED_ARGS, some of these requirements are
enforced in different ways that can make the behavior of the verifier
feel unpredictable. For example, a KF_RELEASE kfunc with a NULL-able
argument will currently fail in the verifier with a message like, "arg#0
is ptr_or_null_ expected ptr_ or socket" rather than "Possibly NULL
pointer passed to trusted arg0". Our intention is the same, but the
semantics are different due to implementation details that kfunc authors
and BPF program writers should not need to care about.
Let's make the behavior of the verifier more consistent and intuitive by
having KF_RELEASE kfuncs imply the presence of KF_TRUSTED_ARGS. Our
eventual goal is to have all kfuncs assume KF_TRUSTED_ARGS by default
anyway, so this takes us a step in that direction.
Note that it does not make sense to assume KF_TRUSTED_ARGS for all
KF_ACQUIRE kfuncs. KF_ACQUIRE kfuncs can have looser semantics than
KF_RELEASE, with e.g. KF_RCU | KF_RET_NULL. We may want to have
KF_ACQUIRE imply KF_TRUSTED_ARGS _unless_ KF_RCU is specified, but that
can be left to another patch set, and there are no such subtleties to
address for KF_RELEASE.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230325213144.486885-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-26 05:31:46 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_offset(struct prog_test_ref_kfunc *p)
|
|
|
|
{
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc struct prog_test_member *
|
2022-04-25 05:49:01 +08:00
|
|
|
bpf_kfunc_call_memb_acquire(void)
|
|
|
|
{
|
2022-05-12 03:46:52 +08:00
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
return NULL;
|
2022-04-25 05:49:01 +08:00
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
2022-05-12 03:46:52 +08:00
|
|
|
refcount_dec(&p->cnt);
|
2022-01-15 00:39:52 +08:00
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_memb_release(struct prog_test_member *p)
|
2022-03-05 06:46:45 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_memb1_release(struct prog_test_member1 *p)
|
2022-04-25 05:49:01 +08:00
|
|
|
{
|
2022-05-12 03:46:52 +08:00
|
|
|
WARN_ON_ONCE(1);
|
2022-04-25 05:49:01 +08:00
|
|
|
}
|
|
|
|
|
2022-09-06 23:13:03 +08:00
|
|
|
static int *__bpf_kfunc_call_test_get_mem(struct prog_test_ref_kfunc *p, const int size)
|
|
|
|
{
|
|
|
|
if (size > 2 * sizeof(int))
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
return (int *)p;
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p,
|
|
|
|
const int rdwr_buf_size)
|
2022-09-06 23:13:03 +08:00
|
|
|
{
|
|
|
|
return __bpf_kfunc_call_test_get_mem(p, rdwr_buf_size);
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p,
|
|
|
|
const int rdonly_buf_size)
|
2022-09-06 23:13:03 +08:00
|
|
|
{
|
|
|
|
return __bpf_kfunc_call_test_get_mem(p, rdonly_buf_size);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* the next 2 ones can't be really used for testing except to ensure
|
|
|
|
* that the verifier rejects the call.
|
|
|
|
* Acquire functions must return struct pointers, so these ones are
|
|
|
|
* failing.
|
|
|
|
*/
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc int *bpf_kfunc_call_test_acq_rdonly_mem(struct prog_test_ref_kfunc *p,
|
|
|
|
const int rdonly_buf_size)
|
2022-09-06 23:13:03 +08:00
|
|
|
{
|
|
|
|
return __bpf_kfunc_call_test_get_mem(p, rdonly_buf_size);
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_int_mem_release(int *p)
|
2022-09-06 23:13:03 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2022-01-15 00:39:52 +08:00
|
|
|
struct prog_test_pass1 {
|
|
|
|
int x0;
|
|
|
|
struct {
|
|
|
|
int x1;
|
|
|
|
struct {
|
|
|
|
int x2;
|
|
|
|
struct {
|
|
|
|
int x3;
|
|
|
|
};
|
|
|
|
};
|
|
|
|
};
|
|
|
|
};
|
|
|
|
|
|
|
|
struct prog_test_pass2 {
|
|
|
|
int len;
|
|
|
|
short arr1[4];
|
|
|
|
struct {
|
|
|
|
char arr2[4];
|
|
|
|
unsigned long arr3[8];
|
|
|
|
} x;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct prog_test_fail1 {
|
|
|
|
void *p;
|
|
|
|
int x;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct prog_test_fail2 {
|
|
|
|
int x8;
|
|
|
|
struct prog_test_pass1 x;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct prog_test_fail3 {
|
|
|
|
int len;
|
|
|
|
char arr1[2];
|
2022-01-22 19:09:44 +08:00
|
|
|
char arr2[];
|
2022-01-15 00:39:52 +08:00
|
|
|
};
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_pass_ctx(struct __sk_buff *skb)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_pass1(struct prog_test_pass1 *p)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_pass2(struct prog_test_pass2 *p)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_fail1(struct prog_test_fail1 *p)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_fail2(struct prog_test_fail2 *p)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_fail3(struct prog_test_fail3 *p)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_mem_len_pass1(void *mem, int mem__sz)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_mem_len_fail1(void *mem, int len)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_mem_len_fail2(u64 *mem, int len)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p)
|
2022-01-15 00:39:52 +08:00
|
|
|
{
|
2023-03-03 12:14:43 +08:00
|
|
|
/* p != NULL, but p->cnt could be 0 */
|
2022-01-15 00:39:52 +08:00
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:15 +08:00
|
|
|
__bpf_kfunc void bpf_kfunc_call_test_destructive(void)
|
2022-07-21 21:42:36 +08:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2023-02-02 01:30:16 +08:00
|
|
|
__bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused)
|
2022-08-10 14:59:05 +08:00
|
|
|
{
|
2023-02-02 01:30:16 +08:00
|
|
|
return arg;
|
2022-08-10 14:59:05 +08:00
|
|
|
}
|
|
|
|
|
2020-03-28 04:47:13 +08:00
|
|
|
__diag_pop();
|
2020-03-05 03:18:53 +08:00
|
|
|
|
2022-12-06 22:59:32 +08:00
|
|
|
BTF_SET8_START(bpf_test_modify_return_ids)
|
|
|
|
BTF_ID_FLAGS(func, bpf_modify_return_test)
|
|
|
|
BTF_ID_FLAGS(func, bpf_fentry_test1, KF_SLEEPABLE)
|
|
|
|
BTF_SET8_END(bpf_test_modify_return_ids)
|
|
|
|
|
|
|
|
static const struct btf_kfunc_id_set bpf_test_modify_return_set = {
|
|
|
|
.owner = THIS_MODULE,
|
|
|
|
.set = &bpf_test_modify_return_ids,
|
|
|
|
};
|
2020-03-05 03:18:53 +08:00
|
|
|
|
2022-07-21 21:42:35 +08:00
|
|
|
BTF_SET8_START(test_sk_check_kfunc_ids)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test1)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test2)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test3)
|
2023-01-28 08:06:33 +08:00
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test4)
|
2022-07-21 21:42:35 +08:00
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_acquire, KF_ACQUIRE | KF_RET_NULL)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_memb_acquire, KF_ACQUIRE | KF_RET_NULL)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_memb1_release, KF_RELEASE)
|
2022-09-06 23:13:03 +08:00
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_get_rdwr_mem, KF_RET_NULL)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_get_rdonly_mem, KF_RET_NULL)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_acq_rdonly_mem, KF_ACQUIRE | KF_RET_NULL)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_int_mem_release, KF_RELEASE)
|
2022-07-21 21:42:35 +08:00
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass_ctx)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass1)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass2)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail1)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail2)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1)
|
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2)
|
2023-03-03 12:14:43 +08:00
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU)
|
2022-08-10 14:59:05 +08:00
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
|
2023-02-02 01:30:16 +08:00
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg)
|
2023-03-26 05:31:46 +08:00
|
|
|
BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset)
|
2022-07-21 21:42:35 +08:00
|
|
|
BTF_SET8_END(test_sk_check_kfunc_ids)
|
2022-04-25 05:49:00 +08:00
|
|
|
|
2022-01-21 18:09:57 +08:00
|
|
|
static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
|
|
|
|
u32 size, u32 headroom, u32 tailroom)
|
2017-03-31 12:45:38 +08:00
|
|
|
{
|
|
|
|
void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
|
|
|
|
void *data;
|
|
|
|
|
|
|
|
if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
|
|
|
|
return ERR_PTR(-EINVAL);
|
|
|
|
|
2020-05-18 21:05:27 +08:00
|
|
|
if (user_size > size)
|
|
|
|
return ERR_PTR(-EMSGSIZE);
|
|
|
|
|
2022-11-02 16:16:20 +08:00
|
|
|
size = SKB_DATA_ALIGN(size);
|
2017-03-31 12:45:38 +08:00
|
|
|
data = kzalloc(size + headroom + tailroom, GFP_USER);
|
|
|
|
if (!data)
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
2020-05-18 21:05:27 +08:00
|
|
|
if (copy_from_user(data + headroom, data_in, user_size)) {
|
2017-03-31 12:45:38 +08:00
|
|
|
kfree(data);
|
|
|
|
return ERR_PTR(-EFAULT);
|
|
|
|
}
|
2020-03-05 03:18:52 +08:00
|
|
|
|
2017-03-31 12:45:38 +08:00
|
|
|
return data;
|
|
|
|
}
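The size checks at the top of bpf_test_init() can be mirrored in a small userspace sketch. The helper name `check_test_sizes()` is hypothetical, and `SKETCH_PAGE_SIZE` is an assumed 4096-byte page (the real PAGE_SIZE is architecture dependent); `SKETCH_ETH_HLEN` is the 14-byte Ethernet header length.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN   14u    /* Ethernet header length in bytes */
#define SKETCH_PAGE_SIZE  4096u  /* assumption: 4 KiB pages */

/* Hypothetical userspace mirror of the checks in bpf_test_init():
 * - the packet must hold at least an Ethernet header and must fit in one
 *   page together with headroom and tailroom, else -EINVAL;
 * - the user-supplied length must not exceed the packet size, else -EMSGSIZE.
 */
static int check_test_sizes(uint32_t user_size, uint32_t size,
			    uint32_t headroom, uint32_t tailroom)
{
	/* Precondition of this sketch: keeps the subtraction from wrapping. */
	assert(headroom <= SKETCH_PAGE_SIZE &&
	       tailroom <= SKETCH_PAGE_SIZE - headroom);

	if (size < SKETCH_ETH_HLEN ||
	    size > SKETCH_PAGE_SIZE - headroom - tailroom)
		return -EINVAL;

	if (user_size > size)
		return -EMSGSIZE;

	return 0;
}
```

The two error codes tell the caller which limit was hit: -EINVAL for a packet size outside the supported range, -EMSGSIZE for input data longer than the buffer it should land in.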
|
|
|
|
|
2020-03-05 03:18:52 +08:00
|
|
|
int bpf_prog_test_run_tracing(struct bpf_prog *prog,
|
|
|
|
const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr)
|
|
|
|
{
|
bpf: Add tests for PTR_TO_BTF_ID vs. null comparison
Add two tests for PTR_TO_BTF_ID vs. null ptr comparison,
one for PTR_TO_BTF_ID in the ctx structure and the
other for PTR_TO_BTF_ID after one level pointer chasing.
In both cases, the test ensures condition is not
removed.
For example, for this test
struct bpf_fentry_test_t {
struct bpf_fentry_test_t *a;
};
int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
{
if (arg == 0)
test7_result = 1;
return 0;
}
Before the previous verifier change, we have xlated codes:
int test7(long long unsigned int * ctx):
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
0: (79) r1 = *(u64 *)(r1 +0)
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
1: (b4) w0 = 0
2: (95) exit
After the previous verifier change, we have:
int test7(long long unsigned int * ctx):
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
0: (79) r1 = *(u64 *)(r1 +0)
; if (arg == 0)
1: (55) if r1 != 0x0 goto pc+4
; test7_result = 1;
2: (18) r1 = map[id:6][0]+48
4: (b7) r2 = 1
5: (7b) *(u64 *)(r1 +0) = r2
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
6: (b4) w0 = 0
7: (95) exit
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200630171241.2523875-1-yhs@fb.com
2020-07-01 01:12:41 +08:00
|
|
|
struct bpf_fentry_test_t arg = {};
|
2020-03-05 03:18:53 +08:00
|
|
|
u16 side_effect = 0, ret = 0;
|
|
|
|
int b = 2, err = -EFAULT;
|
|
|
|
u32 retval = 0;
|
2020-03-05 03:18:52 +08:00
|
|
|
|
bpf: Add "live packet" mode for XDP in BPF_PROG_RUN
This adds support for running XDP programs through BPF_PROG_RUN in a mode
that enables live packet processing of the resulting frames. Previous uses
of BPF_PROG_RUN for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.
The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
ifindex and RXQ number as part of the context object being passed to the
kernel. This patch reuses that code, but adds a new mode with different
semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
flag.
When running BPF_PROG_RUN in this mode, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out that interface. XDP_TX is translated into an
XDP_REDIRECT operation to the same interface, since the real XDP_TX action
is only possible from within the network drivers themselves, not from the
process context where BPF_PROG_RUN is executed.
Internally, this new mode of operation creates a page pool instance while
setting up the test run, and feeds pages from that into the XDP program.
The setup cost of this is amortised over the number of repetitions
specified by userspace.
To support the performance testing use case, we further optimise the setup
step so that all pages in the pool are pre-initialised with the packet
data, and pre-computed context and xdp_frame objects stored at the start of
each page. This makes it possible to entirely avoid touching the page
content on each XDP program invocation, and enables sending up to 9
Mpps/core on my test box.
Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.
Enabling the new flag is only allowed when not setting ctx_out and data_out
in the test specification, since using it means frames will be redirected
somewhere else, so they can't be returned.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
2022-03-09 18:53:42 +08:00
|
|
|
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
|
2020-09-26 04:54:29 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2020-03-05 03:18:52 +08:00
|
|
|
switch (prog->expected_attach_type) {
|
|
|
|
case BPF_TRACE_FENTRY:
|
|
|
|
case BPF_TRACE_FEXIT:
|
|
|
|
if (bpf_fentry_test1(1) != 2 ||
|
|
|
|
bpf_fentry_test2(2, 3) != 5 ||
|
|
|
|
bpf_fentry_test3(4, 5, 6) != 15 ||
|
|
|
|
bpf_fentry_test4((void *)7, 8, 9, 10) != 34 ||
|
|
|
|
bpf_fentry_test5(11, (void *)12, 13, 14, 15) != 65 ||
|
2020-07-01 01:12:41 +08:00
|
|
|
bpf_fentry_test6(16, (void *)17, 18, 19, (void *)20, 21) != 111 ||
|
|
|
|
bpf_fentry_test7((struct bpf_fentry_test_t *)0) != 0 ||
|
2023-04-10 16:59:08 +08:00
|
|
|
bpf_fentry_test8(&arg) != 0 ||
|
|
|
|
bpf_fentry_test9(&retval) != 0)
|
2020-03-05 03:18:52 +08:00
|
|
|
goto out;
|
|
|
|
break;
|
2020-03-05 03:18:53 +08:00
|
|
|
case BPF_MODIFY_RETURN:
|
|
|
|
ret = bpf_modify_return_test(1, &b);
|
|
|
|
if (b != 2)
|
|
|
|
side_effect = 1;
|
|
|
|
break;
|
2020-03-05 03:18:52 +08:00
|
|
|
default:
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2020-03-05 03:18:53 +08:00
|
|
|
retval = ((u32)side_effect << 16) | ret;
|
|
|
|
if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
|
|
|
|
goto out;
|
|
|
|
|
2020-03-05 03:18:52 +08:00
|
|
|
err = 0;
|
|
|
|
out:
|
|
|
|
trace_bpf_test_finish(&err);
|
|
|
|
return err;
|
|
|
|
}
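bpf_prog_test_run_tracing() reports two 16-bit values through one u32: `side_effect` in the upper half and `ret` in the lower half. A userspace caller could split them again roughly as follows; `pack_retval()`, `unpack_retval()`, `retval_selfcheck()` and `struct tracing_test_result` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two halves of test.retval. */
struct tracing_test_result {
	uint16_t ret;         /* low 16 bits */
	uint16_t side_effect; /* high 16 bits */
};

/* Same expression as in the test runner: side_effect goes in the upper half. */
static uint32_t pack_retval(uint16_t side_effect, uint16_t ret)
{
	return ((uint32_t)side_effect << 16) | ret;
}

/* Inverse operation, as a userspace reader of test.retval might do it. */
static struct tracing_test_result unpack_retval(uint32_t retval)
{
	struct tracing_test_result r = {
		.ret = (uint16_t)(retval & 0xffff),
		.side_effect = (uint16_t)(retval >> 16),
	};

	return r;
}

/* Small usage example: packing and unpacking round-trips both fields. */
void retval_selfcheck(void)
{
	struct tracing_test_result r = unpack_retval(pack_retval(1, 4));

	assert(r.side_effect == 1 && r.ret == 4);
}
```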
|
|
|
|
|
2020-09-26 04:54:29 +08:00
|
|
|
struct bpf_raw_tp_test_run_info {
|
|
|
|
struct bpf_prog *prog;
|
|
|
|
void *ctx;
|
|
|
|
u32 retval;
|
|
|
|
};
|
|
|
|
|
|
|
|
static void
|
|
|
|
__bpf_prog_test_run_raw_tp(void *data)
|
|
|
|
{
|
|
|
|
struct bpf_raw_tp_test_run_info *info = data;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
2021-08-15 15:05:54 +08:00
|
|
|
info->retval = bpf_prog_run(info->prog, info->ctx);
|
2020-09-26 04:54:29 +08:00
|
|
|
rcu_read_unlock();
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
|
|
|
|
const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr)
|
|
|
|
{
|
|
|
|
void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in);
|
|
|
|
__u32 ctx_size_in = kattr->test.ctx_size_in;
|
|
|
|
struct bpf_raw_tp_test_run_info info;
|
|
|
|
int cpu = kattr->test.cpu, err = 0;
|
2020-09-30 06:29:49 +08:00
|
|
|
int current_cpu;
|
2020-09-26 04:54:29 +08:00
|
|
|
|
|
|
|
/* doesn't support data_in/out, ctx_out, duration, or repeat */
|
|
|
|
if (kattr->test.data_in || kattr->test.data_out ||
|
|
|
|
kattr->test.ctx_out || kattr->test.duration ||
|
2022-03-09 18:53:42 +08:00
|
|
|
kattr->test.repeat || kattr->test.batch_size)
|
2020-09-26 04:54:29 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2021-01-13 07:42:54 +08:00
|
|
|
if (ctx_size_in < prog->aux->max_ctx_offset ||
|
|
|
|
ctx_size_in > MAX_BPF_FUNC_ARGS * sizeof(u64))
|
2020-09-26 04:54:29 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 && cpu != 0)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (ctx_size_in) {
|
2021-10-18 19:30:48 +08:00
|
|
|
info.ctx = memdup_user(ctx_in, ctx_size_in);
|
|
|
|
if (IS_ERR(info.ctx))
|
|
|
|
return PTR_ERR(info.ctx);
|
2020-09-26 04:54:29 +08:00
|
|
|
} else {
|
|
|
|
info.ctx = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
info.prog = prog;
|
|
|
|
|
2020-09-30 06:29:49 +08:00
|
|
|
current_cpu = get_cpu();
|
2020-09-26 04:54:29 +08:00
|
|
|
if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 ||
|
2020-09-30 06:29:49 +08:00
|
|
|
cpu == current_cpu) {
|
2020-09-26 04:54:29 +08:00
|
|
|
__bpf_prog_test_run_raw_tp(&info);
|
2020-09-30 06:29:49 +08:00
|
|
|
} else if (cpu >= nr_cpu_ids || !cpu_online(cpu)) {
|
2020-09-26 04:54:29 +08:00
|
|
|
/* smp_call_function_single() also checks cpu_online()
|
|
|
|
* after csd_lock(). However, since cpu is from user
|
|
|
|
* space, let's do an extra quick check to filter out
|
|
|
|
* invalid value before smp_call_function_single().
|
|
|
|
*/
|
2020-09-30 06:29:49 +08:00
|
|
|
err = -ENXIO;
|
|
|
|
} else {
|
2020-09-26 04:54:29 +08:00
|
|
|
err = smp_call_function_single(cpu, __bpf_prog_test_run_raw_tp,
|
|
|
|
&info, 1);
|
|
|
|
}
|
2020-09-30 06:29:49 +08:00
|
|
|
put_cpu();
|
2020-09-26 04:54:29 +08:00
|
|
|
|
2020-09-30 06:29:49 +08:00
|
|
|
if (!err &&
|
|
|
|
copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32)))
|
2020-09-26 04:54:29 +08:00
|
|
|
err = -EFAULT;
|
|
|
|
|
|
|
|
kfree(info.ctx);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-04-10 02:49:09 +08:00
|
|
|
static void *bpf_ctx_init(const union bpf_attr *kattr, u32 max_size)
|
|
|
|
{
|
|
|
|
void __user *data_in = u64_to_user_ptr(kattr->test.ctx_in);
|
|
|
|
void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out);
|
|
|
|
u32 size = kattr->test.ctx_size_in;
|
|
|
|
void *data;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!data_in && !data_out)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
data = kzalloc(max_size, GFP_USER);
|
|
|
|
if (!data)
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
|
|
|
if (data_in) {
|
2021-05-14 08:36:05 +08:00
|
|
|
err = bpf_check_uarg_tail_zero(USER_BPFPTR(data_in), max_size, size);
|
2019-04-10 02:49:09 +08:00
|
|
|
if (err) {
|
|
|
|
kfree(data);
|
|
|
|
return ERR_PTR(err);
|
|
|
|
}
|
|
|
|
|
|
|
|
size = min_t(u32, max_size, size);
|
|
|
|
if (copy_from_user(data, data_in, size)) {
|
|
|
|
kfree(data);
|
|
|
|
return ERR_PTR(-EFAULT);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return data;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int bpf_ctx_finish(const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr, const void *data,
|
|
|
|
u32 size)
|
|
|
|
{
|
|
|
|
void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out);
|
|
|
|
int err = -EFAULT;
|
|
|
|
u32 copy_size = size;
|
|
|
|
|
|
|
|
if (!data || !data_out)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (copy_size > kattr->test.ctx_size_out) {
|
|
|
|
copy_size = kattr->test.ctx_size_out;
|
|
|
|
err = -ENOSPC;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (copy_to_user(data_out, data, copy_size))
|
|
|
|
goto out;
|
|
|
|
if (copy_to_user(&uattr->test.ctx_size_out, &size, sizeof(size)))
|
|
|
|
goto out;
|
|
|
|
if (err != -ENOSPC)
|
|
|
|
err = 0;
|
|
|
|
out:
|
|
|
|
return err;
|
|
|
|
}
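The truncation behaviour of bpf_ctx_finish() is easy to miss: when the output buffer is too small, it still copies what fits, still reports the full size, and returns -ENOSPC. A userspace analogue, with `memcpy()` standing in for copy_to_user() and the hypothetical name `ctx_copy_out()`, might look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace analogue of bpf_ctx_finish().
 * Copies at most out_cap bytes of ctx into out, stores the full context
 * size in *size_out, and returns -ENOSPC if the copy was truncated.
 * memcpy() cannot fail, so the -EFAULT paths of the original are absent.
 */
static int ctx_copy_out(void *out, uint32_t out_cap, uint32_t *size_out,
			const void *ctx, uint32_t size)
{
	uint32_t copy_size = size;
	int err = 0;

	assert(size_out != NULL);

	/* Nothing to do when there is no context or no output buffer. */
	if (!ctx || !out)
		return 0;

	if (copy_size > out_cap) {
		copy_size = out_cap;
		err = -ENOSPC;
	}

	memcpy(out, ctx, copy_size);
	*size_out = size; /* the full size, even when truncated */
	return err;
}
```

Reporting the full size alongside -ENOSPC lets the caller retry with a buffer that is large enough.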
|
|
|
|
|
|
|
|
/**
|
|
|
|
* range_is_zero - test whether buffer is initialized
|
|
|
|
* @buf: buffer to check
|
|
|
|
* @from: check from this position
|
|
|
|
* @to: check up until (excluding) this position
|
|
|
|
*
|
|
|
|
 * This function returns true if there is no non-zero byte
|
|
|
|
* in the buf in the range [from,to).
|
|
|
|
*/
|
|
|
|
static inline bool range_is_zero(void *buf, size_t from, size_t to)
|
|
|
|
{
|
|
|
|
return !memchr_inv((u8 *)buf + from, 0, to - from);
|
|
|
|
}
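range_is_zero() relies on the kernel helper memchr_inv(), which has no direct counterpart in the C standard library. A plain-loop userspace equivalent, under the hypothetical name `range_is_zero_user()`, shows the half-open [from, to) semantics:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace equivalent of range_is_zero():
 * returns true when every byte in buf[from .. to) is zero.
 * An empty range (from == to) is trivially all-zero.
 */
static bool range_is_zero_user(const void *buf, size_t from, size_t to)
{
	const uint8_t *p = buf;
	size_t i;

	assert(from <= to);

	for (i = from; i < to; i++) {
		if (p[i] != 0)
			return false;
	}
	return true;
}
```

convert___skb_to_skb() applies this kind of check to the gaps between the fields userspace is allowed to set, so any non-zero byte in an unsupported field is rejected with -EINVAL.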
|
|
|
|
|
|
|
|
static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
|
|
|
|
{
|
|
|
|
struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
|
|
|
|
|
|
|
|
if (!__skb)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* make sure the fields we don't use are zeroed */
|
2019-12-19 04:57:47 +08:00
|
|
|
if (!range_is_zero(__skb, 0, offsetof(struct __sk_buff, mark)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* mark is allowed */
|
|
|
|
|
|
|
|
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, mark),
|
|
|
|
offsetof(struct __sk_buff, priority)))
|
2019-04-10 02:49:09 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* priority is allowed */
|
2021-08-31 11:33:56 +08:00
|
|
|
/* ingress_ifindex is allowed */
|
2020-08-03 17:05:45 +08:00
|
|
|
/* ifindex is allowed */
|
|
|
|
|
|
|
|
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, ifindex),
|
2019-04-10 02:49:09 +08:00
|
|
|
offsetof(struct __sk_buff, cb)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* cb is allowed */
|
|
|
|
|
2019-12-11 03:19:33 +08:00
|
|
|
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, cb),
|
2019-10-16 02:31:24 +08:00
|
|
|
offsetof(struct __sk_buff, tstamp)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* tstamp is allowed */
|
2019-12-14 06:30:27 +08:00
|
|
|
/* wire_len is allowed */
|
|
|
|
/* gso_segs is allowed */
|
2019-10-16 02:31:24 +08:00
|
|
|
|
2019-12-14 06:30:27 +08:00
|
|
|
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs),
|
2020-03-04 04:05:01 +08:00
|
|
|
offsetof(struct __sk_buff, gso_size)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* gso_size is allowed */
|
|
|
|
|
|
|
|
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_size),
|
2021-09-10 06:04:09 +08:00
|
|
|
offsetof(struct __sk_buff, hwtstamp)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* hwtstamp is allowed */
|
|
|
|
|
|
|
|
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, hwtstamp),
|
2019-04-10 02:49:09 +08:00
|
|
|
sizeof(struct __sk_buff)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2019-12-19 04:57:47 +08:00
|
|
|
skb->mark = __skb->mark;
|
2019-04-10 02:49:09 +08:00
|
|
|
skb->priority = __skb->priority;
|
2021-08-31 11:33:56 +08:00
|
|
|
skb->skb_iif = __skb->ingress_ifindex;
|
2019-10-16 02:31:24 +08:00
|
|
|
skb->tstamp = __skb->tstamp;
|
2019-04-10 02:49:09 +08:00
|
|
|
memcpy(&cb->data, __skb->cb, QDISC_CB_PRIV_LEN);
|
|
|
|
|
2019-12-14 06:30:27 +08:00
|
|
|
if (__skb->wire_len == 0) {
|
|
|
|
cb->pkt_len = skb->len;
|
|
|
|
} else {
|
|
|
|
if (__skb->wire_len < skb->len ||
|
2022-05-14 02:33:57 +08:00
|
|
|
__skb->wire_len > GSO_LEGACY_MAX_SIZE)
|
2019-12-14 06:30:27 +08:00
|
|
|
return -EINVAL;
|
|
|
|
cb->pkt_len = __skb->wire_len;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (__skb->gso_segs > GSO_MAX_SEGS)
|
|
|
|
return -EINVAL;
|
|
|
|
skb_shinfo(skb)->gso_segs = __skb->gso_segs;
|
2020-03-04 04:05:01 +08:00
|
|
|
skb_shinfo(skb)->gso_size = __skb->gso_size;
|
2021-09-10 06:04:09 +08:00
|
|
|
skb_shinfo(skb)->hwtstamps.hwtstamp = __skb->hwtstamp;
|
2019-12-14 06:30:27 +08:00
|
|
|
|
2019-04-10 02:49:09 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void convert_skb_to___skb(struct sk_buff *skb, struct __sk_buff *__skb)
|
|
|
|
{
|
|
|
|
struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
|
|
|
|
|
|
|
|
if (!__skb)
|
|
|
|
return;
|
|
|
|
|
2019-12-19 04:57:47 +08:00
|
|
|
__skb->mark = skb->mark;
|
2019-04-10 02:49:09 +08:00
|
|
|
__skb->priority = skb->priority;
|
2021-08-31 11:33:56 +08:00
|
|
|
__skb->ingress_ifindex = skb->skb_iif;
|
2020-08-03 17:05:45 +08:00
|
|
|
__skb->ifindex = skb->dev->ifindex;
|
2019-10-16 02:31:24 +08:00
|
|
|
__skb->tstamp = skb->tstamp;
|
2019-04-10 02:49:09 +08:00
|
|
|
memcpy(__skb->cb, &cb->data, QDISC_CB_PRIV_LEN);
|
2019-12-14 06:30:27 +08:00
|
|
|
__skb->wire_len = cb->pkt_len;
|
|
|
|
__skb->gso_segs = skb_shinfo(skb)->gso_segs;
|
2021-09-10 06:04:09 +08:00
|
|
|
__skb->hwtstamp = skb_shinfo(skb)->hwtstamps.hwtstamp;
|
2019-04-10 02:49:09 +08:00
|
|
|
}
|
|
|
|
|
2021-09-27 20:39:21 +08:00
|
|
|
static struct proto bpf_dummy_proto = {
|
|
|
|
.name = "bpf_dummy",
|
|
|
|
.owner = THIS_MODULE,
|
|
|
|
.obj_size = sizeof(struct sock),
|
|
|
|
};
|
|
|
|
|
2017-03-31 12:45:38 +08:00
|
|
|
int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr)
|
|
|
|
{
|
|
|
|
bool is_l2 = false, is_direct_pkt_access = false;
|
2020-08-03 17:05:45 +08:00
|
|
|
struct net *net = current->nsproxy->net_ns;
|
|
|
|
struct net_device *dev = net->loopback_dev;
|
2017-03-31 12:45:38 +08:00
|
|
|
u32 size = kattr->test.data_size_in;
|
|
|
|
u32 repeat = kattr->test.repeat;
|
2019-04-10 02:49:09 +08:00
|
|
|
struct __sk_buff *ctx = NULL;
|
2017-03-31 12:45:38 +08:00
|
|
|
u32 retval, duration;
|
2018-07-11 21:30:14 +08:00
|
|
|
int hh_len = ETH_HLEN;
|
2017-03-31 12:45:38 +08:00
|
|
|
struct sk_buff *skb;
|
2018-10-20 00:57:58 +08:00
|
|
|
struct sock *sk;
|
2017-03-31 12:45:38 +08:00
|
|
|
void *data;
|
|
|
|
int ret;
|
|
|
|
|
2022-03-09 18:53:42 +08:00
|
|
|
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
|
2020-09-26 04:54:29 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2022-01-21 18:09:57 +08:00
|
|
|
data = bpf_test_init(kattr, kattr->test.data_size_in,
|
|
|
|
size, NET_SKB_PAD + NET_IP_ALIGN,
|
2017-03-31 12:45:38 +08:00
|
|
|
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
|
|
|
|
if (IS_ERR(data))
|
|
|
|
return PTR_ERR(data);
|
|
|
|
|
2019-04-10 02:49:09 +08:00
|
|
|
ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
|
|
|
|
if (IS_ERR(ctx)) {
|
|
|
|
kfree(data);
|
|
|
|
return PTR_ERR(ctx);
|
|
|
|
}
|
|
|
|
|
2017-03-31 12:45:38 +08:00
|
|
|
switch (prog->type) {
|
|
|
|
case BPF_PROG_TYPE_SCHED_CLS:
|
|
|
|
case BPF_PROG_TYPE_SCHED_ACT:
|
|
|
|
is_l2 = true;
|
2020-08-24 06:36:59 +08:00
|
|
|
fallthrough;
|
2017-03-31 12:45:38 +08:00
|
|
|
case BPF_PROG_TYPE_LWT_IN:
|
|
|
|
case BPF_PROG_TYPE_LWT_OUT:
|
|
|
|
case BPF_PROG_TYPE_LWT_XMIT:
|
|
|
|
is_direct_pkt_access = true;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2021-09-27 20:39:21 +08:00
|
|
|
sk = sk_alloc(net, AF_UNSPEC, GFP_USER, &bpf_dummy_proto, 1);
|
2018-10-20 00:57:58 +08:00
|
|
|
if (!sk) {
|
|
|
|
kfree(data);
|
2019-04-10 02:49:09 +08:00
|
|
|
kfree(ctx);
|
2018-10-20 00:57:58 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
sock_init_data(NULL, sk);
|
|
|
|
|
2022-12-08 14:02:59 +08:00
|
|
|
skb = slab_build_skb(data);
|
2017-03-31 12:45:38 +08:00
|
|
|
if (!skb) {
|
|
|
|
kfree(data);
|
2019-04-10 02:49:09 +08:00
|
|
|
kfree(ctx);
|
2021-09-27 20:39:21 +08:00
|
|
|
sk_free(sk);
|
2017-03-31 12:45:38 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2018-10-20 00:57:58 +08:00
|
|
|
skb->sk = sk;
|
2017-03-31 12:45:38 +08:00
|
|
|
|
2017-05-02 23:36:45 +08:00
|
|
|
skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
|
2017-03-31 12:45:38 +08:00
|
|
|
__skb_put(skb, size);
|
2020-08-03 17:05:45 +08:00
|
|
|
if (ctx && ctx->ifindex > 1) {
|
|
|
|
dev = dev_get_by_index(net, ctx->ifindex);
|
|
|
|
if (!dev) {
|
|
|
|
ret = -ENODEV;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
skb->protocol = eth_type_trans(skb, dev);
|
2017-03-31 12:45:38 +08:00
|
|
|
skb_reset_network_header(skb);
|
|
|
|
|
2020-08-03 17:05:44 +08:00
|
|
|
switch (skb->protocol) {
|
|
|
|
case htons(ETH_P_IP):
|
|
|
|
sk->sk_family = AF_INET;
|
|
|
|
if (sizeof(struct iphdr) <= skb_headlen(skb)) {
|
|
|
|
sk->sk_rcv_saddr = ip_hdr(skb)->saddr;
|
|
|
|
sk->sk_daddr = ip_hdr(skb)->daddr;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
#if IS_ENABLED(CONFIG_IPV6)
|
|
|
|
case htons(ETH_P_IPV6):
|
|
|
|
sk->sk_family = AF_INET6;
|
|
|
|
if (sizeof(struct ipv6hdr) <= skb_headlen(skb)) {
|
|
|
|
sk->sk_v6_rcv_saddr = ipv6_hdr(skb)->saddr;
|
|
|
|
sk->sk_v6_daddr = ipv6_hdr(skb)->daddr;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
#endif
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2017-03-31 12:45:38 +08:00
|
|
|
if (is_l2)
|
2018-07-11 21:30:14 +08:00
|
|
|
__skb_push(skb, hh_len);
|
2017-03-31 12:45:38 +08:00
|
|
|
if (is_direct_pkt_access)
|
2017-09-25 08:25:50 +08:00
|
|
|
bpf_compute_data_pointers(skb);
|
2019-04-10 02:49:09 +08:00
|
|
|
ret = convert___skb_to_skb(skb, ctx);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
2019-12-14 01:51:10 +08:00
|
|
|
ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false);
|
2019-04-10 02:49:09 +08:00
|
|
|
if (ret)
|
|
|
|
goto out;
|
2018-07-11 21:30:14 +08:00
|
|
|
if (!is_l2) {
|
|
|
|
if (skb_headroom(skb) < hh_len) {
|
|
|
|
int nhead = HH_DATA_ALIGN(hh_len - skb_headroom(skb));
|
|
|
|
|
|
|
|
if (pskb_expand_head(skb, nhead, 0, GFP_USER)) {
|
2019-04-10 02:49:09 +08:00
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out;
|
2018-07-11 21:30:14 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
memset(__skb_push(skb, hh_len), 0, hh_len);
|
|
|
|
}
|
2019-04-10 02:49:09 +08:00
|
|
|
convert_skb_to___skb(skb, ctx);
|
2018-07-11 21:30:14 +08:00
|
|
|
|
2017-03-31 12:45:38 +08:00
|
|
|
size = skb->len;
|
|
|
|
/* bpf program can never convert linear skb to non-linear */
|
|
|
|
if (WARN_ON_ONCE(skb_is_nonlinear(skb)))
|
|
|
|
size = skb_headlen(skb);
|
2022-01-21 18:09:59 +08:00
|
|
|
ret = bpf_test_finish(kattr, uattr, skb->data, NULL, size, retval,
|
|
|
|
duration);
|
2019-04-10 02:49:09 +08:00
|
|
|
if (!ret)
|
|
|
|
ret = bpf_ctx_finish(kattr, uattr, ctx,
|
|
|
|
sizeof(struct __sk_buff));
|
|
|
|
out:
|
2020-08-03 17:05:45 +08:00
|
|
|
if (dev && dev != net->loopback_dev)
|
|
|
|
dev_put(dev);
|
2017-03-31 12:45:38 +08:00
|
|
|
kfree_skb(skb);
|
2021-09-27 20:39:21 +08:00
|
|
|
sk_free(sk);
|
2019-04-10 02:49:09 +08:00
|
|
|
kfree(ctx);
|
2017-03-31 12:45:38 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-07-08 06:16:55 +08:00
|
|
|
static int xdp_convert_md_to_buff(struct xdp_md *xdp_md, struct xdp_buff *xdp)
|
|
|
|
{
|
2021-07-08 06:16:56 +08:00
|
|
|
unsigned int ingress_ifindex, rx_queue_index;
|
|
|
|
struct netdev_rx_queue *rxqueue;
|
|
|
|
struct net_device *device;
|
|
|
|
|
2021-07-08 06:16:55 +08:00
|
|
|
if (!xdp_md)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (xdp_md->egress_ifindex != 0)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2021-07-08 06:16:56 +08:00
|
|
|
ingress_ifindex = xdp_md->ingress_ifindex;
|
|
|
|
rx_queue_index = xdp_md->rx_queue_index;
|
|
|
|
|
|
|
|
if (!ingress_ifindex && rx_queue_index)
|
2021-07-08 06:16:55 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2021-07-08 06:16:56 +08:00
|
|
|
if (ingress_ifindex) {
|
|
|
|
device = dev_get_by_index(current->nsproxy->net_ns,
|
|
|
|
ingress_ifindex);
|
|
|
|
if (!device)
|
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
if (rx_queue_index >= device->real_num_rx_queues)
|
|
|
|
goto free_dev;
|
|
|
|
|
|
|
|
rxqueue = __netif_get_rx_queue(device, rx_queue_index);
|
2021-07-08 06:16:55 +08:00
|
|
|
|
2021-07-08 06:16:56 +08:00
|
|
|
if (!xdp_rxq_info_is_reg(&rxqueue->xdp_rxq))
|
|
|
|
goto free_dev;
|
|
|
|
|
|
|
|
xdp->rxq = &rxqueue->xdp_rxq;
|
|
|
|
/* The device is now tracked in the xdp->rxq for later
|
|
|
|
* dev_put()
|
|
|
|
*/
|
|
|
|
}
|
|
|
|
|
|
|
|
xdp->data = xdp->data_meta + xdp_md->data;
|
2021-07-08 06:16:55 +08:00
|
|
|
return 0;
|
2021-07-08 06:16:56 +08:00
|
|
|
|
|
|
|
free_dev:
|
|
|
|
dev_put(device);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)
|
|
|
|
{
|
|
|
|
if (!xdp_md)
|
|
|
|
return;
|
|
|
|
|
|
|
|
xdp_md->data = xdp->data - xdp->data_meta;
|
|
|
|
xdp_md->data_end = xdp->data_end - xdp->data_meta;
|
|
|
|
|
|
|
|
if (xdp_md->ingress_ifindex)
|
|
|
|
dev_put(xdp->rxq->dev);
|
2021-07-08 06:16:55 +08:00
|
|
|
}
|
|
|
|
|
2017-03-31 12:45:38 +08:00
|
|
|
int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr)
|
|
|
|
{
|
2022-03-09 18:53:42 +08:00
|
|
|
bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES);
|
2020-05-14 18:51:35 +08:00
|
|
|
u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
|
2022-03-09 18:53:42 +08:00
|
|
|
u32 batch_size = kattr->test.batch_size;
|
2022-03-10 19:02:28 +08:00
|
|
|
u32 retval = 0, duration, max_data_sz;
|
2017-03-31 12:45:38 +08:00
|
|
|
u32 size = kattr->test.data_size_in;
|
2022-01-21 18:09:58 +08:00
|
|
|
u32 headroom = XDP_PACKET_HEADROOM;
|
2017-03-31 12:45:38 +08:00
|
|
|
u32 repeat = kattr->test.repeat;
|
2018-01-31 19:58:56 +08:00
|
|
|
struct netdev_rx_queue *rxqueue;
|
2022-01-21 18:09:58 +08:00
|
|
|
struct skb_shared_info *sinfo;
|
2017-03-31 12:45:38 +08:00
|
|
|
struct xdp_buff xdp = {};
|
2022-01-21 18:09:58 +08:00
|
|
|
int i, ret = -EINVAL;
|
2021-07-08 06:16:55 +08:00
|
|
|
struct xdp_md *ctx;
|
2017-03-31 12:45:38 +08:00
|
|
|
void *data;
|
|
|
|
|
2021-07-08 16:04:09 +08:00
|
|
|
if (prog->expected_attach_type == BPF_XDP_DEVMAP ||
|
|
|
|
prog->expected_attach_type == BPF_XDP_CPUMAP)
|
|
|
|
return -EINVAL;
|
2021-08-04 23:37:50 +08:00
|
|
|
|
2022-03-09 18:53:42 +08:00
|
|
|
if (kattr->test.flags & ~BPF_F_TEST_XDP_LIVE_FRAMES)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2023-01-20 06:15:26 +08:00
|
|
|
if (bpf_prog_is_dev_bound(prog->aux))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2022-03-09 18:53:42 +08:00
|
|
|
if (do_live) {
|
|
|
|
if (!batch_size)
|
|
|
|
batch_size = NAPI_POLL_WEIGHT;
|
|
|
|
else if (batch_size > TEST_XDP_MAX_BATCH)
|
|
|
|
return -E2BIG;
|
2022-03-11 06:56:20 +08:00
|
|
|
|
|
|
|
headroom += sizeof(struct xdp_page_head);
|
2022-03-09 18:53:42 +08:00
|
|
|
} else if (batch_size) {
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2021-07-08 06:16:55 +08:00
|
|
|
ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
|
|
|
|
if (IS_ERR(ctx))
|
|
|
|
return PTR_ERR(ctx);
|
|
|
|
|
|
|
|
if (ctx) {
|
|
|
|
/* There can't be user provided data before the meta data */
|
|
|
|
if (ctx->data_meta || ctx->data_end != size ||
|
|
|
|
ctx->data > ctx->data_end ||
|
2022-03-09 18:53:42 +08:00
|
|
|
unlikely(xdp_metalen_invalid(ctx->data)) ||
|
|
|
|
(do_live && (kattr->test.data_out || kattr->test.ctx_out)))
|
2021-07-08 06:16:55 +08:00
|
|
|
goto free_ctx;
|
|
|
|
/* Meta data is allocated from the headroom */
|
|
|
|
headroom -= ctx->data;
|
|
|
|
}
|
2019-04-12 06:47:07 +08:00
|
|
|
|
2020-05-14 18:51:35 +08:00
|
|
|
max_data_sz = 4096 - headroom - tailroom;
|
2022-03-09 18:53:42 +08:00
|
|
|
if (size > max_data_sz) {
|
|
|
|
/* disallow live data mode for jumbo frames */
|
|
|
|
if (do_live)
|
|
|
|
goto free_ctx;
|
|
|
|
size = max_data_sz;
|
|
|
|
}
|
2020-05-14 18:51:35 +08:00
|
|
|
|
2022-01-21 18:09:58 +08:00
|
|
|
data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom);
|
2021-07-08 06:16:55 +08:00
|
|
|
if (IS_ERR(data)) {
|
|
|
|
ret = PTR_ERR(data);
|
|
|
|
goto free_ctx;
|
|
|
|
}
|
2017-03-31 12:45:38 +08:00
|
|
|
|
2018-01-31 19:58:56 +08:00
|
|
|
rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
|
2022-01-21 18:09:58 +08:00
|
|
|
rxqueue->xdp_rxq.frag_size = headroom + max_data_sz + tailroom;
|
|
|
|
xdp_init_buff(&xdp, rxqueue->xdp_rxq.frag_size, &rxqueue->xdp_rxq);
|
2020-12-23 05:09:29 +08:00
|
|
|
xdp_prepare_buff(&xdp, data, headroom, size, true);
|
2022-01-21 18:09:58 +08:00
|
|
|
sinfo = xdp_get_shared_info_from_buff(&xdp);
|
2020-12-23 05:09:29 +08:00
|
|
|
|
2021-07-08 06:16:55 +08:00
|
|
|
ret = xdp_convert_md_to_buff(ctx, &xdp);
|
|
|
|
if (ret)
|
|
|
|
goto free_data;
|
|
|
|
|
2022-01-21 18:09:58 +08:00
|
|
|
if (unlikely(kattr->test.data_size_in > size)) {
|
|
|
|
void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
|
|
|
|
|
|
|
|
while (size < kattr->test.data_size_in) {
|
|
|
|
struct page *page;
|
|
|
|
skb_frag_t *frag;
|
2022-02-05 07:58:48 +08:00
|
|
|
u32 data_len;
|
2022-01-21 18:09:58 +08:00
|
|
|
|
2022-02-03 04:53:20 +08:00
|
|
|
if (sinfo->nr_frags == MAX_SKB_FRAGS) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2022-01-21 18:09:58 +08:00
|
|
|
page = alloc_page(GFP_KERNEL);
|
|
|
|
if (!page) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
frag = &sinfo->frags[sinfo->nr_frags++];
|
|
|
|
__skb_frag_set_page(frag, page);
|
|
|
|
|
2022-02-05 07:58:48 +08:00
|
|
|
data_len = min_t(u32, kattr->test.data_size_in - size,
|
2022-01-21 18:09:58 +08:00
|
|
|
PAGE_SIZE);
|
|
|
|
skb_frag_size_set(frag, data_len);
|
|
|
|
|
|
|
|
if (copy_from_user(page_address(page), data_in + size,
|
|
|
|
data_len)) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
sinfo->xdp_frags_size += data_len;
|
|
|
|
size += data_len;
|
|
|
|
}
|
|
|
|
xdp_buff_set_frags_flag(&xdp);
|
|
|
|
}
|
|
|
|
|
2021-09-28 17:30:59 +08:00
|
|
|
if (repeat > 1)
|
|
|
|
bpf_prog_change_xdp(NULL, prog);
|
2022-01-21 18:09:58 +08:00
|
|
|
|
2022-03-09 18:53:42 +08:00
|
|
|
if (do_live)
|
|
|
|
ret = bpf_test_run_xdp_live(prog, &xdp, repeat, batch_size, &duration);
|
|
|
|
else
|
|
|
|
ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
|
2021-07-08 06:16:56 +08:00
|
|
|
/* We convert the xdp_buff back to an xdp_md before checking the return
|
|
|
|
* code so the reference count of any held netdevice will be decremented
|
|
|
|
* even if the test run failed.
|
|
|
|
*/
|
|
|
|
xdp_convert_buff_to_md(&xdp, ctx);
|
2018-12-02 02:39:44 +08:00
|
|
|
if (ret)
|
|
|
|
goto out;
|
2021-07-08 06:16:55 +08:00
|
|
|
|
2022-01-21 18:09:58 +08:00
|
|
|
size = xdp.data_end - xdp.data_meta + sinfo->xdp_frags_size;
|
2022-01-21 18:09:59 +08:00
|
|
|
ret = bpf_test_finish(kattr, uattr, xdp.data_meta, sinfo, size,
|
|
|
|
retval, duration);
|
2021-07-08 06:16:55 +08:00
|
|
|
if (!ret)
|
|
|
|
ret = bpf_ctx_finish(kattr, uattr, ctx,
|
|
|
|
sizeof(struct xdp_md));
|
|
|
|
|
2018-12-02 02:39:44 +08:00
|
|
|
out:
|
2021-09-28 17:30:59 +08:00
|
|
|
if (repeat > 1)
|
|
|
|
bpf_prog_change_xdp(prog, NULL);
|
2021-07-08 06:16:55 +08:00
|
|
|
free_data:
|
2022-01-21 18:09:58 +08:00
|
|
|
for (i = 0; i < sinfo->nr_frags; i++)
|
|
|
|
__free_page(skb_frag_page(&sinfo->frags[i]));
|
2017-03-31 12:45:38 +08:00
|
|
|
kfree(data);
|
2021-07-08 06:16:55 +08:00
|
|
|
free_ctx:
|
|
|
|
kfree(ctx);
|
2017-03-31 12:45:38 +08:00
|
|
|
return ret;
|
|
|
|
}
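
The parameter checks at the top of bpf_prog_test_run_xdp() can be hard to follow in this view, so here is a minimal userspace sketch of two of them: the flags/batch_size validation and the split of the input into a linear part plus page-sized fragments. It is only a model, not the kernel code. All MODEL_* names and both helper functions are hypothetical, and the numeric values (flag bit, default and maximum batch, page size, fragment limit) are assumptions standing in for BPF_F_TEST_XDP_LIVE_FRAMES, NAPI_POLL_WEIGHT, TEST_XDP_MAX_BATCH, the literal 4096 and MAX_SKB_FRAGS; they may differ in a given tree. Other checks done by the real function (context validation, minimum packet size, device-bound programs) are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for kernel constants; check the headers of your tree. */
#define MODEL_F_LIVE_FRAMES (1U << 1) /* BPF_F_TEST_XDP_LIVE_FRAMES */
#define MODEL_DEFAULT_BATCH 64U       /* NAPI_POLL_WEIGHT */
#define MODEL_MAX_BATCH     256U      /* TEST_XDP_MAX_BATCH */
#define MODEL_PAGE_SIZE     4096U     /* the literal 4096 used for max_data_sz */
#define MODEL_MAX_FRAGS     17U       /* MAX_SKB_FRAGS */

/*
 * Mirrors the flags and batch_size checks: unknown flags are rejected,
 * batch_size is only accepted in live mode, and a zero batch_size in
 * live mode falls back to the default. Returns 0 or a negative errno.
 */
static int model_check_batch(uint32_t flags, uint32_t batch_size,
			     uint32_t *batch_out)
{
	assert(batch_out != NULL);

	if (flags & ~MODEL_F_LIVE_FRAMES)
		return -EINVAL;

	if (flags & MODEL_F_LIVE_FRAMES) {
		if (!batch_size)
			batch_size = MODEL_DEFAULT_BATCH;
		else if (batch_size > MODEL_MAX_BATCH)
			return -E2BIG;
	} else if (batch_size) {
		return -EINVAL;
	}

	*batch_out = batch_size;
	return 0;
}

/*
 * Mirrors the size handling: input that fits in one page (minus headroom
 * and tailroom) stays linear; larger input is refused in live mode and
 * otherwise spills into page-sized fragments, up to a fixed limit.
 */
static int model_split_data(uint32_t data_size_in, uint32_t headroom,
			    uint32_t tailroom, bool do_live,
			    uint32_t *linear, uint32_t *nr_frags)
{
	uint32_t max_data_sz, rest;

	assert(linear != NULL && nr_frags != NULL);
	assert(headroom + tailroom < MODEL_PAGE_SIZE);

	max_data_sz = MODEL_PAGE_SIZE - headroom - tailroom;
	*nr_frags = 0;

	if (data_size_in <= max_data_sz) {
		*linear = data_size_in;
		return 0;
	}
	if (do_live)
		return -EINVAL; /* live mode does not take multi-buffer input */

	*linear = max_data_sz;
	rest = data_size_in - max_data_sz;
	/* Round up without risking 32-bit overflow. */
	*nr_frags = rest / MODEL_PAGE_SIZE + (rest % MODEL_PAGE_SIZE != 0);

	return *nr_frags > MODEL_MAX_FRAGS ? -ENOMEM : 0;
}
```

With an assumed headroom of 256 and tailroom of 320 bytes, the linear part holds at most 3520 bytes, so a 7617-byte input would need two fragments in this model.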
|
2019-01-29 00:53:54 +08:00
|
|
|
|
2019-07-26 06:52:27 +08:00
|
|
|
static int verify_user_bpf_flow_keys(struct bpf_flow_keys *ctx)
|
|
|
|
{
|
|
|
|
/* make sure the fields we don't use are zeroed */
|
|
|
|
if (!range_is_zero(ctx, 0, offsetof(struct bpf_flow_keys, flags)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* flags is allowed */
|
|
|
|
|
2019-12-11 03:19:33 +08:00
|
|
|
if (!range_is_zero(ctx, offsetofend(struct bpf_flow_keys, flags),
|
2019-07-26 06:52:27 +08:00
|
|
|
sizeof(struct bpf_flow_keys)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
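
The "everything except flags must be zero" rule enforced by verify_user_bpf_flow_keys() can be illustrated with a small userspace sketch. This is a model under stated assumptions, not the kernel implementation: struct demo_keys is a hypothetical, much smaller stand-in for struct bpf_flow_keys, and demo_range_is_zero() is a plain byte loop written for this example rather than the kernel's range_is_zero() helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct bpf_flow_keys (no padding: 2+2+4+4). */
struct demo_keys {
	uint16_t nhoff;
	uint16_t thoff;
	uint32_t flags;      /* the only field userspace may set */
	uint32_t flow_label;
};

/* Returns true if every byte in buf[from, to) is zero. */
static bool demo_range_is_zero(const void *buf, size_t from, size_t to)
{
	const unsigned char *p = buf;
	size_t i;

	assert(from <= to);
	for (i = from; i < to; i++)
		if (p[i])
			return false;
	return true;
}

/* Accepts the context only if all bytes outside `flags` are zero. */
static int demo_verify_keys(const struct demo_keys *k)
{
	size_t start = offsetof(struct demo_keys, flags);
	size_t end = start + sizeof(k->flags);

	/* Bytes before the flags field. */
	if (!demo_range_is_zero(k, 0, start))
		return -EINVAL;

	/* Bytes after the flags field. */
	if (!demo_range_is_zero(k, end, sizeof(*k)))
		return -EINVAL;

	return 0;
}
```

Checking the two byte ranges around the field, instead of naming every other member, means the check keeps working when members are added before or after flags.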
|
|
|
|
|
2019-01-29 00:53:54 +08:00
|
|
|
int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
|
|
|
|
const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr)
|
|
|
|
{
|
2021-03-03 18:18:12 +08:00
|
|
|
struct bpf_test_timer t = { NO_PREEMPT };
|
2019-01-29 00:53:54 +08:00
|
|
|
u32 size = kattr->test.data_size_in;
|
2019-04-22 23:55:45 +08:00
|
|
|
struct bpf_flow_dissector ctx = {};
|
2019-01-29 00:53:54 +08:00
|
|
|
u32 repeat = kattr->test.repeat;
|
2019-07-26 06:52:27 +08:00
|
|
|
struct bpf_flow_keys *user_ctx;
|
2019-01-29 00:53:54 +08:00
|
|
|
struct bpf_flow_keys flow_keys;
|
2019-04-22 23:55:45 +08:00
|
|
|
const struct ethhdr *eth;
|
2019-07-26 06:52:27 +08:00
|
|
|
unsigned int flags = 0;
|
2019-01-29 00:53:54 +08:00
|
|
|
u32 retval, duration;
|
|
|
|
void *data;
|
|
|
|
int ret;
|
|
|
|
|
2022-03-09 18:53:42 +08:00
|
|
|
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
|
2020-09-26 04:54:29 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2019-04-22 23:55:45 +08:00
|
|
|
if (size < ETH_HLEN)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2022-01-21 18:09:57 +08:00
|
|
|
data = bpf_test_init(kattr, kattr->test.data_size_in, size, 0, 0);
|
2019-01-29 00:53:54 +08:00
|
|
|
if (IS_ERR(data))
|
|
|
|
return PTR_ERR(data);
|
|
|
|
|
2019-04-22 23:55:45 +08:00
|
|
|
eth = (struct ethhdr *)data;
|
2019-01-29 00:53:54 +08:00
|
|
|
|
|
|
|
if (!repeat)
|
|
|
|
repeat = 1;
|
|
|
|
|
2019-07-26 06:52:27 +08:00
|
|
|
user_ctx = bpf_ctx_init(kattr, sizeof(struct bpf_flow_keys));
|
|
|
|
if (IS_ERR(user_ctx)) {
|
|
|
|
kfree(data);
|
|
|
|
return PTR_ERR(user_ctx);
|
|
|
|
}
|
|
|
|
if (user_ctx) {
|
|
|
|
ret = verify_user_bpf_flow_keys(user_ctx);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
flags = user_ctx->flags;
|
|
|
|
}
|
|
|
|
|
2019-04-22 23:55:45 +08:00
|
|
|
ctx.flow_keys = &flow_keys;
|
|
|
|
ctx.data = data;
|
|
|
|
ctx.data_end = (__u8 *)data + size;
|
|
|
|
|
2021-03-03 18:18:12 +08:00
|
|
|
bpf_test_timer_enter(&t);
|
|
|
|
do {
|
2019-04-22 23:55:45 +08:00
|
|
|
retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN,
|
2019-07-26 06:52:27 +08:00
|
|
|
size, flags);
|
2022-03-09 18:53:42 +08:00
|
|
|
} while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration));
|
2021-03-03 18:18:12 +08:00
|
|
|
bpf_test_timer_leave(&t);
|
2019-04-22 23:55:45 +08:00
|
|
|
|
2021-03-03 18:18:12 +08:00
|
|
|
if (ret < 0)
|
|
|
|
goto out;
|
2019-01-29 00:53:54 +08:00
|
|
|
|
2022-01-21 18:09:59 +08:00
|
|
|
ret = bpf_test_finish(kattr, uattr, &flow_keys, NULL,
|
|
|
|
sizeof(flow_keys), retval, duration);
|
2019-07-26 06:52:27 +08:00
|
|
|
if (!ret)
|
|
|
|
ret = bpf_ctx_finish(kattr, uattr, user_ctx,
|
|
|
|
sizeof(struct bpf_flow_keys));
|
2019-01-29 00:53:54 +08:00
|
|
|
|
2019-02-20 02:54:17 +08:00
|
|
|
out:
|
2019-07-26 06:52:27 +08:00
|
|
|
kfree(user_ctx);
|
2019-04-22 23:55:45 +08:00
|
|
|
kfree(data);
|
2019-01-29 00:53:54 +08:00
|
|
|
return ret;
|
|
|
|
}
|
2021-03-03 18:18:13 +08:00
|
|
|
|
|
|
|
int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr)
|
|
|
|
{
|
|
|
|
struct bpf_test_timer t = { NO_PREEMPT };
|
|
|
|
struct bpf_prog_array *progs = NULL;
|
|
|
|
struct bpf_sk_lookup_kern ctx = {};
|
|
|
|
u32 repeat = kattr->test.repeat;
|
|
|
|
struct bpf_sk_lookup *user_ctx;
|
|
|
|
u32 retval, duration;
|
|
|
|
int ret = -EINVAL;
|
|
|
|
|
2022-03-09 18:53:42 +08:00
|
|
|
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
|
2021-03-03 18:18:13 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (kattr->test.data_in || kattr->test.data_size_in || kattr->test.data_out ||
|
|
|
|
kattr->test.data_size_out)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (!repeat)
|
|
|
|
repeat = 1;
|
|
|
|
|
|
|
|
user_ctx = bpf_ctx_init(kattr, sizeof(*user_ctx));
|
|
|
|
if (IS_ERR(user_ctx))
|
|
|
|
return PTR_ERR(user_ctx);
|
|
|
|
|
|
|
|
if (!user_ctx)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (user_ctx->sk)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (!range_is_zero(user_ctx, offsetofend(typeof(*user_ctx), local_port), sizeof(*user_ctx)))
|
|
|
|
goto out;
|
|
|
|
|
2022-02-10 02:43:32 +08:00
|
|
|
if (user_ctx->local_port > U16_MAX) {
|
2021-03-03 18:18:13 +08:00
|
|
|
ret = -ERANGE;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
ctx.family = (u16)user_ctx->family;
|
|
|
|
ctx.protocol = (u16)user_ctx->protocol;
|
|
|
|
ctx.dport = (u16)user_ctx->local_port;
|
2022-02-10 02:43:32 +08:00
|
|
|
ctx.sport = user_ctx->remote_port;
|
2021-03-03 18:18:13 +08:00
|
|
|
|
|
|
|
switch (ctx.family) {
|
|
|
|
case AF_INET:
|
|
|
|
ctx.v4.daddr = (__force __be32)user_ctx->local_ip4;
|
|
|
|
ctx.v4.saddr = (__force __be32)user_ctx->remote_ip4;
|
|
|
|
break;
|
|
|
|
|
|
|
|
#if IS_ENABLED(CONFIG_IPV6)
|
|
|
|
case AF_INET6:
|
|
|
|
ctx.v6.daddr = (struct in6_addr *)user_ctx->local_ip6;
|
|
|
|
ctx.v6.saddr = (struct in6_addr *)user_ctx->remote_ip6;
|
|
|
|
break;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
default:
|
|
|
|
ret = -EAFNOSUPPORT;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
progs = bpf_prog_array_alloc(1, GFP_KERNEL);
|
|
|
|
if (!progs) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
progs->items[0].prog = prog;
|
|
|
|
|
|
|
|
bpf_test_timer_enter(&t);
|
|
|
|
do {
|
|
|
|
ctx.selected_sk = NULL;
|
2021-08-15 15:05:54 +08:00
|
|
|
retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, bpf_prog_run);
|
2022-03-09 18:53:42 +08:00
|
|
|
} while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration));
|
2021-03-03 18:18:13 +08:00
|
|
|
bpf_test_timer_leave(&t);
|
|
|
|
|
|
|
|
if (ret < 0)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
user_ctx->cookie = 0;
|
|
|
|
if (ctx.selected_sk) {
|
|
|
|
if (ctx.selected_sk->sk_reuseport && !ctx.no_reuseport) {
|
|
|
|
ret = -EOPNOTSUPP;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
user_ctx->cookie = sock_gen_cookie(ctx.selected_sk);
|
|
|
|
}
|
|
|
|
|
2022-01-21 18:09:59 +08:00
|
|
|
ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration);
|
2021-03-03 18:18:13 +08:00
|
|
|
if (!ret)
|
|
|
|
ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(*user_ctx));
|
|
|
|
|
|
|
|
out:
|
|
|
|
bpf_prog_array_free(progs);
|
|
|
|
kfree(user_ctx);
|
|
|
|
return ret;
|
|
|
|
}
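The input checks that bpf_prog_test_run_sk_lookup() applies to the user-supplied context can be sketched in plain userspace C. This is a minimal illustration, not the kernel code: struct demo_sk_lookup, its reserved_tail field, demo_range_is_zero() and demo_validate() are hypothetical names, and the layout is not the real UAPI struct bpf_sk_lookup. It mirrors only the order of the checks: the socket pointer must be unset, everything after local_port must be zero, and local_port must fit in 16 bits.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the user context; NOT the real UAPI layout. */
struct demo_sk_lookup {
	uint64_t sk;		/* must be zero on input */
	uint32_t family;
	uint32_t protocol;
	uint32_t remote_ip4;
	uint32_t remote_port;
	uint32_t local_ip4;
	uint32_t local_port;	/* last field the caller may set */
	uint64_t reserved_tail;	/* hypothetical trailing field, must be zero */
};

/* Return 1 if bytes [from, to) of buf are all zero, 0 otherwise. */
static int demo_range_is_zero(const void *buf, size_t from, size_t to)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = from; i < to; i++)
		if (p[i])
			return 0;
	return 1;
}

/* Apply the checks in the same order as the function above. */
static int demo_validate(const struct demo_sk_lookup *c)
{
	/* First byte after local_port, the equivalent of offsetofend(). */
	size_t tail = offsetof(struct demo_sk_lookup, local_port) +
		      sizeof(c->local_port);

	if (c->sk)
		return -EINVAL;
	if (!demo_range_is_zero(c, tail, sizeof(*c)))
		return -EINVAL;
	if (c->local_port > UINT16_MAX)
		return -ERANGE;
	return 0;
}
```

Checking the tail for zero bytes lets the structure grow later without old callers accidentally passing garbage in fields they do not know about.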
|
2021-05-14 08:36:03 +08:00
|
|
|
|
|
|
|
int bpf_prog_test_run_syscall(struct bpf_prog *prog,
|
|
|
|
const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr)
|
|
|
|
{
|
|
|
|
void __user *ctx_in = u64_to_user_ptr(kattr->test.ctx_in);
|
|
|
|
__u32 ctx_size_in = kattr->test.ctx_size_in;
|
|
|
|
void *ctx = NULL;
|
|
|
|
u32 retval;
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
/* doesn't support data_in/out, ctx_out, duration, repeat, flags or batch_size */
|
|
|
|
if (kattr->test.data_in || kattr->test.data_out ||
|
|
|
|
kattr->test.ctx_out || kattr->test.duration ||
|
2022-03-09 18:53:42 +08:00
|
|
|
kattr->test.repeat || kattr->test.flags ||
|
|
|
|
kattr->test.batch_size)
|
2021-05-14 08:36:03 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (ctx_size_in < prog->aux->max_ctx_offset ||
|
|
|
|
ctx_size_in > U16_MAX)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (ctx_size_in) {
|
2021-10-18 19:30:48 +08:00
|
|
|
ctx = memdup_user(ctx_in, ctx_size_in);
|
|
|
|
if (IS_ERR(ctx))
|
|
|
|
return PTR_ERR(ctx);
|
2021-05-14 08:36:03 +08:00
|
|
|
}
|
2021-08-10 07:51:51 +08:00
|
|
|
|
|
|
|
rcu_read_lock_trace();
|
2021-05-14 08:36:03 +08:00
|
|
|
retval = bpf_prog_run_pin_on_cpu(prog, ctx);
|
2021-08-10 07:51:51 +08:00
|
|
|
rcu_read_unlock_trace();
|
2021-05-14 08:36:03 +08:00
|
|
|
|
|
|
|
if (copy_to_user(&uattr->test.retval, &retval, sizeof(u32))) {
|
|
|
|
err = -EFAULT;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
if (ctx_size_in)
|
|
|
|
if (copy_to_user(ctx_in, ctx, ctx_size_in))
|
|
|
|
err = -EFAULT;
|
|
|
|
out:
|
|
|
|
kfree(ctx);
|
|
|
|
return err;
|
|
|
|
}
|
2022-01-15 00:39:46 +08:00
|
|
|
|
2023-04-22 01:02:59 +08:00
|
|
|
static int verify_and_copy_hook_state(struct nf_hook_state *state,
|
|
|
|
const struct nf_hook_state *user,
|
|
|
|
struct net_device *dev)
|
|
|
|
{
|
|
|
|
if (user->in || user->out)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (user->net || user->sk || user->okfn)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
switch (user->pf) {
|
|
|
|
case NFPROTO_IPV4:
|
|
|
|
case NFPROTO_IPV6:
|
|
|
|
switch (state->hook) {
|
|
|
|
case NF_INET_PRE_ROUTING:
|
|
|
|
state->in = dev;
|
|
|
|
break;
|
|
|
|
case NF_INET_LOCAL_IN:
|
|
|
|
state->in = dev;
|
|
|
|
break;
|
|
|
|
case NF_INET_FORWARD:
|
|
|
|
state->in = dev;
|
|
|
|
state->out = dev;
|
|
|
|
break;
|
|
|
|
case NF_INET_LOCAL_OUT:
|
|
|
|
state->out = dev;
|
|
|
|
break;
|
|
|
|
case NF_INET_POST_ROUTING:
|
|
|
|
state->out = dev;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
state->pf = user->pf;
|
|
|
|
state->hook = user->hook;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static __be16 nfproto_eth(int nfproto)
|
|
|
|
{
|
|
|
|
switch (nfproto) {
|
|
|
|
case NFPROTO_IPV4:
|
|
|
|
return htons(ETH_P_IP);
|
|
|
|
case NFPROTO_IPV6:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return htons(ETH_P_IPV6);
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_prog_test_run_nf(struct bpf_prog *prog,
|
|
|
|
const union bpf_attr *kattr,
|
|
|
|
union bpf_attr __user *uattr)
|
|
|
|
{
|
|
|
|
struct net *net = current->nsproxy->net_ns;
|
|
|
|
struct net_device *dev = net->loopback_dev;
|
|
|
|
struct nf_hook_state *user_ctx, hook_state = {
|
|
|
|
.pf = NFPROTO_IPV4,
|
|
|
|
.hook = NF_INET_LOCAL_OUT,
|
|
|
|
};
|
|
|
|
u32 size = kattr->test.data_size_in;
|
|
|
|
u32 repeat = kattr->test.repeat;
|
|
|
|
struct bpf_nf_ctx ctx = {
|
|
|
|
.state = &hook_state,
|
|
|
|
};
|
|
|
|
struct sk_buff *skb = NULL;
|
|
|
|
u32 retval, duration;
|
|
|
|
void *data;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (size < sizeof(struct iphdr))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
data = bpf_test_init(kattr, kattr->test.data_size_in, size,
|
|
|
|
NET_SKB_PAD + NET_IP_ALIGN,
|
|
|
|
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
|
|
|
|
if (IS_ERR(data))
|
|
|
|
return PTR_ERR(data);
|
|
|
|
|
|
|
|
if (!repeat)
|
|
|
|
repeat = 1;
|
|
|
|
|
|
|
|
user_ctx = bpf_ctx_init(kattr, sizeof(struct nf_hook_state));
|
|
|
|
if (IS_ERR(user_ctx)) {
|
|
|
|
kfree(data);
|
|
|
|
return PTR_ERR(user_ctx);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (user_ctx) {
|
|
|
|
ret = verify_and_copy_hook_state(&hook_state, user_ctx, dev);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
skb = slab_build_skb(data);
|
|
|
|
if (!skb) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
data = NULL; /* data released via kfree_skb */
|
|
|
|
|
|
|
|
skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
|
|
|
|
__skb_put(skb, size);
|
|
|
|
|
|
|
|
ret = -EINVAL;
|
|
|
|
|
|
|
|
if (hook_state.hook != NF_INET_LOCAL_OUT) {
|
|
|
|
if (size < ETH_HLEN + sizeof(struct iphdr))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
skb->protocol = eth_type_trans(skb, dev);
|
|
|
|
switch (skb->protocol) {
|
|
|
|
case htons(ETH_P_IP):
|
|
|
|
if (hook_state.pf == NFPROTO_IPV4)
|
|
|
|
break;
|
|
|
|
goto out;
|
|
|
|
case htons(ETH_P_IPV6):
|
|
|
|
if (size < ETH_HLEN + sizeof(struct ipv6hdr))
|
|
|
|
goto out;
|
|
|
|
if (hook_state.pf == NFPROTO_IPV6)
|
|
|
|
break;
|
|
|
|
goto out;
|
|
|
|
default:
|
|
|
|
ret = -EPROTO;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
skb_reset_network_header(skb);
|
|
|
|
} else {
|
|
|
|
skb->protocol = nfproto_eth(hook_state.pf);
|
|
|
|
}
|
|
|
|
|
|
|
|
ctx.skb = skb;
|
|
|
|
|
|
|
|
ret = bpf_test_run(prog, &ctx, repeat, &retval, &duration, false);
|
|
|
|
if (ret)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration);
|
|
|
|
|
|
|
|
out:
|
|
|
|
kfree(user_ctx);
|
|
|
|
kfree_skb(skb);
|
|
|
|
kfree(data);
|
|
|
|
return ret;
|
|
|
|
}
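For hooks other than NF_INET_LOCAL_OUT, bpf_prog_test_run_nf() expects the test data to start with an Ethernet header and rejects frames that are too short or whose EtherType does not match the requested protocol family. The sketch below restates that decision table in standalone C. All DEMO_* constants and demo_check_frame() are hypothetical names, and reading the EtherType straight from bytes 12 and 13 is a simplification of what eth_type_trans() does in the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants mirroring the usual header sizes and EtherTypes. */
enum { DEMO_ETH_HLEN = 14, DEMO_IPV4_HLEN = 20, DEMO_IPV6_HLEN = 40 };
enum { DEMO_ETH_P_IP = 0x0800, DEMO_ETH_P_IPV6 = 0x86DD };
enum demo_pf { DEMO_PF_IPV4, DEMO_PF_IPV6 };

/*
 * Return 0 if the frame is acceptable for the given family,
 * -EINVAL if it is too short or the family does not match,
 * -EPROTO if the EtherType is neither IPv4 nor IPv6.
 */
static int demo_check_frame(const uint8_t *frame, size_t size, enum demo_pf pf)
{
	uint16_t ethertype;

	/* Every frame needs at least Ethernet + minimal IPv4 header. */
	if (size < DEMO_ETH_HLEN + DEMO_IPV4_HLEN)
		return -EINVAL;

	/* EtherType sits in bytes 12..13 of the Ethernet header, big-endian. */
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

	switch (ethertype) {
	case DEMO_ETH_P_IP:
		return pf == DEMO_PF_IPV4 ? 0 : -EINVAL;
	case DEMO_ETH_P_IPV6:
		/* IPv6 has a larger fixed header, so re-check the length. */
		if (size < DEMO_ETH_HLEN + DEMO_IPV6_HLEN)
			return -EINVAL;
		return pf == DEMO_PF_IPV6 ? 0 : -EINVAL;
	default:
		return -EPROTO;
	}
}
```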
|
|
|
|
|
2022-01-15 00:39:46 +08:00
|
|
|
static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = {
|
2022-07-21 21:42:35 +08:00
|
|
|
.owner = THIS_MODULE,
|
|
|
|
.set = &test_sk_check_kfunc_ids,
|
2022-01-15 00:39:46 +08:00
|
|
|
};
|
|
|
|
|
2022-04-25 05:49:00 +08:00
|
|
|
BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)
|
|
|
|
BTF_ID(struct, prog_test_ref_kfunc)
|
|
|
|
BTF_ID(func, bpf_kfunc_call_test_release)
|
|
|
|
BTF_ID(struct, prog_test_member)
|
|
|
|
BTF_ID(func, bpf_kfunc_call_memb_release)
|
|
|
|
|
2022-01-15 00:39:46 +08:00
|
|
|
static int __init bpf_prog_test_run_init(void)
|
|
|
|
{
|
2022-04-25 05:49:00 +08:00
|
|
|
const struct btf_id_dtor_kfunc bpf_prog_test_dtor_kfunc[] = {
|
|
|
|
{
|
|
|
|
.btf_id = bpf_prog_test_dtor_kfunc_ids[0],
|
|
|
|
.kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[1]
|
|
|
|
},
|
|
|
|
{
|
|
|
|
.btf_id = bpf_prog_test_dtor_kfunc_ids[2],
|
|
|
|
.kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[3],
|
|
|
|
},
|
|
|
|
};
|
|
|
|
int ret;
|
|
|
|
|
2022-12-06 22:59:32 +08:00
|
|
|
ret = register_btf_fmodret_id_set(&bpf_test_modify_return_set);
|
|
|
|
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
|
2022-08-10 05:30:31 +08:00
|
|
|
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_prog_test_kfunc_set);
|
2022-09-06 23:13:00 +08:00
|
|
|
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_prog_test_kfunc_set);
|
2022-04-25 05:49:00 +08:00
|
|
|
return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc,
|
|
|
|
ARRAY_SIZE(bpf_prog_test_dtor_kfunc),
|
|
|
|
THIS_MODULE);
|
2022-01-15 00:39:46 +08:00
|
|
|
}
|
|
|
|
late_initcall(bpf_prog_test_run_init);
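The chain of "ret = ret ?: ..." statements in bpf_prog_test_run_init() relies on the GNU C conditional with an omitted middle operand: "a ?: b" yields a when a is nonzero and otherwise evaluates b. The effect is that registration stops at the first failure and that error is returned. A minimal sketch of the idiom, assuming a GNU-compatible compiler (gcc or clang); demo_step(), demo_init() and demo_calls are hypothetical names used only for illustration:

```c
#include <errno.h>

static int demo_calls;	/* counts how many steps actually ran */

/* Stand-in for a registration call: records the call, returns its result. */
static int demo_step(int result)
{
	demo_calls++;
	return result;
}

/*
 * Run up to three steps, stopping at the first nonzero (error) result.
 * "ret ?: f()" keeps ret if it is nonzero and only then skips f().
 */
static int demo_init(int r1, int r2, int r3)
{
	int ret;

	demo_calls = 0;
	ret = demo_step(r1);
	ret = ret ?: demo_step(r2);
	return ret ?: demo_step(r3);
}
```

With this shape, demo_init(0, -ENOMEM, 0) runs only the first two steps and returns -ENOMEM; the third step is never called.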
|