Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-09-01

The following pull-request contains BPF updates for your *net-next* tree.

There are two small conflicts when pulling, resolve as follows:

1) Merge conflict in tools/lib/bpf/libbpf.c between 88a8212028 ("libbpf: Factor
   out common ELF operations and improve logging") in bpf-next and 1e891e513e
   ("libbpf: Fix map index used in error message") in net-next. Resolve by taking
   the hunk in bpf-next:

        [...]
        scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx);
        data = elf_sec_data(obj, scn);
        if (!scn || !data) {
                pr_warn("elf: failed to get %s map definitions for %s\n",
                        MAPS_ELF_SEC, obj->path);
                return -EINVAL;
        }
        [...]

2) Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c between
   9647c57b11 ("xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for
   better performance") in bpf-next and e20f0dbf20 ("net/mlx5e: RX, Add a prefetch
   command for small L1_CACHE_BYTES") in net-next. Resolve the two locations by retaining
   net_prefetch() and taking xsk_buff_dma_sync_for_cpu() from bpf-next. Should look like:

        [...]
        xdp_set_data_meta_invalid(xdp);
        xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
        net_prefetch(xdp->data);
        [...]

We've added 133 non-merge commits during the last 14 day(s) which contain
a total of 246 files changed, 13832 insertions(+), 3105 deletions(-).

The main changes are:

1) Initial support for sleepable BPF programs along with bpf_copy_from_user() helper
   for tracing to reliably access user memory, from Alexei Starovoitov.

2) Add BPF infra for writing and parsing TCP header options, from Martin KaFai Lau.

3) bpf_d_path() helper for returning full path for given 'struct path', from Jiri Olsa.

4) AF_XDP support for shared umems between devices and queues, from Magnus Karlsson.

5) Initial prep work for full BPF-to-BPF call support in libbpf, from Andrii Nakryiko.

6) Generalize bpf_sk_storage map & add local storage for inodes, from KP Singh.

7) Implement sockmap/hash updates from BPF context, from Lorenz Bauer.

8) BPF xor verification for scalar types & add BPF link iterator, from Yonghong Song.

9) Use target's prog type for BPF_PROG_TYPE_EXT prog verification, from Udip Pant.

10) Rework BPF tracing samples to use libbpf loader, from Daniel T. Lee.

11) Fix xdpsock sample to really cycle through all buffers, from Weqaar Janjua.

12) Improve type safety for tun/veth XDP frame handling, from Maciej Żenczykowski.

13) Various smaller cleanups and improvements all over the place.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
commit 150f29f5e6 (merge)
Merged by: David S. Miller <davem@davemloft.net>, 2020-09-01 13:05:08 -07:00
246 changed files with 13815 additions and 3088 deletions


@@ -149,7 +149,7 @@ In case the patch or patch series has to be reworked and sent out
 again in a second or later revision, it is also required to add a
 version number (``v2``, ``v3``, ...) into the subject prefix::
-  git format-patch --subject-prefix='PATCH net-next v2' start..finish
+  git format-patch --subject-prefix='PATCH bpf-next v2' start..finish
 When changes have been requested to the patch series, always send the
 whole patch series again with the feedback incorporated (never send

@@ -479,12 +479,13 @@ LLVM's static compiler lists the supported targets through
   $ llc --version
   LLVM (http://llvm.org/):
-    LLVM version 6.0.0svn
+    LLVM version 10.0.0
     Optimized build.
     Default target: x86_64-unknown-linux-gnu
     Host CPU: skylake
     Registered Targets:
+      aarch64    - AArch64 (little endian)
       bpf        - BPF (host endian)
       bpfeb      - BPF (big endian)
       bpfel      - BPF (little endian)

@@ -517,6 +518,10 @@ from the git repositories::
 The built binaries can then be found in the build/bin/ directory, where
 you can point the PATH variable to.
+
+Set ``-DLLVM_TARGETS_TO_BUILD`` equal to the target you wish to build, you
+will find a full list of targets within the llvm-project/llvm/lib/Target
+directory.
 Q: Reporting LLVM BPF issues
 ----------------------------
 Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code


@@ -724,6 +724,31 @@ want to define unused entry in BTF_ID_LIST, like::
   BTF_ID_UNUSED
   BTF_ID(struct, task_struct)
+
+The ``BTF_SET_START/END`` macros pair defines sorted list of BTF ID values
+and their count, with following syntax::
+
+  BTF_SET_START(set)
+  BTF_ID(type1, name1)
+  BTF_ID(type2, name2)
+  BTF_SET_END(set)
+
+resulting in following layout in .BTF_ids section::
+
+  __BTF_ID__set__set:
+  .zero 4
+  __BTF_ID__type1__name1__3:
+  .zero 4
+  __BTF_ID__type2__name2__4:
+  .zero 4
+
+The ``struct btf_id_set set;`` variable is defined to access the list.
+
+The ``typeX`` name can be one of following::
+
+   struct, union, typedef, func
+
+and is used as a filter when resolving the BTF ID value.
+
 All the BTF ID lists and sets are compiled in the .BTF_ids section and
 resolved during the linking phase of kernel build by ``resolve_btfids`` tool.
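
For illustration, a minimal usage sketch of the set macros documented above; the set name, the listed functions and the ``btf_id_set_contains()`` membership check are assumptions for the example rather than part of this patch::

  /* Emit a sorted BTF ID set into the .BTF_ids section; the IDs are
   * filled in by the resolve_btfids tool at link time.
   */
  BTF_SET_START(allowed_funcs)
  BTF_ID(func, vfs_read)
  BTF_ID(func, vfs_write)
  BTF_SET_END(allowed_funcs)

  /* At runtime, test whether a resolved BTF ID is a member of the set. */
  static bool func_is_allowed(u32 btf_id)
  {
          return btf_id_set_contains(&allowed_funcs, btf_id);
  }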


@@ -52,6 +52,7 @@ Program types
    prog_cgroup_sysctl
    prog_flow_dissector
    bpf_lsm
+   prog_sk_lookup
 Map types


@@ -0,0 +1,98 @@

.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

=====================
BPF sk_lookup program
=====================

BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces programmability
into the socket lookup performed by the transport layer when a packet is to be
delivered locally.

When invoked BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

BPF sk_lookup program type was introduced to address setup scenarios where
binding sockets to an address with ``bind()`` socket call is impractical, such
as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
   binding to a wildcard address ``INADDR_ANY`` is not possible due to a port
   conflict,
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
   case.

Such setups would require creating and ``bind()``'ing one socket to each of the
IP address/port in the range, leading to resource consumption and potential
latency spikes during socket lookup.

Attachment
==========

BPF sk_lookup program can be attached to a network namespace with
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type and a
netns FD as attachment ``target_fd``.

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return with either ``SK_PASS`` or ``SK_DROP``
verdict code. As for other BPF program types that are network filters,
``SK_PASS`` signifies that the socket lookup should continue on to regular
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a socket
in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes a
``struct bpf_sock *`` to ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program has terminated
with ``SK_PASS`` code.

When multiple programs are attached, the end result is determined from return
codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
   is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
   selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
   selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
   socket lookup continues on.

API
===

In its context, an instance of ``struct bpf_sk_lookup``, BPF sk_lookup program
receives information about the packet that triggered the socket lookup. Namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to ``struct bpf_sk_lookup`` declaration in ``linux/bpf.h`` user API
header, and `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()`` for details.

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.
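
As a rough sketch of the flow described above, a minimal sk_lookup program could look as follows; the map name, port range and policy are invented for illustration, and the selftest referenced above remains the authoritative example::

  // SPDX-License-Identifier: GPL-2.0
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* Sockets to steer traffic to, populated from user space. */
  struct {
          __uint(type, BPF_MAP_TYPE_SOCKMAP);
          __uint(max_entries, 1);
          __type(key, __u32);
          __type(value, __u64);
  } echo_socket SEC(".maps");

  SEC("sk_lookup")
  int steer_to_echo(struct bpf_sk_lookup *ctx)
  {
          const __u32 zero = 0;
          struct bpf_sock *sk;
          long err;

          /* Only take over an illustrative port range. */
          if (ctx->local_port < 7000 || ctx->local_port > 7999)
                  return SK_PASS;

          sk = bpf_map_lookup_elem(&echo_socket, &zero);
          if (!sk)
                  return SK_DROP;

          /* Record the selection; it only takes effect with SK_PASS. */
          err = bpf_sk_assign(ctx, sk, 0);
          bpf_sk_release(sk);
          return err ? SK_DROP : SK_PASS;
  }

  char _license[] SEC("license") = "GPL";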


@@ -258,14 +258,21 @@ socket into zero-copy mode or fail.
 XDP_SHARED_UMEM bind flag
 -------------------------
-This flag enables you to bind multiple sockets to the same UMEM, but
-only if they share the same queue id. In this mode, each socket has
-their own RX and TX rings, but the UMEM (tied to the fist socket
-created) only has a single FILL ring and a single COMPLETION
-ring. To use this mode, create the first socket and bind it in the normal
-way. Create a second socket and create an RX and a TX ring, or at
-least one of them, but no FILL or COMPLETION rings as the ones from
-the first socket will be used. In the bind call, set he
+This flag enables you to bind multiple sockets to the same UMEM. It
+works on the same queue id, between queue ids and between
+netdevs/devices. In this mode, each socket has their own RX and TX
+rings as usual, but you are going to have one or more FILL and
+COMPLETION ring pairs. You have to create one of these pairs per
+unique netdev and queue id tuple that you bind to.
+
+Starting with the case were we would like to share a UMEM between
+sockets bound to the same netdev and queue id. The UMEM (tied to the
+fist socket created) will only have a single FILL ring and a single
+COMPLETION ring as there is only on unique netdev,queue_id tuple that
+we have bound to. To use this mode, create the first socket and bind
+it in the normal way. Create a second socket and create an RX and a TX
+ring, or at least one of them, but no FILL or COMPLETION rings as the
+ones from the first socket will be used. In the bind call, set he
 XDP_SHARED_UMEM option and provide the initial socket's fd in the
 sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
 sockets this way.

@@ -305,11 +312,41 @@ concurrently. There are no synchronization primitives in the
 libbpf code that protects multiple users at this point in time.
 Libbpf uses this mode if you create more than one socket tied to the
-same umem. However, note that you need to supply the
+same UMEM. However, note that you need to supply the
 XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
 xsk_socket__create calls and load your own XDP program as there is no
 built in one in libbpf that will route the traffic for you.
+
+The second case is when you share a UMEM between sockets that are
+bound to different queue ids and/or netdevs. In this case you have to
+create one FILL ring and one COMPLETION ring for each unique
+netdev,queue_id pair. Let us say you want to create two sockets bound
+to two different queue ids on the same netdev. Create the first socket
+and bind it in the normal way. Create a second socket and create an RX
+and a TX ring, or at least one of them, and then one FILL and
+COMPLETION ring for this socket. Then in the bind call, set he
+XDP_SHARED_UMEM option and provide the initial socket's fd in the
+sxdp_shared_umem_fd field as you registered the UMEM on that
+socket. These two sockets will now share one and the same UMEM.
+
+There is no need to supply an XDP program like the one in the previous
+case where sockets were bound to the same queue id and
+device. Instead, use the NIC's packet steering capabilities to steer
+the packets to the right queue. In the previous example, there is only
+one queue shared among sockets, so the NIC cannot do this steering. It
+can only steer between queues.
+
+In libbpf, you need to use the xsk_socket__create_shared() API as it
+takes a reference to a FILL ring and a COMPLETION ring that will be
+created for you and bound to the shared UMEM. You can use this
+function for all the sockets you create, or you can use it for the
+second and following ones and use xsk_socket__create() for the first
+one. Both methods yield the same result.
+
+Note that a UMEM can be shared between sockets on the same queue id
+and device, as well as between queues on the same device and between
+devices at the same time.
 XDP_USE_NEED_WAKEUP bind flag
 -----------------------------

@@ -364,7 +401,7 @@ resources by only setting up one of them. Both the FILL ring and the
 COMPLETION ring are mandatory as you need to have a UMEM tied to your
 socket. But if the XDP_SHARED_UMEM flag is used, any socket after the
 first one does not have a UMEM and should in that case not have any
-FILL or COMPLETION rings created as the ones from the shared umem will
+FILL or COMPLETION rings created as the ones from the shared UMEM will
 be used. Note, that the rings are single-producer single-consumer, so
 do not try to access them from multiple processes at the same
 time. See the XDP_SHARED_UMEM section.

@@ -567,6 +604,17 @@ A: The short answer is no, that is not supported at the moment. The
 switch, or other distribution mechanism, in your NIC to direct
 traffic to the correct queue id and socket.
+
+Q: My packets are sometimes corrupted. What is wrong?
+
+A: Care has to be taken not to feed the same buffer in the UMEM into
+   more than one ring at the same time. If you for example feed the
+   same buffer into the FILL ring and the TX ring at the same time, the
+   NIC might receive data into the buffer at the same time it is
+   sending it. This will cause some packets to become corrupted. Same
+   thing goes for feeding the same buffer into the FILL rings
+   belonging to different queue ids or netdevs bound with the
+   XDP_SHARED_UMEM flag.
+
 Credits
 =======
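
To make the shared-UMEM flow above concrete, a sketch of sharing one UMEM between two queue ids on the same netdev follows; the interface name, sizes and error handling are simplified assumptions, and the xsk_socket__create_shared() signature is the one added by this series::

  #include <errno.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <bpf/xsk.h>

  #define NUM_FRAMES 4096
  #define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

  /* Rings must outlive the sockets, hence static storage here. */
  static struct xsk_ring_prod fill1, fill2, tx1, tx2;
  static struct xsk_ring_cons comp1, comp2, rx1, rx2;

  static int setup_two_queues(struct xsk_socket **xsk1, struct xsk_socket **xsk2)
  {
          struct xsk_umem *umem;
          void *bufs;
          int err;

          if (posix_memalign(&bufs, getpagesize(), NUM_FRAMES * FRAME_SIZE))
                  return -ENOMEM;

          /* Registering the UMEM creates the first FILL/COMPLETION pair. */
          err = xsk_umem__create(&umem, bufs, NUM_FRAMES * FRAME_SIZE,
                                 &fill1, &comp1, NULL);
          if (err)
                  return err;

          /* Socket on queue 0 reuses the UMEM's own fill1/comp1 pair. */
          err = xsk_socket__create(xsk1, "eth0", 0, umem, &rx1, &tx1, NULL);
          if (err)
                  return err;

          /* Socket on queue 1 gets a FILL/COMPLETION pair of its own. */
          return xsk_socket__create_shared(xsk2, "eth0", 1, umem, &rx2, &tx2,
                                           &fill2, &comp2, NULL);
  }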


@@ -1379,10 +1379,15 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
     u8 *prog = *pprog;
     int cnt = 0;
-    if (emit_call(&prog, __bpf_prog_enter, prog))
-        return -EINVAL;
-    /* remember prog start time returned by __bpf_prog_enter */
-    emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
+    if (p->aux->sleepable) {
+        if (emit_call(&prog, __bpf_prog_enter_sleepable, prog))
+            return -EINVAL;
+    } else {
+        if (emit_call(&prog, __bpf_prog_enter, prog))
+            return -EINVAL;
+        /* remember prog start time returned by __bpf_prog_enter */
+        emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
+    }
     /* arg1: lea rdi, [rbp - stack_size] */
     EMIT4(0x48, 0x8D, 0x7D, -stack_size);

@@ -1402,6 +1407,10 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
     if (mod_ret)
         emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
-    /* arg1: mov rdi, progs[i] */
-    emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32,
-                   (u32) (long) p);
+    if (p->aux->sleepable) {
+        if (emit_call(&prog, __bpf_prog_exit_sleepable, prog))
+            return -EINVAL;
+    } else {
+        /* arg1: mov rdi, progs[i] */
+        emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32,
+                       (u32) (long) p);

@@ -1409,6 +1418,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
-    emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
-    if (emit_call(&prog, __bpf_prog_exit, prog))
-        return -EINVAL;
+        emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
+        if (emit_call(&prog, __bpf_prog_exit, prog))
+            return -EINVAL;
+    }
     *pprog = prog;
     return 0;
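
The sleepable enter/exit pair above is what permits a program to call helpers that may fault and sleep, such as the new bpf_copy_from_user(). A minimal, hypothetical sketch of such a program follows; the ``lsm.s/`` section convention and the chosen hook are assumptions based on this series, and which attach points accept sleepable programs is restricted by the verifier::

  // SPDX-License-Identifier: GPL-2.0
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char first_bytes[16];

  /* The ".s" suffix marks the program as sleepable for libbpf. */
  SEC("lsm.s/file_mprotect")
  int BPF_PROG(peek_mprotect, struct vm_area_struct *vma,
               unsigned long reqprot, unsigned long prot, int ret)
  {
          /* May fault and sleep; only legal in a sleepable program. */
          bpf_copy_from_user(first_bytes, sizeof(first_bytes),
                             (void *)vma->vm_start);
          return ret;
  }

  char LICENSE[] SEC("license") = "GPL";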


@@ -1967,7 +1967,7 @@ static int i40e_set_ringparam(struct net_device *netdev,
 (new_rx_count == vsi->rx_rings[0]->count))
 return 0;
-/* If there is a AF_XDP UMEM attached to any of Rx rings,
+/* If there is a AF_XDP page pool attached to any of Rx rings,
 * disallow changing the number of descriptors -- regardless
 * if the netdev is running or not.
 */


@@ -3122,12 +3122,12 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 }
 /**
- * i40e_xsk_umem - Retrieve the AF_XDP ZC if XDP and ZC is enabled
+ * i40e_xsk_pool - Retrieve the AF_XDP buffer pool if XDP and ZC is enabled
 * @ring: The Tx or Rx ring
 *
- * Returns the UMEM or NULL.
+ * Returns the AF_XDP buffer pool or NULL.
 **/
-static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
+static struct xsk_buff_pool *i40e_xsk_pool(struct i40e_ring *ring)
 {
 bool xdp_on = i40e_enabled_xdp_vsi(ring->vsi);
 int qid = ring->queue_index;

@@ -3138,7 +3138,7 @@ static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
 if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps))
 return NULL;
-return xdp_get_umem_from_qid(ring->vsi->netdev, qid);
+return xsk_get_pool_from_qid(ring->vsi->netdev, qid);
 }
 /**

@@ -3157,7 +3157,7 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 u32 qtx_ctl = 0;
 if (ring_is_xdp(ring))
-ring->xsk_umem = i40e_xsk_umem(ring);
+ring->xsk_pool = i40e_xsk_pool(ring);
 /* some ATR related tx ring init */
 if (vsi->back->flags & I40E_FLAG_FD_ATR_ENABLED) {

@@ -3280,12 +3280,13 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
 kfree(ring->rx_bi);
-ring->xsk_umem = i40e_xsk_umem(ring);
-if (ring->xsk_umem) {
+ring->xsk_pool = i40e_xsk_pool(ring);
+if (ring->xsk_pool) {
 ret = i40e_alloc_rx_bi_zc(ring);
 if (ret)
 return ret;
-ring->rx_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem);
+ring->rx_buf_len =
+	xsk_pool_get_rx_frame_size(ring->xsk_pool);
 /* For AF_XDP ZC, we disallow packets to span on
 * multiple buffers, thus letting us skip that
 * handling in the fast-path.

@@ -3368,8 +3369,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q);
 writel(0, ring->tail);
-if (ring->xsk_umem) {
-xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
+if (ring->xsk_pool) {
+xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
 ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring));
 } else {
 ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring));

@@ -3380,7 +3381,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 */
 dev_info(&vsi->back->pdev->dev,
 "Failed to allocate some buffers on %sRx ring %d (pf_q %d)\n",
-ring->xsk_umem ? "UMEM enabled " : "",
+ring->xsk_pool ? "AF_XDP ZC enabled " : "",
 ring->queue_index, pf_q);
 }

@@ -12644,7 +12645,7 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi,
 */
 if (need_reset && prog)
 for (i = 0; i < vsi->num_queue_pairs; i++)
-if (vsi->xdp_rings[i]->xsk_umem)
+if (vsi->xdp_rings[i]->xsk_pool)
 (void)i40e_xsk_wakeup(vsi->netdev, i,
 XDP_WAKEUP_RX);

@@ -12923,8 +12924,8 @@ static int i40e_xdp(struct net_device *dev,
 switch (xdp->command) {
 case XDP_SETUP_PROG:
 return i40e_xdp_setup(vsi, xdp->prog);
-case XDP_SETUP_XSK_UMEM:
-return i40e_xsk_umem_setup(vsi, xdp->xsk.umem,
+case XDP_SETUP_XSK_POOL:
+return i40e_xsk_pool_setup(vsi, xdp->xsk.pool,
 xdp->xsk.queue_id);
 default:
 return -EINVAL;


@@ -636,7 +636,7 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
 unsigned long bi_size;
 u16 i;
-if (ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
+if (ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
 i40e_xsk_clean_tx_ring(tx_ring);
 } else {
 /* ring already cleared, nothing to do */

@@ -1335,7 +1335,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 rx_ring->skb = NULL;
 }
-if (rx_ring->xsk_umem) {
+if (rx_ring->xsk_pool) {
 i40e_xsk_clean_rx_ring(rx_ring);
 goto skip_free;
 }

@@ -1369,7 +1369,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 }
 skip_free:
-if (rx_ring->xsk_umem)
+if (rx_ring->xsk_pool)
 i40e_clear_rx_bi_zc(rx_ring);
 else
 i40e_clear_rx_bi(rx_ring);

@@ -2575,7 +2575,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 * budget and be more aggressive about cleaning up the Tx descriptors.
 */
 i40e_for_each_ring(ring, q_vector->tx) {
-bool wd = ring->xsk_umem ?
+bool wd = ring->xsk_pool ?
 i40e_clean_xdp_tx_irq(vsi, ring) :
 i40e_clean_tx_irq(vsi, ring, budget);

@@ -2603,7 +2603,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 budget_per_ring = budget;
 i40e_for_each_ring(ring, q_vector->rx) {
-int cleaned = ring->xsk_umem ?
+int cleaned = ring->xsk_pool ?
 i40e_clean_rx_irq_zc(ring, budget_per_ring) :
 i40e_clean_rx_irq(ring, budget_per_ring);


@@ -388,7 +388,7 @@ struct i40e_ring {
 struct i40e_channel *ch;
 struct xdp_rxq_info xdp_rxq;
-struct xdp_umem *xsk_umem;
+struct xsk_buff_pool *xsk_pool;
 } ____cacheline_internodealigned_in_smp;
 static inline bool ring_uses_build_skb(struct i40e_ring *ring)


@@ -29,14 +29,16 @@ static struct xdp_buff **i40e_rx_bi(struct i40e_ring *rx_ring, u32 idx)
 }
 /**
- * i40e_xsk_umem_enable - Enable/associate a UMEM to a certain ring/qid
+ * i40e_xsk_pool_enable - Enable/associate an AF_XDP buffer pool to a
+ * certain ring/qid
 * @vsi: Current VSI
- * @umem: UMEM
- * @qid: Rx ring to associate UMEM to
+ * @pool: buffer pool
+ * @qid: Rx ring to associate buffer pool with
 *
 * Returns 0 on success, <0 on failure
 **/
-static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
+static int i40e_xsk_pool_enable(struct i40e_vsi *vsi,
+				struct xsk_buff_pool *pool,
 u16 qid)
 {
 struct net_device *netdev = vsi->netdev;

@@ -53,7 +55,7 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 qid >= netdev->real_num_tx_queues)
 return -EINVAL;
-err = xsk_buff_dma_map(umem, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR);
+err = xsk_pool_dma_map(pool, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR);
 if (err)
 return err;

@@ -80,21 +82,22 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 }
 /**
- * i40e_xsk_umem_disable - Disassociate a UMEM from a certain ring/qid
+ * i40e_xsk_pool_disable - Disassociate an AF_XDP buffer pool from a
+ * certain ring/qid
 * @vsi: Current VSI
- * @qid: Rx ring to associate UMEM to
+ * @qid: Rx ring to associate buffer pool with
 *
 * Returns 0 on success, <0 on failure
 **/
-static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
+static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid)
 {
 struct net_device *netdev = vsi->netdev;
-struct xdp_umem *umem;
+struct xsk_buff_pool *pool;
 bool if_running;
 int err;
-umem = xdp_get_umem_from_qid(netdev, qid);
-if (!umem)
+pool = xsk_get_pool_from_qid(netdev, qid);
+if (!pool)
 return -EINVAL;
 if_running = netif_running(vsi->netdev) && i40e_enabled_xdp_vsi(vsi);

@@ -106,7 +109,7 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
 }
 clear_bit(qid, vsi->af_xdp_zc_qps);
-xsk_buff_dma_unmap(umem, I40E_RX_DMA_ATTR);
+xsk_pool_dma_unmap(pool, I40E_RX_DMA_ATTR);
 if (if_running) {
 err = i40e_queue_pair_enable(vsi, qid);

@@ -118,20 +121,21 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
 }
 /**
- * i40e_xsk_umem_setup - Enable/disassociate a UMEM to/from a ring/qid
+ * i40e_xsk_pool_setup - Enable/disassociate an AF_XDP buffer pool to/from
+ * a ring/qid
 * @vsi: Current VSI
- * @umem: UMEM to enable/associate to a ring, or NULL to disable
- * @qid: Rx ring to (dis)associate UMEM (from)to
+ * @pool: Buffer pool to enable/associate to a ring, or NULL to disable
+ * @qid: Rx ring to (dis)associate buffer pool (from)to
 *
- * This function enables or disables a UMEM to a certain ring.
+ * This function enables or disables a buffer pool to a certain ring.
 *
 * Returns 0 on success, <0 on failure
 **/
-int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
+int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
 u16 qid)
 {
-return umem ? i40e_xsk_umem_enable(vsi, umem, qid) :
-	i40e_xsk_umem_disable(vsi, qid);
+return pool ? i40e_xsk_pool_enable(vsi, pool, qid) :
+	i40e_xsk_pool_disable(vsi, qid);
 }
 /**

@@ -191,7 +195,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 rx_desc = I40E_RX_DESC(rx_ring, ntu);
 bi = i40e_rx_bi(rx_ring, ntu);
 do {
-xdp = xsk_buff_alloc(rx_ring->xsk_umem);
+xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 if (!xdp) {
 ok = false;
 goto no_buffers;

@@ -310,7 +314,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 bi = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
 (*bi)->data_end = (*bi)->data + size;
-xsk_buff_dma_sync_for_cpu(*bi);
+xsk_buff_dma_sync_for_cpu(*bi, rx_ring->xsk_pool);
 xdp_res = i40e_run_xdp_zc(rx_ring, *bi);
 if (xdp_res) {

@@ -358,11 +362,11 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
 i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
-if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
+if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
 if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
+xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 else
-xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
+xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
 return (int)total_rx_packets;
 }

@@ -385,11 +389,11 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 dma_addr_t dma;
 while (budget-- > 0) {
-if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
+if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc))
 break;
-dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
-xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
+dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr);
+xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma,
 desc.len);
 tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];

@@ -416,7 +420,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 I40E_TXD_QW1_CMD_SHIFT);
 i40e_xdp_ring_update_tail(xdp_ring);
-xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+xsk_tx_release(xdp_ring->xsk_pool);
 i40e_update_tx_stats(xdp_ring, sent_frames, total_bytes);
 }

@@ -448,7 +452,7 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
 **/
 bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
 {
-struct xdp_umem *umem = tx_ring->xsk_umem;
+struct xsk_buff_pool *bp = tx_ring->xsk_pool;
 u32 i, completed_frames, xsk_frames = 0;
 u32 head_idx = i40e_get_head(tx_ring);
 struct i40e_tx_buffer *tx_bi;

@@ -488,13 +492,13 @@ skip:
 tx_ring->next_to_clean -= tx_ring->count;
 if (xsk_frames)
-xsk_umem_complete_tx(umem, xsk_frames);
+xsk_tx_completed(bp, xsk_frames);
 i40e_arm_wb(tx_ring, vsi, completed_frames);
 out_xmit:
-if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem))
-xsk_set_tx_need_wakeup(tx_ring->xsk_umem);
+if (xsk_uses_need_wakeup(tx_ring->xsk_pool))
+xsk_set_tx_need_wakeup(tx_ring->xsk_pool);
 return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
 }

@@ -526,7 +530,7 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
 if (queue_id >= vsi->num_queue_pairs)
 return -ENXIO;
-if (!vsi->xdp_rings[queue_id]->xsk_umem)
+if (!vsi->xdp_rings[queue_id]->xsk_pool)
 return -ENXIO;
 ring = vsi->xdp_rings[queue_id];

@@ -565,7 +569,7 @@ void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring)
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
 {
 u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
-struct xdp_umem *umem = tx_ring->xsk_umem;
+struct xsk_buff_pool *bp = tx_ring->xsk_pool;
 struct i40e_tx_buffer *tx_bi;
 u32 xsk_frames = 0;

@@ -585,14 +589,15 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
 }
 if (xsk_frames)
-xsk_umem_complete_tx(umem, xsk_frames);
+xsk_tx_completed(bp, xsk_frames);
 }
 /**
- * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have AF_XDP UMEM attached
+ * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have an AF_XDP
+ * buffer pool attached
 * @vsi: vsi
 *
- * Returns true if any of the Rx rings has an AF_XDP UMEM attached
+ * Returns true if any of the Rx rings has an AF_XDP buffer pool attached
 **/
 bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
 {

@@ -600,7 +605,7 @@ bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
 int i;
 for (i = 0; i < vsi->num_queue_pairs; i++) {
-if (xdp_get_umem_from_qid(netdev, i))
+if (xsk_get_pool_from_qid(netdev, i))
 return true;
 }


@@ -5,12 +5,12 @@
 #define _I40E_XSK_H_
 struct i40e_vsi;
-struct xdp_umem;
+struct xsk_buff_pool;
 struct zero_copy_allocator;
 int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
 int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
-int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
+int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
 u16 qid);
 bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
 int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);


@@ -321,9 +321,9 @@ struct ice_vsi {
 struct ice_ring **xdp_rings; /* XDP ring array */
 u16 num_xdp_txq; /* Used XDP queues */
 u8 xdp_mapping_mode; /* ICE_MAP_MODE_[CONTIG|SCATTER] */
-struct xdp_umem **xsk_umems;
-u16 num_xsk_umems_used;
-u16 num_xsk_umems;
+struct xsk_buff_pool **xsk_pools;
+u16 num_xsk_pools_used;
+u16 num_xsk_pools;
 } ____cacheline_internodealigned_in_smp;
 /* struct that defines an interrupt vector */

@@ -507,25 +507,25 @@ static inline void ice_set_ring_xdp(struct ice_ring *ring)
 }
 /**
- * ice_xsk_umem - get XDP UMEM bound to a ring
+ * ice_xsk_pool - get XSK buffer pool bound to a ring
 * @ring - ring to use
 *
- * Returns a pointer to xdp_umem structure if there is an UMEM present,
+ * Returns a pointer to xdp_umem structure if there is a buffer pool present,
 * NULL otherwise.
 */
-static inline struct xdp_umem *ice_xsk_umem(struct ice_ring *ring)
+static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring)
 {
-struct xdp_umem **umems = ring->vsi->xsk_umems;
+struct xsk_buff_pool **pools = ring->vsi->xsk_pools;
 u16 qid = ring->q_index;
 if (ice_ring_is_xdp(ring))
 qid -= ring->vsi->num_xdp_txq;
-if (qid >= ring->vsi->num_xsk_umems || !umems || !umems[qid] ||
+if (qid >= ring->vsi->num_xsk_pools || !pools || !pools[qid] ||
 !ice_is_xdp_ena_vsi(ring->vsi))
 return NULL;
-return umems[qid];
+return pools[qid];
 }
 /**


@@ -308,12 +308,12 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
 ring->q_index);
-ring->xsk_umem = ice_xsk_umem(ring);
-if (ring->xsk_umem) {
+ring->xsk_pool = ice_xsk_pool(ring);
+if (ring->xsk_pool) {
 xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
 ring->rx_buf_len =
-xsk_umem_get_rx_frame_size(ring->xsk_umem);
+xsk_pool_get_rx_frame_size(ring->xsk_pool);
 /* For AF_XDP ZC, we disallow packets to span on
 * multiple buffers, thus letting us skip that
 * handling in the fast-path.

@@ -324,7 +324,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 NULL);
 if (err)
 return err;
-xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
+xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
 dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
 ring->q_index);

@@ -417,9 +417,9 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 ring->tail = hw->hw_addr + QRX_TAIL(pf_q);
 writel(0, ring->tail);
-if (ring->xsk_umem) {
-if (!xsk_buff_can_alloc(ring->xsk_umem, num_bufs)) {
-dev_warn(dev, "UMEM does not provide enough addresses to fill %d buffers on Rx ring %d\n",
+if (ring->xsk_pool) {
+if (!xsk_buff_can_alloc(ring->xsk_pool, num_bufs)) {
+dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n",
 num_bufs, ring->q_index);
 dev_warn(dev, "Change Rx ring/fill queue size to avoid performance issues\n");

@@ -428,7 +428,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 err = ice_alloc_rx_bufs_zc(ring, num_bufs);
 if (err)
-dev_info(dev, "Failed to allocate some buffers on UMEM enabled Rx ring %d (pf_q %d)\n",
+dev_info(dev, "Failed to allocate some buffers on XSK buffer pool enabled Rx ring %d (pf_q %d)\n",
 ring->q_index, pf_q);
 return 0;
 }


@@ -1743,7 +1743,7 @@ int ice_vsi_cfg_xdp_txqs(struct ice_vsi *vsi)
 return ret;
 for (i = 0; i < vsi->num_xdp_txq; i++)
-vsi->xdp_rings[i]->xsk_umem = ice_xsk_umem(vsi->xdp_rings[i]);
+vsi->xdp_rings[i]->xsk_pool = ice_xsk_pool(vsi->xdp_rings[i]);
 return ret;
 }


@@ -2273,7 +2273,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 if (ice_setup_tx_ring(xdp_ring))
 goto free_xdp_rings;
 ice_set_ring_xdp(xdp_ring);
-xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring);
+xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
 }
 return 0;

@@ -2517,13 +2517,13 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 if (if_running)
 ret = ice_up(vsi);
-if (!ret && prog && vsi->xsk_umems) {
+if (!ret && prog && vsi->xsk_pools) {
 int i;
 ice_for_each_rxq(vsi, i) {
 struct ice_ring *rx_ring = vsi->rx_rings[i];
-if (rx_ring->xsk_umem)
+if (rx_ring->xsk_pool)
 napi_schedule(&rx_ring->q_vector->napi);
 }
 }

@@ -2549,8 +2549,8 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 switch (xdp->command) {
 case XDP_SETUP_PROG:
 return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
-case XDP_SETUP_XSK_UMEM:
-return ice_xsk_umem_setup(vsi, xdp->xsk.umem,
+case XDP_SETUP_XSK_POOL:
+return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
 xdp->xsk.queue_id);
 default:
 return -EINVAL;


@@ -145,7 +145,7 @@ void ice_clean_tx_ring(struct ice_ring *tx_ring)
 {
 u16 i;
-if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
+if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
 ice_xsk_clean_xdp_ring(tx_ring);
 goto tx_skip_free;
 }

@@ -375,7 +375,7 @@ void ice_clean_rx_ring(struct ice_ring *rx_ring)
 if (!rx_ring->rx_buf)
 return;
-if (rx_ring->xsk_umem) {
+if (rx_ring->xsk_pool) {
 ice_xsk_clean_rx_ring(rx_ring);
 goto rx_skip_free;
 }

@@ -1610,7 +1610,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 * budget and be more aggressive about cleaning up the Tx descriptors.
 */
 ice_for_each_ring(ring, q_vector->tx) {
-bool wd = ring->xsk_umem ?
+bool wd = ring->xsk_pool ?
 ice_clean_tx_irq_zc(ring, budget) :
 ice_clean_tx_irq(ring, budget);

@@ -1640,7 +1640,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 * comparison in the irq context instead of many inside the
 * ice_clean_rx_irq function and makes the codebase cleaner.
 */
-cleaned = ring->xsk_umem ?
+cleaned = ring->xsk_pool ?
 ice_clean_rx_irq_zc(ring, budget_per_ring) :
 ice_clean_rx_irq(ring, budget_per_ring);
 work_done += cleaned;


@@ -295,7 +295,7 @@ struct ice_ring {
 struct rcu_head rcu; /* to avoid race on free */
 struct bpf_prog *xdp_prog;
-struct xdp_umem *xsk_umem;
+struct xsk_buff_pool *xsk_pool;
 /* CL3 - 3rd cacheline starts here */
 struct xdp_rxq_info xdp_rxq;
 /* CLX - the below items are only accessed infrequently and should be


@@ -236,7 +236,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 if (err)
 goto free_buf;
 ice_set_ring_xdp(xdp_ring);
-xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring);
+xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
 }
 err = ice_setup_rx_ctx(rx_ring);

@@ -260,21 +260,21 @@ free_buf:
 }
 /**
- * ice_xsk_alloc_umems - allocate a UMEM region for an XDP socket
- * @vsi: VSI to allocate the UMEM on
+ * ice_xsk_alloc_pools - allocate a buffer pool for an XDP socket
+ * @vsi: VSI to allocate the buffer pool on
 *
 * Returns 0 on success, negative on error
 */
-static int ice_xsk_alloc_umems(struct ice_vsi *vsi)
+static int ice_xsk_alloc_pools(struct ice_vsi *vsi)
 {
-if (vsi->xsk_umems)
+if (vsi->xsk_pools)
 return 0;
-vsi->xsk_umems = kcalloc(vsi->num_xsk_umems, sizeof(*vsi->xsk_umems),
+vsi->xsk_pools = kcalloc(vsi->num_xsk_pools, sizeof(*vsi->xsk_pools),
 GFP_KERNEL);
-if (!vsi->xsk_umems) {
-vsi->num_xsk_umems = 0;
+if (!vsi->xsk_pools) {
+vsi->num_xsk_pools = 0;
 return -ENOMEM;
 }

@@ -282,73 +282,73 @@ static int ice_xsk_alloc_umems(struct ice_vsi *vsi)
 }
 /**
- * ice_xsk_remove_umem - Remove an UMEM for a certain ring/qid
+ * ice_xsk_remove_pool - Remove an buffer pool for a certain ring/qid
 * @vsi: VSI from which the VSI will be removed
- * @qid: Ring/qid associated with the UMEM
+ * @qid: Ring/qid associated with the buffer pool
 */
-static void ice_xsk_remove_umem(struct ice_vsi *vsi, u16 qid)
+static void ice_xsk_remove_pool(struct ice_vsi *vsi, u16 qid)
 {
-vsi->xsk_umems[qid] = NULL;
-vsi->num_xsk_umems_used--;
-if (vsi->num_xsk_umems_used == 0) {
-kfree(vsi->xsk_umems);
-vsi->xsk_umems = NULL;
-vsi->num_xsk_umems = 0;
+vsi->xsk_pools[qid] = NULL;
+vsi->num_xsk_pools_used--;
+if (vsi->num_xsk_pools_used == 0) {
+kfree(vsi->xsk_pools);
+vsi->xsk_pools = NULL;
+vsi->num_xsk_pools = 0;
 }
 }
 /**
- * ice_xsk_umem_disable - disable a UMEM region
+ * ice_xsk_pool_disable - disable a buffer pool region
 * @vsi: Current VSI
 * @qid: queue ID
 *
 * Returns 0 on success, negative on failure
 */
-static int ice_xsk_umem_disable(struct ice_vsi *vsi, u16 qid)
+static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid)
 {
-if (!vsi->xsk_umems || qid >= vsi->num_xsk_umems ||
-!vsi->xsk_umems[qid])
+if (!vsi->xsk_pools || qid >= vsi->num_xsk_pools ||
+!vsi->xsk_pools[qid])
 return -EINVAL;
-xsk_buff_dma_unmap(vsi->xsk_umems[qid], ICE_RX_DMA_ATTR);
-ice_xsk_remove_umem(vsi, qid);
+xsk_pool_dma_unmap(vsi->xsk_pools[qid], ICE_RX_DMA_ATTR);
+ice_xsk_remove_pool(vsi, qid);
 return 0;
 }
 /**
- * ice_xsk_umem_enable - enable a UMEM region
+ * ice_xsk_pool_enable - enable a buffer pool region
 * @vsi: Current VSI
- * @umem: pointer to a requested UMEM region
+ * @pool: pointer to a requested buffer pool region
 * @qid: queue ID
 *
 * Returns 0 on success, negative on failure
 */
 static int
-ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
+ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 {
 int err;
 if (vsi->type != ICE_VSI_PF)
 return -EINVAL;
-if (!vsi->num_xsk_umems)
-vsi->num_xsk_umems = min_t(u16, vsi->num_rxq, vsi->num_txq);
-if (qid >= vsi->num_xsk_umems)
+if (!vsi->num_xsk_pools)
+vsi->num_xsk_pools = min_t(u16, vsi->num_rxq, vsi->num_txq);
+if (qid >= vsi->num_xsk_pools)
 return -EINVAL;
-err = ice_xsk_alloc_umems(vsi);
+err = ice_xsk_alloc_pools(vsi);
 if (err)
 return err;
-if (vsi->xsk_umems && vsi->xsk_umems[qid])
+if (vsi->xsk_pools && vsi->xsk_pools[qid])
 return -EBUSY;
-vsi->xsk_umems[qid] = umem;
-vsi->num_xsk_umems_used++;
-err = xsk_buff_dma_map(vsi->xsk_umems[qid], ice_pf_to_dev(vsi->back),
+vsi->xsk_pools[qid] = pool;
+vsi->num_xsk_pools_used++;
+err = xsk_pool_dma_map(vsi->xsk_pools[qid], ice_pf_to_dev(vsi->back),
 ICE_RX_DMA_ATTR);
 if (err)
 return err;

@@ -357,17 +357,17 @@ ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
 }
 /**
- * ice_xsk_umem_setup - enable/disable a UMEM region depending on its state
+ * ice_xsk_pool_setup - enable/disable a buffer pool region depending on its state
 * @vsi: Current VSI
- * @umem: UMEM to enable/associate to a ring, NULL to disable
+ * @pool: buffer pool to enable/associate to a ring, NULL to disable
 * @qid: queue ID
 *
 * Returns 0 on success, negative on failure
 */
-int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
+int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 {
-bool if_running, umem_present = !!umem;
-int ret = 0, umem_failure = 0;
+bool if_running, pool_present = !!pool;
+int ret = 0, pool_failure = 0;
 if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);

@@ -375,26 +375,26 @@ int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
 ret = ice_qp_dis(vsi, qid);
 if (ret) {
 netdev_err(vsi->netdev, "ice_qp_dis error = %d\n", ret);
-goto xsk_umem_if_up;
+goto xsk_pool_if_up;
 }
 }
-umem_failure = umem_present ? ice_xsk_umem_enable(vsi, umem, qid) :
-	ice_xsk_umem_disable(vsi, qid);
-xsk_umem_if_up:
+pool_failure = pool_present ? ice_xsk_pool_enable(vsi, pool, qid) :
+	ice_xsk_pool_disable(vsi, qid);
+xsk_pool_if_up:
 if (if_running) {
 ret = ice_qp_ena(vsi, qid);
-if (!ret && umem_present)
+if (!ret && pool_present)
 napi_schedule(&vsi->xdp_rings[qid]->q_vector->napi);
 else if (ret)
 netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret);
 }
-if (umem_failure) {
-netdev_err(vsi->netdev, "Could not %sable UMEM, error = %d\n",
-umem_present ? "en" : "dis", umem_failure);
-return umem_failure;
+if (pool_failure) {
+netdev_err(vsi->netdev, "Could not %sable buffer pool, error = %d\n",
+pool_present ? "en" : "dis", pool_failure);
+return pool_failure;
 }
 return ret;

@@ -425,7 +425,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count)
 rx_buf = &rx_ring->rx_buf[ntu];
 do {
-rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_umem);
+rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 if (!rx_buf->xdp) {
 ret = true;
 break;

@@ -595,7 +595,7 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 rx_buf = &rx_ring->rx_buf[rx_ring->next_to_clean];
 rx_buf->xdp->data_end = rx_buf->xdp->data + size;
-xsk_buff_dma_sync_for_cpu(rx_buf->xdp);
+xsk_buff_dma_sync_for_cpu(rx_buf->xdp, rx_ring->xsk_pool);
 xdp_res = ice_run_xdp_zc(rx_ring, rx_buf->xdp);
 if (xdp_res) {

@@ -645,11 +645,11 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 ice_finalize_xdp_rx(rx_ring, xdp_xmit);
 ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
-if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
+if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
 if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
+xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 else
-xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
+xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
 return (int)total_rx_packets;
 }

@@ -682,11 +682,11 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
 tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use];
-if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
+if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc))
 break;
-dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
-xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
+dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr);
+xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma,
 desc.len);
 tx_buf->bytecount = desc.len;

@@ -703,7 +703,7 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
 if (tx_desc) {
 ice_xdp_ring_update_tail(xdp_ring);
-xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+xsk_tx_release(xdp_ring->xsk_pool);
 }
 return budget > 0 && work_done;

@@ -777,10 +777,10 @@ bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget)
 xdp_ring->next_to_clean = ntc;
 if (xsk_frames)
-xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames);
-if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_umem))
-xsk_set_tx_need_wakeup(xdp_ring->xsk_umem);
+xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
+if (xsk_uses_need_wakeup(xdp_ring->xsk_pool))
+xsk_set_tx_need_wakeup(xdp_ring->xsk_pool);
 ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes);
 xmit_done = ice_xmit_zc(xdp_ring, ICE_DFLT_IRQ_WORK);

@@ -814,7 +814,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
 if (queue_id >= vsi->num_txq)
 return -ENXIO;
-if (!vsi->xdp_rings[queue_id]->xsk_umem)
+if (!vsi->xdp_rings[queue_id]->xsk_pool)
 return -ENXIO;
 ring = vsi->xdp_rings[queue_id];

@@ -833,20 +833,20 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
 }
 /**
- * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP UMEM attached
+ * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP buff pool attached
 * @vsi: VSI to be checked
 *
- * Returns true if any of the Rx rings has an AF_XDP UMEM attached
+ * Returns true if any of the Rx rings has an AF_XDP buff pool attached
 */
 bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi)
 {
 int i;
-if (!vsi->xsk_umems)
+if (!vsi->xsk_pools)
 return false;
-for (i = 0; i < vsi->num_xsk_umems; i++) {
-if (vsi->xsk_umems[i])
+for (i = 0; i < vsi->num_xsk_pools; i++) {
+if (vsi->xsk_pools[i])
 return true;
 }

@@ -854,7 +854,7 @@ bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi)
 }
 /**
- * ice_xsk_clean_rx_ring - clean UMEM queues connected to a given Rx ring
+ * ice_xsk_clean_rx_ring - clean buffer pool queues connected to a given Rx ring
 * @rx_ring: ring to be cleaned
 */
 void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)

@@ -872,7 +872,7 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)
 }
 /**
- * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its UMEM queues
+ * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its buffer pool queues
 * @xdp_ring: XDP_Tx ring
 */
 void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)

@@ -896,5 +896,5 @@ void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
 }
 if (xsk_frames)
-xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames);
+xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
 }


@@ -9,7 +9,8 @@
 struct ice_vsi;
 #ifdef CONFIG_XDP_SOCKETS
-int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid);
+int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
+		       u16 qid);
 int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget);
 bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);

@@ -19,8 +20,8 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring);
 void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring);
 #else
 static inline int
-ice_xsk_umem_setup(struct ice_vsi __always_unused *vsi,
-		   struct xdp_umem __always_unused *umem,
+ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
+		   struct xsk_buff_pool __always_unused *pool,
 u16 __always_unused qid)
 {
 return -EOPNOTSUPP;


@@ -350,7 +350,7 @@ struct ixgbe_ring {
 struct ixgbe_rx_queue_stats rx_stats;
 };
 struct xdp_rxq_info xdp_rxq;
-struct xdp_umem *xsk_umem;
+struct xsk_buff_pool *xsk_pool;
 u16 ring_idx; /* {rx,tx,xdp}_ring back reference idx */
 u16 rx_buf_len;
 } ____cacheline_internodealigned_in_smp;

View File

@ -3151,7 +3151,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
#endif #endif
ixgbe_for_each_ring(ring, q_vector->tx) { ixgbe_for_each_ring(ring, q_vector->tx) {
bool wd = ring->xsk_umem ? bool wd = ring->xsk_pool ?
ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) : ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) :
ixgbe_clean_tx_irq(q_vector, ring, budget); ixgbe_clean_tx_irq(q_vector, ring, budget);
@ -3171,7 +3171,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
per_ring_budget = budget; per_ring_budget = budget;
ixgbe_for_each_ring(ring, q_vector->rx) { ixgbe_for_each_ring(ring, q_vector->rx) {
int cleaned = ring->xsk_umem ? int cleaned = ring->xsk_pool ?
ixgbe_clean_rx_irq_zc(q_vector, ring, ixgbe_clean_rx_irq_zc(q_vector, ring,
per_ring_budget) : per_ring_budget) :
ixgbe_clean_rx_irq(q_vector, ring, ixgbe_clean_rx_irq(q_vector, ring,
@ -3466,9 +3466,9 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
u32 txdctl = IXGBE_TXDCTL_ENABLE; u32 txdctl = IXGBE_TXDCTL_ENABLE;
u8 reg_idx = ring->reg_idx; u8 reg_idx = ring->reg_idx;
ring->xsk_umem = NULL; ring->xsk_pool = NULL;
if (ring_is_xdp(ring)) if (ring_is_xdp(ring))
ring->xsk_umem = ixgbe_xsk_umem(adapter, ring); ring->xsk_pool = ixgbe_xsk_pool(adapter, ring);
/* disable queue to avoid issues while updating state */ /* disable queue to avoid issues while updating state */
IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0); IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
@ -3708,8 +3708,8 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT; srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT;
/* configure the packet buffer length */ /* configure the packet buffer length */
if (rx_ring->xsk_umem) { if (rx_ring->xsk_pool) {
u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_umem); u32 xsk_buf_len = xsk_pool_get_rx_frame_size(rx_ring->xsk_pool);
/* If the MAC support setting RXDCTL.RLPML, the /* If the MAC support setting RXDCTL.RLPML, the
* SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and
@ -4054,12 +4054,12 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
u8 reg_idx = ring->reg_idx; u8 reg_idx = ring->reg_idx;
xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
ring->xsk_umem = ixgbe_xsk_umem(adapter, ring); ring->xsk_pool = ixgbe_xsk_pool(adapter, ring);
if (ring->xsk_umem) { if (ring->xsk_pool) {
WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
MEM_TYPE_XSK_BUFF_POOL, MEM_TYPE_XSK_BUFF_POOL,
NULL)); NULL));
xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq); xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
} else { } else {
WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
MEM_TYPE_PAGE_SHARED, NULL)); MEM_TYPE_PAGE_SHARED, NULL));
@ -4114,8 +4114,8 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
#endif #endif
} }
if (ring->xsk_umem && hw->mac.type != ixgbe_mac_82599EB) { if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) {
u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem); u32 xsk_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool);
rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK | rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK |
IXGBE_RXDCTL_RLPML_EN); IXGBE_RXDCTL_RLPML_EN);
@ -4137,7 +4137,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl); IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);
ixgbe_rx_desc_queue_enable(adapter, ring); ixgbe_rx_desc_queue_enable(adapter, ring);
if (ring->xsk_umem) if (ring->xsk_pool)
ixgbe_alloc_rx_buffers_zc(ring, ixgbe_desc_unused(ring)); ixgbe_alloc_rx_buffers_zc(ring, ixgbe_desc_unused(ring));
else else
ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring)); ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring));
@ -5287,7 +5287,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
u16 i = rx_ring->next_to_clean; u16 i = rx_ring->next_to_clean;
struct ixgbe_rx_buffer *rx_buffer = &rx_ring->rx_buffer_info[i]; struct ixgbe_rx_buffer *rx_buffer = &rx_ring->rx_buffer_info[i];
if (rx_ring->xsk_umem) { if (rx_ring->xsk_pool) {
ixgbe_xsk_clean_rx_ring(rx_ring); ixgbe_xsk_clean_rx_ring(rx_ring);
goto skip_free; goto skip_free;
} }
@ -5979,7 +5979,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
u16 i = tx_ring->next_to_clean; u16 i = tx_ring->next_to_clean;
struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i]; struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
if (tx_ring->xsk_umem) { if (tx_ring->xsk_pool) {
ixgbe_xsk_clean_tx_ring(tx_ring); ixgbe_xsk_clean_tx_ring(tx_ring);
goto out; goto out;
} }
@ -10141,7 +10141,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
*/ */
if (need_reset && prog) if (need_reset && prog)
for (i = 0; i < adapter->num_rx_queues; i++) for (i = 0; i < adapter->num_rx_queues; i++)
if (adapter->xdp_ring[i]->xsk_umem) if (adapter->xdp_ring[i]->xsk_pool)
(void)ixgbe_xsk_wakeup(adapter->netdev, i, (void)ixgbe_xsk_wakeup(adapter->netdev, i,
XDP_WAKEUP_RX); XDP_WAKEUP_RX);
@ -10155,8 +10155,8 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
switch (xdp->command) { switch (xdp->command) {
case XDP_SETUP_PROG: case XDP_SETUP_PROG:
return ixgbe_xdp_setup(dev, xdp->prog); return ixgbe_xdp_setup(dev, xdp->prog);
case XDP_SETUP_XSK_UMEM: case XDP_SETUP_XSK_POOL:
return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem, return ixgbe_xsk_pool_setup(adapter, xdp->xsk.pool,
xdp->xsk.queue_id); xdp->xsk.queue_id);
default: default:

View File

@ -28,9 +28,10 @@ void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, u64 qmask);
void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring); void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring); void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter, struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
struct ixgbe_ring *ring); struct ixgbe_ring *ring);
int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem, int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
struct xsk_buff_pool *pool,
u16 qid); u16 qid);
void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle); void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);

View File

@ -8,7 +8,7 @@
#include "ixgbe.h" #include "ixgbe.h"
#include "ixgbe_txrx_common.h" #include "ixgbe_txrx_common.h"
struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter, struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
struct ixgbe_ring *ring) struct ixgbe_ring *ring)
{ {
bool xdp_on = READ_ONCE(adapter->xdp_prog); bool xdp_on = READ_ONCE(adapter->xdp_prog);
@ -17,11 +17,11 @@ struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps)) if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps))
return NULL; return NULL;
return xdp_get_umem_from_qid(adapter->netdev, qid); return xsk_get_pool_from_qid(adapter->netdev, qid);
} }
static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter, static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter,
struct xdp_umem *umem, struct xsk_buff_pool *pool,
u16 qid) u16 qid)
{ {
struct net_device *netdev = adapter->netdev; struct net_device *netdev = adapter->netdev;
@ -35,7 +35,7 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
qid >= netdev->real_num_tx_queues) qid >= netdev->real_num_tx_queues)
return -EINVAL; return -EINVAL;
err = xsk_buff_dma_map(umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR); err = xsk_pool_dma_map(pool, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
if (err) if (err)
return err; return err;
@ -59,13 +59,13 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
return 0; return 0;
} }
static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid) static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid)
{ {
struct xdp_umem *umem; struct xsk_buff_pool *pool;
bool if_running; bool if_running;
umem = xdp_get_umem_from_qid(adapter->netdev, qid); pool = xsk_get_pool_from_qid(adapter->netdev, qid);
if (!umem) if (!pool)
return -EINVAL; return -EINVAL;
if_running = netif_running(adapter->netdev) && if_running = netif_running(adapter->netdev) &&
@ -75,7 +75,7 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
ixgbe_txrx_ring_disable(adapter, qid); ixgbe_txrx_ring_disable(adapter, qid);
clear_bit(qid, adapter->af_xdp_zc_qps); clear_bit(qid, adapter->af_xdp_zc_qps);
xsk_buff_dma_unmap(umem, IXGBE_RX_DMA_ATTR); xsk_pool_dma_unmap(pool, IXGBE_RX_DMA_ATTR);
if (if_running) if (if_running)
ixgbe_txrx_ring_enable(adapter, qid); ixgbe_txrx_ring_enable(adapter, qid);
@ -83,11 +83,12 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
return 0; return 0;
} }
int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem, int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
struct xsk_buff_pool *pool,
u16 qid) u16 qid)
{ {
return umem ? ixgbe_xsk_umem_enable(adapter, umem, qid) : return pool ? ixgbe_xsk_pool_enable(adapter, pool, qid) :
ixgbe_xsk_umem_disable(adapter, qid); ixgbe_xsk_pool_disable(adapter, qid);
} }
static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter, static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
@ -149,7 +150,7 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
i -= rx_ring->count; i -= rx_ring->count;
do { do {
bi->xdp = xsk_buff_alloc(rx_ring->xsk_umem); bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool);
if (!bi->xdp) { if (!bi->xdp) {
ok = false; ok = false;
break; break;
@ -286,7 +287,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
} }
bi->xdp->data_end = bi->xdp->data + size; bi->xdp->data_end = bi->xdp->data + size;
xsk_buff_dma_sync_for_cpu(bi->xdp); xsk_buff_dma_sync_for_cpu(bi->xdp, rx_ring->xsk_pool);
xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp); xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp);
if (xdp_res) { if (xdp_res) {
@ -344,11 +345,11 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
q_vector->rx.total_packets += total_rx_packets; q_vector->rx.total_packets += total_rx_packets;
q_vector->rx.total_bytes += total_rx_bytes; q_vector->rx.total_bytes += total_rx_bytes;
if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) { if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
xsk_set_rx_need_wakeup(rx_ring->xsk_umem); xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
else else
xsk_clear_rx_need_wakeup(rx_ring->xsk_umem); xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
return (int)total_rx_packets; return (int)total_rx_packets;
} }
@ -373,6 +374,7 @@ void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget) static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
{ {
struct xsk_buff_pool *pool = xdp_ring->xsk_pool;
union ixgbe_adv_tx_desc *tx_desc = NULL; union ixgbe_adv_tx_desc *tx_desc = NULL;
struct ixgbe_tx_buffer *tx_bi; struct ixgbe_tx_buffer *tx_bi;
bool work_done = true; bool work_done = true;
@ -387,12 +389,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
break; break;
} }
if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc)) if (!xsk_tx_peek_desc(pool, &desc))
break; break;
dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr); dma = xsk_buff_raw_get_dma(pool, desc.addr);
xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma, xsk_buff_raw_dma_sync_for_device(pool, dma, desc.len);
desc.len);
tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use]; tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
tx_bi->bytecount = desc.len; tx_bi->bytecount = desc.len;
@ -418,7 +419,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
if (tx_desc) { if (tx_desc) {
ixgbe_xdp_ring_update_tail(xdp_ring); ixgbe_xdp_ring_update_tail(xdp_ring);
xsk_umem_consume_tx_done(xdp_ring->xsk_umem); xsk_tx_release(pool);
} }
return !!budget && work_done; return !!budget && work_done;
@ -439,7 +440,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
{ {
u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use; u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
unsigned int total_packets = 0, total_bytes = 0; unsigned int total_packets = 0, total_bytes = 0;
struct xdp_umem *umem = tx_ring->xsk_umem; struct xsk_buff_pool *pool = tx_ring->xsk_pool;
union ixgbe_adv_tx_desc *tx_desc; union ixgbe_adv_tx_desc *tx_desc;
struct ixgbe_tx_buffer *tx_bi; struct ixgbe_tx_buffer *tx_bi;
u32 xsk_frames = 0; u32 xsk_frames = 0;
@ -484,10 +485,10 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
q_vector->tx.total_packets += total_packets; q_vector->tx.total_packets += total_packets;
if (xsk_frames) if (xsk_frames)
xsk_umem_complete_tx(umem, xsk_frames); xsk_tx_completed(pool, xsk_frames);
if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem)) if (xsk_uses_need_wakeup(pool))
xsk_set_tx_need_wakeup(tx_ring->xsk_umem); xsk_set_tx_need_wakeup(pool);
return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit); return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
} }
@ -511,7 +512,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
if (test_bit(__IXGBE_TX_DISABLED, &ring->state)) if (test_bit(__IXGBE_TX_DISABLED, &ring->state))
return -ENETDOWN; return -ENETDOWN;
if (!ring->xsk_umem) if (!ring->xsk_pool)
return -ENXIO; return -ENXIO;
if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) { if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
@ -526,7 +527,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring) void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
{ {
u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use; u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
struct xdp_umem *umem = tx_ring->xsk_umem; struct xsk_buff_pool *pool = tx_ring->xsk_pool;
struct ixgbe_tx_buffer *tx_bi; struct ixgbe_tx_buffer *tx_bi;
u32 xsk_frames = 0; u32 xsk_frames = 0;
@ -546,5 +547,5 @@ void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
} }
if (xsk_frames) if (xsk_frames)
xsk_umem_complete_tx(umem, xsk_frames); xsk_tx_completed(pool, xsk_frames);
} }
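
The ixgbe hunks above are a mechanical rename from the umem-centric helpers to their
xsk_buff_pool counterparts. As a rough orientation only, a zero-copy Tx path written
against the renamed API looks like the sketch below; the function itself and the
"post to hardware" step are placeholders, not ixgbe code, and the pre-rename helper
names are noted in comments.

        /* Hedged sketch, not driver code: the pool-based Tx helpers used by
         * the hunks above, with the old umem-based names noted.
         */
        static void example_xmit_zc(struct xsk_buff_pool *pool, unsigned int budget)
        {
                struct xdp_desc desc;
                dma_addr_t dma;

                while (budget--) {
                        /* was xsk_umem_consume_tx() */
                        if (!xsk_tx_peek_desc(pool, &desc))
                                break;

                        dma = xsk_buff_raw_get_dma(pool, desc.addr);
                        xsk_buff_raw_dma_sync_for_device(pool, dma, desc.len);
                        /* post (dma, desc.len) to the hardware Tx ring here */
                }

                /* was xsk_umem_consume_tx_done() */
                xsk_tx_release(pool);
        }

        /* Later, on the Tx completion path, once the hardware is done:
         *      xsk_tx_completed(pool, frames_done);    (was xsk_umem_complete_tx())
         */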

View File

@ -24,7 +24,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \ mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \ en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
en_selftest.o en/port.o en/monitor_stats.o en/health.o \ en_selftest.o en/port.o en/monitor_stats.o en/health.o \
en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/umem.o \ en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/pool.o \
en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o
# #

View File

@ -442,7 +442,7 @@ struct mlx5e_xdpsq {
struct mlx5e_cq cq; struct mlx5e_cq cq;
/* read only */ /* read only */
struct xdp_umem *umem; struct xsk_buff_pool *xsk_pool;
struct mlx5_wq_cyc wq; struct mlx5_wq_cyc wq;
struct mlx5e_xdpsq_stats *stats; struct mlx5e_xdpsq_stats *stats;
mlx5e_fp_xmit_xdp_frame_check xmit_xdp_frame_check; mlx5e_fp_xmit_xdp_frame_check xmit_xdp_frame_check;
@ -606,7 +606,7 @@ struct mlx5e_rq {
struct page_pool *page_pool; struct page_pool *page_pool;
/* AF_XDP zero-copy */ /* AF_XDP zero-copy */
struct xdp_umem *umem; struct xsk_buff_pool *xsk_pool;
struct work_struct recover_work; struct work_struct recover_work;
@ -729,12 +729,13 @@ struct mlx5e_hv_vhca_stats_agent {
#endif #endif
struct mlx5e_xsk { struct mlx5e_xsk {
/* UMEMs are stored separately from channels, because we don't want to /* XSK buffer pools are stored separately from channels,
* lose them when channels are recreated. The kernel also stores UMEMs, * because we don't want to lose them when channels are
* but it doesn't distinguish between zero-copy and non-zero-copy UMEMs, * recreated. The kernel also stores buffer pool, but it doesn't
* so rely on our mechanism. * distinguish between zero-copy and non-zero-copy UMEMs, so
* rely on our mechanism.
*/ */
struct xdp_umem **umems; struct xsk_buff_pool **pools;
u16 refcnt; u16 refcnt;
bool ever_used; bool ever_used;
}; };
@ -893,7 +894,7 @@ struct mlx5e_xsk_param;
struct mlx5e_rq_param; struct mlx5e_rq_param;
int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params, int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk, struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
struct xdp_umem *umem, struct mlx5e_rq *rq); struct xsk_buff_pool *xsk_pool, struct mlx5e_rq *rq);
int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time); int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time);
void mlx5e_deactivate_rq(struct mlx5e_rq *rq); void mlx5e_deactivate_rq(struct mlx5e_rq *rq);
void mlx5e_close_rq(struct mlx5e_rq *rq); void mlx5e_close_rq(struct mlx5e_rq *rq);
@ -903,7 +904,7 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
struct mlx5e_sq_param *param, struct mlx5e_icosq *sq); struct mlx5e_sq_param *param, struct mlx5e_icosq *sq);
void mlx5e_close_icosq(struct mlx5e_icosq *sq); void mlx5e_close_icosq(struct mlx5e_icosq *sq);
int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
struct mlx5e_sq_param *param, struct xdp_umem *umem, struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,
struct mlx5e_xdpsq *sq, bool is_redirect); struct mlx5e_xdpsq *sq, bool is_redirect);
void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq); void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq);

View File

@ -445,7 +445,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
} while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq))); } while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));
if (xsk_frames) if (xsk_frames)
xsk_umem_complete_tx(sq->umem, xsk_frames); xsk_tx_completed(sq->xsk_pool, xsk_frames);
sq->stats->cqes += i; sq->stats->cqes += i;
@ -475,7 +475,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)
} }
if (xsk_frames) if (xsk_frames)
xsk_umem_complete_tx(sq->umem, xsk_frames); xsk_tx_completed(sq->xsk_pool, xsk_frames);
} }
int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
@ -563,4 +563,3 @@ void mlx5e_set_xmit_fp(struct mlx5e_xdpsq *sq, bool is_mpw)
sq->xmit_xdp_frame = is_mpw ? sq->xmit_xdp_frame = is_mpw ?
mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame; mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame;
} }

View File

@ -1,31 +1,31 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/* Copyright (c) 2019 Mellanox Technologies. */ /* Copyright (c) 2019-2020, Mellanox Technologies inc. All rights reserved. */
#include <net/xdp_sock_drv.h> #include <net/xdp_sock_drv.h>
#include "umem.h" #include "pool.h"
#include "setup.h" #include "setup.h"
#include "en/params.h" #include "en/params.h"
static int mlx5e_xsk_map_umem(struct mlx5e_priv *priv, static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv,
struct xdp_umem *umem) struct xsk_buff_pool *pool)
{ {
struct device *dev = priv->mdev->device; struct device *dev = priv->mdev->device;
return xsk_buff_dma_map(umem, dev, 0); return xsk_pool_dma_map(pool, dev, 0);
} }
static void mlx5e_xsk_unmap_umem(struct mlx5e_priv *priv, static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv,
struct xdp_umem *umem) struct xsk_buff_pool *pool)
{ {
return xsk_buff_dma_unmap(umem, 0); return xsk_pool_dma_unmap(pool, 0);
} }
static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk) static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk)
{ {
if (!xsk->umems) { if (!xsk->pools) {
xsk->umems = kcalloc(MLX5E_MAX_NUM_CHANNELS, xsk->pools = kcalloc(MLX5E_MAX_NUM_CHANNELS,
sizeof(*xsk->umems), GFP_KERNEL); sizeof(*xsk->pools), GFP_KERNEL);
if (unlikely(!xsk->umems)) if (unlikely(!xsk->pools))
return -ENOMEM; return -ENOMEM;
} }
@ -35,68 +35,68 @@ static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk)
return 0; return 0;
} }
static void mlx5e_xsk_put_umems(struct mlx5e_xsk *xsk) static void mlx5e_xsk_put_pools(struct mlx5e_xsk *xsk)
{ {
if (!--xsk->refcnt) { if (!--xsk->refcnt) {
kfree(xsk->umems); kfree(xsk->pools);
xsk->umems = NULL; xsk->pools = NULL;
} }
} }
static int mlx5e_xsk_add_umem(struct mlx5e_xsk *xsk, struct xdp_umem *umem, u16 ix) static int mlx5e_xsk_add_pool(struct mlx5e_xsk *xsk, struct xsk_buff_pool *pool, u16 ix)
{ {
int err; int err;
err = mlx5e_xsk_get_umems(xsk); err = mlx5e_xsk_get_pools(xsk);
if (unlikely(err)) if (unlikely(err))
return err; return err;
xsk->umems[ix] = umem; xsk->pools[ix] = pool;
return 0; return 0;
} }
static void mlx5e_xsk_remove_umem(struct mlx5e_xsk *xsk, u16 ix) static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix)
{ {
xsk->umems[ix] = NULL; xsk->pools[ix] = NULL;
mlx5e_xsk_put_umems(xsk); mlx5e_xsk_put_pools(xsk);
} }
static bool mlx5e_xsk_is_umem_sane(struct xdp_umem *umem) static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool)
{ {
return xsk_umem_get_headroom(umem) <= 0xffff && return xsk_pool_get_headroom(pool) <= 0xffff &&
xsk_umem_get_chunk_size(umem) <= 0xffff; xsk_pool_get_chunk_size(pool) <= 0xffff;
} }
void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk) void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk)
{ {
xsk->headroom = xsk_umem_get_headroom(umem); xsk->headroom = xsk_pool_get_headroom(pool);
xsk->chunk_size = xsk_umem_get_chunk_size(umem); xsk->chunk_size = xsk_pool_get_chunk_size(pool);
} }
static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv, static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
struct xdp_umem *umem, u16 ix) struct xsk_buff_pool *pool, u16 ix)
{ {
struct mlx5e_params *params = &priv->channels.params; struct mlx5e_params *params = &priv->channels.params;
struct mlx5e_xsk_param xsk; struct mlx5e_xsk_param xsk;
struct mlx5e_channel *c; struct mlx5e_channel *c;
int err; int err;
if (unlikely(mlx5e_xsk_get_umem(&priv->channels.params, &priv->xsk, ix))) if (unlikely(mlx5e_xsk_get_pool(&priv->channels.params, &priv->xsk, ix)))
return -EBUSY; return -EBUSY;
if (unlikely(!mlx5e_xsk_is_umem_sane(umem))) if (unlikely(!mlx5e_xsk_is_pool_sane(pool)))
return -EINVAL; return -EINVAL;
err = mlx5e_xsk_map_umem(priv, umem); err = mlx5e_xsk_map_pool(priv, pool);
if (unlikely(err)) if (unlikely(err))
return err; return err;
err = mlx5e_xsk_add_umem(&priv->xsk, umem, ix); err = mlx5e_xsk_add_pool(&priv->xsk, pool, ix);
if (unlikely(err)) if (unlikely(err))
goto err_unmap_umem; goto err_unmap_pool;
mlx5e_build_xsk_param(umem, &xsk); mlx5e_build_xsk_param(pool, &xsk);
if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) { if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
/* XSK objects will be created on open. */ /* XSK objects will be created on open. */
@ -112,9 +112,9 @@ static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
c = priv->channels.c[ix]; c = priv->channels.c[ix];
err = mlx5e_open_xsk(priv, params, &xsk, umem, c); err = mlx5e_open_xsk(priv, params, &xsk, pool, c);
if (unlikely(err)) if (unlikely(err))
goto err_remove_umem; goto err_remove_pool;
mlx5e_activate_xsk(c); mlx5e_activate_xsk(c);
@ -132,11 +132,11 @@ err_deactivate:
mlx5e_deactivate_xsk(c); mlx5e_deactivate_xsk(c);
mlx5e_close_xsk(c); mlx5e_close_xsk(c);
err_remove_umem: err_remove_pool:
mlx5e_xsk_remove_umem(&priv->xsk, ix); mlx5e_xsk_remove_pool(&priv->xsk, ix);
err_unmap_umem: err_unmap_pool:
mlx5e_xsk_unmap_umem(priv, umem); mlx5e_xsk_unmap_pool(priv, pool);
return err; return err;
@ -146,7 +146,7 @@ validate_closed:
*/ */
if (!mlx5e_validate_xsk_param(params, &xsk, priv->mdev)) { if (!mlx5e_validate_xsk_param(params, &xsk, priv->mdev)) {
err = -EINVAL; err = -EINVAL;
goto err_remove_umem; goto err_remove_pool;
} }
return 0; return 0;
@ -154,45 +154,45 @@ validate_closed:
static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix) static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix)
{ {
struct xdp_umem *umem = mlx5e_xsk_get_umem(&priv->channels.params, struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&priv->channels.params,
&priv->xsk, ix); &priv->xsk, ix);
struct mlx5e_channel *c; struct mlx5e_channel *c;
if (unlikely(!umem)) if (unlikely(!pool))
return -EINVAL; return -EINVAL;
if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
goto remove_umem; goto remove_pool;
/* XSK RQ and SQ are only created if XDP program is set. */ /* XSK RQ and SQ are only created if XDP program is set. */
if (!priv->channels.params.xdp_prog) if (!priv->channels.params.xdp_prog)
goto remove_umem; goto remove_pool;
c = priv->channels.c[ix]; c = priv->channels.c[ix];
mlx5e_xsk_redirect_rqt_to_drop(priv, ix); mlx5e_xsk_redirect_rqt_to_drop(priv, ix);
mlx5e_deactivate_xsk(c); mlx5e_deactivate_xsk(c);
mlx5e_close_xsk(c); mlx5e_close_xsk(c);
remove_umem: remove_pool:
mlx5e_xsk_remove_umem(&priv->xsk, ix); mlx5e_xsk_remove_pool(&priv->xsk, ix);
mlx5e_xsk_unmap_umem(priv, umem); mlx5e_xsk_unmap_pool(priv, pool);
return 0; return 0;
} }
static int mlx5e_xsk_enable_umem(struct mlx5e_priv *priv, struct xdp_umem *umem, static int mlx5e_xsk_enable_pool(struct mlx5e_priv *priv, struct xsk_buff_pool *pool,
u16 ix) u16 ix)
{ {
int err; int err;
mutex_lock(&priv->state_lock); mutex_lock(&priv->state_lock);
err = mlx5e_xsk_enable_locked(priv, umem, ix); err = mlx5e_xsk_enable_locked(priv, pool, ix);
mutex_unlock(&priv->state_lock); mutex_unlock(&priv->state_lock);
return err; return err;
} }
static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix) static int mlx5e_xsk_disable_pool(struct mlx5e_priv *priv, u16 ix)
{ {
int err; int err;
@ -203,7 +203,7 @@ static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix)
return err; return err;
} }
int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid) int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid)
{ {
struct mlx5e_priv *priv = netdev_priv(dev); struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5e_params *params = &priv->channels.params; struct mlx5e_params *params = &priv->channels.params;
@ -212,6 +212,6 @@ int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix))) if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
return -EINVAL; return -EINVAL;
return umem ? mlx5e_xsk_enable_umem(priv, umem, ix) : return pool ? mlx5e_xsk_enable_pool(priv, pool, ix) :
mlx5e_xsk_disable_umem(priv, ix); mlx5e_xsk_disable_pool(priv, ix);
} }

View File

@ -0,0 +1,27 @@
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2019-2020, Mellanox Technologies inc. All rights reserved. */
#ifndef __MLX5_EN_XSK_POOL_H__
#define __MLX5_EN_XSK_POOL_H__
#include "en.h"
static inline struct xsk_buff_pool *mlx5e_xsk_get_pool(struct mlx5e_params *params,
struct mlx5e_xsk *xsk, u16 ix)
{
if (!xsk || !xsk->pools)
return NULL;
if (unlikely(ix >= params->num_channels))
return NULL;
return xsk->pools[ix];
}
struct mlx5e_xsk_param;
void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk);
/* .ndo_bpf callback. */
int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid);
#endif /* __MLX5_EN_XSK_POOL_H__ */

View File

@ -48,7 +48,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
xdp->data_end = xdp->data + cqe_bcnt32; xdp->data_end = xdp->data + cqe_bcnt32;
xdp_set_data_meta_invalid(xdp); xdp_set_data_meta_invalid(xdp);
xsk_buff_dma_sync_for_cpu(xdp); xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
net_prefetch(xdp->data); net_prefetch(xdp->data);
rcu_read_lock(); rcu_read_lock();
@ -99,7 +99,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
xdp->data_end = xdp->data + cqe_bcnt; xdp->data_end = xdp->data + cqe_bcnt;
xdp_set_data_meta_invalid(xdp); xdp_set_data_meta_invalid(xdp);
xsk_buff_dma_sync_for_cpu(xdp); xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
net_prefetch(xdp->data); net_prefetch(xdp->data);
if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) { if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) {

View File

@ -19,10 +19,10 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
struct mlx5e_wqe_frag_info *wi, struct mlx5e_wqe_frag_info *wi,
u32 cqe_bcnt); u32 cqe_bcnt);
static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq, static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
struct mlx5e_dma_info *dma_info) struct mlx5e_dma_info *dma_info)
{ {
dma_info->xsk = xsk_buff_alloc(rq->umem); dma_info->xsk = xsk_buff_alloc(rq->xsk_pool);
if (!dma_info->xsk) if (!dma_info->xsk)
return -ENOMEM; return -ENOMEM;
@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err) static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err)
{ {
if (!xsk_umem_uses_need_wakeup(rq->umem)) if (!xsk_uses_need_wakeup(rq->xsk_pool))
return alloc_err; return alloc_err;
if (unlikely(alloc_err)) if (unlikely(alloc_err))
xsk_set_rx_need_wakeup(rq->umem); xsk_set_rx_need_wakeup(rq->xsk_pool);
else else
xsk_clear_rx_need_wakeup(rq->umem); xsk_clear_rx_need_wakeup(rq->xsk_pool);
return false; return false;
} }

View File

@ -45,7 +45,7 @@ static void mlx5e_build_xsk_cparam(struct mlx5e_priv *priv,
} }
int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params, int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
struct mlx5e_xsk_param *xsk, struct xdp_umem *umem, struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
struct mlx5e_channel *c) struct mlx5e_channel *c)
{ {
struct mlx5e_channel_param *cparam; struct mlx5e_channel_param *cparam;
@ -64,7 +64,7 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
if (unlikely(err)) if (unlikely(err))
goto err_free_cparam; goto err_free_cparam;
err = mlx5e_open_rq(c, params, &cparam->rq, xsk, umem, &c->xskrq); err = mlx5e_open_rq(c, params, &cparam->rq, xsk, pool, &c->xskrq);
if (unlikely(err)) if (unlikely(err))
goto err_close_rx_cq; goto err_close_rx_cq;
@ -72,13 +72,13 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
if (unlikely(err)) if (unlikely(err))
goto err_close_rq; goto err_close_rq;
/* Create a separate SQ, so that when the UMEM is disabled, we could /* Create a separate SQ, so that when the buff pool is disabled, we could
* close this SQ safely and stop receiving CQEs. In other case, e.g., if * close this SQ safely and stop receiving CQEs. In other case, e.g., if
* the XDPSQ was used instead, we might run into trouble when the UMEM * the XDPSQ was used instead, we might run into trouble when the buff pool
* is disabled and then reenabled, but the SQ continues receiving CQEs * is disabled and then reenabled, but the SQ continues receiving CQEs
* from the old UMEM. * from the old buff pool.
*/ */
err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, umem, &c->xsksq, true); err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, pool, &c->xsksq, true);
if (unlikely(err)) if (unlikely(err))
goto err_close_tx_cq; goto err_close_tx_cq;

View File

@ -12,7 +12,7 @@ bool mlx5e_validate_xsk_param(struct mlx5e_params *params,
struct mlx5e_xsk_param *xsk, struct mlx5e_xsk_param *xsk,
struct mlx5_core_dev *mdev); struct mlx5_core_dev *mdev);
int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params, int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
struct mlx5e_xsk_param *xsk, struct xdp_umem *umem, struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
struct mlx5e_channel *c); struct mlx5e_channel *c);
void mlx5e_close_xsk(struct mlx5e_channel *c); void mlx5e_close_xsk(struct mlx5e_channel *c);
void mlx5e_activate_xsk(struct mlx5e_channel *c); void mlx5e_activate_xsk(struct mlx5e_channel *c);

View File

@ -2,7 +2,7 @@
/* Copyright (c) 2019 Mellanox Technologies. */ /* Copyright (c) 2019 Mellanox Technologies. */
#include "tx.h" #include "tx.h"
#include "umem.h" #include "pool.h"
#include "en/xdp.h" #include "en/xdp.h"
#include "en/params.h" #include "en/params.h"
#include <net/xdp_sock_drv.h> #include <net/xdp_sock_drv.h>
@ -66,7 +66,7 @@ static void mlx5e_xsk_tx_post_err(struct mlx5e_xdpsq *sq,
bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
{ {
struct xdp_umem *umem = sq->umem; struct xsk_buff_pool *pool = sq->xsk_pool;
struct mlx5e_xdp_info xdpi; struct mlx5e_xdp_info xdpi;
struct mlx5e_xdp_xmit_data xdptxd; struct mlx5e_xdp_xmit_data xdptxd;
bool work_done = true; bool work_done = true;
@ -87,7 +87,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
break; break;
} }
if (!xsk_umem_consume_tx(umem, &desc)) { if (!xsk_tx_peek_desc(pool, &desc)) {
/* TX will get stuck until something wakes it up by /* TX will get stuck until something wakes it up by
* triggering NAPI. Currently it's expected that the * triggering NAPI. Currently it's expected that the
* application calls sendto() if there are consumed, but * application calls sendto() if there are consumed, but
@ -96,11 +96,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
break; break;
} }
xdptxd.dma_addr = xsk_buff_raw_get_dma(umem, desc.addr); xdptxd.dma_addr = xsk_buff_raw_get_dma(pool, desc.addr);
xdptxd.data = xsk_buff_raw_get_data(umem, desc.addr); xdptxd.data = xsk_buff_raw_get_data(pool, desc.addr);
xdptxd.len = desc.len; xdptxd.len = desc.len;
xsk_buff_raw_dma_sync_for_device(umem, xdptxd.dma_addr, xdptxd.len); xsk_buff_raw_dma_sync_for_device(pool, xdptxd.dma_addr, xdptxd.len);
ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe,
mlx5e_xmit_xdp_frame, sq, &xdptxd, &xdpi, check_result); mlx5e_xmit_xdp_frame, sq, &xdptxd, &xdpi, check_result);
@ -119,7 +119,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
mlx5e_xdp_mpwqe_complete(sq); mlx5e_xdp_mpwqe_complete(sq);
mlx5e_xmit_xdp_doorbell(sq); mlx5e_xmit_xdp_doorbell(sq);
xsk_umem_consume_tx_done(umem); xsk_tx_release(pool);
} }
return !(budget && work_done); return !(budget && work_done);

View File

@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget);
static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq) static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq)
{ {
if (!xsk_umem_uses_need_wakeup(sq->umem)) if (!xsk_uses_need_wakeup(sq->xsk_pool))
return; return;
if (sq->pc != sq->cc) if (sq->pc != sq->cc)
xsk_clear_tx_need_wakeup(sq->umem); xsk_clear_tx_need_wakeup(sq->xsk_pool);
else else
xsk_set_tx_need_wakeup(sq->umem); xsk_set_tx_need_wakeup(sq->xsk_pool);
} }
#endif /* __MLX5_EN_XSK_TX_H__ */ #endif /* __MLX5_EN_XSK_TX_H__ */

View File

@ -1,29 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2019 Mellanox Technologies. */
#ifndef __MLX5_EN_XSK_UMEM_H__
#define __MLX5_EN_XSK_UMEM_H__
#include "en.h"
static inline struct xdp_umem *mlx5e_xsk_get_umem(struct mlx5e_params *params,
struct mlx5e_xsk *xsk, u16 ix)
{
if (!xsk || !xsk->umems)
return NULL;
if (unlikely(ix >= params->num_channels))
return NULL;
return xsk->umems[ix];
}
struct mlx5e_xsk_param;
void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk);
/* .ndo_bpf callback. */
int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid);
int mlx5e_xsk_resize_reuseq(struct xdp_umem *umem, u32 nentries);
#endif /* __MLX5_EN_XSK_UMEM_H__ */

View File

@ -32,7 +32,7 @@
#include "en.h" #include "en.h"
#include "en/port.h" #include "en/port.h"
#include "en/xsk/umem.h" #include "en/xsk/pool.h"
#include "lib/clock.h" #include "lib/clock.h"
void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv, void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv,

View File

@ -33,7 +33,7 @@
#include <linux/mlx5/fs.h> #include <linux/mlx5/fs.h>
#include "en.h" #include "en.h"
#include "en/params.h" #include "en/params.h"
#include "en/xsk/umem.h" #include "en/xsk/pool.h"
struct mlx5e_ethtool_rule { struct mlx5e_ethtool_rule {
struct list_head list; struct list_head list;

View File

@ -57,7 +57,7 @@
#include "en/monitor_stats.h" #include "en/monitor_stats.h"
#include "en/health.h" #include "en/health.h"
#include "en/params.h" #include "en/params.h"
#include "en/xsk/umem.h" #include "en/xsk/pool.h"
#include "en/xsk/setup.h" #include "en/xsk/setup.h"
#include "en/xsk/rx.h" #include "en/xsk/rx.h"
#include "en/xsk/tx.h" #include "en/xsk/tx.h"
@ -363,7 +363,7 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work)
static int mlx5e_alloc_rq(struct mlx5e_channel *c, static int mlx5e_alloc_rq(struct mlx5e_channel *c,
struct mlx5e_params *params, struct mlx5e_params *params,
struct mlx5e_xsk_param *xsk, struct mlx5e_xsk_param *xsk,
struct xdp_umem *umem, struct xsk_buff_pool *xsk_pool,
struct mlx5e_rq_param *rqp, struct mlx5e_rq_param *rqp,
struct mlx5e_rq *rq) struct mlx5e_rq *rq)
{ {
@ -389,9 +389,9 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
rq->mdev = mdev; rq->mdev = mdev;
rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);
rq->xdpsq = &c->rq_xdpsq; rq->xdpsq = &c->rq_xdpsq;
rq->umem = umem; rq->xsk_pool = xsk_pool;
if (rq->umem) if (rq->xsk_pool)
rq->stats = &c->priv->channel_stats[c->ix].xskrq; rq->stats = &c->priv->channel_stats[c->ix].xskrq;
else else
rq->stats = &c->priv->channel_stats[c->ix].rq; rq->stats = &c->priv->channel_stats[c->ix].rq;
@ -477,7 +477,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
if (xsk) { if (xsk) {
err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
MEM_TYPE_XSK_BUFF_POOL, NULL); MEM_TYPE_XSK_BUFF_POOL, NULL);
xsk_buff_set_rxq_info(rq->umem, &rq->xdp_rxq); xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq);
} else { } else {
/* Create a page_pool and register it with rxq */ /* Create a page_pool and register it with rxq */
pp_params.order = 0; pp_params.order = 0;
@ -816,11 +816,11 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params, int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk, struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
struct xdp_umem *umem, struct mlx5e_rq *rq) struct xsk_buff_pool *xsk_pool, struct mlx5e_rq *rq)
{ {
int err; int err;
err = mlx5e_alloc_rq(c, params, xsk, umem, param, rq); err = mlx5e_alloc_rq(c, params, xsk, xsk_pool, param, rq);
if (err) if (err)
return err; return err;
@ -925,7 +925,7 @@ static int mlx5e_alloc_xdpsq_db(struct mlx5e_xdpsq *sq, int numa)
static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
struct mlx5e_params *params, struct mlx5e_params *params,
struct xdp_umem *umem, struct xsk_buff_pool *xsk_pool,
struct mlx5e_sq_param *param, struct mlx5e_sq_param *param,
struct mlx5e_xdpsq *sq, struct mlx5e_xdpsq *sq,
bool is_redirect) bool is_redirect)
@ -941,9 +941,9 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
sq->uar_map = mdev->mlx5e_res.bfreg.map; sq->uar_map = mdev->mlx5e_res.bfreg.map;
sq->min_inline_mode = params->tx_min_inline_mode; sq->min_inline_mode = params->tx_min_inline_mode;
sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);
sq->umem = umem; sq->xsk_pool = xsk_pool;
sq->stats = sq->umem ? sq->stats = sq->xsk_pool ?
&c->priv->channel_stats[c->ix].xsksq : &c->priv->channel_stats[c->ix].xsksq :
is_redirect ? is_redirect ?
&c->priv->channel_stats[c->ix].xdpsq : &c->priv->channel_stats[c->ix].xdpsq :
@ -1408,13 +1408,13 @@ void mlx5e_close_icosq(struct mlx5e_icosq *sq)
} }
int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params, int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
struct mlx5e_sq_param *param, struct xdp_umem *umem, struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,
struct mlx5e_xdpsq *sq, bool is_redirect) struct mlx5e_xdpsq *sq, bool is_redirect)
{ {
struct mlx5e_create_sq_param csp = {}; struct mlx5e_create_sq_param csp = {};
int err; int err;
err = mlx5e_alloc_xdpsq(c, params, umem, param, sq, is_redirect); err = mlx5e_alloc_xdpsq(c, params, xsk_pool, param, sq, is_redirect);
if (err) if (err)
return err; return err;
@ -1907,7 +1907,7 @@ static u8 mlx5e_enumerate_lag_port(struct mlx5_core_dev *mdev, int ix)
static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
struct mlx5e_params *params, struct mlx5e_params *params,
struct mlx5e_channel_param *cparam, struct mlx5e_channel_param *cparam,
struct xdp_umem *umem, struct xsk_buff_pool *xsk_pool,
struct mlx5e_channel **cp) struct mlx5e_channel **cp)
{ {
int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
@ -1946,9 +1946,9 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
if (unlikely(err)) if (unlikely(err))
goto err_napi_del; goto err_napi_del;
if (umem) { if (xsk_pool) {
mlx5e_build_xsk_param(umem, &xsk); mlx5e_build_xsk_param(xsk_pool, &xsk);
err = mlx5e_open_xsk(priv, params, &xsk, umem, c); err = mlx5e_open_xsk(priv, params, &xsk, xsk_pool, c);
if (unlikely(err)) if (unlikely(err))
goto err_close_queues; goto err_close_queues;
} }
@ -2309,12 +2309,12 @@ int mlx5e_open_channels(struct mlx5e_priv *priv,
mlx5e_build_channel_param(priv, &chs->params, cparam); mlx5e_build_channel_param(priv, &chs->params, cparam);
for (i = 0; i < chs->num; i++) { for (i = 0; i < chs->num; i++) {
struct xdp_umem *umem = NULL; struct xsk_buff_pool *xsk_pool = NULL;
if (chs->params.xdp_prog) if (chs->params.xdp_prog)
umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, i); xsk_pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, i);
err = mlx5e_open_channel(priv, i, &chs->params, cparam, umem, &chs->c[i]); err = mlx5e_open_channel(priv, i, &chs->params, cparam, xsk_pool, &chs->c[i]);
if (err) if (err)
goto err_close_channels; goto err_close_channels;
} }
@ -3892,13 +3892,14 @@ static bool mlx5e_xsk_validate_mtu(struct net_device *netdev,
u16 ix; u16 ix;
for (ix = 0; ix < chs->params.num_channels; ix++) { for (ix = 0; ix < chs->params.num_channels; ix++) {
struct xdp_umem *umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, ix); struct xsk_buff_pool *xsk_pool =
mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, ix);
struct mlx5e_xsk_param xsk; struct mlx5e_xsk_param xsk;
if (!umem) if (!xsk_pool)
continue; continue;
mlx5e_build_xsk_param(umem, &xsk); mlx5e_build_xsk_param(xsk_pool, &xsk);
if (!mlx5e_validate_xsk_param(new_params, &xsk, mdev)) { if (!mlx5e_validate_xsk_param(new_params, &xsk, mdev)) {
u32 hr = mlx5e_get_linear_rq_headroom(new_params, &xsk); u32 hr = mlx5e_get_linear_rq_headroom(new_params, &xsk);
@ -4423,8 +4424,8 @@ static int mlx5e_xdp(struct net_device *dev, struct netdev_bpf *xdp)
switch (xdp->command) { switch (xdp->command) {
case XDP_SETUP_PROG: case XDP_SETUP_PROG:
return mlx5e_xdp_set(dev, xdp->prog); return mlx5e_xdp_set(dev, xdp->prog);
case XDP_SETUP_XSK_UMEM: case XDP_SETUP_XSK_POOL:
return mlx5e_xsk_setup_umem(dev, xdp->xsk.umem, return mlx5e_xsk_setup_pool(dev, xdp->xsk.pool,
xdp->xsk.queue_id); xdp->xsk.queue_id);
default: default:
return -EINVAL; return -EINVAL;

View File

@ -280,8 +280,8 @@ static inline int mlx5e_page_alloc_pool(struct mlx5e_rq *rq,
static inline int mlx5e_page_alloc(struct mlx5e_rq *rq, static inline int mlx5e_page_alloc(struct mlx5e_rq *rq,
struct mlx5e_dma_info *dma_info) struct mlx5e_dma_info *dma_info)
{ {
if (rq->umem) if (rq->xsk_pool)
return mlx5e_xsk_page_alloc_umem(rq, dma_info); return mlx5e_xsk_page_alloc_pool(rq, dma_info);
else else
return mlx5e_page_alloc_pool(rq, dma_info); return mlx5e_page_alloc_pool(rq, dma_info);
} }
@ -312,7 +312,7 @@ static inline void mlx5e_page_release(struct mlx5e_rq *rq,
struct mlx5e_dma_info *dma_info, struct mlx5e_dma_info *dma_info,
bool recycle) bool recycle)
{ {
if (rq->umem) if (rq->xsk_pool)
/* The `recycle` parameter is ignored, and the page is always /* The `recycle` parameter is ignored, and the page is always
* put into the Reuse Ring, because there is no way to return * put into the Reuse Ring, because there is no way to return
* the page to the userspace when the interface goes down. * the page to the userspace when the interface goes down.
@ -399,14 +399,14 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk)
int err; int err;
int i; int i;
if (rq->umem) { if (rq->xsk_pool) {
int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags; int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags;
/* Check in advance that we have enough frames, instead of /* Check in advance that we have enough frames, instead of
* allocating one-by-one, failing and moving frames to the * allocating one-by-one, failing and moving frames to the
* Reuse Ring. * Reuse Ring.
*/ */
if (unlikely(!xsk_buff_can_alloc(rq->umem, pages_desired))) if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, pages_desired)))
return -ENOMEM; return -ENOMEM;
} }
@ -504,8 +504,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
/* Check in advance that we have enough frames, instead of allocating /* Check in advance that we have enough frames, instead of allocating
* one-by-one, failing and moving frames to the Reuse Ring. * one-by-one, failing and moving frames to the Reuse Ring.
*/ */
if (rq->umem && if (rq->xsk_pool &&
unlikely(!xsk_buff_can_alloc(rq->umem, MLX5_MPWRQ_PAGES_PER_WQE))) { unlikely(!xsk_buff_can_alloc(rq->xsk_pool, MLX5_MPWRQ_PAGES_PER_WQE))) {
err = -ENOMEM; err = -ENOMEM;
goto err; goto err;
} }
@ -753,7 +753,7 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
* the driver when it refills the Fill Ring. * the driver when it refills the Fill Ring.
* 2. Otherwise, busy poll by rescheduling the NAPI poll. * 2. Otherwise, busy poll by rescheduling the NAPI poll.
*/ */
if (unlikely(alloc_err == -ENOMEM && rq->umem)) if (unlikely(alloc_err == -ENOMEM && rq->xsk_pool))
return true; return true;
return false; return false;

View File

@ -219,24 +219,6 @@ struct veth {
__be16 h_vlan_TCI; __be16 h_vlan_TCI;
}; };
bool tun_is_xdp_frame(void *ptr)
{
return (unsigned long)ptr & TUN_XDP_FLAG;
}
EXPORT_SYMBOL(tun_is_xdp_frame);
void *tun_xdp_to_ptr(void *ptr)
{
return (void *)((unsigned long)ptr | TUN_XDP_FLAG);
}
EXPORT_SYMBOL(tun_xdp_to_ptr);
void *tun_ptr_to_xdp(void *ptr)
{
return (void *)((unsigned long)ptr & ~TUN_XDP_FLAG);
}
EXPORT_SYMBOL(tun_ptr_to_xdp);
static int tun_napi_receive(struct napi_struct *napi, int budget) static int tun_napi_receive(struct napi_struct *napi, int budget)
{ {
struct tun_file *tfile = container_of(napi, struct tun_file, napi); struct tun_file *tfile = container_of(napi, struct tun_file, napi);

View File

@ -234,14 +234,14 @@ static bool veth_is_xdp_frame(void *ptr)
return (unsigned long)ptr & VETH_XDP_FLAG; return (unsigned long)ptr & VETH_XDP_FLAG;
} }
static void *veth_ptr_to_xdp(void *ptr) static struct xdp_frame *veth_ptr_to_xdp(void *ptr)
{ {
return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG); return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG);
} }
static void *veth_xdp_to_ptr(void *ptr) static void *veth_xdp_to_ptr(struct xdp_frame *xdp)
{ {
return (void *)((unsigned long)ptr | VETH_XDP_FLAG); return (void *)((unsigned long)xdp | VETH_XDP_FLAG);
} }
static void veth_ptr_free(void *ptr) static void veth_ptr_free(void *ptr)
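
The veth change above only tightens the types of the tagging helpers; the underlying
idiom (bit 0 of a ptr_ring entry marks an xdp_frame, otherwise the entry is an
sk_buff) is unchanged. A minimal consumer of such a ring might look like the sketch
below; the ring reference and the two handlers are hypothetical names, not veth code.

        /* Illustrative consumer of tagged ptr_ring entries (names made up). */
        void *entry = __ptr_ring_consume(&rq->xdp_ring);

        if (!entry)
                return;

        if (veth_is_xdp_frame(entry)) {
                struct xdp_frame *frame = veth_ptr_to_xdp(entry);       /* now typed */

                handle_xdp_frame(frame);                /* hypothetical */
        } else {
                handle_skb((struct sk_buff *)entry);    /* hypothetical */
        }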

View File

@ -279,6 +279,31 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr) \ #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr) \
BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP6_RECVMSG, NULL) BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP6_RECVMSG, NULL)
/* The SOCK_OPS"_SK" macro should be used when sock_ops->sk is not a
* fullsock and its parent fullsock cannot be traced by
* sk_to_full_sk().
*
* e.g. sock_ops->sk is a request_sock and it is under syncookie mode.
* Its listener-sk is not attached to the rsk_listener.
* In this case, the caller holds the listener-sk (unlocked),
* set its sock_ops->sk to req_sk, and call this SOCK_OPS"_SK" with
* the listener-sk such that the cgroup-bpf-progs of the
* listener-sk will be run.
*
* Regardless of syncookie mode or not,
* calling bpf_setsockopt on listener-sk will not make sense anyway,
* so passing 'sock_ops->sk == req_sk' to the bpf prog is appropriate here.
*/
#define BPF_CGROUP_RUN_PROG_SOCK_OPS_SK(sock_ops, sk) \
({ \
int __ret = 0; \
if (cgroup_bpf_enabled) \
__ret = __cgroup_bpf_run_filter_sock_ops(sk, \
sock_ops, \
BPF_CGROUP_SOCK_OPS); \
__ret; \
})
#define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) \ #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) \
({ \ ({ \
int __ret = 0; \ int __ret = 0; \
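
The new _SK variant differs from BPF_CGROUP_RUN_PROG_SOCK_OPS only in that the socket
to run the cgroup programs against is passed explicitly instead of being derived from
sock_ops->sk. Following the comment above, a call site might look roughly like this;
the surrounding variables are illustrative and not taken from this series.

        /* Illustrative only: listener-sk supplied explicitly, sock_ops->sk
         * left pointing at the (non-fullsock) request_sock.
         */
        struct bpf_sock_ops_kern sock_ops = {};

        sock_ops.sk = req_to_sk(req);   /* request_sock under syncookies */
        BPF_CGROUP_RUN_PROG_SOCK_OPS_SK(&sock_ops, listener_sk);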

View File

@ -34,6 +34,8 @@ struct btf_type;
struct exception_table_entry; struct exception_table_entry;
struct seq_operations; struct seq_operations;
struct bpf_iter_aux_info; struct bpf_iter_aux_info;
struct bpf_local_storage;
struct bpf_local_storage_map;
extern struct idr btf_idr; extern struct idr btf_idr;
extern spinlock_t btf_idr_lock; extern spinlock_t btf_idr_lock;
@ -104,6 +106,25 @@ struct bpf_map_ops {
__poll_t (*map_poll)(struct bpf_map *map, struct file *filp, __poll_t (*map_poll)(struct bpf_map *map, struct file *filp,
struct poll_table_struct *pts); struct poll_table_struct *pts);
/* Functions called by bpf_local_storage maps */
int (*map_local_storage_charge)(struct bpf_local_storage_map *smap,
void *owner, u32 size);
void (*map_local_storage_uncharge)(struct bpf_local_storage_map *smap,
void *owner, u32 size);
struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner);
/* map_meta_equal must be implemented for maps that can be
* used as an inner map. It is a runtime check to ensure
* an inner map can be inserted to an outer map.
*
* Some properties of the inner map has been used during the
* verification time. When inserting an inner map at the runtime,
* map_meta_equal has to ensure the inserting map has the same
* properties that the verifier has used earlier.
*/
bool (*map_meta_equal)(const struct bpf_map *meta0,
const struct bpf_map *meta1);
/* BTF name and id of struct allocated by map_alloc */ /* BTF name and id of struct allocated by map_alloc */
const char * const map_btf_name; const char * const map_btf_name;
int *map_btf_id; int *map_btf_id;
@ -227,6 +248,9 @@ int map_check_no_btf(const struct bpf_map *map,
const struct btf_type *key_type, const struct btf_type *key_type,
const struct btf_type *value_type); const struct btf_type *value_type);
bool bpf_map_meta_equal(const struct bpf_map *meta0,
const struct bpf_map *meta1);
extern const struct bpf_map_ops bpf_map_offload_ops; extern const struct bpf_map_ops bpf_map_offload_ops;
/* function argument constraints */ /* function argument constraints */
@ -309,6 +333,7 @@ struct bpf_func_proto {
* for this argument. * for this argument.
*/ */
int *ret_btf_id; /* return value btf_id */ int *ret_btf_id; /* return value btf_id */
bool (*allowed)(const struct bpf_prog *prog);
}; };
/* bpf_context is intentionally undefined structure. Pointer to bpf_context is /* bpf_context is intentionally undefined structure. Pointer to bpf_context is
@ -514,6 +539,8 @@ int arch_prepare_bpf_trampoline(void *image, void *image_end,
/* these two functions are called from generated trampoline */ /* these two functions are called from generated trampoline */
u64 notrace __bpf_prog_enter(void); u64 notrace __bpf_prog_enter(void);
void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start); void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start);
void notrace __bpf_prog_enter_sleepable(void);
void notrace __bpf_prog_exit_sleepable(void);
struct bpf_ksym { struct bpf_ksym {
unsigned long start; unsigned long start;
@ -709,6 +736,7 @@ struct bpf_prog_aux {
bool offload_requested; bool offload_requested;
bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */ bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
bool func_proto_unreliable; bool func_proto_unreliable;
bool sleepable;
enum bpf_tramp_prog_type trampoline_prog_type; enum bpf_tramp_prog_type trampoline_prog_type;
struct bpf_trampoline *trampoline; struct bpf_trampoline *trampoline;
struct hlist_node tramp_hlist; struct hlist_node tramp_hlist;
@ -1218,12 +1246,18 @@ typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
union bpf_iter_link_info *linfo, union bpf_iter_link_info *linfo,
struct bpf_iter_aux_info *aux); struct bpf_iter_aux_info *aux);
typedef void (*bpf_iter_detach_target_t)(struct bpf_iter_aux_info *aux); typedef void (*bpf_iter_detach_target_t)(struct bpf_iter_aux_info *aux);
typedef void (*bpf_iter_show_fdinfo_t) (const struct bpf_iter_aux_info *aux,
struct seq_file *seq);
typedef int (*bpf_iter_fill_link_info_t)(const struct bpf_iter_aux_info *aux,
struct bpf_link_info *info);
#define BPF_ITER_CTX_ARG_MAX 2 #define BPF_ITER_CTX_ARG_MAX 2
struct bpf_iter_reg { struct bpf_iter_reg {
const char *target; const char *target;
bpf_iter_attach_target_t attach_target; bpf_iter_attach_target_t attach_target;
bpf_iter_detach_target_t detach_target; bpf_iter_detach_target_t detach_target;
bpf_iter_show_fdinfo_t show_fdinfo;
bpf_iter_fill_link_info_t fill_link_info;
u32 ctx_arg_info_size; u32 ctx_arg_info_size;
struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX]; struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
const struct bpf_iter_seq_info *seq_info; const struct bpf_iter_seq_info *seq_info;
@ -1250,6 +1284,10 @@ int bpf_iter_new_fd(struct bpf_link *link);
bool bpf_link_is_iter(struct bpf_link *link); bool bpf_link_is_iter(struct bpf_link *link);
struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop); struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop);
int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx); int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx);
void bpf_iter_map_show_fdinfo(const struct bpf_iter_aux_info *aux,
struct seq_file *seq);
int bpf_iter_map_fill_link_info(const struct bpf_iter_aux_info *aux,
struct bpf_link_info *info);
int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value); int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value); int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
@ -1340,6 +1378,8 @@ int btf_struct_access(struct bpf_verifier_log *log,
const struct btf_type *t, int off, int size, const struct btf_type *t, int off, int size,
enum bpf_access_type atype, enum bpf_access_type atype,
u32 *next_btf_id); u32 *next_btf_id);
bool btf_struct_ids_match(struct bpf_verifier_log *log,
int off, u32 id, u32 need_type_id);
int btf_resolve_helper_id(struct bpf_verifier_log *log, int btf_resolve_helper_id(struct bpf_verifier_log *log,
const struct bpf_func_proto *fn, int); const struct bpf_func_proto *fn, int);
@ -1358,6 +1398,7 @@ int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
struct btf *btf, const struct btf_type *t); struct btf *btf, const struct btf_type *t);
struct bpf_prog *bpf_prog_by_id(u32 id); struct bpf_prog *bpf_prog_by_id(u32 id);
struct bpf_link *bpf_link_by_id(u32 id);
const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id); const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
#else /* !CONFIG_BPF_SYSCALL */ #else /* !CONFIG_BPF_SYSCALL */
@ -1637,6 +1678,7 @@ int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
struct bpf_prog *old, u32 which); struct bpf_prog *old, u32 which);
int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog); int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog);
int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype); int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
void sock_map_unhash(struct sock *sk); void sock_map_unhash(struct sock *sk);
void sock_map_close(struct sock *sk, long timeout); void sock_map_close(struct sock *sk, long timeout);
#else #else
@ -1658,6 +1700,12 @@ static inline int sock_map_prog_detach(const union bpf_attr *attr,
{ {
return -EOPNOTSUPP; return -EOPNOTSUPP;
} }
static inline int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value,
u64 flags)
{
return -EOPNOTSUPP;
}
#endif /* CONFIG_BPF_STREAM_PARSER */ #endif /* CONFIG_BPF_STREAM_PARSER */
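
For orientation, sock_map_update_elem_sys() above is the syscall-side update path; a plain bpf_map_update_elem() on a sockmap from user space ends up here, with the value being a socket fd. A minimal user-space sketch (illustrative only, assuming libbpf):

/* Illustrative sketch: insert an established socket into a sockmap by fd. */
#include <bpf/bpf.h>
#include <linux/bpf.h>

static int add_sock_to_map(int map_fd, __u32 key, int sock_fd)
{
        __u64 value = sock_fd; /* sockmap values are socket fds */

        return bpf_map_update_elem(map_fd, &key, &value, BPF_NOEXIST);
}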
#if defined(CONFIG_INET) && defined(CONFIG_BPF_SYSCALL) #if defined(CONFIG_INET) && defined(CONFIG_BPF_SYSCALL)
@ -1736,6 +1784,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto;
extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto;
extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto;
extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto;
extern const struct bpf_func_proto bpf_copy_from_user_proto;
const struct bpf_func_proto *bpf_tracing_func_proto( const struct bpf_func_proto *bpf_tracing_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog); enum bpf_func_id func_id, const struct bpf_prog *prog);
@ -1850,4 +1899,7 @@ enum bpf_text_poke_type {
int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
void *addr1, void *addr2); void *addr1, void *addr2);
struct btf_id_set;
bool btf_id_set_contains(struct btf_id_set *set, u32 id);
#endif /* _LINUX_BPF_H */ #endif /* _LINUX_BPF_H */
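
The new show_fdinfo/fill_link_info callbacks let an iterator target report target-specific details (here, the map id) through fdinfo and bpf_link_info. A hypothetical registration sketch; only the two bpf_iter_map_* helpers are real exports from this header, the other names are placeholders:

/* Sketch: an iterator target wiring up the new fdinfo/link_info hooks.
 * my_iter_attach_map/my_iter_detach_map/my_iter_seq_info are assumed to
 * exist elsewhere and are named purely for illustration.
 */
static const struct bpf_iter_reg my_map_elem_reg_info = {
        .target         = "my_map_elem",
        .attach_target  = my_iter_attach_map,
        .detach_target  = my_iter_detach_map,
        .show_fdinfo    = bpf_iter_map_show_fdinfo,
        .fill_link_info = bpf_iter_map_fill_link_info,
        .seq_info       = &my_iter_seq_info,
};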


@ -0,0 +1,163 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (c) 2019 Facebook
* Copyright 2020 Google LLC.
*/
#ifndef _BPF_LOCAL_STORAGE_H
#define _BPF_LOCAL_STORAGE_H
#include <linux/bpf.h>
#include <linux/rculist.h>
#include <linux/list.h>
#include <linux/hash.h>
#include <linux/types.h>
#include <uapi/linux/btf.h>
#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
struct bpf_local_storage_map_bucket {
struct hlist_head list;
raw_spinlock_t lock;
};
/* The map is not the primary owner of a bpf_local_storage_elem.
* Instead, the container object (e.g. sk->sk_bpf_storage) is.
*
* The map (bpf_local_storage_map) serves two purposes:
* 1. Define the size of the "local storage". It is
* the map's value_size.
*
* 2. Maintain a list to keep track of all elems such
* that they can be cleaned up during the map destruction.
*
* When a bpf local storage is being looked up for a
* particular object, the "bpf_map" pointer is actually used
* as the "key" to search the list of elems in
* the respective bpf_local_storage owned by the object.
*
* e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer
* as the searching key.
*/
struct bpf_local_storage_map {
struct bpf_map map;
/* Looking up an elem does not require accessing the map.
*
* Updating/deleting requires a bucket lock to
* link/unlink the elem from the map. There are
* multiple buckets to reduce contention.
*/
struct bpf_local_storage_map_bucket *buckets;
u32 bucket_log;
u16 elem_size;
u16 cache_idx;
};
struct bpf_local_storage_data {
/* smap is used as the searching key when looking up
* from the object's bpf_local_storage.
*
* Put it in the same cacheline as the data to minimize
* the number of cacheline accesses in the cache-hit case.
*/
struct bpf_local_storage_map __rcu *smap;
u8 data[] __aligned(8);
};
/* Linked to bpf_local_storage and bpf_local_storage_map */
struct bpf_local_storage_elem {
struct hlist_node map_node; /* Linked to bpf_local_storage_map */
struct hlist_node snode; /* Linked to bpf_local_storage */
struct bpf_local_storage __rcu *local_storage;
struct rcu_head rcu;
/* 8 bytes hole */
/* The data is stored in another cacheline to minimize
* the number of cacheline accesses during a cache hit.
*/
struct bpf_local_storage_data sdata ____cacheline_aligned;
};
struct bpf_local_storage {
struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
struct hlist_head list; /* List of bpf_local_storage_elem */
void *owner; /* The object that owns the above "list" of
* bpf_local_storage_elem.
*/
struct rcu_head rcu;
raw_spinlock_t lock; /* Protect adding/removing from the "list" */
};
/* U16_MAX is much more than enough for sk local storage
* considering a tcp_sock is ~2k.
*/
#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE \
min_t(u32, \
(KMALLOC_MAX_SIZE - MAX_BPF_STACK - \
sizeof(struct bpf_local_storage_elem)), \
(U16_MAX - sizeof(struct bpf_local_storage_elem)))
#define SELEM(_SDATA) \
container_of((_SDATA), struct bpf_local_storage_elem, sdata)
#define SDATA(_SELEM) (&(_SELEM)->sdata)
#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
struct bpf_local_storage_cache {
spinlock_t idx_lock;
u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
};
#define DEFINE_BPF_STORAGE_CACHE(name) \
static struct bpf_local_storage_cache name = { \
.idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock), \
}
u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
u16 idx);
/* Helper functions for bpf_local_storage */
int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
struct bpf_local_storage_data *
bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
struct bpf_local_storage_map *smap,
bool cacheit_lockit);
void bpf_local_storage_map_free(struct bpf_local_storage_map *smap);
int bpf_local_storage_map_check_btf(const struct bpf_map *map,
const struct btf *btf,
const struct btf_type *key_type,
const struct btf_type *value_type);
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem);
bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem,
bool uncharge_omem);
void bpf_selem_unlink(struct bpf_local_storage_elem *selem);
void bpf_selem_link_map(struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *selem);
void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
bool charge_mem);
int
bpf_local_storage_alloc(void *owner,
struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *first_selem);
struct bpf_local_storage_data *
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
void *value, u64 map_flags);
#endif /* _BPF_LOCAL_STORAGE_H */
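
For program authors, the same infrastructure surfaces as local-storage map types. A hedged BPF-side sketch of an inode local-storage map used from an LSM program (section names follow libbpf conventions; the hook and value layout are illustrative):

/* Sketch: count unlinks per inode in a BPF_MAP_TYPE_INODE_STORAGE map. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
        __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, __u64);
} unlink_cnt SEC(".maps");

SEC("lsm/inode_unlink")
int BPF_PROG(count_unlink, struct inode *dir, struct dentry *victim)
{
        __u64 *cnt;

        cnt = bpf_inode_storage_get(&unlink_cnt, victim->d_inode, 0,
                                    BPF_LOCAL_STORAGE_GET_F_CREATE);
        if (cnt)
                __sync_fetch_and_add(cnt, 1);
        return 0;
}

char _license[] SEC("license") = "GPL";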


@ -17,9 +17,28 @@
#include <linux/lsm_hook_defs.h> #include <linux/lsm_hook_defs.h>
#undef LSM_HOOK #undef LSM_HOOK
struct bpf_storage_blob {
struct bpf_local_storage __rcu *storage;
};
extern struct lsm_blob_sizes bpf_lsm_blob_sizes;
int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
const struct bpf_prog *prog); const struct bpf_prog *prog);
static inline struct bpf_storage_blob *bpf_inode(
const struct inode *inode)
{
if (unlikely(!inode->i_security))
return NULL;
return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
}
extern const struct bpf_func_proto bpf_inode_storage_get_proto;
extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
void bpf_inode_storage_free(struct inode *inode);
#else /* !CONFIG_BPF_LSM */ #else /* !CONFIG_BPF_LSM */
static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
@ -28,6 +47,16 @@ static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
return -EOPNOTSUPP; return -EOPNOTSUPP;
} }
static inline struct bpf_storage_blob *bpf_inode(
const struct inode *inode)
{
return NULL;
}
static inline void bpf_inode_storage_free(struct inode *inode)
{
}
#endif /* CONFIG_BPF_LSM */ #endif /* CONFIG_BPF_LSM */
#endif /* _LINUX_BPF_LSM_H */ #endif /* _LINUX_BPF_LSM_H */
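
On the kernel side, the inode storage map implementation is expected to reach the per-inode storage through the blob accessor above; roughly (a simplified sketch, not the exact code from this series):

/* Sketch: resolve the bpf_local_storage pointer hanging off an inode via
 * the LSM blob; NULL when no security blob was allocated for the inode.
 */
static struct bpf_local_storage __rcu **inode_storage_ptr(struct inode *inode)
{
        struct bpf_storage_blob *bsb = bpf_inode(inode);

        return bsb ? &bsb->storage : NULL;
}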


@ -107,6 +107,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
#endif #endif
#ifdef CONFIG_BPF_LSM
BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
#if defined(CONFIG_XDP_SOCKETS) #if defined(CONFIG_XDP_SOCKETS)
BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)


@ -64,8 +64,7 @@ const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf,
u32 id, u32 *res_id); u32 id, u32 *res_id);
const struct btf_type * const struct btf_type *
btf_resolve_size(const struct btf *btf, const struct btf_type *type, btf_resolve_size(const struct btf *btf, const struct btf_type *type,
u32 *type_size, const struct btf_type **elem_type, u32 *type_size);
u32 *total_nelems);
#define for_each_member(i, struct_type, member) \ #define for_each_member(i, struct_type, member) \
for (i = 0, member = btf_type_member(struct_type); \ for (i = 0, member = btf_type_member(struct_type); \


@ -3,6 +3,11 @@
#ifndef _LINUX_BTF_IDS_H #ifndef _LINUX_BTF_IDS_H
#define _LINUX_BTF_IDS_H #define _LINUX_BTF_IDS_H
struct btf_id_set {
u32 cnt;
u32 ids[];
};
#ifdef CONFIG_DEBUG_INFO_BTF #ifdef CONFIG_DEBUG_INFO_BTF
#include <linux/compiler.h> /* for __PASTE */ #include <linux/compiler.h> /* for __PASTE */
@ -62,7 +67,7 @@ asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ ".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
"." #scope " " #name "; \n" \ "." #scope " " #name "; \n" \
#name ":; \n" \ #name ":; \n" \
".popsection; \n"); \ ".popsection; \n");
#define BTF_ID_LIST(name) \ #define BTF_ID_LIST(name) \
__BTF_ID_LIST(name, local) \ __BTF_ID_LIST(name, local) \
@ -88,12 +93,56 @@ asm( \
".zero 4 \n" \ ".zero 4 \n" \
".popsection; \n"); ".popsection; \n");
/*
* The BTF_SET_START/END macro pair defines a sorted list of
* BTF IDs plus its member count, with the following layout:
*
* BTF_SET_START(list)
* BTF_ID(type1, name1)
* BTF_ID(type2, name2)
* BTF_SET_END(list)
*
* __BTF_ID__set__list:
* .zero 4
* list:
* __BTF_ID__type1__name1__3:
* .zero 4
* __BTF_ID__type2__name2__4:
* .zero 4
*
*/
#define __BTF_SET_START(name, scope) \
asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
"." #scope " __BTF_ID__set__" #name "; \n" \
"__BTF_ID__set__" #name ":; \n" \
".zero 4 \n" \
".popsection; \n");
#define BTF_SET_START(name) \
__BTF_ID_LIST(name, local) \
__BTF_SET_START(name, local)
#define BTF_SET_START_GLOBAL(name) \
__BTF_ID_LIST(name, globl) \
__BTF_SET_START(name, globl)
#define BTF_SET_END(name) \
asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
".size __BTF_ID__set__" #name ", .-" #name " \n" \
".popsection; \n"); \
extern struct btf_id_set name;
#else #else
#define BTF_ID_LIST(name) static u32 name[5]; #define BTF_ID_LIST(name) static u32 name[5];
#define BTF_ID(prefix, name) #define BTF_ID(prefix, name)
#define BTF_ID_UNUSED #define BTF_ID_UNUSED
#define BTF_ID_LIST_GLOBAL(name) u32 name[1]; #define BTF_ID_LIST_GLOBAL(name) u32 name[1];
#define BTF_SET_START(name) static struct btf_id_set name = { 0 };
#define BTF_SET_START_GLOBAL(name) static struct btf_id_set name = { 0 };
#define BTF_SET_END(name)
#endif /* CONFIG_DEBUG_INFO_BTF */ #endif /* CONFIG_DEBUG_INFO_BTF */
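
Usage sketch for the set macros and btf_id_set_contains() declared earlier (the set name mirrors how this series gates bpf_d_path(), but treat the exact members as illustrative):

/* Example: a sorted set of functions whose 'struct path *' argument is
 * considered safe for bpf_d_path(), queried by BTF ID at verification.
 */
BTF_SET_START(btf_allowlist_d_path)
BTF_ID(func, vfs_truncate)
BTF_ID(func, vfs_fallocate)
BTF_ID(func, dentry_open)
BTF_SET_END(btf_allowlist_d_path)

static bool allow_d_path(u32 attach_btf_id)
{
        return btf_id_set_contains(&btf_allowlist_d_path, attach_btf_id);
}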


@ -1236,13 +1236,17 @@ struct bpf_sock_addr_kern {
struct bpf_sock_ops_kern { struct bpf_sock_ops_kern {
struct sock *sk; struct sock *sk;
u32 op;
union { union {
u32 args[4]; u32 args[4];
u32 reply; u32 reply;
u32 replylong[4]; u32 replylong[4];
}; };
u32 is_fullsock; struct sk_buff *syn_skb;
struct sk_buff *skb;
void *skb_data_end;
u8 op;
u8 is_fullsock;
u8 remaining_opt_len;
u64 temp; /* temp and everything after is not u64 temp; /* temp and everything after is not
* initialized to 0 before calling * initialized to 0 before calling
* the BPF program. New fields that * the BPF program. New fields that


@ -27,9 +27,18 @@ struct tun_xdp_hdr {
#if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE) #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
struct socket *tun_get_socket(struct file *); struct socket *tun_get_socket(struct file *);
struct ptr_ring *tun_get_tx_ring(struct file *file); struct ptr_ring *tun_get_tx_ring(struct file *file);
bool tun_is_xdp_frame(void *ptr); static inline bool tun_is_xdp_frame(void *ptr)
void *tun_xdp_to_ptr(void *ptr); {
void *tun_ptr_to_xdp(void *ptr); return (unsigned long)ptr & TUN_XDP_FLAG;
}
static inline void *tun_xdp_to_ptr(struct xdp_frame *xdp)
{
return (void *)((unsigned long)xdp | TUN_XDP_FLAG);
}
static inline struct xdp_frame *tun_ptr_to_xdp(void *ptr)
{
return (void *)((unsigned long)ptr & ~TUN_XDP_FLAG);
}
void tun_ptr_free(void *ptr); void tun_ptr_free(void *ptr);
#else #else
#include <linux/err.h> #include <linux/err.h>
@ -48,11 +57,11 @@ static inline bool tun_is_xdp_frame(void *ptr)
{ {
return false; return false;
} }
static inline void *tun_xdp_to_ptr(void *ptr) static inline void *tun_xdp_to_ptr(struct xdp_frame *xdp)
{ {
return NULL; return NULL;
} }
static inline void *tun_ptr_to_xdp(void *ptr) static inline struct xdp_frame *tun_ptr_to_xdp(void *ptr)
{ {
return NULL; return NULL;
} }
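
For context, a short sketch (ring handling elided) of how a consumer dispatches the tagged pointers these helpers now type-check:

/* Sketch: entries queued to the tun ptr_ring are either sk_buffs or
 * xdp_frames tagged with TUN_XDP_FLAG; a consumer dispatches like this.
 */
static void tun_consume_one(void *ptr)
{
        if (tun_is_xdp_frame(ptr)) {
                struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);

                xdp_return_frame(xdpf); /* or transmit it */
        } else {
                kfree_skb(ptr);         /* or transmit the skb */
        }
}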


@ -618,7 +618,7 @@ struct netdev_queue {
/* Subordinate device that the queue has been assigned to */ /* Subordinate device that the queue has been assigned to */
struct net_device *sb_dev; struct net_device *sb_dev;
#ifdef CONFIG_XDP_SOCKETS #ifdef CONFIG_XDP_SOCKETS
struct xdp_umem *umem; struct xsk_buff_pool *pool;
#endif #endif
/* /*
* write-mostly part * write-mostly part
@ -755,7 +755,7 @@ struct netdev_rx_queue {
struct net_device *dev; struct net_device *dev;
struct xdp_rxq_info xdp_rxq; struct xdp_rxq_info xdp_rxq;
#ifdef CONFIG_XDP_SOCKETS #ifdef CONFIG_XDP_SOCKETS
struct xdp_umem *umem; struct xsk_buff_pool *pool;
#endif #endif
} ____cacheline_aligned_in_smp; } ____cacheline_aligned_in_smp;
@ -883,7 +883,7 @@ enum bpf_netdev_command {
/* BPF program for offload callbacks, invoked at program load time. */ /* BPF program for offload callbacks, invoked at program load time. */
BPF_OFFLOAD_MAP_ALLOC, BPF_OFFLOAD_MAP_ALLOC,
BPF_OFFLOAD_MAP_FREE, BPF_OFFLOAD_MAP_FREE,
XDP_SETUP_XSK_UMEM, XDP_SETUP_XSK_POOL,
}; };
struct bpf_prog_offload_ops; struct bpf_prog_offload_ops;
@ -917,9 +917,9 @@ struct netdev_bpf {
struct { struct {
struct bpf_offloaded_map *offmap; struct bpf_offloaded_map *offmap;
}; };
/* XDP_SETUP_XSK_UMEM */ /* XDP_SETUP_XSK_POOL */
struct { struct {
struct xdp_umem *umem; struct xsk_buff_pool *pool;
u16 queue_id; u16 queue_id;
} xsk; } xsk;
}; };
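
Drivers with AF_XDP zero-copy support follow this rename in their ndo_bpf callback; a hedged sketch (the my_* handlers are placeholders):

/* Sketch: ndo_bpf dispatch after the XDP_SETUP_XSK_UMEM -> _POOL rename. */
static int my_ndo_bpf(struct net_device *dev, struct netdev_bpf *bpf)
{
        switch (bpf->command) {
        case XDP_SETUP_PROG:
                return my_xdp_setup_prog(dev, bpf->prog, bpf->extack);
        case XDP_SETUP_XSK_POOL:
                return my_xsk_pool_setup(dev, bpf->xsk.pool, bpf->xsk.queue_id);
        default:
                return -EINVAL;
        }
}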


@ -82,7 +82,14 @@ static inline void rcu_read_unlock_trace(void)
void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func); void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func);
void synchronize_rcu_tasks_trace(void); void synchronize_rcu_tasks_trace(void);
void rcu_barrier_tasks_trace(void); void rcu_barrier_tasks_trace(void);
#else
/*
* The BPF JIT forms these addresses even when it doesn't call these
* functions, so provide definitions that result in runtime errors.
*/
static inline void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func) { BUG(); }
static inline void rcu_read_lock_trace(void) { BUG(); }
static inline void rcu_read_unlock_trace(void) { BUG(); }
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */ #endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
#endif /* __LINUX_RCUPDATE_TRACE_H */ #endif /* __LINUX_RCUPDATE_TRACE_H */
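
These primitives are what bracket a sleepable BPF program invocation; a simplified sketch of the trampoline-side usage in this series:

/* Sketch: sleepable programs run under the tasks-trace RCU reader
 * section instead of a plain rcu_read_lock() section, so their helpers
 * (e.g. bpf_copy_from_user()) may fault and sleep.
 */
static u64 run_sleepable(const struct bpf_prog *prog, void *ctx)
{
        u64 ret;

        rcu_read_lock_trace();
        might_fault();
        ret = BPF_PROG_RUN(prog, ctx);
        rcu_read_unlock_trace();

        return ret;
}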


@ -340,23 +340,6 @@ static inline void sk_psock_update_proto(struct sock *sk,
struct sk_psock *psock, struct sk_psock *psock,
struct proto *ops) struct proto *ops)
{ {
/* Initialize saved callbacks and original proto only once, since this
* function may be called multiple times for a psock, e.g. when
* psock->progs.msg_parser is updated.
*
* Since we've not installed the new proto, psock is not yet in use and
* we can initialize it without synchronization.
*/
if (!psock->sk_proto) {
struct proto *orig = READ_ONCE(sk->sk_prot);
psock->saved_unhash = orig->unhash;
psock->saved_close = orig->close;
psock->saved_write_space = sk->sk_write_space;
psock->sk_proto = orig;
}
/* Pairs with lockless read in sk_clone_lock() */ /* Pairs with lockless read in sk_clone_lock() */
WRITE_ONCE(sk->sk_prot, ops); WRITE_ONCE(sk->sk_prot, ops);
} }


@ -92,6 +92,8 @@ struct tcp_options_received {
smc_ok : 1, /* SMC seen on SYN packet */ smc_ok : 1, /* SMC seen on SYN packet */
snd_wscale : 4, /* Window scaling received from sender */ snd_wscale : 4, /* Window scaling received from sender */
rcv_wscale : 4; /* Window scaling to send to receiver */ rcv_wscale : 4; /* Window scaling to send to receiver */
u8 saw_unknown:1, /* Received unknown option */
unused:7;
u8 num_sacks; /* Number of SACK blocks */ u8 num_sacks; /* Number of SACK blocks */
u16 user_mss; /* mss requested by user in ioctl */ u16 user_mss; /* mss requested by user in ioctl */
u16 mss_clamp; /* Maximal mss, negotiated at connection setup */ u16 mss_clamp; /* Maximal mss, negotiated at connection setup */
@ -237,14 +239,13 @@ struct tcp_sock {
repair : 1, repair : 1,
frto : 1;/* F-RTO (RFC5682) activated in CA_Loss */ frto : 1;/* F-RTO (RFC5682) activated in CA_Loss */
u8 repair_queue; u8 repair_queue;
u8 syn_data:1, /* SYN includes data */ u8 save_syn:2, /* Save headers of SYN packet */
syn_data:1, /* SYN includes data */
syn_fastopen:1, /* SYN includes Fast Open option */ syn_fastopen:1, /* SYN includes Fast Open option */
syn_fastopen_exp:1,/* SYN includes Fast Open exp. option */ syn_fastopen_exp:1,/* SYN includes Fast Open exp. option */
syn_fastopen_ch:1, /* Active TFO re-enabling probe */ syn_fastopen_ch:1, /* Active TFO re-enabling probe */
syn_data_acked:1,/* data in SYN is acked by SYN-ACK */ syn_data_acked:1,/* data in SYN is acked by SYN-ACK */
save_syn:1, /* Save headers of SYN packet */ is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */
is_cwnd_limited:1,/* forward progress limited by snd_cwnd? */
syn_smc:1; /* SYN includes SMC */
u32 tlp_high_seq; /* snd_nxt at the time of TLP */ u32 tlp_high_seq; /* snd_nxt at the time of TLP */
u32 tcp_tx_delay; /* delay (in usec) added to TX packets */ u32 tcp_tx_delay; /* delay (in usec) added to TX packets */
@ -391,6 +392,9 @@ struct tcp_sock {
#if IS_ENABLED(CONFIG_MPTCP) #if IS_ENABLED(CONFIG_MPTCP)
bool is_mptcp; bool is_mptcp;
#endif #endif
#if IS_ENABLED(CONFIG_SMC)
bool syn_smc; /* SYN includes SMC */
#endif
#ifdef CONFIG_TCP_MD5SIG #ifdef CONFIG_TCP_MD5SIG
/* TCP AF-Specific parts; only used by MD5 Signature support so far */ /* TCP AF-Specific parts; only used by MD5 Signature support so far */
@ -406,7 +410,7 @@ struct tcp_sock {
* socket. Used to retransmit SYNACKs etc. * socket. Used to retransmit SYNACKs etc.
*/ */
struct request_sock __rcu *fastopen_rsk; struct request_sock __rcu *fastopen_rsk;
u32 *saved_syn; struct saved_syn *saved_syn;
}; };
enum tsq_enum { enum tsq_enum {
@ -484,6 +488,12 @@ static inline void tcp_saved_syn_free(struct tcp_sock *tp)
tp->saved_syn = NULL; tp->saved_syn = NULL;
} }
static inline u32 tcp_saved_syn_len(const struct saved_syn *saved_syn)
{
return saved_syn->mac_hdrlen + saved_syn->network_hdrlen +
saved_syn->tcp_hdrlen;
}
struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk, struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk,
const struct sk_buff *orig_skb); const struct sk_buff *orig_skb);


@ -3,13 +3,27 @@
#ifndef _BPF_SK_STORAGE_H #ifndef _BPF_SK_STORAGE_H
#define _BPF_SK_STORAGE_H #define _BPF_SK_STORAGE_H
#include <linux/rculist.h>
#include <linux/list.h>
#include <linux/hash.h>
#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/bpf.h>
#include <net/sock.h>
#include <uapi/linux/sock_diag.h>
#include <uapi/linux/btf.h>
#include <linux/bpf_local_storage.h>
struct sock; struct sock;
void bpf_sk_storage_free(struct sock *sk); void bpf_sk_storage_free(struct sock *sk);
extern const struct bpf_func_proto bpf_sk_storage_get_proto; extern const struct bpf_func_proto bpf_sk_storage_get_proto;
extern const struct bpf_func_proto bpf_sk_storage_delete_proto; extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
extern const struct bpf_func_proto sk_storage_get_btf_proto;
extern const struct bpf_func_proto sk_storage_delete_btf_proto;
struct bpf_local_storage_elem;
struct bpf_sk_storage_diag; struct bpf_sk_storage_diag;
struct sk_buff; struct sk_buff;
struct nlattr; struct nlattr;


@ -86,6 +86,8 @@ struct inet_connection_sock {
struct timer_list icsk_retransmit_timer; struct timer_list icsk_retransmit_timer;
struct timer_list icsk_delack_timer; struct timer_list icsk_delack_timer;
__u32 icsk_rto; __u32 icsk_rto;
__u32 icsk_rto_min;
__u32 icsk_delack_max;
__u32 icsk_pmtu_cookie; __u32 icsk_pmtu_cookie;
const struct tcp_congestion_ops *icsk_ca_ops; const struct tcp_congestion_ops *icsk_ca_ops;
const struct inet_connection_sock_af_ops *icsk_af_ops; const struct inet_connection_sock_af_ops *icsk_af_ops;


@ -41,6 +41,13 @@ struct request_sock_ops {
int inet_rtx_syn_ack(const struct sock *parent, struct request_sock *req); int inet_rtx_syn_ack(const struct sock *parent, struct request_sock *req);
struct saved_syn {
u32 mac_hdrlen;
u32 network_hdrlen;
u32 tcp_hdrlen;
u8 data[];
};
/* struct request_sock - mini sock to represent a connection request /* struct request_sock - mini sock to represent a connection request
*/ */
struct request_sock { struct request_sock {
@ -60,7 +67,7 @@ struct request_sock {
struct timer_list rsk_timer; struct timer_list rsk_timer;
const struct request_sock_ops *rsk_ops; const struct request_sock_ops *rsk_ops;
struct sock *sk; struct sock *sk;
u32 *saved_syn; struct saved_syn *saved_syn;
u32 secid; u32 secid;
u32 peer_secid; u32 peer_secid;
}; };


@ -246,7 +246,7 @@ struct sock_common {
/* public: */ /* public: */
}; };
struct bpf_sk_storage; struct bpf_local_storage;
/** /**
* struct sock - network layer representation of sockets * struct sock - network layer representation of sockets
@ -517,7 +517,7 @@ struct sock {
void (*sk_destruct)(struct sock *sk); void (*sk_destruct)(struct sock *sk);
struct sock_reuseport __rcu *sk_reuseport_cb; struct sock_reuseport __rcu *sk_reuseport_cb;
#ifdef CONFIG_BPF_SYSCALL #ifdef CONFIG_BPF_SYSCALL
struct bpf_sk_storage __rcu *sk_bpf_storage; struct bpf_local_storage __rcu *sk_bpf_storage;
#endif #endif
struct rcu_head sk_rcu; struct rcu_head sk_rcu;
}; };


@ -394,7 +394,7 @@ void tcp_metrics_init(void);
bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst); bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst);
void tcp_close(struct sock *sk, long timeout); void tcp_close(struct sock *sk, long timeout);
void tcp_init_sock(struct sock *sk); void tcp_init_sock(struct sock *sk);
void tcp_init_transfer(struct sock *sk, int bpf_op); void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb);
__poll_t tcp_poll(struct file *file, struct socket *sock, __poll_t tcp_poll(struct file *file, struct socket *sock,
struct poll_table_struct *wait); struct poll_table_struct *wait);
int tcp_getsockopt(struct sock *sk, int level, int optname, int tcp_getsockopt(struct sock *sk, int level, int optname,
@ -455,7 +455,8 @@ enum tcp_synack_type {
struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
struct request_sock *req, struct request_sock *req,
struct tcp_fastopen_cookie *foc, struct tcp_fastopen_cookie *foc,
enum tcp_synack_type synack_type); enum tcp_synack_type synack_type,
struct sk_buff *syn_skb);
int tcp_disconnect(struct sock *sk, int flags); int tcp_disconnect(struct sock *sk, int flags);
void tcp_finish_connect(struct sock *sk, struct sk_buff *skb); void tcp_finish_connect(struct sock *sk, struct sk_buff *skb);
@ -699,7 +700,7 @@ static inline void tcp_fast_path_check(struct sock *sk)
static inline u32 tcp_rto_min(struct sock *sk) static inline u32 tcp_rto_min(struct sock *sk)
{ {
const struct dst_entry *dst = __sk_dst_get(sk); const struct dst_entry *dst = __sk_dst_get(sk);
u32 rto_min = TCP_RTO_MIN; u32 rto_min = inet_csk(sk)->icsk_rto_min;
if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN); rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
@ -2025,7 +2026,8 @@ struct tcp_request_sock_ops {
int (*send_synack)(const struct sock *sk, struct dst_entry *dst, int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
struct flowi *fl, struct request_sock *req, struct flowi *fl, struct request_sock *req,
struct tcp_fastopen_cookie *foc, struct tcp_fastopen_cookie *foc,
enum tcp_synack_type synack_type); enum tcp_synack_type synack_type,
struct sk_buff *syn_skb);
}; };
extern const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops; extern const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops;
@ -2223,6 +2225,55 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
struct msghdr *msg, int len, int flags); struct msghdr *msg, int len, int flags);
#endif /* CONFIG_NET_SOCK_MSG */ #endif /* CONFIG_NET_SOCK_MSG */
#ifdef CONFIG_CGROUP_BPF
/* Copy the listen sk's HDR_OPT_CB flags to its child.
*
* During the 3-way handshake, the synack is usually sent from
* the listen sk with the HDR_OPT_CB flags set so that
* the bpf-prog will be called to write the BPF hdr option.
*
* In fastopen, the child sk is used to send the synack instead
* of the listen sk. Thus, inheriting the HDR_OPT_CB flags
* from the listen sk gives the bpf-prog a chance to write the
* BPF hdr option in the synack pkt during fastopen.
*
* Both fastopen and non-fastopen children will inherit the
* HDR_OPT_CB flags, so the bpf-prog behaves consistently
* when deciding whether or not to clear these cb flags
* during the PASSIVE_ESTABLISHED_CB.
*
* In the future, other cb flags could be inherited here also.
*/
static inline void bpf_skops_init_child(const struct sock *sk,
struct sock *child)
{
tcp_sk(child)->bpf_sock_ops_cb_flags =
tcp_sk(sk)->bpf_sock_ops_cb_flags &
(BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG |
BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG |
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG);
}
static inline void bpf_skops_init_skb(struct bpf_sock_ops_kern *skops,
struct sk_buff *skb,
unsigned int end_offset)
{
skops->skb = skb;
skops->skb_data_end = skb->data + end_offset;
}
#else
static inline void bpf_skops_init_child(const struct sock *sk,
struct sock *child)
{
}
static inline void bpf_skops_init_skb(struct bpf_sock_ops_kern *skops,
struct sk_buff *skb,
unsigned int end_offset)
{
}
#endif
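
A hedged sketch of the call-site shape these helpers are meant for (the surrounding TCP plumbing is simplified):

/* Sketch: before running the PARSE_HDR_OPT sock_ops program on an
 * incoming segment, [skb_data, skb_data_end) is pointed at the TCP
 * header, including its options.
 */
static void run_parse_hdr_opt(struct sock *sk, struct sk_buff *skb)
{
        struct bpf_sock_ops_kern sock_ops;

        memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
        sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
        sock_ops.is_fullsock = 1;
        sock_ops.sk = sk;
        bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));

        /* ... run the cgroup sock_ops program on &sock_ops ... */
}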
/* Call BPF_SOCK_OPS program that returns an int. If the return value /* Call BPF_SOCK_OPS program that returns an int. If the return value
* is < 0, then the BPF op failed (for example if the loaded BPF * is < 0, then the BPF op failed (for example if the loaded BPF
* program does not support the chosen operation or there is no BPF * program does not support the chosen operation or there is no BPF


@ -18,25 +18,19 @@ struct xsk_queue;
struct xdp_buff; struct xdp_buff;
struct xdp_umem { struct xdp_umem {
struct xsk_queue *fq; void *addrs;
struct xsk_queue *cq;
struct xsk_buff_pool *pool;
u64 size; u64 size;
u32 headroom; u32 headroom;
u32 chunk_size; u32 chunk_size;
u32 chunks;
u32 npgs;
struct user_struct *user; struct user_struct *user;
refcount_t users; refcount_t users;
struct work_struct work;
struct page **pgs;
u32 npgs;
u16 queue_id;
u8 need_wakeup;
u8 flags; u8 flags;
int id;
struct net_device *dev;
bool zc; bool zc;
spinlock_t xsk_tx_list_lock; struct page **pgs;
struct list_head xsk_tx_list; int id;
struct list_head xsk_dma_list;
}; };
struct xsk_map { struct xsk_map {
@ -48,10 +42,11 @@ struct xsk_map {
struct xdp_sock { struct xdp_sock {
/* struct sock must be the first member of struct xdp_sock */ /* struct sock must be the first member of struct xdp_sock */
struct sock sk; struct sock sk;
struct xsk_queue *rx; struct xsk_queue *rx ____cacheline_aligned_in_smp;
struct net_device *dev; struct net_device *dev;
struct xdp_umem *umem; struct xdp_umem *umem;
struct list_head flush_node; struct list_head flush_node;
struct xsk_buff_pool *pool;
u16 queue_id; u16 queue_id;
bool zc; bool zc;
enum { enum {
@ -59,10 +54,9 @@ struct xdp_sock {
XSK_BOUND, XSK_BOUND,
XSK_UNBOUND, XSK_UNBOUND,
} state; } state;
/* Protects multiple processes in the control path */
struct mutex mutex;
struct xsk_queue *tx ____cacheline_aligned_in_smp; struct xsk_queue *tx ____cacheline_aligned_in_smp;
struct list_head list; struct list_head tx_list;
/* Mutual exclusion of NAPI TX thread and sendmsg error paths /* Mutual exclusion of NAPI TX thread and sendmsg error paths
* in the SKB destructor callback. * in the SKB destructor callback.
*/ */
@ -77,6 +71,10 @@ struct xdp_sock {
struct list_head map_list; struct list_head map_list;
/* Protects map_list */ /* Protects map_list */
spinlock_t map_list_lock; spinlock_t map_list_lock;
/* Protects multiple processes in the control path */
struct mutex mutex;
struct xsk_queue *fq_tmp; /* Only as tmp storage before bind */
struct xsk_queue *cq_tmp; /* Only as tmp storage before bind */
}; };
#ifdef CONFIG_XDP_SOCKETS #ifdef CONFIG_XDP_SOCKETS


@ -11,47 +11,50 @@
#ifdef CONFIG_XDP_SOCKETS #ifdef CONFIG_XDP_SOCKETS
void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries); void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc); bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc);
void xsk_umem_consume_tx_done(struct xdp_umem *umem); void xsk_tx_release(struct xsk_buff_pool *pool);
struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id); struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
void xsk_set_rx_need_wakeup(struct xdp_umem *umem); u16 queue_id);
void xsk_set_tx_need_wakeup(struct xdp_umem *umem); void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool);
void xsk_clear_rx_need_wakeup(struct xdp_umem *umem); void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool);
void xsk_clear_tx_need_wakeup(struct xdp_umem *umem); void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool);
bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem); void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool);
bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool);
static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
{ {
return XDP_PACKET_HEADROOM + umem->headroom; return XDP_PACKET_HEADROOM + pool->headroom;
} }
static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool)
{ {
return umem->chunk_size; return pool->chunk_size;
} }
static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool)
{ {
return xsk_umem_get_chunk_size(umem) - xsk_umem_get_headroom(umem); return xsk_pool_get_chunk_size(pool) - xsk_pool_get_headroom(pool);
} }
static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem, static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
struct xdp_rxq_info *rxq) struct xdp_rxq_info *rxq)
{ {
xp_set_rxq_info(umem->pool, rxq); xp_set_rxq_info(pool, rxq);
} }
static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
unsigned long attrs) unsigned long attrs)
{ {
xp_dma_unmap(umem->pool, attrs); xp_dma_unmap(pool, attrs);
} }
static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool,
unsigned long attrs) struct device *dev, unsigned long attrs)
{ {
return xp_dma_map(umem->pool, dev, attrs, umem->pgs, umem->npgs); struct xdp_umem *umem = pool->umem;
return xp_dma_map(pool, dev, attrs, umem->pgs, umem->npgs);
} }
static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp) static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp)
@ -68,14 +71,14 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp)
return xp_get_frame_dma(xskb); return xp_get_frame_dma(xskb);
} }
static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool)
{ {
return xp_alloc(umem->pool); return xp_alloc(pool);
} }
static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count) static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count)
{ {
return xp_can_alloc(umem->pool, count); return xp_can_alloc(pool, count);
} }
static inline void xsk_buff_free(struct xdp_buff *xdp) static inline void xsk_buff_free(struct xdp_buff *xdp)
@ -85,100 +88,104 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
xp_free(xskb); xp_free(xskb);
} }
static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr) static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool,
u64 addr)
{ {
return xp_raw_get_dma(umem->pool, addr); return xp_raw_get_dma(pool, addr);
} }
static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr) static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
{ {
return xp_raw_get_data(umem->pool, addr); return xp_raw_get_data(pool, addr);
} }
static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
{ {
struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
if (!pool->dma_need_sync)
return;
xp_dma_sync_for_cpu(xskb); xp_dma_sync_for_cpu(xskb);
} }
static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem, static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
dma_addr_t dma, dma_addr_t dma,
size_t size) size_t size)
{ {
xp_dma_sync_for_device(umem->pool, dma, size); xp_dma_sync_for_device(pool, dma, size);
} }
#else #else
static inline void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries) static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
{ {
} }
static inline bool xsk_umem_consume_tx(struct xdp_umem *umem, static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc) struct xdp_desc *desc)
{ {
return false; return false;
} }
static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem) static inline void xsk_tx_release(struct xsk_buff_pool *pool)
{ {
} }
static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, static inline struct xsk_buff_pool *
u16 queue_id) xsk_get_pool_from_qid(struct net_device *dev, u16 queue_id)
{ {
return NULL; return NULL;
} }
static inline void xsk_set_rx_need_wakeup(struct xdp_umem *umem) static inline void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
{ {
} }
static inline void xsk_set_tx_need_wakeup(struct xdp_umem *umem) static inline void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool)
{ {
} }
static inline void xsk_clear_rx_need_wakeup(struct xdp_umem *umem) static inline void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool)
{ {
} }
static inline void xsk_clear_tx_need_wakeup(struct xdp_umem *umem) static inline void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool)
{ {
} }
static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem) static inline bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
{ {
return false; return false;
} }
static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
{ {
return 0; return 0;
} }
static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool)
{ {
return 0; return 0;
} }
static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool)
{ {
return 0; return 0;
} }
static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem, static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
struct xdp_rxq_info *rxq) struct xdp_rxq_info *rxq)
{ {
} }
static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
unsigned long attrs) unsigned long attrs)
{ {
} }
static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool,
unsigned long attrs) struct device *dev, unsigned long attrs)
{ {
return 0; return 0;
} }
@ -193,12 +200,12 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp)
return 0; return 0;
} }
static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool)
{ {
return NULL; return NULL;
} }
static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count) static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count)
{ {
return false; return false;
} }
@ -207,21 +214,22 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
{ {
} }
static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr) static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool,
u64 addr)
{ {
return 0; return 0;
} }
static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr) static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
{ {
return NULL; return NULL;
} }
static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
{ {
} }
static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem, static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool,
dma_addr_t dma, dma_addr_t dma,
size_t size) size_t size)
{ {


@ -13,6 +13,8 @@ struct xsk_buff_pool;
struct xdp_rxq_info; struct xdp_rxq_info;
struct xsk_queue; struct xsk_queue;
struct xdp_desc; struct xdp_desc;
struct xdp_umem;
struct xdp_sock;
struct device; struct device;
struct page; struct page;
@ -26,34 +28,68 @@ struct xdp_buff_xsk {
struct list_head free_list_node; struct list_head free_list_node;
}; };
struct xsk_dma_map {
dma_addr_t *dma_pages;
struct device *dev;
struct net_device *netdev;
refcount_t users;
struct list_head list; /* Protected by the RTNL_LOCK */
u32 dma_pages_cnt;
bool dma_need_sync;
};
struct xsk_buff_pool { struct xsk_buff_pool {
struct xsk_queue *fq; /* Members only used in the control path first. */
struct device *dev;
struct net_device *netdev;
struct list_head xsk_tx_list;
/* Protects modifications to the xsk_tx_list */
spinlock_t xsk_tx_list_lock;
refcount_t users;
struct xdp_umem *umem;
struct work_struct work;
struct list_head free_list; struct list_head free_list;
u32 heads_cnt;
u16 queue_id;
/* Data path members as close to free_heads at the end as possible. */
struct xsk_queue *fq ____cacheline_aligned_in_smp;
struct xsk_queue *cq;
/* For performance reasons, each buff pool has its own array of dma_pages
* even when they are identical.
*/
dma_addr_t *dma_pages; dma_addr_t *dma_pages;
struct xdp_buff_xsk *heads; struct xdp_buff_xsk *heads;
u64 chunk_mask; u64 chunk_mask;
u64 addrs_cnt; u64 addrs_cnt;
u32 free_list_cnt; u32 free_list_cnt;
u32 dma_pages_cnt; u32 dma_pages_cnt;
u32 heads_cnt;
u32 free_heads_cnt; u32 free_heads_cnt;
u32 headroom; u32 headroom;
u32 chunk_size; u32 chunk_size;
u32 frame_len; u32 frame_len;
u8 cached_need_wakeup;
bool uses_need_wakeup;
bool dma_need_sync; bool dma_need_sync;
bool unaligned; bool unaligned;
void *addrs; void *addrs;
struct device *dev;
struct xdp_buff_xsk *free_heads[]; struct xdp_buff_xsk *free_heads[];
}; };
/* AF_XDP core. */ /* AF_XDP core. */
struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks, struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
u32 chunk_size, u32 headroom, u64 size, struct xdp_umem *umem);
bool unaligned); int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev,
void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq); u16 queue_id, u16 flags);
int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem,
struct net_device *dev, u16 queue_id);
void xp_destroy(struct xsk_buff_pool *pool); void xp_destroy(struct xsk_buff_pool *pool);
void xp_release(struct xdp_buff_xsk *xskb); void xp_release(struct xdp_buff_xsk *xskb);
void xp_get_pool(struct xsk_buff_pool *pool);
void xp_put_pool(struct xsk_buff_pool *pool);
void xp_clear_dev(struct xsk_buff_pool *pool);
void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs);
void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs);
/* AF_XDP, and XDP core. */ /* AF_XDP, and XDP core. */
void xp_free(struct xdp_buff_xsk *xskb); void xp_free(struct xdp_buff_xsk *xskb);
@ -80,9 +116,6 @@ static inline dma_addr_t xp_get_frame_dma(struct xdp_buff_xsk *xskb)
void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb); void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb);
static inline void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb) static inline void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb)
{ {
if (!xskb->pool->dma_need_sync)
return;
xp_dma_sync_for_cpu_slow(xskb); xp_dma_sync_for_cpu_slow(xskb);
} }
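
To make the umem-to-pool conversion concrete, a hedged driver-side sketch of a zero-copy Rx path working against the pool handle (hardware specifics elided):

/* Sketch: allocate Rx buffers from the pool, sync for the CPU on
 * completion and honour the need_wakeup protocol.
 */
static int my_rx_poll(struct xsk_buff_pool *pool, int budget)
{
        int done = 0;

        while (done < budget) {
                struct xdp_buff *xdp = xsk_buff_alloc(pool);

                if (!xdp)
                        break;

                /* ... post xdp to hardware; on completion: ... */
                xsk_buff_dma_sync_for_cpu(xdp, pool);
                /* ... run the XDP program, then release the buffer ... */
                xsk_buff_free(xdp);
                done++;
        }

        if (xsk_uses_need_wakeup(pool))
                xsk_set_rx_need_wakeup(pool);

        return done;
}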


@ -155,6 +155,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_DEVMAP_HASH, BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS, BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF, BPF_MAP_TYPE_RINGBUF,
BPF_MAP_TYPE_INODE_STORAGE,
}; };
/* Note that tracing related programs such as /* Note that tracing related programs such as
@ -345,6 +346,14 @@ enum bpf_link_type {
/* The verifier internal test flag. Behavior is undefined */ /* The verifier internal test flag. Behavior is undefined */
#define BPF_F_TEST_STATE_FREQ (1U << 3) #define BPF_F_TEST_STATE_FREQ (1U << 3)
/* If BPF_F_SLEEPABLE is used in BPF_PROG_LOAD command, the verifier will
* restrict map and helper usage for such programs. Sleepable BPF programs can
* only be attached to hooks where kernel execution context allows sleeping.
* Such programs are allowed to use helpers that may sleep like
* bpf_copy_from_user().
*/
#define BPF_F_SLEEPABLE (1U << 4)
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* two extensions: * two extensions:
* *
@ -2807,7 +2816,7 @@ union bpf_attr {
* *
* **-ERANGE** if resulting value was out of range. * **-ERANGE** if resulting value was out of range.
* *
* void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags) * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
* Description * Description
* Get a bpf-local-storage from a *sk*. * Get a bpf-local-storage from a *sk*.
* *
@ -2823,6 +2832,9 @@ union bpf_attr {
* "type". The bpf-local-storage "type" (i.e. the *map*) is * "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf-local-storages residing at *sk*. * searched against all bpf-local-storages residing at *sk*.
* *
* *sk* is a kernel **struct sock** pointer for LSM program.
* *sk* is a **struct bpf_sock** pointer for other program types.
*
* An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be * An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf-local-storage will be * used such that a new bpf-local-storage will be
* created if one does not exist. *value* can be used * created if one does not exist. *value* can be used
@ -2835,7 +2847,7 @@ union bpf_attr {
* **NULL** if not found or there was an error in adding * **NULL** if not found or there was an error in adding
* a new bpf-local-storage. * a new bpf-local-storage.
* *
* long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk) * long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
* Description * Description
* Delete a bpf-local-storage from a *sk*. * Delete a bpf-local-storage from a *sk*.
* Return * Return
@ -3395,6 +3407,175 @@ union bpf_attr {
* A non-negative value equal to or less than *size* on success, * A non-negative value equal to or less than *size* on success,
* or a negative error in case of failure. * or a negative error in case of failure.
* *
* long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
* Description
* Load header option. Support reading a particular TCP header
* option for bpf program (BPF_PROG_TYPE_SOCK_OPS).
*
* If *flags* is 0, it will search the option from the
* sock_ops->skb_data. The comment in "struct bpf_sock_ops"
* has details on what skb_data contains under different
* sock_ops->op.
*
* The first byte of the *searchby_res* specifies the
* kind that it wants to search.
*
* If the searching kind is an experimental kind
* (i.e. 253 or 254 according to RFC6994), it also
* needs to specify the "magic", which is either
* 2 bytes or 4 bytes, and the size of the magic via
* the 2nd byte, i.e. the "kind-length" of the TCP
* header option. As with a normal TCP header option,
* the "kind-length" includes the first 2 bytes,
* "kind" and "kind-length", themselves.
*
* For example, to search experimental kind 254 with
* 2 byte magic 0xeB9F, the searchby_res should be
* [ 254, 4, 0xeB, 0x9F, 0, 0, .... 0 ].
*
* To search for the standard window scale option (3),
* the searchby_res should be [ 3, 0, 0, .... 0 ].
* Note, kind-length must be 0 for regular option.
*
* Searching for No-Op (0) and End-of-Option-List (1) is
* not supported.
*
* *len* must be at least 2 bytes which is the minimal size
* of a header option.
*
* Supported flags:
* * **BPF_LOAD_HDR_OPT_TCP_SYN** to search from the
* saved_syn packet or the just-received syn packet.
*
* Return
* >0 when found, the header option is copied to *searchby_res*.
* The return value is the total length copied.
*
* **-EINVAL** If param is invalid
*
* **-ENOMSG** The option is not found
*
* **-ENOENT** No syn packet available when
* **BPF_LOAD_HDR_OPT_TCP_SYN** is used
*
* **-ENOSPC** Not enough space. Only *len* number of
* bytes are copied.
*
* **-EFAULT** Cannot parse the header options in the packet
*
* **-EPERM** This helper cannot be used under the
* current sock_ops->op.
*
* long bpf_store_hdr_opt(struct bpf_sock_ops *skops, const void *from, u32 len, u64 flags)
* Description
* Store header option. The data will be copied
* from buffer *from* with length *len* to the TCP header.
*
* The buffer *from* should have the whole option that
* includes the kind, kind-length, and the actual
* option data. The *len* must be at least kind-length
* long. The kind-length does not have to be 4-byte
* aligned. The kernel will take care of the padding
* and of setting the 4-byte-aligned value in th->doff.
*
* This helper will check for duplicated option
* by searching the same option in the outgoing skb.
*
* This helper can only be called during
* BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
*
* Return
* 0 on success, or negative error in case of failure:
*
* **-EINVAL** If param is invalid
*
* **-ENOSPC** Not enough space in the header.
* Nothing has been written
*
* **-EEXIST** The option already exists
*
* **-EFAULT** Cannot parse the existing header options
*
* **-EPERM** This helper cannot be used under the
* current sock_ops->op.
*
* long bpf_reserve_hdr_opt(struct bpf_sock_ops *skops, u32 len, u64 flags)
* Description
* Reserve *len* bytes for the bpf header option. The
* space will be used by bpf_store_hdr_opt() later in
* BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
*
* If bpf_reserve_hdr_opt() is called multiple times,
* the total number of bytes will be reserved.
*
* This helper can only be called during
* BPF_SOCK_OPS_HDR_OPT_LEN_CB.
*
* Return
* 0 on success, or negative error in case of failure:
*
* **-EINVAL** if param is invalid
*
* **-ENOSPC** Not enough space in the header.
*
* **-EPERM** This helper cannot be used under the
* current sock_ops->op.
*
* void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
* Description
* Get a bpf_local_storage from an *inode*.
*
* Logically, it could be thought of as getting the value from
* a *map* with *inode* as the **key**. From this
* perspective, the usage is not much different from
* **bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
* helper enforces the key must be an inode and the map must also
* be a **BPF_MAP_TYPE_INODE_STORAGE**.
*
* Underneath, the value is stored locally at *inode* instead of
* the *map*. The *map* is used as the bpf-local-storage
* "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf_local_storage residing at *inode*.
*
* An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf_local_storage will be
* created if one does not exist. *value* can be used
* together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
* the initial value of a bpf_local_storage. If *value* is
* **NULL**, the new bpf_local_storage will be zero initialized.
* Return
* A bpf_local_storage pointer is returned on success.
*
* **NULL** if not found or there was an error in adding
* a new bpf_local_storage.
*
* int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
* Description
* Delete a bpf_local_storage from an *inode*.
* Return
* 0 on success.
*
* **-ENOENT** if the bpf_local_storage cannot be found.
*
* long bpf_d_path(struct path *path, char *buf, u32 sz)
* Description
* Return full path for given 'struct path' object, which
* needs to be the kernel BTF 'path' object. The path is
* returned in the provided buffer 'buf' of size 'sz' and
* is zero terminated.
*
* Return
* On success, the strictly positive length of the string,
* including the trailing NUL character. On error, a negative
* value.
*
* long bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
* Description
* Read *size* bytes from user space address *user_ptr* and store
* the data in *dst*. This is a wrapper of copy_from_user().
* Return
* 0 on success, or a negative error in case of failure.
*/ */
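
Tying the sleepable pieces together, a hedged BPF-side sketch of a sleepable LSM program using bpf_copy_from_user() (loaded with BPF_F_SLEEPABLE, which libbpf selects via the ".s" section suffix; hook and buffer size are illustrative):

/* Sketch: read the exec'ed task's argument area from user memory, which
 * only a sleepable program may do reliably.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("lsm.s/bprm_committed_creds")
int BPF_PROG(dump_args, struct linux_binprm *bprm)
{
        char args[64] = {};
        unsigned long arg_start = bprm->vma->vm_mm->arg_start;

        bpf_copy_from_user(args, sizeof(args), (void *)arg_start);
        bpf_printk("exec args: %s", args);
        return 0;
}

char _license[] SEC("license") = "GPL";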
#define __BPF_FUNC_MAPPER(FN) \ #define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \ FN(unspec), \
@ -3539,6 +3720,13 @@ union bpf_attr {
FN(skc_to_tcp_request_sock), \ FN(skc_to_tcp_request_sock), \
FN(skc_to_udp6_sock), \ FN(skc_to_udp6_sock), \
FN(get_task_stack), \ FN(get_task_stack), \
FN(load_hdr_opt), \
FN(store_hdr_opt), \
FN(reserve_hdr_opt), \
FN(inode_storage_get), \
FN(inode_storage_delete), \
FN(d_path), \
FN(copy_from_user), \
/* */ /* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -3648,9 +3836,13 @@ enum {
BPF_F_SYSCTL_BASE_NAME = (1ULL << 0), BPF_F_SYSCTL_BASE_NAME = (1ULL << 0),
}; };
/* BPF_FUNC_sk_storage_get flags */ /* BPF_FUNC_<kernel_obj>_storage_get flags */
enum { enum {
BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0), BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0),
/* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
* and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
*/
BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE,
}; };
/* BPF_FUNC_read_branch_records flags. */ /* BPF_FUNC_read_branch_records flags. */
@ -4071,6 +4263,15 @@ struct bpf_link_info {
__u64 cgroup_id; __u64 cgroup_id;
__u32 attach_type; __u32 attach_type;
} cgroup; } cgroup;
struct {
__aligned_u64 target_name; /* in/out: target_name buffer ptr */
__u32 target_name_len; /* in/out: target_name buffer len */
union {
struct {
__u32 map_id;
} map;
};
} iter;
struct { struct {
__u32 netns_ino; __u32 netns_ino;
__u32 attach_type; __u32 attach_type;
@ -4158,6 +4359,36 @@ struct bpf_sock_ops {
__u64 bytes_received; __u64 bytes_received;
__u64 bytes_acked; __u64 bytes_acked;
__bpf_md_ptr(struct bpf_sock *, sk); __bpf_md_ptr(struct bpf_sock *, sk);
/* [skb_data, skb_data_end) covers the whole TCP header.
*
* BPF_SOCK_OPS_PARSE_HDR_OPT_CB: The packet received
* BPF_SOCK_OPS_HDR_OPT_LEN_CB: Not useful because the
* header has not been written.
* BPF_SOCK_OPS_WRITE_HDR_OPT_CB: The header and options have
* been written so far.
* BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: The SYNACK that concludes
* the 3WHS.
* BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: The ACK that concludes
* the 3WHS.
*
* bpf_load_hdr_opt() can also be used to read a particular option.
*/
__bpf_md_ptr(void *, skb_data);
__bpf_md_ptr(void *, skb_data_end);
__u32 skb_len; /* The total length of a packet.
* It includes the header, options,
* and payload.
*/
__u32 skb_tcp_flags; /* tcp_flags of the header. It provides
* an easy way to check for tcp_flags
* without parsing skb_data.
*
* In particular, the skb_tcp_flags
* will still be available in
* BPF_SOCK_OPS_HDR_OPT_LEN_CB even though
* the outgoing header has not
* been written yet.
*/
}; };
/* Definitions for bpf_sock_ops_cb_flags */ /* Definitions for bpf_sock_ops_cb_flags */
@ -4166,8 +4397,51 @@ enum {
BPF_SOCK_OPS_RETRANS_CB_FLAG = (1<<1), BPF_SOCK_OPS_RETRANS_CB_FLAG = (1<<1),
BPF_SOCK_OPS_STATE_CB_FLAG = (1<<2), BPF_SOCK_OPS_STATE_CB_FLAG = (1<<2),
BPF_SOCK_OPS_RTT_CB_FLAG = (1<<3), BPF_SOCK_OPS_RTT_CB_FLAG = (1<<3),
/* Call bpf for all received TCP headers. The bpf prog will be
* called under sock_ops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB
*
* Please refer to the comment in BPF_SOCK_OPS_PARSE_HDR_OPT_CB
* for the header option related helpers that will be useful
* to the bpf programs.
*
* It could be used at the client/active side (i.e. connect() side)
* when the server has indicated that it is in syncookie
* mode and requires the active side to resend the bpf-written
* options. The active side can keep writing the bpf-options until
* it receives a valid packet from the server side confirming that
* the earlier packet (and options) has been received. A later
* example patch in this series uses it this way at the active side
* when the server is in syncookie mode.
*
* The bpf prog will usually turn this off in the common cases.
*/
BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG = (1<<4),
/* Call bpf when kernel has received a header option that
* the kernel cannot handle. The bpf prog will be called under
* sock_ops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB.
*
* Please refer to the comment in BPF_SOCK_OPS_PARSE_HDR_OPT_CB
* for the header option related helpers that will be useful
* to the bpf programs.
*/
BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG = (1<<5),
/* Call bpf when the kernel is writing header options for the
* outgoing packet. The bpf prog will first be called
* to reserve space in a skb under
* sock_ops->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB. Then
* the bpf prog will be called to write the header option(s)
* under sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
*
* Please refer to the comment in BPF_SOCK_OPS_HDR_OPT_LEN_CB
* and BPF_SOCK_OPS_WRITE_HDR_OPT_CB for the header option
* related helpers that will be useful to the bpf programs.
*
* The kernel gets its chance to reserve space and write
* options first before the BPF program does.
*/
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
/* Mask of all currently supported cb flags */ /* Mask of all currently supported cb flags */
BPF_SOCK_OPS_ALL_CB_FLAGS = 0xF, BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F,
}; };
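
A small sketch of a sock_ops program opting in to the new callbacks (without these cb flags the PARSE/WRITE_HDR_OPT programs are never invoked for the socket); the chosen ops are illustrative:

SEC("sockops")
int enable_hdr_opt_cb(struct bpf_sock_ops *skops)
{
        switch (skops->op) {
        case BPF_SOCK_OPS_TCP_LISTEN_CB:
        case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
        case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
                bpf_sock_ops_cb_flags_set(skops,
                                          skops->bpf_sock_ops_cb_flags |
                                          BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG |
                                          BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG);
                break;
        }
        return 1;
}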
/* List of known BPF sock_ops operators. /* List of known BPF sock_ops operators.
@ -4223,6 +4497,63 @@ enum {
*/ */
BPF_SOCK_OPS_RTT_CB, /* Called on every RTT. BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
*/ */
BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option.
* It will be called to handle
* the packets received at
* an already established
* connection.
*
* sock_ops->skb_data:
* Referring to the received skb.
* It covers the TCP header only.
*
* bpf_load_hdr_opt() can also
* be used to search for a
* particular option.
*/
BPF_SOCK_OPS_HDR_OPT_LEN_CB, /* Reserve space for writing the
* header option later in
* BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
* Arg1: bool want_cookie. (in
* writing SYNACK only)
*
* sock_ops->skb_data:
* Not available because no header has
* been written yet.
*
* sock_ops->skb_tcp_flags:
* The tcp_flags of the
* outgoing skb. (e.g. SYN, ACK, FIN).
*
* bpf_reserve_hdr_opt() should
* be used to reserve space.
*/
BPF_SOCK_OPS_WRITE_HDR_OPT_CB, /* Write the header options
* Arg1: bool want_cookie. (in
* writing SYNACK only)
*
* sock_ops->skb_data:
* Referring to the outgoing skb.
* It covers the TCP header
* that has already been written
* by the kernel and the
* earlier bpf-progs.
*
* sock_ops->skb_tcp_flags:
* The tcp_flags of the outgoing
* skb. (e.g. SYN, ACK, FIN).
*
* bpf_store_hdr_opt() should
* be used to write the
* option.
*
* bpf_load_hdr_opt() can also
* be used to search for a
* particular option that
* has already been written
* by the kernel or the
* earlier bpf-progs.
*/
};
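To make the three callbacks above concrete, here is a hedged sketch (not part of the diff) of a sock_ops program that reserves space for, writes, and later searches for one TCP option; the option kind 0xb7 and its 4-byte layout are invented for the example, and it assumes the callback flags from the previous sketch are enabled:

        /* Illustrative only: option kind and layout are made up. */
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        struct my_tcp_opt {
                __u8 kind;      /* option kind (0xb7 is unassigned) */
                __u8 len;       /* total option length, incl. kind/len */
                __u16 val;      /* made-up payload */
        } __attribute__((packed));

        SEC("sockops")
        int hdr_opt_demo(struct bpf_sock_ops *skops)
        {
                struct my_tcp_opt opt = { .kind = 0xb7, .len = sizeof(opt), .val = 1 };

                switch (skops->op) {
                case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
                        /* Reserve room in the outgoing TCP header. */
                        bpf_reserve_hdr_opt(skops, sizeof(opt), 0);
                        break;
                case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
                        /* Fill the reserved room. */
                        bpf_store_hdr_opt(skops, &opt, sizeof(opt), 0);
                        break;
                case BPF_SOCK_OPS_PARSE_HDR_OPT_CB:
                        /* Look the option up in the received header;
                         * only the kind needs to be pre-filled.
                         */
                        opt.val = 0;
                        bpf_load_hdr_opt(skops, &opt, sizeof(opt), 0);
                        break;
                }
                return 1;
        }

        char _license[] SEC("license") = "GPL";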
/* List of TCP states. There is a build check in net/ipv4/tcp.c to detect /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
@ -4250,6 +4581,63 @@ enum {
enum { enum {
TCP_BPF_IW = 1001, /* Set TCP initial congestion window */ TCP_BPF_IW = 1001, /* Set TCP initial congestion window */
TCP_BPF_SNDCWND_CLAMP = 1002, /* Set sndcwnd_clamp */ TCP_BPF_SNDCWND_CLAMP = 1002, /* Set sndcwnd_clamp */
TCP_BPF_DELACK_MAX = 1003, /* Max delay ack in usecs */
TCP_BPF_RTO_MIN = 1004, /* Min delay ack in usecs */
/* Copy the SYN pkt to optval
*
* BPF_PROG_TYPE_SOCK_OPS only. It is similar to the
* bpf_getsockopt(TCP_SAVED_SYN) but it is not limited
* to reading from the saved_syn. It can get the
* SYN packet from either:
*
* 1. the just-received SYN packet (only available when writing the
*    SYNACK). This is useful when it is not necessary to
*    save the SYN packet for later use. It is also the only way
*    to get the SYN during syncookie mode because the SYN
*    packet cannot be saved during syncookie.
*
* OR
*
* 2. the earlier saved syn which was done by
* bpf_setsockopt(TCP_SAVE_SYN).
*
* The bpf_getsockopt(TCP_BPF_SYN*) option will hide where the
* SYN packet is obtained.
*
* If the bpf-prog does not need the IP[46] header, the
* bpf-prog can avoid parsing the IP header by using
* TCP_BPF_SYN. Otherwise, the bpf-prog can get both
* IP[46] and TCP header by using TCP_BPF_SYN_IP.
*
* >0: Total number of bytes copied
* -ENOSPC: Not enough space in optval. Only optlen bytes
*          are copied.
* -ENOENT: The SYN skb is not available now and the earlier SYN pkt
*          was not saved by setsockopt(TCP_SAVE_SYN).
*/
TCP_BPF_SYN = 1005, /* Copy the TCP header */
TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
};
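For instance, under the assumption of a sock_ops program and a recent libbpf, the headers of the triggering SYN could be read while the SYNACK is being written; a minimal sketch could look like:

        /* Sketch: copy the SYN's IP[46] + TCP headers while writing the SYNACK. */
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        #ifndef SOL_TCP
        #define SOL_TCP 6
        #endif

        SEC("sockops")
        int read_syn_hdrs(struct bpf_sock_ops *skops)
        {
                __u8 hdrs[120];         /* enough for max IPv4 + max TCP header */
                long copied;

                if (skops->op != BPF_SOCK_OPS_WRITE_HDR_OPT_CB)
                        return 1;

                copied = bpf_getsockopt(skops, SOL_TCP, TCP_BPF_SYN_IP,
                                        hdrs, sizeof(hdrs));
                if (copied < 0)         /* -ENOENT, -ENOSPC, ... */
                        return 1;
                /* 'copied' bytes of header are now available in hdrs[]. */
                return 1;
        }

        char _license[] SEC("license") = "GPL";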
enum {
BPF_LOAD_HDR_OPT_TCP_SYN = (1ULL << 0),
};
/* args[0] value during BPF_SOCK_OPS_HDR_OPT_LEN_CB and
* BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
*/
enum {
BPF_WRITE_HDR_TCP_CURRENT_MSS = 1, /* Kernel is finding the
* total option space
* required for an established
* sk in order to calculate the
* MSS. No skb is actually
* sent.
*/
BPF_WRITE_HDR_TCP_SYNACK_COOKIE = 2, /* Kernel is in syncookie mode
* when sending a SYN.
*/
}; };
struct bpf_perf_event_value { struct bpf_perf_event_value {

View File

@ -1691,6 +1691,7 @@ config BPF_SYSCALL
bool "Enable bpf() system call" bool "Enable bpf() system call"
select BPF select BPF
select IRQ_WORK select IRQ_WORK
select TASKS_TRACE_RCU
default n default n
help help
Enable the bpf() system call that allows to manipulate eBPF Enable the bpf() system call that allows to manipulate eBPF
@ -1710,6 +1711,8 @@ config BPF_JIT_DEFAULT_ON
def_bool ARCH_WANT_DEFAULT_BPF_JIT || BPF_JIT_ALWAYS_ON def_bool ARCH_WANT_DEFAULT_BPF_JIT || BPF_JIT_ALWAYS_ON
depends on HAVE_EBPF_JIT && BPF_JIT depends on HAVE_EBPF_JIT && BPF_JIT
source "kernel/bpf/preload/Kconfig"
config USERFAULTFD config USERFAULTFD
bool "Enable userfaultfd() system call" bool "Enable userfaultfd() system call"
depends on MMU depends on MMU

View File

@ -12,7 +12,7 @@ obj-y = fork.o exec_domain.o panic.o \
notifier.o ksysfs.o cred.o reboot.o \ notifier.o ksysfs.o cred.o reboot.o \
async.o range.o smpboot.o ucount.o regset.o async.o range.o smpboot.o ucount.o regset.o
obj-$(CONFIG_BPFILTER) += usermode_driver.o obj-$(CONFIG_USERMODE_DRIVER) += usermode_driver.o
obj-$(CONFIG_MODULES) += kmod.o obj-$(CONFIG_MODULES) += kmod.o
obj-$(CONFIG_MULTIUSER) += groups.o obj-$(CONFIG_MULTIUSER) += groups.o

View File

@ -5,6 +5,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init)
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o obj-$(CONFIG_BPF_SYSCALL) += disasm.o
obj-$(CONFIG_BPF_JIT) += trampoline.o obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o obj-$(CONFIG_BPF_SYSCALL) += btf.o
@ -12,6 +13,7 @@ obj-$(CONFIG_BPF_JIT) += dispatcher.o
ifeq ($(CONFIG_NET),y) ifeq ($(CONFIG_NET),y)
obj-$(CONFIG_BPF_SYSCALL) += devmap.o obj-$(CONFIG_BPF_SYSCALL) += devmap.o
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o
obj-$(CONFIG_BPF_SYSCALL) += offload.o obj-$(CONFIG_BPF_SYSCALL) += offload.o
obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
endif endif
@ -29,3 +31,4 @@ ifeq ($(CONFIG_BPF_JIT),y)
obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
obj-${CONFIG_BPF_LSM} += bpf_lsm.o obj-${CONFIG_BPF_LSM} += bpf_lsm.o
endif endif
obj-$(CONFIG_BPF_PRELOAD) += preload/

View File

@ -10,6 +10,7 @@
#include <linux/filter.h> #include <linux/filter.h>
#include <linux/perf_event.h> #include <linux/perf_event.h>
#include <uapi/linux/btf.h> #include <uapi/linux/btf.h>
#include <linux/rcupdate_trace.h>
#include "map_in_map.h" #include "map_in_map.h"
@ -487,6 +488,13 @@ static int array_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
vma->vm_pgoff + pgoff); vma->vm_pgoff + pgoff);
} }
static bool array_map_meta_equal(const struct bpf_map *meta0,
const struct bpf_map *meta1)
{
return meta0->max_entries == meta1->max_entries &&
bpf_map_meta_equal(meta0, meta1);
}
struct bpf_iter_seq_array_map_info { struct bpf_iter_seq_array_map_info {
struct bpf_map *map; struct bpf_map *map;
void *percpu_value_buf; void *percpu_value_buf;
@ -625,6 +633,7 @@ static const struct bpf_iter_seq_info iter_seq_info = {
static int array_map_btf_id; static int array_map_btf_id;
const struct bpf_map_ops array_map_ops = { const struct bpf_map_ops array_map_ops = {
.map_meta_equal = array_map_meta_equal,
.map_alloc_check = array_map_alloc_check, .map_alloc_check = array_map_alloc_check,
.map_alloc = array_map_alloc, .map_alloc = array_map_alloc,
.map_free = array_map_free, .map_free = array_map_free,
@ -647,6 +656,7 @@ const struct bpf_map_ops array_map_ops = {
static int percpu_array_map_btf_id; static int percpu_array_map_btf_id;
const struct bpf_map_ops percpu_array_map_ops = { const struct bpf_map_ops percpu_array_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = array_map_alloc_check, .map_alloc_check = array_map_alloc_check,
.map_alloc = array_map_alloc, .map_alloc = array_map_alloc,
.map_free = array_map_free, .map_free = array_map_free,
@ -1003,6 +1013,11 @@ static void prog_array_map_free(struct bpf_map *map)
fd_array_map_free(map); fd_array_map_free(map);
} }
/* prog_array->aux->{type,jited} is a runtime binding.
* Doing static check alone in the verifier is not enough.
* Thus, prog_array_map cannot be used as an inner_map
* and map_meta_equal is not implemented.
*/
static int prog_array_map_btf_id; static int prog_array_map_btf_id;
const struct bpf_map_ops prog_array_map_ops = { const struct bpf_map_ops prog_array_map_ops = {
.map_alloc_check = fd_array_map_alloc_check, .map_alloc_check = fd_array_map_alloc_check,
@ -1101,6 +1116,7 @@ static void perf_event_fd_array_release(struct bpf_map *map,
static int perf_event_array_map_btf_id; static int perf_event_array_map_btf_id;
const struct bpf_map_ops perf_event_array_map_ops = { const struct bpf_map_ops perf_event_array_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = fd_array_map_alloc_check, .map_alloc_check = fd_array_map_alloc_check,
.map_alloc = array_map_alloc, .map_alloc = array_map_alloc,
.map_free = fd_array_map_free, .map_free = fd_array_map_free,
@ -1137,6 +1153,7 @@ static void cgroup_fd_array_free(struct bpf_map *map)
static int cgroup_array_map_btf_id; static int cgroup_array_map_btf_id;
const struct bpf_map_ops cgroup_array_map_ops = { const struct bpf_map_ops cgroup_array_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = fd_array_map_alloc_check, .map_alloc_check = fd_array_map_alloc_check,
.map_alloc = array_map_alloc, .map_alloc = array_map_alloc,
.map_free = cgroup_fd_array_free, .map_free = cgroup_fd_array_free,

View File

@ -0,0 +1,274 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2019 Facebook
* Copyright 2020 Google LLC.
*/
#include <linux/rculist.h>
#include <linux/list.h>
#include <linux/hash.h>
#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/bpf.h>
#include <linux/bpf_local_storage.h>
#include <net/sock.h>
#include <uapi/linux/sock_diag.h>
#include <uapi/linux/btf.h>
#include <linux/bpf_lsm.h>
#include <linux/btf_ids.h>
#include <linux/fdtable.h>
DEFINE_BPF_STORAGE_CACHE(inode_cache);
static struct bpf_local_storage __rcu **
inode_storage_ptr(void *owner)
{
struct inode *inode = owner;
struct bpf_storage_blob *bsb;
bsb = bpf_inode(inode);
if (!bsb)
return NULL;
return &bsb->storage;
}
static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
struct bpf_map *map,
bool cacheit_lockit)
{
struct bpf_local_storage *inode_storage;
struct bpf_local_storage_map *smap;
struct bpf_storage_blob *bsb;
bsb = bpf_inode(inode);
if (!bsb)
return NULL;
inode_storage = rcu_dereference(bsb->storage);
if (!inode_storage)
return NULL;
smap = (struct bpf_local_storage_map *)map;
return bpf_local_storage_lookup(inode_storage, smap, cacheit_lockit);
}
void bpf_inode_storage_free(struct inode *inode)
{
struct bpf_local_storage_elem *selem;
struct bpf_local_storage *local_storage;
bool free_inode_storage = false;
struct bpf_storage_blob *bsb;
struct hlist_node *n;
bsb = bpf_inode(inode);
if (!bsb)
return;
rcu_read_lock();
local_storage = rcu_dereference(bsb->storage);
if (!local_storage) {
rcu_read_unlock();
return;
}
/* Neither the bpf_prog nor the bpf-map's syscall
* could be modifying the local_storage->list now.
* Thus, no elem can be added-to or deleted-from the
* local_storage->list by the bpf_prog or by the bpf-map's syscall.
*
* It is racing with bpf_local_storage_map_free() alone
* when unlinking elem from the local_storage->list and
* the map's bucket->list.
*/
raw_spin_lock_bh(&local_storage->lock);
hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
/* Always unlink from map before unlinking from
* local_storage.
*/
bpf_selem_unlink_map(selem);
free_inode_storage = bpf_selem_unlink_storage_nolock(
local_storage, selem, false);
}
raw_spin_unlock_bh(&local_storage->lock);
rcu_read_unlock();
/* free_inode_storage should always be true as long as
* local_storage->list was non-empty.
*/
if (free_inode_storage)
kfree_rcu(local_storage, rcu);
}
static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
{
struct bpf_local_storage_data *sdata;
struct file *f;
int fd;
fd = *(int *)key;
f = fget_raw(fd);
if (!f)
return NULL;
sdata = inode_storage_lookup(f->f_inode, map, true);
fput(f);
return sdata ? sdata->data : NULL;
}
static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
struct bpf_local_storage_data *sdata;
struct file *f;
int fd;
fd = *(int *)key;
f = fget_raw(fd);
if (!f || !inode_storage_ptr(f->f_inode))
return -EBADF;
sdata = bpf_local_storage_update(f->f_inode,
(struct bpf_local_storage_map *)map,
value, map_flags);
fput(f);
return PTR_ERR_OR_ZERO(sdata);
}
static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
{
struct bpf_local_storage_data *sdata;
sdata = inode_storage_lookup(inode, map, false);
if (!sdata)
return -ENOENT;
bpf_selem_unlink(SELEM(sdata));
return 0;
}
static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
{
struct file *f;
int fd, err;
fd = *(int *)key;
f = fget_raw(fd);
if (!f)
return -EBADF;
err = inode_storage_delete(f->f_inode, map);
fput(f);
return err;
}
BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
void *, value, u64, flags)
{
struct bpf_local_storage_data *sdata;
if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
return (unsigned long)NULL;
/* explicitly check that the inode_storage_ptr is not
* NULL as inode_storage_lookup returns NULL in this case and
* bpf_local_storage_update expects the owner to have a
* valid storage pointer.
*/
if (!inode_storage_ptr(inode))
return (unsigned long)NULL;
sdata = inode_storage_lookup(inode, map, true);
if (sdata)
return (unsigned long)sdata->data;
/* This helper must only be called from places where the inode is
* guaranteed to have a refcount and cannot be freed.
*/
if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
sdata = bpf_local_storage_update(
inode, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST);
return IS_ERR(sdata) ? (unsigned long)NULL :
(unsigned long)sdata->data;
}
return (unsigned long)NULL;
}
BPF_CALL_2(bpf_inode_storage_delete,
struct bpf_map *, map, struct inode *, inode)
{
/* This helper must only be called from places where the inode is
* guaranteed to have a refcount and cannot be freed.
*/
return inode_storage_delete(inode, map);
}
static int notsupp_get_next_key(struct bpf_map *map, void *key,
void *next_key)
{
return -ENOTSUPP;
}
static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
{
struct bpf_local_storage_map *smap;
smap = bpf_local_storage_map_alloc(attr);
if (IS_ERR(smap))
return ERR_CAST(smap);
smap->cache_idx = bpf_local_storage_cache_idx_get(&inode_cache);
return &smap->map;
}
static void inode_storage_map_free(struct bpf_map *map)
{
struct bpf_local_storage_map *smap;
smap = (struct bpf_local_storage_map *)map;
bpf_local_storage_cache_idx_free(&inode_cache, smap->cache_idx);
bpf_local_storage_map_free(smap);
}
static int inode_storage_map_btf_id;
const struct bpf_map_ops inode_storage_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = bpf_local_storage_map_alloc_check,
.map_alloc = inode_storage_map_alloc,
.map_free = inode_storage_map_free,
.map_get_next_key = notsupp_get_next_key,
.map_lookup_elem = bpf_fd_inode_storage_lookup_elem,
.map_update_elem = bpf_fd_inode_storage_update_elem,
.map_delete_elem = bpf_fd_inode_storage_delete_elem,
.map_check_btf = bpf_local_storage_map_check_btf,
.map_btf_name = "bpf_local_storage_map",
.map_btf_id = &inode_storage_map_btf_id,
.map_owner_storage_ptr = inode_storage_ptr,
};
BTF_ID_LIST(bpf_inode_storage_btf_ids)
BTF_ID_UNUSED
BTF_ID(struct, inode)
const struct bpf_func_proto bpf_inode_storage_get_proto = {
.func = bpf_inode_storage_get,
.gpl_only = false,
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
.arg4_type = ARG_ANYTHING,
.btf_id = bpf_inode_storage_btf_ids,
};
const struct bpf_func_proto bpf_inode_storage_delete_proto = {
.func = bpf_inode_storage_delete,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID,
.btf_id = bpf_inode_storage_btf_ids,
};
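For orientation, a minimal sketch of how this map type is meant to be consumed from an LSM program; the hook, struct, and map name are illustrative, and it assumes CONFIG_BPF_LSM plus a vmlinux.h generated from a kernel with this series:

        /* Hypothetical example; not part of this commit. */
        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_tracing.h>

        struct inode_note {
                __u64 opens;
        };

        struct {
                __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
                __uint(map_flags, BPF_F_NO_PREALLOC);
                __type(key, int);
                __type(value, struct inode_note);
        } inode_notes SEC(".maps");

        SEC("lsm/file_open")
        int BPF_PROG(count_opens, struct file *file)
        {
                struct inode_note *note;

                /* Get (or create on first use) this inode's storage. */
                note = bpf_inode_storage_get(&inode_notes, file->f_inode, 0,
                                             BPF_LOCAL_STORAGE_GET_F_CREATE);
                if (note)
                        __sync_fetch_and_add(&note->opens, 1);
                return 0;
        }

        char _license[] SEC("license") = "GPL";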

View File

@ -390,10 +390,68 @@ out_unlock:
return ret; return ret;
} }
static void bpf_iter_link_show_fdinfo(const struct bpf_link *link,
struct seq_file *seq)
{
struct bpf_iter_link *iter_link =
container_of(link, struct bpf_iter_link, link);
bpf_iter_show_fdinfo_t show_fdinfo;
seq_printf(seq,
"target_name:\t%s\n",
iter_link->tinfo->reg_info->target);
show_fdinfo = iter_link->tinfo->reg_info->show_fdinfo;
if (show_fdinfo)
show_fdinfo(&iter_link->aux, seq);
}
static int bpf_iter_link_fill_link_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
struct bpf_iter_link *iter_link =
container_of(link, struct bpf_iter_link, link);
char __user *ubuf = u64_to_user_ptr(info->iter.target_name);
bpf_iter_fill_link_info_t fill_link_info;
u32 ulen = info->iter.target_name_len;
const char *target_name;
u32 target_len;
if (!ulen ^ !ubuf)
return -EINVAL;
target_name = iter_link->tinfo->reg_info->target;
target_len = strlen(target_name);
info->iter.target_name_len = target_len + 1;
if (ubuf) {
if (ulen >= target_len + 1) {
if (copy_to_user(ubuf, target_name, target_len + 1))
return -EFAULT;
} else {
char zero = '\0';
if (copy_to_user(ubuf, target_name, ulen - 1))
return -EFAULT;
if (put_user(zero, ubuf + ulen - 1))
return -EFAULT;
return -ENOSPC;
}
}
fill_link_info = iter_link->tinfo->reg_info->fill_link_info;
if (fill_link_info)
return fill_link_info(&iter_link->aux, info);
return 0;
}
static const struct bpf_link_ops bpf_iter_link_lops = { static const struct bpf_link_ops bpf_iter_link_lops = {
.release = bpf_iter_link_release, .release = bpf_iter_link_release,
.dealloc = bpf_iter_link_dealloc, .dealloc = bpf_iter_link_dealloc,
.update_prog = bpf_iter_link_replace, .update_prog = bpf_iter_link_replace,
.show_fdinfo = bpf_iter_link_show_fdinfo,
.fill_link_info = bpf_iter_link_fill_link_info,
};
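From userspace this surfaces through bpf_obj_get_info_by_fd(); a rough sketch, assuming libbpf and an already-created iterator link fd:

        /* Sketch: query the target name of a bpf_iter link. */
        #include <stdio.h>
        #include <string.h>
        #include <bpf/bpf.h>

        static int print_iter_target(int link_fd)
        {
                struct bpf_link_info info;
                __u32 info_len = sizeof(info);
                char name[64];
                int err;

                memset(&info, 0, sizeof(info));
                info.iter.target_name = (__u64)(unsigned long)name;
                info.iter.target_name_len = sizeof(name);

                err = bpf_obj_get_info_by_fd(link_fd, &info, &info_len);
                if (err)
                        return err;

                /* target_name_len now holds strlen(target) + 1; a too-small
                 * buffer would have produced -ENOSPC with a truncated,
                 * NUL-terminated copy.
                 */
                printf("iterator target: %s\n", name);
                return 0;
        }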
bool bpf_link_is_iter(struct bpf_link *link) bool bpf_link_is_iter(struct bpf_link *link)

View File

@ -0,0 +1,600 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
#include <linux/rculist.h>
#include <linux/list.h>
#include <linux/hash.h>
#include <linux/types.h>
#include <linux/spinlock.h>
#include <linux/bpf.h>
#include <linux/btf_ids.h>
#include <linux/bpf_local_storage.h>
#include <net/sock.h>
#include <uapi/linux/sock_diag.h>
#include <uapi/linux/btf.h>
#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
static struct bpf_local_storage_map_bucket *
select_bucket(struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *selem)
{
return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
}
static int mem_charge(struct bpf_local_storage_map *smap, void *owner, u32 size)
{
struct bpf_map *map = &smap->map;
if (!map->ops->map_local_storage_charge)
return 0;
return map->ops->map_local_storage_charge(smap, owner, size);
}
static void mem_uncharge(struct bpf_local_storage_map *smap, void *owner,
u32 size)
{
struct bpf_map *map = &smap->map;
if (map->ops->map_local_storage_uncharge)
map->ops->map_local_storage_uncharge(smap, owner, size);
}
static struct bpf_local_storage __rcu **
owner_storage(struct bpf_local_storage_map *smap, void *owner)
{
struct bpf_map *map = &smap->map;
return map->ops->map_owner_storage_ptr(owner);
}
static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->snode);
}
static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->map_node);
}
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
void *value, bool charge_mem)
{
struct bpf_local_storage_elem *selem;
if (charge_mem && mem_charge(smap, owner, smap->elem_size))
return NULL;
selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
if (selem) {
if (value)
memcpy(SDATA(selem)->data, value, smap->map.value_size);
return selem;
}
if (charge_mem)
mem_uncharge(smap, owner, smap->elem_size);
return NULL;
}
/* local_storage->lock must be held and selem->local_storage == local_storage.
* The caller must ensure selem->smap is still valid to be
* dereferenced for its smap->elem_size and smap->cache_idx.
*/
bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem,
bool uncharge_mem)
{
struct bpf_local_storage_map *smap;
bool free_local_storage;
void *owner;
smap = rcu_dereference(SDATA(selem)->smap);
owner = local_storage->owner;
/* All uncharging on the owner must be done first.
* The owner may be freed once the last selem is unlinked
* from local_storage.
*/
if (uncharge_mem)
mem_uncharge(smap, owner, smap->elem_size);
free_local_storage = hlist_is_singular_node(&selem->snode,
&local_storage->list);
if (free_local_storage) {
mem_uncharge(smap, owner, sizeof(struct bpf_local_storage));
local_storage->owner = NULL;
/* After this RCU_INIT, owner may be freed and cannot be used */
RCU_INIT_POINTER(*owner_storage(smap, owner), NULL);
/* local_storage is not freed now. local_storage->lock is
* still held and raw_spin_unlock_bh(&local_storage->lock)
* will be done by the caller.
*
* Although the unlock will be done under
* rcu_read_lock(), it is more intuitive to
* read if kfree_rcu(local_storage, rcu) is done
* after the raw_spin_unlock_bh(&local_storage->lock).
*
* Hence, a "bool free_local_storage" is returned
* to the caller which then calls the kfree_rcu()
* after unlock.
*/
}
hlist_del_init_rcu(&selem->snode);
if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
SDATA(selem))
RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
kfree_rcu(selem, rcu);
return free_local_storage;
}
static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage *local_storage;
bool free_local_storage = false;
if (unlikely(!selem_linked_to_storage(selem)))
/* selem has already been unlinked from sk */
return;
local_storage = rcu_dereference(selem->local_storage);
raw_spin_lock_bh(&local_storage->lock);
if (likely(selem_linked_to_storage(selem)))
free_local_storage = bpf_selem_unlink_storage_nolock(
local_storage, selem, true);
raw_spin_unlock_bh(&local_storage->lock);
if (free_local_storage)
kfree_rcu(local_storage, rcu);
}
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem)
{
RCU_INIT_POINTER(selem->local_storage, local_storage);
hlist_add_head(&selem->snode, &local_storage->list);
}
void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map *smap;
struct bpf_local_storage_map_bucket *b;
if (unlikely(!selem_linked_to_map(selem)))
/* selem has already been unlinked from smap */
return;
smap = rcu_dereference(SDATA(selem)->smap);
b = select_bucket(smap, selem);
raw_spin_lock_bh(&b->lock);
if (likely(selem_linked_to_map(selem)))
hlist_del_init_rcu(&selem->map_node);
raw_spin_unlock_bh(&b->lock);
}
void bpf_selem_link_map(struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
raw_spin_lock_bh(&b->lock);
RCU_INIT_POINTER(SDATA(selem)->smap, smap);
hlist_add_head_rcu(&selem->map_node, &b->list);
raw_spin_unlock_bh(&b->lock);
}
void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
{
/* Always unlink from map before unlinking from local_storage
* because selem will be freed after successfully unlinked from
* the local_storage.
*/
bpf_selem_unlink_map(selem);
__bpf_selem_unlink_storage(selem);
}
struct bpf_local_storage_data *
bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
struct bpf_local_storage_map *smap,
bool cacheit_lockit)
{
struct bpf_local_storage_data *sdata;
struct bpf_local_storage_elem *selem;
/* Fast path (cache hit) */
sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
if (sdata && rcu_access_pointer(sdata->smap) == smap)
return sdata;
/* Slow path (cache miss) */
hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
if (rcu_access_pointer(SDATA(selem)->smap) == smap)
break;
if (!selem)
return NULL;
sdata = SDATA(selem);
if (cacheit_lockit) {
/* spinlock is needed to avoid racing with the
* parallel delete. Otherwise, publishing an already
* deleted sdata to the cache will become a use-after-free
* problem in the next bpf_local_storage_lookup().
*/
raw_spin_lock_bh(&local_storage->lock);
if (selem_linked_to_storage(selem))
rcu_assign_pointer(local_storage->cache[smap->cache_idx],
sdata);
raw_spin_unlock_bh(&local_storage->lock);
}
return sdata;
}
static int check_flags(const struct bpf_local_storage_data *old_sdata,
u64 map_flags)
{
if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
/* elem already exists */
return -EEXIST;
if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST)
/* elem doesn't exist, cannot update it */
return -ENOENT;
return 0;
}
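These are the same semantics userspace sees through the map-update syscall; a hedged illustration, assuming libbpf and a local-storage style map whose key is a socket or inode fd:

        /* Sketch: BPF_NOEXIST / BPF_EXIST behave as check_flags() describes. */
        #include <errno.h>
        #include <bpf/bpf.h>

        static int touch_storage(int map_fd, int owner_fd, void *val)
        {
                /* Create-only: fails with EEXIST if storage already exists. */
                if (!bpf_map_update_elem(map_fd, &owner_fd, val, BPF_NOEXIST))
                        return 0;
                if (errno != EEXIST)
                        return -errno;

                /* Update-only: fails with ENOENT if no storage exists yet. */
                return bpf_map_update_elem(map_fd, &owner_fd, val, BPF_EXIST);
        }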
int bpf_local_storage_alloc(void *owner,
struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *first_selem)
{
struct bpf_local_storage *prev_storage, *storage;
struct bpf_local_storage **owner_storage_ptr;
int err;
err = mem_charge(smap, owner, sizeof(*storage));
if (err)
return err;
storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN);
if (!storage) {
err = -ENOMEM;
goto uncharge;
}
INIT_HLIST_HEAD(&storage->list);
raw_spin_lock_init(&storage->lock);
storage->owner = owner;
bpf_selem_link_storage_nolock(storage, first_selem);
bpf_selem_link_map(smap, first_selem);
owner_storage_ptr =
(struct bpf_local_storage **)owner_storage(smap, owner);
/* Publish storage to the owner.
* Instead of using any lock of the kernel object (i.e. owner),
* cmpxchg will work with any kernel object regardless of what
* the running context is (bh, irq, etc.).
*
* From now on, the owner->storage pointer (e.g. sk->sk_bpf_storage)
* is protected by the storage->lock. Hence, when freeing
* the owner->storage, the storage->lock must be held before
* setting owner->storage ptr to NULL.
*/
prev_storage = cmpxchg(owner_storage_ptr, NULL, storage);
if (unlikely(prev_storage)) {
bpf_selem_unlink_map(first_selem);
err = -EAGAIN;
goto uncharge;
/* Note that even though first_selem was linked to smap's
* bucket->list, first_selem can be freed immediately
* (instead of kfree_rcu) because
* bpf_local_storage_map_free() does a
* synchronize_rcu() before walking the bucket->list.
* Hence, no one is accessing selem from the
* bucket->list under rcu_read_lock().
*/
}
return 0;
uncharge:
kfree(storage);
mem_uncharge(smap, owner, sizeof(*storage));
return err;
}
/* The owner (e.g. sk) cannot be going away while a new elem is being
* linked to its storage (e.g. sk->sk_bpf_storage), i.e. sk->sk_refcnt
* cannot be 0. Otherwise, this would become a leak (and cause other
* memory issues) during map destruction.
*/
struct bpf_local_storage_data *
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
void *value, u64 map_flags)
{
struct bpf_local_storage_data *old_sdata = NULL;
struct bpf_local_storage_elem *selem;
struct bpf_local_storage *local_storage;
int err;
/* BPF_EXIST and BPF_NOEXIST cannot be both set */
if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
/* BPF_F_LOCK can only be used in a value with spin_lock */
unlikely((map_flags & BPF_F_LOCK) &&
!map_value_has_spin_lock(&smap->map)))
return ERR_PTR(-EINVAL);
local_storage = rcu_dereference(*owner_storage(smap, owner));
if (!local_storage || hlist_empty(&local_storage->list)) {
/* Very first elem for the owner */
err = check_flags(NULL, map_flags);
if (err)
return ERR_PTR(err);
selem = bpf_selem_alloc(smap, owner, value, true);
if (!selem)
return ERR_PTR(-ENOMEM);
err = bpf_local_storage_alloc(owner, smap, selem);
if (err) {
kfree(selem);
mem_uncharge(smap, owner, smap->elem_size);
return ERR_PTR(err);
}
return SDATA(selem);
}
if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
/* Hoping to find an old_sdata to do inline update
* such that it can avoid taking the local_storage->lock
* and changing the lists.
*/
old_sdata =
bpf_local_storage_lookup(local_storage, smap, false);
err = check_flags(old_sdata, map_flags);
if (err)
return ERR_PTR(err);
if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
copy_map_value_locked(&smap->map, old_sdata->data,
value, false);
return old_sdata;
}
}
raw_spin_lock_bh(&local_storage->lock);
/* Recheck local_storage->list under local_storage->lock */
if (unlikely(hlist_empty(&local_storage->list))) {
/* A parallel del is happening and local_storage is going
* away. It has just been checked before, so very
* unlikely. Return instead of retry to keep things
* simple.
*/
err = -EAGAIN;
goto unlock_err;
}
old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
err = check_flags(old_sdata, map_flags);
if (err)
goto unlock_err;
if (old_sdata && (map_flags & BPF_F_LOCK)) {
copy_map_value_locked(&smap->map, old_sdata->data, value,
false);
selem = SELEM(old_sdata);
goto unlock;
}
/* local_storage->lock is held. Hence, we are sure
* we can unlink and uncharge the old_sdata successfully
* later. Hence, instead of charging the new selem now
* and then uncharge the old selem later (which may cause
* a potential but unnecessary charge failure), avoid taking
* a charge at all here (the "!old_sdata" check) and the
* old_sdata will not be uncharged later during
* bpf_selem_unlink_storage_nolock().
*/
selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
if (!selem) {
err = -ENOMEM;
goto unlock_err;
}
/* First, link the new selem to the map */
bpf_selem_link_map(smap, selem);
/* Second, link (and publish) the new selem to local_storage */
bpf_selem_link_storage_nolock(local_storage, selem);
/* Third, remove old selem, SELEM(old_sdata) */
if (old_sdata) {
bpf_selem_unlink_map(SELEM(old_sdata));
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
false);
}
unlock:
raw_spin_unlock_bh(&local_storage->lock);
return SDATA(selem);
unlock_err:
raw_spin_unlock_bh(&local_storage->lock);
return ERR_PTR(err);
}
u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
{
u64 min_usage = U64_MAX;
u16 i, res = 0;
spin_lock(&cache->idx_lock);
for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
if (cache->idx_usage_counts[i] < min_usage) {
min_usage = cache->idx_usage_counts[i];
res = i;
/* Found a free cache_idx */
if (!min_usage)
break;
}
}
cache->idx_usage_counts[res]++;
spin_unlock(&cache->idx_lock);
return res;
}
void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
u16 idx)
{
spin_lock(&cache->idx_lock);
cache->idx_usage_counts[idx]--;
spin_unlock(&cache->idx_lock);
}
void bpf_local_storage_map_free(struct bpf_local_storage_map *smap)
{
struct bpf_local_storage_elem *selem;
struct bpf_local_storage_map_bucket *b;
unsigned int i;
/* Note that this map might be concurrently cloned from
* bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
* RCU read section to finish before proceeding. New RCU
* read sections should be prevented via bpf_map_inc_not_zero.
*/
synchronize_rcu();
/* bpf prog and the userspace can no longer access this map
* now. No new selem (of this map) can be added
* to the owner->storage or to the map bucket's list.
*
* The elem of this map can be cleaned up here
* or when the storage is freed e.g.
* by bpf_sk_storage_free() during __sk_destruct().
*/
for (i = 0; i < (1U << smap->bucket_log); i++) {
b = &smap->buckets[i];
rcu_read_lock();
/* No one is adding to b->list now */
while ((selem = hlist_entry_safe(
rcu_dereference_raw(hlist_first_rcu(&b->list)),
struct bpf_local_storage_elem, map_node))) {
bpf_selem_unlink(selem);
cond_resched_rcu();
}
rcu_read_unlock();
}
/* While freeing the storage we may still need to access the map.
*
* e.g. when bpf_sk_storage_free() has unlinked selem from the map
* which then made the above while((selem = ...)) loop
* exit immediately.
*
* However, while freeing the storage one still needs to access the
* smap->elem_size to do the uncharging in
* bpf_selem_unlink_storage_nolock().
*
* Hence, wait another rcu grace period for the storage to be freed.
*/
synchronize_rcu();
kvfree(smap->buckets);
kfree(smap);
}
int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
{
if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
!(attr->map_flags & BPF_F_NO_PREALLOC) ||
attr->max_entries ||
attr->key_size != sizeof(int) || !attr->value_size ||
/* Enforce BTF for userspace sk dumping */
!attr->btf_key_type_id || !attr->btf_value_type_id)
return -EINVAL;
if (!bpf_capable())
return -EPERM;
if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
return -E2BIG;
return 0;
}
struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
{
struct bpf_local_storage_map *smap;
unsigned int i;
u32 nbuckets;
u64 cost;
int ret;
smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
if (!smap)
return ERR_PTR(-ENOMEM);
bpf_map_init_from_attr(&smap->map, attr);
nbuckets = roundup_pow_of_two(num_possible_cpus());
/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
nbuckets = max_t(u32, 2, nbuckets);
smap->bucket_log = ilog2(nbuckets);
cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
ret = bpf_map_charge_init(&smap->map.memory, cost);
if (ret < 0) {
kfree(smap);
return ERR_PTR(ret);
}
smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
GFP_USER | __GFP_NOWARN);
if (!smap->buckets) {
bpf_map_charge_finish(&smap->map.memory);
kfree(smap);
return ERR_PTR(-ENOMEM);
}
for (i = 0; i < nbuckets; i++) {
INIT_HLIST_HEAD(&smap->buckets[i].list);
raw_spin_lock_init(&smap->buckets[i].lock);
}
smap->elem_size =
sizeof(struct bpf_local_storage_elem) + attr->value_size;
return smap;
}
int bpf_local_storage_map_check_btf(const struct bpf_map *map,
const struct btf *btf,
const struct btf_type *key_type,
const struct btf_type *value_type)
{
u32 int_data;
if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
return -EINVAL;
int_data = *(u32 *)(key_type + 1);
if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
return -EINVAL;
return 0;
}

View File

@ -11,6 +11,8 @@
#include <linux/bpf_lsm.h> #include <linux/bpf_lsm.h>
#include <linux/kallsyms.h> #include <linux/kallsyms.h>
#include <linux/bpf_verifier.h> #include <linux/bpf_verifier.h>
#include <net/bpf_sk_storage.h>
#include <linux/bpf_local_storage.h>
/* For every LSM hook that allows attachment of BPF programs, declare a nop /* For every LSM hook that allows attachment of BPF programs, declare a nop
* function where a BPF program can be attached. * function where a BPF program can be attached.
@ -45,10 +47,27 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
return 0; return 0;
} }
static const struct bpf_func_proto *
bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
case BPF_FUNC_inode_storage_get:
return &bpf_inode_storage_get_proto;
case BPF_FUNC_inode_storage_delete:
return &bpf_inode_storage_delete_proto;
case BPF_FUNC_sk_storage_get:
return &sk_storage_get_btf_proto;
case BPF_FUNC_sk_storage_delete:
return &sk_storage_delete_btf_proto;
default:
return tracing_prog_func_proto(func_id, prog);
}
}
const struct bpf_prog_ops lsm_prog_ops = { const struct bpf_prog_ops lsm_prog_ops = {
}; };
const struct bpf_verifier_ops lsm_verifier_ops = { const struct bpf_verifier_ops lsm_verifier_ops = {
.get_func_proto = tracing_prog_func_proto, .get_func_proto = bpf_lsm_func_proto,
.is_valid_access = btf_ctx_access, .is_valid_access = btf_ctx_access,
}; };

View File

@ -298,8 +298,7 @@ static int check_zero_holes(const struct btf_type *t, void *data)
return -EINVAL; return -EINVAL;
mtype = btf_type_by_id(btf_vmlinux, member->type); mtype = btf_type_by_id(btf_vmlinux, member->type);
mtype = btf_resolve_size(btf_vmlinux, mtype, &msize, mtype = btf_resolve_size(btf_vmlinux, mtype, &msize);
NULL, NULL);
if (IS_ERR(mtype)) if (IS_ERR(mtype))
return PTR_ERR(mtype); return PTR_ERR(mtype);
prev_mend = moff + msize; prev_mend = moff + msize;
@ -396,8 +395,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
u32 msize; u32 msize;
mtype = btf_type_by_id(btf_vmlinux, member->type); mtype = btf_type_by_id(btf_vmlinux, member->type);
mtype = btf_resolve_size(btf_vmlinux, mtype, &msize, mtype = btf_resolve_size(btf_vmlinux, mtype, &msize);
NULL, NULL);
if (IS_ERR(mtype)) { if (IS_ERR(mtype)) {
err = PTR_ERR(mtype); err = PTR_ERR(mtype);
goto reset_unlock; goto reset_unlock;

View File

@ -21,6 +21,8 @@
#include <linux/btf_ids.h> #include <linux/btf_ids.h>
#include <linux/skmsg.h> #include <linux/skmsg.h>
#include <linux/perf_event.h> #include <linux/perf_event.h>
#include <linux/bsearch.h>
#include <linux/btf_ids.h>
#include <net/sock.h> #include <net/sock.h>
/* BTF (BPF Type Format) is the meta data format which describes /* BTF (BPF Type Format) is the meta data format which describes
@ -1079,23 +1081,27 @@ static const struct resolve_vertex *env_stack_peak(struct btf_verifier_env *env)
* *type_size: (x * y * sizeof(u32)). Hence, *type_size always * *type_size: (x * y * sizeof(u32)). Hence, *type_size always
* corresponds to the return type. * corresponds to the return type.
* *elem_type: u32 * *elem_type: u32
* *elem_id: id of u32
* *total_nelems: (x * y). Hence, individual elem size is * *total_nelems: (x * y). Hence, individual elem size is
* (*type_size / *total_nelems) * (*type_size / *total_nelems)
* *type_id: id of type if it's changed within the function, 0 if not
* *
* type: is not an array (e.g. const struct X) * type: is not an array (e.g. const struct X)
* return type: type "struct X" * return type: type "struct X"
* *type_size: sizeof(struct X) * *type_size: sizeof(struct X)
* *elem_type: same as return type ("struct X") * *elem_type: same as return type ("struct X")
* *elem_id: 0
* *total_nelems: 1 * *total_nelems: 1
* *type_id: id of type if it's changed within the function, 0 if not
*/ */
const struct btf_type * static const struct btf_type *
btf_resolve_size(const struct btf *btf, const struct btf_type *type, __btf_resolve_size(const struct btf *btf, const struct btf_type *type,
u32 *type_size, const struct btf_type **elem_type, u32 *type_size, const struct btf_type **elem_type,
u32 *total_nelems) u32 *elem_id, u32 *total_nelems, u32 *type_id)
{ {
const struct btf_type *array_type = NULL; const struct btf_type *array_type = NULL;
const struct btf_array *array; const struct btf_array *array = NULL;
u32 i, size, nelems = 1; u32 i, size, nelems = 1, id = 0;
for (i = 0; i < MAX_RESOLVE_DEPTH; i++) { for (i = 0; i < MAX_RESOLVE_DEPTH; i++) {
switch (BTF_INFO_KIND(type->info)) { switch (BTF_INFO_KIND(type->info)) {
@ -1116,6 +1122,7 @@ btf_resolve_size(const struct btf *btf, const struct btf_type *type,
case BTF_KIND_VOLATILE: case BTF_KIND_VOLATILE:
case BTF_KIND_CONST: case BTF_KIND_CONST:
case BTF_KIND_RESTRICT: case BTF_KIND_RESTRICT:
id = type->type;
type = btf_type_by_id(btf, type->type); type = btf_type_by_id(btf, type->type);
break; break;
@ -1146,10 +1153,21 @@ resolved:
*total_nelems = nelems; *total_nelems = nelems;
if (elem_type) if (elem_type)
*elem_type = type; *elem_type = type;
if (elem_id)
*elem_id = array ? array->type : 0;
if (type_id && id)
*type_id = id;
return array_type ? : type; return array_type ? : type;
} }
const struct btf_type *
btf_resolve_size(const struct btf *btf, const struct btf_type *type,
u32 *type_size)
{
return __btf_resolve_size(btf, type, type_size, NULL, NULL, NULL, NULL);
}
/* The input param "type_id" must point to a needs_resolve type */ /* The input param "type_id" must point to a needs_resolve type */
static const struct btf_type *btf_type_id_resolve(const struct btf *btf, static const struct btf_type *btf_type_id_resolve(const struct btf *btf,
u32 *type_id) u32 *type_id)
@ -3870,16 +3888,22 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
return true; return true;
} }
int btf_struct_access(struct bpf_verifier_log *log, enum bpf_struct_walk_result {
/* < 0 error */
WALK_SCALAR = 0,
WALK_PTR,
WALK_STRUCT,
};
static int btf_struct_walk(struct bpf_verifier_log *log,
const struct btf_type *t, int off, int size, const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
u32 *next_btf_id) u32 *next_btf_id)
{ {
u32 i, moff, mtrue_end, msize = 0, total_nelems = 0; u32 i, moff, mtrue_end, msize = 0, total_nelems = 0;
const struct btf_type *mtype, *elem_type = NULL; const struct btf_type *mtype, *elem_type = NULL;
const struct btf_member *member; const struct btf_member *member;
const char *tname, *mname; const char *tname, *mname;
u32 vlen; u32 vlen, elem_id, mid;
again: again:
tname = __btf_name_by_offset(btf_vmlinux, t->name_off); tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
@ -3915,14 +3939,13 @@ again:
/* Only allow structure for now, can be relaxed for /* Only allow structure for now, can be relaxed for
* other types later. * other types later.
*/ */
elem_type = btf_type_skip_modifiers(btf_vmlinux, t = btf_type_skip_modifiers(btf_vmlinux, array_elem->type,
array_elem->type, NULL); NULL);
if (!btf_type_is_struct(elem_type)) if (!btf_type_is_struct(t))
goto error; goto error;
off = (off - moff) % elem_type->size; off = (off - moff) % t->size;
return btf_struct_access(log, elem_type, off, size, atype, goto again;
next_btf_id);
error: error:
bpf_log(log, "access beyond struct %s at off %u size %u\n", bpf_log(log, "access beyond struct %s at off %u size %u\n",
@ -3951,7 +3974,7 @@ error:
*/ */
if (off <= moff && if (off <= moff &&
BITS_ROUNDUP_BYTES(end_bit) <= off + size) BITS_ROUNDUP_BYTES(end_bit) <= off + size)
return SCALAR_VALUE; return WALK_SCALAR;
/* off may be accessing a following member /* off may be accessing a following member
* *
@ -3973,11 +3996,13 @@ error:
break; break;
/* type of the field */ /* type of the field */
mid = member->type;
mtype = btf_type_by_id(btf_vmlinux, member->type); mtype = btf_type_by_id(btf_vmlinux, member->type);
mname = __btf_name_by_offset(btf_vmlinux, member->name_off); mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
mtype = btf_resolve_size(btf_vmlinux, mtype, &msize, mtype = __btf_resolve_size(btf_vmlinux, mtype, &msize,
&elem_type, &total_nelems); &elem_type, &elem_id, &total_nelems,
&mid);
if (IS_ERR(mtype)) { if (IS_ERR(mtype)) {
bpf_log(log, "field %s doesn't have size\n", mname); bpf_log(log, "field %s doesn't have size\n", mname);
return -EFAULT; return -EFAULT;
@ -3991,7 +4016,7 @@ error:
if (btf_type_is_array(mtype)) { if (btf_type_is_array(mtype)) {
u32 elem_idx; u32 elem_idx;
/* btf_resolve_size() above helps to /* __btf_resolve_size() above helps to
* linearize a multi-dimensional array. * linearize a multi-dimensional array.
* *
* The logic here is treating an array * The logic here is treating an array
@ -4039,6 +4064,7 @@ error:
elem_idx = (off - moff) / msize; elem_idx = (off - moff) / msize;
moff += elem_idx * msize; moff += elem_idx * msize;
mtype = elem_type; mtype = elem_type;
mid = elem_id;
} }
/* the 'off' we're looking for is either equal to start /* the 'off' we're looking for is either equal to start
@ -4048,6 +4074,12 @@ error:
/* our field must be inside that union or struct */ /* our field must be inside that union or struct */
t = mtype; t = mtype;
/* return if the offset matches the member offset */
if (off == moff) {
*next_btf_id = mid;
return WALK_STRUCT;
}
/* adjust offset we're looking for */ /* adjust offset we're looking for */
off -= moff; off -= moff;
goto again; goto again;
@ -4063,11 +4095,10 @@ error:
mname, moff, tname, off, size); mname, moff, tname, off, size);
return -EACCES; return -EACCES;
} }
stype = btf_type_skip_modifiers(btf_vmlinux, mtype->type, &id); stype = btf_type_skip_modifiers(btf_vmlinux, mtype->type, &id);
if (btf_type_is_struct(stype)) { if (btf_type_is_struct(stype)) {
*next_btf_id = id; *next_btf_id = id;
return PTR_TO_BTF_ID; return WALK_PTR;
} }
} }
@ -4084,12 +4115,84 @@ error:
return -EACCES; return -EACCES;
} }
return SCALAR_VALUE; return WALK_SCALAR;
} }
bpf_log(log, "struct %s doesn't have field at offset %d\n", tname, off); bpf_log(log, "struct %s doesn't have field at offset %d\n", tname, off);
return -EINVAL; return -EINVAL;
} }
int btf_struct_access(struct bpf_verifier_log *log,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype __maybe_unused,
u32 *next_btf_id)
{
int err;
u32 id;
do {
err = btf_struct_walk(log, t, off, size, &id);
switch (err) {
case WALK_PTR:
/* If we found the pointer or scalar on t+off,
* we're done.
*/
*next_btf_id = id;
return PTR_TO_BTF_ID;
case WALK_SCALAR:
return SCALAR_VALUE;
case WALK_STRUCT:
/* We found nested struct, so continue the search
* by diving in it. At this point the offset is
* aligned with the new type, so set it to 0.
*/
t = btf_type_by_id(btf_vmlinux, id);
off = 0;
break;
default:
/* It's either error or unknown return value..
* scream and leave.
*/
if (WARN_ONCE(err > 0, "unknown btf_struct_walk return value"))
return -EINVAL;
return err;
}
} while (t);
return -EINVAL;
}
bool btf_struct_ids_match(struct bpf_verifier_log *log,
int off, u32 id, u32 need_type_id)
{
const struct btf_type *type;
int err;
/* Are we already done? */
if (need_type_id == id && off == 0)
return true;
again:
type = btf_type_by_id(btf_vmlinux, id);
if (!type)
return false;
err = btf_struct_walk(log, type, off, 1, &id);
if (err != WALK_STRUCT)
return false;
/* We found nested struct object. If it matches
* the requested ID, we're done. Otherwise let's
* continue the search with offset 0 in the new
* type.
*/
if (need_type_id != id) {
off = 0;
goto again;
}
return true;
}
int btf_resolve_helper_id(struct bpf_verifier_log *log, int btf_resolve_helper_id(struct bpf_verifier_log *log,
const struct bpf_func_proto *fn, int arg) const struct bpf_func_proto *fn, int arg)
{ {
@ -4661,3 +4764,15 @@ u32 btf_id(const struct btf *btf)
{ {
return btf->id; return btf->id;
} }
static int btf_id_cmp_func(const void *a, const void *b)
{
const int *pa = a, *pb = b;
return *pa - *pb;
}
bool btf_id_set_contains(struct btf_id_set *set, u32 id)
{
return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
}

View File

@ -79,8 +79,6 @@ struct bpf_cpu_map {
static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list); static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list);
static int bq_flush_to_queue(struct xdp_bulk_queue *bq);
static struct bpf_map *cpu_map_alloc(union bpf_attr *attr) static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
{ {
u32 value_size = attr->value_size; u32 value_size = attr->value_size;
@ -658,6 +656,7 @@ static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
static int cpu_map_btf_id; static int cpu_map_btf_id;
const struct bpf_map_ops cpu_map_ops = { const struct bpf_map_ops cpu_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = cpu_map_alloc, .map_alloc = cpu_map_alloc,
.map_free = cpu_map_free, .map_free = cpu_map_free,
.map_delete_elem = cpu_map_delete_elem, .map_delete_elem = cpu_map_delete_elem,
@ -669,7 +668,7 @@ const struct bpf_map_ops cpu_map_ops = {
.map_btf_id = &cpu_map_btf_id, .map_btf_id = &cpu_map_btf_id,
}; };
static int bq_flush_to_queue(struct xdp_bulk_queue *bq) static void bq_flush_to_queue(struct xdp_bulk_queue *bq)
{ {
struct bpf_cpu_map_entry *rcpu = bq->obj; struct bpf_cpu_map_entry *rcpu = bq->obj;
unsigned int processed = 0, drops = 0; unsigned int processed = 0, drops = 0;
@ -678,7 +677,7 @@ static int bq_flush_to_queue(struct xdp_bulk_queue *bq)
int i; int i;
if (unlikely(!bq->count)) if (unlikely(!bq->count))
return 0; return;
q = rcpu->queue; q = rcpu->queue;
spin_lock(&q->producer_lock); spin_lock(&q->producer_lock);
@ -701,13 +700,12 @@ static int bq_flush_to_queue(struct xdp_bulk_queue *bq)
/* Feedback loop via tracepoints */ /* Feedback loop via tracepoints */
trace_xdp_cpumap_enqueue(rcpu->map_id, processed, drops, to_cpu); trace_xdp_cpumap_enqueue(rcpu->map_id, processed, drops, to_cpu);
return 0;
} }
/* Runs under RCU-read-side, plus in softirq under NAPI protection. /* Runs under RCU-read-side, plus in softirq under NAPI protection.
* Thus, safe percpu variable access. * Thus, safe percpu variable access.
*/ */
static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf) static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
{ {
struct list_head *flush_list = this_cpu_ptr(&cpu_map_flush_list); struct list_head *flush_list = this_cpu_ptr(&cpu_map_flush_list);
struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq); struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq);
@ -728,8 +726,6 @@ static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
if (!bq->flush_node.prev) if (!bq->flush_node.prev)
list_add(&bq->flush_node, flush_list); list_add(&bq->flush_node, flush_list);
return 0;
} }
int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp, int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,

View File

@ -341,14 +341,14 @@ bool dev_map_can_have_prog(struct bpf_map *map)
return false; return false;
} }
static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
{ {
struct net_device *dev = bq->dev; struct net_device *dev = bq->dev;
int sent = 0, drops = 0, err = 0; int sent = 0, drops = 0, err = 0;
int i; int i;
if (unlikely(!bq->count)) if (unlikely(!bq->count))
return 0; return;
for (i = 0; i < bq->count; i++) { for (i = 0; i < bq->count; i++) {
struct xdp_frame *xdpf = bq->q[i]; struct xdp_frame *xdpf = bq->q[i];
@ -369,7 +369,7 @@ out:
trace_xdp_devmap_xmit(bq->dev_rx, dev, sent, drops, err); trace_xdp_devmap_xmit(bq->dev_rx, dev, sent, drops, err);
bq->dev_rx = NULL; bq->dev_rx = NULL;
__list_del_clearprev(&bq->flush_node); __list_del_clearprev(&bq->flush_node);
return 0; return;
error: error:
/* If ndo_xdp_xmit fails with an errno, no frames have been
* xmit'ed and it's our responsibility to free them all.
@ -421,7 +421,7 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
/* Runs under RCU-read-side, plus in softirq under NAPI protection. /* Runs under RCU-read-side, plus in softirq under NAPI protection.
* Thus, safe percpu variable access. * Thus, safe percpu variable access.
*/ */
static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
struct net_device *dev_rx) struct net_device *dev_rx)
{ {
struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
@ -441,8 +441,6 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
if (!bq->flush_node.prev) if (!bq->flush_node.prev)
list_add(&bq->flush_node, flush_list); list_add(&bq->flush_node, flush_list);
return 0;
} }
static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
@ -462,7 +460,8 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
if (unlikely(!xdpf)) if (unlikely(!xdpf))
return -EOVERFLOW; return -EOVERFLOW;
return bq_enqueue(dev, xdpf, dev_rx); bq_enqueue(dev, xdpf, dev_rx);
return 0;
} }
static struct xdp_buff *dev_map_run_prog(struct net_device *dev, static struct xdp_buff *dev_map_run_prog(struct net_device *dev,
@ -751,6 +750,7 @@ static int dev_map_hash_update_elem(struct bpf_map *map, void *key, void *value,
static int dev_map_btf_id; static int dev_map_btf_id;
const struct bpf_map_ops dev_map_ops = { const struct bpf_map_ops dev_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = dev_map_alloc, .map_alloc = dev_map_alloc,
.map_free = dev_map_free, .map_free = dev_map_free,
.map_get_next_key = dev_map_get_next_key, .map_get_next_key = dev_map_get_next_key,
@ -764,6 +764,7 @@ const struct bpf_map_ops dev_map_ops = {
static int dev_map_hash_map_btf_id; static int dev_map_hash_map_btf_id;
const struct bpf_map_ops dev_map_hash_ops = { const struct bpf_map_ops dev_map_hash_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = dev_map_alloc, .map_alloc = dev_map_alloc,
.map_free = dev_map_free, .map_free = dev_map_free,
.map_get_next_key = dev_map_hash_get_next_key, .map_get_next_key = dev_map_hash_get_next_key,

View File

@ -9,6 +9,7 @@
#include <linux/rculist_nulls.h> #include <linux/rculist_nulls.h>
#include <linux/random.h> #include <linux/random.h>
#include <uapi/linux/btf.h> #include <uapi/linux/btf.h>
#include <linux/rcupdate_trace.h>
#include "percpu_freelist.h" #include "percpu_freelist.h"
#include "bpf_lru_list.h" #include "bpf_lru_list.h"
#include "map_in_map.h" #include "map_in_map.h"
@ -577,8 +578,7 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
struct htab_elem *l; struct htab_elem *l;
u32 hash, key_size; u32 hash, key_size;
/* Must be called with rcu_read_lock. */ WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
WARN_ON_ONCE(!rcu_read_lock_held());
key_size = map->key_size; key_size = map->key_size;
@ -941,7 +941,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
/* unknown flags */ /* unknown flags */
return -EINVAL; return -EINVAL;
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size; key_size = map->key_size;
@ -1032,7 +1032,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
/* unknown flags */ /* unknown flags */
return -EINVAL; return -EINVAL;
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size; key_size = map->key_size;
@ -1220,7 +1220,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
u32 hash, key_size; u32 hash, key_size;
int ret = -ENOENT; int ret = -ENOENT;
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size; key_size = map->key_size;
@ -1252,7 +1252,7 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
u32 hash, key_size; u32 hash, key_size;
int ret = -ENOENT; int ret = -ENOENT;
WARN_ON_ONCE(!rcu_read_lock_held()); WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
key_size = map->key_size; key_size = map->key_size;
@ -1810,6 +1810,7 @@ static const struct bpf_iter_seq_info iter_seq_info = {
static int htab_map_btf_id; static int htab_map_btf_id;
const struct bpf_map_ops htab_map_ops = { const struct bpf_map_ops htab_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = htab_map_alloc_check, .map_alloc_check = htab_map_alloc_check,
.map_alloc = htab_map_alloc, .map_alloc = htab_map_alloc,
.map_free = htab_map_free, .map_free = htab_map_free,
@ -1827,6 +1828,7 @@ const struct bpf_map_ops htab_map_ops = {
static int htab_lru_map_btf_id; static int htab_lru_map_btf_id;
const struct bpf_map_ops htab_lru_map_ops = { const struct bpf_map_ops htab_lru_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = htab_map_alloc_check, .map_alloc_check = htab_map_alloc_check,
.map_alloc = htab_map_alloc, .map_alloc = htab_map_alloc,
.map_free = htab_map_free, .map_free = htab_map_free,
@ -1947,6 +1949,7 @@ static void htab_percpu_map_seq_show_elem(struct bpf_map *map, void *key,
static int htab_percpu_map_btf_id; static int htab_percpu_map_btf_id;
const struct bpf_map_ops htab_percpu_map_ops = { const struct bpf_map_ops htab_percpu_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = htab_map_alloc_check, .map_alloc_check = htab_map_alloc_check,
.map_alloc = htab_map_alloc, .map_alloc = htab_map_alloc,
.map_free = htab_map_free, .map_free = htab_map_free,
@ -1963,6 +1966,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
static int htab_lru_percpu_map_btf_id; static int htab_lru_percpu_map_btf_id;
const struct bpf_map_ops htab_lru_percpu_map_ops = { const struct bpf_map_ops htab_lru_percpu_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = htab_map_alloc_check, .map_alloc_check = htab_map_alloc_check,
.map_alloc = htab_map_alloc, .map_alloc = htab_map_alloc,
.map_free = htab_map_free, .map_free = htab_map_free,

View File

@ -601,6 +601,28 @@ const struct bpf_func_proto bpf_event_output_data_proto = {
.arg5_type = ARG_CONST_SIZE_OR_ZERO, .arg5_type = ARG_CONST_SIZE_OR_ZERO,
}; };
BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size,
const void __user *, user_ptr)
{
int ret = copy_from_user(dst, user_ptr, size);
if (unlikely(ret)) {
memset(dst, 0, size);
ret = -EFAULT;
}
return ret;
}
const struct bpf_func_proto bpf_copy_from_user_proto = {
.func = bpf_copy_from_user,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
.arg3_type = ARG_ANYTHING,
};
const struct bpf_func_proto bpf_get_current_task_proto __weak; const struct bpf_func_proto bpf_get_current_task_proto __weak;
const struct bpf_func_proto bpf_probe_read_user_proto __weak; const struct bpf_func_proto bpf_probe_read_user_proto __weak;
const struct bpf_func_proto bpf_probe_read_user_str_proto __weak; const struct bpf_func_proto bpf_probe_read_user_str_proto __weak;
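The bpf_copy_from_user() helper added here is only usable from sleepable programs, since the copy may fault and sleep. Below is a minimal BPF-side sketch of how it might be used, assuming libbpf's "fentry.s/" section convention for sleepable tracing programs; the attach target (do_sys_openat2), the 64-byte buffer and the bpf_printk() call are illustrative assumptions, not part of this diff:

        // SPDX-License-Identifier: GPL-2.0
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_tracing.h>

        char LICENSE[] SEC("license") = "GPL";

        /* "fentry.s/" marks the program as sleepable (BPF_F_SLEEPABLE),
         * which is what allows calling bpf_copy_from_user() at all.
         */
        SEC("fentry.s/do_sys_openat2")
        int BPF_PROG(trace_openat2, int dfd, const char *filename)
        {
                char buf[64] = {};

                /* Unlike bpf_probe_read_user(), this copy may fault in the
                 * user page because the program is allowed to sleep.
                 */
                bpf_copy_from_user(buf, sizeof(buf), filename);
                bpf_printk("openat2: %s\n", buf);
                return 0;
        }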

View File

@ -20,6 +20,7 @@
#include <linux/filter.h> #include <linux/filter.h>
#include <linux/bpf.h> #include <linux/bpf.h>
#include <linux/bpf_trace.h> #include <linux/bpf_trace.h>
#include "preload/bpf_preload.h"
enum bpf_type { enum bpf_type {
BPF_TYPE_UNSPEC = 0, BPF_TYPE_UNSPEC = 0,
@ -369,9 +370,10 @@ static struct dentry *
bpf_lookup(struct inode *dir, struct dentry *dentry, unsigned flags) bpf_lookup(struct inode *dir, struct dentry *dentry, unsigned flags)
{ {
/* Dots in names (e.g. "/sys/fs/bpf/foo.bar") are reserved for future /* Dots in names (e.g. "/sys/fs/bpf/foo.bar") are reserved for future
* extensions. * extensions. That allows populate_bpffs() to create special files.
*/ */
if (strchr(dentry->d_name.name, '.')) if ((dir->i_mode & S_IALLUGO) &&
strchr(dentry->d_name.name, '.'))
return ERR_PTR(-EPERM); return ERR_PTR(-EPERM);
return simple_lookup(dir, dentry, flags); return simple_lookup(dir, dentry, flags);
@ -409,6 +411,27 @@ static const struct inode_operations bpf_dir_iops = {
.unlink = simple_unlink, .unlink = simple_unlink,
}; };
/* pin iterator link into bpffs */
static int bpf_iter_link_pin_kernel(struct dentry *parent,
const char *name, struct bpf_link *link)
{
umode_t mode = S_IFREG | S_IRUSR;
struct dentry *dentry;
int ret;
inode_lock(parent->d_inode);
dentry = lookup_one_len(name, parent, strlen(name));
if (IS_ERR(dentry)) {
inode_unlock(parent->d_inode);
return PTR_ERR(dentry);
}
ret = bpf_mkobj_ops(dentry, mode, link, &bpf_link_iops,
&bpf_iter_fops);
dput(dentry);
inode_unlock(parent->d_inode);
return ret;
}
static int bpf_obj_do_pin(const char __user *pathname, void *raw, static int bpf_obj_do_pin(const char __user *pathname, void *raw,
enum bpf_type type) enum bpf_type type)
{ {
@ -638,6 +661,91 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
return 0; return 0;
} }
struct bpf_preload_ops *bpf_preload_ops;
EXPORT_SYMBOL_GPL(bpf_preload_ops);
static bool bpf_preload_mod_get(void)
{
/* If bpf_preload.ko wasn't loaded earlier then load it now.
* When bpf_preload is built into vmlinux the module's __init
* function will populate it.
*/
if (!bpf_preload_ops) {
request_module("bpf_preload");
if (!bpf_preload_ops)
return false;
}
/* And grab the reference, so the module doesn't disappear while the
* kernel is interacting with the kernel module and its UMD.
*/
if (!try_module_get(bpf_preload_ops->owner)) {
pr_err("bpf_preload module get failed.\n");
return false;
}
return true;
}
static void bpf_preload_mod_put(void)
{
if (bpf_preload_ops)
/* now user can "rmmod bpf_preload" if necessary */
module_put(bpf_preload_ops->owner);
}
static DEFINE_MUTEX(bpf_preload_lock);
static int populate_bpffs(struct dentry *parent)
{
struct bpf_preload_info objs[BPF_PRELOAD_LINKS] = {};
struct bpf_link *links[BPF_PRELOAD_LINKS] = {};
int err = 0, i;
/* grab the mutex to make sure the kernel interactions with bpf_preload
* UMD are serialized
*/
mutex_lock(&bpf_preload_lock);
/* if bpf_preload.ko wasn't built into vmlinux then load it */
if (!bpf_preload_mod_get())
goto out;
if (!bpf_preload_ops->info.tgid) {
/* preload() will start UMD that will load BPF iterator programs */
err = bpf_preload_ops->preload(objs);
if (err)
goto out_put;
for (i = 0; i < BPF_PRELOAD_LINKS; i++) {
links[i] = bpf_link_by_id(objs[i].link_id);
if (IS_ERR(links[i])) {
err = PTR_ERR(links[i]);
goto out_put;
}
}
for (i = 0; i < BPF_PRELOAD_LINKS; i++) {
err = bpf_iter_link_pin_kernel(parent,
objs[i].link_name, links[i]);
if (err)
goto out_put;
/* do not unlink successfully pinned links even
* if later link fails to pin
*/
links[i] = NULL;
}
/* finish() will tell UMD process to exit */
err = bpf_preload_ops->finish();
if (err)
goto out_put;
}
out_put:
bpf_preload_mod_put();
out:
mutex_unlock(&bpf_preload_lock);
for (i = 0; i < BPF_PRELOAD_LINKS && err; i++)
if (!IS_ERR_OR_NULL(links[i]))
bpf_link_put(links[i]);
return err;
}
static int bpf_fill_super(struct super_block *sb, struct fs_context *fc) static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
{ {
static const struct tree_descr bpf_rfiles[] = { { "" } }; static const struct tree_descr bpf_rfiles[] = { { "" } };
@ -654,8 +762,8 @@ static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
inode = sb->s_root->d_inode; inode = sb->s_root->d_inode;
inode->i_op = &bpf_dir_iops; inode->i_op = &bpf_dir_iops;
inode->i_mode &= ~S_IALLUGO; inode->i_mode &= ~S_IALLUGO;
populate_bpffs(sb->s_root);
inode->i_mode |= S_ISVTX | opts->mode; inode->i_mode |= S_ISVTX | opts->mode;
return 0; return 0;
} }
@ -705,6 +813,8 @@ static int __init bpf_init(void)
{ {
int ret; int ret;
mutex_init(&bpf_preload_lock);
ret = sysfs_create_mount_point(fs_kobj, "bpf"); ret = sysfs_create_mount_point(fs_kobj, "bpf");
if (ret) if (ret)
return ret; return ret;

View File

@ -732,6 +732,7 @@ static int trie_check_btf(const struct bpf_map *map,
static int trie_map_btf_id; static int trie_map_btf_id;
const struct bpf_map_ops trie_map_ops = { const struct bpf_map_ops trie_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = trie_alloc, .map_alloc = trie_alloc,
.map_free = trie_free, .map_free = trie_free,
.map_get_next_key = trie_get_next_key, .map_get_next_key = trie_get_next_key,

View File

@ -17,23 +17,17 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
if (IS_ERR(inner_map)) if (IS_ERR(inner_map))
return inner_map; return inner_map;
/* prog_array->aux->{type,jited} is a runtime binding.
* Doing static check alone in the verifier is not enough.
*/
if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE ||
inner_map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE ||
inner_map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
fdput(f);
return ERR_PTR(-ENOTSUPP);
}
/* Does not support >1 level map-in-map */ /* Does not support >1 level map-in-map */
if (inner_map->inner_map_meta) { if (inner_map->inner_map_meta) {
fdput(f); fdput(f);
return ERR_PTR(-EINVAL); return ERR_PTR(-EINVAL);
} }
if (!inner_map->ops->map_meta_equal) {
fdput(f);
return ERR_PTR(-ENOTSUPP);
}
if (map_value_has_spin_lock(inner_map)) { if (map_value_has_spin_lock(inner_map)) {
fdput(f); fdput(f);
return ERR_PTR(-ENOTSUPP); return ERR_PTR(-ENOTSUPP);
@ -81,15 +75,14 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
return meta0->map_type == meta1->map_type && return meta0->map_type == meta1->map_type &&
meta0->key_size == meta1->key_size && meta0->key_size == meta1->key_size &&
meta0->value_size == meta1->value_size && meta0->value_size == meta1->value_size &&
meta0->map_flags == meta1->map_flags && meta0->map_flags == meta1->map_flags;
meta0->max_entries == meta1->max_entries;
} }
void *bpf_map_fd_get_ptr(struct bpf_map *map, void *bpf_map_fd_get_ptr(struct bpf_map *map,
struct file *map_file /* not used */, struct file *map_file /* not used */,
int ufd) int ufd)
{ {
struct bpf_map *inner_map; struct bpf_map *inner_map, *inner_map_meta;
struct fd f; struct fd f;
f = fdget(ufd); f = fdget(ufd);
@ -97,7 +90,8 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
if (IS_ERR(inner_map)) if (IS_ERR(inner_map))
return inner_map; return inner_map;
if (bpf_map_meta_equal(map->inner_map_meta, inner_map)) inner_map_meta = map->inner_map_meta;
if (inner_map_meta->ops->map_meta_equal(inner_map_meta, inner_map))
bpf_map_inc(inner_map); bpf_map_inc(inner_map);
else else
inner_map = ERR_PTR(-EINVAL); inner_map = ERR_PTR(-EINVAL);
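Dropping the max_entries comparison from the generic bpf_map_meta_equal() above means inner maps of most types only have to match the outer template in type, key/value size and flags. Below is a minimal sketch of a hash-of-maps declaration that relies on this, using libbpf's BTF-defined map-in-map syntax; the names and sizes are illustrative, and array-type inner maps are the notable exception since their own map_meta_equal callback still compares max_entries:

        // SPDX-License-Identifier: GPL-2.0
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        /* Template for the inner maps. Hash maps inserted at runtime may
         * use a different max_entries, because the relaxed
         * bpf_map_meta_equal() no longer compares it.
         */
        struct inner_hash {
                __uint(type, BPF_MAP_TYPE_HASH);
                __uint(max_entries, 1);
                __type(key, __u32);
                __type(value, __u64);
        };

        struct {
                __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
                __uint(max_entries, 16);
                __type(key, __u32);
                __array(values, struct inner_hash);
        } outer_map SEC(".maps");

        char LICENSE[] SEC("license") = "GPL";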

View File

@ -11,8 +11,6 @@ struct bpf_map;
struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd); struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd);
void bpf_map_meta_free(struct bpf_map *map_meta); void bpf_map_meta_free(struct bpf_map *map_meta);
bool bpf_map_meta_equal(const struct bpf_map *meta0,
const struct bpf_map *meta1);
void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file, void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file,
int ufd); int ufd);
void bpf_map_fd_put_ptr(void *ptr); void bpf_map_fd_put_ptr(void *ptr);

View File

@ -149,6 +149,19 @@ static void bpf_iter_detach_map(struct bpf_iter_aux_info *aux)
bpf_map_put_with_uref(aux->map); bpf_map_put_with_uref(aux->map);
} }
void bpf_iter_map_show_fdinfo(const struct bpf_iter_aux_info *aux,
struct seq_file *seq)
{
seq_printf(seq, "map_id:\t%u\n", aux->map->id);
}
int bpf_iter_map_fill_link_info(const struct bpf_iter_aux_info *aux,
struct bpf_link_info *info)
{
info->iter.map.map_id = aux->map->id;
return 0;
}
DEFINE_BPF_ITER_FUNC(bpf_map_elem, struct bpf_iter_meta *meta, DEFINE_BPF_ITER_FUNC(bpf_map_elem, struct bpf_iter_meta *meta,
struct bpf_map *map, void *key, void *value) struct bpf_map *map, void *key, void *value)
@ -156,6 +169,8 @@ static const struct bpf_iter_reg bpf_map_elem_reg_info = {
.target = "bpf_map_elem", .target = "bpf_map_elem",
.attach_target = bpf_iter_attach_map, .attach_target = bpf_iter_attach_map,
.detach_target = bpf_iter_detach_map, .detach_target = bpf_iter_detach_map,
.show_fdinfo = bpf_iter_map_show_fdinfo,
.fill_link_info = bpf_iter_map_fill_link_info,
.ctx_arg_info_size = 2, .ctx_arg_info_size = 2,
.ctx_arg_info = { .ctx_arg_info = {
{ offsetof(struct bpf_iter__bpf_map_elem, key), { offsetof(struct bpf_iter__bpf_map_elem, key),
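The show_fdinfo/fill_link_info callbacks added here expose the target map of a bpf_map_elem iterator link through /proc/<pid>/fdinfo and BPF_OBJ_GET_INFO_BY_FD. Below is a minimal userspace sketch of querying it, assuming link_fd is an iterator link FD obtained elsewhere (for example from bpf_link_create()); the function name and error handling are illustrative:

        // SPDX-License-Identifier: GPL-2.0
        #include <stdio.h>
        #include <string.h>
        #include <linux/bpf.h>
        #include <bpf/bpf.h>

        /* Print the ID of the map backing a bpf_map_elem iterator link. */
        static void show_iter_map_id(int link_fd)
        {
                struct bpf_link_info info;
                __u32 len = sizeof(info);

                memset(&info, 0, sizeof(info));
                if (bpf_obj_get_info_by_fd(link_fd, &info, &len))
                        return;
                if (info.type == BPF_LINK_TYPE_ITER)
                        printf("map_id: %u\n", info.iter.map.map_id);
        }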

View File

@ -0,0 +1,26 @@
# SPDX-License-Identifier: GPL-2.0-only
config USERMODE_DRIVER
bool
default n
menuconfig BPF_PRELOAD
bool "Preload BPF file system with kernel specific program and map iterators"
depends on BPF
# The dependency on !COMPILE_TEST prevents it from being enabled
# in allmodconfig or allyesconfig configurations
depends on !COMPILE_TEST
select USERMODE_DRIVER
help
This builds a kernel module with several embedded BPF programs that are
pinned into the BPF FS mount point as human-readable files that are
useful for debugging and introspection of BPF programs and maps.
if BPF_PRELOAD
config BPF_PRELOAD_UMD
tristate "bpf_preload kernel module with user mode driver"
depends on CC_CAN_LINK
depends on m || CC_CAN_LINK_STATIC
default m
help
This builds the bpf_preload kernel module with an embedded user mode driver.
endif

View File

@ -0,0 +1,23 @@
# SPDX-License-Identifier: GPL-2.0
LIBBPF_SRCS = $(srctree)/tools/lib/bpf/
LIBBPF_A = $(obj)/libbpf.a
LIBBPF_OUT = $(abspath $(obj))
$(LIBBPF_A):
$(Q)$(MAKE) -C $(LIBBPF_SRCS) OUTPUT=$(LIBBPF_OUT)/ $(LIBBPF_OUT)/libbpf.a
userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi \
-I $(srctree)/tools/lib/ -Wno-unused-result
userprogs := bpf_preload_umd
bpf_preload_umd-objs := iterators/iterators.o
bpf_preload_umd-userldlibs := $(LIBBPF_A) -lelf -lz
$(obj)/bpf_preload_umd: $(LIBBPF_A)
$(obj)/bpf_preload_umd_blob.o: $(obj)/bpf_preload_umd
obj-$(CONFIG_BPF_PRELOAD_UMD) += bpf_preload.o
bpf_preload-objs += bpf_preload_kern.o bpf_preload_umd_blob.o

View File

@ -0,0 +1,16 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _BPF_PRELOAD_H
#define _BPF_PRELOAD_H
#include <linux/usermode_driver.h>
#include "iterators/bpf_preload_common.h"
struct bpf_preload_ops {
struct umd_info info;
int (*preload)(struct bpf_preload_info *);
int (*finish)(void);
struct module *owner;
};
extern struct bpf_preload_ops *bpf_preload_ops;
#define BPF_PRELOAD_LINKS 2
#endif

View File

@ -0,0 +1,91 @@
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/init.h>
#include <linux/module.h>
#include <linux/pid.h>
#include <linux/fs.h>
#include <linux/sched/signal.h>
#include "bpf_preload.h"
extern char bpf_preload_umd_start;
extern char bpf_preload_umd_end;
static int preload(struct bpf_preload_info *obj);
static int finish(void);
static struct bpf_preload_ops umd_ops = {
.info.driver_name = "bpf_preload",
.preload = preload,
.finish = finish,
.owner = THIS_MODULE,
};
static int preload(struct bpf_preload_info *obj)
{
int magic = BPF_PRELOAD_START;
loff_t pos = 0;
int i, err;
ssize_t n;
err = fork_usermode_driver(&umd_ops.info);
if (err)
return err;
/* send the start magic to let UMD proceed with loading BPF progs */
n = kernel_write(umd_ops.info.pipe_to_umh,
&magic, sizeof(magic), &pos);
if (n != sizeof(magic))
return -EPIPE;
/* receive bpf_link IDs and names from UMD */
pos = 0;
for (i = 0; i < BPF_PRELOAD_LINKS; i++) {
n = kernel_read(umd_ops.info.pipe_from_umh,
&obj[i], sizeof(*obj), &pos);
if (n != sizeof(*obj))
return -EPIPE;
}
return 0;
}
static int finish(void)
{
int magic = BPF_PRELOAD_END;
struct pid *tgid;
loff_t pos = 0;
ssize_t n;
/* send the last magic to UMD. It will do a normal exit. */
n = kernel_write(umd_ops.info.pipe_to_umh,
&magic, sizeof(magic), &pos);
if (n != sizeof(magic))
return -EPIPE;
tgid = umd_ops.info.tgid;
wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
umd_ops.info.tgid = NULL;
return 0;
}
static int __init load_umd(void)
{
int err;
err = umd_load_blob(&umd_ops.info, &bpf_preload_umd_start,
&bpf_preload_umd_end - &bpf_preload_umd_start);
if (err)
return err;
bpf_preload_ops = &umd_ops;
return err;
}
static void __exit fini_umd(void)
{
bpf_preload_ops = NULL;
/* kill UMD in case it's still there due to earlier error */
kill_pid(umd_ops.info.tgid, SIGKILL, 1);
umd_ops.info.tgid = NULL;
umd_unload_blob(&umd_ops.info);
}
late_initcall(load_umd);
module_exit(fini_umd);
MODULE_LICENSE("GPL");

View File

@ -0,0 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
.section .init.rodata, "a"
.global bpf_preload_umd_start
bpf_preload_umd_start:
.incbin "kernel/bpf/preload/bpf_preload_umd"
.global bpf_preload_umd_end
bpf_preload_umd_end:

View File

@ -0,0 +1,2 @@
# SPDX-License-Identifier: GPL-2.0-only
/.output

View File

@ -0,0 +1,57 @@
# SPDX-License-Identifier: GPL-2.0
OUTPUT := .output
CLANG ?= clang
LLC ?= llc
LLVM_STRIP ?= llvm-strip
DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool
BPFTOOL ?= $(DEFAULT_BPFTOOL)
LIBBPF_SRC := $(abspath ../../../../tools/lib/bpf)
BPFOBJ := $(OUTPUT)/libbpf.a
BPF_INCLUDE := $(OUTPUT)
INCLUDES := -I$(OUTPUT) -I$(BPF_INCLUDE) -I$(abspath ../../../../tools/lib) \
-I$(abspath ../../../../tools/include/uapi)
CFLAGS := -g -Wall
abs_out := $(abspath $(OUTPUT))
ifeq ($(V),1)
Q =
msg =
else
Q = @
msg = @printf ' %-8s %s%s\n' "$(1)" "$(notdir $(2))" "$(if $(3), $(3))";
MAKEFLAGS += --no-print-directory
submake_extras := feature_display=0
endif
.DELETE_ON_ERROR:
.PHONY: all clean
all: iterators.skel.h
clean:
$(call msg,CLEAN)
$(Q)rm -rf $(OUTPUT) iterators
iterators.skel.h: $(OUTPUT)/iterators.bpf.o | $(BPFTOOL)
$(call msg,GEN-SKEL,$@)
$(Q)$(BPFTOOL) gen skeleton $< > $@
$(OUTPUT)/iterators.bpf.o: iterators.bpf.c $(BPFOBJ) | $(OUTPUT)
$(call msg,BPF,$@)
$(Q)$(CLANG) -g -O2 -target bpf $(INCLUDES) \
-c $(filter %.c,$^) -o $@ && \
$(LLVM_STRIP) -g $@
$(OUTPUT):
$(call msg,MKDIR,$@)
$(Q)mkdir -p $(OUTPUT)
$(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT)
$(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) \
OUTPUT=$(abspath $(dir $@))/ $(abspath $@)
$(DEFAULT_BPFTOOL):
$(Q)$(MAKE) $(submake_extras) -C ../../../../tools/bpf/bpftool \
prefix= OUTPUT=$(abs_out)/ DESTDIR=$(abs_out) install

View File

@ -0,0 +1,4 @@
WARNING:
If you change "iterators.bpf.c", do "make -j" in this directory to rebuild "iterators.skel.h".
Make sure to have clang 10 installed.
See Documentation/bpf/bpf_devel_QA.rst

View File

@ -0,0 +1,13 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _BPF_PRELOAD_COMMON_H
#define _BPF_PRELOAD_COMMON_H
#define BPF_PRELOAD_START 0x5555
#define BPF_PRELOAD_END 0xAAAA
struct bpf_preload_info {
char link_name[16];
int link_id;
};
#endif

View File

@ -0,0 +1,114 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
struct seq_file;
struct bpf_iter_meta {
struct seq_file *seq;
__u64 session_id;
__u64 seq_num;
};
struct bpf_map {
__u32 id;
char name[16];
__u32 max_entries;
};
struct bpf_iter__bpf_map {
struct bpf_iter_meta *meta;
struct bpf_map *map;
};
struct btf_type {
__u32 name_off;
};
struct btf_header {
__u32 str_len;
};
struct btf {
const char *strings;
struct btf_type **types;
struct btf_header hdr;
};
struct bpf_prog_aux {
__u32 id;
char name[16];
const char *attach_func_name;
struct bpf_prog *linked_prog;
struct bpf_func_info *func_info;
struct btf *btf;
};
struct bpf_prog {
struct bpf_prog_aux *aux;
};
struct bpf_iter__bpf_prog {
struct bpf_iter_meta *meta;
struct bpf_prog *prog;
};
#pragma clang attribute pop
static const char *get_name(struct btf *btf, long btf_id, const char *fallback)
{
struct btf_type **types, *t;
unsigned int name_off;
const char *str;
if (!btf)
return fallback;
str = btf->strings;
types = btf->types;
bpf_probe_read_kernel(&t, sizeof(t), types + btf_id);
name_off = BPF_CORE_READ(t, name_off);
if (name_off >= btf->hdr.str_len)
return fallback;
return str + name_off;
}
SEC("iter/bpf_map")
int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
{
struct seq_file *seq = ctx->meta->seq;
__u64 seq_num = ctx->meta->seq_num;
struct bpf_map *map = ctx->map;
if (!map)
return 0;
if (seq_num == 0)
BPF_SEQ_PRINTF(seq, " id name max_entries\n");
BPF_SEQ_PRINTF(seq, "%4u %-16s%6d\n", map->id, map->name, map->max_entries);
return 0;
}
SEC("iter/bpf_prog")
int dump_bpf_prog(struct bpf_iter__bpf_prog *ctx)
{
struct seq_file *seq = ctx->meta->seq;
__u64 seq_num = ctx->meta->seq_num;
struct bpf_prog *prog = ctx->prog;
struct bpf_prog_aux *aux;
if (!prog)
return 0;
aux = prog->aux;
if (seq_num == 0)
BPF_SEQ_PRINTF(seq, " id name attached\n");
BPF_SEQ_PRINTF(seq, "%4u %-16s %s %s\n", aux->id,
get_name(aux->btf, aux->func_info[0].type_id, aux->name),
aux->attach_func_name, aux->linked_prog->aux->name);
return 0;
}
char LICENSE[] SEC("license") = "GPL";
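Once populate_bpffs() has pinned the two iterator links as maps.debug and progs.debug, reading those files runs dump_bpf_map()/dump_bpf_prog() above for every loaded map and program. Below is a minimal sketch of consuming one of them, assuming the conventional /sys/fs/bpf mount point:

        // SPDX-License-Identifier: GPL-2.0
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
                char buf[4096];
                ssize_t n;
                int fd = open("/sys/fs/bpf/maps.debug", O_RDONLY);

                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                /* Each open()+read() cycle runs the pinned dump_bpf_map
                 * iterator and streams its seq_file output.
                 */
                while ((n = read(fd, buf, sizeof(buf))) > 0)
                        fwrite(buf, 1, n, stdout);
                close(fd);
                return 0;
        }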

View File

@ -0,0 +1,94 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#include <argp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
#include <sys/mount.h>
#include "iterators.skel.h"
#include "bpf_preload_common.h"
int to_kernel = -1;
int from_kernel = 0;
static int send_link_to_kernel(struct bpf_link *link, const char *link_name)
{
struct bpf_preload_info obj = {};
struct bpf_link_info info = {};
__u32 info_len = sizeof(info);
int err;
err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
if (err)
return err;
obj.link_id = info.id;
if (strlen(link_name) >= sizeof(obj.link_name))
return -E2BIG;
strcpy(obj.link_name, link_name);
if (write(to_kernel, &obj, sizeof(obj)) != sizeof(obj))
return -EPIPE;
return 0;
}
int main(int argc, char **argv)
{
struct rlimit rlim = { RLIM_INFINITY, RLIM_INFINITY };
struct iterators_bpf *skel;
int err, magic;
int debug_fd;
debug_fd = open("/dev/console", O_WRONLY | O_NOCTTY | O_CLOEXEC);
if (debug_fd < 0)
return 1;
to_kernel = dup(1);
close(1);
dup(debug_fd);
/* now stdout points to /dev/console */
read(from_kernel, &magic, sizeof(magic));
if (magic != BPF_PRELOAD_START) {
printf("bad start magic %d\n", magic);
return 1;
}
setrlimit(RLIMIT_MEMLOCK, &rlim);
/* libbpf opens BPF object and loads it into the kernel */
skel = iterators_bpf__open_and_load();
if (!skel) {
/* iterators.skel.h is little endian.
* libbpf doesn't support automatic little->big conversion
* of BPF bytecode yet.
* The program load will fail in such a case.
*/
printf("Failed load could be due to wrong endianness\n");
return 1;
}
err = iterators_bpf__attach(skel);
if (err)
goto cleanup;
/* send two bpf_link IDs with names to the kernel */
err = send_link_to_kernel(skel->links.dump_bpf_map, "maps.debug");
if (err)
goto cleanup;
err = send_link_to_kernel(skel->links.dump_bpf_prog, "progs.debug");
if (err)
goto cleanup;
/* The kernel will proceed with pinning the links in bpffs.
* UMD will wait on read from pipe.
*/
read(from_kernel, &magic, sizeof(magic));
if (magic != BPF_PRELOAD_END) {
printf("bad final magic %d\n", magic);
err = -EINVAL;
}
cleanup:
iterators_bpf__destroy(skel);
return err != 0;
}

View File

@ -0,0 +1,410 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* THIS FILE IS AUTOGENERATED! */
#ifndef __ITERATORS_BPF_SKEL_H__
#define __ITERATORS_BPF_SKEL_H__
#include <stdlib.h>
#include <bpf/libbpf.h>
struct iterators_bpf {
struct bpf_object_skeleton *skeleton;
struct bpf_object *obj;
struct {
struct bpf_map *rodata;
} maps;
struct {
struct bpf_program *dump_bpf_map;
struct bpf_program *dump_bpf_prog;
} progs;
struct {
struct bpf_link *dump_bpf_map;
struct bpf_link *dump_bpf_prog;
} links;
struct iterators_bpf__rodata {
char dump_bpf_map____fmt[35];
char dump_bpf_map____fmt_1[14];
char dump_bpf_prog____fmt[32];
char dump_bpf_prog____fmt_2[17];
} *rodata;
};
static void
iterators_bpf__destroy(struct iterators_bpf *obj)
{
if (!obj)
return;
if (obj->skeleton)
bpf_object__destroy_skeleton(obj->skeleton);
free(obj);
}
static inline int
iterators_bpf__create_skeleton(struct iterators_bpf *obj);
static inline struct iterators_bpf *
iterators_bpf__open_opts(const struct bpf_object_open_opts *opts)
{
struct iterators_bpf *obj;
obj = (typeof(obj))calloc(1, sizeof(*obj));
if (!obj)
return NULL;
if (iterators_bpf__create_skeleton(obj))
goto err;
if (bpf_object__open_skeleton(obj->skeleton, opts))
goto err;
return obj;
err:
iterators_bpf__destroy(obj);
return NULL;
}
static inline struct iterators_bpf *
iterators_bpf__open(void)
{
return iterators_bpf__open_opts(NULL);
}
static inline int
iterators_bpf__load(struct iterators_bpf *obj)
{
return bpf_object__load_skeleton(obj->skeleton);
}
static inline struct iterators_bpf *
iterators_bpf__open_and_load(void)
{
struct iterators_bpf *obj;
obj = iterators_bpf__open();
if (!obj)
return NULL;
if (iterators_bpf__load(obj)) {
iterators_bpf__destroy(obj);
return NULL;
}
return obj;
}
static inline int
iterators_bpf__attach(struct iterators_bpf *obj)
{
return bpf_object__attach_skeleton(obj->skeleton);
}
static inline void
iterators_bpf__detach(struct iterators_bpf *obj)
{
return bpf_object__detach_skeleton(obj->skeleton);
}
static inline int
iterators_bpf__create_skeleton(struct iterators_bpf *obj)
{
struct bpf_object_skeleton *s;
s = (typeof(s))calloc(1, sizeof(*s));
if (!s)
return -1;
obj->skeleton = s;
s->sz = sizeof(*s);
s->name = "iterators_bpf";
s->obj = &obj->obj;
/* maps */
s->map_cnt = 1;
s->map_skel_sz = sizeof(*s->maps);
s->maps = (typeof(s->maps))calloc(s->map_cnt, s->map_skel_sz);
if (!s->maps)
goto err;
s->maps[0].name = "iterator.rodata";
s->maps[0].map = &obj->maps.rodata;
s->maps[0].mmaped = (void **)&obj->rodata;
/* programs */
s->prog_cnt = 2;
s->prog_skel_sz = sizeof(*s->progs);
s->progs = (typeof(s->progs))calloc(s->prog_cnt, s->prog_skel_sz);
if (!s->progs)
goto err;
s->progs[0].name = "dump_bpf_map";
s->progs[0].prog = &obj->progs.dump_bpf_map;
s->progs[0].link = &obj->links.dump_bpf_map;
s->progs[1].name = "dump_bpf_prog";
s->progs[1].prog = &obj->progs.dump_bpf_prog;
s->progs[1].link = &obj->links.dump_bpf_prog;
s->data_sz = 7128;
s->data = (void *)"\
\x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\x18\x18\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0f\0\
\x0e\0\x79\x12\0\0\0\0\0\0\x79\x26\0\0\0\0\0\0\x79\x17\x08\0\0\0\0\0\x15\x07\
\x1a\0\0\0\0\0\x79\x21\x10\0\0\0\0\0\x55\x01\x08\0\0\0\0\0\xbf\xa4\0\0\0\0\0\0\
\x07\x04\0\0\xe8\xff\xff\xff\xbf\x61\0\0\0\0\0\0\x18\x02\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\xb7\x03\0\0\x23\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x7e\0\0\0\x61\x71\0\
\0\0\0\0\0\x7b\x1a\xe8\xff\0\0\0\0\xb7\x01\0\0\x04\0\0\0\xbf\x72\0\0\0\0\0\0\
\x0f\x12\0\0\0\0\0\0\x7b\x2a\xf0\xff\0\0\0\0\x61\x71\x14\0\0\0\0\0\x7b\x1a\xf8\
\xff\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xe8\xff\xff\xff\xbf\x61\0\0\0\0\0\
\0\x18\x02\0\0\x23\0\0\0\0\0\0\0\0\0\0\0\xb7\x03\0\0\x0e\0\0\0\xb7\x05\0\0\x18\
\0\0\0\x85\0\0\0\x7e\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\x79\x12\0\0\0\0\
\0\0\x79\x26\0\0\0\0\0\0\x79\x11\x08\0\0\0\0\0\x15\x01\x3b\0\0\0\0\0\x79\x17\0\
\0\0\0\0\0\x79\x21\x10\0\0\0\0\0\x55\x01\x08\0\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\
\x04\0\0\xd0\xff\xff\xff\xbf\x61\0\0\0\0\0\0\x18\x02\0\0\x31\0\0\0\0\0\0\0\0\0\
\0\0\xb7\x03\0\0\x20\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x7e\0\0\0\x7b\x6a\xc8\
\xff\0\0\0\0\x61\x71\0\0\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xb7\x03\0\0\x04\0\0\0\
\xbf\x79\0\0\0\0\0\0\x0f\x39\0\0\0\0\0\0\x79\x71\x28\0\0\0\0\0\x79\x78\x30\0\0\
\0\0\0\x15\x08\x18\0\0\0\0\0\xb7\x02\0\0\0\0\0\0\x0f\x21\0\0\0\0\0\0\x61\x11\
\x04\0\0\0\0\0\x79\x83\x08\0\0\0\0\0\x67\x01\0\0\x03\0\0\0\x0f\x13\0\0\0\0\0\0\
\x79\x86\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\xf8\xff\xff\xff\xb7\x02\0\
\0\x08\0\0\0\x85\0\0\0\x71\0\0\0\xb7\x01\0\0\0\0\0\0\x79\xa3\xf8\xff\0\0\0\0\
\x0f\x13\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\xf4\xff\xff\xff\xb7\x02\0\
\0\x04\0\0\0\x85\0\0\0\x04\0\0\0\xb7\x03\0\0\x04\0\0\0\x61\xa1\xf4\xff\0\0\0\0\
\x61\x82\x10\0\0\0\0\0\x3d\x21\x02\0\0\0\0\0\x0f\x16\0\0\0\0\0\0\xbf\x69\0\0\0\
\0\0\0\x7b\x9a\xd8\xff\0\0\0\0\x79\x71\x18\0\0\0\0\0\x7b\x1a\xe0\xff\0\0\0\0\
\x79\x71\x20\0\0\0\0\0\x79\x11\0\0\0\0\0\0\x0f\x31\0\0\0\0\0\0\x7b\x1a\xe8\xff\
\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xd0\xff\xff\xff\x79\xa1\xc8\xff\0\0\0\
\0\x18\x02\0\0\x51\0\0\0\0\0\0\0\0\0\0\0\xb7\x03\0\0\x11\0\0\0\xb7\x05\0\0\x20\
\0\0\0\x85\0\0\0\x7e\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\x20\x20\x69\x64\
\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x6d\
\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\x0a\0\x25\x34\x75\x20\x25\x2d\x31\x36\
\x73\x25\x36\x64\x0a\0\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\
\x20\x20\x20\x20\x20\x20\x20\x20\x61\x74\x74\x61\x63\x68\x65\x64\x0a\0\x25\x34\
\x75\x20\x25\x2d\x31\x36\x73\x20\x25\x73\x20\x25\x73\x0a\0\x47\x50\x4c\0\x9f\
\xeb\x01\0\x18\0\0\0\0\0\0\0\x1c\x04\0\0\x1c\x04\0\0\0\x05\0\0\0\0\0\0\0\0\0\
\x02\x02\0\0\0\x01\0\0\0\x02\0\0\x04\x10\0\0\0\x13\0\0\0\x03\0\0\0\0\0\0\0\x18\
\0\0\0\x04\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\x02\x08\0\0\0\0\0\0\0\0\0\0\x02\x0d\0\
\0\0\0\0\0\0\x01\0\0\x0d\x06\0\0\0\x1c\0\0\0\x01\0\0\0\x20\0\0\0\0\0\0\x01\x04\
\0\0\0\x20\0\0\x01\x24\0\0\0\x01\0\0\x0c\x05\0\0\0\xa3\0\0\0\x03\0\0\x04\x18\0\
\0\0\xb1\0\0\0\x09\0\0\0\0\0\0\0\xb5\0\0\0\x0b\0\0\0\x40\0\0\0\xc0\0\0\0\x0b\0\
\0\0\x80\0\0\0\0\0\0\0\0\0\0\x02\x0a\0\0\0\xc8\0\0\0\0\0\0\x07\0\0\0\0\xd1\0\0\
\0\0\0\0\x08\x0c\0\0\0\xd7\0\0\0\0\0\0\x01\x08\0\0\0\x40\0\0\0\x98\x01\0\0\x03\
\0\0\x04\x18\0\0\0\xa0\x01\0\0\x0e\0\0\0\0\0\0\0\xa3\x01\0\0\x11\0\0\0\x20\0\0\
\0\xa8\x01\0\0\x0e\0\0\0\xa0\0\0\0\xb4\x01\0\0\0\0\0\x08\x0f\0\0\0\xba\x01\0\0\
\0\0\0\x01\x04\0\0\0\x20\0\0\0\xc7\x01\0\0\0\0\0\x01\x01\0\0\0\x08\0\0\x01\0\0\
\0\0\0\0\0\x03\0\0\0\0\x10\0\0\0\x12\0\0\0\x10\0\0\0\xcc\x01\0\0\0\0\0\x01\x04\
\0\0\0\x20\0\0\0\0\0\0\0\0\0\0\x02\x14\0\0\0\x30\x02\0\0\x02\0\0\x04\x10\0\0\0\
\x13\0\0\0\x03\0\0\0\0\0\0\0\x43\x02\0\0\x15\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\x02\
\x18\0\0\0\0\0\0\0\x01\0\0\x0d\x06\0\0\0\x1c\0\0\0\x13\0\0\0\x48\x02\0\0\x01\0\
\0\x0c\x16\0\0\0\x94\x02\0\0\x01\0\0\x04\x08\0\0\0\x9d\x02\0\0\x19\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\x02\x1a\0\0\0\xee\x02\0\0\x06\0\0\x04\x38\0\0\0\xa0\x01\0\0\
\x0e\0\0\0\0\0\0\0\xa3\x01\0\0\x11\0\0\0\x20\0\0\0\xfb\x02\0\0\x1b\0\0\0\xc0\0\
\0\0\x0c\x03\0\0\x15\0\0\0\0\x01\0\0\x18\x03\0\0\x1d\0\0\0\x40\x01\0\0\x22\x03\
\0\0\x1e\0\0\0\x80\x01\0\0\0\0\0\0\0\0\0\x02\x1c\0\0\0\0\0\0\0\0\0\0\x0a\x10\0\
\0\0\0\0\0\0\0\0\0\x02\x1f\0\0\0\0\0\0\0\0\0\0\x02\x20\0\0\0\x6c\x03\0\0\x02\0\
\0\x04\x08\0\0\0\x7a\x03\0\0\x0e\0\0\0\0\0\0\0\x83\x03\0\0\x0e\0\0\0\x20\0\0\0\
\x22\x03\0\0\x03\0\0\x04\x18\0\0\0\x8d\x03\0\0\x1b\0\0\0\0\0\0\0\x95\x03\0\0\
\x21\0\0\0\x40\0\0\0\x9b\x03\0\0\x23\0\0\0\x80\0\0\0\0\0\0\0\0\0\0\x02\x22\0\0\
\0\0\0\0\0\0\0\0\x02\x24\0\0\0\x9f\x03\0\0\x01\0\0\x04\x04\0\0\0\xaa\x03\0\0\
\x0e\0\0\0\0\0\0\0\x13\x04\0\0\x01\0\0\x04\x04\0\0\0\x1c\x04\0\0\x0e\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\x12\0\0\0\x23\0\0\0\x92\x04\0\0\0\0\0\
\x0e\x25\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\x12\0\0\0\x0e\0\0\0\
\xa6\x04\0\0\0\0\0\x0e\x27\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\
\x12\0\0\0\x20\0\0\0\xbc\x04\0\0\0\0\0\x0e\x29\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\
\0\0\0\0\x1c\0\0\0\x12\0\0\0\x11\0\0\0\xd1\x04\0\0\0\0\0\x0e\x2b\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\x03\0\0\0\0\x10\0\0\0\x12\0\0\0\x04\0\0\0\xe8\x04\0\0\0\0\0\x0e\
\x2d\0\0\0\x01\0\0\0\xf0\x04\0\0\x04\0\0\x0f\0\0\0\0\x26\0\0\0\0\0\0\0\x23\0\0\
\0\x28\0\0\0\x23\0\0\0\x0e\0\0\0\x2a\0\0\0\x31\0\0\0\x20\0\0\0\x2c\0\0\0\x51\0\
\0\0\x11\0\0\0\xf8\x04\0\0\x01\0\0\x0f\0\0\0\0\x2e\0\0\0\0\0\0\0\x04\0\0\0\0\
\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x5f\x62\x70\x66\x5f\x6d\x61\x70\0\x6d\x65\
\x74\x61\0\x6d\x61\x70\0\x63\x74\x78\0\x69\x6e\x74\0\x64\x75\x6d\x70\x5f\x62\
\x70\x66\x5f\x6d\x61\x70\0\x69\x74\x65\x72\x2f\x62\x70\x66\x5f\x6d\x61\x70\0\
\x30\x3a\x30\0\x2f\x77\x2f\x6e\x65\x74\x2d\x6e\x65\x78\x74\x2f\x6b\x65\x72\x6e\
\x65\x6c\x2f\x62\x70\x66\x2f\x70\x72\x65\x6c\x6f\x61\x64\x2f\x69\x74\x65\x72\
\x61\x74\x6f\x72\x73\x2f\x69\x74\x65\x72\x61\x74\x6f\x72\x73\x2e\x62\x70\x66\
\x2e\x63\0\x09\x73\x74\x72\x75\x63\x74\x20\x73\x65\x71\x5f\x66\x69\x6c\x65\x20\
\x2a\x73\x65\x71\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x65\x74\x61\x2d\x3e\x73\
\x65\x71\x3b\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x6d\x65\x74\x61\0\x73\x65\
\x71\0\x73\x65\x73\x73\x69\x6f\x6e\x5f\x69\x64\0\x73\x65\x71\x5f\x6e\x75\x6d\0\
\x73\x65\x71\x5f\x66\x69\x6c\x65\0\x5f\x5f\x75\x36\x34\0\x6c\x6f\x6e\x67\x20\
\x6c\x6f\x6e\x67\x20\x75\x6e\x73\x69\x67\x6e\x65\x64\x20\x69\x6e\x74\0\x30\x3a\
\x31\0\x09\x73\x74\x72\x75\x63\x74\x20\x62\x70\x66\x5f\x6d\x61\x70\x20\x2a\x6d\
\x61\x70\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x61\x70\x3b\0\x09\x69\x66\x20\x28\
\x21\x6d\x61\x70\x29\0\x30\x3a\x32\0\x09\x5f\x5f\x75\x36\x34\x20\x73\x65\x71\
\x5f\x6e\x75\x6d\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x65\x74\x61\x2d\x3e\x73\
\x65\x71\x5f\x6e\x75\x6d\x3b\0\x09\x69\x66\x20\x28\x73\x65\x71\x5f\x6e\x75\x6d\
\x20\x3d\x3d\x20\x30\x29\0\x09\x09\x42\x50\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\
\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\
\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x6d\x61\x78\x5f\x65\x6e\
\x74\x72\x69\x65\x73\x5c\x6e\x22\x29\x3b\0\x62\x70\x66\x5f\x6d\x61\x70\0\x69\
\x64\0\x6e\x61\x6d\x65\0\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\0\x5f\x5f\
\x75\x33\x32\0\x75\x6e\x73\x69\x67\x6e\x65\x64\x20\x69\x6e\x74\0\x63\x68\x61\
\x72\0\x5f\x5f\x41\x52\x52\x41\x59\x5f\x53\x49\x5a\x45\x5f\x54\x59\x50\x45\x5f\
\x5f\0\x09\x42\x50\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\
\x71\x2c\x20\x22\x25\x34\x75\x20\x25\x2d\x31\x36\x73\x25\x36\x64\x5c\x6e\x22\
\x2c\x20\x6d\x61\x70\x2d\x3e\x69\x64\x2c\x20\x6d\x61\x70\x2d\x3e\x6e\x61\x6d\
\x65\x2c\x20\x6d\x61\x70\x2d\x3e\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\
\x29\x3b\0\x7d\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x5f\x62\x70\x66\x5f\x70\
\x72\x6f\x67\0\x70\x72\x6f\x67\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\
\x6f\x67\0\x69\x74\x65\x72\x2f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x09\x73\x74\
\x72\x75\x63\x74\x20\x62\x70\x66\x5f\x70\x72\x6f\x67\x20\x2a\x70\x72\x6f\x67\
\x20\x3d\x20\x63\x74\x78\x2d\x3e\x70\x72\x6f\x67\x3b\0\x09\x69\x66\x20\x28\x21\
\x70\x72\x6f\x67\x29\0\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x61\x75\x78\0\x09\x61\
\x75\x78\x20\x3d\x20\x70\x72\x6f\x67\x2d\x3e\x61\x75\x78\x3b\0\x09\x09\x42\x50\
\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\
\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\
\x20\x20\x20\x61\x74\x74\x61\x63\x68\x65\x64\x5c\x6e\x22\x29\x3b\0\x62\x70\x66\
\x5f\x70\x72\x6f\x67\x5f\x61\x75\x78\0\x61\x74\x74\x61\x63\x68\x5f\x66\x75\x6e\
\x63\x5f\x6e\x61\x6d\x65\0\x6c\x69\x6e\x6b\x65\x64\x5f\x70\x72\x6f\x67\0\x66\
\x75\x6e\x63\x5f\x69\x6e\x66\x6f\0\x62\x74\x66\0\x09\x42\x50\x46\x5f\x53\x45\
\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x25\x34\x75\x20\
\x25\x2d\x31\x36\x73\x20\x25\x73\x20\x25\x73\x5c\x6e\x22\x2c\x20\x61\x75\x78\
\x2d\x3e\x69\x64\x2c\0\x30\x3a\x34\0\x30\x3a\x35\0\x09\x69\x66\x20\x28\x21\x62\
\x74\x66\x29\0\x62\x70\x66\x5f\x66\x75\x6e\x63\x5f\x69\x6e\x66\x6f\0\x69\x6e\
\x73\x6e\x5f\x6f\x66\x66\0\x74\x79\x70\x65\x5f\x69\x64\0\x30\0\x73\x74\x72\x69\
\x6e\x67\x73\0\x74\x79\x70\x65\x73\0\x68\x64\x72\0\x62\x74\x66\x5f\x68\x65\x61\
\x64\x65\x72\0\x73\x74\x72\x5f\x6c\x65\x6e\0\x09\x74\x79\x70\x65\x73\x20\x3d\
\x20\x62\x74\x66\x2d\x3e\x74\x79\x70\x65\x73\x3b\0\x09\x62\x70\x66\x5f\x70\x72\
\x6f\x62\x65\x5f\x72\x65\x61\x64\x5f\x6b\x65\x72\x6e\x65\x6c\x28\x26\x74\x2c\
\x20\x73\x69\x7a\x65\x6f\x66\x28\x74\x29\x2c\x20\x74\x79\x70\x65\x73\x20\x2b\
\x20\x62\x74\x66\x5f\x69\x64\x29\x3b\0\x09\x73\x74\x72\x20\x3d\x20\x62\x74\x66\
\x2d\x3e\x73\x74\x72\x69\x6e\x67\x73\x3b\0\x62\x74\x66\x5f\x74\x79\x70\x65\0\
\x6e\x61\x6d\x65\x5f\x6f\x66\x66\0\x09\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x20\x3d\
\x20\x42\x50\x46\x5f\x43\x4f\x52\x45\x5f\x52\x45\x41\x44\x28\x74\x2c\x20\x6e\
\x61\x6d\x65\x5f\x6f\x66\x66\x29\x3b\0\x30\x3a\x32\x3a\x30\0\x09\x69\x66\x20\
\x28\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x20\x3e\x3d\x20\x62\x74\x66\x2d\x3e\x68\
\x64\x72\x2e\x73\x74\x72\x5f\x6c\x65\x6e\x29\0\x09\x72\x65\x74\x75\x72\x6e\x20\
\x73\x74\x72\x20\x2b\x20\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x3b\0\x30\x3a\x33\0\
\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\0\
\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\
\x2e\x31\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\x2e\x5f\x5f\x5f\
\x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\x2e\x5f\x5f\
\x5f\x66\x6d\x74\x2e\x32\0\x4c\x49\x43\x45\x4e\x53\x45\0\x2e\x72\x6f\x64\x61\
\x74\x61\0\x6c\x69\x63\x65\x6e\x73\x65\0\x9f\xeb\x01\0\x20\0\0\0\0\0\0\0\x24\0\
\0\0\x24\0\0\0\x44\x02\0\0\x68\x02\0\0\xa4\x01\0\0\x08\0\0\0\x31\0\0\0\x01\0\0\
\0\0\0\0\0\x07\0\0\0\x56\x02\0\0\x01\0\0\0\0\0\0\0\x17\0\0\0\x10\0\0\0\x31\0\0\
\0\x09\0\0\0\0\0\0\0\x42\0\0\0\x7b\0\0\0\x1e\x40\x01\0\x08\0\0\0\x42\0\0\0\x7b\
\0\0\0\x24\x40\x01\0\x10\0\0\0\x42\0\0\0\xf2\0\0\0\x1d\x48\x01\0\x18\0\0\0\x42\
\0\0\0\x13\x01\0\0\x06\x50\x01\0\x20\0\0\0\x42\0\0\0\x22\x01\0\0\x1d\x44\x01\0\
\x28\0\0\0\x42\0\0\0\x47\x01\0\0\x06\x5c\x01\0\x38\0\0\0\x42\0\0\0\x5a\x01\0\0\
\x03\x60\x01\0\x70\0\0\0\x42\0\0\0\xe0\x01\0\0\x02\x68\x01\0\xf0\0\0\0\x42\0\0\
\0\x2e\x02\0\0\x01\x70\x01\0\x56\x02\0\0\x1a\0\0\0\0\0\0\0\x42\0\0\0\x7b\0\0\0\
\x1e\x84\x01\0\x08\0\0\0\x42\0\0\0\x7b\0\0\0\x24\x84\x01\0\x10\0\0\0\x42\0\0\0\
\x64\x02\0\0\x1f\x8c\x01\0\x18\0\0\0\x42\0\0\0\x88\x02\0\0\x06\x98\x01\0\x20\0\
\0\0\x42\0\0\0\xa1\x02\0\0\x0e\xa4\x01\0\x28\0\0\0\x42\0\0\0\x22\x01\0\0\x1d\
\x88\x01\0\x30\0\0\0\x42\0\0\0\x47\x01\0\0\x06\xa8\x01\0\x40\0\0\0\x42\0\0\0\
\xb3\x02\0\0\x03\xac\x01\0\x80\0\0\0\x42\0\0\0\x26\x03\0\0\x02\xb4\x01\0\xb8\0\
\0\0\x42\0\0\0\x61\x03\0\0\x06\x08\x01\0\xd0\0\0\0\x42\0\0\0\0\0\0\0\0\0\0\0\
\xd8\0\0\0\x42\0\0\0\xb2\x03\0\0\x0f\x14\x01\0\xe0\0\0\0\x42\0\0\0\xc7\x03\0\0\
\x2d\x18\x01\0\xf0\0\0\0\x42\0\0\0\xfe\x03\0\0\x0d\x10\x01\0\0\x01\0\0\x42\0\0\
\0\0\0\0\0\0\0\0\0\x08\x01\0\0\x42\0\0\0\xc7\x03\0\0\x02\x18\x01\0\x20\x01\0\0\
\x42\0\0\0\x25\x04\0\0\x0d\x1c\x01\0\x38\x01\0\0\x42\0\0\0\0\0\0\0\0\0\0\0\x40\
\x01\0\0\x42\0\0\0\x25\x04\0\0\x0d\x1c\x01\0\x58\x01\0\0\x42\0\0\0\x25\x04\0\0\
\x0d\x1c\x01\0\x60\x01\0\0\x42\0\0\0\x53\x04\0\0\x1b\x20\x01\0\x68\x01\0\0\x42\
\0\0\0\x53\x04\0\0\x06\x20\x01\0\x70\x01\0\0\x42\0\0\0\x76\x04\0\0\x0d\x28\x01\
\0\x78\x01\0\0\x42\0\0\0\0\0\0\0\0\0\0\0\x80\x01\0\0\x42\0\0\0\x26\x03\0\0\x02\
\xb4\x01\0\xf8\x01\0\0\x42\0\0\0\x2e\x02\0\0\x01\xc4\x01\0\x10\0\0\0\x31\0\0\0\
\x07\0\0\0\0\0\0\0\x02\0\0\0\x3e\0\0\0\0\0\0\0\x08\0\0\0\x08\0\0\0\x3e\0\0\0\0\
\0\0\0\x10\0\0\0\x02\0\0\0\xee\0\0\0\0\0\0\0\x20\0\0\0\x08\0\0\0\x1e\x01\0\0\0\
\0\0\0\x70\0\0\0\x0d\0\0\0\x3e\0\0\0\0\0\0\0\x80\0\0\0\x0d\0\0\0\xee\0\0\0\0\0\
\0\0\xa0\0\0\0\x0d\0\0\0\x1e\x01\0\0\0\0\0\0\x56\x02\0\0\x12\0\0\0\0\0\0\0\x14\
\0\0\0\x3e\0\0\0\0\0\0\0\x08\0\0\0\x08\0\0\0\x3e\0\0\0\0\0\0\0\x10\0\0\0\x14\0\
\0\0\xee\0\0\0\0\0\0\0\x20\0\0\0\x18\0\0\0\x3e\0\0\0\0\0\0\0\x28\0\0\0\x08\0\0\
\0\x1e\x01\0\0\0\0\0\0\x80\0\0\0\x1a\0\0\0\x3e\0\0\0\0\0\0\0\x90\0\0\0\x1a\0\0\
\0\xee\0\0\0\0\0\0\0\xa8\0\0\0\x1a\0\0\0\x59\x03\0\0\0\0\0\0\xb0\0\0\0\x1a\0\0\
\0\x5d\x03\0\0\0\0\0\0\xc0\0\0\0\x1f\0\0\0\x8b\x03\0\0\0\0\0\0\xd8\0\0\0\x20\0\
\0\0\xee\0\0\0\0\0\0\0\xf0\0\0\0\x20\0\0\0\x3e\0\0\0\0\0\0\0\x18\x01\0\0\x24\0\
\0\0\x3e\0\0\0\0\0\0\0\x50\x01\0\0\x1a\0\0\0\xee\0\0\0\0\0\0\0\x60\x01\0\0\x20\
\0\0\0\x4d\x04\0\0\0\0\0\0\x88\x01\0\0\x1a\0\0\0\x1e\x01\0\0\0\0\0\0\x98\x01\0\
\0\x1a\0\0\0\x8e\x04\0\0\0\0\0\0\xa0\x01\0\0\x18\0\0\0\x3e\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xd6\0\0\0\0\0\x02\0\x70\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\xc8\0\0\0\0\0\x02\0\xf0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\xcf\0\0\0\0\0\x03\0\x78\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xc1\0\0\0\0\0\x03\0\x80\
\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xba\0\0\0\0\0\x03\0\xf8\x01\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\x14\0\0\0\x01\0\x04\0\0\0\0\0\0\0\0\0\x23\0\0\0\0\0\0\0\xf4\0\0\0\
\x01\0\x04\0\x23\0\0\0\0\0\0\0\x0e\0\0\0\0\0\0\0\x28\0\0\0\x01\0\x04\0\x31\0\0\
\0\0\0\0\0\x20\0\0\0\0\0\0\0\xdd\0\0\0\x01\0\x04\0\x51\0\0\0\0\0\0\0\x11\0\0\0\
\0\0\0\0\0\0\0\0\x03\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\x03\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\xb2\0\0\0\x11\0\x05\0\0\0\0\0\0\0\0\0\x04\0\0\0\0\0\0\0\x3d\0\0\0\x12\
\0\x02\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\x5b\0\0\0\x12\0\x03\0\0\0\0\0\0\0\0\
\0\x08\x02\0\0\0\0\0\0\x48\0\0\0\0\0\0\0\x01\0\0\0\x0c\0\0\0\xc8\0\0\0\0\0\0\0\
\x01\0\0\0\x0c\0\0\0\x50\0\0\0\0\0\0\0\x01\0\0\0\x0c\0\0\0\xd0\x01\0\0\0\0\0\0\
\x01\0\0\0\x0c\0\0\0\xf0\x03\0\0\0\0\0\0\x0a\0\0\0\x0c\0\0\0\xfc\x03\0\0\0\0\0\
\0\x0a\0\0\0\x0c\0\0\0\x08\x04\0\0\0\0\0\0\x0a\0\0\0\x0c\0\0\0\x14\x04\0\0\0\0\
\0\0\x0a\0\0\0\x0c\0\0\0\x2c\x04\0\0\0\0\0\0\0\0\0\0\x0d\0\0\0\x2c\0\0\0\0\0\0\
\0\0\0\0\0\x0a\0\0\0\x3c\0\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x50\0\0\0\0\0\0\0\0\0\
\0\0\x0a\0\0\0\x60\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\x70\0\0\0\0\0\0\0\0\0\0\0\
\x0a\0\0\0\x80\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\x90\0\0\0\0\0\0\0\0\0\0\0\x0a\0\
\0\0\xa0\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xb0\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\
\xc0\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xd0\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xe8\0\
\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xf8\0\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x08\x01\0\0\
\0\0\0\0\0\0\0\0\x0b\0\0\0\x18\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x28\x01\0\0\0\
\0\0\0\0\0\0\0\x0b\0\0\0\x38\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x48\x01\0\0\0\0\
\0\0\0\0\0\0\x0b\0\0\0\x58\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x68\x01\0\0\0\0\0\
\0\0\0\0\0\x0b\0\0\0\x78\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x88\x01\0\0\0\0\0\0\
\0\0\0\0\x0b\0\0\0\x98\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xa8\x01\0\0\0\0\0\0\0\
\0\0\0\x0b\0\0\0\xb8\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xc8\x01\0\0\0\0\0\0\0\0\
\0\0\x0b\0\0\0\xd8\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xe8\x01\0\0\0\0\0\0\0\0\0\
\0\x0b\0\0\0\xf8\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x08\x02\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x18\x02\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x28\x02\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x38\x02\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x48\x02\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x58\x02\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x68\x02\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x78\x02\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x94\x02\0\0\0\0\0\0\0\0\0\0\
\x0a\0\0\0\xa4\x02\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xb4\x02\0\0\0\0\0\0\0\0\0\0\
\x0a\0\0\0\xc4\x02\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xd4\x02\0\0\0\0\0\0\0\0\0\0\
\x0a\0\0\0\xe4\x02\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xf4\x02\0\0\0\0\0\0\0\0\0\0\
\x0a\0\0\0\x0c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x1c\x03\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x2c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x3c\x03\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x4c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x5c\x03\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x6c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x7c\x03\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x8c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x9c\x03\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\xac\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xbc\x03\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\xcc\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xdc\x03\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\xec\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xfc\x03\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x0c\x04\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x1c\x04\0\0\0\0\0\0\0\0\0\0\
\x0b\0\0\0\x4e\x4f\x41\x42\x43\x44\x4d\0\x2e\x74\x65\x78\x74\0\x2e\x72\x65\x6c\
\x2e\x42\x54\x46\x2e\x65\x78\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\
\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\
\x6f\x67\x2e\x5f\x5f\x5f\x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\
\x61\x70\0\x2e\x72\x65\x6c\x69\x74\x65\x72\x2f\x62\x70\x66\x5f\x6d\x61\x70\0\
\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x2e\x72\x65\x6c\x69\x74\
\x65\x72\x2f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x2e\x6c\x6c\x76\x6d\x5f\x61\x64\
\x64\x72\x73\x69\x67\0\x6c\x69\x63\x65\x6e\x73\x65\0\x2e\x73\x74\x72\x74\x61\
\x62\0\x2e\x73\x79\x6d\x74\x61\x62\0\x2e\x72\x6f\x64\x61\x74\x61\0\x2e\x72\x65\
\x6c\x2e\x42\x54\x46\0\x4c\x49\x43\x45\x4e\x53\x45\0\x4c\x42\x42\x31\x5f\x37\0\
\x4c\x42\x42\x31\x5f\x36\0\x4c\x42\x42\x30\x5f\x34\0\x4c\x42\x42\x31\x5f\x33\0\
\x4c\x42\x42\x30\x5f\x33\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\
\x2e\x5f\x5f\x5f\x66\x6d\x74\x2e\x32\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\
\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\x2e\x31\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x04\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\x4e\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\
\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\x6d\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x40\x01\0\0\0\0\0\0\x08\
\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xa1\0\0\0\
\x01\0\0\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x48\x03\0\0\0\0\0\0\x62\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x89\0\0\0\x01\0\0\0\x03\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xaa\x03\0\0\0\0\0\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xad\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\xae\x03\0\0\0\0\0\0\x34\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\xe2\x0c\0\0\0\0\0\0\x2c\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\x99\0\0\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x10\x11\0\0\0\
\0\0\0\x80\x01\0\0\0\0\0\0\x0e\0\0\0\x0d\0\0\0\x08\0\0\0\0\0\0\0\x18\0\0\0\0\0\
\0\0\x4a\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x90\x12\0\0\0\0\0\0\
\x20\0\0\0\0\0\0\0\x08\0\0\0\x02\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x69\
\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xb0\x12\0\0\0\0\0\0\x20\0\0\0\
\0\0\0\0\x08\0\0\0\x03\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\xa9\0\0\0\x09\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xd0\x12\0\0\0\0\0\0\x50\0\0\0\0\0\0\0\
\x08\0\0\0\x06\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x07\0\0\0\x09\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x20\x13\0\0\0\0\0\0\xe0\x03\0\0\0\0\0\0\x08\0\0\
\0\x07\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x7b\0\0\0\x03\x4c\xff\x6f\0\0\
\0\x80\0\0\0\0\0\0\0\0\0\0\0\0\0\x17\0\0\0\0\0\0\x07\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x91\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\x07\x17\0\0\0\0\0\0\x0a\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0";
return 0;
err:
bpf_object__destroy_skeleton(s);
return -1;
}
#endif /* __ITERATORS_BPF_SKEL_H__ */

View File

@ -257,6 +257,7 @@ static int queue_stack_map_get_next_key(struct bpf_map *map, void *key,
static int queue_map_btf_id; static int queue_map_btf_id;
const struct bpf_map_ops queue_map_ops = { const struct bpf_map_ops queue_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = queue_stack_map_alloc_check, .map_alloc_check = queue_stack_map_alloc_check,
.map_alloc = queue_stack_map_alloc, .map_alloc = queue_stack_map_alloc,
.map_free = queue_stack_map_free, .map_free = queue_stack_map_free,
@ -273,6 +274,7 @@ const struct bpf_map_ops queue_map_ops = {
static int stack_map_btf_id; static int stack_map_btf_id;
const struct bpf_map_ops stack_map_ops = { const struct bpf_map_ops stack_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = queue_stack_map_alloc_check, .map_alloc_check = queue_stack_map_alloc_check,
.map_alloc = queue_stack_map_alloc, .map_alloc = queue_stack_map_alloc,
.map_free = queue_stack_map_free, .map_free = queue_stack_map_free,

View File

@ -351,6 +351,7 @@ static int reuseport_array_get_next_key(struct bpf_map *map, void *key,
static int reuseport_array_map_btf_id; static int reuseport_array_map_btf_id;
const struct bpf_map_ops reuseport_array_ops = { const struct bpf_map_ops reuseport_array_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc_check = reuseport_array_alloc_check, .map_alloc_check = reuseport_array_alloc_check,
.map_alloc = reuseport_array_alloc, .map_alloc = reuseport_array_alloc,
.map_free = reuseport_array_free, .map_free = reuseport_array_free,

View File

@ -287,6 +287,7 @@ static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
static int ringbuf_map_btf_id; static int ringbuf_map_btf_id;
const struct bpf_map_ops ringbuf_map_ops = { const struct bpf_map_ops ringbuf_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = ringbuf_map_alloc, .map_alloc = ringbuf_map_alloc,
.map_free = ringbuf_map_free, .map_free = ringbuf_map_free,
.map_mmap = ringbuf_map_mmap, .map_mmap = ringbuf_map_mmap,

View File

@ -839,6 +839,7 @@ static void stack_map_free(struct bpf_map *map)
static int stack_trace_map_btf_id; static int stack_trace_map_btf_id;
const struct bpf_map_ops stack_trace_map_ops = { const struct bpf_map_ops stack_trace_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = stack_map_alloc, .map_alloc = stack_map_alloc,
.map_free = stack_map_free, .map_free = stack_map_free,
.map_get_next_key = stack_map_get_next_key, .map_get_next_key = stack_map_get_next_key,

View File

@ -29,6 +29,7 @@
#include <linux/bpf_lsm.h> #include <linux/bpf_lsm.h>
#include <linux/poll.h> #include <linux/poll.h>
#include <linux/bpf-netns.h> #include <linux/bpf-netns.h>
#include <linux/rcupdate_trace.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \ #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \ (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@ -90,6 +91,7 @@ int bpf_check_uarg_tail_zero(void __user *uaddr,
} }
const struct bpf_map_ops bpf_map_offload_ops = { const struct bpf_map_ops bpf_map_offload_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = bpf_map_offload_map_alloc, .map_alloc = bpf_map_offload_map_alloc,
.map_free = bpf_map_offload_map_free, .map_free = bpf_map_offload_map_free,
.map_check_btf = map_check_no_btf, .map_check_btf = map_check_no_btf,
@ -157,10 +159,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
if (bpf_map_is_dev_bound(map)) { if (bpf_map_is_dev_bound(map)) {
return bpf_map_offload_update_elem(map, key, value, flags); return bpf_map_offload_update_elem(map, key, value, flags);
} else if (map->map_type == BPF_MAP_TYPE_CPUMAP || } else if (map->map_type == BPF_MAP_TYPE_CPUMAP ||
map->map_type == BPF_MAP_TYPE_SOCKHASH ||
map->map_type == BPF_MAP_TYPE_SOCKMAP ||
map->map_type == BPF_MAP_TYPE_STRUCT_OPS) { map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
return map->ops->map_update_elem(map, key, value, flags); return map->ops->map_update_elem(map, key, value, flags);
} else if (map->map_type == BPF_MAP_TYPE_SOCKHASH ||
map->map_type == BPF_MAP_TYPE_SOCKMAP) {
return sock_map_update_elem_sys(map, key, value, flags);
} else if (IS_FD_PROG_ARRAY(map)) { } else if (IS_FD_PROG_ARRAY(map)) {
return bpf_fd_array_map_update_elem(map, f.file, key, value, return bpf_fd_array_map_update_elem(map, f.file, key, value,
flags); flags);
@ -768,7 +771,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
if (map->map_type != BPF_MAP_TYPE_HASH && if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
map->map_type != BPF_MAP_TYPE_SK_STORAGE) map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
return -ENOTSUPP; return -ENOTSUPP;
if (map->spin_lock_off + sizeof(struct bpf_spin_lock) > if (map->spin_lock_off + sizeof(struct bpf_spin_lock) >
map->value_size) { map->value_size) {
@ -1728,11 +1732,15 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
btf_put(prog->aux->btf); btf_put(prog->aux->btf);
bpf_prog_free_linfo(prog); bpf_prog_free_linfo(prog);
if (deferred) if (deferred) {
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); if (prog->aux->sleepable)
call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_rcu);
else else
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
} else {
__bpf_prog_put_rcu(&prog->aux->rcu); __bpf_prog_put_rcu(&prog->aux->rcu);
} }
}
static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock) static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
{ {
@ -2101,6 +2109,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT |
BPF_F_ANY_ALIGNMENT | BPF_F_ANY_ALIGNMENT |
BPF_F_TEST_STATE_FREQ | BPF_F_TEST_STATE_FREQ |
BPF_F_SLEEPABLE |
BPF_F_TEST_RND_HI32)) BPF_F_TEST_RND_HI32))
return -EINVAL; return -EINVAL;
@ -2156,6 +2165,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
} }
prog->aux->offload_requested = !!attr->prog_ifindex; prog->aux->offload_requested = !!attr->prog_ifindex;
prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
err = security_bpf_prog_alloc(prog->aux); err = security_bpf_prog_alloc(prog->aux);
if (err) if (err)
@ -4014,9 +4024,31 @@ static int link_detach(union bpf_attr *attr)
return ret; return ret;
} }
static int bpf_link_inc_not_zero(struct bpf_link *link) static struct bpf_link *bpf_link_inc_not_zero(struct bpf_link *link)
{ {
return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? 0 : -ENOENT; return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? link : ERR_PTR(-ENOENT);
}
struct bpf_link *bpf_link_by_id(u32 id)
{
struct bpf_link *link;
if (!id)
return ERR_PTR(-ENOENT);
spin_lock_bh(&link_idr_lock);
/* before link is "settled", ID is 0, pretend it doesn't exist yet */
link = idr_find(&link_idr, id);
if (link) {
if (link->id)
link = bpf_link_inc_not_zero(link);
else
link = ERR_PTR(-EAGAIN);
} else {
link = ERR_PTR(-ENOENT);
}
spin_unlock_bh(&link_idr_lock);
return link;
} }
#define BPF_LINK_GET_FD_BY_ID_LAST_FIELD link_id #define BPF_LINK_GET_FD_BY_ID_LAST_FIELD link_id
@ -4025,7 +4057,7 @@ static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
{ {
struct bpf_link *link; struct bpf_link *link;
u32 id = attr->link_id; u32 id = attr->link_id;
int fd, err; int fd;
if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID)) if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID))
return -EINVAL; return -EINVAL;
@ -4033,21 +4065,9 @@ static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
if (!capable(CAP_SYS_ADMIN)) if (!capable(CAP_SYS_ADMIN))
return -EPERM; return -EPERM;
spin_lock_bh(&link_idr_lock); link = bpf_link_by_id(id);
link = idr_find(&link_idr, id); if (IS_ERR(link))
/* before link is "settled", ID is 0, pretend it doesn't exist yet */ return PTR_ERR(link);
if (link) {
if (link->id)
err = bpf_link_inc_not_zero(link);
else
err = -EAGAIN;
} else {
err = -ENOENT;
}
spin_unlock_bh(&link_idr_lock);
if (err)
return err;
fd = bpf_link_new_fd(link); fd = bpf_link_new_fd(link);
if (fd < 0) if (fd < 0)
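The BPF_F_SLEEPABLE hunks above gate the new flag at load time, record it in prog->aux->sleepable, and switch deferred freeing to call_rcu_tasks_trace(). Below is a minimal userspace sketch of where the flag goes in a raw bpf(2) load; in practice libbpf sets it for sleepable section names such as "fentry.s/", and a real load would also need attach_btf_id, so this one is expected to fail and is only illustrative:

        // SPDX-License-Identifier: GPL-2.0
        #include <linux/bpf.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>
        #include <sys/syscall.h>

        int main(void)
        {
                /* r0 = 0; exit */
                struct bpf_insn insns[2] = {
                        { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0 },
                        { .code = BPF_JMP | BPF_EXIT },
                };
                union bpf_attr attr;
                int fd;

                memset(&attr, 0, sizeof(attr));
                attr.prog_type = BPF_PROG_TYPE_TRACING;
                attr.expected_attach_type = BPF_TRACE_FENTRY;
                attr.insns = (unsigned long)insns;
                attr.insn_cnt = 2;
                attr.license = (unsigned long)"GPL";
                attr.prog_flags = BPF_F_SLEEPABLE; /* the flag checked above */

                fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
                if (fd < 0)
                        perror("BPF_PROG_LOAD"); /* expected without attach_btf_id */
                else
                        close(fd);
                return 0;
        }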

Some files were not shown because too many files have changed in this diff.