Merge branch 'bpf-link-support-for-tc-bpf-programs'

Daniel Borkmann says:

====================
BPF link support for tc BPF programs

This series adds BPF link support for tc BPF programs. We initially
presented the motivation, related work and design at last year's LPC
conference in the networking & BPF track [0], and a recent update on
our progress of the rework during this year's LSF/MM/BPF summit [1].
The main changes are in first two patches and the last two have an
extensive batch of test cases we developed along with it, please see
individual patches for details. We tested this series with tc-testing
selftest suite as well as BPF CI/selftests. Thanks!

v5 -> v6:
  - Remove export symbol on tcx_inc/dec (Jakub)
  - Treat fd==0 as invalid (Stan, Alexei)
v4 -> v5:
  - Updated bpftool docs and usage of bpftool net (Quentin)
  - Consistent dump "prog id"/"link id" -> "prog_id"/"link_id" (Quentin)
  - Reworked bpftool flag output handling (Quentin)
  - LIBBPF_OPTS_RESET() macro with varargs for reinit (Andrii)
  - libbpf opts/link bail out on relative_fd && relative_id (Andrii)
  - libbpf improvements for assigning attr.relative_{id,fd} (Andrii)
  - libbpf sorting in libbpf.map (Andrii)
  - libbpf move ifindex to bpf_program__attach_tcx param (Andrii)
  - libbpf move BPF_F_ID flag handling to bpf_link_create (Andrii)
  - bpf_program_attach_fd with tcx instead of tc (Andrii)
  - Reworking kernel-internal bpf_mprog API (Alexei, Andrii)
  - Change "object" notation to "id_or_fd" (Andrii)
  - Remove on stack cpp[BPF_MPROG_MAX] and switch to memmove (Andrii)
  - Simplify bpf_mprog_{insert,delete} and add comment on internals
  - Get rid of BPF_MPROG_* return codes (Alexei, Andrii)
v3 -> v4:
  - Fix bpftool output to display tcx/{ingress,egress} (Stan)
  - Documentation around API, BPF_MPROG_* return codes and locking
    expectations (Stan, Alexei)
  - Change _after and _before to have the same semantics for return
    value (Alexei)
  - Rework mprog initialization and move allocation/free one layer
    up into tcx to simplify the code (Stan)
  - Add comment on synchronize_rcu and parent->ref (Stan)
  - Add comment on bpf_mprog_pos_() helpers wrt target position (Stan)
v2 -> v3:
  - Removal of BPF_F_FIRST/BPF_F_LAST from control UAPI (Toke, Stan)
  - Along with that full rework of bpf_mprog internals to simplify
    dependency management, looks much nicer now imho
  - Just single bpf_mprog_cp instead of two (Andrii)
  - atomic64_t for revision counter (Andrii)
  - Evaluate target position and reject on conflicts (Andrii)
  - Keep track of actual count in bpf_mprob_bundle (Andrii)
  - Make combo of REPLACE and BEFORE/AFTER work (Andrii)
  - Moved miniq as first struct member (Jamal)
  - Rework tcx_link_attach with regards to rtnl (Jakub, Andrii)
  - Moved wrappers after bpf_prog_detach_ops (Andrii)
  - Removed union for relative_fd and friends for opts and link in
    libbpf (Andrii)
  - Add doc comments to attach/detach/query libbpf APIs (Andrii)
  - Dropped SEC_ATTACHABLE_OPT (Andrii)
  - Add an OPTS_ZEROED check to bpf_link_create (Andrii)
  - Keep opts as the last argument in bpf_program_attach_fd (Andrii)
  - Rework bpf_program_attach_fd (Andrii)
  - Remove OPTS_GET before we checked OPTS_VALID in
    bpf_program__attach_tcx (Andrii)
  - Add `size_t :0;` to prevent compiler from leaving garbage (Andrii)
  - Add helper macro to clear opts structs which I found useful
    when writing tests
  - Rework of both opts and link test cases to accommodate for changes
v1 -> v2:
  - Rework of almost entire series to remove prio from UAPI and switch
    to better control directives BPF_F_FIRST/BPF_F_LAST/BPF_F_BEFORE/
    BPF_F_AFTER (Alexei, Toke, Stan, Andrii)
  - Addition of big test suite to cover all corner cases

  [0] https://lpc.events/event/16/contributions/1353/
  [1] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
====================

Link: https://lore.kernel.org/r/20230719140858.13224-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit is contained in:
Alexei Starovoitov 2023-07-19 10:07:28 -07:00
commit 24cc7564e0
31 changed files with 6082 additions and 254 deletions

View File

@ -3684,6 +3684,7 @@ F: include/linux/filter.h
F: include/linux/tnum.h
F: kernel/bpf/core.c
F: kernel/bpf/dispatcher.c
F: kernel/bpf/mprog.c
F: kernel/bpf/syscall.c
F: kernel/bpf/tnum.c
F: kernel/bpf/trampoline.c
@ -3777,13 +3778,15 @@ L: netdev@vger.kernel.org
S: Maintained
F: kernel/bpf/bpf_struct*
BPF [NETWORKING] (tc BPF, sock_addr)
BPF [NETWORKING] (tcx & tc BPF, sock_addr)
M: Martin KaFai Lau <martin.lau@linux.dev>
M: Daniel Borkmann <daniel@iogearbox.net>
R: John Fastabend <john.fastabend@gmail.com>
L: bpf@vger.kernel.org
L: netdev@vger.kernel.org
S: Maintained
F: include/net/tcx.h
F: kernel/bpf/tcx.c
F: net/core/filter.c
F: net/sched/act_bpf.c
F: net/sched/cls_bpf.c

327
include/linux/bpf_mprog.h Normal file
View File

@ -0,0 +1,327 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2023 Isovalent */
#ifndef __BPF_MPROG_H
#define __BPF_MPROG_H
#include <linux/bpf.h>
/* bpf_mprog framework:
*
* bpf_mprog is a generic layer for multi-program attachment. In-kernel users
* of the bpf_mprog don't need to care about the dependency resolution
* internals, they can just consume it with few API calls. Currently available
* dependency directives are BPF_F_{BEFORE,AFTER} which enable insertion of
* a BPF program or BPF link relative to an existing BPF program or BPF link
* inside the multi-program array as well as prepend and append behavior if
* no relative object was specified, see corresponding selftests for concrete
* examples (e.g. tc_links and tc_opts test cases of test_progs).
*
* Usage of bpf_mprog_{attach,detach,query}() core APIs with pseudo code:
*
* Attach case:
*
* struct bpf_mprog_entry *entry, *entry_new;
* int ret;
*
* // bpf_mprog user-side lock
* // fetch active @entry from attach location
* [...]
* ret = bpf_mprog_attach(entry, &entry_new, [...]);
* if (!ret) {
* if (entry != entry_new) {
* // swap @entry to @entry_new at attach location
* // ensure there are no inflight users of @entry:
* synchronize_rcu();
* }
* bpf_mprog_commit(entry);
* } else {
* // error path, bail out, propagate @ret
* }
* // bpf_mprog user-side unlock
*
* Detach case:
*
* struct bpf_mprog_entry *entry, *entry_new;
* int ret;
*
* // bpf_mprog user-side lock
* // fetch active @entry from attach location
* [...]
* ret = bpf_mprog_detach(entry, &entry_new, [...]);
* if (!ret) {
* // all (*) marked is optional and depends on the use-case
* // whether bpf_mprog_bundle should be freed or not
* if (!bpf_mprog_total(entry_new)) (*)
* entry_new = NULL (*)
* // swap @entry to @entry_new at attach location
* // ensure there are no inflight users of @entry:
* synchronize_rcu();
* bpf_mprog_commit(entry);
* if (!entry_new) (*)
* // free bpf_mprog_bundle (*)
* } else {
* // error path, bail out, propagate @ret
* }
* // bpf_mprog user-side unlock
*
* Query case:
*
* struct bpf_mprog_entry *entry;
* int ret;
*
* // bpf_mprog user-side lock
* // fetch active @entry from attach location
* [...]
* ret = bpf_mprog_query(attr, uattr, entry);
* // bpf_mprog user-side unlock
*
* Data/fast path:
*
* struct bpf_mprog_entry *entry;
* struct bpf_mprog_fp *fp;
* struct bpf_prog *prog;
* int ret = [...];
*
* rcu_read_lock();
* // fetch active @entry from attach location
* [...]
* bpf_mprog_foreach_prog(entry, fp, prog) {
* ret = bpf_prog_run(prog, [...]);
* // process @ret from program
* }
* [...]
* rcu_read_unlock();
*
* bpf_mprog locking considerations:
*
* bpf_mprog_{attach,detach,query}() must be protected by an external lock
* (like RTNL in case of tcx).
*
* bpf_mprog_entry pointer can be an __rcu annotated pointer (in case of tcx
* the netdevice has tcx_ingress and tcx_egress __rcu pointer) which gets
* updated via rcu_assign_pointer() pointing to the active bpf_mprog_entry of
* the bpf_mprog_bundle.
*
* Fast path accesses the active bpf_mprog_entry within RCU critical section
* (in case of tcx it runs in NAPI which provides RCU protection there,
* other users might need explicit rcu_read_lock()). The bpf_mprog_commit()
* assumes that for the old bpf_mprog_entry there are no inflight users
* anymore.
*
* The READ_ONCE()/WRITE_ONCE() pairing for bpf_mprog_fp's prog access is for
* the replacement case where we don't swap the bpf_mprog_entry.
*/
#define bpf_mprog_foreach_tuple(entry, fp, cp, t) \
for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
({ \
t.prog = READ_ONCE(fp->prog); \
t.link = cp->link; \
t.prog; \
}); \
fp++, cp++)
#define bpf_mprog_foreach_prog(entry, fp, p) \
for (fp = &entry->fp_items[0]; \
(p = READ_ONCE(fp->prog)); \
fp++)
#define BPF_MPROG_MAX 64
struct bpf_mprog_fp {
struct bpf_prog *prog;
};
struct bpf_mprog_cp {
struct bpf_link *link;
};
struct bpf_mprog_entry {
struct bpf_mprog_fp fp_items[BPF_MPROG_MAX];
struct bpf_mprog_bundle *parent;
};
struct bpf_mprog_bundle {
struct bpf_mprog_entry a;
struct bpf_mprog_entry b;
struct bpf_mprog_cp cp_items[BPF_MPROG_MAX];
struct bpf_prog *ref;
atomic64_t revision;
u32 count;
};
struct bpf_tuple {
struct bpf_prog *prog;
struct bpf_link *link;
};
static inline struct bpf_mprog_entry *
bpf_mprog_peer(const struct bpf_mprog_entry *entry)
{
if (entry == &entry->parent->a)
return &entry->parent->b;
else
return &entry->parent->a;
}
static inline void bpf_mprog_bundle_init(struct bpf_mprog_bundle *bundle)
{
BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64));
BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) !=
ARRAY_SIZE(bundle->cp_items));
memset(bundle, 0, sizeof(*bundle));
atomic64_set(&bundle->revision, 1);
bundle->a.parent = bundle;
bundle->b.parent = bundle;
}
static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
{
entry->parent->count++;
}
static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
{
entry->parent->count--;
}
static inline int bpf_mprog_max(void)
{
return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
}
static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
{
int total = entry->parent->count;
WARN_ON_ONCE(total > bpf_mprog_max());
return total;
}
static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry,
struct bpf_prog *prog)
{
const struct bpf_mprog_fp *fp;
const struct bpf_prog *tmp;
bpf_mprog_foreach_prog(entry, fp, tmp) {
if (tmp == prog)
return true;
}
return false;
}
static inline void bpf_mprog_mark_for_release(struct bpf_mprog_entry *entry,
struct bpf_tuple *tuple)
{
WARN_ON_ONCE(entry->parent->ref);
if (!tuple->link)
entry->parent->ref = tuple->prog;
}
static inline void bpf_mprog_complete_release(struct bpf_mprog_entry *entry)
{
/* In the non-link case prog deletions can only drop the reference
* to the prog after the bpf_mprog_entry got swapped and the
* bpf_mprog ensured that there are no inflight users anymore.
*
* Paired with bpf_mprog_mark_for_release().
*/
if (entry->parent->ref) {
bpf_prog_put(entry->parent->ref);
entry->parent->ref = NULL;
}
}
static inline void bpf_mprog_revision_new(struct bpf_mprog_entry *entry)
{
atomic64_inc(&entry->parent->revision);
}
static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry)
{
bpf_mprog_complete_release(entry);
bpf_mprog_revision_new(entry);
}
static inline u64 bpf_mprog_revision(struct bpf_mprog_entry *entry)
{
return atomic64_read(&entry->parent->revision);
}
static inline void bpf_mprog_entry_copy(struct bpf_mprog_entry *dst,
struct bpf_mprog_entry *src)
{
memcpy(dst->fp_items, src->fp_items, sizeof(src->fp_items));
}
static inline void bpf_mprog_entry_grow(struct bpf_mprog_entry *entry, int idx)
{
int total = bpf_mprog_total(entry);
memmove(entry->fp_items + idx + 1,
entry->fp_items + idx,
(total - idx) * sizeof(struct bpf_mprog_fp));
memmove(entry->parent->cp_items + idx + 1,
entry->parent->cp_items + idx,
(total - idx) * sizeof(struct bpf_mprog_cp));
}
static inline void bpf_mprog_entry_shrink(struct bpf_mprog_entry *entry, int idx)
{
/* Total array size is needed in this case to enure the NULL
* entry is copied at the end.
*/
int total = ARRAY_SIZE(entry->fp_items);
memmove(entry->fp_items + idx,
entry->fp_items + idx + 1,
(total - idx - 1) * sizeof(struct bpf_mprog_fp));
memmove(entry->parent->cp_items + idx,
entry->parent->cp_items + idx + 1,
(total - idx - 1) * sizeof(struct bpf_mprog_cp));
}
static inline void bpf_mprog_read(struct bpf_mprog_entry *entry, u32 idx,
struct bpf_mprog_fp **fp,
struct bpf_mprog_cp **cp)
{
*fp = &entry->fp_items[idx];
*cp = &entry->parent->cp_items[idx];
}
static inline void bpf_mprog_write(struct bpf_mprog_fp *fp,
struct bpf_mprog_cp *cp,
struct bpf_tuple *tuple)
{
WRITE_ONCE(fp->prog, tuple->prog);
cp->link = tuple->link;
}
int bpf_mprog_attach(struct bpf_mprog_entry *entry,
struct bpf_mprog_entry **entry_new,
struct bpf_prog *prog_new, struct bpf_link *link,
struct bpf_prog *prog_old,
u32 flags, u32 id_or_fd, u64 revision);
int bpf_mprog_detach(struct bpf_mprog_entry *entry,
struct bpf_mprog_entry **entry_new,
struct bpf_prog *prog, struct bpf_link *link,
u32 flags, u32 id_or_fd, u64 revision);
int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
struct bpf_mprog_entry *entry);
static inline bool bpf_mprog_supported(enum bpf_prog_type type)
{
switch (type) {
case BPF_PROG_TYPE_SCHED_CLS:
return true;
default:
return false;
}
}
#endif /* __BPF_MPROG_H */

View File

@ -1930,8 +1930,7 @@ enum netdev_ml_priv_type {
*
* @rx_handler: handler for received packets
* @rx_handler_data: XXX: need comments on this one
* @miniq_ingress: ingress/clsact qdisc specific data for
* ingress processing
* @tcx_ingress: BPF & clsact qdisc specific data for ingress processing
* @ingress_queue: XXX: need comments on this one
* @nf_hooks_ingress: netfilter hooks executed for ingress packets
* @broadcast: hw bcast address
@ -1952,8 +1951,7 @@ enum netdev_ml_priv_type {
* @xps_maps: all CPUs/RXQs maps for XPS device
*
* @xps_maps: XXX: need comments on this one
* @miniq_egress: clsact qdisc specific data for
* egress processing
* @tcx_egress: BPF & clsact qdisc specific data for egress processing
* @nf_hooks_egress: netfilter hooks executed for egress packets
* @qdisc_hash: qdisc hash table
* @watchdog_timeo: Represents the timeout that is used by
@ -2253,9 +2251,8 @@ struct net_device {
unsigned int xdp_zc_max_segs;
rx_handler_func_t __rcu *rx_handler;
void __rcu *rx_handler_data;
#ifdef CONFIG_NET_CLS_ACT
struct mini_Qdisc __rcu *miniq_ingress;
#ifdef CONFIG_NET_XGRESS
struct bpf_mprog_entry __rcu *tcx_ingress;
#endif
struct netdev_queue __rcu *ingress_queue;
#ifdef CONFIG_NETFILTER_INGRESS
@ -2283,8 +2280,8 @@ struct net_device {
#ifdef CONFIG_XPS
struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX];
#endif
#ifdef CONFIG_NET_CLS_ACT
struct mini_Qdisc __rcu *miniq_egress;
#ifdef CONFIG_NET_XGRESS
struct bpf_mprog_entry __rcu *tcx_egress;
#endif
#ifdef CONFIG_NETFILTER_EGRESS
struct nf_hook_entries __rcu *nf_hooks_egress;

View File

@ -944,7 +944,7 @@ struct sk_buff {
__u8 __mono_tc_offset[0];
/* public: */
__u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
#ifdef CONFIG_NET_CLS_ACT
#ifdef CONFIG_NET_XGRESS
__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
__u8 tc_skip_classify:1;
#endif
@ -993,7 +993,7 @@ struct sk_buff {
__u8 csum_not_inet:1;
#endif
#ifdef CONFIG_NET_SCHED
#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
__u16 tc_index; /* traffic control index */
#endif

View File

@ -703,7 +703,7 @@ int skb_do_redirect(struct sk_buff *);
static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
{
#ifdef CONFIG_NET_CLS_ACT
#ifdef CONFIG_NET_XGRESS
return skb->tc_at_ingress;
#else
return false;

206
include/net/tcx.h Normal file
View File

@ -0,0 +1,206 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2023 Isovalent */
#ifndef __NET_TCX_H
#define __NET_TCX_H
#include <linux/bpf.h>
#include <linux/bpf_mprog.h>
#include <net/sch_generic.h>
struct mini_Qdisc;
struct tcx_entry {
struct mini_Qdisc __rcu *miniq;
struct bpf_mprog_bundle bundle;
bool miniq_active;
struct rcu_head rcu;
};
struct tcx_link {
struct bpf_link link;
struct net_device *dev;
u32 location;
};
static inline void tcx_set_ingress(struct sk_buff *skb, bool ingress)
{
#ifdef CONFIG_NET_XGRESS
skb->tc_at_ingress = ingress;
#endif
}
#ifdef CONFIG_NET_XGRESS
static inline struct tcx_entry *tcx_entry(struct bpf_mprog_entry *entry)
{
struct bpf_mprog_bundle *bundle = entry->parent;
return container_of(bundle, struct tcx_entry, bundle);
}
static inline struct tcx_link *tcx_link(struct bpf_link *link)
{
return container_of(link, struct tcx_link, link);
}
static inline const struct tcx_link *tcx_link_const(const struct bpf_link *link)
{
return tcx_link((struct bpf_link *)link);
}
void tcx_inc(void);
void tcx_dec(void);
static inline void tcx_entry_sync(void)
{
/* bpf_mprog_entry got a/b swapped, therefore ensure that
* there are no inflight users on the old one anymore.
*/
synchronize_rcu();
}
static inline void
tcx_entry_update(struct net_device *dev, struct bpf_mprog_entry *entry,
bool ingress)
{
ASSERT_RTNL();
if (ingress)
rcu_assign_pointer(dev->tcx_ingress, entry);
else
rcu_assign_pointer(dev->tcx_egress, entry);
}
static inline struct bpf_mprog_entry *
tcx_entry_fetch(struct net_device *dev, bool ingress)
{
ASSERT_RTNL();
if (ingress)
return rcu_dereference_rtnl(dev->tcx_ingress);
else
return rcu_dereference_rtnl(dev->tcx_egress);
}
static inline struct bpf_mprog_entry *tcx_entry_create(void)
{
struct tcx_entry *tcx = kzalloc(sizeof(*tcx), GFP_KERNEL);
if (tcx) {
bpf_mprog_bundle_init(&tcx->bundle);
return &tcx->bundle.a;
}
return NULL;
}
static inline void tcx_entry_free(struct bpf_mprog_entry *entry)
{
kfree_rcu(tcx_entry(entry), rcu);
}
static inline struct bpf_mprog_entry *
tcx_entry_fetch_or_create(struct net_device *dev, bool ingress, bool *created)
{
struct bpf_mprog_entry *entry = tcx_entry_fetch(dev, ingress);
*created = false;
if (!entry) {
entry = tcx_entry_create();
if (!entry)
return NULL;
*created = true;
}
return entry;
}
static inline void tcx_skeys_inc(bool ingress)
{
tcx_inc();
if (ingress)
net_inc_ingress_queue();
else
net_inc_egress_queue();
}
static inline void tcx_skeys_dec(bool ingress)
{
if (ingress)
net_dec_ingress_queue();
else
net_dec_egress_queue();
tcx_dec();
}
static inline void tcx_miniq_set_active(struct bpf_mprog_entry *entry,
const bool active)
{
ASSERT_RTNL();
tcx_entry(entry)->miniq_active = active;
}
static inline bool tcx_entry_is_active(struct bpf_mprog_entry *entry)
{
ASSERT_RTNL();
return bpf_mprog_total(entry) || tcx_entry(entry)->miniq_active;
}
static inline enum tcx_action_base tcx_action_code(struct sk_buff *skb,
int code)
{
switch (code) {
case TCX_PASS:
skb->tc_index = qdisc_skb_cb(skb)->tc_classid;
fallthrough;
case TCX_DROP:
case TCX_REDIRECT:
return code;
case TCX_NEXT:
default:
return TCX_NEXT;
}
}
#endif /* CONFIG_NET_XGRESS */
#if defined(CONFIG_NET_XGRESS) && defined(CONFIG_BPF_SYSCALL)
int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog);
void tcx_uninstall(struct net_device *dev, bool ingress);
int tcx_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr);
static inline void dev_tcx_uninstall(struct net_device *dev)
{
ASSERT_RTNL();
tcx_uninstall(dev, true);
tcx_uninstall(dev, false);
}
#else
static inline int tcx_prog_attach(const union bpf_attr *attr,
struct bpf_prog *prog)
{
return -EINVAL;
}
static inline int tcx_link_attach(const union bpf_attr *attr,
struct bpf_prog *prog)
{
return -EINVAL;
}
static inline int tcx_prog_detach(const union bpf_attr *attr,
struct bpf_prog *prog)
{
return -EINVAL;
}
static inline int tcx_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
return -EINVAL;
}
static inline void dev_tcx_uninstall(struct net_device *dev)
{
}
#endif /* CONFIG_NET_XGRESS && CONFIG_BPF_SYSCALL */
#endif /* __NET_TCX_H */

View File

@ -1036,6 +1036,8 @@ enum bpf_attach_type {
BPF_LSM_CGROUP,
BPF_STRUCT_OPS,
BPF_NETFILTER,
BPF_TCX_INGRESS,
BPF_TCX_EGRESS,
__MAX_BPF_ATTACH_TYPE
};
@ -1053,7 +1055,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_KPROBE_MULTI = 8,
BPF_LINK_TYPE_STRUCT_OPS = 9,
BPF_LINK_TYPE_NETFILTER = 10,
BPF_LINK_TYPE_TCX = 11,
MAX_BPF_LINK_TYPE,
};
@ -1113,7 +1115,12 @@ enum bpf_perf_event_type {
*/
#define BPF_F_ALLOW_OVERRIDE (1U << 0)
#define BPF_F_ALLOW_MULTI (1U << 1)
/* Generic attachment flags. */
#define BPF_F_REPLACE (1U << 2)
#define BPF_F_BEFORE (1U << 3)
#define BPF_F_AFTER (1U << 4)
#define BPF_F_ID (1U << 5)
#define BPF_F_LINK BPF_F_LINK /* 1 << 13 */
/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
* verifier will perform strict alignment checking as if the kernel
@ -1444,14 +1451,19 @@ union bpf_attr {
};
struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
__u32 target_fd; /* container object to attach to */
__u32 attach_bpf_fd; /* eBPF program to attach */
union {
__u32 target_fd; /* target object to attach to or ... */
__u32 target_ifindex; /* target ifindex */
};
__u32 attach_bpf_fd;
__u32 attach_type;
__u32 attach_flags;
__u32 replace_bpf_fd; /* previously attached eBPF
* program to replace if
* BPF_F_REPLACE is used
*/
__u32 replace_bpf_fd;
union {
__u32 relative_fd;
__u32 relative_id;
};
__u64 expected_revision;
};
struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
@ -1497,16 +1509,26 @@ union bpf_attr {
} info;
struct { /* anonymous struct used by BPF_PROG_QUERY command */
__u32 target_fd; /* container object to query */
union {
__u32 target_fd; /* target object to query or ... */
__u32 target_ifindex; /* target ifindex */
};
__u32 attach_type;
__u32 query_flags;
__u32 attach_flags;
__aligned_u64 prog_ids;
__u32 prog_cnt;
union {
__u32 prog_cnt;
__u32 count;
};
__u32 :32;
/* output: per-program attach_flags.
* not allowed to be set during effective query.
*/
__aligned_u64 prog_attach_flags;
__aligned_u64 link_ids;
__aligned_u64 link_attach_flags;
__u64 revision;
} query;
struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
@ -1549,13 +1571,13 @@ union bpf_attr {
__u32 map_fd; /* struct_ops to attach */
};
union {
__u32 target_fd; /* object to attach to */
__u32 target_ifindex; /* target ifindex */
__u32 target_fd; /* target object to attach to or ... */
__u32 target_ifindex; /* target ifindex */
};
__u32 attach_type; /* attach type */
__u32 flags; /* extra flags */
union {
__u32 target_btf_id; /* btf_id of target to attach to */
__u32 target_btf_id; /* btf_id of target to attach to */
struct {
__aligned_u64 iter_info; /* extra bpf_iter_link_info */
__u32 iter_info_len; /* iter_info length */
@ -1589,6 +1611,13 @@ union bpf_attr {
__s32 priority;
__u32 flags;
} netfilter;
struct {
union {
__u32 relative_fd;
__u32 relative_id;
};
__u64 expected_revision;
} tcx;
};
} link_create;
@ -6197,6 +6226,19 @@ struct bpf_sock_tuple {
};
};
/* (Simplified) user return codes for tcx prog type.
* A valid tcx program must return one of these defined values. All other
* return codes are reserved for future use. Must remain compatible with
* their TC_ACT_* counter-parts. For compatibility in behavior, unknown
* return codes are mapped to TCX_NEXT.
*/
enum tcx_action_base {
TCX_NEXT = -1,
TCX_PASS = 0,
TCX_DROP = 2,
TCX_REDIRECT = 7,
};
struct bpf_xdp_sock {
__u32 queue_id;
};
@ -6479,6 +6521,10 @@ struct bpf_link_info {
} event; /* BPF_PERF_EVENT_EVENT */
};
} perf_event;
struct {
__u32 ifindex;
__u32 attach_type;
} tcx;
};
} __attribute__((aligned(8)));

View File

@ -31,6 +31,7 @@ config BPF_SYSCALL
select TASKS_TRACE_RCU
select BINARY_PRINTF
select NET_SOCK_MSG if NET
select NET_XGRESS if NET
select PAGE_POOL if NET
default n
help

View File

@ -12,7 +12,7 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
obj-$(CONFIG_BPF_JIT) += dispatcher.o
@ -21,6 +21,7 @@ obj-$(CONFIG_BPF_SYSCALL) += devmap.o
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
obj-$(CONFIG_BPF_SYSCALL) += offload.o
obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
obj-$(CONFIG_BPF_SYSCALL) += tcx.o
endif
ifeq ($(CONFIG_PERF_EVENTS),y)
obj-$(CONFIG_BPF_SYSCALL) += stackmap.o

445
kernel/bpf/mprog.c Normal file
View File

@ -0,0 +1,445 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Isovalent */
#include <linux/bpf.h>
#include <linux/bpf_mprog.h>
static int bpf_mprog_link(struct bpf_tuple *tuple,
u32 id_or_fd, u32 flags,
enum bpf_prog_type type)
{
struct bpf_link *link = ERR_PTR(-EINVAL);
bool id = flags & BPF_F_ID;
if (id)
link = bpf_link_by_id(id_or_fd);
else if (id_or_fd)
link = bpf_link_get_from_fd(id_or_fd);
if (IS_ERR(link))
return PTR_ERR(link);
if (type && link->prog->type != type) {
bpf_link_put(link);
return -EINVAL;
}
tuple->link = link;
tuple->prog = link->prog;
return 0;
}
static int bpf_mprog_prog(struct bpf_tuple *tuple,
u32 id_or_fd, u32 flags,
enum bpf_prog_type type)
{
struct bpf_prog *prog = ERR_PTR(-EINVAL);
bool id = flags & BPF_F_ID;
if (id)
prog = bpf_prog_by_id(id_or_fd);
else if (id_or_fd)
prog = bpf_prog_get(id_or_fd);
if (IS_ERR(prog))
return PTR_ERR(prog);
if (type && prog->type != type) {
bpf_prog_put(prog);
return -EINVAL;
}
tuple->link = NULL;
tuple->prog = prog;
return 0;
}
static int bpf_mprog_tuple_relative(struct bpf_tuple *tuple,
u32 id_or_fd, u32 flags,
enum bpf_prog_type type)
{
bool link = flags & BPF_F_LINK;
bool id = flags & BPF_F_ID;
memset(tuple, 0, sizeof(*tuple));
if (link)
return bpf_mprog_link(tuple, id_or_fd, flags, type);
/* If no relevant flag is set and no id_or_fd was passed, then
* tuple link/prog is just NULLed. This is the case when before/
* after selects first/last position without passing fd.
*/
if (!id && !id_or_fd)
return 0;
return bpf_mprog_prog(tuple, id_or_fd, flags, type);
}
static void bpf_mprog_tuple_put(struct bpf_tuple *tuple)
{
if (tuple->link)
bpf_link_put(tuple->link);
else if (tuple->prog)
bpf_prog_put(tuple->prog);
}
/* The bpf_mprog_{replace,delete}() operate on exact idx position with the
* one exception that for deletion we support delete from front/back. In
* case of front idx is -1, in case of back idx is bpf_mprog_total(entry).
* Adjustment to first and last entry is trivial. The bpf_mprog_insert()
* we have to deal with the following cases:
*
* idx + before:
*
* Insert P4 before P3: idx for old array is 1, idx for new array is 2,
* hence we adjust target idx for the new array, so that memmove copies
* P1 and P2 to the new entry, and we insert P4 into idx 2. Inserting
* before P1 would have old idx -1 and new idx 0.
*
* +--+--+--+ +--+--+--+--+ +--+--+--+--+
* |P1|P2|P3| ==> |P1|P2| |P3| ==> |P1|P2|P4|P3|
* +--+--+--+ +--+--+--+--+ +--+--+--+--+
*
* idx + after:
*
* Insert P4 after P2: idx for old array is 2, idx for new array is 2.
* Again, memmove copies P1 and P2 to the new entry, and we insert P4
* into idx 2. Inserting after P3 would have both old/new idx at 4 aka
* bpf_mprog_total(entry).
*
* +--+--+--+ +--+--+--+--+ +--+--+--+--+
* |P1|P2|P3| ==> |P1|P2| |P3| ==> |P1|P2|P4|P3|
* +--+--+--+ +--+--+--+--+ +--+--+--+--+
*/
static int bpf_mprog_replace(struct bpf_mprog_entry *entry,
struct bpf_mprog_entry **entry_new,
struct bpf_tuple *ntuple, int idx)
{
struct bpf_mprog_fp *fp;
struct bpf_mprog_cp *cp;
struct bpf_prog *oprog;
bpf_mprog_read(entry, idx, &fp, &cp);
oprog = READ_ONCE(fp->prog);
bpf_mprog_write(fp, cp, ntuple);
if (!ntuple->link) {
WARN_ON_ONCE(cp->link);
bpf_prog_put(oprog);
}
*entry_new = entry;
return 0;
}
static int bpf_mprog_insert(struct bpf_mprog_entry *entry,
struct bpf_mprog_entry **entry_new,
struct bpf_tuple *ntuple, int idx, u32 flags)
{
int total = bpf_mprog_total(entry);
struct bpf_mprog_entry *peer;
struct bpf_mprog_fp *fp;
struct bpf_mprog_cp *cp;
peer = bpf_mprog_peer(entry);
bpf_mprog_entry_copy(peer, entry);
if (idx == total)
goto insert;
else if (flags & BPF_F_BEFORE)
idx += 1;
bpf_mprog_entry_grow(peer, idx);
insert:
bpf_mprog_read(peer, idx, &fp, &cp);
bpf_mprog_write(fp, cp, ntuple);
bpf_mprog_inc(peer);
*entry_new = peer;
return 0;
}
static int bpf_mprog_delete(struct bpf_mprog_entry *entry,
struct bpf_mprog_entry **entry_new,
struct bpf_tuple *dtuple, int idx)
{
int total = bpf_mprog_total(entry);
struct bpf_mprog_entry *peer;
peer = bpf_mprog_peer(entry);
bpf_mprog_entry_copy(peer, entry);
if (idx == -1)
idx = 0;
else if (idx == total)
idx = total - 1;
bpf_mprog_entry_shrink(peer, idx);
bpf_mprog_dec(peer);
bpf_mprog_mark_for_release(peer, dtuple);
*entry_new = peer;
return 0;
}
/* In bpf_mprog_pos_*() we evaluate the target position for the BPF
* program/link that needs to be replaced, inserted or deleted for
* each "rule" independently. If all rules agree on that position
* or existing element, then enact replacement, addition or deletion.
* If this is not the case, then the request cannot be satisfied and
* we bail out with an error.
*/
static int bpf_mprog_pos_exact(struct bpf_mprog_entry *entry,
struct bpf_tuple *tuple)
{
struct bpf_mprog_fp *fp;
struct bpf_mprog_cp *cp;
int i;
for (i = 0; i < bpf_mprog_total(entry); i++) {
bpf_mprog_read(entry, i, &fp, &cp);
if (tuple->prog == READ_ONCE(fp->prog))
return tuple->link == cp->link ? i : -EBUSY;
}
return -ENOENT;
}
static int bpf_mprog_pos_before(struct bpf_mprog_entry *entry,
struct bpf_tuple *tuple)
{
struct bpf_mprog_fp *fp;
struct bpf_mprog_cp *cp;
int i;
for (i = 0; i < bpf_mprog_total(entry); i++) {
bpf_mprog_read(entry, i, &fp, &cp);
if (tuple->prog == READ_ONCE(fp->prog) &&
(!tuple->link || tuple->link == cp->link))
return i - 1;
}
return tuple->prog ? -ENOENT : -1;
}
static int bpf_mprog_pos_after(struct bpf_mprog_entry *entry,
struct bpf_tuple *tuple)
{
struct bpf_mprog_fp *fp;
struct bpf_mprog_cp *cp;
int i;
for (i = 0; i < bpf_mprog_total(entry); i++) {
bpf_mprog_read(entry, i, &fp, &cp);
if (tuple->prog == READ_ONCE(fp->prog) &&
(!tuple->link || tuple->link == cp->link))
return i + 1;
}
return tuple->prog ? -ENOENT : bpf_mprog_total(entry);
}
int bpf_mprog_attach(struct bpf_mprog_entry *entry,
struct bpf_mprog_entry **entry_new,
struct bpf_prog *prog_new, struct bpf_link *link,
struct bpf_prog *prog_old,
u32 flags, u32 id_or_fd, u64 revision)
{
struct bpf_tuple rtuple, ntuple = {
.prog = prog_new,
.link = link,
}, otuple = {
.prog = prog_old,
.link = link,
};
int ret, idx = -ERANGE, tidx;
if (revision && revision != bpf_mprog_revision(entry))
return -ESTALE;
if (bpf_mprog_exists(entry, prog_new))
return -EEXIST;
ret = bpf_mprog_tuple_relative(&rtuple, id_or_fd,
flags & ~BPF_F_REPLACE,
prog_new->type);
if (ret)
return ret;
if (flags & BPF_F_REPLACE) {
tidx = bpf_mprog_pos_exact(entry, &otuple);
if (tidx < 0) {
ret = tidx;
goto out;
}
idx = tidx;
}
if (flags & BPF_F_BEFORE) {
tidx = bpf_mprog_pos_before(entry, &rtuple);
if (tidx < -1 || (idx >= -1 && tidx != idx)) {
ret = tidx < -1 ? tidx : -ERANGE;
goto out;
}
idx = tidx;
}
if (flags & BPF_F_AFTER) {
tidx = bpf_mprog_pos_after(entry, &rtuple);
if (tidx < -1 || (idx >= -1 && tidx != idx)) {
ret = tidx < 0 ? tidx : -ERANGE;
goto out;
}
idx = tidx;
}
if (idx < -1) {
if (rtuple.prog || flags) {
ret = -EINVAL;
goto out;
}
idx = bpf_mprog_total(entry);
flags = BPF_F_AFTER;
}
if (idx >= bpf_mprog_max()) {
ret = -ERANGE;
goto out;
}
if (flags & BPF_F_REPLACE)
ret = bpf_mprog_replace(entry, entry_new, &ntuple, idx);
else
ret = bpf_mprog_insert(entry, entry_new, &ntuple, idx, flags);
out:
bpf_mprog_tuple_put(&rtuple);
return ret;
}
static int bpf_mprog_fetch(struct bpf_mprog_entry *entry,
struct bpf_tuple *tuple, int idx)
{
int total = bpf_mprog_total(entry);
struct bpf_mprog_cp *cp;
struct bpf_mprog_fp *fp;
struct bpf_prog *prog;
struct bpf_link *link;
if (idx == -1)
idx = 0;
else if (idx == total)
idx = total - 1;
bpf_mprog_read(entry, idx, &fp, &cp);
prog = READ_ONCE(fp->prog);
link = cp->link;
/* The deletion request can either be without filled tuple in which
* case it gets populated here based on idx, or with filled tuple
* where the only thing we end up doing is the WARN_ON_ONCE() assert.
* If we hit a BPF link at the given index, it must not be removed
* from opts path.
*/
if (link && !tuple->link)
return -EBUSY;
WARN_ON_ONCE(tuple->prog && tuple->prog != prog);
WARN_ON_ONCE(tuple->link && tuple->link != link);
tuple->prog = prog;
tuple->link = link;
return 0;
}
int bpf_mprog_detach(struct bpf_mprog_entry *entry,
struct bpf_mprog_entry **entry_new,
struct bpf_prog *prog, struct bpf_link *link,
u32 flags, u32 id_or_fd, u64 revision)
{
struct bpf_tuple rtuple, dtuple = {
.prog = prog,
.link = link,
};
int ret, idx = -ERANGE, tidx;
if (flags & BPF_F_REPLACE)
return -EINVAL;
if (revision && revision != bpf_mprog_revision(entry))
return -ESTALE;
ret = bpf_mprog_tuple_relative(&rtuple, id_or_fd, flags,
prog ? prog->type :
BPF_PROG_TYPE_UNSPEC);
if (ret)
return ret;
if (dtuple.prog) {
tidx = bpf_mprog_pos_exact(entry, &dtuple);
if (tidx < 0) {
ret = tidx;
goto out;
}
idx = tidx;
}
if (flags & BPF_F_BEFORE) {
tidx = bpf_mprog_pos_before(entry, &rtuple);
if (tidx < -1 || (idx >= -1 && tidx != idx)) {
ret = tidx < -1 ? tidx : -ERANGE;
goto out;
}
idx = tidx;
}
if (flags & BPF_F_AFTER) {
tidx = bpf_mprog_pos_after(entry, &rtuple);
if (tidx < -1 || (idx >= -1 && tidx != idx)) {
ret = tidx < 0 ? tidx : -ERANGE;
goto out;
}
idx = tidx;
}
if (idx < -1) {
if (rtuple.prog || flags) {
ret = -EINVAL;
goto out;
}
idx = bpf_mprog_total(entry);
flags = BPF_F_AFTER;
}
if (idx >= bpf_mprog_max()) {
ret = -ERANGE;
goto out;
}
ret = bpf_mprog_fetch(entry, &dtuple, idx);
if (ret)
goto out;
ret = bpf_mprog_delete(entry, entry_new, &dtuple, idx);
out:
bpf_mprog_tuple_put(&rtuple);
return ret;
}
int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
struct bpf_mprog_entry *entry)
{
u32 __user *uprog_flags, *ulink_flags;
u32 __user *uprog_id, *ulink_id;
struct bpf_mprog_fp *fp;
struct bpf_mprog_cp *cp;
struct bpf_prog *prog;
const u32 flags = 0;
int i, ret = 0;
u32 id, count;
u64 revision;
if (attr->query.query_flags || attr->query.attach_flags)
return -EINVAL;
revision = bpf_mprog_revision(entry);
count = bpf_mprog_total(entry);
if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
return -EFAULT;
if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
return -EFAULT;
if (copy_to_user(&uattr->query.count, &count, sizeof(count)))
return -EFAULT;
uprog_id = u64_to_user_ptr(attr->query.prog_ids);
uprog_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
ulink_id = u64_to_user_ptr(attr->query.link_ids);
ulink_flags = u64_to_user_ptr(attr->query.link_attach_flags);
if (attr->query.count == 0 || !uprog_id || !count)
return 0;
if (attr->query.count < count) {
count = attr->query.count;
ret = -ENOSPC;
}
for (i = 0; i < bpf_mprog_max(); i++) {
bpf_mprog_read(entry, i, &fp, &cp);
prog = READ_ONCE(fp->prog);
if (!prog)
break;
id = prog->aux->id;
if (copy_to_user(uprog_id + i, &id, sizeof(id)))
return -EFAULT;
if (uprog_flags &&
copy_to_user(uprog_flags + i, &flags, sizeof(flags)))
return -EFAULT;
id = cp->link ? cp->link->id : 0;
if (ulink_id &&
copy_to_user(ulink_id + i, &id, sizeof(id)))
return -EFAULT;
if (ulink_flags &&
copy_to_user(ulink_flags + i, &flags, sizeof(flags)))
return -EFAULT;
if (i + 1 == count)
break;
}
return ret;
}

View File

@ -37,6 +37,8 @@
#include <linux/trace_events.h>
#include <net/netfilter/nf_bpf_link.h>
#include <net/tcx.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
@ -3740,31 +3742,45 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
return BPF_PROG_TYPE_XDP;
case BPF_LSM_CGROUP:
return BPF_PROG_TYPE_LSM;
case BPF_TCX_INGRESS:
case BPF_TCX_EGRESS:
return BPF_PROG_TYPE_SCHED_CLS;
default:
return BPF_PROG_TYPE_UNSPEC;
}
}
#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd
#define BPF_PROG_ATTACH_LAST_FIELD expected_revision
#define BPF_F_ATTACH_MASK \
(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE)
#define BPF_F_ATTACH_MASK_BASE \
(BPF_F_ALLOW_OVERRIDE | \
BPF_F_ALLOW_MULTI | \
BPF_F_REPLACE)
#define BPF_F_ATTACH_MASK_MPROG \
(BPF_F_REPLACE | \
BPF_F_BEFORE | \
BPF_F_AFTER | \
BPF_F_ID | \
BPF_F_LINK)
static int bpf_prog_attach(const union bpf_attr *attr)
{
enum bpf_prog_type ptype;
struct bpf_prog *prog;
u32 mask;
int ret;
if (CHECK_ATTR(BPF_PROG_ATTACH))
return -EINVAL;
if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
return -EINVAL;
ptype = attach_type_to_prog_type(attr->attach_type);
if (ptype == BPF_PROG_TYPE_UNSPEC)
return -EINVAL;
mask = bpf_mprog_supported(ptype) ?
BPF_F_ATTACH_MASK_MPROG : BPF_F_ATTACH_MASK_BASE;
if (attr->attach_flags & ~mask)
return -EINVAL;
prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
if (IS_ERR(prog))
@ -3800,6 +3816,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
else
ret = cgroup_bpf_prog_attach(attr, ptype, prog);
break;
case BPF_PROG_TYPE_SCHED_CLS:
ret = tcx_prog_attach(attr, prog);
break;
default:
ret = -EINVAL;
}
@ -3809,25 +3828,41 @@ static int bpf_prog_attach(const union bpf_attr *attr)
return ret;
}
#define BPF_PROG_DETACH_LAST_FIELD attach_type
#define BPF_PROG_DETACH_LAST_FIELD expected_revision
static int bpf_prog_detach(const union bpf_attr *attr)
{
struct bpf_prog *prog = NULL;
enum bpf_prog_type ptype;
int ret;
if (CHECK_ATTR(BPF_PROG_DETACH))
return -EINVAL;
ptype = attach_type_to_prog_type(attr->attach_type);
if (bpf_mprog_supported(ptype)) {
if (ptype == BPF_PROG_TYPE_UNSPEC)
return -EINVAL;
if (attr->attach_flags & ~BPF_F_ATTACH_MASK_MPROG)
return -EINVAL;
if (attr->attach_bpf_fd) {
prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
if (IS_ERR(prog))
return PTR_ERR(prog);
}
}
switch (ptype) {
case BPF_PROG_TYPE_SK_MSG:
case BPF_PROG_TYPE_SK_SKB:
return sock_map_prog_detach(attr, ptype);
ret = sock_map_prog_detach(attr, ptype);
break;
case BPF_PROG_TYPE_LIRC_MODE2:
return lirc_prog_detach(attr);
ret = lirc_prog_detach(attr);
break;
case BPF_PROG_TYPE_FLOW_DISSECTOR:
return netns_bpf_prog_detach(attr, ptype);
ret = netns_bpf_prog_detach(attr, ptype);
break;
case BPF_PROG_TYPE_CGROUP_DEVICE:
case BPF_PROG_TYPE_CGROUP_SKB:
case BPF_PROG_TYPE_CGROUP_SOCK:
@ -3836,13 +3871,21 @@ static int bpf_prog_detach(const union bpf_attr *attr)
case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_SOCK_OPS:
case BPF_PROG_TYPE_LSM:
return cgroup_bpf_prog_detach(attr, ptype);
ret = cgroup_bpf_prog_detach(attr, ptype);
break;
case BPF_PROG_TYPE_SCHED_CLS:
ret = tcx_prog_detach(attr, prog);
break;
default:
return -EINVAL;
ret = -EINVAL;
}
if (prog)
bpf_prog_put(prog);
return ret;
}
#define BPF_PROG_QUERY_LAST_FIELD query.prog_attach_flags
#define BPF_PROG_QUERY_LAST_FIELD query.link_attach_flags
static int bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
@ -3890,6 +3933,9 @@ static int bpf_prog_query(const union bpf_attr *attr,
case BPF_SK_MSG_VERDICT:
case BPF_SK_SKB_VERDICT:
return sock_map_bpf_prog_query(attr, uattr);
case BPF_TCX_INGRESS:
case BPF_TCX_EGRESS:
return tcx_prog_query(attr, uattr);
default:
return -EINVAL;
}
@ -4852,6 +4898,13 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
goto out;
}
break;
case BPF_PROG_TYPE_SCHED_CLS:
if (attr->link_create.attach_type != BPF_TCX_INGRESS &&
attr->link_create.attach_type != BPF_TCX_EGRESS) {
ret = -EINVAL;
goto out;
}
break;
default:
ptype = attach_type_to_prog_type(attr->link_create.attach_type);
if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
@ -4903,6 +4956,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
case BPF_PROG_TYPE_XDP:
ret = bpf_xdp_link_attach(attr, prog);
break;
case BPF_PROG_TYPE_SCHED_CLS:
ret = tcx_link_attach(attr, prog);
break;
case BPF_PROG_TYPE_NETFILTER:
ret = bpf_nf_link_attach(attr, prog);
break;

348
kernel/bpf/tcx.c Normal file
View File

@ -0,0 +1,348 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Isovalent */
#include <linux/bpf.h>
#include <linux/bpf_mprog.h>
#include <linux/netdevice.h>
#include <net/tcx.h>
int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
bool created, ingress = attr->attach_type == BPF_TCX_INGRESS;
struct net *net = current->nsproxy->net_ns;
struct bpf_mprog_entry *entry, *entry_new;
struct bpf_prog *replace_prog = NULL;
struct net_device *dev;
int ret;
rtnl_lock();
dev = __dev_get_by_index(net, attr->target_ifindex);
if (!dev) {
ret = -ENODEV;
goto out;
}
if (attr->attach_flags & BPF_F_REPLACE) {
replace_prog = bpf_prog_get_type(attr->replace_bpf_fd,
prog->type);
if (IS_ERR(replace_prog)) {
ret = PTR_ERR(replace_prog);
replace_prog = NULL;
goto out;
}
}
entry = tcx_entry_fetch_or_create(dev, ingress, &created);
if (!entry) {
ret = -ENOMEM;
goto out;
}
ret = bpf_mprog_attach(entry, &entry_new, prog, NULL, replace_prog,
attr->attach_flags, attr->relative_fd,
attr->expected_revision);
if (!ret) {
if (entry != entry_new) {
tcx_entry_update(dev, entry_new, ingress);
tcx_entry_sync();
tcx_skeys_inc(ingress);
}
bpf_mprog_commit(entry);
} else if (created) {
tcx_entry_free(entry);
}
out:
if (replace_prog)
bpf_prog_put(replace_prog);
rtnl_unlock();
return ret;
}
int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog)
{
bool ingress = attr->attach_type == BPF_TCX_INGRESS;
struct net *net = current->nsproxy->net_ns;
struct bpf_mprog_entry *entry, *entry_new;
struct net_device *dev;
int ret;
rtnl_lock();
dev = __dev_get_by_index(net, attr->target_ifindex);
if (!dev) {
ret = -ENODEV;
goto out;
}
entry = tcx_entry_fetch(dev, ingress);
if (!entry) {
ret = -ENOENT;
goto out;
}
ret = bpf_mprog_detach(entry, &entry_new, prog, NULL, attr->attach_flags,
attr->relative_fd, attr->expected_revision);
if (!ret) {
if (!tcx_entry_is_active(entry_new))
entry_new = NULL;
tcx_entry_update(dev, entry_new, ingress);
tcx_entry_sync();
tcx_skeys_dec(ingress);
bpf_mprog_commit(entry);
if (!entry_new)
tcx_entry_free(entry);
}
out:
rtnl_unlock();
return ret;
}
void tcx_uninstall(struct net_device *dev, bool ingress)
{
struct bpf_tuple tuple = {};
struct bpf_mprog_entry *entry;
struct bpf_mprog_fp *fp;
struct bpf_mprog_cp *cp;
entry = tcx_entry_fetch(dev, ingress);
if (!entry)
return;
tcx_entry_update(dev, NULL, ingress);
tcx_entry_sync();
bpf_mprog_foreach_tuple(entry, fp, cp, tuple) {
if (tuple.link)
tcx_link(tuple.link)->dev = NULL;
else
bpf_prog_put(tuple.prog);
tcx_skeys_dec(ingress);
}
WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
tcx_entry_free(entry);
}
int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
{
bool ingress = attr->query.attach_type == BPF_TCX_INGRESS;
struct net *net = current->nsproxy->net_ns;
struct bpf_mprog_entry *entry;
struct net_device *dev;
int ret;
rtnl_lock();
dev = __dev_get_by_index(net, attr->query.target_ifindex);
if (!dev) {
ret = -ENODEV;
goto out;
}
entry = tcx_entry_fetch(dev, ingress);
if (!entry) {
ret = -ENOENT;
goto out;
}
ret = bpf_mprog_query(attr, uattr, entry);
out:
rtnl_unlock();
return ret;
}
static int tcx_link_prog_attach(struct bpf_link *link, u32 flags, u32 id_or_fd,
u64 revision)
{
struct tcx_link *tcx = tcx_link(link);
bool created, ingress = tcx->location == BPF_TCX_INGRESS;
struct bpf_mprog_entry *entry, *entry_new;
struct net_device *dev = tcx->dev;
int ret;
ASSERT_RTNL();
entry = tcx_entry_fetch_or_create(dev, ingress, &created);
if (!entry)
return -ENOMEM;
ret = bpf_mprog_attach(entry, &entry_new, link->prog, link, NULL, flags,
id_or_fd, revision);
if (!ret) {
if (entry != entry_new) {
tcx_entry_update(dev, entry_new, ingress);
tcx_entry_sync();
tcx_skeys_inc(ingress);
}
bpf_mprog_commit(entry);
} else if (created) {
tcx_entry_free(entry);
}
return ret;
}
static void tcx_link_release(struct bpf_link *link)
{
struct tcx_link *tcx = tcx_link(link);
bool ingress = tcx->location == BPF_TCX_INGRESS;
struct bpf_mprog_entry *entry, *entry_new;
struct net_device *dev;
int ret = 0;
rtnl_lock();
dev = tcx->dev;
if (!dev)
goto out;
entry = tcx_entry_fetch(dev, ingress);
if (!entry) {
ret = -ENOENT;
goto out;
}
ret = bpf_mprog_detach(entry, &entry_new, link->prog, link, 0, 0, 0);
if (!ret) {
if (!tcx_entry_is_active(entry_new))
entry_new = NULL;
tcx_entry_update(dev, entry_new, ingress);
tcx_entry_sync();
tcx_skeys_dec(ingress);
bpf_mprog_commit(entry);
if (!entry_new)
tcx_entry_free(entry);
tcx->dev = NULL;
}
out:
WARN_ON_ONCE(ret);
rtnl_unlock();
}
static int tcx_link_update(struct bpf_link *link, struct bpf_prog *nprog,
struct bpf_prog *oprog)
{
struct tcx_link *tcx = tcx_link(link);
bool ingress = tcx->location == BPF_TCX_INGRESS;
struct bpf_mprog_entry *entry, *entry_new;
struct net_device *dev;
int ret = 0;
rtnl_lock();
dev = tcx->dev;
if (!dev) {
ret = -ENOLINK;
goto out;
}
if (oprog && link->prog != oprog) {
ret = -EPERM;
goto out;
}
oprog = link->prog;
if (oprog == nprog) {
bpf_prog_put(nprog);
goto out;
}
entry = tcx_entry_fetch(dev, ingress);
if (!entry) {
ret = -ENOENT;
goto out;
}
ret = bpf_mprog_attach(entry, &entry_new, nprog, link, oprog,
BPF_F_REPLACE | BPF_F_ID,
link->prog->aux->id, 0);
if (!ret) {
WARN_ON_ONCE(entry != entry_new);
oprog = xchg(&link->prog, nprog);
bpf_prog_put(oprog);
bpf_mprog_commit(entry);
}
out:
rtnl_unlock();
return ret;
}
static void tcx_link_dealloc(struct bpf_link *link)
{
kfree(tcx_link(link));
}
static void tcx_link_fdinfo(const struct bpf_link *link, struct seq_file *seq)
{
const struct tcx_link *tcx = tcx_link_const(link);
u32 ifindex = 0;
rtnl_lock();
if (tcx->dev)
ifindex = tcx->dev->ifindex;
rtnl_unlock();
seq_printf(seq, "ifindex:\t%u\n", ifindex);
seq_printf(seq, "attach_type:\t%u (%s)\n",
tcx->location,
tcx->location == BPF_TCX_INGRESS ? "ingress" : "egress");
}
static int tcx_link_fill_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
const struct tcx_link *tcx = tcx_link_const(link);
u32 ifindex = 0;
rtnl_lock();
if (tcx->dev)
ifindex = tcx->dev->ifindex;
rtnl_unlock();
info->tcx.ifindex = ifindex;
info->tcx.attach_type = tcx->location;
return 0;
}
static int tcx_link_detach(struct bpf_link *link)
{
tcx_link_release(link);
return 0;
}
static const struct bpf_link_ops tcx_link_lops = {
.release = tcx_link_release,
.detach = tcx_link_detach,
.dealloc = tcx_link_dealloc,
.update_prog = tcx_link_update,
.show_fdinfo = tcx_link_fdinfo,
.fill_link_info = tcx_link_fill_info,
};
static int tcx_link_init(struct tcx_link *tcx,
struct bpf_link_primer *link_primer,
const union bpf_attr *attr,
struct net_device *dev,
struct bpf_prog *prog)
{
bpf_link_init(&tcx->link, BPF_LINK_TYPE_TCX, &tcx_link_lops, prog);
tcx->location = attr->link_create.attach_type;
tcx->dev = dev;
return bpf_link_prime(&tcx->link, link_primer);
}
int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
struct net *net = current->nsproxy->net_ns;
struct bpf_link_primer link_primer;
struct net_device *dev;
struct tcx_link *tcx;
int ret;
rtnl_lock();
dev = __dev_get_by_index(net, attr->link_create.target_ifindex);
if (!dev) {
ret = -ENODEV;
goto out;
}
tcx = kzalloc(sizeof(*tcx), GFP_USER);
if (!tcx) {
ret = -ENOMEM;
goto out;
}
ret = tcx_link_init(tcx, &link_primer, attr, dev, prog);
if (ret) {
kfree(tcx);
goto out;
}
ret = tcx_link_prog_attach(&tcx->link, attr->link_create.flags,
attr->link_create.tcx.relative_fd,
attr->link_create.tcx.expected_revision);
if (ret) {
tcx->dev = NULL;
bpf_link_cleanup(&link_primer);
goto out;
}
ret = bpf_link_settle(&link_primer);
out:
rtnl_unlock();
return ret;
}

View File

@ -52,6 +52,11 @@ config NET_INGRESS
config NET_EGRESS
bool
config NET_XGRESS
select NET_INGRESS
select NET_EGRESS
bool
config NET_REDIRECT
bool

View File

@ -107,6 +107,7 @@
#include <net/pkt_cls.h>
#include <net/checksum.h>
#include <net/xfrm.h>
#include <net/tcx.h>
#include <linux/highmem.h>
#include <linux/init.h>
#include <linux/module.h>
@ -154,7 +155,6 @@
#include "dev.h"
#include "net-sysfs.h"
static DEFINE_SPINLOCK(ptype_lock);
struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
struct list_head ptype_all __read_mostly; /* Taps */
@ -3882,50 +3882,6 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
EXPORT_SYMBOL(dev_loopback_xmit);
#ifdef CONFIG_NET_EGRESS
static struct sk_buff *
sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
{
#ifdef CONFIG_NET_CLS_ACT
struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
struct tcf_result cl_res;
if (!miniq)
return skb;
/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
tc_skb_cb(skb)->mru = 0;
tc_skb_cb(skb)->post_ct = false;
mini_qdisc_bstats_cpu_update(miniq, skb);
switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
case TC_ACT_OK:
case TC_ACT_RECLASSIFY:
skb->tc_index = TC_H_MIN(cl_res.classid);
break;
case TC_ACT_SHOT:
mini_qdisc_qstats_cpu_drop(miniq);
*ret = NET_XMIT_DROP;
kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
return NULL;
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
*ret = NET_XMIT_SUCCESS;
consume_skb(skb);
return NULL;
case TC_ACT_REDIRECT:
/* No need to push/pop skb's mac_header here on egress! */
skb_do_redirect(skb);
*ret = NET_XMIT_SUCCESS;
return NULL;
default:
break;
}
#endif /* CONFIG_NET_CLS_ACT */
return skb;
}
static struct netdev_queue *
netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
{
@ -3946,6 +3902,179 @@ void netdev_xmit_skip_txqueue(bool skip)
EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
#endif /* CONFIG_NET_EGRESS */
#ifdef CONFIG_NET_XGRESS
static int tc_run(struct tcx_entry *entry, struct sk_buff *skb)
{
int ret = TC_ACT_UNSPEC;
#ifdef CONFIG_NET_CLS_ACT
struct mini_Qdisc *miniq = rcu_dereference_bh(entry->miniq);
struct tcf_result res;
if (!miniq)
return ret;
tc_skb_cb(skb)->mru = 0;
tc_skb_cb(skb)->post_ct = false;
mini_qdisc_bstats_cpu_update(miniq, skb);
ret = tcf_classify(skb, miniq->block, miniq->filter_list, &res, false);
/* Only tcf related quirks below. */
switch (ret) {
case TC_ACT_SHOT:
mini_qdisc_qstats_cpu_drop(miniq);
break;
case TC_ACT_OK:
case TC_ACT_RECLASSIFY:
skb->tc_index = TC_H_MIN(res.classid);
break;
}
#endif /* CONFIG_NET_CLS_ACT */
return ret;
}
static DEFINE_STATIC_KEY_FALSE(tcx_needed_key);
void tcx_inc(void)
{
static_branch_inc(&tcx_needed_key);
}
void tcx_dec(void)
{
static_branch_dec(&tcx_needed_key);
}
static __always_inline enum tcx_action_base
tcx_run(const struct bpf_mprog_entry *entry, struct sk_buff *skb,
const bool needs_mac)
{
const struct bpf_mprog_fp *fp;
const struct bpf_prog *prog;
int ret = TCX_NEXT;
if (needs_mac)
__skb_push(skb, skb->mac_len);
bpf_mprog_foreach_prog(entry, fp, prog) {
bpf_compute_data_pointers(skb);
ret = bpf_prog_run(prog, skb);
if (ret != TCX_NEXT)
break;
}
if (needs_mac)
__skb_pull(skb, skb->mac_len);
return tcx_action_code(skb, ret);
}
static __always_inline struct sk_buff *
sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
struct net_device *orig_dev, bool *another)
{
struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
int sch_ret;
if (!entry)
return skb;
if (*pt_prev) {
*ret = deliver_skb(skb, *pt_prev, orig_dev);
*pt_prev = NULL;
}
qdisc_skb_cb(skb)->pkt_len = skb->len;
tcx_set_ingress(skb, true);
if (static_branch_unlikely(&tcx_needed_key)) {
sch_ret = tcx_run(entry, skb, true);
if (sch_ret != TC_ACT_UNSPEC)
goto ingress_verdict;
}
sch_ret = tc_run(tcx_entry(entry), skb);
ingress_verdict:
switch (sch_ret) {
case TC_ACT_REDIRECT:
/* skb_mac_header check was done by BPF, so we can safely
* push the L2 header back before redirecting to another
* netdev.
*/
__skb_push(skb, skb->mac_len);
if (skb_do_redirect(skb) == -EAGAIN) {
__skb_pull(skb, skb->mac_len);
*another = true;
break;
}
*ret = NET_RX_SUCCESS;
return NULL;
case TC_ACT_SHOT:
kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
*ret = NET_RX_DROP;
return NULL;
/* used by tc_run */
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
consume_skb(skb);
fallthrough;
case TC_ACT_CONSUMED:
*ret = NET_RX_SUCCESS;
return NULL;
}
return skb;
}
static __always_inline struct sk_buff *
sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
{
struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
int sch_ret;
if (!entry)
return skb;
/* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
* already set by the caller.
*/
if (static_branch_unlikely(&tcx_needed_key)) {
sch_ret = tcx_run(entry, skb, false);
if (sch_ret != TC_ACT_UNSPEC)
goto egress_verdict;
}
sch_ret = tc_run(tcx_entry(entry), skb);
egress_verdict:
switch (sch_ret) {
case TC_ACT_REDIRECT:
/* No need to push/pop skb's mac_header here on egress! */
skb_do_redirect(skb);
*ret = NET_XMIT_SUCCESS;
return NULL;
case TC_ACT_SHOT:
kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
*ret = NET_XMIT_DROP;
return NULL;
/* used by tc_run */
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
*ret = NET_XMIT_SUCCESS;
return NULL;
}
return skb;
}
#else
static __always_inline struct sk_buff *
sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
struct net_device *orig_dev, bool *another)
{
return skb;
}
static __always_inline struct sk_buff *
sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
{
return skb;
}
#endif /* CONFIG_NET_XGRESS */
#ifdef CONFIG_XPS
static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
struct xps_dev_maps *dev_maps, unsigned int tci)
@ -4128,9 +4257,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
skb_update_prio(skb);
qdisc_pkt_len_init(skb);
#ifdef CONFIG_NET_CLS_ACT
skb->tc_at_ingress = 0;
#endif
tcx_set_ingress(skb, false);
#ifdef CONFIG_NET_EGRESS
if (static_branch_unlikely(&egress_needed_key)) {
if (nf_hook_egress_active()) {
@ -5064,72 +5191,6 @@ int (*br_fdb_test_addr_hook)(struct net_device *dev,
EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
#endif
static inline struct sk_buff *
sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
struct net_device *orig_dev, bool *another)
{
#ifdef CONFIG_NET_CLS_ACT
struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress);
struct tcf_result cl_res;
/* If there's at least one ingress present somewhere (so
* we get here via enabled static key), remaining devices
* that are not configured with an ingress qdisc will bail
* out here.
*/
if (!miniq)
return skb;
if (*pt_prev) {
*ret = deliver_skb(skb, *pt_prev, orig_dev);
*pt_prev = NULL;
}
qdisc_skb_cb(skb)->pkt_len = skb->len;
tc_skb_cb(skb)->mru = 0;
tc_skb_cb(skb)->post_ct = false;
skb->tc_at_ingress = 1;
mini_qdisc_bstats_cpu_update(miniq, skb);
switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
case TC_ACT_OK:
case TC_ACT_RECLASSIFY:
skb->tc_index = TC_H_MIN(cl_res.classid);
break;
case TC_ACT_SHOT:
mini_qdisc_qstats_cpu_drop(miniq);
kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
*ret = NET_RX_DROP;
return NULL;
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
consume_skb(skb);
*ret = NET_RX_SUCCESS;
return NULL;
case TC_ACT_REDIRECT:
/* skb_mac_header check was done by cls/act_bpf, so
* we can safely push the L2 header back before
* redirecting to another netdev
*/
__skb_push(skb, skb->mac_len);
if (skb_do_redirect(skb) == -EAGAIN) {
__skb_pull(skb, skb->mac_len);
*another = true;
break;
}
*ret = NET_RX_SUCCESS;
return NULL;
case TC_ACT_CONSUMED:
*ret = NET_RX_SUCCESS;
return NULL;
default:
break;
}
#endif /* CONFIG_NET_CLS_ACT */
return skb;
}
/**
* netdev_is_rx_handler_busy - check if receive handler is registered
* @dev: device to check
@ -10835,7 +10896,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
/* Shutdown queueing discipline. */
dev_shutdown(dev);
dev_tcx_uninstall(dev);
dev_xdp_uninstall(dev);
bpf_dev_bound_netdev_unregister(dev);

View File

@ -9307,7 +9307,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
__u8 value_reg = si->dst_reg;
__u8 skb_reg = si->src_reg;
#ifdef CONFIG_NET_CLS_ACT
#ifdef CONFIG_NET_XGRESS
/* If the tstamp_type is read,
* the bpf prog is aware the tstamp could have delivery time.
* Thus, read skb->tstamp as is if tstamp_type_access is true.
@ -9341,7 +9341,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
__u8 value_reg = si->src_reg;
__u8 skb_reg = si->dst_reg;
#ifdef CONFIG_NET_CLS_ACT
#ifdef CONFIG_NET_XGRESS
/* If the tstamp_type is read,
* the bpf prog is aware the tstamp could have delivery time.
* Thus, write skb->tstamp as is if tstamp_type_access is true.

View File

@ -347,8 +347,7 @@ config NET_SCH_FQ_PIE
config NET_SCH_INGRESS
tristate "Ingress/classifier-action Qdisc"
depends on NET_CLS_ACT
select NET_INGRESS
select NET_EGRESS
select NET_XGRESS
help
Say Y here if you want to use classifiers for incoming and/or outgoing
packets. This qdisc doesn't do anything else besides running classifiers,
@ -679,6 +678,7 @@ config NET_EMATCH_IPT
config NET_CLS_ACT
bool "Actions"
select NET_CLS
select NET_XGRESS
help
Say Y here if you want to use traffic control actions. Actions
get attached to classifiers and are invoked after a successful

View File

@ -13,6 +13,7 @@
#include <net/netlink.h>
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>
#include <net/tcx.h>
struct ingress_sched_data {
struct tcf_block *block;
@ -78,6 +79,8 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
{
struct ingress_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
struct bpf_mprog_entry *entry;
bool created;
int err;
if (sch->parent != TC_H_INGRESS)
@ -85,7 +88,13 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
net_inc_ingress_queue();
mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
entry = tcx_entry_fetch_or_create(dev, true, &created);
if (!entry)
return -ENOMEM;
tcx_miniq_set_active(entry, true);
mini_qdisc_pair_init(&q->miniqp, sch, &tcx_entry(entry)->miniq);
if (created)
tcx_entry_update(dev, entry, true);
q->block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
q->block_info.chain_head_change = clsact_chain_head_change;
@ -103,11 +112,22 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
static void ingress_destroy(struct Qdisc *sch)
{
struct ingress_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
struct bpf_mprog_entry *entry = rtnl_dereference(dev->tcx_ingress);
if (sch->parent != TC_H_INGRESS)
return;
tcf_block_put_ext(q->block, sch, &q->block_info);
if (entry) {
tcx_miniq_set_active(entry, false);
if (!tcx_entry_is_active(entry)) {
tcx_entry_update(dev, NULL, false);
tcx_entry_free(entry);
}
}
net_dec_ingress_queue();
}
@ -223,6 +243,8 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
{
struct clsact_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
struct bpf_mprog_entry *entry;
bool created;
int err;
if (sch->parent != TC_H_CLSACT)
@ -231,7 +253,13 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
net_inc_ingress_queue();
net_inc_egress_queue();
mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress);
entry = tcx_entry_fetch_or_create(dev, true, &created);
if (!entry)
return -ENOMEM;
tcx_miniq_set_active(entry, true);
mini_qdisc_pair_init(&q->miniqp_ingress, sch, &tcx_entry(entry)->miniq);
if (created)
tcx_entry_update(dev, entry, true);
q->ingress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
q->ingress_block_info.chain_head_change = clsact_chain_head_change;
@ -244,7 +272,13 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
mini_qdisc_pair_block_init(&q->miniqp_ingress, q->ingress_block);
mini_qdisc_pair_init(&q->miniqp_egress, sch, &dev->miniq_egress);
entry = tcx_entry_fetch_or_create(dev, false, &created);
if (!entry)
return -ENOMEM;
tcx_miniq_set_active(entry, true);
mini_qdisc_pair_init(&q->miniqp_egress, sch, &tcx_entry(entry)->miniq);
if (created)
tcx_entry_update(dev, entry, false);
q->egress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
q->egress_block_info.chain_head_change = clsact_chain_head_change;
@ -256,12 +290,31 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
static void clsact_destroy(struct Qdisc *sch)
{
struct clsact_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
struct bpf_mprog_entry *ingress_entry = rtnl_dereference(dev->tcx_ingress);
struct bpf_mprog_entry *egress_entry = rtnl_dereference(dev->tcx_egress);
if (sch->parent != TC_H_CLSACT)
return;
tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
if (ingress_entry) {
tcx_miniq_set_active(ingress_entry, false);
if (!tcx_entry_is_active(ingress_entry)) {
tcx_entry_update(dev, NULL, true);
tcx_entry_free(ingress_entry);
}
}
if (egress_entry) {
tcx_miniq_set_active(egress_entry, false);
if (!tcx_entry_is_active(egress_entry)) {
tcx_entry_update(dev, NULL, false);
tcx_entry_free(egress_entry);
}
}
net_dec_ingress_queue();
net_dec_egress_queue();

View File

@ -4,7 +4,7 @@
bpftool-net
================
-------------------------------------------------------------------------------
tool for inspection of netdev/tc related bpf prog attachments
tool for inspection of networking related bpf prog attachments
-------------------------------------------------------------------------------
:Manual section: 8
@ -37,10 +37,13 @@ DESCRIPTION
**bpftool net { show | list }** [ **dev** *NAME* ]
List bpf program attachments in the kernel networking subsystem.
Currently, only device driver xdp attachments and tc filter
classification/action attachments are implemented, i.e., for
program types **BPF_PROG_TYPE_SCHED_CLS**,
**BPF_PROG_TYPE_SCHED_ACT** and **BPF_PROG_TYPE_XDP**.
Currently, device driver xdp attachments, tcx and old-style tc
classifier/action attachments, flow_dissector as well as netfilter
attachments are implemented, i.e., for
program types **BPF_PROG_TYPE_XDP**, **BPF_PROG_TYPE_SCHED_CLS**,
**BPF_PROG_TYPE_SCHED_ACT**, **BPF_PROG_TYPE_FLOW_DISSECTOR**,
**BPF_PROG_TYPE_NETFILTER**.
For programs attached to a particular cgroup, e.g.,
**BPF_PROG_TYPE_CGROUP_SKB**, **BPF_PROG_TYPE_CGROUP_SOCK**,
**BPF_PROG_TYPE_SOCK_OPS** and **BPF_PROG_TYPE_CGROUP_SOCK_ADDR**,
@ -49,12 +52,13 @@ DESCRIPTION
bpf programs, users should consult other tools, e.g., iproute2.
The current output will start with all xdp program attachments, followed by
all tc class/qdisc bpf program attachments. Both xdp programs and
tc programs are ordered based on ifindex number. If multiple bpf
programs attached to the same networking device through **tc filter**,
the order will be first all bpf programs attached to tc classes, then
all bpf programs attached to non clsact qdiscs, and finally all
bpf programs attached to root and clsact qdisc.
all tcx, then tc class/qdisc bpf program attachments, then flow_dissector
and finally netfilter programs. Both xdp programs and tcx/tc programs are
ordered based on ifindex number. If multiple bpf programs attached
to the same networking device through **tc**, the order will be first
all bpf programs attached to tcx, then tc classes, then all bpf programs
attached to non clsact qdiscs, and finally all bpf programs attached
to root and clsact qdisc.
**bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *NAME* [ **overwrite** ]
Attach bpf program *PROG* to network interface *NAME* with

View File

@ -76,6 +76,11 @@ static const char * const attach_type_strings[] = {
[NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload",
};
static const char * const attach_loc_strings[] = {
[BPF_TCX_INGRESS] = "tcx/ingress",
[BPF_TCX_EGRESS] = "tcx/egress",
};
const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings);
static enum net_attach_type parse_attach_type(const char *str)
@ -422,8 +427,89 @@ static int dump_filter_nlmsg(void *cookie, void *msg, struct nlattr **tb)
filter_info->devname, filter_info->ifindex);
}
static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
struct ip_devname_ifindex *dev)
static int __show_dev_tc_bpf_name(__u32 id, char *name, size_t len)
{
struct bpf_prog_info info = {};
__u32 ilen = sizeof(info);
int fd, ret;
fd = bpf_prog_get_fd_by_id(id);
if (fd < 0)
return fd;
ret = bpf_obj_get_info_by_fd(fd, &info, &ilen);
if (ret < 0)
goto out;
ret = -ENOENT;
if (info.name[0]) {
get_prog_full_name(&info, fd, name, len);
ret = 0;
}
out:
close(fd);
return ret;
}
static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
const enum bpf_attach_type loc)
{
__u32 prog_flags[64] = {}, link_flags[64] = {}, i, j;
__u32 prog_ids[64] = {}, link_ids[64] = {};
LIBBPF_OPTS(bpf_prog_query_opts, optq);
char prog_name[MAX_PROG_FULL_NAME];
int ret;
optq.prog_ids = prog_ids;
optq.prog_attach_flags = prog_flags;
optq.link_ids = link_ids;
optq.link_attach_flags = link_flags;
optq.count = ARRAY_SIZE(prog_ids);
ret = bpf_prog_query_opts(dev->ifindex, loc, &optq);
if (ret)
return;
for (i = 0; i < optq.count; i++) {
NET_START_OBJECT;
NET_DUMP_STR("devname", "%s", dev->devname);
NET_DUMP_UINT("ifindex", "(%u)", dev->ifindex);
NET_DUMP_STR("kind", " %s", attach_loc_strings[loc]);
ret = __show_dev_tc_bpf_name(prog_ids[i], prog_name,
sizeof(prog_name));
if (!ret)
NET_DUMP_STR("name", " %s", prog_name);
NET_DUMP_UINT("prog_id", " prog_id %u ", prog_ids[i]);
if (prog_flags[i] || json_output) {
NET_START_ARRAY("prog_flags", "%s ");
for (j = 0; prog_flags[i] && j < 32; j++) {
if (!(prog_flags[i] & (1 << j)))
continue;
NET_DUMP_UINT_ONLY(1 << j);
}
NET_END_ARRAY("");
}
if (link_ids[i] || json_output) {
NET_DUMP_UINT("link_id", "link_id %u ", link_ids[i]);
if (link_flags[i] || json_output) {
NET_START_ARRAY("link_flags", "%s ");
for (j = 0; link_flags[i] && j < 32; j++) {
if (!(link_flags[i] & (1 << j)))
continue;
NET_DUMP_UINT_ONLY(1 << j);
}
NET_END_ARRAY("");
}
}
NET_END_OBJECT_FINAL;
}
}
static void show_dev_tc_bpf(struct ip_devname_ifindex *dev)
{
__show_dev_tc_bpf(dev, BPF_TCX_INGRESS);
__show_dev_tc_bpf(dev, BPF_TCX_EGRESS);
}
static int show_dev_tc_bpf_classic(int sock, unsigned int nl_pid,
struct ip_devname_ifindex *dev)
{
struct bpf_filter_t filter_info;
struct bpf_tcinfo_t tcinfo;
@ -790,8 +876,9 @@ static int do_show(int argc, char **argv)
if (!ret) {
NET_START_ARRAY("tc", "%s:\n");
for (i = 0; i < dev_array.used_len; i++) {
ret = show_dev_tc_bpf(sock, nl_pid,
&dev_array.devices[i]);
show_dev_tc_bpf(&dev_array.devices[i]);
ret = show_dev_tc_bpf_classic(sock, nl_pid,
&dev_array.devices[i]);
if (ret)
break;
}
@ -839,7 +926,8 @@ static int do_help(int argc, char **argv)
" ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
" " HELP_SPEC_OPTIONS " }\n"
"\n"
"Note: Only xdp and tc attachments are supported now.\n"
"Note: Only xdp, tcx, tc, flow_dissector and netfilter attachments\n"
" are currently supported.\n"
" For progs attached to cgroups, use \"bpftool cgroup\"\n"
" to dump program attachments. For program types\n"
" sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"

View File

@ -76,6 +76,14 @@
fprintf(stdout, fmt_str, val); \
}
#define NET_DUMP_UINT_ONLY(str) \
{ \
if (json_output) \
jsonw_uint(json_wtr, str); \
else \
fprintf(stdout, "%u ", str); \
}
#define NET_DUMP_STR(name, fmt_str, str) \
{ \
if (json_output) \

View File

@ -1036,6 +1036,8 @@ enum bpf_attach_type {
BPF_LSM_CGROUP,
BPF_STRUCT_OPS,
BPF_NETFILTER,
BPF_TCX_INGRESS,
BPF_TCX_EGRESS,
__MAX_BPF_ATTACH_TYPE
};
@ -1053,7 +1055,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_KPROBE_MULTI = 8,
BPF_LINK_TYPE_STRUCT_OPS = 9,
BPF_LINK_TYPE_NETFILTER = 10,
BPF_LINK_TYPE_TCX = 11,
MAX_BPF_LINK_TYPE,
};
@ -1113,7 +1115,12 @@ enum bpf_perf_event_type {
*/
#define BPF_F_ALLOW_OVERRIDE (1U << 0)
#define BPF_F_ALLOW_MULTI (1U << 1)
/* Generic attachment flags. */
#define BPF_F_REPLACE (1U << 2)
#define BPF_F_BEFORE (1U << 3)
#define BPF_F_AFTER (1U << 4)
#define BPF_F_ID (1U << 5)
#define BPF_F_LINK BPF_F_LINK /* 1 << 13 */
/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
* verifier will perform strict alignment checking as if the kernel
@ -1444,14 +1451,19 @@ union bpf_attr {
};
struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
__u32 target_fd; /* container object to attach to */
__u32 attach_bpf_fd; /* eBPF program to attach */
union {
__u32 target_fd; /* target object to attach to or ... */
__u32 target_ifindex; /* target ifindex */
};
__u32 attach_bpf_fd;
__u32 attach_type;
__u32 attach_flags;
__u32 replace_bpf_fd; /* previously attached eBPF
* program to replace if
* BPF_F_REPLACE is used
*/
__u32 replace_bpf_fd;
union {
__u32 relative_fd;
__u32 relative_id;
};
__u64 expected_revision;
};
struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
@ -1497,16 +1509,26 @@ union bpf_attr {
} info;
struct { /* anonymous struct used by BPF_PROG_QUERY command */
__u32 target_fd; /* container object to query */
union {
__u32 target_fd; /* target object to query or ... */
__u32 target_ifindex; /* target ifindex */
};
__u32 attach_type;
__u32 query_flags;
__u32 attach_flags;
__aligned_u64 prog_ids;
__u32 prog_cnt;
union {
__u32 prog_cnt;
__u32 count;
};
__u32 :32;
/* output: per-program attach_flags.
* not allowed to be set during effective query.
*/
__aligned_u64 prog_attach_flags;
__aligned_u64 link_ids;
__aligned_u64 link_attach_flags;
__u64 revision;
} query;
struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
@ -1549,13 +1571,13 @@ union bpf_attr {
__u32 map_fd; /* struct_ops to attach */
};
union {
__u32 target_fd; /* object to attach to */
__u32 target_ifindex; /* target ifindex */
__u32 target_fd; /* target object to attach to or ... */
__u32 target_ifindex; /* target ifindex */
};
__u32 attach_type; /* attach type */
__u32 flags; /* extra flags */
union {
__u32 target_btf_id; /* btf_id of target to attach to */
__u32 target_btf_id; /* btf_id of target to attach to */
struct {
__aligned_u64 iter_info; /* extra bpf_iter_link_info */
__u32 iter_info_len; /* iter_info length */
@ -1589,6 +1611,13 @@ union bpf_attr {
__s32 priority;
__u32 flags;
} netfilter;
struct {
union {
__u32 relative_fd;
__u32 relative_id;
};
__u64 expected_revision;
} tcx;
};
} link_create;
@ -6197,6 +6226,19 @@ struct bpf_sock_tuple {
};
};
/* (Simplified) user return codes for tcx prog type.
* A valid tcx program must return one of these defined values. All other
* return codes are reserved for future use. Must remain compatible with
* their TC_ACT_* counter-parts. For compatibility in behavior, unknown
* return codes are mapped to TCX_NEXT.
*/
enum tcx_action_base {
TCX_NEXT = -1,
TCX_PASS = 0,
TCX_DROP = 2,
TCX_REDIRECT = 7,
};
struct bpf_xdp_sock {
__u32 queue_id;
};
@ -6479,6 +6521,10 @@ struct bpf_link_info {
} event; /* BPF_PERF_EVENT_EVENT */
};
} perf_event;
struct {
__u32 ifindex;
__u32 attach_type;
} tcx;
};
} __attribute__((aligned(8)));

View File

@ -629,55 +629,89 @@ int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
return bpf_prog_attach_opts(prog_fd, target_fd, type, &opts);
}
int bpf_prog_attach_opts(int prog_fd, int target_fd,
enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts)
int bpf_prog_attach_opts(int prog_fd, int target, enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
__u32 relative_id, flags;
int ret, relative_fd;
union bpf_attr attr;
int ret;
if (!OPTS_VALID(opts, bpf_prog_attach_opts))
return libbpf_err(-EINVAL);
relative_id = OPTS_GET(opts, relative_id, 0);
relative_fd = OPTS_GET(opts, relative_fd, 0);
flags = OPTS_GET(opts, flags, 0);
/* validate we don't have unexpected combinations of non-zero fields */
if (relative_fd && relative_id)
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.target_fd = target_fd;
attr.attach_bpf_fd = prog_fd;
attr.attach_type = type;
attr.attach_flags = OPTS_GET(opts, flags, 0);
attr.replace_bpf_fd = OPTS_GET(opts, replace_prog_fd, 0);
attr.target_fd = target;
attr.attach_bpf_fd = prog_fd;
attr.attach_type = type;
attr.replace_bpf_fd = OPTS_GET(opts, replace_fd, 0);
attr.expected_revision = OPTS_GET(opts, expected_revision, 0);
if (relative_id) {
attr.attach_flags = flags | BPF_F_ID;
attr.relative_id = relative_id;
} else {
attr.attach_flags = flags;
attr.relative_fd = relative_fd;
}
ret = sys_bpf(BPF_PROG_ATTACH, &attr, attr_sz);
return libbpf_err_errno(ret);
}
int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
int bpf_prog_detach_opts(int prog_fd, int target, enum bpf_attach_type type,
const struct bpf_prog_detach_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
__u32 relative_id, flags;
int ret, relative_fd;
union bpf_attr attr;
int ret;
if (!OPTS_VALID(opts, bpf_prog_detach_opts))
return libbpf_err(-EINVAL);
relative_id = OPTS_GET(opts, relative_id, 0);
relative_fd = OPTS_GET(opts, relative_fd, 0);
flags = OPTS_GET(opts, flags, 0);
/* validate we don't have unexpected combinations of non-zero fields */
if (relative_fd && relative_id)
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.target_fd = target_fd;
attr.attach_type = type;
attr.target_fd = target;
attr.attach_bpf_fd = prog_fd;
attr.attach_type = type;
attr.expected_revision = OPTS_GET(opts, expected_revision, 0);
if (relative_id) {
attr.attach_flags = flags | BPF_F_ID;
attr.relative_id = relative_id;
} else {
attr.attach_flags = flags;
attr.relative_fd = relative_fd;
}
ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
return libbpf_err_errno(ret);
}
int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
{
return bpf_prog_detach_opts(0, target_fd, type, NULL);
}
int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
{
const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
union bpf_attr attr;
int ret;
memset(&attr, 0, attr_sz);
attr.target_fd = target_fd;
attr.attach_bpf_fd = prog_fd;
attr.attach_type = type;
ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
return libbpf_err_errno(ret);
return bpf_prog_detach_opts(prog_fd, target_fd, type, NULL);
}
int bpf_link_create(int prog_fd, int target_fd,
@ -685,9 +719,9 @@ int bpf_link_create(int prog_fd, int target_fd,
const struct bpf_link_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, link_create);
__u32 target_btf_id, iter_info_len;
__u32 target_btf_id, iter_info_len, relative_id;
int fd, err, relative_fd;
union bpf_attr attr;
int fd, err;
if (!OPTS_VALID(opts, bpf_link_create_opts))
return libbpf_err(-EINVAL);
@ -749,6 +783,22 @@ int bpf_link_create(int prog_fd, int target_fd,
if (!OPTS_ZEROED(opts, netfilter))
return libbpf_err(-EINVAL);
break;
case BPF_TCX_INGRESS:
case BPF_TCX_EGRESS:
relative_fd = OPTS_GET(opts, tcx.relative_fd, 0);
relative_id = OPTS_GET(opts, tcx.relative_id, 0);
if (relative_fd && relative_id)
return libbpf_err(-EINVAL);
if (relative_id) {
attr.link_create.tcx.relative_id = relative_id;
attr.link_create.flags |= BPF_F_ID;
} else {
attr.link_create.tcx.relative_fd = relative_fd;
}
attr.link_create.tcx.expected_revision = OPTS_GET(opts, tcx.expected_revision, 0);
if (!OPTS_ZEROED(opts, tcx))
return libbpf_err(-EINVAL);
break;
default:
if (!OPTS_ZEROED(opts, flags))
return libbpf_err(-EINVAL);
@ -841,8 +891,7 @@ int bpf_iter_create(int link_fd)
return libbpf_err_errno(fd);
}
int bpf_prog_query_opts(int target_fd,
enum bpf_attach_type type,
int bpf_prog_query_opts(int target, enum bpf_attach_type type,
struct bpf_prog_query_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, query);
@ -853,18 +902,20 @@ int bpf_prog_query_opts(int target_fd,
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.query.target_fd = target_fd;
attr.query.attach_type = type;
attr.query.query_flags = OPTS_GET(opts, query_flags, 0);
attr.query.prog_cnt = OPTS_GET(opts, prog_cnt, 0);
attr.query.prog_ids = ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
attr.query.target_fd = target;
attr.query.attach_type = type;
attr.query.query_flags = OPTS_GET(opts, query_flags, 0);
attr.query.count = OPTS_GET(opts, count, 0);
attr.query.prog_ids = ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
attr.query.link_ids = ptr_to_u64(OPTS_GET(opts, link_ids, NULL));
attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
attr.query.link_attach_flags = ptr_to_u64(OPTS_GET(opts, link_attach_flags, NULL));
ret = sys_bpf(BPF_PROG_QUERY, &attr, attr_sz);
OPTS_SET(opts, attach_flags, attr.query.attach_flags);
OPTS_SET(opts, prog_cnt, attr.query.prog_cnt);
OPTS_SET(opts, revision, attr.query.revision);
OPTS_SET(opts, count, attr.query.count);
return libbpf_err_errno(ret);
}

View File

@ -312,22 +312,68 @@ LIBBPF_API int bpf_obj_get(const char *pathname);
LIBBPF_API int bpf_obj_get_opts(const char *pathname,
const struct bpf_obj_get_opts *opts);
struct bpf_prog_attach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
unsigned int flags;
int replace_prog_fd;
};
#define bpf_prog_attach_opts__last_field replace_prog_fd
LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
enum bpf_attach_type type, unsigned int flags);
LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int attachable_fd,
enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts);
LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
enum bpf_attach_type type);
struct bpf_prog_attach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
union {
int replace_prog_fd;
int replace_fd;
};
int relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_prog_attach_opts__last_field expected_revision
struct bpf_prog_detach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
int relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_prog_detach_opts__last_field expected_revision
/**
* @brief **bpf_prog_attach_opts()** attaches the BPF program corresponding to
* *prog_fd* to a *target* which can represent a file descriptor or netdevice
* ifindex.
*
* @param prog_fd BPF program file descriptor
* @param target attach location file descriptor or ifindex
* @param type attach type for the BPF program
* @param opts options for configuring the attachment
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int target,
enum bpf_attach_type type,
const struct bpf_prog_attach_opts *opts);
/**
* @brief **bpf_prog_detach_opts()** detaches the BPF program corresponding to
* *prog_fd* from a *target* which can represent a file descriptor or netdevice
* ifindex.
*
* @param prog_fd BPF program file descriptor
* @param target detach location file descriptor or ifindex
* @param type detach type for the BPF program
* @param opts options for configuring the detachment
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_detach_opts(int prog_fd, int target,
enum bpf_attach_type type,
const struct bpf_prog_detach_opts *opts);
union bpf_iter_link_info; /* defined in up-to-date linux/bpf.h */
struct bpf_link_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@ -355,6 +401,11 @@ struct bpf_link_create_opts {
__s32 priority;
__u32 flags;
} netfilter;
struct {
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
} tcx;
};
size_t :0;
};
@ -495,13 +546,31 @@ struct bpf_prog_query_opts {
__u32 query_flags;
__u32 attach_flags; /* output argument */
__u32 *prog_ids;
__u32 prog_cnt; /* input+output argument */
union {
/* input+output argument */
__u32 prog_cnt;
__u32 count;
};
__u32 *prog_attach_flags;
__u32 *link_ids;
__u32 *link_attach_flags;
__u64 revision;
size_t :0;
};
#define bpf_prog_query_opts__last_field prog_attach_flags
#define bpf_prog_query_opts__last_field revision
LIBBPF_API int bpf_prog_query_opts(int target_fd,
enum bpf_attach_type type,
/**
* @brief **bpf_prog_query_opts()** queries the BPF programs and BPF links
* which are attached to *target* which can represent a file descriptor or
* netdevice ifindex.
*
* @param target query location file descriptor or ifindex
* @param type attach type for the BPF program
* @param opts options for configuring the query
* @return 0, on success; negative error code, otherwise (errno is also set to
* the error code)
*/
LIBBPF_API int bpf_prog_query_opts(int target, enum bpf_attach_type type,
struct bpf_prog_query_opts *opts);
LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
__u32 query_flags, __u32 *attach_flags,

View File

@ -118,6 +118,8 @@ static const char * const attach_type_name[] = {
[BPF_TRACE_KPROBE_MULTI] = "trace_kprobe_multi",
[BPF_STRUCT_OPS] = "struct_ops",
[BPF_NETFILTER] = "netfilter",
[BPF_TCX_INGRESS] = "tcx_ingress",
[BPF_TCX_EGRESS] = "tcx_egress",
};
static const char * const link_type_name[] = {
@ -132,6 +134,7 @@ static const char * const link_type_name[] = {
[BPF_LINK_TYPE_KPROBE_MULTI] = "kprobe_multi",
[BPF_LINK_TYPE_STRUCT_OPS] = "struct_ops",
[BPF_LINK_TYPE_NETFILTER] = "netfilter",
[BPF_LINK_TYPE_TCX] = "tcx",
};
static const char * const map_type_name[] = {
@ -8696,9 +8699,13 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("ksyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall),
SEC_DEF("kretsyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall),
SEC_DEF("usdt+", KPROBE, 0, SEC_NONE, attach_usdt),
SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE),
SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE),
SEC_DEF("action", SCHED_ACT, 0, SEC_NONE),
SEC_DEF("tc/ingress", SCHED_CLS, BPF_TCX_INGRESS, SEC_NONE), /* alias for tcx */
SEC_DEF("tc/egress", SCHED_CLS, BPF_TCX_EGRESS, SEC_NONE), /* alias for tcx */
SEC_DEF("tcx/ingress", SCHED_CLS, BPF_TCX_INGRESS, SEC_NONE),
SEC_DEF("tcx/egress", SCHED_CLS, BPF_TCX_EGRESS, SEC_NONE),
SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE), /* deprecated / legacy, use tcx */
SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE), /* deprecated / legacy, use tcx */
SEC_DEF("action", SCHED_ACT, 0, SEC_NONE), /* deprecated / legacy, use tcx */
SEC_DEF("tracepoint+", TRACEPOINT, 0, SEC_NONE, attach_tp),
SEC_DEF("tp+", TRACEPOINT, 0, SEC_NONE, attach_tp),
SEC_DEF("raw_tracepoint+", RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
@ -11848,11 +11855,10 @@ static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_li
}
static struct bpf_link *
bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id,
const char *target_name)
bpf_program_attach_fd(const struct bpf_program *prog,
int target_fd, const char *target_name,
const struct bpf_link_create_opts *opts)
{
DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
.target_btf_id = btf_id);
enum bpf_attach_type attach_type;
char errmsg[STRERR_BUFSIZE];
struct bpf_link *link;
@ -11870,7 +11876,7 @@ bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id
link->detach = &bpf_link__detach_fd;
attach_type = bpf_program__expected_attach_type(prog);
link_fd = bpf_link_create(prog_fd, target_fd, attach_type, &opts);
link_fd = bpf_link_create(prog_fd, target_fd, attach_type, opts);
if (link_fd < 0) {
link_fd = -errno;
free(link);
@ -11886,19 +11892,54 @@ bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id
struct bpf_link *
bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd)
{
return bpf_program__attach_fd(prog, cgroup_fd, 0, "cgroup");
return bpf_program_attach_fd(prog, cgroup_fd, "cgroup", NULL);
}
struct bpf_link *
bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
{
return bpf_program__attach_fd(prog, netns_fd, 0, "netns");
return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
}
struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
{
/* target_fd/target_ifindex use the same field in LINK_CREATE */
return bpf_program__attach_fd(prog, ifindex, 0, "xdp");
return bpf_program_attach_fd(prog, ifindex, "xdp", NULL);
}
struct bpf_link *
bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
const struct bpf_tcx_opts *opts)
{
LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
__u32 relative_id;
int relative_fd;
if (!OPTS_VALID(opts, bpf_tcx_opts))
return libbpf_err_ptr(-EINVAL);
relative_id = OPTS_GET(opts, relative_id, 0);
relative_fd = OPTS_GET(opts, relative_fd, 0);
/* validate we don't have unexpected combinations of non-zero fields */
if (!ifindex) {
pr_warn("prog '%s': target netdevice ifindex cannot be zero\n",
prog->name);
return libbpf_err_ptr(-EINVAL);
}
if (relative_fd && relative_id) {
pr_warn("prog '%s': relative_fd and relative_id cannot be set at the same time\n",
prog->name);
return libbpf_err_ptr(-EINVAL);
}
link_create_opts.tcx.expected_revision = OPTS_GET(opts, expected_revision, 0);
link_create_opts.tcx.relative_fd = relative_fd;
link_create_opts.tcx.relative_id = relative_id;
link_create_opts.flags = OPTS_GET(opts, flags, 0);
/* target_fd/target_ifindex use the same field in LINK_CREATE */
return bpf_program_attach_fd(prog, ifindex, "tcx", &link_create_opts);
}
struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
@ -11920,11 +11961,16 @@ struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
}
if (target_fd) {
LIBBPF_OPTS(bpf_link_create_opts, target_opts);
btf_id = libbpf_find_prog_btf_id(attach_func_name, target_fd);
if (btf_id < 0)
return libbpf_err_ptr(btf_id);
return bpf_program__attach_fd(prog, target_fd, btf_id, "freplace");
target_opts.target_btf_id = btf_id;
return bpf_program_attach_fd(prog, target_fd, "freplace",
&target_opts);
} else {
/* no target, so use raw_tracepoint_open for compatibility
* with old kernels

View File

@ -733,6 +733,21 @@ LIBBPF_API struct bpf_link *
bpf_program__attach_netfilter(const struct bpf_program *prog,
const struct bpf_netfilter_opts *opts);
struct bpf_tcx_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
__u32 flags;
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_tcx_opts__last_field expected_revision
LIBBPF_API struct bpf_link *
bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
const struct bpf_tcx_opts *opts);
struct bpf_map;
LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);

View File

@ -395,5 +395,7 @@ LIBBPF_1.2.0 {
LIBBPF_1.3.0 {
global:
bpf_obj_pin_opts;
bpf_prog_detach_opts;
bpf_program__attach_netfilter;
bpf_program__attach_tcx;
} LIBBPF_1.2.0;

View File

@ -70,4 +70,20 @@
}; \
})
/* Helper macro to clear and optionally reinitialize libbpf options struct
*
* Small helper macro to reset all fields and to reinitialize the common
* structure size member. Values provided by users in struct initializer-
* syntax as varargs can be provided as well to reinitialize options struct
* specific members.
*/
#define LIBBPF_OPTS_RESET(NAME, ...) \
do { \
memset(&NAME, 0, sizeof(NAME)); \
NAME = (typeof(NAME)) { \
.sz = sizeof(NAME), \
__VA_ARGS__ \
}; \
} while (0)
#endif /* __LIBBPF_LIBBPF_COMMON_H */

View File

@ -0,0 +1,72 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2023 Isovalent */
#ifndef TC_HELPERS
#define TC_HELPERS
#include <test_progs.h>
static inline __u32 id_from_prog_fd(int fd)
{
struct bpf_prog_info prog_info = {};
__u32 prog_info_len = sizeof(prog_info);
int err;
err = bpf_obj_get_info_by_fd(fd, &prog_info, &prog_info_len);
if (!ASSERT_OK(err, "id_from_prog_fd"))
return 0;
ASSERT_NEQ(prog_info.id, 0, "prog_info.id");
return prog_info.id;
}
static inline __u32 id_from_link_fd(int fd)
{
struct bpf_link_info link_info = {};
__u32 link_info_len = sizeof(link_info);
int err;
err = bpf_link_get_info_by_fd(fd, &link_info, &link_info_len);
if (!ASSERT_OK(err, "id_from_link_fd"))
return 0;
ASSERT_NEQ(link_info.id, 0, "link_info.id");
return link_info.id;
}
static inline __u32 ifindex_from_link_fd(int fd)
{
struct bpf_link_info link_info = {};
__u32 link_info_len = sizeof(link_info);
int err;
err = bpf_link_get_info_by_fd(fd, &link_info, &link_info_len);
if (!ASSERT_OK(err, "id_from_link_fd"))
return 0;
return link_info.tcx.ifindex;
}
static inline void __assert_mprog_count(int target, int expected, bool miniq, int ifindex)
{
__u32 count = 0, attach_flags = 0;
int err;
err = bpf_prog_query(ifindex, target, 0, &attach_flags,
NULL, &count);
ASSERT_EQ(count, expected, "count");
if (!expected && !miniq)
ASSERT_EQ(err, -ENOENT, "prog_query");
else
ASSERT_EQ(err, 0, "prog_query");
}
static inline void assert_mprog_count(int target, int expected)
{
__assert_mprog_count(target, expected, false, loopback);
}
static inline void assert_mprog_count_ifindex(int ifindex, int target, int expected)
{
__assert_mprog_count(target, expected, false, ifindex);
}
#endif /* TC_HELPERS */

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,40 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Isovalent */
#include <stdbool.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
char LICENSE[] SEC("license") = "GPL";
bool seen_tc1;
bool seen_tc2;
bool seen_tc3;
bool seen_tc4;
SEC("tc/ingress")
int tc1(struct __sk_buff *skb)
{
seen_tc1 = true;
return TCX_NEXT;
}
SEC("tc/egress")
int tc2(struct __sk_buff *skb)
{
seen_tc2 = true;
return TCX_NEXT;
}
SEC("tc/egress")
int tc3(struct __sk_buff *skb)
{
seen_tc3 = true;
return TCX_NEXT;
}
SEC("tc/egress")
int tc4(struct __sk_buff *skb)
{
seen_tc4 = true;
return TCX_NEXT;
}