=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that a BPF program can only call such a function in a
valid context. To enforce this, the visibility of a kfunc can be restricted per
program type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

	/* Disables missing prototype warnings */
	__diag_push();
	__diag_ignore_all("-Wmissing-prototypes",
			  "Global kfuncs as their definitions will be in BTF");

	struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
	{
		return find_get_task_by_vpid(nr);
	}

	__diag_pop();

A wrapper kfunc is often needed when the parameters of the kfunc have to be
annotated. Otherwise, one may directly make the kfunc visible to the BPF program
by registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

Similar to BPF helpers, the verifier sometimes needs additional context to make
the usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the argument of the kfunc with a __tag,
where tag may be one of the supported annotations.

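As a rough illustration of the naming convention only, the userspace sketch
below checks whether a parameter name carries a given __tag suffix. The helper
param_has_tag() is hypothetical and is not the verifier's implementation; it
merely assumes that a tag is recognized by comparing the end of the parameter
name against the suffix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): return true if the parameter
 * name ends with the given suffix, e.g. "mem__sz" ends with "__sz".
 */
static bool param_has_tag(const char *name, const char *tag)
{
	size_t name_len, tag_len;

	assert(name && tag);
	name_len = strlen(name);
	tag_len = strlen(tag);

	/* The tag must be preceded by at least one character of real name. */
	if (name_len <= tag_len)
		return false;

	return strcmp(name + name_len - tag_len, tag) == 0;
}
```

With this sketch, a parameter named mem__sz matches the "__sz" tag, while a
parameter named mem does not.
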
2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

	void bpf_memzero(void *mem, int mem__sz)
	{
	...
	}

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without the __sz annotation, the size of the
type of the pointer is used. Without the __sz annotation, a kfunc cannot accept
a void pointer.

2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments, where it indicates that
the verifier must check the scalar argument to be a known constant, which does
not indicate a size parameter, and the value of the constant is relevant to the
safety of the program.

An example is given below::

	void *bpf_obj_new(u32 local_type_id__k, ...)
	{
	...
	}

Here, bpf_obj_new uses the local_type_id argument to find out the size of that
type ID in the program's BTF and returns a sized pointer to it. Each type ID
will have a distinct size, hence it is crucial to treat each such call as
distinct when values don't match during verifier state pruning checks.

Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, the
__k suffix should be used.

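The pruning concern can be pictured with a small userspace model. This is a
sketch under stated assumptions, not verifier code: struct scalar_model and both
functions are hypothetical names, and the model only captures the rule that a
__k argument must be a known constant and that two states match only when they
hold the same constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of what is known about a scalar argument (not a kernel type). */
struct scalar_model {
	bool is_const;	/* is the value known at verification time? */
	uint64_t value;	/* meaningful only when is_const is true */
};

/* A __k argument is only acceptable when it is a known constant. */
static bool k_arg_acceptable(const struct scalar_model *arg)
{
	assert(arg);
	return arg->is_const;
}

/*
 * For pruning, two states may be treated as equivalent with respect to a
 * __k argument only if both hold the same known constant.
 */
static bool k_arg_equivalent(const struct scalar_model *old,
			     const struct scalar_model *cur)
{
	assert(old && cur);
	return old->is_const && cur->is_const && old->value == cur->value;
}
```

In this model, two calls with different constant type IDs are never treated as
equivalent, which mirrors why each such call must be considered distinct.
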
.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

	BTF_SET8_START(bpf_task_set)
	BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
	BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
	BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). Otherwise, the verifier rejects
the BPF program: loading succeeds only when no lingering references remain in
any explored state of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the
two are orthogonal to each other.

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in. All copies of
the pointer being released are invalidated as a result of invoking a kfunc with
this flag.

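The acquire and release pairing described in 2.4.1 and 2.4.3 can be pictured
with a toy reference tracker. This is a userspace sketch, not the verifier's
data structures: struct ref_tracker and its functions are hypothetical, and the
NULL-return case from 2.4.2 is deliberately left out of the model.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REFS 8

/* Toy model of the references held by a program (not a kernel type). */
struct ref_tracker {
	int ids[MAX_REFS];	/* IDs of references currently held */
	int count;		/* number of valid entries in ids[] */
	int next_id;		/* next ID to hand out, starting at 1 */
};

static void ref_tracker_init(struct ref_tracker *t)
{
	assert(t);
	t->count = 0;
	t->next_id = 1;
}

/* Models a KF_ACQUIRE call: returns a new reference ID, or 0 if full. */
static int ref_acquire(struct ref_tracker *t)
{
	assert(t);
	if (t->count == MAX_REFS)
		return 0;
	t->ids[t->count++] = t->next_id;
	return t->next_id++;
}

/* Models a KF_RELEASE call: false if the ID is not currently held. */
static bool ref_release(struct ref_tracker *t, int id)
{
	int i;

	assert(t);
	for (i = 0; i < t->count; i++) {
		if (t->ids[i] == id) {
			/* Drop the entry by moving the last one into its slot. */
			t->ids[i] = t->ids[--t->count];
			return true;
		}
	}
	return false;
}

/* A program is only acceptable if it exits with no references held. */
static bool ref_tracker_clean(const struct ref_tracker *t)
{
	assert(t);
	return t->count == 0;
}
```

In this model, releasing the same ID twice fails the second time, which loosely
corresponds to all copies of a released pointer becoming invalid.
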
2.4.4 KF_KPTR_GET flag
----------------------

The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
as a pointer to kptr, safely increments the refcount of the object it points to,
and returns a reference to the user. The rest of the arguments may be normal
arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
KF_ACQUIRE and KF_RET_NULL flags.

2.4.5 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

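The rules above can be summarized in a simplified userspace model. This is a
sketch only: the enum, the struct, and trusted_arg_ok() are hypothetical names
that do not exist in the kernel, and the model ignores everything the verifier
tracks beyond a pointer's origin, its offset, and whether it points to a BTF
object.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a pointer came from, in this toy model (not kernel types). */
enum ptr_origin {
	PTR_FROM_CALLBACK_ARG,	/* tracepoint or struct_ops callback argument */
	PTR_FROM_ACQUIRE,	/* returned by a KF_ACQUIRE or KF_KPTR_GET kfunc */
	PTR_FROM_WALK,		/* obtained by walking another pointer */
};

struct ptr_model {
	enum ptr_origin origin;
	bool is_btf_object;	/* points to a BTF-described kernel object? */
	int offset;		/* offset applied to the original pointer */
};

static bool trusted_arg_ok(const struct ptr_model *p)
{
	assert(p);

	/* Pointers to non-BTF objects may carry a non-zero offset. */
	if (!p->is_btf_object)
		return true;

	/* Walked pointers are not "valid" in the sense described above. */
	if (p->origin == PTR_FROM_WALK)
		return false;

	/* Pointers to BTF objects must be passed at a zero offset. */
	return p->offset == 0;
}
```

For instance, a task pointer received as a tracepoint argument passes this
check, while a pointer read from one of its fields is modeled as
PTR_FROM_WALK and does not.
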
2.4.6 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

2.4.7 KF_DESTRUCTIVE flag
-------------------------

The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Because of this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more
can be added later.

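The call-time restrictions from 2.4.6 and 2.4.7 can be sketched as a single
check. This is a userspace model under stated assumptions: the MODEL_KF_* bit
values, struct caller_model, and kfunc_call_allowed() are all hypothetical and
chosen for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits for this model only; not the kernel's values. */
#define MODEL_KF_SLEEPABLE	(1u << 0)
#define MODEL_KF_DESTRUCTIVE	(1u << 1)

/* What the model knows about the calling program and its loader. */
struct caller_model {
	bool prog_sleepable;	/* program loaded with BPF_F_SLEEPABLE */
	bool has_cap_sys_boot;	/* loader holds CAP_SYS_BOOT */
};

static bool kfunc_call_allowed(unsigned int kf_flags,
			       const struct caller_model *caller)
{
	assert(caller);

	/* Kfuncs that may sleep need a sleepable program. */
	if ((kf_flags & MODEL_KF_SLEEPABLE) && !caller->prog_sleepable)
		return false;

	/* Destructive kfuncs need CAP_SYS_BOOT. */
	if ((kf_flags & MODEL_KF_DESTRUCTIVE) && !caller->has_cap_sys_boot)
		return false;

	return true;
}
```

A kfunc carrying both flags is only allowed in this model when both conditions
hold, since each flag adds its own independent restriction.
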
2.4.8 KF_RCU flag
-----------------

The KF_RCU flag is used for kfuncs which take an RCU pointer as an argument.
When used together with KF_ACQUIRE, it indicates that the kfunc should have a
single argument which must be a trusted argument or a MEM_RCU pointer.
The argument may have a reference count of 0 and the kfunc must take this
into consideration.

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

	BTF_SET8_START(bpf_task_set)
	BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
	BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
	BTF_SET8_END(bpf_task_set)

	static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
		.owner = THIS_MODULE,
		.set   = &bpf_task_set,
	};

	static int init_subsystem(void)
	{
		return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
	}
	late_initcall(init_subsystem);