Merge branch 'bpf-doc-improvements'
Andrii Nakryiko says:

====================
A bunch of BPF-related docs typo, wording and formatting fixes.

v1->v2:
- split off non-documentation changes into separate patchset
====================

Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
commit 4269f69bc9

@ -36,27 +36,27 @@ consideration important quirks of other architectures) and
defines a calling convention that is compatible with the C calling
convention of the Linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as the return value.

Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set
(unlike x64 ISA that allows msft, cdecl and other conventions).

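As a rough illustration of the two answers above, here is a hypothetical C model of a BPF helper call. The names ``model_regs``, ``model_helper_fn`` and ``model_call`` are invented for this sketch and are not kernel APIs. Arguments come only from R1-R5 and the single result goes to R0, so the helper signature has no slot for a sixth argument or a second return value.

```c
#include <stdint.h>

/* Hypothetical model: BPF has eleven 64-bit registers, R0-R10. */
#define MODEL_NREGS 11

struct model_regs {
	uint64_t r[MODEL_NREGS];
};

/* A helper receives at most five arguments, one per register R1-R5. */
typedef uint64_t (*model_helper_fn)(uint64_t r1, uint64_t r2, uint64_t r3,
				    uint64_t r4, uint64_t r5);

/*
 * Model of a helper call: arguments are read from R1-R5 and the single
 * return value is written to R0. There is no register for a sixth
 * argument or a second return value. (In real BPF, R1-R5 are scratch
 * registers that must not be read after the call; this model simply
 * leaves them untouched.)
 */
static void model_call(struct model_regs *regs, model_helper_fn fn)
{
	regs->r[0] = fn(regs->r[1], regs->r[2], regs->r[3],
			regs->r[4], regs->r[5]);
}

/* Example helper: adds its five arguments. */
static uint64_t add5(uint64_t a, uint64_t b, uint64_t c,
		     uint64_t d, uint64_t e)
{
	return a + b + c + d + e;
}
```

For example, with R1-R5 set to 1..5, ``model_call(&regs, add5)`` leaves 15 in R0.
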
Q: Can BPF programs access instruction pointer or return address?
-----------------------------------------------------------------
A: NO.

Q: Can BPF programs access stack pointer?
------------------------------------------
A: NO.

Only the frame pointer (register R10) is accessible.
From the compiler's point of view, it is necessary to have a stack pointer.
For example, LLVM defines register R11 as the stack pointer in its
BPF backend, but it makes sure that generated code never uses it.

Q: Does C-calling convention diminish possible use cases?

@ -66,8 +66,8 @@ A: YES.
BPF design forces addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps with
seamless interoperability between them. It lets the kernel call into
BPF programs and programs call kernel helpers with zero overhead,
as if all of them were native C code. That is particularly the case
for JITed BPF programs that are indistinguishable from
native kernel C code.

@ -75,9 +75,9 @@ Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------
A: Soft yes.

At least for now, until BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read-only sections, and all other normal constructs
that C code can produce.

Q: Can loops be supported in a safe way?

@ -109,16 +109,16 @@ For example why BPF_JNE and other compare and jumps are not cpu-like?
A: This was necessary to avoid introducing flags into the ISA, which are
impossible to make generic and efficient across CPU architectures.

Q: Why doesn't BPF_DIV instruction map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we picked a one-to-one relationship to x64, it would have made
it more complicated to support on arm64 and other archs. Also, it
needs a div-by-zero runtime check.

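To illustrate the runtime check mentioned above, here is a C sketch of unsigned 64-bit division and modulo as a BPF interpreter or JIT has to implement them. A raw x86-64 ``div`` would raise a divide-error exception on a zero divisor instead. The zero-divisor results used here (quotient ``0``, modulo leaves the destination unchanged) follow the later BPF instruction-set documentation and should be treated as an assumption for the kernel version this text describes. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Sketch of unsigned 64-bit BPF_DIV semantics with an explicit
 * divide-by-zero check. Assumed convention: dst / 0 == 0.
 */
static uint64_t model_bpf_div64(uint64_t dst, uint64_t src)
{
	return src != 0 ? dst / src : 0;
}

/*
 * Sketch of unsigned 64-bit BPF_MOD semantics.
 * Assumed convention: dst % 0 leaves dst unchanged.
 */
static uint64_t model_bpf_mod64(uint64_t dst, uint64_t src)
{
	return src != 0 ? dst % src : dst;
}
```

Both operations are unsigned, which is consistent with the absence of a signed divide discussed next.
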
Q: Why is there no BPF_SDIV for signed divide operation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. llvm errors in such a case and
prints a suggestion to use unsigned divide instead.

Q: Why does BPF have implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@ -5,43 +5,35 @@ BPF Type Format (BTF)
1. Introduction
***************

BTF (BPF Type Format) is the metadata format which encodes the debug info
related to BPF program/map. The name BTF was used initially to describe data
types. The BTF was later extended to include function info for defined
subroutines, and line info for source/line information.

The debug info is used for map pretty print, function signature, etc. The
function signature enables better bpf program/function kernel symbols. The
line info helps generate source-annotated translated byte code, jited code,
and verifier log.

The BTF specification contains two parts:

* BTF kernel API
* BTF ELF file format

The kernel API is the contract between user space and kernel. The kernel
verifies the BTF info before using it. The ELF file format is a user space
contract between ELF file and libbpf loader.

The type and string sections are part of the BTF kernel API, describing the
debug info (mostly type-related) referenced by the bpf program. These two
sections are discussed in detail in :ref:`BTF_Type_String`.

.. _BTF_Type_String:

2. BTF Type and String Encoding
*******************************

The file ``include/uapi/linux/btf.h`` provides the high-level definition of how
types/strings are encoded.

The beginning of the data blob must be::

@ -59,25 +51,23 @@ The beginning of data blob must be::
    };

The magic is ``0xeB9F``, which has a different encoding on big- and
little-endian systems, and can be used to test whether BTF is generated for a
big- or little-endian target. The ``btf_header`` is designed to be extensible
with ``hdr_len`` equal to ``sizeof(struct btf_header)`` when a data blob is
generated.

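A minimal sketch of the endianness test described above, assuming only that the blob begins with the 16-bit ``magic`` field. ``BTF_MAGIC`` carries the value given in the text. The enum and ``btf_blob_endianness()`` are hypothetical names for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BTF_MAGIC 0xeB9F

enum btf_endian { BTF_ENDIAN_UNKNOWN, BTF_ENDIAN_LITTLE, BTF_ENDIAN_BIG };

/*
 * Inspect the first two bytes of a BTF blob. The 16-bit magic 0xeB9F is
 * stored as bytes 9F EB by a little-endian producer and as EB 9F by a
 * big-endian one, so the target byte order can be read off directly.
 */
static enum btf_endian btf_blob_endianness(const uint8_t *data, size_t len)
{
	if (len < 2)
		return BTF_ENDIAN_UNKNOWN;
	if (data[0] == (BTF_MAGIC & 0xff) && data[1] == (BTF_MAGIC >> 8))
		return BTF_ENDIAN_LITTLE;
	if (data[0] == (BTF_MAGIC >> 8) && data[1] == (BTF_MAGIC & 0xff))
		return BTF_ENDIAN_BIG;
	return BTF_ENDIAN_UNKNOWN;	/* not a BTF blob */
}
```
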
2.1 String Encoding
===================

The first string in the string section must be a null (empty) string. The
rest of the string table is a concatenation of other null-terminated strings.

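As an illustration of the string rules above, a bounds-checked lookup of a ``name_off``-style offset might look like the sketch below. ``btf_str_at()`` is a hypothetical helper, not a libbpf function. It rejects a section that does not start with the null string, and any offset whose string is not terminated inside the section.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the string at byte offset `off` in a BTF string section of
 * `str_len` bytes, or NULL if the section or the offset is invalid.
 * Offset 0 always yields the mandatory empty string.
 */
static const char *btf_str_at(const char *strs, size_t str_len, size_t off)
{
	if (str_len == 0 || strs[0] != '\0')
		return NULL;	/* first string must be the null string */
	if (off >= str_len)
		return NULL;	/* offset outside the section */
	if (memchr(strs + off, '\0', str_len - off) == NULL)
		return NULL;	/* string not terminated inside the section */
	return strs + off;
}
```

With the section ``"\0int\0foo"``, offset ``0`` yields the empty string, ``1`` yields ``"int"`` and ``5`` yields ``"foo"``.
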
2.2 Type Encoding
=================

The type id ``0`` is reserved for ``void`` type. The type section is parsed
sequentially and a type id is assigned to each recognized type starting from
id ``1``. Currently, the following types are supported::

    #define BTF_KIND_INT            1       /* Integer      */
    #define BTF_KIND_PTR            2       /* Pointer      */
|
@ -122,9 +112,9 @@ Each type contains the following common data::
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
For certain kinds, the common data are followed by kind specific data.
|
For certain kinds, the common data are followed by kind-specific data. The
|
||||||
The ``name_off`` in ``struct btf_type`` specifies the offset in the string table.
|
``name_off`` in ``struct btf_type`` specifies the offset in the string table.
|
||||||
The following details encoding of each kind.
|
The following sections detail encoding of each kind.
|
||||||
|
|
||||||
2.2.1 BTF_KIND_INT
|
2.2.1 BTF_KIND_INT
|
||||||
~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~
|
||||||
|
@ -136,7 +126,7 @@ The following details encoding of each kind.
* ``info.vlen``: 0
* ``size``: the size of the int type in bytes.

``btf_type`` is followed by a ``u32`` with the following bit arrangement::

    #define BTF_INT_ENCODING(VAL)   (((VAL) & 0x0f000000) >> 24)
    #define BTF_INT_OFFSET(VAL)     (((VAL) & 0x00ff0000) >> 16)
@ -148,39 +138,33 @@ The ``BTF_INT_ENCODING`` has the following attributes::
    #define BTF_INT_CHAR    (1 << 1)
    #define BTF_INT_BOOL    (1 << 2)

The ``BTF_INT_ENCODING()`` provides extra information: signedness, char, or
bool, for the int type. The char and bool encoding are mostly useful for
pretty print. At most one encoding can be specified for the int type.

The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int
type. For example, a 4-bit bitfield has ``BTF_INT_BITS()`` equal to 4.
The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()``
for the type. The maximum value of ``BTF_INT_BITS()`` is 128.

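Here is a sketch of how a consumer could check the ``BTF_KIND_INT`` rules stated above. ``BTF_INT_SIGNED`` and ``BTF_INT_BITS()`` are not shown in this excerpt; their definitions here are taken from ``include/uapi/linux/btf.h``. ``btf_int_is_valid()`` is a hypothetical helper that checks only the rules listed in this section. The kernel verifier's own checks may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTF_INT_ENCODING(VAL)	(((VAL) & 0x0f000000) >> 24)
#define BTF_INT_BITS(VAL)	((VAL) & 0x000000ff)

#define BTF_INT_SIGNED	(1 << 0)
#define BTF_INT_CHAR	(1 << 1)
#define BTF_INT_BOOL	(1 << 2)

/*
 * Check the documented rules for an int type of `size` bytes whose
 * btf_type is followed by the u32 `int_data`:
 *  - BTF_INT_BITS() is at most 128 and at most size * 8,
 *  - the encoding is empty or exactly one of SIGNED, CHAR, BOOL.
 */
static bool btf_int_is_valid(uint32_t size, uint32_t int_data)
{
	uint32_t bits = BTF_INT_BITS(int_data);
	uint32_t enc = BTF_INT_ENCODING(int_data);
	const uint32_t known = BTF_INT_SIGNED | BTF_INT_CHAR | BTF_INT_BOOL;

	if (bits > 128 || (uint64_t)bits > (uint64_t)size * 8)
		return false;
	if (enc & ~known)
		return false;	/* unknown encoding bit */
	if (enc & (enc - 1))
		return false;	/* more than one encoding bit set */
	return true;
}
```

For example, a signed 32-bit ``int`` (size 4, ``int_data = (BTF_INT_SIGNED << 24) | 32``) passes. A 9-bit value in a 1-byte type fails.
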
The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
for this int. For example, a bitfield struct member has:

* btf member bit offset 100 from the start of the structure,
* btf member pointing to an int type,
* the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``

Then in the struct memory layout, this member will occupy ``4`` bits starting
from bit ``100 + 2 = 102``.

Alternatively, the bitfield struct member can be the following to access the
same bits as the above:

* btf member bit offset 102,
* btf member pointing to an int type,
* the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``

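The arithmetic in the example above can be written as a small C sketch. ``struct bit_location`` and ``btf_bitfield_location()`` are hypothetical names. The sketch shows that both encodings resolve to the same 4 bits starting at bit 102.

```c
#include <stdint.h>

#define BTF_INT_OFFSET(VAL)	(((VAL) & 0x00ff0000) >> 16)
#define BTF_INT_BITS(VAL)	((VAL) & 0x000000ff)

/* Where a bitfield member actually lives, in bits from the struct start. */
struct bit_location {
	uint32_t first_bit;
	uint32_t nr_bits;
};

/*
 * Combine a member's bit offset (kind_flag not set) with the
 * BTF_INT_OFFSET()/BTF_INT_BITS() of the int type it points to.
 */
static struct bit_location btf_bitfield_location(uint32_t member_bit_off,
						 uint32_t int_data)
{
	struct bit_location loc = {
		.first_bit = member_bit_off + BTF_INT_OFFSET(int_data),
		.nr_bits = BTF_INT_BITS(int_data),
	};
	return loc;
}
```
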
The original intention of ``BTF_INT_OFFSET()`` is to provide flexibility of
bitfield encoding. Currently, both llvm and pahole generate
``BTF_INT_OFFSET() = 0`` for all int types.

2.2.2 BTF_KIND_PTR
~~~~~~~~~~~~~~~~~~

@ -204,7 +188,7 @@ No additional type data follow ``btf_type``.
* ``info.vlen``: 0
* ``size/type``: 0, not used

``btf_type`` is followed by one ``struct btf_array``::

    struct btf_array {
        __u32   type;
@ -217,27 +201,25 @@ The ``struct btf_array`` encoding:
* ``index_type``: the index type
* ``nelems``: the number of elements for this array (``0`` is also allowed).

The ``index_type`` can be any regular int type (``u8``, ``u16``, ``u32``,
``u64``, ``unsigned __int128``). The original design of including
``index_type`` follows DWARF, which has an ``index_type`` for its array type.
Currently in BTF, beyond type verification, the ``index_type`` is not used.

The ``struct btf_array`` allows chaining through element type to represent
multidimensional arrays. For example, for ``int a[5][6]``, the following type
information illustrates the chaining:

* [1]: int
* [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6``
* [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5``

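Here is a sketch of following the chain to recover the total element count. It uses a deliberately simplified, hypothetical type table (``struct model_type``) instead of real ``struct btf_type``/``struct btf_array`` records.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a type table indexed by type id. */
enum model_kind { MODEL_VOID, MODEL_INT, MODEL_ARRAY };

struct model_type {
	enum model_kind kind;
	uint32_t elem_type;	/* for arrays: btf_array.type */
	uint32_t nelems;	/* for arrays: btf_array.nelems */
};

/*
 * Follow btf_array.type links from `id` until a non-array element type
 * is reached, multiplying the dimensions along the way. There is no
 * cycle or overflow detection: this is a sketch only.
 */
static uint64_t total_elems(const struct model_type *types, uint32_t id)
{
	uint64_t total = 1;

	while (types[id].kind == MODEL_ARRAY) {
		total *= types[id].nelems;
		id = types[id].elem_type;
	}
	return total;
}
```

For the chained ``int a[5][6]`` table, ``total_elems(types, 3)`` returns 30.
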
Currently, both pahole and llvm collapse a multidimensional array into a
one-dimensional array, e.g., for ``a[5][6]``, the ``btf_array.nelems`` is
equal to ``30``. This is because the original use case is map pretty print
where the whole array is dumped out, so a one-dimensional array is enough. As
more BTF usage is explored, pahole and llvm can be changed to generate the
proper chained representation for multidimensional arrays.

2.2.4 BTF_KIND_STRUCT
~~~~~~~~~~~~~~~~~~~~~

@ -264,28 +246,26 @@ multiple dimensional arrays.
* ``type``: the member type
* ``offset``: <see below>

If the type info ``kind_flag`` is not set, the offset contains only the bit
offset of the member. Note that the base type of the bitfield can only be int
or enum type. If the bitfield size is 32, the base type can be either int or
enum type. If the bitfield size is not 32, the base type must be int, and the
int type's ``BTF_INT_BITS()`` encodes the bitfield size.

If the ``kind_flag`` is set, the ``btf_member.offset`` contains both member
bitfield size and bit offset. The bitfield size and bit offset are calculated
as below::

    #define BTF_MEMBER_BITFIELD_SIZE(val)   ((val) >> 24)
    #define BTF_MEMBER_BIT_OFFSET(val)      ((val) & 0xffffff)

In this case, if the base type is an int type, it must be a regular int type:

* ``BTF_INT_OFFSET()`` must be 0.
* ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``.

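Here is a sketch of decoding ``btf_member.offset`` in both modes with the macros above. ``btf_member_decode()`` is a hypothetical helper. Reporting a bitfield size of 0 for a non-bitfield member follows the convention libbpf uses, stated here as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTF_MEMBER_BITFIELD_SIZE(val)	((val) >> 24)
#define BTF_MEMBER_BIT_OFFSET(val)	((val) & 0xffffff)

/*
 * Decode btf_member.offset. With kind_flag set, the top 8 bits hold the
 * bitfield size (0 for a member that is not a bitfield) and the low 24
 * bits hold the bit offset. Without kind_flag, the whole value is the bit
 * offset and any bitfield size must come from the int type's
 * BTF_INT_BITS(), so 0 is reported here.
 */
static void btf_member_decode(bool kind_flag, uint32_t offset,
			      uint32_t *bit_off, uint32_t *bitfield_size)
{
	if (kind_flag) {
		*bitfield_size = BTF_MEMBER_BITFIELD_SIZE(offset);
		*bit_off = BTF_MEMBER_BIT_OFFSET(offset);
	} else {
		*bitfield_size = 0;
		*bit_off = offset;
	}
}
```

For instance, a 4-bit bitfield at bit 102 is encoded with ``kind_flag`` set as ``(4 << 24) | 102``.
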
The following kernel patch introduced ``kind_flag`` and explained why both
modes exist:

https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3

@ -382,11 +362,11 @@ No additional type data follow ``btf_type``.
No additional type data follow ``btf_type``.

A BTF_KIND_FUNC defines not a type, but a subprogram (function) whose
signature is defined by ``type``. The subprogram is thus an instance of that
type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the
:ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load`
(ABI).

2.2.13 BTF_KIND_FUNC_PROTO
~~~~~~~~~~~~~~~~~~~~~~~~~~

@ -405,13 +385,13 @@ the :ref:`BTF_Ext_Section` (ELF) or in the arguments to
        __u32   type;
    };

If a BTF_KIND_FUNC_PROTO type is referred to by a BTF_KIND_FUNC type, then
``btf_param.name_off`` must point to a valid C identifier except for the
possible last argument representing the variable argument. The
``btf_param.type`` refers to the parameter type.

If the function has variable arguments, the last parameter is encoded with
``name_off = 0`` and ``type = 0``.

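Here is a sketch of the variadic check described above. The ``struct btf_param`` layout (two ``__u32`` fields) mirrors ``include/uapi/linux/btf.h``. ``btf_proto_is_variadic()`` is a hypothetical helper, and ``vlen`` is the parameter count taken from ``info.vlen``.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as struct btf_param in include/uapi/linux/btf.h. */
struct btf_param {
	uint32_t name_off;
	uint32_t type;
};

/*
 * A BTF_KIND_FUNC_PROTO is variadic if its last parameter is the
 * sentinel { name_off = 0, type = 0 }. `vlen` is the number of
 * btf_param entries that follow the btf_type.
 */
static bool btf_proto_is_variadic(const struct btf_param *params,
				  uint16_t vlen)
{
	if (vlen == 0)
		return false;
	return params[vlen - 1].name_off == 0 && params[vlen - 1].type == 0;
}
```
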
3. BTF Kernel API
*****************

@ -459,10 +439,9 @@ The workflow typically looks like:
3.1 BPF_BTF_LOAD
================

Load a blob of BTF data into kernel. A blob of data, described in
:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd``
is returned to user space.

3.2 BPF_MAP_CREATE
==================

@ -484,18 +463,18 @@ In libbpf, the map can be defined with extra annotation like below:
    };
    BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts);

Here, the parameters for the macro BPF_ANNOTATE_KV_PAIR are the map name and
the key and value types for the map. During ELF parsing, libbpf is able to
extract key/value type_id's and assign them to BPF_MAP_CREATE attributes
automatically.

.. _BPF_Prog_Load:

3.3 BPF_PROG_LOAD
=================

During prog_load, func_info and line_info can be passed to kernel with proper
values for the following attributes:
::

    __u32           insn_cnt;
@ -522,9 +501,9 @@ The func_info and line_info are an array of below, respectively.::
        __u32   line_col;       /* line number and column number */
    };

func_info_rec_size is the size of each func_info record, and
line_info_rec_size is the size of each line_info record. Passing the record
size to kernel makes it possible to extend the record itself in the future.

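To illustrate why the record size is passed, the sketch below walks func_info records using the caller-supplied record size as the stride. This lets a reader handle records produced with extra trailing fields (hypothetical here). ``read_func_infos()`` is a hypothetical helper, and ``struct bpf_func_info`` mirrors the two-field layout of the uapi structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields of struct bpf_func_info as known to this reader. */
struct bpf_func_info {
	uint32_t insn_off;
	uint32_t type_id;
};

/*
 * Copy `cnt` func_info records out of `buf`, where each record is
 * `rec_size` bytes. Stepping by rec_size (not sizeof) skips any trailing
 * fields a newer producer may have added. Returns -1 if a record is
 * smaller than the fields this reader needs.
 */
static int read_func_infos(const void *buf, uint32_t rec_size, uint32_t cnt,
			   struct bpf_func_info *out)
{
	const unsigned char *p = buf;
	uint32_t i;

	if (rec_size < sizeof(*out))
		return -1;
	for (i = 0; i < cnt; i++) {
		memcpy(&out[i], p, sizeof(*out));	/* known prefix only */
		p += rec_size;
	}
	return 0;
}
```

With ``rec_size = 12``, each record carries 4 extra bytes that this reader skips over.
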
Below are the requirements for func_info:

* func_info[0].insn_off must be 0.
@ -532,7 +511,7 @@ Below are requirements for func_info:
  bpf func boundaries.

Below are the requirements for line_info:

* the first insn in each func must have a line_info record pointing to it.
* the line_info insn_off is in strictly increasing order.

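Here is a sketch of checking the ordering requirements listed above in user space before calling BPF_PROG_LOAD. ``check_infos()`` is hypothetical, and the structures mirror the uapi ``bpf_func_info`` and ``bpf_line_info`` layouts. Matching func_info against real subprogram boundaries needs the verifier's view of the program and is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct bpf_func_info {
	uint32_t insn_off;
	uint32_t type_id;
};

struct bpf_line_info {
	uint32_t insn_off;
	uint32_t file_name_off;
	uint32_t line_off;
	uint32_t line_col;
};

/*
 * Check that func_info[0].insn_off is 0, that func_info and line_info
 * insn_off values are strictly increasing, and that the first insn of
 * every function has a line_info record. Expects at least one func_info.
 */
static bool check_infos(const struct bpf_func_info *fi, size_t nr_fi,
			const struct bpf_line_info *li, size_t nr_li)
{
	size_t i, j = 0;

	if (nr_fi == 0 || fi[0].insn_off != 0)
		return false;
	for (i = 1; i < nr_fi; i++)
		if (fi[i].insn_off <= fi[i - 1].insn_off)
			return false;
	for (i = 1; i < nr_li; i++)
		if (li[i].insn_off <= li[i - 1].insn_off)
			return false;
	/* line_info is sorted, so one forward scan finds each func start */
	for (i = 0; i < nr_fi; i++) {
		while (j < nr_li && li[j].insn_off < fi[i].insn_off)
			j++;
		if (j == nr_li || li[j].insn_off != fi[i].insn_off)
			return false;
	}
	return true;
}
```
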
For line_info, the line number and column number are defined as below:
@ -543,40 +522,38 @@ For line_info, the line number and column number are defined as below:
3.4 BPF_{PROG,MAP}_GET_NEXT_ID
==============================

In kernel, every loaded program, map or btf has a unique id. The id won't
change during the lifetime of a program, map, or btf.

The bpf syscall commands BPF_{PROG,MAP}_GET_NEXT_ID return, to user space,
the next id after a given one for bpf programs or maps, respectively, so by
iterating, an inspection tool can enumerate and inspect all programs and maps.

3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
===============================

An introspection tool cannot use an id directly to get details about a
program or map. A file descriptor needs to be obtained first, for
reference-counting purposes.

3.6 BPF_OBJ_GET_INFO_BY_FD
==========================

Once a program/map fd is acquired, an introspection tool can get the detailed
information from kernel about this fd, some of which are BTF-related. For
example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids.
``bpf_prog_info`` returns ``btf_id``, func_info, and line info for translated
bpf byte codes, and jited_line_info.

3.7 BPF_BTF_GET_FD_BY_ID
========================

With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, the bpf
syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with the
command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally loaded into the
kernel with BPF_BTF_LOAD, can be retrieved.

With the btf blob, ``bpf_map_info``, and ``bpf_prog_info``, an introspection
tool has full btf knowledge and is able to pretty print map key/values, dump
func signatures and line info, along with byte/jit codes.

4. ELF File Format Interface
****************************

@ -584,19 +561,19 @@ dump func signatures, dump line info along with byte/jit codes.
4.1 .BTF section
================

The .BTF section contains type and string data. The format of this section is
the same as the one described in :ref:`BTF_Type_String`.

.. _BTF_Ext_Section:

4.2 .BTF.ext section
====================

The .BTF.ext section encodes func_info and line_info, which need loader
manipulation before loading into the kernel.

The specification for .BTF.ext section is defined at ``tools/lib/bpf/btf.h``
and ``tools/lib/bpf/btf.c``.

The current header of .BTF.ext section::

@@ -613,9 +590,9 @@ The current header of .BTF.ext section::
 __u32 line_info_len;
 };
 
-It is very similar to .BTF section. Instead of type/string section,
-it contains func_info and line_info section. See :ref:`BPF_Prog_Load`
-for details about func_info and line_info record format.
+It is very similar to .BTF section. Instead of type/string section, it
+contains func_info and line_info section. See :ref:`BPF_Prog_Load` for details
+about func_info and line_info record format.
 
 The func_info is organized as below.::
 
|
@@ -624,9 +601,9 @@ The func_info is organized as below.::
 btf_ext_info_sec for section #2 /* func_info for section #2 */
 ...
 
-``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure
-when .BTF.ext is generated. btf_ext_info_sec, defined below, is
-the func_info for each specific ELF section.::
+``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure when
+.BTF.ext is generated. ``btf_ext_info_sec``, defined below, is a collection of
+func_info for each specific ELF section.::
 
 struct btf_ext_info_sec {
 __u32 sec_name_off; /* offset to section name */
|
@@ -644,14 +621,14 @@ The line_info is organized as below.::
 btf_ext_info_sec for section #2 /* line_info for section #2 */
 ...
 
-``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure
-when .BTF.ext is generated.
+``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure when
+.BTF.ext is generated.
 
 The interpretation of ``bpf_func_info->insn_off`` and
-``bpf_line_info->insn_off`` is different between kernel API and ELF API.
-For kernel API, the ``insn_off`` is the instruction offset in the unit
-of ``struct bpf_insn``. For ELF API, the ``insn_off`` is the byte offset
-from the beginning of section (``btf_ext_info_sec->sec_name_off``).
+``bpf_line_info->insn_off`` is different between kernel API and ELF API. For
+kernel API, the ``insn_off`` is the instruction offset in the unit of ``struct
+bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the
+beginning of section (``btf_ext_info_sec->sec_name_off``).
 
 5. Using BTF
 ************
|
@@ -659,10 +636,9 @@ from the beginning of section (``btf_ext_info_sec->sec_name_off``).
 5.1 bpftool map pretty print
 ============================
 
-With BTF, the map key/value can be printed based on fields rather than
-simply raw bytes. This is especially
-valuable for large structure or if you data structure
-has bitfields. For example, for the following map,::
+With BTF, the map key/value can be printed based on fields rather than simply
+raw bytes. This is especially valuable for large structure or if your data
+structure has bitfields. For example, for the following map,::
 
 enum A { A1, A2, A3, A4, A5 };
 typedef enum A ___A;
|
@@ -702,9 +678,9 @@ bpftool is able to pretty print like below:
 5.2 bpftool prog dump
 =====================
 
-The following is an example to show func_info and line_info
-can help prog dump with better kernel symbol name, function prototype
-and line information.::
+The following is an example showing how func_info and line_info can help prog
+dump with better kernel symbol names, function prototypes and line
+information.::
 
 $ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
 [...]
|
@@ -733,10 +709,11 @@ and line information.::
 ; counts = bpf_map_lookup_elem(&btf_map, &key);
 [...]
 
-5.3 verifier log
+5.3 Verifier Log
 ================
 
-The following is an example how line_info can help verifier failure debug.::
+The following is an example of how line_info can help debugging verification
+failure.::
 
 /* The code at tools/testing/selftests/bpf/test_xdp_noinline.c
  * is modified as below.
|
@@ -765,8 +742,8 @@ You need latest pahole
 
 https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
 
-or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't support .BTF.ext
-and btf BTF_KIND_FUNC type yet. For example,::
+or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't
+support .BTF.ext and btf BTF_KIND_FUNC type yet. For example,::
 
 -bash-4.4$ cat t.c
 struct t {
|
@@ -783,8 +760,9 @@ and btf BTF_KIND_FUNC type yet. For example,::
 c type_id=2 bitfield_size=2 bits_offset=5
 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
 
-The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target only.
-The assembly code (-S) is able to show the BTF encoding in assembly format.::
+The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target
+only. The assembly code (-S) is able to show the BTF encoding in assembly
+format.::
 
 -bash-4.4$ cat t2.c
 typedef int __int32;
|
@@ -867,4 +845,4 @@ The assembly code (-S) is able to show the BTF encoding in assembly format.::
 7. Testing
 **********
 
-Kernel bpf selftest `test_btf.c` provides extensive set of BTF related tests.
+Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.
|
|
|
@@ -829,7 +829,7 @@ tracing filters may do to maintain counters of events, for example. Register R9
 is not used by socket filters either, but more complex filters may be running
 out of registers and would have to resort to spill/fill to stack.
 
-Internal BPF can used as generic assembler for last step performance
+Internal BPF can be used as a generic assembler for last step performance
 optimizations, socket filters and seccomp are using it as assembler. Tracing
 filters may use it as assembler to generate code from kernel. In kernel usage
 may not be bounded by security considerations, since generated internal BPF code
|
|