2018-10-06 07:40:00 +08:00
|
|
|
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
2018-01-31 04:55:03 +08:00
|
|
|
|
bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:51 +08:00
|
|
|
/*
|
|
|
|
* Common eBPF ELF object loading operations.
|
|
|
|
*
|
|
|
|
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
|
|
|
|
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
|
|
|
|
* Copyright (C) 2015 Huawei Inc.
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
* Copyright (C) 2017 Nicira, Inc.
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
* Copyright (C) 2019 Isovalent, Inc.
|
bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:51 +08:00
|
|
|
*/
|
|
|
|
|
2018-11-30 07:31:45 +08:00
|
|
|
#ifndef _GNU_SOURCE
|
2018-07-11 05:43:05 +08:00
|
|
|
#define _GNU_SOURCE
|
2018-11-30 07:31:45 +08:00
|
|
|
#endif
|
bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:51 +08:00
|
|
|
#include <stdlib.h>
|
2015-07-01 10:13:52 +08:00
|
|
|
#include <stdio.h>
|
|
|
|
#include <stdarg.h>
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
#include <libgen.h>
|
2015-07-01 10:14:02 +08:00
|
|
|
#include <inttypes.h>
|
2015-07-01 10:13:52 +08:00
|
|
|
#include <string.h>
|
bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:51 +08:00
|
|
|
#include <unistd.h>
|
2015-07-01 10:13:53 +08:00
|
|
|
#include <fcntl.h>
|
|
|
|
#include <errno.h>
|
bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:51 +08:00
|
|
|
#include <asm/unistd.h>
|
2017-01-23 09:11:25 +08:00
|
|
|
#include <linux/err.h>
|
2015-07-01 10:13:57 +08:00
|
|
|
#include <linux/kernel.h>
|
bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:51 +08:00
|
|
|
#include <linux/bpf.h>
|
2018-07-24 23:40:22 +08:00
|
|
|
#include <linux/btf.h>
|
2018-11-21 09:11:19 +08:00
|
|
|
#include <linux/filter.h>
|
2015-07-01 10:14:10 +08:00
|
|
|
#include <linux/list.h>
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
#include <linux/limits.h>
|
2018-10-10 07:14:47 +08:00
|
|
|
#include <linux/perf_event.h>
|
2018-10-19 21:51:03 +08:00
|
|
|
#include <linux/ring_buffer.h>
|
2019-07-02 07:58:57 +08:00
|
|
|
#include <sys/ioctl.h>
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
#include <sys/stat.h>
|
|
|
|
#include <sys/types.h>
|
|
|
|
#include <sys/vfs.h>
|
2018-07-11 05:43:05 +08:00
|
|
|
#include <tools/libc_compat.h>
|
2015-07-01 10:13:53 +08:00
|
|
|
#include <libelf.h>
|
|
|
|
#include <gelf.h>
|
bpf tools: Introduce 'bpf' library and add bpf feature check
This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent.
Before building, it checks the existence of libelf in Makefile, and deny
to build if not found. Instead of throwing an error if libelf not found,
the error raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.
Because libbpf requires 'kern_version' field set for 'union bpf_attr'
(bpfdep" is used for that dependency), Kernel BPF API is also checked
by intruducing a new feature check 'bpf' into tools/build/feature,
which checks the existence and version of linux/bpf.h. When building
libbpf, it searches that file from include/uapi/linux in kernel source
tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel
source tree it reside, installing of newest kernel headers is not
required, except we are trying to port these files to an old kernel.
To avoid checking that file when perf building, the newly introduced
'bpf' feature check doesn't added into FEATURE_TESTS and
FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added
into libbpf's specific.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Bcc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:51 +08:00
|
|
|
|
|
|
|
#include "libbpf.h"
|
2015-07-01 10:14:04 +08:00
|
|
|
#include "bpf.h"
|
2018-04-19 06:56:05 +08:00
|
|
|
#include "btf.h"
|
2018-09-15 03:47:14 +08:00
|
|
|
#include "str_error.h"
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
#include "libbpf_internal.h"
|
2015-07-01 10:13:52 +08:00
|
|
|
|
2016-07-18 14:01:08 +08:00
|
|
|
#ifndef EM_BPF
|
|
|
|
#define EM_BPF 247
|
|
|
|
#endif
|
|
|
|
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
#ifndef BPF_FS_MAGIC
|
|
|
|
#define BPF_FS_MAGIC 0xcafe4a11
|
|
|
|
#endif
|
|
|
|
|
2019-04-07 13:37:34 +08:00
|
|
|
/* vsprintf() in __base_pr() uses nonliteral format string. It may break
|
|
|
|
* compilation if user enables corresponding warning. Disable it explicitly.
|
|
|
|
*/
|
|
|
|
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
|
|
|
|
|
2015-07-01 10:13:52 +08:00
|
|
|
#define __printf(a, b) __attribute__((format(printf, a, b)))
|
|
|
|
|
2019-02-05 08:20:55 +08:00
|
|
|
static int __base_pr(enum libbpf_print_level level, const char *format,
|
|
|
|
va_list args)
|
2015-07-01 10:13:52 +08:00
|
|
|
{
|
2019-02-02 08:14:17 +08:00
|
|
|
if (level == LIBBPF_DEBUG)
|
|
|
|
return 0;
|
|
|
|
|
2019-02-05 08:20:55 +08:00
|
|
|
return vfprintf(stderr, format, args);
|
2015-07-01 10:13:52 +08:00
|
|
|
}
|
|
|
|
|
2019-02-05 08:20:55 +08:00
|
|
|
static libbpf_print_fn_t __libbpf_pr = __base_pr;
|
2015-07-01 10:13:52 +08:00
|
|
|
|
2019-02-02 08:14:17 +08:00
|
|
|
void libbpf_set_print(libbpf_print_fn_t fn)
|
2015-07-01 10:13:52 +08:00
|
|
|
{
|
2019-02-02 08:14:17 +08:00
|
|
|
__libbpf_pr = fn;
|
2015-07-01 10:13:52 +08:00
|
|
|
}
|
2015-07-01 10:13:53 +08:00
|
|
|
|
2019-02-02 08:14:14 +08:00
|
|
|
__printf(2, 3)
|
|
|
|
void libbpf_print(enum libbpf_print_level level, const char *format, ...)
|
|
|
|
{
|
|
|
|
va_list args;
|
|
|
|
|
2019-02-02 08:14:17 +08:00
|
|
|
if (!__libbpf_pr)
|
|
|
|
return;
|
|
|
|
|
2019-02-02 08:14:14 +08:00
|
|
|
va_start(args, format);
|
2019-02-02 08:14:17 +08:00
|
|
|
__libbpf_pr(level, format, args);
|
2019-02-02 08:14:14 +08:00
|
|
|
va_end(args);
|
|
|
|
}
|
|
|
|
|
2015-11-06 21:49:37 +08:00
|
|
|
#define STRERR_BUFSIZE 128
|
|
|
|
|
|
|
|
#define CHECK_ERR(action, err, out) do { \
|
|
|
|
err = action; \
|
|
|
|
if (err) \
|
|
|
|
goto out; \
|
|
|
|
} while(0)
|
|
|
|
|
|
|
|
|
2015-07-01 10:13:53 +08:00
|
|
|
/* Copied from tools/perf/util/util.h */
|
|
|
|
#ifndef zfree
|
|
|
|
# define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifndef zclose
|
|
|
|
# define zclose(fd) ({ \
|
|
|
|
int ___err = 0; \
|
|
|
|
if ((fd) >= 0) \
|
|
|
|
___err = close((fd)); \
|
|
|
|
fd = -1; \
|
|
|
|
___err; })
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifdef HAVE_LIBELF_MMAP_SUPPORT
|
|
|
|
# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ_MMAP
|
|
|
|
#else
|
|
|
|
# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
|
|
|
|
#endif
|
|
|
|
|
2019-03-12 13:30:38 +08:00
|
|
|
static inline __u64 ptr_to_u64(const void *ptr)
|
|
|
|
{
|
|
|
|
return (__u64) (unsigned long) ptr;
|
|
|
|
}
|
|
|
|
|
2018-11-21 09:11:19 +08:00
|
|
|
struct bpf_capabilities {
|
|
|
|
/* v4.14: kernel support for program & map names. */
|
|
|
|
__u32 name:1;
|
2019-04-24 06:45:56 +08:00
|
|
|
/* v5.2: kernel support for global data sections. */
|
|
|
|
__u32 global_data:1;
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
/* BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO support */
|
|
|
|
__u32 btf_func:1;
|
|
|
|
/* BTF_KIND_VAR and BTF_KIND_DATASEC support */
|
|
|
|
__u32 btf_datasec:1;
|
2018-11-21 09:11:19 +08:00
|
|
|
};
|
|
|
|
|
2015-07-01 10:14:00 +08:00
|
|
|
/*
|
|
|
|
* bpf_prog should be a better name but it has been used in
|
|
|
|
* linux/filter.h.
|
|
|
|
*/
|
|
|
|
struct bpf_program {
|
|
|
|
/* Index in elf obj file, for relocation use. */
|
|
|
|
int idx;
|
2017-09-28 05:37:54 +08:00
|
|
|
char *name;
|
2018-05-17 05:02:49 +08:00
|
|
|
int prog_ifindex;
|
2015-07-01 10:14:00 +08:00
|
|
|
char *section_name;
|
2018-11-10 00:21:43 +08:00
|
|
|
/* section_name with / replaced by _; makes recursive pinning
|
|
|
|
* in bpf_object__pin_programs easier
|
|
|
|
*/
|
|
|
|
char *pin_name;
|
2015-07-01 10:14:00 +08:00
|
|
|
struct bpf_insn *insns;
|
2017-12-15 09:55:10 +08:00
|
|
|
size_t insns_cnt, main_prog_cnt;
|
2016-07-13 18:44:01 +08:00
|
|
|
enum bpf_prog_type type;
|
2015-07-01 10:14:02 +08:00
|
|
|
|
2017-12-15 09:55:10 +08:00
|
|
|
struct reloc_desc {
|
|
|
|
enum {
|
|
|
|
RELO_LD64,
|
|
|
|
RELO_CALL,
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
RELO_DATA,
|
2017-12-15 09:55:10 +08:00
|
|
|
} type;
|
2015-07-01 10:14:02 +08:00
|
|
|
int insn_idx;
|
2017-12-15 09:55:10 +08:00
|
|
|
union {
|
|
|
|
int map_idx;
|
|
|
|
int text_off;
|
|
|
|
};
|
2015-07-01 10:14:02 +08:00
|
|
|
} *reloc_desc;
|
|
|
|
int nr_reloc;
|
2019-04-02 12:27:47 +08:00
|
|
|
int log_level;
|
2015-07-01 10:14:07 +08:00
|
|
|
|
2015-11-16 20:10:09 +08:00
|
|
|
struct {
|
|
|
|
int nr;
|
|
|
|
int *fds;
|
|
|
|
} instances;
|
|
|
|
bpf_program_prep_t preprocessor;
|
2015-07-01 10:14:08 +08:00
|
|
|
|
|
|
|
struct bpf_object *obj;
|
|
|
|
void *priv;
|
|
|
|
bpf_program_clear_priv_t clear_priv;
|
2018-03-31 06:08:01 +08:00
|
|
|
|
|
|
|
enum bpf_attach_type expected_attach_type;
|
2018-11-20 07:29:16 +08:00
|
|
|
int btf_fd;
|
|
|
|
void *func_info;
|
|
|
|
__u32 func_info_rec_size;
|
2018-12-08 08:42:29 +08:00
|
|
|
__u32 func_info_cnt;
|
2018-11-21 09:11:19 +08:00
|
|
|
|
|
|
|
struct bpf_capabilities *caps;
|
2018-12-08 08:42:31 +08:00
|
|
|
|
|
|
|
void *line_info;
|
|
|
|
__u32 line_info_rec_size;
|
|
|
|
__u32 line_info_cnt;
|
2019-05-25 06:25:19 +08:00
|
|
|
__u32 prog_flags;
|
2015-07-01 10:14:00 +08:00
|
|
|
};
|
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
enum libbpf_map_type {
|
|
|
|
LIBBPF_MAP_UNSPEC,
|
|
|
|
LIBBPF_MAP_DATA,
|
|
|
|
LIBBPF_MAP_BSS,
|
|
|
|
LIBBPF_MAP_RODATA,
|
|
|
|
};
|
|
|
|
|
|
|
|
static const char * const libbpf_type_to_btf_name[] = {
|
|
|
|
[LIBBPF_MAP_DATA] = ".data",
|
|
|
|
[LIBBPF_MAP_BSS] = ".bss",
|
|
|
|
[LIBBPF_MAP_RODATA] = ".rodata",
|
|
|
|
};
|
|
|
|
|
2015-11-27 16:47:35 +08:00
|
|
|
struct bpf_map {
|
|
|
|
int fd;
|
2015-11-27 16:47:36 +08:00
|
|
|
char *name;
|
2019-06-18 03:26:54 +08:00
|
|
|
int sec_idx;
|
|
|
|
size_t sec_offset;
|
2018-05-17 05:02:49 +08:00
|
|
|
int map_ifindex;
|
2018-11-21 12:55:56 +08:00
|
|
|
int inner_map_fd;
|
2015-11-27 16:47:35 +08:00
|
|
|
struct bpf_map_def def;
|
2018-07-24 23:40:21 +08:00
|
|
|
__u32 btf_key_type_id;
|
|
|
|
__u32 btf_value_type_id;
|
2015-11-27 16:47:35 +08:00
|
|
|
void *priv;
|
|
|
|
bpf_map_clear_priv_t clear_priv;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
enum libbpf_map_type libbpf_type;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct bpf_secdata {
|
|
|
|
void *rodata;
|
|
|
|
void *data;
|
2015-11-27 16:47:35 +08:00
|
|
|
};
|
|
|
|
|
2015-07-01 10:14:10 +08:00
|
|
|
static LIST_HEAD(bpf_objects_list);
|
|
|
|
|
2015-07-01 10:13:53 +08:00
|
|
|
struct bpf_object {
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
char name[BPF_OBJ_NAME_LEN];
|
2015-07-01 10:13:57 +08:00
|
|
|
char license[64];
|
2018-10-10 07:14:47 +08:00
|
|
|
__u32 kern_version;
|
2015-07-01 10:13:58 +08:00
|
|
|
|
2015-07-01 10:14:00 +08:00
|
|
|
struct bpf_program *programs;
|
|
|
|
size_t nr_programs;
|
2015-11-27 16:47:35 +08:00
|
|
|
struct bpf_map *maps;
|
|
|
|
size_t nr_maps;
|
2019-06-18 03:26:53 +08:00
|
|
|
size_t maps_cap;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
struct bpf_secdata sections;
|
2015-11-27 16:47:35 +08:00
|
|
|
|
2015-07-01 10:14:04 +08:00
|
|
|
bool loaded;
|
2018-06-29 05:41:38 +08:00
|
|
|
bool has_pseudo_calls;
|
2015-07-01 10:14:00 +08:00
|
|
|
|
2015-07-01 10:13:53 +08:00
|
|
|
/*
|
|
|
|
* Information when doing elf related work. Only valid if fd
|
|
|
|
* is valid.
|
|
|
|
*/
|
|
|
|
struct {
|
|
|
|
int fd;
|
2015-07-01 10:13:54 +08:00
|
|
|
void *obj_buf;
|
|
|
|
size_t obj_buf_sz;
|
2015-07-01 10:13:53 +08:00
|
|
|
Elf *elf;
|
|
|
|
GElf_Ehdr ehdr;
|
bpf tools: Collect symbol table from SHT_SYMTAB section
This patch collects symbols section. This section is useful when linking
BPF maps.
What 'bpf_map_xxx()' functions actually require are map's file
descriptors (and the internal verifier converts fds into pointers to
'struct bpf_map'), which we don't know when compiling. Therefore, we
should make compiler generate a 'ldr_64 r1, <imm>' instruction, and
fill the 'imm' field with the actual file descriptor when loading in
libbpf.
BPF programs should be written in this way:
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(unsigned long),
.value_size = sizeof(unsigned long),
.max_entries = 1000000,
};
SEC("my_func=sys_write")
int my_func(void *ctx)
{
...
bpf_map_update_elem(&my_map, &key, &value, BPF_ANY);
...
}
Compiler should convert '&my_map' into a 'ldr_64, r1, <imm>'
instruction, where imm should be the address of 'my_map'. According to
the address, libbpf knows which map it actually referenced, and then
fills the imm field with the 'fd' of that map created by it.
However, since we never really 'link' the object file, the imm field is
only a record in relocation section. Therefore libbpf should do the
relocation:
1. In relocation section (type == SHT_REL), positions of each such
'ldr_64' instruction are recorded with a reference of an entry in
symbol table (SHT_SYMTAB);
2. From records in symbol table we can find the indics of map
variables.
Libbpf first record SHT_SYMTAB and positions of each instruction which
required bu such operation. Then create file descriptor. Finally, after
map creation complete, replace the imm field.
This is the first patch of BPF map related stuff. It records SHT_SYMTAB
into object's efile field for further use.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:59 +08:00
|
|
|
Elf_Data *symbols;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
Elf_Data *data;
|
|
|
|
Elf_Data *rodata;
|
|
|
|
Elf_Data *bss;
|
2015-12-08 10:25:30 +08:00
|
|
|
size_t strtabidx;
|
2015-07-01 10:14:01 +08:00
|
|
|
struct {
|
|
|
|
GElf_Shdr shdr;
|
|
|
|
Elf_Data *data;
|
|
|
|
} *reloc;
|
|
|
|
int nr_reloc;
|
perf bpf: Check relocation target section
Libbpf should check the target section before doing relocation to ensure
the relocation is correct. If not, a bug in LLVM causes an error. See
[1]. Also, if an incorrect BPF script uses both global variable and
map, global variable whould be treated as map and be relocated without
error.
This patch saves the id of the map section into obj->efile and compare
target section of a relocation symbol against it during relocation.
Previous patch introduces a test case about this problem. After this
patch:
# ~/perf test BPF
37: Test BPF filter :
37.1: Test basic BPF filtering : Ok
37.2: Test BPF prologue generation : Ok
37.3: Test BPF relocation checker : Ok
# perf test -v BPF
...
37.3: Test BPF relocation checker :
...
libbpf: loading object '[bpf_relocation_test]' from buffer
libbpf: section .strtab, size 126, link 0, flags 0, type=3
libbpf: section .text, size 0, link 0, flags 6, type=1
libbpf: section .data, size 0, link 0, flags 3, type=1
libbpf: section .bss, size 0, link 0, flags 3, type=8
libbpf: section func=sys_write, size 104, link 0, flags 6, type=1
libbpf: found program func=sys_write
libbpf: section .relfunc=sys_write, size 16, link 10, flags 0, type=9
libbpf: section maps, size 16, link 0, flags 3, type=1
libbpf: maps in [bpf_relocation_test]: 16 bytes
libbpf: section license, size 4, link 0, flags 3, type=1
libbpf: license of [bpf_relocation_test] is GPL
libbpf: section version, size 4, link 0, flags 3, type=1
libbpf: kernel version of [bpf_relocation_test] is 40400
libbpf: section .symtab, size 144, link 1, flags 0, type=2
libbpf: map 0 is "my_table"
libbpf: collecting relocating info for: 'func=sys_write'
libbpf: Program 'func=sys_write' contains non-map related relo data pointing to section 65522
bpf: failed to load buffer
Compile BPF program failed.
test child finished with 0
---- end ----
Test BPF filter subtest 2: Ok
[1] https://llvm.org/bugs/show_bug.cgi?id=26243
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1453715801-7732-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-25 17:55:49 +08:00
|
|
|
int maps_shndx;
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
int btf_maps_shndx;
|
2017-12-15 09:55:10 +08:00
|
|
|
int text_shndx;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
int data_shndx;
|
|
|
|
int rodata_shndx;
|
|
|
|
int bss_shndx;
|
2015-07-01 10:13:53 +08:00
|
|
|
} efile;
|
2015-07-01 10:14:10 +08:00
|
|
|
/*
|
|
|
|
* All loaded bpf_object is linked in a list, which is
|
|
|
|
* hidden to caller. bpf_objects__<func> handlers deal with
|
|
|
|
* all objects.
|
|
|
|
*/
|
|
|
|
struct list_head list;
|
2016-11-26 15:03:26 +08:00
|
|
|
|
2018-04-19 06:56:05 +08:00
|
|
|
struct btf *btf;
|
2018-11-20 07:29:16 +08:00
|
|
|
struct btf_ext *btf_ext;
|
2018-04-19 06:56:05 +08:00
|
|
|
|
2016-11-26 15:03:26 +08:00
|
|
|
void *priv;
|
|
|
|
bpf_object_clear_priv_t clear_priv;
|
|
|
|
|
2018-11-21 09:11:19 +08:00
|
|
|
struct bpf_capabilities caps;
|
|
|
|
|
2015-07-01 10:13:53 +08:00
|
|
|
char path[];
|
|
|
|
};
|
|
|
|
#define obj_elf_valid(o) ((o)->efile.elf)
|
|
|
|
|
2018-10-03 04:35:39 +08:00
|
|
|
void bpf_program__unload(struct bpf_program *prog)
|
2015-07-01 10:14:07 +08:00
|
|
|
{
|
2015-11-16 20:10:09 +08:00
|
|
|
int i;
|
|
|
|
|
2015-07-01 10:14:07 +08:00
|
|
|
if (!prog)
|
|
|
|
return;
|
|
|
|
|
2015-11-16 20:10:09 +08:00
|
|
|
/*
|
|
|
|
* If the object is opened but the program was never loaded,
|
|
|
|
* it is possible that prog->instances.nr == -1.
|
|
|
|
*/
|
|
|
|
if (prog->instances.nr > 0) {
|
|
|
|
for (i = 0; i < prog->instances.nr; i++)
|
|
|
|
zclose(prog->instances.fds[i]);
|
|
|
|
} else if (prog->instances.nr != -1) {
|
|
|
|
pr_warning("Internal error: instances.nr is %d\n",
|
|
|
|
prog->instances.nr);
|
|
|
|
}
|
|
|
|
|
|
|
|
prog->instances.nr = -1;
|
|
|
|
zfree(&prog->instances.fds);
|
2018-11-20 07:29:16 +08:00
|
|
|
|
|
|
|
zclose(prog->btf_fd);
|
|
|
|
zfree(&prog->func_info);
|
2018-12-17 15:57:50 +08:00
|
|
|
zfree(&prog->line_info);
|
2015-07-01 10:14:07 +08:00
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:00 +08:00
|
|
|
static void bpf_program__exit(struct bpf_program *prog)
|
|
|
|
{
|
|
|
|
if (!prog)
|
|
|
|
return;
|
|
|
|
|
2015-07-01 10:14:08 +08:00
|
|
|
if (prog->clear_priv)
|
|
|
|
prog->clear_priv(prog, prog->priv);
|
|
|
|
|
|
|
|
prog->priv = NULL;
|
|
|
|
prog->clear_priv = NULL;
|
|
|
|
|
2015-07-01 10:14:07 +08:00
|
|
|
bpf_program__unload(prog);
|
2017-09-28 05:37:54 +08:00
|
|
|
zfree(&prog->name);
|
2015-07-01 10:14:00 +08:00
|
|
|
zfree(&prog->section_name);
|
2018-11-10 00:21:43 +08:00
|
|
|
zfree(&prog->pin_name);
|
2015-07-01 10:14:00 +08:00
|
|
|
zfree(&prog->insns);
|
2015-07-01 10:14:02 +08:00
|
|
|
zfree(&prog->reloc_desc);
|
|
|
|
|
|
|
|
prog->nr_reloc = 0;
|
2015-07-01 10:14:00 +08:00
|
|
|
prog->insns_cnt = 0;
|
|
|
|
prog->idx = -1;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:43 +08:00
|
|
|
static char *__bpf_program__pin_name(struct bpf_program *prog)
|
|
|
|
{
|
|
|
|
char *name, *p;
|
|
|
|
|
|
|
|
name = p = strdup(prog->section_name);
|
|
|
|
while ((p = strchr(p, '/')))
|
|
|
|
*p = '_';
|
|
|
|
|
|
|
|
return name;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:00 +08:00
|
|
|
static int
|
2017-09-28 05:37:54 +08:00
|
|
|
bpf_program__init(void *data, size_t size, char *section_name, int idx,
|
|
|
|
struct bpf_program *prog)
|
2015-07-01 10:14:00 +08:00
|
|
|
{
|
2019-05-30 01:36:03 +08:00
|
|
|
const size_t bpf_insn_sz = sizeof(struct bpf_insn);
|
|
|
|
|
|
|
|
if (size == 0 || size % bpf_insn_sz) {
|
|
|
|
pr_warning("corrupted section '%s', size: %zu\n",
|
|
|
|
section_name, size);
|
2015-07-01 10:14:00 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2019-02-14 02:25:53 +08:00
|
|
|
memset(prog, 0, sizeof(*prog));
|
2015-07-01 10:14:00 +08:00
|
|
|
|
2017-09-28 05:37:54 +08:00
|
|
|
prog->section_name = strdup(section_name);
|
2015-07-01 10:14:00 +08:00
|
|
|
if (!prog->section_name) {
|
2018-02-08 19:48:17 +08:00
|
|
|
pr_warning("failed to alloc name for prog under section(%d) %s\n",
|
|
|
|
idx, section_name);
|
2015-07-01 10:14:00 +08:00
|
|
|
goto errout;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:43 +08:00
|
|
|
prog->pin_name = __bpf_program__pin_name(prog);
|
|
|
|
if (!prog->pin_name) {
|
|
|
|
pr_warning("failed to alloc pin name for prog under section(%d) %s\n",
|
|
|
|
idx, section_name);
|
|
|
|
goto errout;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:00 +08:00
|
|
|
prog->insns = malloc(size);
|
|
|
|
if (!prog->insns) {
|
2017-09-28 05:37:54 +08:00
|
|
|
pr_warning("failed to alloc insns for prog under section %s\n",
|
|
|
|
section_name);
|
2015-07-01 10:14:00 +08:00
|
|
|
goto errout;
|
|
|
|
}
|
2019-05-30 01:36:03 +08:00
|
|
|
prog->insns_cnt = size / bpf_insn_sz;
|
|
|
|
memcpy(prog->insns, data, size);
|
2015-07-01 10:14:00 +08:00
|
|
|
prog->idx = idx;
|
2015-11-16 20:10:09 +08:00
|
|
|
prog->instances.fds = NULL;
|
|
|
|
prog->instances.nr = -1;
|
2018-11-24 04:58:12 +08:00
|
|
|
prog->type = BPF_PROG_TYPE_UNSPEC;
|
2018-11-20 07:29:16 +08:00
|
|
|
prog->btf_fd = -1;
|
2015-07-01 10:14:00 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
errout:
|
|
|
|
bpf_program__exit(prog);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
bpf_object__add_program(struct bpf_object *obj, void *data, size_t size,
|
2017-09-28 05:37:54 +08:00
|
|
|
char *section_name, int idx)
|
2015-07-01 10:14:00 +08:00
|
|
|
{
|
|
|
|
struct bpf_program prog, *progs;
|
|
|
|
int nr_progs, err;
|
|
|
|
|
2017-09-28 05:37:54 +08:00
|
|
|
err = bpf_program__init(data, size, section_name, idx, &prog);
|
2015-07-01 10:14:00 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2018-11-21 09:11:19 +08:00
|
|
|
prog.caps = &obj->caps;
|
2015-07-01 10:14:00 +08:00
|
|
|
progs = obj->programs;
|
|
|
|
nr_progs = obj->nr_programs;
|
|
|
|
|
2018-07-11 05:43:05 +08:00
|
|
|
progs = reallocarray(progs, nr_progs + 1, sizeof(progs[0]));
|
2015-07-01 10:14:00 +08:00
|
|
|
if (!progs) {
|
|
|
|
/*
|
|
|
|
* In this case the original obj->programs
|
|
|
|
* is still valid, so don't need special treat for
|
|
|
|
* bpf_close_object().
|
|
|
|
*/
|
2017-09-28 05:37:54 +08:00
|
|
|
pr_warning("failed to alloc a new program under section '%s'\n",
|
|
|
|
section_name);
|
2015-07-01 10:14:00 +08:00
|
|
|
bpf_program__exit(&prog);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
pr_debug("found program %s\n", prog.section_name);
|
|
|
|
obj->programs = progs;
|
|
|
|
obj->nr_programs = nr_progs + 1;
|
2015-07-01 10:14:08 +08:00
|
|
|
prog.obj = obj;
|
2015-07-01 10:14:00 +08:00
|
|
|
progs[nr_progs] = prog;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-09-28 05:37:54 +08:00
|
|
|
static int
|
|
|
|
bpf_object__init_prog_names(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
Elf_Data *symbols = obj->efile.symbols;
|
|
|
|
struct bpf_program *prog;
|
|
|
|
size_t pi, si;
|
|
|
|
|
|
|
|
for (pi = 0; pi < obj->nr_programs; pi++) {
|
2017-12-15 09:55:10 +08:00
|
|
|
const char *name = NULL;
|
2017-09-28 05:37:54 +08:00
|
|
|
|
|
|
|
prog = &obj->programs[pi];
|
|
|
|
|
|
|
|
for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name;
|
|
|
|
si++) {
|
|
|
|
GElf_Sym sym;
|
|
|
|
|
|
|
|
if (!gelf_getsym(symbols, si, &sym))
|
|
|
|
continue;
|
|
|
|
if (sym.st_shndx != prog->idx)
|
|
|
|
continue;
|
2017-12-13 23:18:52 +08:00
|
|
|
if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL)
|
|
|
|
continue;
|
2017-09-28 05:37:54 +08:00
|
|
|
|
|
|
|
name = elf_strptr(obj->efile.elf,
|
|
|
|
obj->efile.strtabidx,
|
|
|
|
sym.st_name);
|
|
|
|
if (!name) {
|
|
|
|
pr_warning("failed to get sym name string for prog %s\n",
|
|
|
|
prog->section_name);
|
|
|
|
return -LIBBPF_ERRNO__LIBELF;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-06-29 05:41:38 +08:00
|
|
|
if (!name && prog->idx == obj->efile.text_shndx)
|
|
|
|
name = ".text";
|
|
|
|
|
2017-09-28 05:37:54 +08:00
|
|
|
if (!name) {
|
|
|
|
pr_warning("failed to find sym for prog %s\n",
|
|
|
|
prog->section_name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2018-06-29 05:41:38 +08:00
|
|
|
|
2017-09-28 05:37:54 +08:00
|
|
|
prog->name = strdup(name);
|
|
|
|
if (!prog->name) {
|
|
|
|
pr_warning("failed to allocate memory for prog sym %s\n",
|
|
|
|
name);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:13:54 +08:00
|
|
|
static struct bpf_object *bpf_object__new(const char *path,
|
|
|
|
void *obj_buf,
|
|
|
|
size_t obj_buf_sz)
|
2015-07-01 10:13:53 +08:00
|
|
|
{
|
|
|
|
struct bpf_object *obj;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
char *end;
|
2015-07-01 10:13:53 +08:00
|
|
|
|
|
|
|
obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1);
|
|
|
|
if (!obj) {
|
|
|
|
pr_warning("alloc memory failed for %s\n", path);
|
2015-11-06 21:49:37 +08:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2015-07-01 10:13:53 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
strcpy(obj->path, path);
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
/* Using basename() GNU version which doesn't modify arg. */
|
2019-05-30 01:36:11 +08:00
|
|
|
strncpy(obj->name, basename((void *)path), sizeof(obj->name) - 1);
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
end = strchr(obj->name, '.');
|
|
|
|
if (end)
|
|
|
|
*end = 0;
|
2015-07-01 10:13:54 +08:00
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
obj->efile.fd = -1;
|
2015-07-01 10:13:54 +08:00
|
|
|
/*
|
2019-05-30 01:36:10 +08:00
|
|
|
* Caller of this function should also call
|
2015-07-01 10:13:54 +08:00
|
|
|
* bpf_object__elf_finish() after data collection to return
|
|
|
|
* obj_buf to user. If not, we should duplicate the buffer to
|
|
|
|
* avoid user freeing them before elf finish.
|
|
|
|
*/
|
|
|
|
obj->efile.obj_buf = obj_buf;
|
|
|
|
obj->efile.obj_buf_sz = obj_buf_sz;
|
perf bpf: Check relocation target section
Libbpf should check the target section before doing relocation to ensure
the relocation is correct. If not, a bug in LLVM causes an error. See
[1]. Also, if an incorrect BPF script uses both global variable and
map, global variable whould be treated as map and be relocated without
error.
This patch saves the id of the map section into obj->efile and compare
target section of a relocation symbol against it during relocation.
Previous patch introduces a test case about this problem. After this
patch:
# ~/perf test BPF
37: Test BPF filter :
37.1: Test basic BPF filtering : Ok
37.2: Test BPF prologue generation : Ok
37.3: Test BPF relocation checker : Ok
# perf test -v BPF
...
37.3: Test BPF relocation checker :
...
libbpf: loading object '[bpf_relocation_test]' from buffer
libbpf: section .strtab, size 126, link 0, flags 0, type=3
libbpf: section .text, size 0, link 0, flags 6, type=1
libbpf: section .data, size 0, link 0, flags 3, type=1
libbpf: section .bss, size 0, link 0, flags 3, type=8
libbpf: section func=sys_write, size 104, link 0, flags 6, type=1
libbpf: found program func=sys_write
libbpf: section .relfunc=sys_write, size 16, link 10, flags 0, type=9
libbpf: section maps, size 16, link 0, flags 3, type=1
libbpf: maps in [bpf_relocation_test]: 16 bytes
libbpf: section license, size 4, link 0, flags 3, type=1
libbpf: license of [bpf_relocation_test] is GPL
libbpf: section version, size 4, link 0, flags 3, type=1
libbpf: kernel version of [bpf_relocation_test] is 40400
libbpf: section .symtab, size 144, link 1, flags 0, type=2
libbpf: map 0 is "my_table"
libbpf: collecting relocating info for: 'func=sys_write'
libbpf: Program 'func=sys_write' contains non-map related relo data pointing to section 65522
bpf: failed to load buffer
Compile BPF program failed.
test child finished with 0
---- end ----
Test BPF filter subtest 2: Ok
[1] https://llvm.org/bugs/show_bug.cgi?id=26243
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1453715801-7732-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-25 17:55:49 +08:00
|
|
|
obj->efile.maps_shndx = -1;
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
obj->efile.btf_maps_shndx = -1;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
obj->efile.data_shndx = -1;
|
|
|
|
obj->efile.rodata_shndx = -1;
|
|
|
|
obj->efile.bss_shndx = -1;
|
2015-07-01 10:13:54 +08:00
|
|
|
|
2015-07-01 10:14:04 +08:00
|
|
|
obj->loaded = false;
|
2015-07-01 10:14:10 +08:00
|
|
|
|
|
|
|
INIT_LIST_HEAD(&obj->list);
|
|
|
|
list_add(&obj->list, &bpf_objects_list);
|
2015-07-01 10:13:53 +08:00
|
|
|
return obj;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bpf_object__elf_finish(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
if (!obj_elf_valid(obj))
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (obj->efile.elf) {
|
|
|
|
elf_end(obj->efile.elf);
|
|
|
|
obj->efile.elf = NULL;
|
|
|
|
}
|
bpf tools: Collect symbol table from SHT_SYMTAB section
This patch collects symbols section. This section is useful when linking
BPF maps.
What 'bpf_map_xxx()' functions actually require are map's file
descriptors (and the internal verifier converts fds into pointers to
'struct bpf_map'), which we don't know when compiling. Therefore, we
should make compiler generate a 'ldr_64 r1, <imm>' instruction, and
fill the 'imm' field with the actual file descriptor when loading in
libbpf.
BPF programs should be written in this way:
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(unsigned long),
.value_size = sizeof(unsigned long),
.max_entries = 1000000,
};
SEC("my_func=sys_write")
int my_func(void *ctx)
{
...
bpf_map_update_elem(&my_map, &key, &value, BPF_ANY);
...
}
Compiler should convert '&my_map' into a 'ldr_64, r1, <imm>'
instruction, where imm should be the address of 'my_map'. According to
the address, libbpf knows which map it actually referenced, and then
fills the imm field with the 'fd' of that map created by it.
However, since we never really 'link' the object file, the imm field is
only a record in relocation section. Therefore libbpf should do the
relocation:
1. In relocation section (type == SHT_REL), positions of each such
'ldr_64' instruction are recorded with a reference of an entry in
symbol table (SHT_SYMTAB);
2. From records in symbol table we can find the indics of map
variables.
Libbpf first record SHT_SYMTAB and positions of each instruction which
required bu such operation. Then create file descriptor. Finally, after
map creation complete, replace the imm field.
This is the first patch of BPF map related stuff. It records SHT_SYMTAB
into object's efile field for further use.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:59 +08:00
|
|
|
obj->efile.symbols = NULL;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
obj->efile.data = NULL;
|
|
|
|
obj->efile.rodata = NULL;
|
|
|
|
obj->efile.bss = NULL;
|
2015-07-01 10:14:01 +08:00
|
|
|
|
|
|
|
zfree(&obj->efile.reloc);
|
|
|
|
obj->efile.nr_reloc = 0;
|
2015-07-01 10:13:53 +08:00
|
|
|
zclose(obj->efile.fd);
|
2015-07-01 10:13:54 +08:00
|
|
|
obj->efile.obj_buf = NULL;
|
|
|
|
obj->efile.obj_buf_sz = 0;
|
2015-07-01 10:13:53 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int bpf_object__elf_init(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
GElf_Ehdr *ep;
|
|
|
|
|
|
|
|
if (obj_elf_valid(obj)) {
|
|
|
|
pr_warning("elf init: internal error\n");
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__LIBELF;
|
2015-07-01 10:13:53 +08:00
|
|
|
}
|
|
|
|
|
2015-07-01 10:13:54 +08:00
|
|
|
if (obj->efile.obj_buf_sz > 0) {
|
|
|
|
/*
|
|
|
|
* obj_buf should have been validated by
|
|
|
|
* bpf_object__open_buffer().
|
|
|
|
*/
|
|
|
|
obj->efile.elf = elf_memory(obj->efile.obj_buf,
|
|
|
|
obj->efile.obj_buf_sz);
|
|
|
|
} else {
|
|
|
|
obj->efile.fd = open(obj->path, O_RDONLY);
|
|
|
|
if (obj->efile.fd < 0) {
|
2019-05-30 01:36:04 +08:00
|
|
|
char errmsg[STRERR_BUFSIZE], *cp;
|
2018-07-30 16:53:23 +08:00
|
|
|
|
2019-05-30 01:36:04 +08:00
|
|
|
err = -errno;
|
|
|
|
cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
|
2018-07-30 16:53:23 +08:00
|
|
|
pr_warning("failed to open %s: %s\n", obj->path, cp);
|
2019-05-30 01:36:04 +08:00
|
|
|
return err;
|
2015-07-01 10:13:54 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
obj->efile.elf = elf_begin(obj->efile.fd,
|
2019-05-30 01:36:10 +08:00
|
|
|
LIBBPF_ELF_C_READ_MMAP, NULL);
|
2015-07-01 10:13:53 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (!obj->efile.elf) {
|
2019-05-30 01:36:11 +08:00
|
|
|
pr_warning("failed to open %s as ELF file\n", obj->path);
|
2015-11-06 21:49:37 +08:00
|
|
|
err = -LIBBPF_ERRNO__LIBELF;
|
2015-07-01 10:13:53 +08:00
|
|
|
goto errout;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!gelf_getehdr(obj->efile.elf, &obj->efile.ehdr)) {
|
2019-05-30 01:36:11 +08:00
|
|
|
pr_warning("failed to get EHDR from %s\n", obj->path);
|
2015-11-06 21:49:37 +08:00
|
|
|
err = -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:13:53 +08:00
|
|
|
goto errout;
|
|
|
|
}
|
|
|
|
ep = &obj->efile.ehdr;
|
|
|
|
|
2016-07-18 14:01:08 +08:00
|
|
|
/* Old LLVM set e_machine to EM_NONE */
|
2019-05-30 01:36:10 +08:00
|
|
|
if (ep->e_type != ET_REL ||
|
|
|
|
(ep->e_machine && ep->e_machine != EM_BPF)) {
|
|
|
|
pr_warning("%s is not an eBPF object file\n", obj->path);
|
2015-11-06 21:49:37 +08:00
|
|
|
err = -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:13:53 +08:00
|
|
|
goto errout;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
errout:
|
|
|
|
bpf_object__elf_finish(obj);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-05-30 01:36:05 +08:00
|
|
|
static int bpf_object__check_endianness(struct bpf_object *obj)
|
2015-07-01 10:13:55 +08:00
|
|
|
{
|
2019-05-30 01:36:05 +08:00
|
|
|
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
|
|
|
|
if (obj->efile.ehdr.e_ident[EI_DATA] == ELFDATA2LSB)
|
|
|
|
return 0;
|
|
|
|
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
|
|
|
|
if (obj->efile.ehdr.e_ident[EI_DATA] == ELFDATA2MSB)
|
|
|
|
return 0;
|
|
|
|
#else
|
|
|
|
# error "Unrecognized __BYTE_ORDER__"
|
|
|
|
#endif
|
|
|
|
pr_warning("endianness mismatch.\n");
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__ENDIAN;
|
2015-07-01 10:13:55 +08:00
|
|
|
}
|
|
|
|
|
2015-07-01 10:13:57 +08:00
|
|
|
static int
|
2019-05-30 01:36:11 +08:00
|
|
|
bpf_object__init_license(struct bpf_object *obj, void *data, size_t size)
|
2015-07-01 10:13:57 +08:00
|
|
|
{
|
2019-05-30 01:36:11 +08:00
|
|
|
memcpy(obj->license, data, min(size, sizeof(obj->license) - 1));
|
2015-07-01 10:13:57 +08:00
|
|
|
pr_debug("license of %s is %s\n", obj->path, obj->license);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2019-05-30 01:36:11 +08:00
|
|
|
bpf_object__init_kversion(struct bpf_object *obj, void *data, size_t size)
|
2015-07-01 10:13:57 +08:00
|
|
|
{
|
2018-10-10 07:14:47 +08:00
|
|
|
__u32 kver;
|
2015-07-01 10:13:57 +08:00
|
|
|
|
|
|
|
if (size != sizeof(kver)) {
|
|
|
|
pr_warning("invalid kver section in %s\n", obj->path);
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:13:57 +08:00
|
|
|
}
|
|
|
|
memcpy(&kver, data, sizeof(kver));
|
|
|
|
obj->kern_version = kver;
|
2019-05-30 01:36:11 +08:00
|
|
|
pr_debug("kernel version of %s is %x\n", obj->path, obj->kern_version);
|
2015-07-01 10:13:57 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-11-15 12:05:47 +08:00
|
|
|
static int compare_bpf_map(const void *_a, const void *_b)
|
|
|
|
{
|
|
|
|
const struct bpf_map *a = _a;
|
|
|
|
const struct bpf_map *b = _b;
|
2015-11-27 16:47:35 +08:00
|
|
|
|
2019-06-18 03:26:54 +08:00
|
|
|
if (a->sec_idx != b->sec_idx)
|
|
|
|
return a->sec_idx - b->sec_idx;
|
|
|
|
return a->sec_offset - b->sec_offset;
|
2015-07-01 10:13:58 +08:00
|
|
|
}
|
|
|
|
|
2018-11-21 12:55:56 +08:00
|
|
|
static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
|
|
|
|
{
|
|
|
|
if (type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
|
|
|
|
type == BPF_MAP_TYPE_HASH_OF_MAPS)
|
|
|
|
return true;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2019-04-10 05:20:14 +08:00
|
|
|
static int bpf_object_search_section_size(const struct bpf_object *obj,
|
|
|
|
const char *name, size_t *d_size)
|
|
|
|
{
|
|
|
|
const GElf_Ehdr *ep = &obj->efile.ehdr;
|
|
|
|
Elf *elf = obj->efile.elf;
|
|
|
|
Elf_Scn *scn = NULL;
|
|
|
|
int idx = 0;
|
|
|
|
|
|
|
|
while ((scn = elf_nextscn(elf, scn)) != NULL) {
|
|
|
|
const char *sec_name;
|
|
|
|
Elf_Data *data;
|
|
|
|
GElf_Shdr sh;
|
|
|
|
|
|
|
|
idx++;
|
|
|
|
if (gelf_getshdr(scn, &sh) != &sh) {
|
|
|
|
pr_warning("failed to get section(%d) header from %s\n",
|
|
|
|
idx, obj->path);
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
sec_name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
|
|
|
|
if (!sec_name) {
|
|
|
|
pr_warning("failed to get section(%d) name from %s\n",
|
|
|
|
idx, obj->path);
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (strcmp(name, sec_name))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
data = elf_getdata(scn, 0);
|
|
|
|
if (!data) {
|
|
|
|
pr_warning("failed to get section(%d) data from %s(%s)\n",
|
|
|
|
idx, name, obj->path);
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
*d_size = data->d_size;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_object__section_size(const struct bpf_object *obj, const char *name,
|
|
|
|
__u32 *size)
|
|
|
|
{
|
|
|
|
int ret = -ENOENT;
|
|
|
|
size_t d_size;
|
|
|
|
|
|
|
|
*size = 0;
|
|
|
|
if (!name) {
|
|
|
|
return -EINVAL;
|
|
|
|
} else if (!strcmp(name, ".data")) {
|
|
|
|
if (obj->efile.data)
|
|
|
|
*size = obj->efile.data->d_size;
|
|
|
|
} else if (!strcmp(name, ".bss")) {
|
|
|
|
if (obj->efile.bss)
|
|
|
|
*size = obj->efile.bss->d_size;
|
|
|
|
} else if (!strcmp(name, ".rodata")) {
|
|
|
|
if (obj->efile.rodata)
|
|
|
|
*size = obj->efile.rodata->d_size;
|
|
|
|
} else {
|
|
|
|
ret = bpf_object_search_section_size(obj, name, &d_size);
|
|
|
|
if (!ret)
|
|
|
|
*size = d_size;
|
|
|
|
}
|
|
|
|
|
|
|
|
return *size ? 0 : ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
|
|
|
|
__u32 *off)
|
|
|
|
{
|
|
|
|
Elf_Data *symbols = obj->efile.symbols;
|
|
|
|
const char *sname;
|
|
|
|
size_t si;
|
|
|
|
|
|
|
|
if (!name || !off)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
for (si = 0; si < symbols->d_size / sizeof(GElf_Sym); si++) {
|
|
|
|
GElf_Sym sym;
|
|
|
|
|
|
|
|
if (!gelf_getsym(symbols, si, &sym))
|
|
|
|
continue;
|
|
|
|
if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
|
|
|
|
GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
sname = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
|
|
|
|
sym.st_name);
|
|
|
|
if (!sname) {
|
|
|
|
pr_warning("failed to get sym name string for var %s\n",
|
|
|
|
name);
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
if (strcmp(name, sname) == 0) {
|
|
|
|
*off = sym.st_value;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
static struct bpf_map *bpf_object__add_map(struct bpf_object *obj)
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
{
|
2019-06-18 03:26:53 +08:00
|
|
|
struct bpf_map *new_maps;
|
|
|
|
size_t new_cap;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (obj->nr_maps < obj->maps_cap)
|
|
|
|
return &obj->maps[obj->nr_maps++];
|
|
|
|
|
2019-06-26 18:38:37 +08:00
|
|
|
new_cap = max((size_t)4, obj->maps_cap * 3 / 2);
|
2019-06-18 03:26:53 +08:00
|
|
|
new_maps = realloc(obj->maps, new_cap * sizeof(*obj->maps));
|
|
|
|
if (!new_maps) {
|
|
|
|
pr_warning("alloc maps for object failed\n");
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
}
|
|
|
|
|
|
|
|
obj->maps_cap = new_cap;
|
|
|
|
obj->maps = new_maps;
|
|
|
|
|
|
|
|
/* zero out new maps */
|
|
|
|
memset(obj->maps + obj->nr_maps, 0,
|
|
|
|
(obj->maps_cap - obj->nr_maps) * sizeof(*obj->maps));
|
|
|
|
/*
|
|
|
|
* fill all fd with -1 so won't close incorrect fd (fd=0 is stdin)
|
|
|
|
* when failure (zclose won't close negative fd)).
|
|
|
|
*/
|
|
|
|
for (i = obj->nr_maps; i < obj->maps_cap; i++) {
|
|
|
|
obj->maps[i].fd = -1;
|
|
|
|
obj->maps[i].inner_map_fd = -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
return &obj->maps[obj->nr_maps++];
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2019-06-18 03:26:53 +08:00
|
|
|
bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
|
2019-06-18 03:26:54 +08:00
|
|
|
int sec_idx, Elf_Data *data, void **data_buff)
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
{
|
|
|
|
char map_name[BPF_OBJ_NAME_LEN];
|
2019-06-18 03:26:53 +08:00
|
|
|
struct bpf_map_def *def;
|
|
|
|
struct bpf_map *map;
|
|
|
|
|
|
|
|
map = bpf_object__add_map(obj);
|
|
|
|
if (IS_ERR(map))
|
|
|
|
return PTR_ERR(map);
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
|
|
|
|
map->libbpf_type = type;
|
2019-06-18 03:26:54 +08:00
|
|
|
map->sec_idx = sec_idx;
|
|
|
|
map->sec_offset = 0;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
snprintf(map_name, sizeof(map_name), "%.8s%.7s", obj->name,
|
|
|
|
libbpf_type_to_btf_name[type]);
|
|
|
|
map->name = strdup(map_name);
|
|
|
|
if (!map->name) {
|
|
|
|
pr_warning("failed to alloc map name\n");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2019-06-18 03:26:54 +08:00
|
|
|
pr_debug("map '%s' (global data): at sec_idx %d, offset %zu.\n",
|
|
|
|
map_name, map->sec_idx, map->sec_offset);
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
def = &map->def;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
def->type = BPF_MAP_TYPE_ARRAY;
|
|
|
|
def->key_size = sizeof(int);
|
|
|
|
def->value_size = data->d_size;
|
|
|
|
def->max_entries = 1;
|
2019-05-30 01:36:11 +08:00
|
|
|
def->map_flags = type == LIBBPF_MAP_RODATA ? BPF_F_RDONLY_PROG : 0;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
if (data_buff) {
|
|
|
|
*data_buff = malloc(data->d_size);
|
|
|
|
if (!*data_buff) {
|
|
|
|
zfree(&map->name);
|
|
|
|
pr_warning("failed to alloc map content buffer\n");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
memcpy(*data_buff, data->d_buf, data->d_size);
|
|
|
|
}
|
|
|
|
|
2019-04-17 02:47:17 +08:00
|
|
|
pr_debug("map %td is \"%s\"\n", map - obj->maps, map->name);
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
static int bpf_object__init_global_data_maps(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!obj->caps.global_data)
|
|
|
|
return 0;
|
|
|
|
/*
|
|
|
|
* Populate obj->maps with libbpf internal maps.
|
|
|
|
*/
|
|
|
|
if (obj->efile.data_shndx >= 0) {
|
|
|
|
err = bpf_object__init_internal_map(obj, LIBBPF_MAP_DATA,
|
2019-06-18 03:26:54 +08:00
|
|
|
obj->efile.data_shndx,
|
2019-06-18 03:26:53 +08:00
|
|
|
obj->efile.data,
|
|
|
|
&obj->sections.data);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
if (obj->efile.rodata_shndx >= 0) {
|
|
|
|
err = bpf_object__init_internal_map(obj, LIBBPF_MAP_RODATA,
|
2019-06-18 03:26:54 +08:00
|
|
|
obj->efile.rodata_shndx,
|
2019-06-18 03:26:53 +08:00
|
|
|
obj->efile.rodata,
|
|
|
|
&obj->sections.rodata);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
if (obj->efile.bss_shndx >= 0) {
|
|
|
|
err = bpf_object__init_internal_map(obj, LIBBPF_MAP_BSS,
|
2019-06-18 03:26:54 +08:00
|
|
|
obj->efile.bss_shndx,
|
2019-06-18 03:26:53 +08:00
|
|
|
obj->efile.bss, NULL);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
|
2015-11-27 16:47:36 +08:00
|
|
|
{
|
|
|
|
Elf_Data *symbols = obj->efile.symbols;
|
2019-06-18 03:26:53 +08:00
|
|
|
int i, map_def_sz = 0, nr_maps = 0, nr_syms;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
Elf_Data *data = NULL;
|
2019-06-18 03:26:53 +08:00
|
|
|
Elf_Scn *scn;
|
|
|
|
|
|
|
|
if (obj->efile.maps_shndx < 0)
|
|
|
|
return 0;
|
2015-11-27 16:47:36 +08:00
|
|
|
|
2016-11-15 12:05:47 +08:00
|
|
|
if (!symbols)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx);
|
|
|
|
if (scn)
|
|
|
|
data = elf_getdata(scn, NULL);
|
|
|
|
if (!scn || !data) {
|
|
|
|
pr_warning("failed to get Elf_Data from map section %d\n",
|
|
|
|
obj->efile.maps_shndx);
|
|
|
|
return -EINVAL;
|
2016-11-15 12:05:47 +08:00
|
|
|
}
|
2015-11-27 16:47:36 +08:00
|
|
|
|
2016-11-15 12:05:47 +08:00
|
|
|
/*
|
|
|
|
* Count number of maps. Each map has a name.
|
|
|
|
* Array of maps is not supported: only the first element is
|
|
|
|
* considered.
|
|
|
|
*
|
|
|
|
* TODO: Detect array of map and report error.
|
|
|
|
*/
|
2019-06-18 03:26:53 +08:00
|
|
|
nr_syms = symbols->d_size / sizeof(GElf_Sym);
|
|
|
|
for (i = 0; i < nr_syms; i++) {
|
2015-11-27 16:47:36 +08:00
|
|
|
GElf_Sym sym;
|
2016-11-15 12:05:47 +08:00
|
|
|
|
|
|
|
if (!gelf_getsym(symbols, i, &sym))
|
|
|
|
continue;
|
|
|
|
if (sym.st_shndx != obj->efile.maps_shndx)
|
|
|
|
continue;
|
|
|
|
nr_maps++;
|
|
|
|
}
|
2017-10-05 22:41:57 +08:00
|
|
|
/* Assume equally sized map definitions */
|
2019-06-18 03:26:53 +08:00
|
|
|
pr_debug("maps in %s: %d maps in %zd bytes\n",
|
|
|
|
obj->path, nr_maps, data->d_size);
|
|
|
|
|
|
|
|
map_def_sz = data->d_size / nr_maps;
|
|
|
|
if (!data->d_size || (data->d_size % nr_maps) != 0) {
|
|
|
|
pr_warning("unable to determine map definition size "
|
|
|
|
"section %s, %d maps in %zd bytes\n",
|
|
|
|
obj->path, nr_maps, data->d_size);
|
|
|
|
return -EINVAL;
|
2018-11-21 12:55:56 +08:00
|
|
|
}
|
2016-11-15 12:05:47 +08:00
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
/* Fill obj->maps using data in "maps" section. */
|
|
|
|
for (i = 0; i < nr_syms; i++) {
|
2016-11-15 12:05:47 +08:00
|
|
|
GElf_Sym sym;
|
2015-11-27 16:47:36 +08:00
|
|
|
const char *map_name;
|
2016-11-15 12:05:47 +08:00
|
|
|
struct bpf_map_def *def;
|
2019-06-18 03:26:53 +08:00
|
|
|
struct bpf_map *map;
|
2015-11-27 16:47:36 +08:00
|
|
|
|
|
|
|
if (!gelf_getsym(symbols, i, &sym))
|
|
|
|
continue;
|
perf bpf: Check relocation target section
Libbpf should check the target section before doing relocation to ensure
the relocation is correct. If not, a bug in LLVM causes an error. See
[1]. Also, if an incorrect BPF script uses both global variable and
map, global variable whould be treated as map and be relocated without
error.
This patch saves the id of the map section into obj->efile and compare
target section of a relocation symbol against it during relocation.
Previous patch introduces a test case about this problem. After this
patch:
# ~/perf test BPF
37: Test BPF filter :
37.1: Test basic BPF filtering : Ok
37.2: Test BPF prologue generation : Ok
37.3: Test BPF relocation checker : Ok
# perf test -v BPF
...
37.3: Test BPF relocation checker :
...
libbpf: loading object '[bpf_relocation_test]' from buffer
libbpf: section .strtab, size 126, link 0, flags 0, type=3
libbpf: section .text, size 0, link 0, flags 6, type=1
libbpf: section .data, size 0, link 0, flags 3, type=1
libbpf: section .bss, size 0, link 0, flags 3, type=8
libbpf: section func=sys_write, size 104, link 0, flags 6, type=1
libbpf: found program func=sys_write
libbpf: section .relfunc=sys_write, size 16, link 10, flags 0, type=9
libbpf: section maps, size 16, link 0, flags 3, type=1
libbpf: maps in [bpf_relocation_test]: 16 bytes
libbpf: section license, size 4, link 0, flags 3, type=1
libbpf: license of [bpf_relocation_test] is GPL
libbpf: section version, size 4, link 0, flags 3, type=1
libbpf: kernel version of [bpf_relocation_test] is 40400
libbpf: section .symtab, size 144, link 1, flags 0, type=2
libbpf: map 0 is "my_table"
libbpf: collecting relocating info for: 'func=sys_write'
libbpf: Program 'func=sys_write' contains non-map related relo data pointing to section 65522
bpf: failed to load buffer
Compile BPF program failed.
test child finished with 0
---- end ----
Test BPF filter subtest 2: Ok
[1] https://llvm.org/bugs/show_bug.cgi?id=26243
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1453715801-7732-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-25 17:55:49 +08:00
|
|
|
if (sym.st_shndx != obj->efile.maps_shndx)
|
2015-11-27 16:47:36 +08:00
|
|
|
continue;
|
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
map = bpf_object__add_map(obj);
|
|
|
|
if (IS_ERR(map))
|
|
|
|
return PTR_ERR(map);
|
|
|
|
|
|
|
|
map_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
|
2015-11-27 16:47:36 +08:00
|
|
|
sym.st_name);
|
2019-05-30 01:36:06 +08:00
|
|
|
if (!map_name) {
|
|
|
|
pr_warning("failed to get map #%d name sym string for obj %s\n",
|
2019-06-18 03:26:53 +08:00
|
|
|
i, obj->path);
|
2019-05-30 01:36:06 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
|
|
|
}
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
map->libbpf_type = LIBBPF_MAP_UNSPEC;
|
2019-06-18 03:26:54 +08:00
|
|
|
map->sec_idx = sym.st_shndx;
|
|
|
|
map->sec_offset = sym.st_value;
|
|
|
|
pr_debug("map '%s' (legacy): at sec_idx %d, offset %zu.\n",
|
|
|
|
map_name, map->sec_idx, map->sec_offset);
|
2017-10-05 22:41:57 +08:00
|
|
|
if (sym.st_value + map_def_sz > data->d_size) {
|
2016-11-15 12:05:47 +08:00
|
|
|
pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
|
|
|
|
obj->path, map_name);
|
|
|
|
return -EINVAL;
|
2015-11-27 16:47:36 +08:00
|
|
|
}
|
2016-11-15 12:05:47 +08:00
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
map->name = strdup(map_name);
|
|
|
|
if (!map->name) {
|
2015-12-08 10:25:29 +08:00
|
|
|
pr_warning("failed to alloc map name\n");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2019-06-18 03:26:53 +08:00
|
|
|
pr_debug("map %d is \"%s\"\n", i, map->name);
|
2016-11-15 12:05:47 +08:00
|
|
|
def = (struct bpf_map_def *)(data->d_buf + sym.st_value);
|
2017-10-05 22:41:57 +08:00
|
|
|
/*
|
|
|
|
* If the definition of the map in the object file fits in
|
|
|
|
* bpf_map_def, copy it. Any extra fields in our version
|
|
|
|
* of bpf_map_def will default to zero as a result of the
|
|
|
|
* calloc above.
|
|
|
|
*/
|
|
|
|
if (map_def_sz <= sizeof(struct bpf_map_def)) {
|
2019-06-18 03:26:53 +08:00
|
|
|
memcpy(&map->def, def, map_def_sz);
|
2017-10-05 22:41:57 +08:00
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* Here the map structure being read is bigger than what
|
|
|
|
* we expect, truncate if the excess bits are all zero.
|
|
|
|
* If they are not zero, reject this map as
|
|
|
|
* incompatible.
|
|
|
|
*/
|
|
|
|
char *b;
|
|
|
|
for (b = ((char *)def) + sizeof(struct bpf_map_def);
|
|
|
|
b < ((char *)def) + map_def_sz; b++) {
|
|
|
|
if (*b != 0) {
|
|
|
|
pr_warning("maps section in %s: \"%s\" "
|
|
|
|
"has unrecognized, non-zero "
|
|
|
|
"options\n",
|
|
|
|
obj->path, map_name);
|
2018-10-16 02:19:55 +08:00
|
|
|
if (strict)
|
|
|
|
return -EINVAL;
|
2017-10-05 22:41:57 +08:00
|
|
|
}
|
|
|
|
}
|
2019-06-18 03:26:53 +08:00
|
|
|
memcpy(&map->def, def, sizeof(struct bpf_map_def));
|
2017-10-05 22:41:57 +08:00
|
|
|
}
|
2015-11-27 16:47:36 +08:00
|
|
|
}
|
2019-06-18 03:26:53 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2016-11-15 12:05:47 +08:00
|
|
|
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
|
|
|
|
__u32 id)
|
|
|
|
{
|
|
|
|
const struct btf_type *t = btf__type_by_id(btf, id);
|
2019-04-24 06:45:56 +08:00
|
|
|
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
while (true) {
|
|
|
|
switch (BTF_INFO_KIND(t->info)) {
|
|
|
|
case BTF_KIND_VOLATILE:
|
|
|
|
case BTF_KIND_CONST:
|
|
|
|
case BTF_KIND_RESTRICT:
|
|
|
|
case BTF_KIND_TYPEDEF:
|
|
|
|
t = btf__type_by_id(btf, t->type);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return t;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool get_map_field_int(const char *map_name,
|
|
|
|
const struct btf *btf,
|
|
|
|
const struct btf_type *def,
|
|
|
|
const struct btf_member *m,
|
|
|
|
const void *data, __u32 *res) {
|
|
|
|
const struct btf_type *t = skip_mods_and_typedefs(btf, m->type);
|
|
|
|
const char *name = btf__name_by_offset(btf, m->name_off);
|
|
|
|
__u32 int_info = *(const __u32 *)(const void *)(t + 1);
|
|
|
|
|
|
|
|
if (BTF_INFO_KIND(t->info) != BTF_KIND_INT) {
|
|
|
|
pr_warning("map '%s': attr '%s': expected INT, got %u.\n",
|
|
|
|
map_name, name, BTF_INFO_KIND(t->info));
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
if (t->size != 4 || BTF_INT_BITS(int_info) != 32 ||
|
|
|
|
BTF_INT_OFFSET(int_info)) {
|
|
|
|
pr_warning("map '%s': attr '%s': expected 32-bit non-bitfield integer, "
|
|
|
|
"got %u-byte (%d-bit) one with bit offset %d.\n",
|
|
|
|
map_name, name, t->size, BTF_INT_BITS(int_info),
|
|
|
|
BTF_INT_OFFSET(int_info));
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
if (BTF_INFO_KFLAG(def->info) && BTF_MEMBER_BITFIELD_SIZE(m->offset)) {
|
|
|
|
pr_warning("map '%s': attr '%s': bitfield is not supported.\n",
|
|
|
|
map_name, name);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
if (m->offset % 32) {
|
|
|
|
pr_warning("map '%s': attr '%s': unaligned fields are not supported.\n",
|
|
|
|
map_name, name);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
*res = *(const __u32 *)(data + m->offset / 8);
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int bpf_object__init_user_btf_map(struct bpf_object *obj,
|
|
|
|
const struct btf_type *sec,
|
|
|
|
int var_idx, int sec_idx,
|
|
|
|
const Elf_Data *data, bool strict)
|
|
|
|
{
|
|
|
|
const struct btf_type *var, *def, *t;
|
|
|
|
const struct btf_var_secinfo *vi;
|
|
|
|
const struct btf_var *var_extra;
|
|
|
|
const struct btf_member *m;
|
|
|
|
const void *def_data;
|
|
|
|
const char *map_name;
|
|
|
|
struct bpf_map *map;
|
|
|
|
int vlen, i;
|
|
|
|
|
|
|
|
vi = (const struct btf_var_secinfo *)(const void *)(sec + 1) + var_idx;
|
|
|
|
var = btf__type_by_id(obj->btf, vi->type);
|
|
|
|
var_extra = (const void *)(var + 1);
|
|
|
|
map_name = btf__name_by_offset(obj->btf, var->name_off);
|
|
|
|
vlen = BTF_INFO_VLEN(var->info);
|
|
|
|
|
|
|
|
if (map_name == NULL || map_name[0] == '\0') {
|
|
|
|
pr_warning("map #%d: empty name.\n", var_idx);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if ((__u64)vi->offset + vi->size > data->d_size) {
|
|
|
|
pr_warning("map '%s' BTF data is corrupted.\n", map_name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if (BTF_INFO_KIND(var->info) != BTF_KIND_VAR) {
|
|
|
|
pr_warning("map '%s': unexpected var kind %u.\n",
|
|
|
|
map_name, BTF_INFO_KIND(var->info));
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if (var_extra->linkage != BTF_VAR_GLOBAL_ALLOCATED &&
|
|
|
|
var_extra->linkage != BTF_VAR_STATIC) {
|
|
|
|
pr_warning("map '%s': unsupported var linkage %u.\n",
|
|
|
|
map_name, var_extra->linkage);
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
def = skip_mods_and_typedefs(obj->btf, var->type);
|
|
|
|
if (BTF_INFO_KIND(def->info) != BTF_KIND_STRUCT) {
|
|
|
|
pr_warning("map '%s': unexpected def kind %u.\n",
|
|
|
|
map_name, BTF_INFO_KIND(var->info));
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if (def->size > vi->size) {
|
|
|
|
pr_warning("map '%s': invalid def size.\n", map_name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
map = bpf_object__add_map(obj);
|
|
|
|
if (IS_ERR(map))
|
|
|
|
return PTR_ERR(map);
|
|
|
|
map->name = strdup(map_name);
|
|
|
|
if (!map->name) {
|
|
|
|
pr_warning("map '%s': failed to alloc map name.\n", map_name);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
map->libbpf_type = LIBBPF_MAP_UNSPEC;
|
|
|
|
map->def.type = BPF_MAP_TYPE_UNSPEC;
|
|
|
|
map->sec_idx = sec_idx;
|
|
|
|
map->sec_offset = vi->offset;
|
|
|
|
pr_debug("map '%s': at sec_idx %d, offset %zu.\n",
|
|
|
|
map_name, map->sec_idx, map->sec_offset);
|
|
|
|
|
|
|
|
def_data = data->d_buf + vi->offset;
|
|
|
|
vlen = BTF_INFO_VLEN(def->info);
|
|
|
|
m = (const void *)(def + 1);
|
|
|
|
for (i = 0; i < vlen; i++, m++) {
|
|
|
|
const char *name = btf__name_by_offset(obj->btf, m->name_off);
|
|
|
|
|
|
|
|
if (!name) {
|
|
|
|
pr_warning("map '%s': invalid field #%d.\n",
|
|
|
|
map_name, i);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if (strcmp(name, "type") == 0) {
|
|
|
|
if (!get_map_field_int(map_name, obj->btf, def, m,
|
|
|
|
def_data, &map->def.type))
|
|
|
|
return -EINVAL;
|
|
|
|
pr_debug("map '%s': found type = %u.\n",
|
|
|
|
map_name, map->def.type);
|
|
|
|
} else if (strcmp(name, "max_entries") == 0) {
|
|
|
|
if (!get_map_field_int(map_name, obj->btf, def, m,
|
|
|
|
def_data, &map->def.max_entries))
|
|
|
|
return -EINVAL;
|
|
|
|
pr_debug("map '%s': found max_entries = %u.\n",
|
|
|
|
map_name, map->def.max_entries);
|
|
|
|
} else if (strcmp(name, "map_flags") == 0) {
|
|
|
|
if (!get_map_field_int(map_name, obj->btf, def, m,
|
|
|
|
def_data, &map->def.map_flags))
|
|
|
|
return -EINVAL;
|
|
|
|
pr_debug("map '%s': found map_flags = %u.\n",
|
|
|
|
map_name, map->def.map_flags);
|
|
|
|
} else if (strcmp(name, "key_size") == 0) {
|
|
|
|
__u32 sz;
|
|
|
|
|
|
|
|
if (!get_map_field_int(map_name, obj->btf, def, m,
|
|
|
|
def_data, &sz))
|
|
|
|
return -EINVAL;
|
|
|
|
pr_debug("map '%s': found key_size = %u.\n",
|
|
|
|
map_name, sz);
|
|
|
|
if (map->def.key_size && map->def.key_size != sz) {
|
2019-06-20 00:27:42 +08:00
|
|
|
pr_warning("map '%s': conflicting key size %u != %u.\n",
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
map_name, map->def.key_size, sz);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
map->def.key_size = sz;
|
|
|
|
} else if (strcmp(name, "key") == 0) {
|
|
|
|
__s64 sz;
|
|
|
|
|
|
|
|
t = btf__type_by_id(obj->btf, m->type);
|
|
|
|
if (!t) {
|
|
|
|
pr_warning("map '%s': key type [%d] not found.\n",
|
|
|
|
map_name, m->type);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if (BTF_INFO_KIND(t->info) != BTF_KIND_PTR) {
|
|
|
|
pr_warning("map '%s': key spec is not PTR: %u.\n",
|
|
|
|
map_name, BTF_INFO_KIND(t->info));
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
sz = btf__resolve_size(obj->btf, t->type);
|
|
|
|
if (sz < 0) {
|
|
|
|
pr_warning("map '%s': can't determine key size for type [%u]: %lld.\n",
|
|
|
|
map_name, t->type, sz);
|
|
|
|
return sz;
|
|
|
|
}
|
|
|
|
pr_debug("map '%s': found key [%u], sz = %lld.\n",
|
|
|
|
map_name, t->type, sz);
|
|
|
|
if (map->def.key_size && map->def.key_size != sz) {
|
2019-06-20 00:27:42 +08:00
|
|
|
pr_warning("map '%s': conflicting key size %u != %lld.\n",
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
map_name, map->def.key_size, sz);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
map->def.key_size = sz;
|
|
|
|
map->btf_key_type_id = t->type;
|
|
|
|
} else if (strcmp(name, "value_size") == 0) {
|
|
|
|
__u32 sz;
|
|
|
|
|
|
|
|
if (!get_map_field_int(map_name, obj->btf, def, m,
|
|
|
|
def_data, &sz))
|
|
|
|
return -EINVAL;
|
|
|
|
pr_debug("map '%s': found value_size = %u.\n",
|
|
|
|
map_name, sz);
|
|
|
|
if (map->def.value_size && map->def.value_size != sz) {
|
2019-06-20 00:27:42 +08:00
|
|
|
pr_warning("map '%s': conflicting value size %u != %u.\n",
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
map_name, map->def.value_size, sz);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
map->def.value_size = sz;
|
|
|
|
} else if (strcmp(name, "value") == 0) {
|
|
|
|
__s64 sz;
|
|
|
|
|
|
|
|
t = btf__type_by_id(obj->btf, m->type);
|
|
|
|
if (!t) {
|
|
|
|
pr_warning("map '%s': value type [%d] not found.\n",
|
|
|
|
map_name, m->type);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if (BTF_INFO_KIND(t->info) != BTF_KIND_PTR) {
|
|
|
|
pr_warning("map '%s': value spec is not PTR: %u.\n",
|
|
|
|
map_name, BTF_INFO_KIND(t->info));
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
sz = btf__resolve_size(obj->btf, t->type);
|
|
|
|
if (sz < 0) {
|
|
|
|
pr_warning("map '%s': can't determine value size for type [%u]: %lld.\n",
|
|
|
|
map_name, t->type, sz);
|
|
|
|
return sz;
|
|
|
|
}
|
|
|
|
pr_debug("map '%s': found value [%u], sz = %lld.\n",
|
|
|
|
map_name, t->type, sz);
|
|
|
|
if (map->def.value_size && map->def.value_size != sz) {
|
2019-06-20 00:27:42 +08:00
|
|
|
pr_warning("map '%s': conflicting value size %u != %lld.\n",
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
map_name, map->def.value_size, sz);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
map->def.value_size = sz;
|
|
|
|
map->btf_value_type_id = t->type;
|
|
|
|
} else {
|
|
|
|
if (strict) {
|
|
|
|
pr_warning("map '%s': unknown field '%s'.\n",
|
|
|
|
map_name, name);
|
|
|
|
return -ENOTSUP;
|
|
|
|
}
|
|
|
|
pr_debug("map '%s': ignoring unknown field '%s'.\n",
|
|
|
|
map_name, name);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (map->def.type == BPF_MAP_TYPE_UNSPEC) {
|
|
|
|
pr_warning("map '%s': map type isn't specified.\n", map_name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict)
|
|
|
|
{
|
|
|
|
const struct btf_type *sec = NULL;
|
|
|
|
int nr_types, i, vlen, err;
|
|
|
|
const struct btf_type *t;
|
|
|
|
const char *name;
|
|
|
|
Elf_Data *data;
|
|
|
|
Elf_Scn *scn;
|
|
|
|
|
|
|
|
if (obj->efile.btf_maps_shndx < 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
scn = elf_getscn(obj->efile.elf, obj->efile.btf_maps_shndx);
|
|
|
|
if (scn)
|
|
|
|
data = elf_getdata(scn, NULL);
|
|
|
|
if (!scn || !data) {
|
|
|
|
pr_warning("failed to get Elf_Data from map section %d (%s)\n",
|
|
|
|
obj->efile.maps_shndx, MAPS_ELF_SEC);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
nr_types = btf__get_nr_types(obj->btf);
|
|
|
|
for (i = 1; i <= nr_types; i++) {
|
|
|
|
t = btf__type_by_id(obj->btf, i);
|
|
|
|
if (BTF_INFO_KIND(t->info) != BTF_KIND_DATASEC)
|
|
|
|
continue;
|
|
|
|
name = btf__name_by_offset(obj->btf, t->name_off);
|
|
|
|
if (strcmp(name, MAPS_ELF_SEC) == 0) {
|
|
|
|
sec = t;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!sec) {
|
|
|
|
pr_warning("DATASEC '%s' not found.\n", MAPS_ELF_SEC);
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
|
|
|
|
vlen = BTF_INFO_VLEN(sec->info);
|
|
|
|
for (i = 0; i < vlen; i++) {
|
|
|
|
err = bpf_object__init_user_btf_map(obj, sec, i,
|
|
|
|
obj->efile.btf_maps_shndx,
|
|
|
|
data, strict);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
static int bpf_object__init_maps(struct bpf_object *obj, int flags)
|
|
|
|
{
|
|
|
|
bool strict = !(flags & MAPS_RELAX_COMPAT);
|
|
|
|
int err;
|
2019-04-24 06:45:56 +08:00
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
err = bpf_object__init_user_maps(obj, strict);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
err = bpf_object__init_user_btf_maps(obj, strict);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2019-06-18 03:26:53 +08:00
|
|
|
err = bpf_object__init_global_data_maps(obj);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (obj->nr_maps) {
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]),
|
|
|
|
compare_bpf_map);
|
2019-06-18 03:26:53 +08:00
|
|
|
}
|
|
|
|
return 0;
|
2015-11-27 16:47:36 +08:00
|
|
|
}
|
|
|
|
|
2018-02-08 19:48:32 +08:00
|
|
|
static bool section_have_execinstr(struct bpf_object *obj, int idx)
|
|
|
|
{
|
|
|
|
Elf_Scn *scn;
|
|
|
|
GElf_Shdr sh;
|
|
|
|
|
|
|
|
scn = elf_getscn(obj->efile.elf, idx);
|
|
|
|
if (!scn)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (gelf_getshdr(scn, &sh) != &sh)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (sh.sh_flags & SHF_EXECINSTR)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
static void bpf_object__sanitize_btf(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
bool has_datasec = obj->caps.btf_datasec;
|
|
|
|
bool has_func = obj->caps.btf_func;
|
|
|
|
struct btf *btf = obj->btf;
|
|
|
|
struct btf_type *t;
|
|
|
|
int i, j, vlen;
|
|
|
|
__u16 kind;
|
|
|
|
|
|
|
|
if (!obj->btf || (has_func && has_datasec))
|
|
|
|
return;
|
|
|
|
|
|
|
|
for (i = 1; i <= btf__get_nr_types(btf); i++) {
|
|
|
|
t = (struct btf_type *)btf__type_by_id(btf, i);
|
|
|
|
kind = BTF_INFO_KIND(t->info);
|
|
|
|
|
|
|
|
if (!has_datasec && kind == BTF_KIND_VAR) {
|
|
|
|
/* replace VAR with INT */
|
|
|
|
t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
|
|
|
|
t->size = sizeof(int);
|
|
|
|
*(int *)(t+1) = BTF_INT_ENC(0, 0, 32);
|
|
|
|
} else if (!has_datasec && kind == BTF_KIND_DATASEC) {
|
|
|
|
/* replace DATASEC with STRUCT */
|
|
|
|
struct btf_var_secinfo *v = (void *)(t + 1);
|
|
|
|
struct btf_member *m = (void *)(t + 1);
|
|
|
|
struct btf_type *vt;
|
|
|
|
char *name;
|
|
|
|
|
|
|
|
name = (char *)btf__name_by_offset(btf, t->name_off);
|
|
|
|
while (*name) {
|
|
|
|
if (*name == '.')
|
|
|
|
*name = '_';
|
|
|
|
name++;
|
|
|
|
}
|
|
|
|
|
|
|
|
vlen = BTF_INFO_VLEN(t->info);
|
|
|
|
t->info = BTF_INFO_ENC(BTF_KIND_STRUCT, 0, vlen);
|
|
|
|
for (j = 0; j < vlen; j++, v++, m++) {
|
|
|
|
/* order of field assignments is important */
|
|
|
|
m->offset = v->offset * 8;
|
|
|
|
m->type = v->type;
|
|
|
|
/* preserve variable name as member name */
|
|
|
|
vt = (void *)btf__type_by_id(btf, v->type);
|
|
|
|
m->name_off = vt->name_off;
|
|
|
|
}
|
|
|
|
} else if (!has_func && kind == BTF_KIND_FUNC_PROTO) {
|
|
|
|
/* replace FUNC_PROTO with ENUM */
|
|
|
|
vlen = BTF_INFO_VLEN(t->info);
|
|
|
|
t->info = BTF_INFO_ENC(BTF_KIND_ENUM, 0, vlen);
|
|
|
|
t->size = sizeof(__u32); /* kernel enforced */
|
|
|
|
} else if (!has_func && kind == BTF_KIND_FUNC) {
|
|
|
|
/* replace FUNC with TYPEDEF */
|
|
|
|
t->info = BTF_INFO_ENC(BTF_KIND_TYPEDEF, 0, 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bpf_object__sanitize_btf_ext(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
if (!obj->btf_ext)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (!obj->caps.btf_func) {
|
|
|
|
btf_ext__free(obj->btf_ext);
|
|
|
|
obj->btf_ext = NULL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
static bool bpf_object__is_btf_mandatory(const struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
return obj->efile.btf_maps_shndx >= 0;
|
|
|
|
}
|
|
|
|
|
2019-06-18 03:26:55 +08:00
|
|
|
static int bpf_object__init_btf(struct bpf_object *obj,
|
2019-06-18 03:26:51 +08:00
|
|
|
Elf_Data *btf_data,
|
|
|
|
Elf_Data *btf_ext_data)
|
|
|
|
{
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
bool btf_required = bpf_object__is_btf_mandatory(obj);
|
2019-06-18 03:26:51 +08:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
if (btf_data) {
|
|
|
|
obj->btf = btf__new(btf_data->d_buf, btf_data->d_size);
|
|
|
|
if (IS_ERR(obj->btf)) {
|
|
|
|
pr_warning("Error loading ELF section %s: %d.\n",
|
|
|
|
BTF_ELF_SEC, err);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
err = btf__finalize_data(obj, obj->btf);
|
|
|
|
if (err) {
|
|
|
|
pr_warning("Error finalizing %s: %d.\n",
|
|
|
|
BTF_ELF_SEC, err);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (btf_ext_data) {
|
|
|
|
if (!obj->btf) {
|
|
|
|
pr_debug("Ignore ELF section %s because its depending ELF section %s is not found.\n",
|
|
|
|
BTF_EXT_ELF_SEC, BTF_ELF_SEC);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
obj->btf_ext = btf_ext__new(btf_ext_data->d_buf,
|
|
|
|
btf_ext_data->d_size);
|
|
|
|
if (IS_ERR(obj->btf_ext)) {
|
|
|
|
pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
|
|
|
|
BTF_EXT_ELF_SEC, PTR_ERR(obj->btf_ext));
|
|
|
|
obj->btf_ext = NULL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
out:
|
|
|
|
if (err || IS_ERR(obj->btf)) {
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
if (btf_required)
|
|
|
|
err = err ? : PTR_ERR(obj->btf);
|
|
|
|
else
|
|
|
|
err = 0;
|
2019-06-18 03:26:51 +08:00
|
|
|
if (!IS_ERR_OR_NULL(obj->btf))
|
|
|
|
btf__free(obj->btf);
|
|
|
|
obj->btf = NULL;
|
|
|
|
}
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
if (btf_required && !obj->btf) {
|
|
|
|
pr_warning("BTF is required, but is missing or corrupted.\n");
|
|
|
|
return err == 0 ? -ENOENT : err;
|
|
|
|
}
|
2019-06-18 03:26:51 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-06-18 03:26:55 +08:00
|
|
|
static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
if (!obj->btf)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
bpf_object__sanitize_btf(obj);
|
|
|
|
bpf_object__sanitize_btf_ext(obj);
|
|
|
|
|
|
|
|
err = btf__load(obj->btf);
|
|
|
|
if (err) {
|
|
|
|
pr_warning("Error loading %s into kernel: %d.\n",
|
|
|
|
BTF_ELF_SEC, err);
|
|
|
|
btf__free(obj->btf);
|
|
|
|
obj->btf = NULL;
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
if (bpf_object__is_btf_mandatory(obj))
|
|
|
|
return err;
|
2019-06-18 03:26:55 +08:00
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-10-16 02:19:55 +08:00
|
|
|
static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
|
2015-07-01 10:13:56 +08:00
|
|
|
{
|
|
|
|
Elf *elf = obj->efile.elf;
|
|
|
|
GElf_Ehdr *ep = &obj->efile.ehdr;
|
2018-12-08 08:42:29 +08:00
|
|
|
Elf_Data *btf_ext_data = NULL;
|
2019-04-10 05:20:14 +08:00
|
|
|
Elf_Data *btf_data = NULL;
|
2015-07-01 10:13:56 +08:00
|
|
|
Elf_Scn *scn = NULL;
|
perf bpf: Check relocation target section
Libbpf should check the target section before doing relocation to ensure
the relocation is correct. If not, a bug in LLVM causes an error. See
[1]. Also, if an incorrect BPF script uses both global variable and
map, global variable whould be treated as map and be relocated without
error.
This patch saves the id of the map section into obj->efile and compare
target section of a relocation symbol against it during relocation.
Previous patch introduces a test case about this problem. After this
patch:
# ~/perf test BPF
37: Test BPF filter :
37.1: Test basic BPF filtering : Ok
37.2: Test BPF prologue generation : Ok
37.3: Test BPF relocation checker : Ok
# perf test -v BPF
...
37.3: Test BPF relocation checker :
...
libbpf: loading object '[bpf_relocation_test]' from buffer
libbpf: section .strtab, size 126, link 0, flags 0, type=3
libbpf: section .text, size 0, link 0, flags 6, type=1
libbpf: section .data, size 0, link 0, flags 3, type=1
libbpf: section .bss, size 0, link 0, flags 3, type=8
libbpf: section func=sys_write, size 104, link 0, flags 6, type=1
libbpf: found program func=sys_write
libbpf: section .relfunc=sys_write, size 16, link 10, flags 0, type=9
libbpf: section maps, size 16, link 0, flags 3, type=1
libbpf: maps in [bpf_relocation_test]: 16 bytes
libbpf: section license, size 4, link 0, flags 3, type=1
libbpf: license of [bpf_relocation_test] is GPL
libbpf: section version, size 4, link 0, flags 3, type=1
libbpf: kernel version of [bpf_relocation_test] is 40400
libbpf: section .symtab, size 144, link 1, flags 0, type=2
libbpf: map 0 is "my_table"
libbpf: collecting relocating info for: 'func=sys_write'
libbpf: Program 'func=sys_write' contains non-map related relo data pointing to section 65522
bpf: failed to load buffer
Compile BPF program failed.
test child finished with 0
---- end ----
Test BPF filter subtest 2: Ok
[1] https://llvm.org/bugs/show_bug.cgi?id=26243
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1453715801-7732-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-25 17:55:49 +08:00
|
|
|
int idx = 0, err = 0;
|
2015-07-01 10:13:56 +08:00
|
|
|
|
|
|
|
/* Elf is corrupted/truncated, avoid calling elf_strptr. */
|
|
|
|
if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL)) {
|
2019-05-30 01:36:11 +08:00
|
|
|
pr_warning("failed to get e_shstrndx from %s\n", obj->path);
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:13:56 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
while ((scn = elf_nextscn(elf, scn)) != NULL) {
|
|
|
|
char *name;
|
|
|
|
GElf_Shdr sh;
|
|
|
|
Elf_Data *data;
|
|
|
|
|
|
|
|
idx++;
|
|
|
|
if (gelf_getshdr(scn, &sh) != &sh) {
|
2018-02-08 19:48:17 +08:00
|
|
|
pr_warning("failed to get section(%d) header from %s\n",
|
|
|
|
idx, obj->path);
|
2019-06-18 03:26:52 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:13:56 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
|
|
|
|
if (!name) {
|
2018-02-08 19:48:17 +08:00
|
|
|
pr_warning("failed to get section(%d) name from %s\n",
|
|
|
|
idx, obj->path);
|
2019-06-18 03:26:52 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:13:56 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
data = elf_getdata(scn, 0);
|
|
|
|
if (!data) {
|
2018-02-08 19:48:17 +08:00
|
|
|
pr_warning("failed to get section(%d) data from %s(%s)\n",
|
|
|
|
idx, name, obj->path);
|
2019-06-18 03:26:52 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:13:56 +08:00
|
|
|
}
|
2018-02-08 19:48:17 +08:00
|
|
|
pr_debug("section(%d) %s, size %ld, link %d, flags %lx, type=%d\n",
|
|
|
|
idx, name, (unsigned long)data->d_size,
|
2015-07-01 10:13:56 +08:00
|
|
|
(int)sh.sh_link, (unsigned long)sh.sh_flags,
|
|
|
|
(int)sh.sh_type);
|
2015-07-01 10:13:57 +08:00
|
|
|
|
2019-04-10 05:20:14 +08:00
|
|
|
if (strcmp(name, "license") == 0) {
|
2015-07-01 10:13:57 +08:00
|
|
|
err = bpf_object__init_license(obj,
|
|
|
|
data->d_buf,
|
|
|
|
data->d_size);
|
2019-06-18 03:26:52 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
2019-04-10 05:20:14 +08:00
|
|
|
} else if (strcmp(name, "version") == 0) {
|
2015-07-01 10:13:57 +08:00
|
|
|
err = bpf_object__init_kversion(obj,
|
|
|
|
data->d_buf,
|
|
|
|
data->d_size);
|
2019-06-18 03:26:52 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
2019-04-10 05:20:14 +08:00
|
|
|
} else if (strcmp(name, "maps") == 0) {
|
perf bpf: Check relocation target section
Libbpf should check the target section before doing relocation to ensure
the relocation is correct. If not, a bug in LLVM causes an error. See
[1]. Also, if an incorrect BPF script uses both global variable and
map, global variable whould be treated as map and be relocated without
error.
This patch saves the id of the map section into obj->efile and compare
target section of a relocation symbol against it during relocation.
Previous patch introduces a test case about this problem. After this
patch:
# ~/perf test BPF
37: Test BPF filter :
37.1: Test basic BPF filtering : Ok
37.2: Test BPF prologue generation : Ok
37.3: Test BPF relocation checker : Ok
# perf test -v BPF
...
37.3: Test BPF relocation checker :
...
libbpf: loading object '[bpf_relocation_test]' from buffer
libbpf: section .strtab, size 126, link 0, flags 0, type=3
libbpf: section .text, size 0, link 0, flags 6, type=1
libbpf: section .data, size 0, link 0, flags 3, type=1
libbpf: section .bss, size 0, link 0, flags 3, type=8
libbpf: section func=sys_write, size 104, link 0, flags 6, type=1
libbpf: found program func=sys_write
libbpf: section .relfunc=sys_write, size 16, link 10, flags 0, type=9
libbpf: section maps, size 16, link 0, flags 3, type=1
libbpf: maps in [bpf_relocation_test]: 16 bytes
libbpf: section license, size 4, link 0, flags 3, type=1
libbpf: license of [bpf_relocation_test] is GPL
libbpf: section version, size 4, link 0, flags 3, type=1
libbpf: kernel version of [bpf_relocation_test] is 40400
libbpf: section .symtab, size 144, link 1, flags 0, type=2
libbpf: map 0 is "my_table"
libbpf: collecting relocating info for: 'func=sys_write'
libbpf: Program 'func=sys_write' contains non-map related relo data pointing to section 65522
bpf: failed to load buffer
Compile BPF program failed.
test child finished with 0
---- end ----
Test BPF filter subtest 2: Ok
[1] https://llvm.org/bugs/show_bug.cgi?id=26243
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1453715801-7732-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-25 17:55:49 +08:00
|
|
|
obj->efile.maps_shndx = idx;
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
} else if (strcmp(name, MAPS_ELF_SEC) == 0) {
|
|
|
|
obj->efile.btf_maps_shndx = idx;
|
2019-04-10 05:20:14 +08:00
|
|
|
} else if (strcmp(name, BTF_ELF_SEC) == 0) {
|
|
|
|
btf_data = data;
|
2018-11-20 07:29:16 +08:00
|
|
|
} else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) {
|
2018-12-08 08:42:29 +08:00
|
|
|
btf_ext_data = data;
|
2018-04-19 06:56:05 +08:00
|
|
|
} else if (sh.sh_type == SHT_SYMTAB) {
|
bpf tools: Collect symbol table from SHT_SYMTAB section
This patch collects symbols section. This section is useful when linking
BPF maps.
What 'bpf_map_xxx()' functions actually require are map's file
descriptors (and the internal verifier converts fds into pointers to
'struct bpf_map'), which we don't know when compiling. Therefore, we
should make compiler generate a 'ldr_64 r1, <imm>' instruction, and
fill the 'imm' field with the actual file descriptor when loading in
libbpf.
BPF programs should be written in this way:
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(unsigned long),
.value_size = sizeof(unsigned long),
.max_entries = 1000000,
};
SEC("my_func=sys_write")
int my_func(void *ctx)
{
...
bpf_map_update_elem(&my_map, &key, &value, BPF_ANY);
...
}
Compiler should convert '&my_map' into a 'ldr_64, r1, <imm>'
instruction, where imm should be the address of 'my_map'. According to
the address, libbpf knows which map it actually referenced, and then
fills the imm field with the 'fd' of that map created by it.
However, since we never really 'link' the object file, the imm field is
only a record in relocation section. Therefore libbpf should do the
relocation:
1. In relocation section (type == SHT_REL), positions of each such
'ldr_64' instruction are recorded with a reference of an entry in
symbol table (SHT_SYMTAB);
2. From records in symbol table we can find the indics of map
variables.
Libbpf first record SHT_SYMTAB and positions of each instruction which
required bu such operation. Then create file descriptor. Finally, after
map creation complete, replace the imm field.
This is the first patch of BPF map related stuff. It records SHT_SYMTAB
into object's efile field for further use.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:59 +08:00
|
|
|
if (obj->efile.symbols) {
|
|
|
|
pr_warning("bpf: multiple SYMTAB in %s\n",
|
|
|
|
obj->path);
|
2019-06-18 03:26:52 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-12-08 10:25:30 +08:00
|
|
|
}
|
2019-06-18 03:26:52 +08:00
|
|
|
obj->efile.symbols = data;
|
|
|
|
obj->efile.strtabidx = sh.sh_link;
|
2019-04-10 05:20:12 +08:00
|
|
|
} else if (sh.sh_type == SHT_PROGBITS && data->d_size > 0) {
|
|
|
|
if (sh.sh_flags & SHF_EXECINSTR) {
|
|
|
|
if (strcmp(name, ".text") == 0)
|
|
|
|
obj->efile.text_shndx = idx;
|
|
|
|
err = bpf_object__add_program(obj, data->d_buf,
|
|
|
|
data->d_size, name, idx);
|
|
|
|
if (err) {
|
|
|
|
char errmsg[STRERR_BUFSIZE];
|
|
|
|
char *cp = libbpf_strerror_r(-err, errmsg,
|
|
|
|
sizeof(errmsg));
|
|
|
|
|
|
|
|
pr_warning("failed to alloc program %s (%s): %s",
|
|
|
|
name, obj->path, cp);
|
2019-06-18 03:26:52 +08:00
|
|
|
return err;
|
2019-04-10 05:20:12 +08:00
|
|
|
}
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
} else if (strcmp(name, ".data") == 0) {
|
|
|
|
obj->efile.data = data;
|
|
|
|
obj->efile.data_shndx = idx;
|
|
|
|
} else if (strcmp(name, ".rodata") == 0) {
|
|
|
|
obj->efile.rodata = data;
|
|
|
|
obj->efile.rodata_shndx = idx;
|
|
|
|
} else {
|
|
|
|
pr_debug("skip section(%d) %s\n", idx, name);
|
2015-07-01 10:14:00 +08:00
|
|
|
}
|
2015-07-01 10:14:01 +08:00
|
|
|
} else if (sh.sh_type == SHT_REL) {
|
2019-06-18 03:26:52 +08:00
|
|
|
int nr_reloc = obj->efile.nr_reloc;
|
2015-07-01 10:14:01 +08:00
|
|
|
void *reloc = obj->efile.reloc;
|
2018-02-08 19:48:32 +08:00
|
|
|
int sec = sh.sh_info; /* points to other section */
|
|
|
|
|
|
|
|
/* Only do relo for section with exec instructions */
|
|
|
|
if (!section_have_execinstr(obj, sec)) {
|
|
|
|
pr_debug("skip relo %s(%d) for section(%d)\n",
|
|
|
|
name, idx, sec);
|
|
|
|
continue;
|
|
|
|
}
|
2015-07-01 10:14:01 +08:00
|
|
|
|
2019-06-18 03:26:52 +08:00
|
|
|
reloc = reallocarray(reloc, nr_reloc + 1,
|
2018-07-11 05:43:05 +08:00
|
|
|
sizeof(*obj->efile.reloc));
|
2015-07-01 10:14:01 +08:00
|
|
|
if (!reloc) {
|
|
|
|
pr_warning("realloc failed\n");
|
2019-06-18 03:26:52 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2015-07-01 10:14:01 +08:00
|
|
|
|
2019-06-18 03:26:52 +08:00
|
|
|
obj->efile.reloc = reloc;
|
|
|
|
obj->efile.nr_reloc++;
|
2015-07-01 10:14:01 +08:00
|
|
|
|
2019-06-18 03:26:52 +08:00
|
|
|
obj->efile.reloc[nr_reloc].shdr = sh;
|
|
|
|
obj->efile.reloc[nr_reloc].data = data;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
} else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
|
|
|
|
obj->efile.bss = data;
|
|
|
|
obj->efile.bss_shndx = idx;
|
2018-02-08 19:48:17 +08:00
|
|
|
} else {
|
|
|
|
pr_debug("skip section(%d) %s\n", idx, name);
|
bpf tools: Collect symbol table from SHT_SYMTAB section
This patch collects symbols section. This section is useful when linking
BPF maps.
What 'bpf_map_xxx()' functions actually require are map's file
descriptors (and the internal verifier converts fds into pointers to
'struct bpf_map'), which we don't know when compiling. Therefore, we
should make compiler generate a 'ldr_64 r1, <imm>' instruction, and
fill the 'imm' field with the actual file descriptor when loading in
libbpf.
BPF programs should be written in this way:
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(unsigned long),
.value_size = sizeof(unsigned long),
.max_entries = 1000000,
};
SEC("my_func=sys_write")
int my_func(void *ctx)
{
...
bpf_map_update_elem(&my_map, &key, &value, BPF_ANY);
...
}
Compiler should convert '&my_map' into a 'ldr_64, r1, <imm>'
instruction, where imm should be the address of 'my_map'. According to
the address, libbpf knows which map it actually referenced, and then
fills the imm field with the 'fd' of that map created by it.
However, since we never really 'link' the object file, the imm field is
only a record in relocation section. Therefore libbpf should do the
relocation:
1. In relocation section (type == SHT_REL), positions of each such
'ldr_64' instruction are recorded with a reference of an entry in
symbol table (SHT_SYMTAB);
2. From records in symbol table we can find the indics of map
variables.
Libbpf first record SHT_SYMTAB and positions of each instruction which
required bu such operation. Then create file descriptor. Finally, after
map creation complete, replace the imm field.
This is the first patch of BPF map related stuff. It records SHT_SYMTAB
into object's efile field for further use.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 10:13:59 +08:00
|
|
|
}
|
2015-07-01 10:13:56 +08:00
|
|
|
}
|
2015-11-27 16:47:36 +08:00
|
|
|
|
2015-12-08 10:25:30 +08:00
|
|
|
if (!obj->efile.strtabidx || obj->efile.strtabidx >= idx) {
|
|
|
|
pr_warning("Corrupted ELF file: index of strtab invalid\n");
|
2019-05-30 01:36:07 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-12-08 10:25:30 +08:00
|
|
|
}
|
2019-06-18 03:26:55 +08:00
|
|
|
err = bpf_object__init_btf(obj, btf_data, btf_ext_data);
|
2019-06-18 03:26:53 +08:00
|
|
|
if (!err)
|
2018-10-16 02:19:55 +08:00
|
|
|
err = bpf_object__init_maps(obj, flags);
|
2019-06-18 03:26:55 +08:00
|
|
|
if (!err)
|
|
|
|
err = bpf_object__sanitize_and_load_btf(obj);
|
2019-06-18 03:26:53 +08:00
|
|
|
if (!err)
|
|
|
|
err = bpf_object__init_prog_names(obj);
|
2015-07-01 10:13:56 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:02 +08:00
|
|
|
static struct bpf_program *
|
|
|
|
bpf_object__find_prog_by_idx(struct bpf_object *obj, int idx)
|
|
|
|
{
|
|
|
|
struct bpf_program *prog;
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
for (i = 0; i < obj->nr_programs; i++) {
|
|
|
|
prog = &obj->programs[i];
|
|
|
|
if (prog->idx == idx)
|
|
|
|
return prog;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2018-07-27 05:32:19 +08:00
|
|
|
struct bpf_program *
|
2019-06-18 06:48:58 +08:00
|
|
|
bpf_object__find_program_by_title(const struct bpf_object *obj,
|
|
|
|
const char *title)
|
2018-07-27 05:32:19 +08:00
|
|
|
{
|
|
|
|
struct bpf_program *pos;
|
|
|
|
|
|
|
|
bpf_object__for_each_program(pos, obj) {
|
|
|
|
if (pos->section_name && !strcmp(pos->section_name, title))
|
|
|
|
return pos;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
|
|
|
|
int shndx)
|
|
|
|
{
|
|
|
|
return shndx == obj->efile.data_shndx ||
|
|
|
|
shndx == obj->efile.bss_shndx ||
|
|
|
|
shndx == obj->efile.rodata_shndx;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
|
|
|
|
int shndx)
|
|
|
|
{
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
return shndx == obj->efile.maps_shndx ||
|
|
|
|
shndx == obj->efile.btf_maps_shndx;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static bool bpf_object__relo_in_known_section(const struct bpf_object *obj,
|
|
|
|
int shndx)
|
|
|
|
{
|
|
|
|
return shndx == obj->efile.text_shndx ||
|
|
|
|
bpf_object__shndx_is_maps(obj, shndx) ||
|
|
|
|
bpf_object__shndx_is_data(obj, shndx);
|
|
|
|
}
|
|
|
|
|
|
|
|
static enum libbpf_map_type
|
|
|
|
bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
|
|
|
|
{
|
|
|
|
if (shndx == obj->efile.data_shndx)
|
|
|
|
return LIBBPF_MAP_DATA;
|
|
|
|
else if (shndx == obj->efile.bss_shndx)
|
|
|
|
return LIBBPF_MAP_BSS;
|
|
|
|
else if (shndx == obj->efile.rodata_shndx)
|
|
|
|
return LIBBPF_MAP_RODATA;
|
|
|
|
else
|
|
|
|
return LIBBPF_MAP_UNSPEC;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:02 +08:00
|
|
|
static int
|
2017-12-15 09:55:10 +08:00
|
|
|
bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
|
|
|
|
Elf_Data *data, struct bpf_object *obj)
|
2015-07-01 10:14:02 +08:00
|
|
|
{
|
2017-12-15 09:55:10 +08:00
|
|
|
Elf_Data *symbols = obj->efile.symbols;
|
|
|
|
struct bpf_map *maps = obj->maps;
|
|
|
|
size_t nr_maps = obj->nr_maps;
|
2015-07-01 10:14:02 +08:00
|
|
|
int i, nrels;
|
|
|
|
|
2019-05-30 01:36:11 +08:00
|
|
|
pr_debug("collecting relocating info for: '%s'\n", prog->section_name);
|
2015-07-01 10:14:02 +08:00
|
|
|
nrels = shdr->sh_size / shdr->sh_entsize;
|
|
|
|
|
|
|
|
prog->reloc_desc = malloc(sizeof(*prog->reloc_desc) * nrels);
|
|
|
|
if (!prog->reloc_desc) {
|
|
|
|
pr_warning("failed to alloc memory in relocation\n");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
prog->nr_reloc = nrels;
|
|
|
|
|
|
|
|
for (i = 0; i < nrels; i++) {
|
|
|
|
struct bpf_insn *insns = prog->insns;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
enum libbpf_map_type type;
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
unsigned int insn_idx;
|
|
|
|
unsigned int shdr_idx;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
const char *name;
|
2015-07-01 10:14:02 +08:00
|
|
|
size_t map_idx;
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
GElf_Sym sym;
|
|
|
|
GElf_Rel rel;
|
2015-07-01 10:14:02 +08:00
|
|
|
|
|
|
|
if (!gelf_getrel(data, i, &rel)) {
|
|
|
|
pr_warning("relocation: failed to get %d reloc\n", i);
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:14:02 +08:00
|
|
|
}
|
|
|
|
|
2019-05-30 01:36:11 +08:00
|
|
|
if (!gelf_getsym(symbols, GELF_R_SYM(rel.r_info), &sym)) {
|
2015-07-01 10:14:02 +08:00
|
|
|
pr_warning("relocation: symbol %"PRIx64" not found\n",
|
|
|
|
GELF_R_SYM(rel.r_info));
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__FORMAT;
|
2015-07-01 10:14:02 +08:00
|
|
|
}
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
|
|
|
|
name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
|
|
|
|
sym.st_name) ? : "<?>";
|
|
|
|
|
|
|
|
pr_debug("relo for %lld value %lld name %d (\'%s\')\n",
|
2017-12-20 04:53:11 +08:00
|
|
|
(long long) (rel.r_info >> 32),
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
(long long) sym.st_value, sym.st_name, name);
|
2015-07-01 10:14:02 +08:00
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
shdr_idx = sym.st_shndx;
|
|
|
|
if (!bpf_object__relo_in_known_section(obj, shdr_idx)) {
|
|
|
|
pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
|
|
|
|
prog->section_name, shdr_idx);
|
perf bpf: Check relocation target section
Libbpf should check the target section before doing relocation to ensure
the relocation is correct. If not, a bug in LLVM causes an error. See
[1]. Also, if an incorrect BPF script uses both global variable and
map, global variable whould be treated as map and be relocated without
error.
This patch saves the id of the map section into obj->efile and compare
target section of a relocation symbol against it during relocation.
Previous patch introduces a test case about this problem. After this
patch:
# ~/perf test BPF
37: Test BPF filter :
37.1: Test basic BPF filtering : Ok
37.2: Test BPF prologue generation : Ok
37.3: Test BPF relocation checker : Ok
# perf test -v BPF
...
37.3: Test BPF relocation checker :
...
libbpf: loading object '[bpf_relocation_test]' from buffer
libbpf: section .strtab, size 126, link 0, flags 0, type=3
libbpf: section .text, size 0, link 0, flags 6, type=1
libbpf: section .data, size 0, link 0, flags 3, type=1
libbpf: section .bss, size 0, link 0, flags 3, type=8
libbpf: section func=sys_write, size 104, link 0, flags 6, type=1
libbpf: found program func=sys_write
libbpf: section .relfunc=sys_write, size 16, link 10, flags 0, type=9
libbpf: section maps, size 16, link 0, flags 3, type=1
libbpf: maps in [bpf_relocation_test]: 16 bytes
libbpf: section license, size 4, link 0, flags 3, type=1
libbpf: license of [bpf_relocation_test] is GPL
libbpf: section version, size 4, link 0, flags 3, type=1
libbpf: kernel version of [bpf_relocation_test] is 40400
libbpf: section .symtab, size 144, link 1, flags 0, type=2
libbpf: map 0 is "my_table"
libbpf: collecting relocating info for: 'func=sys_write'
libbpf: Program 'func=sys_write' contains non-map related relo data pointing to section 65522
bpf: failed to load buffer
Compile BPF program failed.
test child finished with 0
---- end ----
Test BPF filter subtest 2: Ok
[1] https://llvm.org/bugs/show_bug.cgi?id=26243
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1453715801-7732-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-01-25 17:55:49 +08:00
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
}
|
|
|
|
|
|
|
|
insn_idx = rel.r_offset / sizeof(struct bpf_insn);
|
|
|
|
pr_debug("relocation: insn_idx=%u\n", insn_idx);
|
|
|
|
|
2017-12-15 09:55:10 +08:00
|
|
|
if (insns[insn_idx].code == (BPF_JMP | BPF_CALL)) {
|
|
|
|
if (insns[insn_idx].src_reg != BPF_PSEUDO_CALL) {
|
|
|
|
pr_warning("incorrect bpf_call opcode\n");
|
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
}
|
|
|
|
prog->reloc_desc[i].type = RELO_CALL;
|
|
|
|
prog->reloc_desc[i].insn_idx = insn_idx;
|
|
|
|
prog->reloc_desc[i].text_off = sym.st_value;
|
2018-06-29 05:41:38 +08:00
|
|
|
obj->has_pseudo_calls = true;
|
2017-12-15 09:55:10 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:02 +08:00
|
|
|
if (insns[insn_idx].code != (BPF_LD | BPF_IMM | BPF_DW)) {
|
|
|
|
pr_warning("bpf: relocation: invalid relo for insns[%d].code 0x%x\n",
|
|
|
|
insn_idx, insns[insn_idx].code);
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
2015-07-01 10:14:02 +08:00
|
|
|
}
|
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
if (bpf_object__shndx_is_maps(obj, shdr_idx) ||
|
|
|
|
bpf_object__shndx_is_data(obj, shdr_idx)) {
|
|
|
|
type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx);
|
2019-04-24 06:45:56 +08:00
|
|
|
if (type != LIBBPF_MAP_UNSPEC) {
|
|
|
|
if (GELF_ST_BIND(sym.st_info) == STB_GLOBAL) {
|
|
|
|
pr_warning("bpf: relocation: not yet supported relo for non-static global \'%s\' variable found in insns[%d].code 0x%x\n",
|
|
|
|
name, insn_idx, insns[insn_idx].code);
|
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
}
|
|
|
|
if (!obj->caps.global_data) {
|
|
|
|
pr_warning("bpf: relocation: kernel does not support global \'%s\' variable access in insns[%d]\n",
|
|
|
|
name, insn_idx);
|
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
}
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
}
|
|
|
|
|
2019-04-10 05:20:12 +08:00
|
|
|
for (map_idx = 0; map_idx < nr_maps; map_idx++) {
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
if (maps[map_idx].libbpf_type != type)
|
|
|
|
continue;
|
|
|
|
if (type != LIBBPF_MAP_UNSPEC ||
|
2019-06-18 03:26:54 +08:00
|
|
|
(maps[map_idx].sec_idx == sym.st_shndx &&
|
|
|
|
maps[map_idx].sec_offset == sym.st_value)) {
|
|
|
|
pr_debug("relocation: found map %zd (%s, sec_idx %d, offset %zu) for insn %u\n",
|
|
|
|
map_idx, maps[map_idx].name,
|
|
|
|
maps[map_idx].sec_idx,
|
|
|
|
maps[map_idx].sec_offset,
|
|
|
|
insn_idx);
|
2019-04-10 05:20:12 +08:00
|
|
|
break;
|
|
|
|
}
|
tools lib bpf: Fix map offsets in relocation
Commit 4708bbda5cb2 ("tools lib bpf: Fix maps resolution") attempted to
fix map resolution by identifying the number of symbols that point to
maps, and using this number to resolve each of the maps.
However, during relocation the original definition of the map size was
still in use. For up to two maps, the calculation was correct if there
was a small difference in size between the map definition in libbpf and
the one that the client library uses. However if the difference was
large, particularly if more than two maps were used in the BPF program,
the relocation would fail.
For example, when using a map definition with size 28, with three maps,
map relocation would count:
(sym_offset / sizeof(struct bpf_map_def) => map_idx)
(0 / 16 => 0), ie map_idx = 0
(28 / 16 => 1), ie map_idx = 1
(56 / 16 => 3), ie map_idx = 3
So, libbpf reports:
libbpf: bpf relocation: map_idx 3 large than 2
Fix map relocation by checking the exact offset of maps when doing
relocation.
Signed-off-by: Joe Stringer <joe@ovn.org>
[Allow different map size in an object]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: netdev@vger.kernel.org
Fixes: 4708bbda5cb2 ("tools lib bpf: Fix maps resolution")
Link: http://lkml.kernel.org/r/20170123011128.26534-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-23 09:11:22 +08:00
|
|
|
}
|
|
|
|
|
2019-04-10 05:20:12 +08:00
|
|
|
if (map_idx >= nr_maps) {
|
2019-05-30 01:36:10 +08:00
|
|
|
pr_warning("bpf relocation: map_idx %d larger than %d\n",
|
2019-04-10 05:20:12 +08:00
|
|
|
(int)map_idx, (int)nr_maps - 1);
|
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
}
|
2015-07-01 10:14:02 +08:00
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
prog->reloc_desc[i].type = type != LIBBPF_MAP_UNSPEC ?
|
|
|
|
RELO_DATA : RELO_LD64;
|
2019-04-10 05:20:12 +08:00
|
|
|
prog->reloc_desc[i].insn_idx = insn_idx;
|
|
|
|
prog->reloc_desc[i].map_idx = map_idx;
|
|
|
|
}
|
2015-07-01 10:14:02 +08:00
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
static int bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map)
|
2018-04-19 06:56:05 +08:00
|
|
|
{
|
|
|
|
struct bpf_map_def *def = &map->def;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
__u32 key_type_id = 0, value_type_id = 0;
|
2019-02-05 03:00:58 +08:00
|
|
|
int ret;
|
2018-04-19 06:56:05 +08:00
|
|
|
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
/* if it's BTF-defined map, we don't need to search for type IDs */
|
|
|
|
if (map->sec_idx == obj->efile.btf_maps_shndx)
|
|
|
|
return 0;
|
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
if (!bpf_map__is_internal(map)) {
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
ret = btf__get_map_kv_tids(obj->btf, map->name, def->key_size,
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
def->value_size, &key_type_id,
|
|
|
|
&value_type_id);
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* LLVM annotates global data differently in BTF, that is,
|
|
|
|
* only as '.data', '.bss' or '.rodata'.
|
|
|
|
*/
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
ret = btf__find_by_name(obj->btf,
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
libbpf_type_to_btf_name[map->libbpf_type]);
|
|
|
|
}
|
|
|
|
if (ret < 0)
|
2019-02-05 03:00:58 +08:00
|
|
|
return ret;
|
2018-04-19 06:56:05 +08:00
|
|
|
|
2019-02-05 03:00:58 +08:00
|
|
|
map->btf_key_type_id = key_type_id;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
map->btf_value_type_id = bpf_map__is_internal(map) ?
|
|
|
|
ret : value_type_id;
|
2018-04-19 06:56:05 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-07-11 05:43:06 +08:00
|
|
|
int bpf_map__reuse_fd(struct bpf_map *map, int fd)
|
|
|
|
{
|
|
|
|
struct bpf_map_info info = {};
|
|
|
|
__u32 len = sizeof(info);
|
|
|
|
int new_fd, err;
|
|
|
|
char *new_name;
|
|
|
|
|
|
|
|
err = bpf_obj_get_info_by_fd(fd, &info, &len);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
new_name = strdup(info.name);
|
|
|
|
if (!new_name)
|
|
|
|
return -errno;
|
|
|
|
|
|
|
|
new_fd = open("/", O_RDONLY | O_CLOEXEC);
|
|
|
|
if (new_fd < 0)
|
|
|
|
goto err_free_new_name;
|
|
|
|
|
|
|
|
new_fd = dup3(fd, new_fd, O_CLOEXEC);
|
|
|
|
if (new_fd < 0)
|
|
|
|
goto err_close_new_fd;
|
|
|
|
|
|
|
|
err = zclose(map->fd);
|
|
|
|
if (err)
|
|
|
|
goto err_close_new_fd;
|
|
|
|
free(map->name);
|
|
|
|
|
|
|
|
map->fd = new_fd;
|
|
|
|
map->name = new_name;
|
|
|
|
map->def.type = info.type;
|
|
|
|
map->def.key_size = info.key_size;
|
|
|
|
map->def.value_size = info.value_size;
|
|
|
|
map->def.max_entries = info.max_entries;
|
|
|
|
map->def.map_flags = info.map_flags;
|
|
|
|
map->btf_key_type_id = info.btf_key_type_id;
|
|
|
|
map->btf_value_type_id = info.btf_value_type_id;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_close_new_fd:
|
|
|
|
close(new_fd);
|
|
|
|
err_free_new_name:
|
|
|
|
free(new_name);
|
|
|
|
return -errno;
|
|
|
|
}
|
|
|
|
|
2019-02-15 07:01:42 +08:00
|
|
|
int bpf_map__resize(struct bpf_map *map, __u32 max_entries)
|
|
|
|
{
|
|
|
|
if (!map || !max_entries)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* If map already created, its attributes can't be changed. */
|
|
|
|
if (map->fd >= 0)
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
map->def.max_entries = max_entries;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-11-21 09:11:19 +08:00
|
|
|
static int
|
|
|
|
bpf_object__probe_name(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
struct bpf_load_program_attr attr;
|
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
|
|
|
struct bpf_insn insns[] = {
|
|
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
|
|
BPF_EXIT_INSN(),
|
|
|
|
};
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* make sure basic loading works */
|
|
|
|
|
|
|
|
memset(&attr, 0, sizeof(attr));
|
|
|
|
attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
|
|
|
|
attr.insns = insns;
|
|
|
|
attr.insns_cnt = ARRAY_SIZE(insns);
|
|
|
|
attr.license = "GPL";
|
|
|
|
|
|
|
|
ret = bpf_load_program_xattr(&attr, NULL, 0);
|
|
|
|
if (ret < 0) {
|
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
|
|
|
pr_warning("Error in %s():%s(%d). Couldn't load basic 'r0 = 0' BPF program.\n",
|
|
|
|
__func__, cp, errno);
|
|
|
|
return -errno;
|
|
|
|
}
|
|
|
|
close(ret);
|
|
|
|
|
|
|
|
/* now try the same program, but with the name */
|
|
|
|
|
|
|
|
attr.name = "test";
|
|
|
|
ret = bpf_load_program_xattr(&attr, NULL, 0);
|
|
|
|
if (ret >= 0) {
|
|
|
|
obj->caps.name = 1;
|
|
|
|
close(ret);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-04-24 06:45:56 +08:00
|
|
|
static int
|
|
|
|
bpf_object__probe_global_data(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
struct bpf_load_program_attr prg_attr;
|
|
|
|
struct bpf_create_map_attr map_attr;
|
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
|
|
|
struct bpf_insn insns[] = {
|
|
|
|
BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
|
|
|
|
BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
|
|
|
|
BPF_MOV64_IMM(BPF_REG_0, 0),
|
|
|
|
BPF_EXIT_INSN(),
|
|
|
|
};
|
|
|
|
int ret, map;
|
|
|
|
|
|
|
|
memset(&map_attr, 0, sizeof(map_attr));
|
|
|
|
map_attr.map_type = BPF_MAP_TYPE_ARRAY;
|
|
|
|
map_attr.key_size = sizeof(int);
|
|
|
|
map_attr.value_size = 32;
|
|
|
|
map_attr.max_entries = 1;
|
|
|
|
|
|
|
|
map = bpf_create_map_xattr(&map_attr);
|
|
|
|
if (map < 0) {
|
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
|
|
|
pr_warning("Error in %s():%s(%d). Couldn't create simple array map.\n",
|
|
|
|
__func__, cp, errno);
|
|
|
|
return -errno;
|
|
|
|
}
|
|
|
|
|
|
|
|
insns[0].imm = map;
|
|
|
|
|
|
|
|
memset(&prg_attr, 0, sizeof(prg_attr));
|
|
|
|
prg_attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
|
|
|
|
prg_attr.insns = insns;
|
|
|
|
prg_attr.insns_cnt = ARRAY_SIZE(insns);
|
|
|
|
prg_attr.license = "GPL";
|
|
|
|
|
|
|
|
ret = bpf_load_program_xattr(&prg_attr, NULL, 0);
|
|
|
|
if (ret >= 0) {
|
|
|
|
obj->caps.global_data = 1;
|
|
|
|
close(ret);
|
|
|
|
}
|
|
|
|
|
|
|
|
close(map);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
static int bpf_object__probe_btf_func(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
const char strs[] = "\0int\0x\0a";
|
|
|
|
/* void x(int a) {} */
|
|
|
|
__u32 types[] = {
|
|
|
|
/* int */
|
|
|
|
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
|
|
|
|
/* FUNC_PROTO */ /* [2] */
|
|
|
|
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
|
|
|
|
BTF_PARAM_ENC(7, 1),
|
|
|
|
/* FUNC x */ /* [3] */
|
|
|
|
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2),
|
|
|
|
};
|
2019-05-30 02:31:09 +08:00
|
|
|
int btf_fd;
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
|
2019-05-30 02:31:09 +08:00
|
|
|
btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types),
|
|
|
|
strs, sizeof(strs));
|
|
|
|
if (btf_fd >= 0) {
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
obj->caps.btf_func = 1;
|
2019-05-30 02:31:09 +08:00
|
|
|
close(btf_fd);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int bpf_object__probe_btf_datasec(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
const char strs[] = "\0x\0.data";
|
|
|
|
/* static int a; */
|
|
|
|
__u32 types[] = {
|
|
|
|
/* int */
|
|
|
|
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
|
|
|
|
/* VAR x */ /* [2] */
|
|
|
|
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
|
|
|
|
BTF_VAR_STATIC,
|
|
|
|
/* DATASEC val */ /* [3] */
|
|
|
|
BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
|
|
|
|
BTF_VAR_SECINFO_ENC(2, 0, 4),
|
|
|
|
};
|
2019-05-30 02:31:09 +08:00
|
|
|
int btf_fd;
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
|
2019-05-30 02:31:09 +08:00
|
|
|
btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types),
|
|
|
|
strs, sizeof(strs));
|
|
|
|
if (btf_fd >= 0) {
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
obj->caps.btf_datasec = 1;
|
2019-05-30 02:31:09 +08:00
|
|
|
close(btf_fd);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-11-21 09:11:19 +08:00
|
|
|
static int
|
|
|
|
bpf_object__probe_caps(struct bpf_object *obj)
|
|
|
|
{
|
2019-04-24 06:45:56 +08:00
|
|
|
int (*probe_fn[])(struct bpf_object *obj) = {
|
|
|
|
bpf_object__probe_name,
|
|
|
|
bpf_object__probe_global_data,
|
libbpf: detect supported kernel BTF features and sanitize BTF
Depending on used versions of libbpf, Clang, and kernel, it's possible to
have valid BPF object files with valid BTF information, that still won't
load successfully due to Clang emitting newer BTF features (e.g.,
BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC, etc), that
are not yet supported by older kernel.
This patch adds detection of BTF features and sanitizes BPF object's BTF
by substituting various supported BTF kinds, which have compatible layout:
- BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
- BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
- BTF_KIND_VAR -> BTF_KIND_INT
- BTF_KIND_DATASEC -> BTF_KIND_STRUCT
Replacement is done in such a way as to preserve as much information as
possible (names, sizes, etc) where possible without violating kernel's
validation rules.
v2->v3:
- remove duplicate #defines from libbpf_util.h
v1->v2:
- add internal libbpf_internal.h w/ common stuff
- switch SK storage BTF to use new libbpf__probe_raw_btf()
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-11 05:13:15 +08:00
|
|
|
bpf_object__probe_btf_func,
|
|
|
|
bpf_object__probe_btf_datasec,
|
2019-04-24 06:45:56 +08:00
|
|
|
};
|
|
|
|
int i, ret;
|
|
|
|
|
|
|
|
for (i = 0; i < ARRAY_SIZE(probe_fn); i++) {
|
|
|
|
ret = probe_fn[i](obj);
|
|
|
|
if (ret < 0)
|
2019-05-15 11:38:49 +08:00
|
|
|
pr_debug("Probe #%d failed with %d.\n", i, ret);
|
2019-04-24 06:45:56 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2018-11-21 09:11:19 +08:00
|
|
|
}
|
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
static int
|
|
|
|
bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
|
|
|
|
{
|
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
|
|
|
int err, zero = 0;
|
|
|
|
__u8 *data;
|
|
|
|
|
|
|
|
/* Nothing to do here since kernel already zero-initializes .bss map. */
|
|
|
|
if (map->libbpf_type == LIBBPF_MAP_BSS)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
data = map->libbpf_type == LIBBPF_MAP_DATA ?
|
|
|
|
obj->sections.data : obj->sections.rodata;
|
|
|
|
|
|
|
|
err = bpf_map_update_elem(map->fd, &zero, data, 0);
|
|
|
|
/* Freeze .rodata map as read-only from syscall side. */
|
|
|
|
if (!err && map->libbpf_type == LIBBPF_MAP_RODATA) {
|
|
|
|
err = bpf_map_freeze(map->fd);
|
|
|
|
if (err) {
|
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
|
|
|
pr_warning("Error freezing map(%s) as read-only: %s\n",
|
|
|
|
map->name, cp);
|
|
|
|
err = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:04 +08:00
|
|
|
static int
|
|
|
|
bpf_object__create_maps(struct bpf_object *obj)
|
|
|
|
{
|
2018-04-19 06:56:05 +08:00
|
|
|
struct bpf_create_map_attr create_attr = {};
|
2015-07-01 10:14:04 +08:00
|
|
|
unsigned int i;
|
2018-04-19 06:56:05 +08:00
|
|
|
int err;
|
2015-07-01 10:14:04 +08:00
|
|
|
|
2015-11-27 16:47:35 +08:00
|
|
|
for (i = 0; i < obj->nr_maps; i++) {
|
2018-04-19 06:56:05 +08:00
|
|
|
struct bpf_map *map = &obj->maps[i];
|
|
|
|
struct bpf_map_def *def = &map->def;
|
2018-07-30 16:53:23 +08:00
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
2018-04-19 06:56:05 +08:00
|
|
|
int *pfd = &map->fd;
|
|
|
|
|
2018-07-11 05:43:06 +08:00
|
|
|
if (map->fd >= 0) {
|
|
|
|
pr_debug("skip map create (preset) %s: fd=%d\n",
|
|
|
|
map->name, map->fd);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2018-11-21 09:11:20 +08:00
|
|
|
if (obj->caps.name)
|
|
|
|
create_attr.name = map->name;
|
2018-05-17 05:02:49 +08:00
|
|
|
create_attr.map_ifindex = map->map_ifindex;
|
2018-04-19 06:56:05 +08:00
|
|
|
create_attr.map_type = def->type;
|
|
|
|
create_attr.map_flags = def->map_flags;
|
|
|
|
create_attr.key_size = def->key_size;
|
|
|
|
create_attr.value_size = def->value_size;
|
|
|
|
create_attr.max_entries = def->max_entries;
|
2019-06-13 13:04:57 +08:00
|
|
|
create_attr.btf_fd = 0;
|
2018-05-23 06:04:24 +08:00
|
|
|
create_attr.btf_key_type_id = 0;
|
|
|
|
create_attr.btf_value_type_id = 0;
|
2018-11-21 12:55:56 +08:00
|
|
|
if (bpf_map_type__is_map_in_map(def->type) &&
|
|
|
|
map->inner_map_fd >= 0)
|
|
|
|
create_attr.inner_map_fd = map->inner_map_fd;
|
2018-04-19 06:56:05 +08:00
|
|
|
|
libbpf: allow specifying map definitions using BTF
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-18 03:26:56 +08:00
|
|
|
if (obj->btf && !bpf_map_find_btf_info(obj, map)) {
|
2018-04-19 06:56:05 +08:00
|
|
|
create_attr.btf_fd = btf__fd(obj->btf);
|
2018-05-23 06:04:24 +08:00
|
|
|
create_attr.btf_key_type_id = map->btf_key_type_id;
|
|
|
|
create_attr.btf_value_type_id = map->btf_value_type_id;
|
2018-04-19 06:56:05 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
*pfd = bpf_create_map_xattr(&create_attr);
|
2019-06-13 13:04:57 +08:00
|
|
|
if (*pfd < 0 && (create_attr.btf_key_type_id ||
|
|
|
|
create_attr.btf_value_type_id)) {
|
2018-10-04 06:26:41 +08:00
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
2018-04-19 06:56:05 +08:00
|
|
|
pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
|
2018-07-30 16:53:23 +08:00
|
|
|
map->name, cp, errno);
|
2019-06-13 13:04:57 +08:00
|
|
|
create_attr.btf_fd = 0;
|
2018-05-23 06:04:24 +08:00
|
|
|
create_attr.btf_key_type_id = 0;
|
|
|
|
create_attr.btf_value_type_id = 0;
|
|
|
|
map->btf_key_type_id = 0;
|
|
|
|
map->btf_value_type_id = 0;
|
2018-04-19 06:56:05 +08:00
|
|
|
*pfd = bpf_create_map_xattr(&create_attr);
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:04 +08:00
|
|
|
if (*pfd < 0) {
|
|
|
|
size_t j;
|
|
|
|
|
2018-04-19 06:56:05 +08:00
|
|
|
err = *pfd;
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
err_out:
|
2018-10-04 06:26:41 +08:00
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
2017-08-21 03:48:14 +08:00
|
|
|
pr_warning("failed to create map (name: '%s'): %s\n",
|
2018-07-30 16:53:23 +08:00
|
|
|
map->name, cp);
|
2015-07-01 10:14:04 +08:00
|
|
|
for (j = 0; j < i; j++)
|
2015-11-27 16:47:35 +08:00
|
|
|
zclose(obj->maps[j].fd);
|
2015-07-01 10:14:04 +08:00
|
|
|
return err;
|
|
|
|
}
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
|
|
|
|
if (bpf_map__is_internal(map)) {
|
|
|
|
err = bpf_object__populate_internal_map(obj, map);
|
|
|
|
if (err < 0) {
|
|
|
|
zclose(*pfd);
|
|
|
|
goto err_out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-05-30 01:36:10 +08:00
|
|
|
pr_debug("created map %s: fd=%d\n", map->name, *pfd);
|
2015-07-01 10:14:04 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-12-08 08:42:29 +08:00
|
|
|
static int
|
|
|
|
check_btf_ext_reloc_err(struct bpf_program *prog, int err,
|
|
|
|
void *btf_prog_info, const char *info_name)
|
|
|
|
{
|
|
|
|
if (err != -ENOENT) {
|
|
|
|
pr_warning("Error in loading %s for sec %s.\n",
|
|
|
|
info_name, prog->section_name);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* err == -ENOENT (i.e. prog->section_name not found in btf_ext) */
|
|
|
|
|
|
|
|
if (btf_prog_info) {
|
|
|
|
/*
|
|
|
|
* Some info has already been found but has problem
|
2019-05-30 01:36:11 +08:00
|
|
|
* in the last btf_ext reloc. Must have to error out.
|
2018-12-08 08:42:29 +08:00
|
|
|
*/
|
|
|
|
pr_warning("Error in relocating %s for sec %s.\n",
|
|
|
|
info_name, prog->section_name);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-05-30 01:36:11 +08:00
|
|
|
/* Have problem loading the very first info. Ignore the rest. */
|
2018-12-08 08:42:29 +08:00
|
|
|
pr_warning("Cannot find %s for main program sec %s. Ignore all %s.\n",
|
|
|
|
info_name, prog->section_name, info_name);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
bpf_program_reloc_btf_ext(struct bpf_program *prog, struct bpf_object *obj,
|
|
|
|
const char *section_name, __u32 insn_offset)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!insn_offset || prog->func_info) {
|
|
|
|
/*
|
|
|
|
* !insn_offset => main program
|
|
|
|
*
|
|
|
|
* For sub prog, the main program's func_info has to
|
|
|
|
* be loaded first (i.e. prog->func_info != NULL)
|
|
|
|
*/
|
|
|
|
err = btf_ext__reloc_func_info(obj->btf, obj->btf_ext,
|
|
|
|
section_name, insn_offset,
|
|
|
|
&prog->func_info,
|
|
|
|
&prog->func_info_cnt);
|
|
|
|
if (err)
|
|
|
|
return check_btf_ext_reloc_err(prog, err,
|
|
|
|
prog->func_info,
|
|
|
|
"bpf_func_info");
|
|
|
|
|
|
|
|
prog->func_info_rec_size = btf_ext__func_info_rec_size(obj->btf_ext);
|
|
|
|
}
|
|
|
|
|
2018-12-08 08:42:31 +08:00
|
|
|
if (!insn_offset || prog->line_info) {
|
|
|
|
err = btf_ext__reloc_line_info(obj->btf, obj->btf_ext,
|
|
|
|
section_name, insn_offset,
|
|
|
|
&prog->line_info,
|
|
|
|
&prog->line_info_cnt);
|
|
|
|
if (err)
|
|
|
|
return check_btf_ext_reloc_err(prog, err,
|
|
|
|
prog->line_info,
|
|
|
|
"bpf_line_info");
|
|
|
|
|
|
|
|
prog->line_info_rec_size = btf_ext__line_info_rec_size(obj->btf_ext);
|
|
|
|
}
|
|
|
|
|
2018-12-08 08:42:29 +08:00
|
|
|
if (!insn_offset)
|
|
|
|
prog->btf_fd = btf__fd(obj->btf);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-12-15 09:55:10 +08:00
|
|
|
static int
|
|
|
|
bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
|
|
|
|
struct reloc_desc *relo)
|
|
|
|
{
|
|
|
|
struct bpf_insn *insn, *new_insn;
|
|
|
|
struct bpf_program *text;
|
|
|
|
size_t new_cnt;
|
2018-11-20 07:29:16 +08:00
|
|
|
int err;
|
2017-12-15 09:55:10 +08:00
|
|
|
|
|
|
|
if (relo->type != RELO_CALL)
|
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
|
|
|
|
if (prog->idx == obj->efile.text_shndx) {
|
|
|
|
pr_warning("relo in .text insn %d into off %d\n",
|
|
|
|
relo->insn_idx, relo->text_off);
|
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prog->main_prog_cnt == 0) {
|
|
|
|
text = bpf_object__find_prog_by_idx(obj, obj->efile.text_shndx);
|
|
|
|
if (!text) {
|
|
|
|
pr_warning("no .text section found yet relo into text exist\n");
|
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
}
|
|
|
|
new_cnt = prog->insns_cnt + text->insns_cnt;
|
2018-07-11 05:43:05 +08:00
|
|
|
new_insn = reallocarray(prog->insns, new_cnt, sizeof(*insn));
|
2017-12-15 09:55:10 +08:00
|
|
|
if (!new_insn) {
|
|
|
|
pr_warning("oom in prog realloc\n");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2018-11-20 07:29:16 +08:00
|
|
|
|
2018-12-08 08:42:29 +08:00
|
|
|
if (obj->btf_ext) {
|
|
|
|
err = bpf_program_reloc_btf_ext(prog, obj,
|
|
|
|
text->section_name,
|
|
|
|
prog->insns_cnt);
|
|
|
|
if (err)
|
2018-11-20 07:29:16 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-12-15 09:55:10 +08:00
|
|
|
memcpy(new_insn + prog->insns_cnt, text->insns,
|
|
|
|
text->insns_cnt * sizeof(*insn));
|
|
|
|
prog->insns = new_insn;
|
|
|
|
prog->main_prog_cnt = prog->insns_cnt;
|
|
|
|
prog->insns_cnt = new_cnt;
|
2018-02-20 09:00:07 +08:00
|
|
|
pr_debug("added %zd insn from %s to prog %s\n",
|
|
|
|
text->insns_cnt, text->section_name,
|
|
|
|
prog->section_name);
|
2017-12-15 09:55:10 +08:00
|
|
|
}
|
|
|
|
insn = &prog->insns[relo->insn_idx];
|
|
|
|
insn->imm += prog->main_prog_cnt - relo->insn_idx;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:05 +08:00
|
|
|
static int
|
2015-11-27 16:47:35 +08:00
|
|
|
bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
|
2015-07-01 10:14:05 +08:00
|
|
|
{
|
2017-12-15 09:55:10 +08:00
|
|
|
int i, err;
|
2015-07-01 10:14:05 +08:00
|
|
|
|
2018-11-20 07:29:16 +08:00
|
|
|
if (!prog)
|
|
|
|
return 0;
|
|
|
|
|
2018-12-08 08:42:29 +08:00
|
|
|
if (obj->btf_ext) {
|
|
|
|
err = bpf_program_reloc_btf_ext(prog, obj,
|
|
|
|
prog->section_name, 0);
|
|
|
|
if (err)
|
2018-11-20 07:29:16 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!prog->reloc_desc)
|
2015-07-01 10:14:05 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
for (i = 0; i < prog->nr_reloc; i++) {
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
if (prog->reloc_desc[i].type == RELO_LD64 ||
|
|
|
|
prog->reloc_desc[i].type == RELO_DATA) {
|
|
|
|
bool relo_data = prog->reloc_desc[i].type == RELO_DATA;
|
2017-12-15 09:55:10 +08:00
|
|
|
struct bpf_insn *insns = prog->insns;
|
|
|
|
int insn_idx, map_idx;
|
2015-07-01 10:14:05 +08:00
|
|
|
|
2017-12-15 09:55:10 +08:00
|
|
|
insn_idx = prog->reloc_desc[i].insn_idx;
|
|
|
|
map_idx = prog->reloc_desc[i].map_idx;
|
2015-07-01 10:14:05 +08:00
|
|
|
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
if (insn_idx + 1 >= (int)prog->insns_cnt) {
|
2017-12-15 09:55:10 +08:00
|
|
|
pr_warning("relocation out of range: '%s'\n",
|
|
|
|
prog->section_name);
|
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
|
|
|
}
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
|
|
|
|
if (!relo_data) {
|
|
|
|
insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
|
|
|
|
} else {
|
|
|
|
insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
|
|
|
|
insns[insn_idx + 1].imm = insns[insn_idx].imm;
|
|
|
|
}
|
2017-12-15 09:55:10 +08:00
|
|
|
insns[insn_idx].imm = obj->maps[map_idx].fd;
|
2019-04-10 05:20:12 +08:00
|
|
|
} else if (prog->reloc_desc[i].type == RELO_CALL) {
|
2017-12-15 09:55:10 +08:00
|
|
|
err = bpf_program__reloc_text(prog, obj,
|
|
|
|
&prog->reloc_desc[i]);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2015-07-01 10:14:05 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
zfree(&prog->reloc_desc);
|
|
|
|
prog->nr_reloc = 0;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
static int
|
|
|
|
bpf_object__relocate(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
struct bpf_program *prog;
|
|
|
|
size_t i;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
for (i = 0; i < obj->nr_programs; i++) {
|
|
|
|
prog = &obj->programs[i];
|
|
|
|
|
2015-11-27 16:47:35 +08:00
|
|
|
err = bpf_program__relocate(prog, obj);
|
2015-07-01 10:14:05 +08:00
|
|
|
if (err) {
|
|
|
|
pr_warning("failed to relocate '%s'\n",
|
|
|
|
prog->section_name);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:02 +08:00
|
|
|
static int bpf_object__collect_reloc(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
int i, err;
|
|
|
|
|
|
|
|
if (!obj_elf_valid(obj)) {
|
|
|
|
pr_warning("Internal error: elf object is closed\n");
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__INTERNAL;
|
2015-07-01 10:14:02 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0; i < obj->efile.nr_reloc; i++) {
|
|
|
|
GElf_Shdr *shdr = &obj->efile.reloc[i].shdr;
|
|
|
|
Elf_Data *data = obj->efile.reloc[i].data;
|
|
|
|
int idx = shdr->sh_info;
|
|
|
|
struct bpf_program *prog;
|
|
|
|
|
|
|
|
if (shdr->sh_type != SHT_REL) {
|
|
|
|
pr_warning("internal error at %d\n", __LINE__);
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__INTERNAL;
|
2015-07-01 10:14:02 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
prog = bpf_object__find_prog_by_idx(obj, idx);
|
|
|
|
if (!prog) {
|
2018-02-08 19:48:17 +08:00
|
|
|
pr_warning("relocation failed: no section(%d)\n", idx);
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__RELOC;
|
2015-07-01 10:14:02 +08:00
|
|
|
}
|
|
|
|
|
2019-05-30 01:36:11 +08:00
|
|
|
err = bpf_program__collect_reloc(prog, shdr, data, obj);
|
2015-07-01 10:14:02 +08:00
|
|
|
if (err)
|
2015-11-06 21:49:37 +08:00
|
|
|
return err;
|
2015-07-01 10:14:02 +08:00
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:07 +08:00
|
|
|
static int
|
2018-11-20 07:29:16 +08:00
|
|
|
load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
|
2018-12-08 08:42:29 +08:00
|
|
|
char *license, __u32 kern_version, int *pfd)
|
2015-07-01 10:14:07 +08:00
|
|
|
{
|
2018-03-31 06:08:01 +08:00
|
|
|
struct bpf_load_program_attr load_attr;
|
2018-07-30 16:53:23 +08:00
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
2019-04-02 12:27:47 +08:00
|
|
|
int log_buf_size = BPF_LOG_BUF_SIZE;
|
2015-07-01 10:14:07 +08:00
|
|
|
char *log_buf;
|
2018-03-31 06:08:01 +08:00
|
|
|
int ret;
|
2015-07-01 10:14:07 +08:00
|
|
|
|
2019-05-30 01:36:08 +08:00
|
|
|
if (!insns || !insns_cnt)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2018-03-31 06:08:01 +08:00
|
|
|
memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
|
2018-11-20 07:29:16 +08:00
|
|
|
load_attr.prog_type = prog->type;
|
|
|
|
load_attr.expected_attach_type = prog->expected_attach_type;
|
2018-11-21 09:11:21 +08:00
|
|
|
if (prog->caps->name)
|
|
|
|
load_attr.name = prog->name;
|
2018-03-31 06:08:01 +08:00
|
|
|
load_attr.insns = insns;
|
|
|
|
load_attr.insns_cnt = insns_cnt;
|
|
|
|
load_attr.license = license;
|
|
|
|
load_attr.kern_version = kern_version;
|
2018-11-20 07:29:16 +08:00
|
|
|
load_attr.prog_ifindex = prog->prog_ifindex;
|
2019-06-13 13:04:57 +08:00
|
|
|
load_attr.prog_btf_fd = prog->btf_fd >= 0 ? prog->btf_fd : 0;
|
2018-11-20 07:29:16 +08:00
|
|
|
load_attr.func_info = prog->func_info;
|
|
|
|
load_attr.func_info_rec_size = prog->func_info_rec_size;
|
2018-12-08 08:42:29 +08:00
|
|
|
load_attr.func_info_cnt = prog->func_info_cnt;
|
2018-12-08 08:42:31 +08:00
|
|
|
load_attr.line_info = prog->line_info;
|
|
|
|
load_attr.line_info_rec_size = prog->line_info_rec_size;
|
|
|
|
load_attr.line_info_cnt = prog->line_info_cnt;
|
2019-04-02 12:27:47 +08:00
|
|
|
load_attr.log_level = prog->log_level;
|
2019-05-25 06:25:19 +08:00
|
|
|
load_attr.prog_flags = prog->prog_flags;
|
2015-07-01 10:14:07 +08:00
|
|
|
|
2019-04-02 12:27:47 +08:00
|
|
|
retry_load:
|
|
|
|
log_buf = malloc(log_buf_size);
|
2015-07-01 10:14:07 +08:00
|
|
|
if (!log_buf)
|
|
|
|
pr_warning("Alloc log buffer for bpf loader error, continue without log\n");
|
|
|
|
|
2019-04-02 12:27:47 +08:00
|
|
|
ret = bpf_load_program_xattr(&load_attr, log_buf, log_buf_size);
|
2015-07-01 10:14:07 +08:00
|
|
|
|
|
|
|
if (ret >= 0) {
|
2019-04-02 12:27:47 +08:00
|
|
|
if (load_attr.log_level)
|
|
|
|
pr_debug("verifier log:\n%s", log_buf);
|
2015-07-01 10:14:07 +08:00
|
|
|
*pfd = ret;
|
|
|
|
ret = 0;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2019-04-02 12:27:47 +08:00
|
|
|
if (errno == ENOSPC) {
|
|
|
|
log_buf_size <<= 1;
|
|
|
|
free(log_buf);
|
|
|
|
goto retry_load;
|
|
|
|
}
|
2015-11-06 21:49:37 +08:00
|
|
|
ret = -LIBBPF_ERRNO__LOAD;
|
2018-10-04 06:26:41 +08:00
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
2018-07-30 16:53:23 +08:00
|
|
|
pr_warning("load bpf program failed: %s\n", cp);
|
2015-07-01 10:14:07 +08:00
|
|
|
|
2015-11-06 21:49:37 +08:00
|
|
|
if (log_buf && log_buf[0] != '\0') {
|
|
|
|
ret = -LIBBPF_ERRNO__VERIFY;
|
2015-07-01 10:14:07 +08:00
|
|
|
pr_warning("-- BEGIN DUMP LOG ---\n");
|
|
|
|
pr_warning("\n%s\n", log_buf);
|
|
|
|
pr_warning("-- END LOG --\n");
|
2018-03-31 06:08:01 +08:00
|
|
|
} else if (load_attr.insns_cnt >= BPF_MAXINSNS) {
|
|
|
|
pr_warning("Program too large (%zu insns), at most %d insns\n",
|
|
|
|
load_attr.insns_cnt, BPF_MAXINSNS);
|
2016-07-13 18:44:02 +08:00
|
|
|
ret = -LIBBPF_ERRNO__PROG2BIG;
|
2015-11-06 21:49:37 +08:00
|
|
|
} else {
|
2016-07-13 18:44:02 +08:00
|
|
|
/* Wrong program type? */
|
2018-03-31 06:08:01 +08:00
|
|
|
if (load_attr.prog_type != BPF_PROG_TYPE_KPROBE) {
|
2016-07-13 18:44:02 +08:00
|
|
|
int fd;
|
|
|
|
|
2018-03-31 06:08:01 +08:00
|
|
|
load_attr.prog_type = BPF_PROG_TYPE_KPROBE;
|
|
|
|
load_attr.expected_attach_type = 0;
|
|
|
|
fd = bpf_load_program_xattr(&load_attr, NULL, 0);
|
2016-07-13 18:44:02 +08:00
|
|
|
if (fd >= 0) {
|
|
|
|
close(fd);
|
|
|
|
ret = -LIBBPF_ERRNO__PROGTYPE;
|
|
|
|
goto out;
|
|
|
|
}
|
2015-11-06 21:49:37 +08:00
|
|
|
}
|
2016-07-13 18:44:02 +08:00
|
|
|
|
|
|
|
if (log_buf)
|
|
|
|
ret = -LIBBPF_ERRNO__KVER;
|
2015-07-01 10:14:07 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
free(log_buf);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-10-03 04:35:39 +08:00
|
|
|
int
|
2015-07-01 10:14:07 +08:00
|
|
|
bpf_program__load(struct bpf_program *prog,
|
2018-10-04 06:26:43 +08:00
|
|
|
char *license, __u32 kern_version)
|
2015-07-01 10:14:07 +08:00
|
|
|
{
|
2015-11-16 20:10:09 +08:00
|
|
|
int err = 0, fd, i;
|
2015-07-01 10:14:07 +08:00
|
|
|
|
2015-11-16 20:10:09 +08:00
|
|
|
if (prog->instances.nr < 0 || !prog->instances.fds) {
|
|
|
|
if (prog->preprocessor) {
|
|
|
|
pr_warning("Internal error: can't load program '%s'\n",
|
|
|
|
prog->section_name);
|
|
|
|
return -LIBBPF_ERRNO__INTERNAL;
|
|
|
|
}
|
2015-07-01 10:14:07 +08:00
|
|
|
|
2015-11-16 20:10:09 +08:00
|
|
|
prog->instances.fds = malloc(sizeof(int));
|
|
|
|
if (!prog->instances.fds) {
|
|
|
|
pr_warning("Not enough memory for BPF fds\n");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
prog->instances.nr = 1;
|
|
|
|
prog->instances.fds[0] = -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!prog->preprocessor) {
|
|
|
|
if (prog->instances.nr != 1) {
|
|
|
|
pr_warning("Program '%s' is inconsistent: nr(%d) != 1\n",
|
|
|
|
prog->section_name, prog->instances.nr);
|
|
|
|
}
|
2018-11-20 07:29:16 +08:00
|
|
|
err = load_program(prog, prog->insns, prog->insns_cnt,
|
2018-12-08 08:42:29 +08:00
|
|
|
license, kern_version, &fd);
|
2015-11-16 20:10:09 +08:00
|
|
|
if (!err)
|
|
|
|
prog->instances.fds[0] = fd;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0; i < prog->instances.nr; i++) {
|
|
|
|
struct bpf_prog_prep_result result;
|
|
|
|
bpf_program_prep_t preprocessor = prog->preprocessor;
|
|
|
|
|
2019-02-14 02:25:53 +08:00
|
|
|
memset(&result, 0, sizeof(result));
|
2015-11-16 20:10:09 +08:00
|
|
|
err = preprocessor(prog, i, prog->insns,
|
|
|
|
prog->insns_cnt, &result);
|
|
|
|
if (err) {
|
|
|
|
pr_warning("Preprocessing the %dth instance of program '%s' failed\n",
|
|
|
|
i, prog->section_name);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!result.new_insn_ptr || !result.new_insn_cnt) {
|
|
|
|
pr_debug("Skip loading the %dth instance of program '%s'\n",
|
|
|
|
i, prog->section_name);
|
|
|
|
prog->instances.fds[i] = -1;
|
|
|
|
if (result.pfd)
|
|
|
|
*result.pfd = -1;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2018-11-20 07:29:16 +08:00
|
|
|
err = load_program(prog, result.new_insn_ptr,
|
2015-11-16 20:10:09 +08:00
|
|
|
result.new_insn_cnt,
|
2018-12-08 08:42:29 +08:00
|
|
|
license, kern_version, &fd);
|
2015-11-16 20:10:09 +08:00
|
|
|
|
|
|
|
if (err) {
|
|
|
|
pr_warning("Loading the %dth instance of program '%s' failed\n",
|
|
|
|
i, prog->section_name);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (result.pfd)
|
|
|
|
*result.pfd = fd;
|
|
|
|
prog->instances.fds[i] = fd;
|
|
|
|
}
|
|
|
|
out:
|
2015-07-01 10:14:07 +08:00
|
|
|
if (err)
|
|
|
|
pr_warning("failed to load program '%s'\n",
|
|
|
|
prog->section_name);
|
|
|
|
zfree(&prog->insns);
|
|
|
|
prog->insns_cnt = 0;
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
static bool bpf_program__is_function_storage(const struct bpf_program *prog,
|
|
|
|
const struct bpf_object *obj)
|
2018-06-29 05:41:38 +08:00
|
|
|
{
|
|
|
|
return prog->idx == obj->efile.text_shndx && obj->has_pseudo_calls;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:07 +08:00
|
|
|
static int
|
2019-05-24 18:36:47 +08:00
|
|
|
bpf_object__load_progs(struct bpf_object *obj, int log_level)
|
2015-07-01 10:14:07 +08:00
|
|
|
{
|
|
|
|
size_t i;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
for (i = 0; i < obj->nr_programs; i++) {
|
2018-06-29 05:41:38 +08:00
|
|
|
if (bpf_program__is_function_storage(&obj->programs[i], obj))
|
2017-12-15 09:55:10 +08:00
|
|
|
continue;
|
2019-05-29 22:26:41 +08:00
|
|
|
obj->programs[i].log_level |= log_level;
|
2015-07-01 10:14:07 +08:00
|
|
|
err = bpf_program__load(&obj->programs[i],
|
|
|
|
obj->license,
|
|
|
|
obj->kern_version);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-05-11 01:24:42 +08:00
|
|
|
static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
|
|
|
|
{
|
|
|
|
switch (type) {
|
|
|
|
case BPF_PROG_TYPE_SOCKET_FILTER:
|
|
|
|
case BPF_PROG_TYPE_SCHED_CLS:
|
|
|
|
case BPF_PROG_TYPE_SCHED_ACT:
|
|
|
|
case BPF_PROG_TYPE_XDP:
|
|
|
|
case BPF_PROG_TYPE_CGROUP_SKB:
|
|
|
|
case BPF_PROG_TYPE_CGROUP_SOCK:
|
|
|
|
case BPF_PROG_TYPE_LWT_IN:
|
|
|
|
case BPF_PROG_TYPE_LWT_OUT:
|
|
|
|
case BPF_PROG_TYPE_LWT_XMIT:
|
ipv6: sr: Add seg6local action End.BPF
This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
does not have to take care of this.
Since the BPF program may not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track if
the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.
Performances profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed as invalid, the packet
is dropped.
This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.
The BPF program may return 3 types of return codes:
- BPF_OK: the End.BPF action will look up the next destination through
seg6_lookup_nexthop.
- BPF_REDIRECT: if an action has been executed through the
bpf_lwt_seg6_action helper, the BPF program should return this
value, as the skb's destination is already set and the default
lookup should not be performed.
- BPF_DROP : the packet will be dropped.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-20 21:58:16 +08:00
|
|
|
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
|
2018-05-11 01:24:42 +08:00
|
|
|
case BPF_PROG_TYPE_SOCK_OPS:
|
|
|
|
case BPF_PROG_TYPE_SK_SKB:
|
|
|
|
case BPF_PROG_TYPE_CGROUP_DEVICE:
|
|
|
|
case BPF_PROG_TYPE_SK_MSG:
|
|
|
|
case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
|
2018-05-27 19:24:10 +08:00
|
|
|
case BPF_PROG_TYPE_LIRC_MODE2:
|
2018-08-08 16:01:30 +08:00
|
|
|
case BPF_PROG_TYPE_SK_REUSEPORT:
|
2018-09-14 22:46:20 +08:00
|
|
|
case BPF_PROG_TYPE_FLOW_DISSECTOR:
|
2018-05-11 01:24:42 +08:00
|
|
|
case BPF_PROG_TYPE_UNSPEC:
|
|
|
|
case BPF_PROG_TYPE_TRACEPOINT:
|
|
|
|
case BPF_PROG_TYPE_RAW_TRACEPOINT:
|
2019-04-27 02:49:50 +08:00
|
|
|
case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
|
2018-11-24 04:58:12 +08:00
|
|
|
case BPF_PROG_TYPE_PERF_EVENT:
|
2019-03-09 01:15:26 +08:00
|
|
|
case BPF_PROG_TYPE_CGROUP_SYSCTL:
|
2019-06-28 04:38:49 +08:00
|
|
|
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
|
2018-11-24 04:58:12 +08:00
|
|
|
return false;
|
|
|
|
case BPF_PROG_TYPE_KPROBE:
|
2018-05-11 01:24:42 +08:00
|
|
|
default:
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
|
2015-07-01 10:13:57 +08:00
|
|
|
{
|
2018-05-11 01:24:42 +08:00
|
|
|
if (needs_kver && obj->kern_version == 0) {
|
2015-07-01 10:13:57 +08:00
|
|
|
pr_warning("%s doesn't provide kernel version\n",
|
|
|
|
obj->path);
|
2015-11-06 21:49:37 +08:00
|
|
|
return -LIBBPF_ERRNO__KVERSION;
|
2015-07-01 10:13:57 +08:00
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:13:53 +08:00
|
|
|
static struct bpf_object *
|
2018-05-11 01:24:42 +08:00
|
|
|
__bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
|
2018-10-16 02:19:55 +08:00
|
|
|
bool needs_kver, int flags)
|
2015-07-01 10:13:53 +08:00
|
|
|
{
|
|
|
|
struct bpf_object *obj;
|
2015-11-06 21:49:37 +08:00
|
|
|
int err;
|
2015-07-01 10:13:53 +08:00
|
|
|
|
|
|
|
if (elf_version(EV_CURRENT) == EV_NONE) {
|
|
|
|
pr_warning("failed to init libelf for %s\n", path);
|
2015-11-06 21:49:37 +08:00
|
|
|
return ERR_PTR(-LIBBPF_ERRNO__LIBELF);
|
2015-07-01 10:13:53 +08:00
|
|
|
}
|
|
|
|
|
2015-07-01 10:13:54 +08:00
|
|
|
obj = bpf_object__new(path, obj_buf, obj_buf_sz);
|
2015-11-06 21:49:37 +08:00
|
|
|
if (IS_ERR(obj))
|
|
|
|
return obj;
|
2015-07-01 10:13:53 +08:00
|
|
|
|
2015-11-06 21:49:37 +08:00
|
|
|
CHECK_ERR(bpf_object__elf_init(obj), err, out);
|
|
|
|
CHECK_ERR(bpf_object__check_endianness(obj), err, out);
|
2019-04-24 06:45:56 +08:00
|
|
|
CHECK_ERR(bpf_object__probe_caps(obj), err, out);
|
2018-10-16 02:19:55 +08:00
|
|
|
CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
|
2015-11-06 21:49:37 +08:00
|
|
|
CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
|
2018-05-11 01:24:42 +08:00
|
|
|
CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
|
2015-07-01 10:13:53 +08:00
|
|
|
|
|
|
|
bpf_object__elf_finish(obj);
|
|
|
|
return obj;
|
|
|
|
out:
|
|
|
|
bpf_object__close(obj);
|
2015-11-06 21:49:37 +08:00
|
|
|
return ERR_PTR(err);
|
2015-07-01 10:13:53 +08:00
|
|
|
}
|
|
|
|
|
2018-10-16 02:19:55 +08:00
|
|
|
struct bpf_object *__bpf_object__open_xattr(struct bpf_object_open_attr *attr,
|
|
|
|
int flags)
|
2015-07-01 10:13:53 +08:00
|
|
|
{
|
|
|
|
/* param validation */
|
2018-07-11 05:43:02 +08:00
|
|
|
if (!attr->file)
|
2015-07-01 10:13:53 +08:00
|
|
|
return NULL;
|
|
|
|
|
2018-07-11 05:43:02 +08:00
|
|
|
pr_debug("loading %s\n", attr->file);
|
|
|
|
|
|
|
|
return __bpf_object__open(attr->file, NULL, 0,
|
2018-10-16 02:19:55 +08:00
|
|
|
bpf_prog_type__needs_kver(attr->prog_type),
|
|
|
|
flags);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr)
|
|
|
|
{
|
|
|
|
return __bpf_object__open_xattr(attr, 0);
|
2018-07-11 05:43:02 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_object *bpf_object__open(const char *path)
|
|
|
|
{
|
|
|
|
struct bpf_object_open_attr attr = {
|
|
|
|
.file = path,
|
|
|
|
.prog_type = BPF_PROG_TYPE_UNSPEC,
|
|
|
|
};
|
2015-07-01 10:13:53 +08:00
|
|
|
|
2018-07-11 05:43:02 +08:00
|
|
|
return bpf_object__open_xattr(&attr);
|
2015-07-01 10:13:54 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_object *bpf_object__open_buffer(void *obj_buf,
|
2015-08-27 10:30:55 +08:00
|
|
|
size_t obj_buf_sz,
|
|
|
|
const char *name)
|
2015-07-01 10:13:54 +08:00
|
|
|
{
|
2015-08-27 10:30:55 +08:00
|
|
|
char tmp_name[64];
|
|
|
|
|
2015-07-01 10:13:54 +08:00
|
|
|
/* param validation */
|
|
|
|
if (!obj_buf || obj_buf_sz <= 0)
|
|
|
|
return NULL;
|
|
|
|
|
2015-08-27 10:30:55 +08:00
|
|
|
if (!name) {
|
|
|
|
snprintf(tmp_name, sizeof(tmp_name), "%lx-%lx",
|
|
|
|
(unsigned long)obj_buf,
|
|
|
|
(unsigned long)obj_buf_sz);
|
|
|
|
name = tmp_name;
|
|
|
|
}
|
2019-05-30 01:36:11 +08:00
|
|
|
pr_debug("loading object '%s' from buffer\n", name);
|
2015-07-01 10:13:54 +08:00
|
|
|
|
2018-10-16 02:19:55 +08:00
|
|
|
return __bpf_object__open(name, obj_buf, obj_buf_sz, true, true);
|
2015-07-01 10:13:53 +08:00
|
|
|
}
|
|
|
|
|
2015-07-01 10:14:04 +08:00
|
|
|
int bpf_object__unload(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
if (!obj)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2015-11-27 16:47:35 +08:00
|
|
|
for (i = 0; i < obj->nr_maps; i++)
|
|
|
|
zclose(obj->maps[i].fd);
|
2015-07-01 10:14:04 +08:00
|
|
|
|
2015-07-01 10:14:07 +08:00
|
|
|
for (i = 0; i < obj->nr_programs; i++)
|
|
|
|
bpf_program__unload(&obj->programs[i]);
|
|
|
|
|
2015-07-01 10:14:04 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-05-24 18:36:47 +08:00
|
|
|
int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
|
2015-07-01 10:14:04 +08:00
|
|
|
{
|
2019-05-24 18:36:47 +08:00
|
|
|
struct bpf_object *obj;
|
2015-11-06 21:49:37 +08:00
|
|
|
int err;
|
|
|
|
|
2019-05-24 18:36:47 +08:00
|
|
|
if (!attr)
|
|
|
|
return -EINVAL;
|
|
|
|
obj = attr->obj;
|
2015-07-01 10:14:04 +08:00
|
|
|
if (!obj)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (obj->loaded) {
|
|
|
|
pr_warning("object should not be loaded twice\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
obj->loaded = true;
|
2015-11-06 21:49:37 +08:00
|
|
|
|
|
|
|
CHECK_ERR(bpf_object__create_maps(obj), err, out);
|
|
|
|
CHECK_ERR(bpf_object__relocate(obj), err, out);
|
2019-05-24 18:36:47 +08:00
|
|
|
CHECK_ERR(bpf_object__load_progs(obj, attr->log_level), err, out);
|
2015-07-01 10:14:04 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
out:
|
|
|
|
bpf_object__unload(obj);
|
|
|
|
pr_warning("failed to load object '%s'\n", obj->path);
|
2015-11-06 21:49:37 +08:00
|
|
|
return err;
|
2015-07-01 10:14:04 +08:00
|
|
|
}
|
|
|
|
|
2019-05-24 18:36:47 +08:00
|
|
|
int bpf_object__load(struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
struct bpf_object_load_attr attr = {
|
|
|
|
.obj = obj,
|
|
|
|
};
|
|
|
|
|
|
|
|
return bpf_object__load_xattr(&attr);
|
|
|
|
}
|
|
|
|
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
static int check_path(const char *path)
|
|
|
|
{
|
2018-07-30 16:53:23 +08:00
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
struct statfs st_fs;
|
|
|
|
char *dname, *dir;
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
if (path == NULL)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
dname = strdup(path);
|
|
|
|
if (dname == NULL)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
dir = dirname(dname);
|
|
|
|
if (statfs(dir, &st_fs)) {
|
2018-10-04 06:26:41 +08:00
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
2018-07-30 16:53:23 +08:00
|
|
|
pr_warning("failed to statfs %s: %s\n", dir, cp);
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
err = -errno;
|
|
|
|
}
|
|
|
|
free(dname);
|
|
|
|
|
|
|
|
if (!err && st_fs.f_type != BPF_FS_MAGIC) {
|
|
|
|
pr_warning("specified path %s is not on BPF FS\n", path);
|
|
|
|
err = -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
|
|
|
|
int instance)
|
|
|
|
{
|
2018-07-30 16:53:23 +08:00
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
int err;
|
|
|
|
|
|
|
|
err = check_path(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (prog == NULL) {
|
|
|
|
pr_warning("invalid program pointer\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (instance < 0 || instance >= prog->instances.nr) {
|
|
|
|
pr_warning("invalid prog instance %d of prog %s (max %d)\n",
|
|
|
|
instance, prog->section_name, prog->instances.nr);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (bpf_obj_pin(prog->instances.fds[instance], path)) {
|
2018-10-04 06:26:41 +08:00
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
2018-07-30 16:53:23 +08:00
|
|
|
pr_warning("failed to pin program: %s\n", cp);
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
return -errno;
|
|
|
|
}
|
|
|
|
pr_debug("pinned program '%s'\n", path);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
int bpf_program__unpin_instance(struct bpf_program *prog, const char *path,
|
|
|
|
int instance)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
err = check_path(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (prog == NULL) {
|
|
|
|
pr_warning("invalid program pointer\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (instance < 0 || instance >= prog->instances.nr) {
|
|
|
|
pr_warning("invalid prog instance %d of prog %s (max %d)\n",
|
|
|
|
instance, prog->section_name, prog->instances.nr);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = unlink(path);
|
|
|
|
if (err != 0)
|
|
|
|
return -errno;
|
|
|
|
pr_debug("unpinned program '%s'\n", path);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
static int make_dir(const char *path)
|
|
|
|
{
|
2018-07-30 16:53:23 +08:00
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
if (mkdir(path, 0700) && errno != EEXIST)
|
|
|
|
err = -errno;
|
|
|
|
|
2018-07-30 16:53:23 +08:00
|
|
|
if (err) {
|
2018-10-04 06:26:41 +08:00
|
|
|
cp = libbpf_strerror_r(-err, errmsg, sizeof(errmsg));
|
2018-07-30 16:53:23 +08:00
|
|
|
pr_warning("failed to mkdir %s: %s\n", path, cp);
|
|
|
|
}
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_program__pin(struct bpf_program *prog, const char *path)
|
|
|
|
{
|
|
|
|
int i, err;
|
|
|
|
|
|
|
|
err = check_path(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (prog == NULL) {
|
|
|
|
pr_warning("invalid program pointer\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prog->instances.nr <= 0) {
|
|
|
|
pr_warning("no instances of prog %s to pin\n",
|
|
|
|
prog->section_name);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:42 +08:00
|
|
|
if (prog->instances.nr == 1) {
|
|
|
|
/* don't create subdirs when pinning single instance */
|
|
|
|
return bpf_program__pin_instance(prog, path, 0);
|
|
|
|
}
|
|
|
|
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
err = make_dir(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
for (i = 0; i < prog->instances.nr; i++) {
|
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%d", path, i);
|
|
|
|
if (len < 0) {
|
|
|
|
err = -EINVAL;
|
|
|
|
goto err_unpin;
|
|
|
|
} else if (len >= PATH_MAX) {
|
|
|
|
err = -ENAMETOOLONG;
|
|
|
|
goto err_unpin;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = bpf_program__pin_instance(prog, buf, i);
|
|
|
|
if (err)
|
|
|
|
goto err_unpin;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_unpin:
|
|
|
|
for (i = i - 1; i >= 0; i--) {
|
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%d", path, i);
|
|
|
|
if (len < 0)
|
|
|
|
continue;
|
|
|
|
else if (len >= PATH_MAX)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
bpf_program__unpin_instance(prog, buf, i);
|
|
|
|
}
|
|
|
|
|
|
|
|
rmdir(path);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_program__unpin(struct bpf_program *prog, const char *path)
|
|
|
|
{
|
|
|
|
int i, err;
|
|
|
|
|
|
|
|
err = check_path(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (prog == NULL) {
|
|
|
|
pr_warning("invalid program pointer\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (prog->instances.nr <= 0) {
|
|
|
|
pr_warning("no instances of prog %s to pin\n",
|
|
|
|
prog->section_name);
|
|
|
|
return -EINVAL;
|
2018-11-10 00:21:42 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (prog->instances.nr == 1) {
|
|
|
|
/* don't create subdirs when pinning single instance */
|
|
|
|
return bpf_program__unpin_instance(prog, path, 0);
|
2018-11-10 00:21:41 +08:00
|
|
|
}
|
|
|
|
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
for (i = 0; i < prog->instances.nr; i++) {
|
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%d", path, i);
|
|
|
|
if (len < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
else if (len >= PATH_MAX)
|
|
|
|
return -ENAMETOOLONG;
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
err = bpf_program__unpin_instance(prog, buf, i);
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
err = rmdir(path);
|
|
|
|
if (err)
|
|
|
|
return -errno;
|
|
|
|
|
tools lib bpf: Add BPF program pinning APIs
Add new APIs to pin a BPF program (or specific instances) to the
filesystem. The user can specify the path full path within a BPF
filesystem to pin the program.
bpf_program__pin_instance(prog, path, n) will pin the nth instance of
'prog' to the specified path.
bpf_program__pin(prog, path) will create the directory 'path' (if it
does not exist) and pin each instance within that directory. For
instance, path/0, path/1, path/2.
Committer notes:
- Add missing headers for mkdir()
- Check strdup() for failure
- Check snprintf >= size, not >, as == also means truncated, see 'man
snprintf', return value.
- Conditionally define BPF_FS_MAGIC, as it isn't in magic.h in older
systems and we're not yet having a tools/include/uapi/linux/magic.h
copy.
- Do not include linux/magic.h, not present in older distros.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170126212001.14103-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27 05:19:56 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-01-27 05:19:57 +08:00
|
|
|
int bpf_map__pin(struct bpf_map *map, const char *path)
|
|
|
|
{
|
2018-07-30 16:53:23 +08:00
|
|
|
char *cp, errmsg[STRERR_BUFSIZE];
|
2017-01-27 05:19:57 +08:00
|
|
|
int err;
|
|
|
|
|
|
|
|
err = check_path(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (map == NULL) {
|
|
|
|
pr_warning("invalid map pointer\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (bpf_obj_pin(map->fd, path)) {
|
2018-10-04 06:26:41 +08:00
|
|
|
cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
|
2018-07-30 16:53:23 +08:00
|
|
|
pr_warning("failed to pin map: %s\n", cp);
|
2017-01-27 05:19:57 +08:00
|
|
|
return -errno;
|
|
|
|
}
|
|
|
|
|
|
|
|
pr_debug("pinned map '%s'\n", path);
|
2018-11-10 00:21:41 +08:00
|
|
|
|
2017-01-27 05:19:57 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
int bpf_map__unpin(struct bpf_map *map, const char *path)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
err = check_path(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (map == NULL) {
|
|
|
|
pr_warning("invalid map pointer\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = unlink(path);
|
|
|
|
if (err != 0)
|
|
|
|
return -errno;
|
|
|
|
pr_debug("unpinned map '%s'\n", path);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
|
2017-01-27 05:19:58 +08:00
|
|
|
{
|
|
|
|
struct bpf_map *map;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
if (!obj->loaded) {
|
|
|
|
pr_warning("object not yet loaded; load it first\n");
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = make_dir(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
2019-02-28 11:04:12 +08:00
|
|
|
bpf_object__for_each_map(map, obj) {
|
2018-11-10 00:21:41 +08:00
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%s", path,
|
|
|
|
bpf_map__name(map));
|
|
|
|
if (len < 0) {
|
|
|
|
err = -EINVAL;
|
|
|
|
goto err_unpin_maps;
|
|
|
|
} else if (len >= PATH_MAX) {
|
|
|
|
err = -ENAMETOOLONG;
|
|
|
|
goto err_unpin_maps;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = bpf_map__pin(map, buf);
|
|
|
|
if (err)
|
|
|
|
goto err_unpin_maps;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_unpin_maps:
|
|
|
|
while ((map = bpf_map__prev(map, obj))) {
|
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%s", path,
|
|
|
|
bpf_map__name(map));
|
|
|
|
if (len < 0)
|
|
|
|
continue;
|
|
|
|
else if (len >= PATH_MAX)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
bpf_map__unpin(map, buf);
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_object__unpin_maps(struct bpf_object *obj, const char *path)
|
|
|
|
{
|
|
|
|
struct bpf_map *map;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
|
|
|
|
2019-02-28 11:04:12 +08:00
|
|
|
bpf_object__for_each_map(map, obj) {
|
2017-01-27 05:19:58 +08:00
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%s", path,
|
|
|
|
bpf_map__name(map));
|
|
|
|
if (len < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
else if (len >= PATH_MAX)
|
|
|
|
return -ENAMETOOLONG;
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
err = bpf_map__unpin(map, buf);
|
2017-01-27 05:19:58 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
|
|
|
|
{
|
|
|
|
struct bpf_program *prog;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
if (!obj->loaded) {
|
|
|
|
pr_warning("object not yet loaded; load it first\n");
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = make_dir(path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
bpf_object__for_each_program(prog, obj) {
|
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%s", path,
|
2018-11-10 00:21:43 +08:00
|
|
|
prog->pin_name);
|
2018-11-10 00:21:41 +08:00
|
|
|
if (len < 0) {
|
|
|
|
err = -EINVAL;
|
|
|
|
goto err_unpin_programs;
|
|
|
|
} else if (len >= PATH_MAX) {
|
|
|
|
err = -ENAMETOOLONG;
|
|
|
|
goto err_unpin_programs;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = bpf_program__pin(prog, buf);
|
|
|
|
if (err)
|
|
|
|
goto err_unpin_programs;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_unpin_programs:
|
|
|
|
while ((prog = bpf_program__prev(prog, obj))) {
|
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%s", path,
|
2018-11-10 00:21:43 +08:00
|
|
|
prog->pin_name);
|
2018-11-10 00:21:41 +08:00
|
|
|
if (len < 0)
|
|
|
|
continue;
|
|
|
|
else if (len >= PATH_MAX)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
bpf_program__unpin(prog, buf);
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_object__unpin_programs(struct bpf_object *obj, const char *path)
|
|
|
|
{
|
|
|
|
struct bpf_program *prog;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
|
|
|
|
2017-01-27 05:19:58 +08:00
|
|
|
bpf_object__for_each_program(prog, obj) {
|
|
|
|
char buf[PATH_MAX];
|
|
|
|
int len;
|
|
|
|
|
|
|
|
len = snprintf(buf, PATH_MAX, "%s/%s", path,
|
2018-11-10 00:21:43 +08:00
|
|
|
prog->pin_name);
|
2017-01-27 05:19:58 +08:00
|
|
|
if (len < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
else if (len >= PATH_MAX)
|
|
|
|
return -ENAMETOOLONG;
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
err = bpf_program__unpin(prog, buf);
|
2017-01-27 05:19:58 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
int bpf_object__pin(struct bpf_object *obj, const char *path)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
err = bpf_object__pin_maps(obj, path);
|
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
err = bpf_object__pin_programs(obj, path);
|
|
|
|
if (err) {
|
|
|
|
bpf_object__unpin_maps(obj, path);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-07-01 10:13:53 +08:00
|
|
|
void bpf_object__close(struct bpf_object *obj)
|
|
|
|
{
|
2015-07-01 10:14:00 +08:00
|
|
|
size_t i;
|
|
|
|
|
2015-07-01 10:13:53 +08:00
|
|
|
if (!obj)
|
|
|
|
return;
|
|
|
|
|
2016-11-26 15:03:26 +08:00
|
|
|
if (obj->clear_priv)
|
|
|
|
obj->clear_priv(obj, obj->priv);
|
|
|
|
|
2015-07-01 10:13:53 +08:00
|
|
|
bpf_object__elf_finish(obj);
|
2015-07-01 10:14:04 +08:00
|
|
|
bpf_object__unload(obj);
|
2018-04-19 06:56:05 +08:00
|
|
|
btf__free(obj->btf);
|
2018-11-20 07:29:16 +08:00
|
|
|
btf_ext__free(obj->btf_ext);
|
2015-07-01 10:13:53 +08:00
|
|
|
|
2015-11-27 16:47:35 +08:00
|
|
|
for (i = 0; i < obj->nr_maps; i++) {
|
2015-11-27 16:47:36 +08:00
|
|
|
zfree(&obj->maps[i].name);
|
2015-11-27 16:47:35 +08:00
|
|
|
if (obj->maps[i].clear_priv)
|
|
|
|
obj->maps[i].clear_priv(&obj->maps[i],
|
|
|
|
obj->maps[i].priv);
|
|
|
|
obj->maps[i].priv = NULL;
|
|
|
|
obj->maps[i].clear_priv = NULL;
|
|
|
|
}
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
|
|
|
|
zfree(&obj->sections.rodata);
|
|
|
|
zfree(&obj->sections.data);
|
2015-11-27 16:47:35 +08:00
|
|
|
zfree(&obj->maps);
|
|
|
|
obj->nr_maps = 0;
|
2015-07-01 10:14:00 +08:00
|
|
|
|
|
|
|
if (obj->programs && obj->nr_programs) {
|
|
|
|
for (i = 0; i < obj->nr_programs; i++)
|
|
|
|
bpf_program__exit(&obj->programs[i]);
|
|
|
|
}
|
|
|
|
zfree(&obj->programs);
|
|
|
|
|
2015-07-01 10:14:10 +08:00
|
|
|
list_del(&obj->list);
|
2015-07-01 10:13:53 +08:00
|
|
|
free(obj);
|
|
|
|
}
|
2015-07-01 10:14:08 +08:00
|
|
|
|
2015-07-01 10:14:10 +08:00
|
|
|
struct bpf_object *
|
|
|
|
bpf_object__next(struct bpf_object *prev)
|
|
|
|
{
|
|
|
|
struct bpf_object *next;
|
|
|
|
|
|
|
|
if (!prev)
|
|
|
|
next = list_first_entry(&bpf_objects_list,
|
|
|
|
struct bpf_object,
|
|
|
|
list);
|
|
|
|
else
|
|
|
|
next = list_next_entry(prev, list);
|
|
|
|
|
|
|
|
/* Empty list is noticed here so don't need checking on entry. */
|
|
|
|
if (&next->list == &bpf_objects_list)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
return next;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
const char *bpf_object__name(const struct bpf_object *obj)
|
2015-08-27 10:30:55 +08:00
|
|
|
{
|
2016-06-03 23:22:51 +08:00
|
|
|
return obj ? obj->path : ERR_PTR(-EINVAL);
|
2015-08-27 10:30:55 +08:00
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
unsigned int bpf_object__kversion(const struct bpf_object *obj)
|
2015-11-06 21:49:38 +08:00
|
|
|
{
|
2016-06-03 23:22:51 +08:00
|
|
|
return obj ? obj->kern_version : 0;
|
2015-11-06 21:49:38 +08:00
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
struct btf *bpf_object__btf(const struct bpf_object *obj)
|
2019-02-15 07:01:43 +08:00
|
|
|
{
|
|
|
|
return obj ? obj->btf : NULL;
|
|
|
|
}
|
|
|
|
|
2018-04-19 06:56:05 +08:00
|
|
|
int bpf_object__btf_fd(const struct bpf_object *obj)
|
|
|
|
{
|
|
|
|
return obj->btf ? btf__fd(obj->btf) : -1;
|
|
|
|
}
|
|
|
|
|
2016-11-26 15:03:26 +08:00
|
|
|
int bpf_object__set_priv(struct bpf_object *obj, void *priv,
|
|
|
|
bpf_object_clear_priv_t clear_priv)
|
|
|
|
{
|
|
|
|
if (obj->priv && obj->clear_priv)
|
|
|
|
obj->clear_priv(obj, obj->priv);
|
|
|
|
|
|
|
|
obj->priv = priv;
|
|
|
|
obj->clear_priv = clear_priv;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
void *bpf_object__priv(const struct bpf_object *obj)
|
2016-11-26 15:03:26 +08:00
|
|
|
{
|
|
|
|
return obj ? obj->priv : ERR_PTR(-EINVAL);
|
|
|
|
}
|
|
|
|
|
2018-06-29 05:41:39 +08:00
|
|
|
static struct bpf_program *
|
2019-06-18 06:48:58 +08:00
|
|
|
__bpf_program__iter(const struct bpf_program *p, const struct bpf_object *obj,
|
|
|
|
bool forward)
|
2015-07-01 10:14:08 +08:00
|
|
|
{
|
2018-11-13 07:44:53 +08:00
|
|
|
size_t nr_programs = obj->nr_programs;
|
2018-11-10 00:21:41 +08:00
|
|
|
ssize_t idx;
|
2015-07-01 10:14:08 +08:00
|
|
|
|
2018-11-13 07:44:53 +08:00
|
|
|
if (!nr_programs)
|
2015-07-01 10:14:08 +08:00
|
|
|
return NULL;
|
|
|
|
|
2018-11-13 07:44:53 +08:00
|
|
|
if (!p)
|
|
|
|
/* Iter from the beginning */
|
|
|
|
return forward ? &obj->programs[0] :
|
|
|
|
&obj->programs[nr_programs - 1];
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
if (p->obj != obj) {
|
2015-07-01 10:14:08 +08:00
|
|
|
pr_warning("error: program handler doesn't match object\n");
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2018-11-13 07:44:53 +08:00
|
|
|
idx = (p - obj->programs) + (forward ? 1 : -1);
|
2018-11-10 00:21:41 +08:00
|
|
|
if (idx >= obj->nr_programs || idx < 0)
|
2015-07-01 10:14:08 +08:00
|
|
|
return NULL;
|
|
|
|
return &obj->programs[idx];
|
|
|
|
}
|
|
|
|
|
2018-06-29 05:41:39 +08:00
|
|
|
struct bpf_program *
|
2019-06-18 06:48:58 +08:00
|
|
|
bpf_program__next(struct bpf_program *prev, const struct bpf_object *obj)
|
2018-06-29 05:41:39 +08:00
|
|
|
{
|
|
|
|
struct bpf_program *prog = prev;
|
|
|
|
|
|
|
|
do {
|
2018-11-13 07:44:53 +08:00
|
|
|
prog = __bpf_program__iter(prog, obj, true);
|
2018-11-10 00:21:41 +08:00
|
|
|
} while (prog && bpf_program__is_function_storage(prog, obj));
|
|
|
|
|
|
|
|
return prog;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_program *
|
2019-06-18 06:48:58 +08:00
|
|
|
bpf_program__prev(struct bpf_program *next, const struct bpf_object *obj)
|
2018-11-10 00:21:41 +08:00
|
|
|
{
|
|
|
|
struct bpf_program *prog = next;
|
|
|
|
|
|
|
|
do {
|
2018-11-13 07:44:53 +08:00
|
|
|
prog = __bpf_program__iter(prog, obj, false);
|
2018-06-29 05:41:39 +08:00
|
|
|
} while (prog && bpf_program__is_function_storage(prog, obj));
|
|
|
|
|
|
|
|
return prog;
|
|
|
|
}
|
|
|
|
|
2016-06-03 23:38:21 +08:00
|
|
|
int bpf_program__set_priv(struct bpf_program *prog, void *priv,
|
|
|
|
bpf_program_clear_priv_t clear_priv)
|
2015-07-01 10:14:08 +08:00
|
|
|
{
|
|
|
|
if (prog->priv && prog->clear_priv)
|
|
|
|
prog->clear_priv(prog, prog->priv);
|
|
|
|
|
|
|
|
prog->priv = priv;
|
|
|
|
prog->clear_priv = clear_priv;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
void *bpf_program__priv(const struct bpf_program *prog)
|
2015-07-01 10:14:08 +08:00
|
|
|
{
|
2016-06-03 23:36:39 +08:00
|
|
|
return prog ? prog->priv : ERR_PTR(-EINVAL);
|
2015-07-01 10:14:08 +08:00
|
|
|
}
|
|
|
|
|
2018-06-29 05:41:37 +08:00
|
|
|
void bpf_program__set_ifindex(struct bpf_program *prog, __u32 ifindex)
|
|
|
|
{
|
|
|
|
prog->prog_ifindex = ifindex;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
const char *bpf_program__title(const struct bpf_program *prog, bool needs_copy)
|
2015-07-01 10:14:08 +08:00
|
|
|
{
|
|
|
|
const char *title;
|
|
|
|
|
|
|
|
title = prog->section_name;
|
2015-11-03 19:21:05 +08:00
|
|
|
if (needs_copy) {
|
2015-07-01 10:14:08 +08:00
|
|
|
title = strdup(title);
|
|
|
|
if (!title) {
|
|
|
|
pr_warning("failed to strdup program title\n");
|
2015-11-06 21:49:37 +08:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2015-07-01 10:14:08 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return title;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
int bpf_program__fd(const struct bpf_program *prog)
|
2015-07-01 10:14:08 +08:00
|
|
|
{
|
2015-11-16 20:10:09 +08:00
|
|
|
return bpf_program__nth_fd(prog, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_program__set_prep(struct bpf_program *prog, int nr_instances,
|
|
|
|
bpf_program_prep_t prep)
|
|
|
|
{
|
|
|
|
int *instances_fds;
|
|
|
|
|
|
|
|
if (nr_instances <= 0 || !prep)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (prog->instances.nr > 0 || prog->instances.fds) {
|
|
|
|
pr_warning("Can't set pre-processor after loading\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
instances_fds = malloc(sizeof(int) * nr_instances);
|
|
|
|
if (!instances_fds) {
|
|
|
|
pr_warning("alloc memory failed for fds\n");
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* fill all fd with -1 */
|
|
|
|
memset(instances_fds, -1, sizeof(int) * nr_instances);
|
|
|
|
|
|
|
|
prog->instances.nr = nr_instances;
|
|
|
|
prog->instances.fds = instances_fds;
|
|
|
|
prog->preprocessor = prep;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
int bpf_program__nth_fd(const struct bpf_program *prog, int n)
|
2015-11-16 20:10:09 +08:00
|
|
|
{
|
|
|
|
int fd;
|
|
|
|
|
2018-07-27 05:32:18 +08:00
|
|
|
if (!prog)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2015-11-16 20:10:09 +08:00
|
|
|
if (n >= prog->instances.nr || n < 0) {
|
|
|
|
pr_warning("Can't get the %dth fd from program %s: only %d instances\n",
|
|
|
|
n, prog->section_name, prog->instances.nr);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
fd = prog->instances.fds[n];
|
|
|
|
if (fd < 0) {
|
|
|
|
pr_warning("%dth instance of program '%s' is invalid\n",
|
|
|
|
n, prog->section_name);
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
|
|
|
|
return fd;
|
2015-07-01 10:14:08 +08:00
|
|
|
}
|
2015-11-27 16:47:35 +08:00
|
|
|
|
2017-03-31 12:45:40 +08:00
|
|
|
void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type)
|
2016-07-13 18:44:01 +08:00
|
|
|
{
|
|
|
|
prog->type = type;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
static bool bpf_program__is_type(const struct bpf_program *prog,
|
2016-07-13 18:44:01 +08:00
|
|
|
enum bpf_prog_type type)
|
|
|
|
{
|
|
|
|
return prog ? (prog->type == type) : false;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
#define BPF_PROG_TYPE_FNS(NAME, TYPE) \
|
|
|
|
int bpf_program__set_##NAME(struct bpf_program *prog) \
|
|
|
|
{ \
|
|
|
|
if (!prog) \
|
|
|
|
return -EINVAL; \
|
|
|
|
bpf_program__set_type(prog, TYPE); \
|
|
|
|
return 0; \
|
|
|
|
} \
|
|
|
|
\
|
|
|
|
bool bpf_program__is_##NAME(const struct bpf_program *prog) \
|
|
|
|
{ \
|
|
|
|
return bpf_program__is_type(prog, TYPE); \
|
|
|
|
} \
|
2017-01-23 09:11:23 +08:00
|
|
|
|
2017-01-23 09:11:24 +08:00
|
|
|
BPF_PROG_TYPE_FNS(socket_filter, BPF_PROG_TYPE_SOCKET_FILTER);
|
2017-01-23 09:11:23 +08:00
|
|
|
BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE);
|
2017-01-23 09:11:24 +08:00
|
|
|
BPF_PROG_TYPE_FNS(sched_cls, BPF_PROG_TYPE_SCHED_CLS);
|
|
|
|
BPF_PROG_TYPE_FNS(sched_act, BPF_PROG_TYPE_SCHED_ACT);
|
2017-01-23 09:11:23 +08:00
|
|
|
BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
|
2018-04-18 01:28:46 +08:00
|
|
|
BPF_PROG_TYPE_FNS(raw_tracepoint, BPF_PROG_TYPE_RAW_TRACEPOINT);
|
2017-01-23 09:11:24 +08:00
|
|
|
BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
|
|
|
|
BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
|
2016-07-13 18:44:01 +08:00
|
|
|
|
2018-04-24 05:30:38 +08:00
|
|
|
void bpf_program__set_expected_attach_type(struct bpf_program *prog,
|
|
|
|
enum bpf_attach_type type)
|
2018-03-31 06:08:01 +08:00
|
|
|
{
|
|
|
|
prog->expected_attach_type = type;
|
|
|
|
}
|
|
|
|
|
2018-11-01 03:57:18 +08:00
|
|
|
#define BPF_PROG_SEC_IMPL(string, ptype, eatype, is_attachable, atype) \
|
|
|
|
{ string, sizeof(string) - 1, ptype, eatype, is_attachable, atype }
|
2018-03-31 06:08:01 +08:00
|
|
|
|
2018-09-27 06:24:53 +08:00
|
|
|
/* Programs that can NOT be attached. */
|
2018-11-01 03:57:18 +08:00
|
|
|
#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_IMPL(string, ptype, 0, 0, 0)
|
2018-03-31 06:08:01 +08:00
|
|
|
|
2018-09-27 06:24:53 +08:00
|
|
|
/* Programs that can be attached. */
|
|
|
|
#define BPF_APROG_SEC(string, ptype, atype) \
|
2018-11-01 03:57:18 +08:00
|
|
|
BPF_PROG_SEC_IMPL(string, ptype, 0, 1, atype)
|
2018-04-18 01:28:45 +08:00
|
|
|
|
2018-09-27 06:24:53 +08:00
|
|
|
/* Programs that must specify expected attach type at load time. */
|
|
|
|
#define BPF_EAPROG_SEC(string, ptype, eatype) \
|
2018-11-01 03:57:18 +08:00
|
|
|
BPF_PROG_SEC_IMPL(string, ptype, eatype, 1, eatype)
|
2018-09-27 06:24:53 +08:00
|
|
|
|
|
|
|
/* Programs that can be attached but attach type can't be identified by section
|
|
|
|
* name. Kept for backward compatibility.
|
|
|
|
*/
|
|
|
|
#define BPF_APROG_COMPAT(string, ptype) BPF_PROG_SEC(string, ptype)
|
selftests/bpf: Selftest for sys_bind hooks
Add selftest to work with bpf_sock_addr context from
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` programs.
Try to bind(2) on IP:port and apply:
* loads to make sure context can be read correctly, including narrow
loads (byte, half) for IP and full-size loads (word) for all fields;
* stores to those fields allowed by verifier.
All combination from IPv4/IPv6 and TCP/UDP are tested.
Both scenarios are tested:
* valid programs can be loaded and attached;
* invalid programs can be neither loaded nor attached.
Test passes when expected data can be read from context in the
BPF-program, and after the call to bind(2) socket is bound to IP:port
pair that was written by BPF-program to the context.
Example:
# ./test_sock_addr
Attached bind4 program.
Test case #1 (IPv4/TCP):
Requested: bind(192.168.1.254, 4040) ..
Actual: bind(127.0.0.1, 4444)
Test case #2 (IPv4/UDP):
Requested: bind(192.168.1.254, 4040) ..
Actual: bind(127.0.0.1, 4444)
Attached bind6 program.
Test case #3 (IPv6/TCP):
Requested: bind(face:b00c:1234:5678::abcd, 6060) ..
Actual: bind(::1, 6666)
Test case #4 (IPv6/UDP):
Requested: bind(face:b00c:1234:5678::abcd, 6060) ..
Actual: bind(::1, 6666)
### SUCCESS
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 06:08:03 +08:00
|
|
|
|
2017-12-13 23:18:51 +08:00
|
|
|
static const struct {
|
|
|
|
const char *sec;
|
|
|
|
size_t len;
|
|
|
|
enum bpf_prog_type prog_type;
|
2018-03-31 06:08:01 +08:00
|
|
|
enum bpf_attach_type expected_attach_type;
|
2018-11-01 03:57:18 +08:00
|
|
|
int is_attachable;
|
2018-09-27 06:24:53 +08:00
|
|
|
enum bpf_attach_type attach_type;
|
2017-12-13 23:18:51 +08:00
|
|
|
} section_names[] = {
|
2018-09-27 06:24:53 +08:00
|
|
|
BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER),
|
|
|
|
BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE),
|
|
|
|
BPF_PROG_SEC("kretprobe/", BPF_PROG_TYPE_KPROBE),
|
|
|
|
BPF_PROG_SEC("classifier", BPF_PROG_TYPE_SCHED_CLS),
|
|
|
|
BPF_PROG_SEC("action", BPF_PROG_TYPE_SCHED_ACT),
|
|
|
|
BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT),
|
|
|
|
BPF_PROG_SEC("raw_tracepoint/", BPF_PROG_TYPE_RAW_TRACEPOINT),
|
|
|
|
BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
|
|
|
|
BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT),
|
|
|
|
BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN),
|
|
|
|
BPF_PROG_SEC("lwt_out", BPF_PROG_TYPE_LWT_OUT),
|
|
|
|
BPF_PROG_SEC("lwt_xmit", BPF_PROG_TYPE_LWT_XMIT),
|
|
|
|
BPF_PROG_SEC("lwt_seg6local", BPF_PROG_TYPE_LWT_SEG6LOCAL),
|
2018-09-27 06:24:54 +08:00
|
|
|
BPF_APROG_SEC("cgroup_skb/ingress", BPF_PROG_TYPE_CGROUP_SKB,
|
|
|
|
BPF_CGROUP_INET_INGRESS),
|
|
|
|
BPF_APROG_SEC("cgroup_skb/egress", BPF_PROG_TYPE_CGROUP_SKB,
|
|
|
|
BPF_CGROUP_INET_EGRESS),
|
2018-09-27 06:24:53 +08:00
|
|
|
BPF_APROG_COMPAT("cgroup/skb", BPF_PROG_TYPE_CGROUP_SKB),
|
|
|
|
BPF_APROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK,
|
|
|
|
BPF_CGROUP_INET_SOCK_CREATE),
|
|
|
|
BPF_EAPROG_SEC("cgroup/post_bind4", BPF_PROG_TYPE_CGROUP_SOCK,
|
|
|
|
BPF_CGROUP_INET4_POST_BIND),
|
|
|
|
BPF_EAPROG_SEC("cgroup/post_bind6", BPF_PROG_TYPE_CGROUP_SOCK,
|
|
|
|
BPF_CGROUP_INET6_POST_BIND),
|
|
|
|
BPF_APROG_SEC("cgroup/dev", BPF_PROG_TYPE_CGROUP_DEVICE,
|
|
|
|
BPF_CGROUP_DEVICE),
|
|
|
|
BPF_APROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS,
|
|
|
|
BPF_CGROUP_SOCK_OPS),
|
2018-09-27 06:24:55 +08:00
|
|
|
BPF_APROG_SEC("sk_skb/stream_parser", BPF_PROG_TYPE_SK_SKB,
|
|
|
|
BPF_SK_SKB_STREAM_PARSER),
|
|
|
|
BPF_APROG_SEC("sk_skb/stream_verdict", BPF_PROG_TYPE_SK_SKB,
|
|
|
|
BPF_SK_SKB_STREAM_VERDICT),
|
2018-09-27 06:24:53 +08:00
|
|
|
BPF_APROG_COMPAT("sk_skb", BPF_PROG_TYPE_SK_SKB),
|
|
|
|
BPF_APROG_SEC("sk_msg", BPF_PROG_TYPE_SK_MSG,
|
|
|
|
BPF_SK_MSG_VERDICT),
|
|
|
|
BPF_APROG_SEC("lirc_mode2", BPF_PROG_TYPE_LIRC_MODE2,
|
|
|
|
BPF_LIRC_MODE2),
|
|
|
|
BPF_APROG_SEC("flow_dissector", BPF_PROG_TYPE_FLOW_DISSECTOR,
|
|
|
|
BPF_FLOW_DISSECTOR),
|
|
|
|
BPF_EAPROG_SEC("cgroup/bind4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
|
|
|
|
BPF_CGROUP_INET4_BIND),
|
|
|
|
BPF_EAPROG_SEC("cgroup/bind6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
|
|
|
|
BPF_CGROUP_INET6_BIND),
|
|
|
|
BPF_EAPROG_SEC("cgroup/connect4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
|
|
|
|
BPF_CGROUP_INET4_CONNECT),
|
|
|
|
BPF_EAPROG_SEC("cgroup/connect6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
|
|
|
|
BPF_CGROUP_INET6_CONNECT),
|
|
|
|
BPF_EAPROG_SEC("cgroup/sendmsg4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
|
|
|
|
BPF_CGROUP_UDP4_SENDMSG),
|
|
|
|
BPF_EAPROG_SEC("cgroup/sendmsg6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
|
|
|
|
BPF_CGROUP_UDP6_SENDMSG),
|
2019-06-07 07:48:59 +08:00
|
|
|
BPF_EAPROG_SEC("cgroup/recvmsg4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
|
|
|
|
BPF_CGROUP_UDP4_RECVMSG),
|
|
|
|
BPF_EAPROG_SEC("cgroup/recvmsg6", BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
|
|
|
|
BPF_CGROUP_UDP6_RECVMSG),
|
2019-03-09 01:15:26 +08:00
|
|
|
BPF_EAPROG_SEC("cgroup/sysctl", BPF_PROG_TYPE_CGROUP_SYSCTL,
|
|
|
|
BPF_CGROUP_SYSCTL),
|
2019-06-28 04:38:49 +08:00
|
|
|
BPF_EAPROG_SEC("cgroup/getsockopt", BPF_PROG_TYPE_CGROUP_SOCKOPT,
|
|
|
|
BPF_CGROUP_GETSOCKOPT),
|
|
|
|
BPF_EAPROG_SEC("cgroup/setsockopt", BPF_PROG_TYPE_CGROUP_SOCKOPT,
|
|
|
|
BPF_CGROUP_SETSOCKOPT),
|
2017-12-13 23:18:51 +08:00
|
|
|
};
|
2018-03-31 06:08:01 +08:00
|
|
|
|
2018-09-27 06:24:53 +08:00
|
|
|
#undef BPF_PROG_SEC_IMPL
|
2017-12-13 23:18:51 +08:00
|
|
|
#undef BPF_PROG_SEC
|
2018-09-27 06:24:53 +08:00
|
|
|
#undef BPF_APROG_SEC
|
|
|
|
#undef BPF_EAPROG_SEC
|
|
|
|
#undef BPF_APROG_COMPAT
|
2017-12-13 23:18:51 +08:00
|
|
|
|
2019-01-21 21:06:38 +08:00
|
|
|
#define MAX_TYPE_NAME_SIZE 32
|
|
|
|
|
|
|
|
static char *libbpf_get_type_names(bool attach_type)
|
|
|
|
{
|
|
|
|
int i, len = ARRAY_SIZE(section_names) * MAX_TYPE_NAME_SIZE;
|
|
|
|
char *buf;
|
|
|
|
|
|
|
|
buf = malloc(len);
|
|
|
|
if (!buf)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
buf[0] = '\0';
|
|
|
|
/* Forge string buf with all available names */
|
|
|
|
for (i = 0; i < ARRAY_SIZE(section_names); i++) {
|
|
|
|
if (attach_type && !section_names[i].is_attachable)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (strlen(buf) + strlen(section_names[i].sec) + 2 > len) {
|
|
|
|
free(buf);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
strcat(buf, " ");
|
|
|
|
strcat(buf, section_names[i].sec);
|
|
|
|
}
|
|
|
|
|
|
|
|
return buf;
|
|
|
|
}
|
|
|
|
|
2018-07-11 05:42:59 +08:00
|
|
|
int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
|
|
|
|
enum bpf_attach_type *expected_attach_type)
|
2017-12-13 23:18:51 +08:00
|
|
|
{
|
2019-01-21 21:06:38 +08:00
|
|
|
char *type_names;
|
2017-12-13 23:18:51 +08:00
|
|
|
int i;
|
|
|
|
|
2018-07-11 05:42:59 +08:00
|
|
|
if (!name)
|
|
|
|
return -EINVAL;
|
2017-12-13 23:18:51 +08:00
|
|
|
|
2018-07-11 05:42:59 +08:00
|
|
|
for (i = 0; i < ARRAY_SIZE(section_names); i++) {
|
|
|
|
if (strncmp(name, section_names[i].sec, section_names[i].len))
|
|
|
|
continue;
|
|
|
|
*prog_type = section_names[i].prog_type;
|
|
|
|
*expected_attach_type = section_names[i].expected_attach_type;
|
|
|
|
return 0;
|
|
|
|
}
|
2019-01-21 21:06:38 +08:00
|
|
|
pr_warning("failed to guess program type based on ELF section name '%s'\n", name);
|
|
|
|
type_names = libbpf_get_type_names(false);
|
|
|
|
if (type_names != NULL) {
|
|
|
|
pr_info("supported section(type) names are:%s\n", type_names);
|
|
|
|
free(type_names);
|
|
|
|
}
|
|
|
|
|
2018-07-11 05:42:59 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2017-12-13 23:18:51 +08:00
|
|
|
|
2018-09-27 06:24:53 +08:00
|
|
|
int libbpf_attach_type_by_name(const char *name,
|
|
|
|
enum bpf_attach_type *attach_type)
|
|
|
|
{
|
2019-01-21 21:06:38 +08:00
|
|
|
char *type_names;
|
2018-09-27 06:24:53 +08:00
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!name)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
for (i = 0; i < ARRAY_SIZE(section_names); i++) {
|
|
|
|
if (strncmp(name, section_names[i].sec, section_names[i].len))
|
|
|
|
continue;
|
2018-11-01 03:57:18 +08:00
|
|
|
if (!section_names[i].is_attachable)
|
2018-09-27 06:24:53 +08:00
|
|
|
return -EINVAL;
|
|
|
|
*attach_type = section_names[i].attach_type;
|
|
|
|
return 0;
|
|
|
|
}
|
2019-01-21 21:06:38 +08:00
|
|
|
pr_warning("failed to guess attach type based on ELF section name '%s'\n", name);
|
|
|
|
type_names = libbpf_get_type_names(true);
|
|
|
|
if (type_names != NULL) {
|
|
|
|
pr_info("attachable section(type) names are:%s\n", type_names);
|
|
|
|
free(type_names);
|
|
|
|
}
|
|
|
|
|
2018-09-27 06:24:53 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2018-07-11 05:42:59 +08:00
|
|
|
static int
|
|
|
|
bpf_program__identify_section(struct bpf_program *prog,
|
|
|
|
enum bpf_prog_type *prog_type,
|
|
|
|
enum bpf_attach_type *expected_attach_type)
|
|
|
|
{
|
|
|
|
return libbpf_prog_type_by_name(prog->section_name, prog_type,
|
|
|
|
expected_attach_type);
|
2017-12-13 23:18:51 +08:00
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
int bpf_map__fd(const struct bpf_map *map)
|
2015-11-27 16:47:35 +08:00
|
|
|
{
|
2016-06-03 23:15:52 +08:00
|
|
|
return map ? map->fd : -EINVAL;
|
2015-11-27 16:47:35 +08:00
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
const struct bpf_map_def *bpf_map__def(const struct bpf_map *map)
|
2015-11-27 16:47:35 +08:00
|
|
|
{
|
2016-06-03 01:21:06 +08:00
|
|
|
return map ? &map->def : ERR_PTR(-EINVAL);
|
2015-11-27 16:47:35 +08:00
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
const char *bpf_map__name(const struct bpf_map *map)
|
2015-11-27 16:47:36 +08:00
|
|
|
{
|
2016-06-02 22:02:05 +08:00
|
|
|
return map ? map->name : NULL;
|
2015-11-27 16:47:36 +08:00
|
|
|
}
|
|
|
|
|
2018-07-24 23:40:21 +08:00
|
|
|
__u32 bpf_map__btf_key_type_id(const struct bpf_map *map)
|
2018-04-19 06:56:05 +08:00
|
|
|
{
|
2018-05-23 06:04:24 +08:00
|
|
|
return map ? map->btf_key_type_id : 0;
|
2018-04-19 06:56:05 +08:00
|
|
|
}
|
|
|
|
|
2018-07-24 23:40:21 +08:00
|
|
|
__u32 bpf_map__btf_value_type_id(const struct bpf_map *map)
|
2018-04-19 06:56:05 +08:00
|
|
|
{
|
2018-05-23 06:04:24 +08:00
|
|
|
return map ? map->btf_value_type_id : 0;
|
2018-04-19 06:56:05 +08:00
|
|
|
}
|
|
|
|
|
2016-06-03 23:38:21 +08:00
|
|
|
int bpf_map__set_priv(struct bpf_map *map, void *priv,
|
|
|
|
bpf_map_clear_priv_t clear_priv)
|
2015-11-27 16:47:35 +08:00
|
|
|
{
|
|
|
|
if (!map)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (map->priv) {
|
|
|
|
if (map->clear_priv)
|
|
|
|
map->clear_priv(map, map->priv);
|
|
|
|
}
|
|
|
|
|
|
|
|
map->priv = priv;
|
|
|
|
map->clear_priv = clear_priv;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
void *bpf_map__priv(const struct bpf_map *map)
|
2015-11-27 16:47:35 +08:00
|
|
|
{
|
2016-06-02 21:51:59 +08:00
|
|
|
return map ? map->priv : ERR_PTR(-EINVAL);
|
2015-11-27 16:47:35 +08:00
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
bool bpf_map__is_offload_neutral(const struct bpf_map *map)
|
2018-07-11 05:43:01 +08:00
|
|
|
{
|
|
|
|
return map->def.type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
|
|
|
|
}
|
|
|
|
|
2019-06-18 06:48:58 +08:00
|
|
|
bool bpf_map__is_internal(const struct bpf_map *map)
|
bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-10 05:20:13 +08:00
|
|
|
{
|
|
|
|
return map->libbpf_type != LIBBPF_MAP_UNSPEC;
|
|
|
|
}
|
|
|
|
|
2018-06-29 05:41:37 +08:00
|
|
|
void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
|
|
|
|
{
|
|
|
|
map->map_ifindex = ifindex;
|
|
|
|
}
|
|
|
|
|
2018-11-21 12:55:56 +08:00
|
|
|
int bpf_map__set_inner_map_fd(struct bpf_map *map, int fd)
|
|
|
|
{
|
|
|
|
if (!bpf_map_type__is_map_in_map(map->def.type)) {
|
|
|
|
pr_warning("error: unsupported map type\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
if (map->inner_map_fd != -1) {
|
|
|
|
pr_warning("error: inner_map_fd already specified\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
map->inner_map_fd = fd;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
static struct bpf_map *
|
2019-06-18 06:48:58 +08:00
|
|
|
__bpf_map__iter(const struct bpf_map *m, const struct bpf_object *obj, int i)
|
2015-11-27 16:47:35 +08:00
|
|
|
{
|
2018-11-10 00:21:41 +08:00
|
|
|
ssize_t idx;
|
2015-11-27 16:47:35 +08:00
|
|
|
struct bpf_map *s, *e;
|
|
|
|
|
|
|
|
if (!obj || !obj->maps)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
s = obj->maps;
|
|
|
|
e = obj->maps + obj->nr_maps;
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
if ((m < s) || (m >= e)) {
|
2015-11-27 16:47:35 +08:00
|
|
|
pr_warning("error in %s: map handler doesn't belong to object\n",
|
|
|
|
__func__);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
idx = (m - obj->maps) + i;
|
|
|
|
if (idx >= obj->nr_maps || idx < 0)
|
2015-11-27 16:47:35 +08:00
|
|
|
return NULL;
|
|
|
|
return &obj->maps[idx];
|
|
|
|
}
|
2015-11-27 16:47:36 +08:00
|
|
|
|
2018-11-10 00:21:41 +08:00
|
|
|
struct bpf_map *
|
2019-06-18 06:48:58 +08:00
|
|
|
bpf_map__next(const struct bpf_map *prev, const struct bpf_object *obj)
|
2018-11-10 00:21:41 +08:00
|
|
|
{
|
|
|
|
if (prev == NULL)
|
|
|
|
return obj->maps;
|
|
|
|
|
|
|
|
return __bpf_map__iter(prev, obj, 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_map *
|
2019-06-18 06:48:58 +08:00
|
|
|
bpf_map__prev(const struct bpf_map *next, const struct bpf_object *obj)
|
2018-11-10 00:21:41 +08:00
|
|
|
{
|
|
|
|
if (next == NULL) {
|
|
|
|
if (!obj->nr_maps)
|
|
|
|
return NULL;
|
|
|
|
return obj->maps + obj->nr_maps - 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
return __bpf_map__iter(next, obj, -1);
|
|
|
|
}
|
|
|
|
|
2015-11-27 16:47:36 +08:00
|
|
|
struct bpf_map *
|
2019-06-18 06:48:58 +08:00
|
|
|
bpf_object__find_map_by_name(const struct bpf_object *obj, const char *name)
|
2015-11-27 16:47:36 +08:00
|
|
|
{
|
|
|
|
struct bpf_map *pos;
|
|
|
|
|
2019-02-28 11:04:12 +08:00
|
|
|
bpf_object__for_each_map(pos, obj) {
|
2015-12-08 10:25:29 +08:00
|
|
|
if (pos->name && !strcmp(pos->name, name))
|
2015-11-27 16:47:36 +08:00
|
|
|
return pos;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
2016-11-26 15:03:27 +08:00
|
|
|
|
2019-02-02 05:42:23 +08:00
|
|
|
int
|
2019-06-18 06:48:58 +08:00
|
|
|
bpf_object__find_map_fd_by_name(const struct bpf_object *obj, const char *name)
|
2019-02-02 05:42:23 +08:00
|
|
|
{
|
|
|
|
return bpf_map__fd(bpf_object__find_map_by_name(obj, name));
|
|
|
|
}
|
|
|
|
|
2016-11-26 15:03:27 +08:00
|
|
|
struct bpf_map *
|
|
|
|
bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset)
|
|
|
|
{
|
2019-06-18 03:26:54 +08:00
|
|
|
return ERR_PTR(-ENOTSUP);
|
2016-11-26 15:03:27 +08:00
|
|
|
}
|
2017-01-23 09:11:25 +08:00
|
|
|
|
|
|
|
long libbpf_get_error(const void *ptr)
|
|
|
|
{
|
2019-05-25 17:02:57 +08:00
|
|
|
return PTR_ERR_OR_ZERO(ptr);
|
2017-01-23 09:11:25 +08:00
|
|
|
}
|
2017-08-16 13:34:22 +08:00
|
|
|
|
|
|
|
int bpf_prog_load(const char *file, enum bpf_prog_type type,
|
|
|
|
struct bpf_object **pobj, int *prog_fd)
|
2018-03-31 06:08:01 +08:00
|
|
|
{
|
|
|
|
struct bpf_prog_load_attr attr;
|
|
|
|
|
|
|
|
memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
|
|
|
|
attr.file = file;
|
|
|
|
attr.prog_type = type;
|
|
|
|
attr.expected_attach_type = 0;
|
|
|
|
|
|
|
|
return bpf_prog_load_xattr(&attr, pobj, prog_fd);
|
|
|
|
}
|
|
|
|
|
|
|
|
int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
|
|
|
|
struct bpf_object **pobj, int *prog_fd)
|
2017-08-16 13:34:22 +08:00
|
|
|
{
|
2019-07-02 18:25:31 +08:00
|
|
|
struct bpf_object_open_attr open_attr = {};
|
2017-12-15 09:55:10 +08:00
|
|
|
struct bpf_program *prog, *first_prog = NULL;
|
2018-03-31 06:08:01 +08:00
|
|
|
enum bpf_attach_type expected_attach_type;
|
|
|
|
enum bpf_prog_type prog_type;
|
2017-08-16 13:34:22 +08:00
|
|
|
struct bpf_object *obj;
|
2018-05-17 05:02:49 +08:00
|
|
|
struct bpf_map *map;
|
2017-08-16 13:34:22 +08:00
|
|
|
int err;
|
|
|
|
|
2018-03-31 06:08:01 +08:00
|
|
|
if (!attr)
|
|
|
|
return -EINVAL;
|
2018-05-11 01:24:42 +08:00
|
|
|
if (!attr->file)
|
|
|
|
return -EINVAL;
|
2018-03-31 06:08:01 +08:00
|
|
|
|
2019-07-02 18:25:31 +08:00
|
|
|
open_attr.file = attr->file;
|
|
|
|
open_attr.prog_type = attr->prog_type;
|
|
|
|
|
2018-07-11 05:43:02 +08:00
|
|
|
obj = bpf_object__open_xattr(&open_attr);
|
2018-05-11 01:09:34 +08:00
|
|
|
if (IS_ERR_OR_NULL(obj))
|
2017-08-16 13:34:22 +08:00
|
|
|
return -ENOENT;
|
|
|
|
|
2017-12-15 09:55:10 +08:00
|
|
|
bpf_object__for_each_program(prog, obj) {
|
|
|
|
/*
|
|
|
|
* If type is not specified, try to guess it based on
|
|
|
|
* section name.
|
|
|
|
*/
|
2018-03-31 06:08:01 +08:00
|
|
|
prog_type = attr->prog_type;
|
2018-05-17 05:02:49 +08:00
|
|
|
prog->prog_ifindex = attr->ifindex;
|
2018-03-31 06:08:01 +08:00
|
|
|
expected_attach_type = attr->expected_attach_type;
|
|
|
|
if (prog_type == BPF_PROG_TYPE_UNSPEC) {
|
2018-07-11 05:42:59 +08:00
|
|
|
err = bpf_program__identify_section(prog, &prog_type,
|
|
|
|
&expected_attach_type);
|
|
|
|
if (err < 0) {
|
2017-12-15 09:55:10 +08:00
|
|
|
bpf_object__close(obj);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2017-12-13 23:18:51 +08:00
|
|
|
}
|
2017-12-15 09:55:10 +08:00
|
|
|
|
2018-03-31 06:08:01 +08:00
|
|
|
bpf_program__set_type(prog, prog_type);
|
|
|
|
bpf_program__set_expected_attach_type(prog,
|
|
|
|
expected_attach_type);
|
|
|
|
|
2019-04-02 12:27:47 +08:00
|
|
|
prog->log_level = attr->log_level;
|
2019-05-25 06:25:19 +08:00
|
|
|
prog->prog_flags = attr->prog_flags;
|
2018-09-03 07:30:07 +08:00
|
|
|
if (!first_prog)
|
2017-12-15 09:55:10 +08:00
|
|
|
first_prog = prog;
|
|
|
|
}
|
|
|
|
|
2019-02-28 11:04:12 +08:00
|
|
|
bpf_object__for_each_map(map, obj) {
|
2018-07-11 05:43:01 +08:00
|
|
|
if (!bpf_map__is_offload_neutral(map))
|
|
|
|
map->map_ifindex = attr->ifindex;
|
2018-05-17 05:02:49 +08:00
|
|
|
}
|
|
|
|
|
2017-12-15 09:55:10 +08:00
|
|
|
if (!first_prog) {
|
|
|
|
pr_warning("object file doesn't contain bpf program\n");
|
|
|
|
bpf_object__close(obj);
|
|
|
|
return -ENOENT;
|
2017-12-13 23:18:51 +08:00
|
|
|
}
|
|
|
|
|
2017-08-16 13:34:22 +08:00
|
|
|
err = bpf_object__load(obj);
|
|
|
|
if (err) {
|
|
|
|
bpf_object__close(obj);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
*pobj = obj;
|
2017-12-15 09:55:10 +08:00
|
|
|
*prog_fd = bpf_program__fd(first_prog);
|
2017-08-16 13:34:22 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2018-05-11 01:24:40 +08:00
|
|
|
|
2019-07-02 07:58:56 +08:00
|
|
|
struct bpf_link {
|
|
|
|
int (*destroy)(struct bpf_link *link);
|
|
|
|
};
|
|
|
|
|
|
|
|
int bpf_link__destroy(struct bpf_link *link)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!link)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err = link->destroy(link);
|
|
|
|
free(link);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-07-02 07:58:57 +08:00
|
|
|
struct bpf_link_fd {
|
|
|
|
struct bpf_link link; /* has to be at the top of struct */
|
|
|
|
int fd; /* hook FD */
|
|
|
|
};
|
|
|
|
|
|
|
|
static int bpf_link__destroy_perf_event(struct bpf_link *link)
|
|
|
|
{
|
|
|
|
struct bpf_link_fd *l = (void *)link;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
err = ioctl(l->fd, PERF_EVENT_IOC_DISABLE, 0);
|
|
|
|
if (err)
|
|
|
|
err = -errno;
|
|
|
|
|
|
|
|
close(l->fd);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog,
|
|
|
|
int pfd)
|
|
|
|
{
|
|
|
|
char errmsg[STRERR_BUFSIZE];
|
|
|
|
struct bpf_link_fd *link;
|
|
|
|
int prog_fd, err;
|
|
|
|
|
|
|
|
if (pfd < 0) {
|
|
|
|
pr_warning("program '%s': invalid perf event FD %d\n",
|
|
|
|
bpf_program__title(prog, false), pfd);
|
|
|
|
return ERR_PTR(-EINVAL);
|
|
|
|
}
|
|
|
|
prog_fd = bpf_program__fd(prog);
|
|
|
|
if (prog_fd < 0) {
|
|
|
|
pr_warning("program '%s': can't attach BPF program w/o FD (did you load it?)\n",
|
|
|
|
bpf_program__title(prog, false));
|
|
|
|
return ERR_PTR(-EINVAL);
|
|
|
|
}
|
|
|
|
|
|
|
|
link = malloc(sizeof(*link));
|
|
|
|
if (!link)
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
link->link.destroy = &bpf_link__destroy_perf_event;
|
|
|
|
link->fd = pfd;
|
|
|
|
|
|
|
|
if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
|
|
|
|
err = -errno;
|
|
|
|
free(link);
|
|
|
|
pr_warning("program '%s': failed to attach to pfd %d: %s\n",
|
|
|
|
bpf_program__title(prog, false), pfd,
|
|
|
|
libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
|
|
|
|
return ERR_PTR(err);
|
|
|
|
}
|
|
|
|
if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
|
|
|
|
err = -errno;
|
|
|
|
free(link);
|
|
|
|
pr_warning("program '%s': failed to enable pfd %d: %s\n",
|
|
|
|
bpf_program__title(prog, false), pfd,
|
|
|
|
libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
|
|
|
|
return ERR_PTR(err);
|
|
|
|
}
|
|
|
|
return (struct bpf_link *)link;
|
|
|
|
}
|
|
|
|
|
2019-07-02 07:58:58 +08:00
|
|
|
/*
|
|
|
|
* this function is expected to parse integer in the range of [0, 2^31-1] from
|
|
|
|
* given file using scanf format string fmt. If actual parsed value is
|
|
|
|
* negative, the result might be indistinguishable from error
|
|
|
|
*/
|
|
|
|
static int parse_uint_from_file(const char *file, const char *fmt)
|
|
|
|
{
|
|
|
|
char buf[STRERR_BUFSIZE];
|
|
|
|
int err, ret;
|
|
|
|
FILE *f;
|
|
|
|
|
|
|
|
f = fopen(file, "r");
|
|
|
|
if (!f) {
|
|
|
|
err = -errno;
|
|
|
|
pr_debug("failed to open '%s': %s\n", file,
|
|
|
|
libbpf_strerror_r(err, buf, sizeof(buf)));
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
err = fscanf(f, fmt, &ret);
|
|
|
|
if (err != 1) {
|
|
|
|
err = err == EOF ? -EIO : -errno;
|
|
|
|
pr_debug("failed to parse '%s': %s\n", file,
|
|
|
|
libbpf_strerror_r(err, buf, sizeof(buf)));
|
|
|
|
fclose(f);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
fclose(f);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int determine_kprobe_perf_type(void)
|
|
|
|
{
|
|
|
|
const char *file = "/sys/bus/event_source/devices/kprobe/type";
|
|
|
|
|
|
|
|
return parse_uint_from_file(file, "%d\n");
|
|
|
|
}
|
|
|
|
|
|
|
|
static int determine_uprobe_perf_type(void)
|
|
|
|
{
|
|
|
|
const char *file = "/sys/bus/event_source/devices/uprobe/type";
|
|
|
|
|
|
|
|
return parse_uint_from_file(file, "%d\n");
|
|
|
|
}
|
|
|
|
|
|
|
|
static int determine_kprobe_retprobe_bit(void)
|
|
|
|
{
|
|
|
|
const char *file = "/sys/bus/event_source/devices/kprobe/format/retprobe";
|
|
|
|
|
|
|
|
return parse_uint_from_file(file, "config:%d\n");
|
|
|
|
}
|
|
|
|
|
|
|
|
static int determine_uprobe_retprobe_bit(void)
|
|
|
|
{
|
|
|
|
const char *file = "/sys/bus/event_source/devices/uprobe/format/retprobe";
|
|
|
|
|
|
|
|
return parse_uint_from_file(file, "config:%d\n");
|
|
|
|
}
|
|
|
|
|
|
|
|
static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
|
|
|
|
uint64_t offset, int pid)
|
|
|
|
{
|
|
|
|
struct perf_event_attr attr = {};
|
|
|
|
char errmsg[STRERR_BUFSIZE];
|
|
|
|
int type, pfd, err;
|
|
|
|
|
|
|
|
type = uprobe ? determine_uprobe_perf_type()
|
|
|
|
: determine_kprobe_perf_type();
|
|
|
|
if (type < 0) {
|
|
|
|
pr_warning("failed to determine %s perf type: %s\n",
|
|
|
|
uprobe ? "uprobe" : "kprobe",
|
|
|
|
libbpf_strerror_r(type, errmsg, sizeof(errmsg)));
|
|
|
|
return type;
|
|
|
|
}
|
|
|
|
if (retprobe) {
|
|
|
|
int bit = uprobe ? determine_uprobe_retprobe_bit()
|
|
|
|
: determine_kprobe_retprobe_bit();
|
|
|
|
|
|
|
|
if (bit < 0) {
|
|
|
|
pr_warning("failed to determine %s retprobe bit: %s\n",
|
|
|
|
uprobe ? "uprobe" : "kprobe",
|
|
|
|
libbpf_strerror_r(bit, errmsg,
|
|
|
|
sizeof(errmsg)));
|
|
|
|
return bit;
|
|
|
|
}
|
|
|
|
attr.config |= 1 << bit;
|
|
|
|
}
|
|
|
|
attr.size = sizeof(attr);
|
|
|
|
attr.type = type;
|
|
|
|
attr.config1 = (uint64_t)(void *)name; /* kprobe_func or uprobe_path */
|
|
|
|
attr.config2 = offset; /* kprobe_addr or probe_offset */
|
|
|
|
|
|
|
|
/* pid filter is meaningful only for uprobes */
|
|
|
|
pfd = syscall(__NR_perf_event_open, &attr,
|
|
|
|
pid < 0 ? -1 : pid /* pid */,
|
|
|
|
pid == -1 ? 0 : -1 /* cpu */,
|
|
|
|
-1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
|
|
|
|
if (pfd < 0) {
|
|
|
|
err = -errno;
|
|
|
|
pr_warning("%s perf_event_open() failed: %s\n",
|
|
|
|
uprobe ? "uprobe" : "kprobe",
|
|
|
|
libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
return pfd;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_link *bpf_program__attach_kprobe(struct bpf_program *prog,
|
|
|
|
bool retprobe,
|
|
|
|
const char *func_name)
|
|
|
|
{
|
|
|
|
char errmsg[STRERR_BUFSIZE];
|
|
|
|
struct bpf_link *link;
|
|
|
|
int pfd, err;
|
|
|
|
|
|
|
|
pfd = perf_event_open_probe(false /* uprobe */, retprobe, func_name,
|
|
|
|
0 /* offset */, -1 /* pid */);
|
|
|
|
if (pfd < 0) {
|
|
|
|
pr_warning("program '%s': failed to create %s '%s' perf event: %s\n",
|
|
|
|
bpf_program__title(prog, false),
|
|
|
|
retprobe ? "kretprobe" : "kprobe", func_name,
|
|
|
|
libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
|
|
|
|
return ERR_PTR(pfd);
|
|
|
|
}
|
|
|
|
link = bpf_program__attach_perf_event(prog, pfd);
|
|
|
|
if (IS_ERR(link)) {
|
|
|
|
close(pfd);
|
|
|
|
err = PTR_ERR(link);
|
|
|
|
pr_warning("program '%s': failed to attach to %s '%s': %s\n",
|
|
|
|
bpf_program__title(prog, false),
|
|
|
|
retprobe ? "kretprobe" : "kprobe", func_name,
|
|
|
|
libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
|
|
|
|
return link;
|
|
|
|
}
|
|
|
|
return link;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog,
|
|
|
|
bool retprobe, pid_t pid,
|
|
|
|
const char *binary_path,
|
|
|
|
size_t func_offset)
|
|
|
|
{
|
|
|
|
char errmsg[STRERR_BUFSIZE];
|
|
|
|
struct bpf_link *link;
|
|
|
|
int pfd, err;
|
|
|
|
|
|
|
|
pfd = perf_event_open_probe(true /* uprobe */, retprobe,
|
|
|
|
binary_path, func_offset, pid);
|
|
|
|
if (pfd < 0) {
|
|
|
|
pr_warning("program '%s': failed to create %s '%s:0x%zx' perf event: %s\n",
|
|
|
|
bpf_program__title(prog, false),
|
|
|
|
retprobe ? "uretprobe" : "uprobe",
|
|
|
|
binary_path, func_offset,
|
|
|
|
libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
|
|
|
|
return ERR_PTR(pfd);
|
|
|
|
}
|
|
|
|
link = bpf_program__attach_perf_event(prog, pfd);
|
|
|
|
if (IS_ERR(link)) {
|
|
|
|
close(pfd);
|
|
|
|
err = PTR_ERR(link);
|
|
|
|
pr_warning("program '%s': failed to attach to %s '%s:0x%zx': %s\n",
|
|
|
|
bpf_program__title(prog, false),
|
|
|
|
retprobe ? "uretprobe" : "uprobe",
|
|
|
|
binary_path, func_offset,
|
|
|
|
libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
|
|
|
|
return link;
|
|
|
|
}
|
|
|
|
return link;
|
|
|
|
}
|
|
|
|
|
2019-07-02 07:58:59 +08:00
|
|
|
static int determine_tracepoint_id(const char *tp_category,
|
|
|
|
const char *tp_name)
|
|
|
|
{
|
|
|
|
char file[PATH_MAX];
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = snprintf(file, sizeof(file),
|
|
|
|
"/sys/kernel/debug/tracing/events/%s/%s/id",
|
|
|
|
tp_category, tp_name);
|
|
|
|
if (ret < 0)
|
|
|
|
return -errno;
|
|
|
|
if (ret >= sizeof(file)) {
|
|
|
|
pr_debug("tracepoint %s/%s path is too long\n",
|
|
|
|
tp_category, tp_name);
|
|
|
|
return -E2BIG;
|
|
|
|
}
|
|
|
|
return parse_uint_from_file(file, "%d\n");
|
|
|
|
}
|
|
|
|
|
|
|
|
static int perf_event_open_tracepoint(const char *tp_category,
|
|
|
|
const char *tp_name)
|
|
|
|
{
|
|
|
|
struct perf_event_attr attr = {};
|
|
|
|
char errmsg[STRERR_BUFSIZE];
|
|
|
|
int tp_id, pfd, err;
|
|
|
|
|
|
|
|
tp_id = determine_tracepoint_id(tp_category, tp_name);
|
|
|
|
if (tp_id < 0) {
|
|
|
|
pr_warning("failed to determine tracepoint '%s/%s' perf event ID: %s\n",
|
|
|
|
tp_category, tp_name,
|
|
|
|
libbpf_strerror_r(tp_id, errmsg, sizeof(errmsg)));
|
|
|
|
return tp_id;
|
|
|
|
}
|
|
|
|
|
|
|
|
attr.type = PERF_TYPE_TRACEPOINT;
|
|
|
|
attr.size = sizeof(attr);
|
|
|
|
attr.config = tp_id;
|
|
|
|
|
|
|
|
pfd = syscall(__NR_perf_event_open, &attr, -1 /* pid */, 0 /* cpu */,
|
|
|
|
-1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
|
|
|
|
if (pfd < 0) {
|
|
|
|
err = -errno;
|
|
|
|
pr_warning("tracepoint '%s/%s' perf_event_open() failed: %s\n",
|
|
|
|
tp_category, tp_name,
|
|
|
|
libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
return pfd;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog,
|
|
|
|
const char *tp_category,
|
|
|
|
const char *tp_name)
|
|
|
|
{
|
|
|
|
char errmsg[STRERR_BUFSIZE];
|
|
|
|
struct bpf_link *link;
|
|
|
|
int pfd, err;
|
|
|
|
|
|
|
|
pfd = perf_event_open_tracepoint(tp_category, tp_name);
|
|
|
|
if (pfd < 0) {
|
|
|
|
pr_warning("program '%s': failed to create tracepoint '%s/%s' perf event: %s\n",
|
|
|
|
bpf_program__title(prog, false),
|
|
|
|
tp_category, tp_name,
|
|
|
|
libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
|
|
|
|
return ERR_PTR(pfd);
|
|
|
|
}
|
|
|
|
link = bpf_program__attach_perf_event(prog, pfd);
|
|
|
|
if (IS_ERR(link)) {
|
|
|
|
close(pfd);
|
|
|
|
err = PTR_ERR(link);
|
|
|
|
pr_warning("program '%s': failed to attach to tracepoint '%s/%s': %s\n",
|
|
|
|
bpf_program__title(prog, false),
|
|
|
|
tp_category, tp_name,
|
|
|
|
libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
|
|
|
|
return link;
|
|
|
|
}
|
|
|
|
return link;
|
|
|
|
}
|
|
|
|
|
2018-05-11 01:24:40 +08:00
|
|
|
enum bpf_perf_event_ret
|
2018-10-21 08:09:28 +08:00
|
|
|
bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
|
|
|
|
void **copy_mem, size_t *copy_size,
|
|
|
|
bpf_perf_event_print_t fn, void *private_data)
|
2018-05-11 01:24:40 +08:00
|
|
|
{
|
2018-10-21 08:09:28 +08:00
|
|
|
struct perf_event_mmap_page *header = mmap_mem;
|
2018-10-19 21:51:03 +08:00
|
|
|
__u64 data_head = ring_buffer_read_head(header);
|
2018-05-11 01:24:40 +08:00
|
|
|
__u64 data_tail = header->data_tail;
|
2018-10-21 08:09:28 +08:00
|
|
|
void *base = ((__u8 *)header) + page_size;
|
|
|
|
int ret = LIBBPF_PERF_EVENT_CONT;
|
|
|
|
struct perf_event_header *ehdr;
|
|
|
|
size_t ehdr_size;
|
|
|
|
|
|
|
|
while (data_head != data_tail) {
|
|
|
|
ehdr = base + (data_tail & (mmap_size - 1));
|
|
|
|
ehdr_size = ehdr->size;
|
|
|
|
|
|
|
|
if (((void *)ehdr) + ehdr_size > base + mmap_size) {
|
|
|
|
void *copy_start = ehdr;
|
|
|
|
size_t len_first = base + mmap_size - copy_start;
|
|
|
|
size_t len_secnd = ehdr_size - len_first;
|
|
|
|
|
|
|
|
if (*copy_size < ehdr_size) {
|
|
|
|
free(*copy_mem);
|
|
|
|
*copy_mem = malloc(ehdr_size);
|
|
|
|
if (!*copy_mem) {
|
|
|
|
*copy_size = 0;
|
2018-05-11 01:24:40 +08:00
|
|
|
ret = LIBBPF_PERF_EVENT_ERROR;
|
|
|
|
break;
|
|
|
|
}
|
2018-10-21 08:09:28 +08:00
|
|
|
*copy_size = ehdr_size;
|
2018-05-11 01:24:40 +08:00
|
|
|
}
|
|
|
|
|
2018-10-21 08:09:28 +08:00
|
|
|
memcpy(*copy_mem, copy_start, len_first);
|
|
|
|
memcpy(*copy_mem + len_first, base, len_secnd);
|
|
|
|
ehdr = *copy_mem;
|
2018-05-11 01:24:40 +08:00
|
|
|
}
|
|
|
|
|
2018-10-21 08:09:28 +08:00
|
|
|
ret = fn(ehdr, private_data);
|
|
|
|
data_tail += ehdr_size;
|
2018-05-11 01:24:40 +08:00
|
|
|
if (ret != LIBBPF_PERF_EVENT_CONT)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2018-10-19 21:51:03 +08:00
|
|
|
ring_buffer_write_tail(header, data_tail);
|
2018-05-11 01:24:40 +08:00
|
|
|
return ret;
|
|
|
|
}
|
2019-03-12 13:30:38 +08:00
|
|
|
|
|
|
|
struct bpf_prog_info_array_desc {
|
|
|
|
int array_offset; /* e.g. offset of jited_prog_insns */
|
|
|
|
int count_offset; /* e.g. offset of jited_prog_len */
|
|
|
|
int size_offset; /* > 0: offset of rec size,
|
|
|
|
* < 0: fix size of -size_offset
|
|
|
|
*/
|
|
|
|
};
|
|
|
|
|
|
|
|
static struct bpf_prog_info_array_desc bpf_prog_info_array_desc[] = {
|
|
|
|
[BPF_PROG_INFO_JITED_INSNS] = {
|
|
|
|
offsetof(struct bpf_prog_info, jited_prog_insns),
|
|
|
|
offsetof(struct bpf_prog_info, jited_prog_len),
|
|
|
|
-1,
|
|
|
|
},
|
|
|
|
[BPF_PROG_INFO_XLATED_INSNS] = {
|
|
|
|
offsetof(struct bpf_prog_info, xlated_prog_insns),
|
|
|
|
offsetof(struct bpf_prog_info, xlated_prog_len),
|
|
|
|
-1,
|
|
|
|
},
|
|
|
|
[BPF_PROG_INFO_MAP_IDS] = {
|
|
|
|
offsetof(struct bpf_prog_info, map_ids),
|
|
|
|
offsetof(struct bpf_prog_info, nr_map_ids),
|
|
|
|
-(int)sizeof(__u32),
|
|
|
|
},
|
|
|
|
[BPF_PROG_INFO_JITED_KSYMS] = {
|
|
|
|
offsetof(struct bpf_prog_info, jited_ksyms),
|
|
|
|
offsetof(struct bpf_prog_info, nr_jited_ksyms),
|
|
|
|
-(int)sizeof(__u64),
|
|
|
|
},
|
|
|
|
[BPF_PROG_INFO_JITED_FUNC_LENS] = {
|
|
|
|
offsetof(struct bpf_prog_info, jited_func_lens),
|
|
|
|
offsetof(struct bpf_prog_info, nr_jited_func_lens),
|
|
|
|
-(int)sizeof(__u32),
|
|
|
|
},
|
|
|
|
[BPF_PROG_INFO_FUNC_INFO] = {
|
|
|
|
offsetof(struct bpf_prog_info, func_info),
|
|
|
|
offsetof(struct bpf_prog_info, nr_func_info),
|
|
|
|
offsetof(struct bpf_prog_info, func_info_rec_size),
|
|
|
|
},
|
|
|
|
[BPF_PROG_INFO_LINE_INFO] = {
|
|
|
|
offsetof(struct bpf_prog_info, line_info),
|
|
|
|
offsetof(struct bpf_prog_info, nr_line_info),
|
|
|
|
offsetof(struct bpf_prog_info, line_info_rec_size),
|
|
|
|
},
|
|
|
|
[BPF_PROG_INFO_JITED_LINE_INFO] = {
|
|
|
|
offsetof(struct bpf_prog_info, jited_line_info),
|
|
|
|
offsetof(struct bpf_prog_info, nr_jited_line_info),
|
|
|
|
offsetof(struct bpf_prog_info, jited_line_info_rec_size),
|
|
|
|
},
|
|
|
|
[BPF_PROG_INFO_PROG_TAGS] = {
|
|
|
|
offsetof(struct bpf_prog_info, prog_tags),
|
|
|
|
offsetof(struct bpf_prog_info, nr_prog_tags),
|
|
|
|
-(int)sizeof(__u8) * BPF_TAG_SIZE,
|
|
|
|
},
|
|
|
|
|
|
|
|
};
|
|
|
|
|
|
|
|
static __u32 bpf_prog_info_read_offset_u32(struct bpf_prog_info *info, int offset)
|
|
|
|
{
|
|
|
|
__u32 *array = (__u32 *)info;
|
|
|
|
|
|
|
|
if (offset >= 0)
|
|
|
|
return array[offset / sizeof(__u32)];
|
|
|
|
return -(int)offset;
|
|
|
|
}
|
|
|
|
|
|
|
|
static __u64 bpf_prog_info_read_offset_u64(struct bpf_prog_info *info, int offset)
|
|
|
|
{
|
|
|
|
__u64 *array = (__u64 *)info;
|
|
|
|
|
|
|
|
if (offset >= 0)
|
|
|
|
return array[offset / sizeof(__u64)];
|
|
|
|
return -(int)offset;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bpf_prog_info_set_offset_u32(struct bpf_prog_info *info, int offset,
|
|
|
|
__u32 val)
|
|
|
|
{
|
|
|
|
__u32 *array = (__u32 *)info;
|
|
|
|
|
|
|
|
if (offset >= 0)
|
|
|
|
array[offset / sizeof(__u32)] = val;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bpf_prog_info_set_offset_u64(struct bpf_prog_info *info, int offset,
|
|
|
|
__u64 val)
|
|
|
|
{
|
|
|
|
__u64 *array = (__u64 *)info;
|
|
|
|
|
|
|
|
if (offset >= 0)
|
|
|
|
array[offset / sizeof(__u64)] = val;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_prog_info_linear *
|
|
|
|
bpf_program__get_prog_info_linear(int fd, __u64 arrays)
|
|
|
|
{
|
|
|
|
struct bpf_prog_info_linear *info_linear;
|
|
|
|
struct bpf_prog_info info = {};
|
|
|
|
__u32 info_len = sizeof(info);
|
|
|
|
__u32 data_len = 0;
|
|
|
|
int i, err;
|
|
|
|
void *ptr;
|
|
|
|
|
|
|
|
if (arrays >> BPF_PROG_INFO_LAST_ARRAY)
|
|
|
|
return ERR_PTR(-EINVAL);
|
|
|
|
|
|
|
|
/* step 1: get array dimensions */
|
|
|
|
err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
|
|
|
|
if (err) {
|
|
|
|
pr_debug("can't get prog info: %s", strerror(errno));
|
|
|
|
return ERR_PTR(-EFAULT);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* step 2: calculate total size of all arrays */
|
|
|
|
for (i = BPF_PROG_INFO_FIRST_ARRAY; i < BPF_PROG_INFO_LAST_ARRAY; ++i) {
|
|
|
|
bool include_array = (arrays & (1UL << i)) > 0;
|
|
|
|
struct bpf_prog_info_array_desc *desc;
|
|
|
|
__u32 count, size;
|
|
|
|
|
|
|
|
desc = bpf_prog_info_array_desc + i;
|
|
|
|
|
|
|
|
/* kernel is too old to support this field */
|
|
|
|
if (info_len < desc->array_offset + sizeof(__u32) ||
|
|
|
|
info_len < desc->count_offset + sizeof(__u32) ||
|
|
|
|
(desc->size_offset > 0 && info_len < desc->size_offset))
|
|
|
|
include_array = false;
|
|
|
|
|
|
|
|
if (!include_array) {
|
|
|
|
arrays &= ~(1UL << i); /* clear the bit */
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
count = bpf_prog_info_read_offset_u32(&info, desc->count_offset);
|
|
|
|
size = bpf_prog_info_read_offset_u32(&info, desc->size_offset);
|
|
|
|
|
|
|
|
data_len += count * size;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* step 3: allocate continuous memory */
|
|
|
|
data_len = roundup(data_len, sizeof(__u64));
|
|
|
|
info_linear = malloc(sizeof(struct bpf_prog_info_linear) + data_len);
|
|
|
|
if (!info_linear)
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
|
|
|
/* step 4: fill data to info_linear->info */
|
|
|
|
info_linear->arrays = arrays;
|
|
|
|
memset(&info_linear->info, 0, sizeof(info));
|
|
|
|
ptr = info_linear->data;
|
|
|
|
|
|
|
|
for (i = BPF_PROG_INFO_FIRST_ARRAY; i < BPF_PROG_INFO_LAST_ARRAY; ++i) {
|
|
|
|
struct bpf_prog_info_array_desc *desc;
|
|
|
|
__u32 count, size;
|
|
|
|
|
|
|
|
if ((arrays & (1UL << i)) == 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
desc = bpf_prog_info_array_desc + i;
|
|
|
|
count = bpf_prog_info_read_offset_u32(&info, desc->count_offset);
|
|
|
|
size = bpf_prog_info_read_offset_u32(&info, desc->size_offset);
|
|
|
|
bpf_prog_info_set_offset_u32(&info_linear->info,
|
|
|
|
desc->count_offset, count);
|
|
|
|
bpf_prog_info_set_offset_u32(&info_linear->info,
|
|
|
|
desc->size_offset, size);
|
|
|
|
bpf_prog_info_set_offset_u64(&info_linear->info,
|
|
|
|
desc->array_offset,
|
|
|
|
ptr_to_u64(ptr));
|
|
|
|
ptr += count * size;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* step 5: call syscall again to get required arrays */
|
|
|
|
err = bpf_obj_get_info_by_fd(fd, &info_linear->info, &info_len);
|
|
|
|
if (err) {
|
|
|
|
pr_debug("can't get prog info: %s", strerror(errno));
|
|
|
|
free(info_linear);
|
|
|
|
return ERR_PTR(-EFAULT);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* step 6: verify the data */
|
|
|
|
for (i = BPF_PROG_INFO_FIRST_ARRAY; i < BPF_PROG_INFO_LAST_ARRAY; ++i) {
|
|
|
|
struct bpf_prog_info_array_desc *desc;
|
|
|
|
__u32 v1, v2;
|
|
|
|
|
|
|
|
if ((arrays & (1UL << i)) == 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
desc = bpf_prog_info_array_desc + i;
|
|
|
|
v1 = bpf_prog_info_read_offset_u32(&info, desc->count_offset);
|
|
|
|
v2 = bpf_prog_info_read_offset_u32(&info_linear->info,
|
|
|
|
desc->count_offset);
|
|
|
|
if (v1 != v2)
|
|
|
|
pr_warning("%s: mismatch in element count\n", __func__);
|
|
|
|
|
|
|
|
v1 = bpf_prog_info_read_offset_u32(&info, desc->size_offset);
|
|
|
|
v2 = bpf_prog_info_read_offset_u32(&info_linear->info,
|
|
|
|
desc->size_offset);
|
|
|
|
if (v1 != v2)
|
|
|
|
pr_warning("%s: mismatch in rec size\n", __func__);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* step 7: update info_len and data_len */
|
|
|
|
info_linear->info_len = sizeof(struct bpf_prog_info);
|
|
|
|
info_linear->data_len = data_len;
|
|
|
|
|
|
|
|
return info_linear;
|
|
|
|
}
|
|
|
|
|
|
|
|
void bpf_program__bpil_addr_to_offs(struct bpf_prog_info_linear *info_linear)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = BPF_PROG_INFO_FIRST_ARRAY; i < BPF_PROG_INFO_LAST_ARRAY; ++i) {
|
|
|
|
struct bpf_prog_info_array_desc *desc;
|
|
|
|
__u64 addr, offs;
|
|
|
|
|
|
|
|
if ((info_linear->arrays & (1UL << i)) == 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
desc = bpf_prog_info_array_desc + i;
|
|
|
|
addr = bpf_prog_info_read_offset_u64(&info_linear->info,
|
|
|
|
desc->array_offset);
|
|
|
|
offs = addr - ptr_to_u64(info_linear->data);
|
|
|
|
bpf_prog_info_set_offset_u64(&info_linear->info,
|
|
|
|
desc->array_offset, offs);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void bpf_program__bpil_offs_to_addr(struct bpf_prog_info_linear *info_linear)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = BPF_PROG_INFO_FIRST_ARRAY; i < BPF_PROG_INFO_LAST_ARRAY; ++i) {
|
|
|
|
struct bpf_prog_info_array_desc *desc;
|
|
|
|
__u64 addr, offs;
|
|
|
|
|
|
|
|
if ((info_linear->arrays & (1UL << i)) == 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
desc = bpf_prog_info_array_desc + i;
|
|
|
|
offs = bpf_prog_info_read_offset_u64(&info_linear->info,
|
|
|
|
desc->array_offset);
|
|
|
|
addr = offs + ptr_to_u64(info_linear->data);
|
|
|
|
bpf_prog_info_set_offset_u64(&info_linear->info,
|
|
|
|
desc->array_offset, addr);
|
|
|
|
}
|
|
|
|
}
|
2019-06-11 08:56:50 +08:00
|
|
|
|
|
|
|
int libbpf_num_possible_cpus(void)
|
|
|
|
{
|
|
|
|
static const char *fcpu = "/sys/devices/system/cpu/possible";
|
|
|
|
int len = 0, n = 0, il = 0, ir = 0;
|
|
|
|
unsigned int start = 0, end = 0;
|
|
|
|
static int cpus;
|
|
|
|
char buf[128];
|
|
|
|
int error = 0;
|
|
|
|
int fd = -1;
|
|
|
|
|
|
|
|
if (cpus > 0)
|
|
|
|
return cpus;
|
|
|
|
|
|
|
|
fd = open(fcpu, O_RDONLY);
|
|
|
|
if (fd < 0) {
|
|
|
|
error = errno;
|
|
|
|
pr_warning("Failed to open file %s: %s\n",
|
|
|
|
fcpu, strerror(error));
|
|
|
|
return -error;
|
|
|
|
}
|
|
|
|
len = read(fd, buf, sizeof(buf));
|
|
|
|
close(fd);
|
|
|
|
if (len <= 0) {
|
|
|
|
error = len ? errno : EINVAL;
|
|
|
|
pr_warning("Failed to read # of possible cpus from %s: %s\n",
|
|
|
|
fcpu, strerror(error));
|
|
|
|
return -error;
|
|
|
|
}
|
|
|
|
if (len == sizeof(buf)) {
|
|
|
|
pr_warning("File %s size overflow\n", fcpu);
|
|
|
|
return -EOVERFLOW;
|
|
|
|
}
|
|
|
|
buf[len] = '\0';
|
|
|
|
|
|
|
|
for (ir = 0, cpus = 0; ir <= len; ir++) {
|
|
|
|
/* Each sub string separated by ',' has format \d+-\d+ or \d+ */
|
|
|
|
if (buf[ir] == ',' || buf[ir] == '\0') {
|
|
|
|
buf[ir] = '\0';
|
|
|
|
n = sscanf(&buf[il], "%u-%u", &start, &end);
|
|
|
|
if (n <= 0) {
|
|
|
|
pr_warning("Failed to get # CPUs from %s\n",
|
|
|
|
&buf[il]);
|
|
|
|
return -EINVAL;
|
|
|
|
} else if (n == 1) {
|
|
|
|
end = start;
|
|
|
|
}
|
|
|
|
cpus += end - start + 1;
|
|
|
|
il = ir + 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (cpus <= 0) {
|
|
|
|
pr_warning("Invalid #CPUs %d from %s\n", cpus, fcpu);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
return cpus;
|
|
|
|
}
|