For background of BPF CO-RE project, please refer to
http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs
adjustable on struct/union layout change so the same
program can run on multiple kernels with adjustment
before loading based on native kernel structures.
In order to do this, we need keep track of GEP(getelementptr)
instruction base and result debuginfo types, so we
can adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard
as various optimization may have tweaked GEP and also
union is replaced by structure it is impossible to track
fieldindex for union member accesses.
Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introducted.
addr = preserve_array_access_index(base, index, dimension)
addr = preserve_union_access_index(base, di_index)
addr = preserve_struct_access_index(base, gep_index, di_index)
here,
base: the base pointer for the array/union/struct access.
index: the last access index for array, the same for IR/DebugInfo layout.
dimension: the array dimension.
gep_index: the access index based on IR layout.
di_index: the access index based on user/debuginfo types.
For example, for the following example,
$ cat test.c
struct sk_buff {
int i;
int b1:1;
int b2:2;
union {
struct {
int o1;
int o2;
} o;
struct {
char flags;
char dev_id;
} dev;
int netid;
} u[10];
};
static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
= (void *) 4;
#define _(x) (__builtin_preserve_access_index(x))
int bpf_prog(struct sk_buff *ctx) {
char dev_id;
bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
return dev_id;
}
$ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
test.c >& log
The generated IR looks like below:
...
define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
%2 = alloca %struct.sk_buff*, align 8
%3 = alloca i8, align 1
store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
%4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
%5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
%6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
%struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
%7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
[10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
%8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
%union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
%9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
%10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
%struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
%11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
%12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
%13 = sext i8 %12 to i32, !dbg !54
call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
ret i32 %13, !dbg !57
}
!19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
!26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
!34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)
Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
attached to instructions to provide struct/union debuginfo type information.
For &ctx->u[5].dev.dev_id,
. The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
. The "%7 = ..." represents array subscript "5".
. The "%8 = ..." represents union member "dev" with index 1 for DI layout.
. The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.
Basically, traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access index
can be achieved.
The intrinsics also contain enough information to regenerate codes for IR layout.
For array and structure intrinsics, the proper GEP can be constructed.
For union intrinsics, replacing all uses of "addr" with "base" should be enough.
The test case ThinLTO/X86/lazyload_metadata.ll is adjusted to reflect the
new addition of the metadata.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D61810
llvm-svn: 365423
An abstract call site is a wrapper that allows to treat direct,
indirect, and callback calls the same. If an abstract call site
represents a direct or indirect call site it behaves like a stripped
down version of a normal call site object. The abstract call site can
also represent a callback call, thus the fact that the initially
called function (=broker) may invoke a third one (=callback callee).
In this case, the abstract call side hides the middle man, hence the
broker function. The result is a representation of the callback call,
inside the broker, but in the context of the original instruction that
invoked the broker.
Again, there are up to three functions involved when we talk about
callback call sites. The caller (1), which invokes the broker
function. The broker function (2), that may or may not invoke the
callback callee. And finally the callback callee (3), which is the
target of the callback call.
The abstract call site will handle the mapping from parameters to
arguments depending on the semantic of the broker function. However,
it is important to note that the mapping is often partial. Thus, some
arguments of the call/invoke instruction are mapped to parameters of
the callee while others are not. At the same time, arguments of the
callback callee might be unknown, thus "null" if queried.
This patch introduces also !callback metadata which describe how a
callback broker maps from parameters to arguments. This metadata is
directly created by clang for known broker functions, provided through
source code attributes by the user, or later deduced by analyses.
For motivation and additional information please see the corresponding
talk (slides/video)
https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk20
as well as the LCPC paper
http://compilers.cs.uni-saarland.de/people/doerfert/par_opt_lcpc18.pdf
Differential Revision: https://reviews.llvm.org/D54498
llvm-svn: 351627
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.
This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).
This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.
The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.
Differential Revision: https://reviews.llvm.org/D52116
llvm-svn: 349725
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.
The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.
This patch is a basic support for this.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39028
llvm-svn: 317278
This patch adds a new kind of metadata that indicates the possible callees of
indirect calls.
Differential Revision: https://reviews.llvm.org/D37354
llvm-svn: 315944
This is an ELF-specific thing that adds SHF_LINK_ORDER to the global's section
pointing to the metadata argument's section. The effect of that is a reverse dependency
between sections for the linker GC.
!associated does not change the behavior of global-dce. The global
may also need to be added to llvm.compiler.used.
Since SHF_LINK_ORDER is per-section, !associated effectively enables
fdata-sections for the affected globals, the same as comdats do.
Differential Revision: https://reviews.llvm.org/D29104
llvm-svn: 298157
CFI is using intrinsics that takes MDString as arguments, and this
was broken during lazy-loading of metadata.
Differential Revision: https://reviews.llvm.org/D28916
llvm-svn: 292641
Summary:
Without this, we're stressing the RAUW of unique nodes,
which is a costly operation. This is intended to limit
the number of RAUW, and is very effective on the total
link-time of opt with ThinLTO, before:
real 4m4.587s user 15m3.401s sys 0m23.616s
after:
real 3m25.261s user 12m22.132s sys 0m24.152s
Reviewers: tejohnson, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28751
llvm-svn: 292420
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.
Two main current limitations are:
1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.
These two limitations can be lifted in a subsequent improvement if
needed.
With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).
Reviewers: pcc, tejohnson, dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28113
llvm-svn: 291027