This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publically.
Replaces https://reviews.llvm.org/D74158
Differential Revision: https://reviews.llvm.org/D89613
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publically.
Differential Revision: https://reviews.llvm.org/D74158
Or else on optnone functions we get the following during instruction selection:
fatal error: error in backend: Cannot select: intrinsic %llvm.preserve.struct.access.index
Currently the -O0 pipeline doesn't properly run passes registered via
TargetMachine::registerPassBuilderCallbacks(), so don't add that RUN
line yet. That will be fixed after this.
Reviewed By: yonghong-song
Differential Revision: https://reviews.llvm.org/D89083
Currently, bpf backend Instruction section DAG2DAG phase has
an optimization to replace loading constant struct memeber
or array element with direct values. The reason is that these
locally defined struct or array variables may have their
initial values stored in a readonly section and early bpf
ecosystem is not able to handle such cases.
Bpf ecosystem now can not only handle readonly sections,
but also global variables. global variable can also have
initialized data and global variable may or may not be constant,
i.e., global variable data can be put in .data section or .rodata
section. This exposed a bug in DAG2DAG Load optimization
as it did not check whether the global variable is constant
or not.
This patch fixed the bug by checking whether global variable,
representing the initial data, is constant or not and will not
do optimization if it is not a constant.
Another bug is also fixed in this patch to check whether
the load is simple (not volatile/atomic) or not. If it is
not simple, we will not do optimization. To summary for
globals:
- struct t var = { ... } ; // no load optimization
- const struct t var = { ... }; // load optimization is possible
- volatile const struct t var = { ... }; // no load optimization
Differential Revision: https://reviews.llvm.org/D89021
Add an IR phase right before main module optimization.
This is to modify IR to restrict certain downward optimizations
in order to generate verifier friendly code.
> prevent certain instcombine optimizations, handling both
in-block/cross-block instcombines.
> avoid speculative code motion if the variable used in
condition is also used in the later blocks.
Internally, a bpf IR builtin
result = __builtin_bpf_passthrough(seq_num, result)
is used to enforce ordering. This builtin is only used
during target independent IR optimizations and it will
be removed at the beginning of target dependent IR
optimizations.
For example, removing the following workaround,
--- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
@@ -47,7 +47,7 @@ int sysctl_tcp_mem(struct bpf_sysctl *ctx)
/* a workaround to prevent compiler from generating
* codes verifier cannot handle yet.
*/
- volatile int ret;
+ int ret;
this patch is able to generate code which passed the verifier.
To disable optimization, users need to use "opt" command like below:
clang -target bpf -O2 -S -emit-llvm -Xclang -disable-llvm-passes test.c
// disable icmp serialization
opt -O2 -bpf-disable-serialize-icmp test.ll | llvm-dis > t.ll
// disable avoid-speculation
opt -O2 -bpf-disable-avoid-speculation test.ll | llvm-dis > t.ll
llc t.ll
Differential Revision: https://reviews.llvm.org/D85570
This patch fixed two issues related with relocation globals.
In LLVM, if a global, e.g. with name "g", is created and
conflict with another global with the same name, LLVM will
rename the global, e.g., with a new name "g.2". Since
relocation global name has special meaning, we do not want
llvm to change it, so internally we have logic to check
whether duplication happens or not. If happens, just reuse
the previous global.
The first bug is related to non-btf-id relocation
(BPFAbstractMemberAccess.cpp). Commit 54d9f743c8
("BPF: move AbstractMemberAccess and PreserveDIType passes
to EP_EarlyAsPossible") changed ModulePass to FunctionPass,
i.e., handling each function at a time. But still just
one BPFAbstractMemberAccess object is created so module
level de-duplication still possible. Commit 40251fee00
("[BPF][NewPM] Make BPFTargetMachine properly adjust NPM optimizer
pipeline") made a change to create a BPFAbstractMemberAccess
object per function so module level de-duplication is not
possible any more without going through all module globals.
This patch simply changed the map which holds reloc globals
as class static, so it will be available to all
BPFAbstractMemberAccess objects for different functions.
The second bug is related to btf-id relocation
(BPFPreserveDIType.cpp). Before Commit 54d9f743c8, the pass
is a ModulePass, so we have a local variable, incremented for
each instance, and works fine. But after Commit 54d9f743c8,
the pass becomes a FunctionPass. Local variable won't work
properly since different functions will start with the same
initial value. Fix the issue by change the local count variable
as static, so it will be truely unique across the whole module
compilation.
Differential Revision: https://reviews.llvm.org/D88942
This involves porting BPFAbstractMemberAccess and BPFPreserveDIType to
NPM, then adding them BPFTargetMachine::registerPassBuilderCallbacks
(the NPM equivalent of adjustPassManager()).
Reviewed By: yonghong-song, asbirlea
Differential Revision: https://reviews.llvm.org/D88855
Move abstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.
Currently, compiler may transform the above code
p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
if (a) {
p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
bpf_probe_read(buf, buf_size, p2);
}
to
p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
if (a) {
bpf_probe_read(buf, buf_size, p2);
}
and eventually assembly code looks like
reloc_exist = 1;
reloc_member_offset = 10; //calculate member offset from base
p2 = base + reloc_member_offset;
if (reloc_exist) {
bpf_probe_read(bpf, buf_size, p2);
}
if during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (not exist), reloc_member_offset relocation cannot
be resolved and will be patched with illegal instruction.
This will cause verifier failure.
This patch attempts to address this issue by do chaining
analysis and replace chains with special globals right
after clang code gen. This will remove the cse possibility
described in the above. The IR typically looks like
%6 = load @llvm.sk_buff:0:50$0:0:0:2:0
%7 = bitcast %struct.sk_buff* %2 to i8*
%8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.
But this transformation has another consequence, code sinking
may happen like below:
PHI = <possibly different @preserve_*_access_globals>
%7 = bitcast %struct.sk_buff* %2 to i8*
%8 = getelementptr i8, i8* %7, %6
For such cases, we will not able to generate relocations since
multiple relocations are merged into one.
This patch introduced a passthrough builtin
to prevent such optimization. Looks like inline assembly has more
impact for optimizaiton, e.g., inlining. Using passthrough has
less impact on optimizations.
A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does:
- report fatal error if any reloc global in PHI nodes
- remove all bpf passthrough builtin functions
Changes for existing CORE tests:
- for clang tests, add "-Xclang -disable-llvm-passes" flags to
avoid builtin->reloc_global transformation so the test is still
able to check correctness for clang generated IR.
- for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command
before "llc" command since "opt" is needed to call newly-placed
builtin->reloc_global transformation. Add target triple in the IR
file since "opt" requires it.
- Since target triple is added in IR file, if a test may produce
different results for different endianness, two tests will be
created, one for bpfeb and another for bpfel, e.g., some tests
for relocation of lshift/rshift of bitfields.
- field-reloc-bitfield-1.ll has different relocations compared to
old codes. This is because for the structure in the test,
new code returns struct layout alignment 4 while old code
is 8. Align 8 is more precise and permits double load. With align 4,
the new mechanism uses 4-byte load, so generating different
relocations.
- test intrinsic-transforms.ll is removed. This is used to test
cse on intrinsics so we do not lose metadata. Now metadata is attached
to global and not instruction, it won't get lost with cse.
Differential Revision: https://reviews.llvm.org/D87153
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
Buildbot reported a build failure when building shared
library libLLVMBPFCodeGen.so with unknown reference to
"createCFGSimplificationPass".
Commit 87cba43402 ("BPF: add a SimplifyCFG IR pass during
generic Scalar/IPO optimization") added an IR pass SimplifyCFG
by BPF target. The commit called function
createCFGSimplificationPass() defined in "Scalar" library.
Add this library in Target/BPF/LLVMBuild.txt so
shared library build can succeed.
The following bpf linux kernel selftest failed with latest
llvm:
$ ./test_progs -n 7/10
...
The sequence of 8193 jumps is too complex.
verification time 126272 usec
stack depth 320
processed 114799 insns (limit 1000000)
...
libbpf: failed to load object 'pyperf600_nounroll.o'
test_bpf_verif_scale:FAIL:110
#7/10 pyperf600_nounroll.o:FAIL
#7 bpf_verif_scale:FAIL
After some investigation, I found the following llvm patch
https://reviews.llvm.org/D84108
is responsible. The patch disabled hoisting common instructions
in SimplifyCFG by default. Later on, the code changes and a
SimplifyCFG phase with hoisting on cannot do the work any more.
A test is provided to demonstrate the problem.
The IR before simplifyCFG looks like:
for.cond:
%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
%cmp = icmp ult i32 %i.0, 6
br i1 %cmp, label %for.body, label %for.cond.cleanup
for.cond.cleanup:
%2 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
%cmp2 = icmp eq i8* %2, null
%conv = zext i1 %cmp2 to i32
call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3
call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3
ret i32 %conv
for.body:
%3 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
%tobool.not = icmp eq i8* %3, null
br i1 %tobool.not, label %for.inc, label %land.lhs.true
The first two insns of `for.cond.cleanup` and `for.body`, load and
icmp, can be hoisted to `for.cond` block. With Patch D84108, the
optimization is delayed. But unfortunately, later on loop rotation
added addition phi nodes to `for.body` and hoisting cannot
be done any more.
Note such a hoisting is beneficial to bpf programs as
bpf verifier does path sensitive analysis and verification.
The hoisting preverts reloading from stack which will assume
conservative value and increase exploited insns. In this case,
it caused verifier failure.
To fix this problem, I added an IR pass from bpf target
to performance additional simplifycfg with hoisting common inst
enabled.
Differential Revision: https://reviews.llvm.org/D85434
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), previously IR builtin
looks like
if (obj is a lvalue)
llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type
else
llvm.bpf.btf.type.id(obj, 0, flag) !type
The purpose of the 2nd argument is to differentiate
__builtin_btf_type_id(obj, flag) where obj is a lvalue
vs.
__builtin_btf_type_id(obj.ptr, flag)
Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
- obj is the same .e.g., nullptr
- flag is the same
- metadata type is different, e.g., typedef of struct "s"
and strust "s".
In the above, we don't want CSE since their metadata is different.
This patch change IR builtin to
llvm.bpf.btf.type.id(seq_num, flag) !type
and seq_num is always increasing. This will prevent potential
llvm CSE.
Also report an error if the type name is empty for
remote relocation since remote relocation needs non-empty
type name to do relocation against vmlinux.
Differential Revision: https://reviews.llvm.org/D85174
Four new CO-RE relocations are introduced:
- TYPE_EXISTENCE: whether a typedef/record/enum type exists
- TYPE_SIZE: the size of a typedef/record/enum type
- ENUM_VALUE_EXISTENCE: whether an enum value of an enum type exists
- ENUM_VALUE: the enum value of an enum type
These additional relocations will make CO-RE bpf programs
more adaptive for potential kernel internal data structure
changes.
Differential Revision: https://reviews.llvm.org/D83878
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
Currently, BTF datasec type for .rodata is generated only if there are
user-defined readonly global variables which have debuginfo generated.
Certain readonly global variables may be generated from initialized
local variables. For example,
void foo(const void *);
int test() {
const struct {
unsigned a[4];
char b;
} val = { .a = {2, 3, 4, 5}, .b = 6 };
foo(&val);
return 0;
}
The clang will create a private linkage const global to store
the initialized value:
@__const.test.val = private unnamed_addr constant %struct.anon
{ [4 x i32] [i32 2, i32 3, i32 4, i32 5], i8 6 }, align 4
This global variable eventually is put in .rodata ELF section.
If there is .rodata ELF section, libbpf expects a BTF .rodata
datasec as well even though it may be empty meaning there are no
global readonly variables with proper debuginfo. Martin reported
a bug where without this empty BTF .rodata datasec, the bpftool
gen will exit with an error.
This patch fixed the issue by generating .rodata BTF datasec
if there exists local var intial data which will result in
.rodata ELF section.
Differential Revision: https://reviews.llvm.org/D84002
This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual.
Differential Revision: https://reviews.llvm.org/D83709
Currently, llvm when see a global variable in .maps section,
it ensures its type must be a struct type. Then pointee
will be further evaluated for the structure members.
In normal cases, the pointee type will be skipped.
Although this is what current all bpf programs are doing,
but it is a little bit restrictive. For example, it is legitimate
for users to have:
typedef struct { int key_size; int value_size; } __map_t;
__map_t map __attribute__((section(".maps")));
This patch lifts this restriction and typedef of
a struct type is also allowed for .maps section variables.
To avoid create unnecessary fixup entries when traversal
started with typedef/struct type, the new implementation
first traverse all map struct members and then traverse
the typedef/struct type. This way, in internal BTFDebug
implementation, no fixup entries are generated.
Two new unit tests are added for typedef and const
struct in .maps section. Also tested with kernel bpf selftests.
Differential Revision: https://reviews.llvm.org/D83638
Currently, BTF generation stops at pointer struct members
if the pointee type is a struct. This is to avoid bloating
generated BTF size. The following is the process to
correctly record types for these pointee struct types.
- During type traversal stage, when a struct member, which
is a pointer to another struct, is encountered,
the pointee struct type, keyed with its name, is
remembered in a Fixup map.
- Later, when all type traversal is done, the Fixup map
is scanned, based on struct name matching, to either
resolve as pointing to a real already generated type
or as a forward declaration.
Andrii discovered a bug if the struct member pointee struct
is anonymous. In this case, a struct with empty name is
recorded in Fixup map, and later it happens another anonymous
struct with empty name is defined in BTF. So wrong type
resolution happens.
To fix the problem, if the struct member pointee struct
is anonymous, pointee struct type will be generated in
stead of being put in Fixup map.
Differential Revision: https://reviews.llvm.org/D82976
Andrii discovered a problem where a simple case similar to below
will generate wrong relocation kind:
enum { FIELD_EXISTENCE = 2, };
struct s1 { int a1; };
int test() {
struct s1 *v = 0;
return __builtin_preserve_field_info(v[0], FIELD_EXISTENCE);
}
The expected relocation kind should be FIELD_EXISTENCE, but
recorded reloc kind in the final object file is FIELD_BYTE_OFFSET,
which is incorrect.
This exposed a bug in generating access strings from intrinsics.
The current access string generation has two steps:
step 1: find the base struct/union type,
step 2: traverse members in the base type.
The current implementation relies on at lease one member access
in step 2 to get the correct relocation kind, which is true
in typical cases. But if there is no member accesses, the current
implementation falls to the default info kind FIELD_BYTE_OFFSET.
This is incorrect, we should still record the reloc kind
based on the user input. This patch fixed this issue by properly
recording the reloc kind in such cases.
Differential Revision: https://reviews.llvm.org/D82932
In BTF, pointee type pruning is used to reduce cluttering
too many unused types into prog BTF. For example,
struct task_struct {
...
struct mm_struct *mm;
...
}
If bpf program does not access members of "struct mm_struct",
there is no need to bring types for "struct mm_struct" to BTF.
This patch fixed a bug where an incorrect pruning happened.
The test case like below:
struct t;
typedef struct t _t;
struct s1 { _t *c; };
int test1(struct s1 *arg) { ... }
struct t { int a; int b; };
struct s2 { _t c; }
int test2(struct s2 *arg) { ... }
After processing test1(), among others, BPF backend generates BTF types for
"struct s1", "_t" and a placeholder for "struct t".
Note that "struct t" is not really generated. If later a direct access
to "struct t" member happened, "struct t" BTF type will be generated
properly.
During processing test2(), when processing member type "_t c",
BPF backend sees type "_t" already generated, so returned.
This caused the problem that "struct t" BTF type is never generated and
eventually causing incorrect type definition for "struct s2".
To fix the issue, during DebugInfo type traversal, even if a
typedef/const/volatile/restrict derived type has been recorded in BTF,
if it is not a type pruning candidate, type traversal of its base type continues.
Differential Revision: https://reviews.llvm.org/D82041
In BPF Instruction Selection DAGToDAG transformation phase,
BPF backend had an optimization to turn load from readonly data
section to direct load of the values. This phase is implemented
before libbpf has readonly section support and before alu32
is supported.
This phase however may generate incorrect type when alu32 is
enabled. The following is an example,
-bash-4.4$ cat ~/tmp2/t.c
struct t {
unsigned char a;
unsigned char b;
unsigned char c;
};
extern void foo(void *);
int test() {
struct t v = {
.b = 2,
};
foo(&v);
return 0;
}
The compiler will turn local variable "v" into a readonly section.
During instruction selection phase, the compiler generates two
loads from readonly section, one 2 byte load or 1 byte load, e.g., for 2 loads,
t8: i32,ch = load<(dereferenceable load 2 from `i8* getelementptr inbounds
(%struct.t, %struct.t* @__const.test.v, i64 0, i32 0)`, align 1),
anyext from i16> t3, GlobalAddress:i64<%struct.t* @__const.test.v> 0, undef:i64
t9: ch = store<(store 2 into %ir.v1.sub1), trunc to i16> t3, t8,
FrameIndex:i64<0>, undef:i64
BPF backend changed t8 to i64 = Constant<2> and eventually the generated machine IR:
t10: i64 = MOV_ri TargetConstant:i64<2>
t40: i32 = SLL_ri_32 t10, TargetConstant:i32<8>
t41: i32 = OR_ri_32 t40, TargetConstant:i64<0>
t9: ch = STH32<Mem:(store 2 into %ir.v1.sub1)> t41, TargetFrameIndex:i64<0>,
TargetConstant:i64<0>, t3
Note that t10 in the above is not correct. The type should be i32 and instruction
should be MOV_ri_32. The reason for incorrect insn selection is BPF insn selection
generated an i64 constant instead of an i32 constant as specified in the original
load instruction. Such incorrect insn sequence eventually caused the following
fatal error when a COPY insn tries to copy a 64bit register to a 32bit subregister.
Impossible reg-to-reg copy
UNREACHABLE executed at ../lib/Target/BPF/BPFInstrInfo.cpp:42!
This patch fixed the issue by using the load result type instead of always i64
when doing readonly load optimization.
Differential Revision: https://reviews.llvm.org/D81630
Commit 13f6c81c5d ("[BPF] simplify zero extension
with MOV_32_64") tried to use MOV_32_64 instructions
instead of lshift/rshift instructions for zero extension.
This has the benefit to remove the number of instructions
and may help verifier too.
But the same commit also removed the old MOV_32_64
pruning as it deems unsafe as MOV_32_64 does have the
side effect, zeroing out the top 32bit in the register.
This caused the following failure in kernel selftest
test_cls_redirect.o. In linux kernel, we have
struct __sk_buff {
__u32 data;
__u32 data_end;
};
The compiler will generate 32bit load for __sk_buff->data
and __sk_buff->data_end. But kernel verifier will actually
loads an address (64bit address on 64bit kernel) to the
result register. In this particular example, the explicit zext
was not optimized away and destroyed top 32bit
address and the verifier rejected the program :
w2 = *(u32 *)(r1 + 76)
...
r2 = w2 /* MOV_32_64: this will clear top 32bit */
Currently, if the load and the zext are next to each other, the
instruction pattern match can actually capture this to
avoid MOV_32_64, e.g., in BPFInstrInfo.td, we have
def : Pat<(i64 (zextloadi32 ADDRri:$src)),
(SUBREG_TO_REG (i64 0), (LDW32 ADDRri:$src), sub_32)>;
However, if they are not next to each other, LDW32 and
MOV_32_64 are generated, which may cause the above mentioned
problem.
BPF Backend already tried to optimize away pattern
mov_32_64 + lshift + rshift
Commit 13f6c81c5d may generate mov_32_64 not followed by shifts.
This patch added optimization for only mov_32_64 too.
Differential Revision: https://reviews.llvm.org/D81048
The current pattern matching for zext results in the following code snippet
being produced,
w1 = w0
r1 <<= 32
r1 >>= 32
Because BPF implementations require zero extension on 32bit loads this
both adds a few extra unneeded instructions but also makes it a bit
harder for the verifier to track the r1 register bounds. For example in
this verifier trace we see at the end of the snippet R2 offset is unknown.
However, if we track this correctly we see w1 should have the same bounds
as r8. R8 smax is less than U32 max value so a zero extend load should keep
the same value. Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to
an off=0, as seen in R7 should create a max offset of 800. However at the
end of the snippet we note the R2 max offset is 0xffffFFFF.
R0=inv(id=0,smax_value=800)
R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff))
R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R9=inv800 R10=fp0 fp-8=mmmm????
58: (1c) w9 -= w8
59: (bc) w1 = w8
60: (67) r1 <<= 32
61: (77) r1 >>= 32
62: (bf) r2 = r7
63: (0f) r2 += r1
64: (bf) r1 = r6
65: (bc) w3 = w9
66: (b7) r4 = 0
67: (85) call bpf_get_stack#67
R0=inv(id=0,smax_value=800)
R1_w=ctx(id=0,off=0,imm=0)
R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
R4_w=inv0 R6=ctx(id=0,off=0,imm=0)
R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
R10=fp0 fp-8=mmmm????
After this patch R1 bounds are not smashed by the <<=32 >>=32 shift and we
get correct bounds on R2 umax_value=800.
Further it reduces 3 insns to 1.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Differential Revision: https://reviews.llvm.org/D73985
Commit 8e8f1bd75a ("[BPF] Return fail if disassembled insn registers
out of range") tried to fix a segfault when an illegal instruction
is decoded. A test case is added to emulate such an illegal instruction.
The llvm buildbot reported an asan issue with this test case.
ERROR: AddressSanitizer: global-buffer-overflow on address ...
decodeMemoryOpValue(llvm::MCInst&, unsigned int, ...)
llvm::MCDisassembler::DecodeStatus llvm::decodeToMCInst<unsigned long>(...)
llvm::MCDisassembler::DecodeStatus llvm::decodeInstruction<unsigned long>(...)
in (anonymous namespace)::BPFDisassembler::getInstruction(...)
...
Basically, the fix in Commit 8e8f1bd75a is too later to prevent
the asan. The fix in this patch moved the register number check earlier
during decodeInstruction(). It will return fail for decodeInstruction()
if the register number is out of range.
Note that DecodeGPRRegisterClass() and DecodeGPR32RegisterClass()
already have register number checking, so here we only check
decodeMemoryOpValue().
Daniel reported a llvm-objdump segfault like below:
$ llvm-objdump -D bpf_xdp.o
...
0000000000000000 <.strtab>:
0: 00 63 69 6c 69 75 6d 5f <unknown>
1: 6c 62 36 5f 61 66 66 69 w2 <<= w6
...
(llvm-objdump: lib/Target/BPF/BPFGenAsmWriter.inc:1087: static const char*
llvm::BPFInstPrinter::getRegisterName(unsigned int): Assertion
`RegNo && RegNo < 25 && "Invalid register number!"' failed.
Stack dump:
0. Program arguments: llvm-objdump -D bpf_xdp.o
...
abort
...
llvm::BPFInstPrinter::getRegisterName(unsigned int)
llvm::BPFInstPrinter::printMemOperand(llvm::MCInst const*,
int, llvm::raw_ostream&, char const*)
llvm::BPFInstPrinter::printInstruction(llvm::MCInst const*,
unsigned long, llvm::raw_ostream&)
llvm::BPFInstPrinter::printInst(llvm::MCInst const*,
unsigned long, llvm::StringRef, llvm::MCSubtargetInfo const&,
llvm::raw_ostream&)
...
Basically, since -D enables disassembly for all sections, .strtab is also disassembled,
but some strings are decoded as legal instructions but with illegal register numbers.
When llvm-objdump tries to print register name for these illegal register numbers,
assertion and segfault happens.
The patch fixed the issue by returning fail for a disassembled insn if
that insn contains a reg operand with illegal reg number.
The insn will be printed as "<unknown>" instead of causing an assertion.
For a simple program like below:
-bash-4.4$ cat t.c
int test() {
asm volatile("r0 = r0" ::);
return 0;
}
compiled with
clang -target bpf -O2 -c t.c
the following llvm-objdump command will segfault.
llvm-objdump -d t.o
0: bf 00 00 00 00 00 00 00 nop
llvm-objdump: ../include/llvm/ADT/SmallVector.h:180
...
Assertion `idx < size()' failed
...
abort
...
llvm::BPFInstPrinter::printOperand
llvm::BPFInstPrinter::printInstruction
...
The reason is both NOP and MOV_rr (r0 = r0) having the same encoding.
The disassembly getInstruction() decodes to be a NOP instruciton but
during printInstruction() the same encoding is interpreted as
a MOV_rr instruction. Such a mismatcch caused the segfault.
The fix is to make NOP instruction as CodeGen only so disassembler
will skip NOP insn for disassembling.
Note that instruction "r0 = r0" should not appear in non inline
asm codes since BPF Machine Instruction Peephole optimization will
remove it.
Differential Revision: https://reviews.llvm.org/D80156
The builtin function
u32 btf_type_id = __builtin_btf_type_id(param, 0)
can help preserve type info for the following use case:
extern void foo(..., void *data, int size);
int test(...) {
struct t { int a; int b; int c; } d;
d.a = ...; d.b = ...; d.c = ...;
foo(..., &d, sizeof(d));
}
The function "foo" in the above only see raw data and does not
know what type of the data is. In certain cases, e.g., logging,
the additional type information will help pretty print.
This patch handles the builtin in BPF backend. It includes
an IR pass to translate the IR intrinsic to a load of
a global variable which carries the metadata, and an MI
pass to remove the intermediate load of the global variable.
Finally, in AsmPrinter pass, proper instruction are generated.
In the above example, the second argument for __builtin_btf_type_id()
is 0, which means a relocation for local adjustment,
i.e., w.r.t. bpf program BTF change, will be generated.
The value 1 for the second argument means
a relocation for remote adjustment, e.g., against vmlinux.
Differential Revision: https://reviews.llvm.org/D74572
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
1) The first argument's type is `const MCInst &`, the third
argument's type is `MCInst &`, but they may be aliased to the same
variable
2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
loop.
In this patch, we drop the thrid argument, and let `relaxInstruction`
directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and
breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon.
Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain
Reviewed By: Razer6, MaskRay, bcain
Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78364
For the test case in this patch like below
struct t { int a; } __attribute__((preserve_access_index));
int foo(void *);
int test(struct t *arg) {
long param[1];
param[0] = (long)&arg->a;
return foo(param);
}
The IR right before BPF SimplifyPatchable phase:
%1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
%2:gpr = LDD killed %1:gpr, 0
%3:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
STD killed %3:gpr, %stack.0.param, 0
After SimplifyPatchable phase, the incorrect IR is generated:
%1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
%3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
CORE_MEM killed %3:gpr, 306, %0:gpr, @"llvm.t:0:0$0:0"
Note that CORE_MEM pseudo op is introduced to encode
memory operations related to CORE. In the above, we intend
to check whether we have a store like
*(%3:gpr + 0) = ...
and if this is the case, we could replace it with
*(%0:gpr + @"llvm.t:0:0$0:0"_ = ...
Unfortunately, in the above, IR for the store is
*(%stack.0.param + 0) = %3:gpr
and transformation should not happen.
Note that we won't have problem if the actual CORE
dereference (arg->a) happens.
This patch fixed the problem by skip CORE optimization if
the use of ADD_rr result is not the base address of the store
operation.
Differential Revision: https://reviews.llvm.org/D78466
We were only including MachineInstr.h for DebugLoc.h. This exposes an implicit include dependency in BTFDebug.h where I've had to add the MachineInstr.h include.
Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration.
This exposed a couple of source files that were implicitly replying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.