forked from OSchip/llvm-project
87cba43402
The following bpf linux kernel selftest failed with latest llvm:

    $ ./test_progs -n 7/10
    ...
    The sequence of 8193 jumps is too complex.
    verification time 126272 usec
    stack depth 320
    processed 114799 insns (limit 1000000)
    ...
    libbpf: failed to load object 'pyperf600_nounroll.o'
    test_bpf_verif_scale:FAIL:110
    #7/10 pyperf600_nounroll.o:FAIL
    #7 bpf_verif_scale:FAIL

After some investigation, I found that the following llvm patch
    https://reviews.llvm.org/D84108
is responsible. The patch disabled hoisting of common instructions in SimplifyCFG by default. By the time a later SimplifyCFG phase with hoisting enabled runs, the code has changed and the hoisting can no longer be done.

A test is provided to demonstrate the problem. The IR before simplifyCFG looks like:

    for.cond:
      %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
      %cmp = icmp ult i32 %i.0, 6
      br i1 %cmp, label %for.body, label %for.cond.cleanup

    for.cond.cleanup:
      %2 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
      %cmp2 = icmp eq i8* %2, null
      %conv = zext i1 %cmp2 to i32
      call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3
      call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3
      ret i32 %conv

    for.body:
      %3 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
      %tobool.not = icmp eq i8* %3, null
      br i1 %tobool.not, label %for.inc, label %land.lhs.true

The first two insns of `for.cond.cleanup` and `for.body`, load and icmp, can be hoisted to the `for.cond` block. With patch D84108, the optimization is delayed. Unfortunately, loop rotation later adds additional phi nodes to `for.body`, and the hoisting cannot be done any more.

Note that such hoisting is beneficial to bpf programs, as the bpf verifier does path-sensitive analysis and verification. The hoisting prevents reloading from the stack, for which the verifier assumes a conservative value, and which increases the number of processed insns. In this case, it caused a verifier failure.

To fix this problem, I added an IR pass from the bpf target to perform an additional simplifycfg with hoisting of common insts enabled.

Differential Revision: https://reviews.llvm.org/D85434 |
||
---|---|---|
.. | ||
BTF | ||
CORE | ||
32-bit-subreg-alu.ll | ||
32-bit-subreg-cond-select.ll | ||
32-bit-subreg-load-store.ll | ||
32-bit-subreg-peephole-phi-1.ll | ||
32-bit-subreg-peephole-phi-2.ll | ||
32-bit-subreg-peephole-phi-3.ll | ||
32-bit-subreg-peephole.ll | ||
32-bit-subreg-zext.ll | ||
alu8.ll | ||
atomics.ll | ||
basictest.ll | ||
byval.ll | ||
callx.ll | ||
cc_args.ll | ||
cc_args_be.ll | ||
cc_ret.ll | ||
cmp.ll | ||
dwarfdump.ll | ||
elf-symbol-information.ll | ||
ex1.ll | ||
fi_ri.ll | ||
i128.ll | ||
inline_asm.ll | ||
inlineasm-output-template.ll | ||
intrinsics.ll | ||
is_trunc_free.ll | ||
is_zext_free.ll | ||
lit.local.cfg | ||
load.ll | ||
loops.ll | ||
many_args1.ll | ||
many_args2.ll | ||
mem_offset.ll | ||
mem_offset_be.ll | ||
memcpy-expand-in-order.ll | ||
objdump_atomics.ll | ||
objdump_cond_op.ll | ||
objdump_cond_op_2.ll | ||
objdump_dis_all.ll | ||
objdump_imm_hex.ll | ||
objdump_intrinsics.ll | ||
objdump_nop.ll | ||
objdump_static_var.ll | ||
objdump_trivial.ll | ||
objdump_two_funcs.ll | ||
optnone-1.ll | ||
reloc-btf-2.ll | ||
reloc-btf.ll | ||
reloc.ll | ||
remove_truncate_1.ll | ||
remove_truncate_2.ll | ||
remove_truncate_3.ll | ||
remove_truncate_4.ll | ||
remove_truncate_5.ll | ||
remove_truncate_6.ll | ||
remove_truncate_7.ll | ||
rodata_1.ll | ||
rodata_2.ll | ||
rodata_3.ll | ||
rodata_4.ll | ||
rodata_5.ll | ||
sanity.ll | ||
sdiv_error.ll | ||
select_ri.ll | ||
setcc.ll | ||
shifts.ll | ||
simplifycfg.ll | ||
sockex2.ll | ||
struct_ret1.ll | ||
struct_ret2.ll | ||
undef.ll | ||
vararg1.ll | ||
warn-call.ll | ||
warn-stack.ll | ||
xadd.ll | ||
xadd_legal.ll |
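
For reference, the hoisting described in the commit message can be sketched on the IR fragment it quotes. This is a hand-written illustration, not the output of an actual `opt` or `llc` run: the value names `%fp` and `%fp.null` are hypothetical, and a real SimplifyCFG run would likely simplify the now nearly empty `for.body` block further. The point is only that the load from `%frame_ptr` and the null compare appear once, in `for.cond`, so both successors reuse the same value instead of reloading it.

```llvm
; Hand-written sketch of the IR after hoisting; not real compiler output.
; %fp and %fp.null are hypothetical names for the hoisted load and icmp.
for.cond:
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %cmp = icmp ult i32 %i.0, 6
  ; The load and compare common to both successors now live here.
  %fp = load i8*, i8** %frame_ptr, align 8, !tbaa !2
  %fp.null = icmp eq i8* %fp, null
  br i1 %cmp, label %for.body, label %for.cond.cleanup

for.cond.cleanup:
  ; Reuses the hoisted compare instead of reloading %frame_ptr.
  %conv = zext i1 %fp.null to i32
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3
  ret i32 %conv

for.body:
  ; Reuses the same hoisted compare for its branch.
  br i1 %fp.null, label %for.inc, label %land.lhs.true
```

Judging by its name, `simplifycfg.ll` in the listing above is presumably the test the commit message refers to.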