llvm-project/llvm/test/CodeGen/BPF/CORE/no-narrow-load.ll

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

159 lines
8.2 KiB
LLVM
Raw Normal View History

BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible Move abstractMemberAccess and PreserveDIType passes as early as possible, right after clang code generation. Currently, compiler may transform the above code p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); a = llvm.bpf.builtin.preserve_field_info(p2, EXIST); if (a) { p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); bpf_probe_read(buf, buf_size, p2); } to p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); a = llvm.bpf.builtin.preserve_field_info(p2, EXIST); if (a) { bpf_probe_read(buf, buf_size, p2); } and eventually assembly code looks like reloc_exist = 1; reloc_member_offset = 10; //calculate member offset from base p2 = base + reloc_member_offset; if (reloc_exist) { bpf_probe_read(bpf, buf_size, p2); } if during libbpf relocation resolution, reloc_exist is actually resolved to 0 (not exist), reloc_member_offset relocation cannot be resolved and will be patched with illegal instruction. This will cause verifier failure. This patch attempts to address this issue by do chaining analysis and replace chains with special globals right after clang code gen. This will remove the cse possibility described in the above. The IR typically looks like %6 = load @llvm.sk_buff:0:50$0:0:0:2:0 %7 = bitcast %struct.sk_buff* %2 to i8* %8 = getelementptr i8, i8* %7, %6 for a particular address computation relocation. But this transformation has another consequence, code sinking may happen like below: PHI = <possibly different @preserve_*_access_globals> %7 = bitcast %struct.sk_buff* %2 to i8* %8 = getelementptr i8, i8* %7, %6 For such cases, we will not able to generate relocations since multiple relocations are merged into one. This patch introduced a passthrough builtin to prevent such optimization. Looks like inline assembly has more impact for optimizaiton, e.g., inlining. Using passthrough has less impact on optimizations. A new IR pass is introduced at the beginning of target-dependent IR optimization, which does: - report fatal error if any reloc global in PHI nodes - remove all bpf passthrough builtin functions Changes for existing CORE tests: - for clang tests, add "-Xclang -disable-llvm-passes" flags to avoid builtin->reloc_global transformation so the test is still able to check correctness for clang generated IR. - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command before "llc" command since "opt" is needed to call newly-placed builtin->reloc_global transformation. Add target triple in the IR file since "opt" requires it. - Since target triple is added in IR file, if a test may produce different results for different endianness, two tests will be created, one for bpfeb and another for bpfel, e.g., some tests for relocation of lshift/rshift of bitfields. - field-reloc-bitfield-1.ll has different relocations compared to old codes. This is because for the structure in the test, new code returns struct layout alignment 4 while old code is 8. Align 8 is more precise and permits double load. With align 4, the new mechanism uses 4-byte load, so generating different relocations. - test intrinsic-transforms.ll is removed. This is used to test cse on intrinsics so we do not lose metadata. Now metadata is attached to global and not instruction, it won't get lost with cse. Differential Revision: https://reviews.llvm.org/D87153
2020-09-03 13:56:41 +08:00
; RUN: opt -O2 %s | llvm-dis > %t1
; RUN: llc -filetype=asm -o - %t1 | FileCheck -check-prefixes=CHECK %s
; Source code:
; struct data_t {
; int d1;
; int d2;
; };
; struct info_t {
; int pid;
; int flags;
; } __attribute__((preserve_access_index));
;
; extern void output(void *);
; void test(struct info_t * args) {
; int is_mask2 = args->flags & 0x10000;
; struct data_t data = {};
;
; data.d1 = is_mask2 ? 2 : args->pid;
; data.d2 = (is_mask2 || (args->flags & 0x8000)) ? 1 : 2;
; output(&data);
; }
; Compilation flag:
BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible Move abstractMemberAccess and PreserveDIType passes as early as possible, right after clang code generation. Currently, compiler may transform the above code p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); a = llvm.bpf.builtin.preserve_field_info(p2, EXIST); if (a) { p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); bpf_probe_read(buf, buf_size, p2); } to p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0); p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2); a = llvm.bpf.builtin.preserve_field_info(p2, EXIST); if (a) { bpf_probe_read(buf, buf_size, p2); } and eventually assembly code looks like reloc_exist = 1; reloc_member_offset = 10; //calculate member offset from base p2 = base + reloc_member_offset; if (reloc_exist) { bpf_probe_read(bpf, buf_size, p2); } if during libbpf relocation resolution, reloc_exist is actually resolved to 0 (not exist), reloc_member_offset relocation cannot be resolved and will be patched with illegal instruction. This will cause verifier failure. This patch attempts to address this issue by do chaining analysis and replace chains with special globals right after clang code gen. This will remove the cse possibility described in the above. The IR typically looks like %6 = load @llvm.sk_buff:0:50$0:0:0:2:0 %7 = bitcast %struct.sk_buff* %2 to i8* %8 = getelementptr i8, i8* %7, %6 for a particular address computation relocation. But this transformation has another consequence, code sinking may happen like below: PHI = <possibly different @preserve_*_access_globals> %7 = bitcast %struct.sk_buff* %2 to i8* %8 = getelementptr i8, i8* %7, %6 For such cases, we will not able to generate relocations since multiple relocations are merged into one. This patch introduced a passthrough builtin to prevent such optimization. Looks like inline assembly has more impact for optimizaiton, e.g., inlining. Using passthrough has less impact on optimizations. A new IR pass is introduced at the beginning of target-dependent IR optimization, which does: - report fatal error if any reloc global in PHI nodes - remove all bpf passthrough builtin functions Changes for existing CORE tests: - for clang tests, add "-Xclang -disable-llvm-passes" flags to avoid builtin->reloc_global transformation so the test is still able to check correctness for clang generated IR. - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command before "llc" command since "opt" is needed to call newly-placed builtin->reloc_global transformation. Add target triple in the IR file since "opt" requires it. - Since target triple is added in IR file, if a test may produce different results for different endianness, two tests will be created, one for bpfeb and another for bpfel, e.g., some tests for relocation of lshift/rshift of bitfields. - field-reloc-bitfield-1.ll has different relocations compared to old codes. This is because for the structure in the test, new code returns struct layout alignment 4 while old code is 8. Align 8 is more precise and permits double load. With align 4, the new mechanism uses 4-byte load, so generating different relocations. - test intrinsic-transforms.ll is removed. This is used to test cse on intrinsics so we do not lose metadata. Now metadata is attached to global and not instruction, it won't get lost with cse. Differential Revision: https://reviews.llvm.org/D87153
2020-09-03 13:56:41 +08:00
; clang -target bpf -O2 -g -S -emit-llvm -Xclang -disable-llvm-passes test.c
target triple = "bpf"
%struct.info_t = type { i32, i32 }
%struct.data_t = type { i32, i32 }
; Function Attrs: nounwind
define dso_local void @test(%struct.info_t* readonly %args) local_unnamed_addr #0 !dbg !12 {
entry:
%data = alloca i64, align 8
%tmpcast = bitcast i64* %data to %struct.data_t*
call void @llvm.dbg.value(metadata %struct.info_t* %args, metadata !22, metadata !DIExpression()), !dbg !29
%0 = tail call i32* @llvm.preserve.struct.access.index.p0i32.p0s_struct.info_ts(%struct.info_t* %args, i32 1, i32 1), !dbg !30, !llvm.preserve.access.index !16
%1 = load i32, i32* %0, align 4, !dbg !30, !tbaa !31
%and = and i32 %1, 65536, !dbg !36
call void @llvm.dbg.value(metadata i32 %and, metadata !23, metadata !DIExpression()), !dbg !29
%2 = bitcast i64* %data to i8*, !dbg !37
call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %2) #5, !dbg !37
call void @llvm.dbg.declare(metadata %struct.data_t* %tmpcast, metadata !24, metadata !DIExpression()), !dbg !38
store i64 0, i64* %data, align 8, !dbg !38
%tobool = icmp eq i32 %and, 0, !dbg !39
br i1 %tobool, label %cond.false, label %lor.end.critedge, !dbg !39
cond.false: ; preds = %entry
%3 = tail call i32* @llvm.preserve.struct.access.index.p0i32.p0s_struct.info_ts(%struct.info_t* %args, i32 0, i32 0), !dbg !40, !llvm.preserve.access.index !16
%4 = load i32, i32* %3, align 4, !dbg !40, !tbaa !41
%d1 = bitcast i64* %data to i32*, !dbg !42
store i32 %4, i32* %d1, align 8, !dbg !43, !tbaa !44
%5 = load i32, i32* %0, align 4, !dbg !46, !tbaa !31
%and2 = and i32 %5, 32768, !dbg !47
%tobool3 = icmp eq i32 %and2, 0, !dbg !48
%phitmp = select i1 %tobool3, i32 2, i32 1, !dbg !48
br label %lor.end, !dbg !48
lor.end.critedge: ; preds = %entry
%d1.c = bitcast i64* %data to i32*, !dbg !42
store i32 2, i32* %d1.c, align 8, !dbg !43, !tbaa !44
br label %lor.end, !dbg !48
lor.end: ; preds = %lor.end.critedge, %cond.false
%6 = phi i32 [ %phitmp, %cond.false ], [ 1, %lor.end.critedge ]
%d2 = getelementptr inbounds %struct.data_t, %struct.data_t* %tmpcast, i64 0, i32 1, !dbg !49
store i32 %6, i32* %d2, align 4, !dbg !50, !tbaa !51
call void @output(i8* nonnull %2) #5, !dbg !52
call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %2) #5, !dbg !53
ret void, !dbg !53
}
; CHECK: r[[LOAD1:[0-9]+]] = *(u32 *)(r{{[0-9]+}} + 4)
; CHECK: r[[LOAD1]] &= 65536
; CHECK: r[[LOAD2:[0-9]+]] = *(u32 *)(r{{[0-9]+}} + 4)
; CHECK: r[[LOAD2]] &= 32768
; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1
; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2
; Function Attrs: nounwind readnone
declare i32* @llvm.preserve.struct.access.index.p0i32.p0s_struct.info_ts(%struct.info_t*, i32 immarg, i32 immarg) #3
declare !dbg !4 dso_local void @output(i8*) local_unnamed_addr #4
; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2
; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata, metadata, metadata) #1
attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind readnone speculatable willreturn }
attributes #2 = { argmemonly nounwind willreturn }
attributes #3 = { nounwind readnone }
attributes #4 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #5 = { nounwind }
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!8, !9, !10}
!llvm.ident = !{!11}
!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.0 (https://github.com/llvm/llvm-project.git 5884aae58f56786475bbc0f13ad8bd35f7f1ce69)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !3, splitDebugInlining: false, nameTableKind: None)
!1 = !DIFile(filename: "test.c", directory: "/tmp/home/yhs/work/tests/core")
!2 = !{}
!3 = !{!4}
!4 = !DISubprogram(name: "output", scope: !1, file: !1, line: 10, type: !5, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !2)
!5 = !DISubroutineType(types: !6)
!6 = !{null, !7}
!7 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64)
!8 = !{i32 7, !"Dwarf Version", i32 4}
!9 = !{i32 2, !"Debug Info Version", i32 3}
!10 = !{i32 1, !"wchar_size", i32 4}
!11 = !{!"clang version 11.0.0 (https://github.com/llvm/llvm-project.git 5884aae58f56786475bbc0f13ad8bd35f7f1ce69)"}
!12 = distinct !DISubprogram(name: "test", scope: !1, file: !1, line: 11, type: !13, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !21)
!13 = !DISubroutineType(types: !14)
!14 = !{null, !15}
!15 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !16, size: 64)
!16 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "info_t", file: !1, line: 5, size: 64, elements: !17)
!17 = !{!18, !20}
!18 = !DIDerivedType(tag: DW_TAG_member, name: "pid", scope: !16, file: !1, line: 6, baseType: !19, size: 32)
!19 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!20 = !DIDerivedType(tag: DW_TAG_member, name: "flags", scope: !16, file: !1, line: 7, baseType: !19, size: 32, offset: 32)
!21 = !{!22, !23, !24}
!22 = !DILocalVariable(name: "args", arg: 1, scope: !12, file: !1, line: 11, type: !15)
!23 = !DILocalVariable(name: "is_mask2", scope: !12, file: !1, line: 12, type: !19)
!24 = !DILocalVariable(name: "data", scope: !12, file: !1, line: 13, type: !25)
!25 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "data_t", file: !1, line: 1, size: 64, elements: !26)
!26 = !{!27, !28}
!27 = !DIDerivedType(tag: DW_TAG_member, name: "d1", scope: !25, file: !1, line: 2, baseType: !19, size: 32)
!28 = !DIDerivedType(tag: DW_TAG_member, name: "d2", scope: !25, file: !1, line: 3, baseType: !19, size: 32, offset: 32)
!29 = !DILocation(line: 0, scope: !12)
!30 = !DILocation(line: 12, column: 24, scope: !12)
!31 = !{!32, !33, i64 4}
!32 = !{!"info_t", !33, i64 0, !33, i64 4}
!33 = !{!"int", !34, i64 0}
!34 = !{!"omnipotent char", !35, i64 0}
!35 = !{!"Simple C/C++ TBAA"}
!36 = !DILocation(line: 12, column: 30, scope: !12)
!37 = !DILocation(line: 13, column: 3, scope: !12)
!38 = !DILocation(line: 13, column: 17, scope: !12)
!39 = !DILocation(line: 15, column: 13, scope: !12)
!40 = !DILocation(line: 15, column: 34, scope: !12)
!41 = !{!32, !33, i64 0}
!42 = !DILocation(line: 15, column: 8, scope: !12)
!43 = !DILocation(line: 15, column: 11, scope: !12)
!44 = !{!45, !33, i64 0}
!45 = !{!"data_t", !33, i64 0, !33, i64 4}
!46 = !DILocation(line: 16, column: 33, scope: !12)
!47 = !DILocation(line: 16, column: 39, scope: !12)
!48 = !DILocation(line: 16, column: 23, scope: !12)
!49 = !DILocation(line: 16, column: 8, scope: !12)
!50 = !DILocation(line: 16, column: 11, scope: !12)
!51 = !{!45, !33, i64 4}
!52 = !DILocation(line: 17, column: 3, scope: !12)
!53 = !DILocation(line: 18, column: 1, scope: !12)