llvm-project/llvm/test/Transforms/HotColdSplit/multiple-exits.ll

; RUN: opt -S -hotcoldsplit -hotcoldsplit-threshold=0 < %s | FileCheck %s
;
; [HotColdSplitting] Identify larger cold regions using domtree queries
;
; The current splitting algorithm works in three stages:
;
;   1) Identify cold blocks, then
;   2) Use forward/backward propagation to mark hot blocks, then
;   3) Grow a SESE region of blocks *outside* of the set of hot blocks and
;      start outlining.
;
; While testing this pass on Apple internal frameworks I noticed that some
; kinds of control flow (e.g. loops) are never outlined, even though they
; unconditionally lead to / follow cold blocks. I noticed two other issues
; related to how cold regions are identified:
;
;   - An inconsistency can arise in the internal state of the hotness
;     propagation stage, as a block may end up in both the ColdBlocks set and
;     the HotBlocks set. Further inconsistencies can arise as these sets do
;     not match what's in ProfileSummaryInfo.
;
;   - It isn't necessary to limit outlining to single-exit regions.
;
; This patch teaches the splitting algorithm to identify maximal cold regions
; and outline them. A maximal cold region is defined as the set of blocks
; post-dominated by a cold sink block, or dominated by that sink block. This
; approach can successfully outline loops in the cold path. As a side benefit,
; it maintains less internal state than the current approach.
;
; Due to a limitation in CodeExtractor, blocks within the maximal cold region
; which aren't dominated by a single entry point (a so-called "max ancestor")
; are filtered out.
;
; Results:
;
;   - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
;     47KB pre-patch, or a ~3x improvement. Did not see a performance impact
;     across two runs.
;
;   - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB of
;     TEXT were outlined. Ditto re: performance impact.
;
;   - Outlining results improve marginally in the internal frameworks I
;     tested.
;
; Follow-ups:
;
;   - Outline more than once per function, outline large single basic blocks,
;     & try to remove unconditional branches in outlined functions.
;
; Differential Revision: https://reviews.llvm.org/D53627
; llvm-svn: 345209
; 2018-10-25 06:15:41 +08:00
;
; Source:
;
; extern void sideeffect(int);
; extern void __attribute__((cold)) sink();
; void foo(int cond) {
;   if (cond) { //< Start outlining here.
;     sink();
;     if (cond > 10)
;       goto exit1;
;     else
;       goto exit2;
;   }
; exit1:
;   sideeffect(1);
;   return;
; exit2:
;   sideeffect(2);
;   return;
; }
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.14.0"
; CHECK-LABEL: define {{.*}}@foo(
; CHECK: br i1 {{.*}}, label %exit1, label %codeRepl
; CHECK-LABEL: codeRepl:
; CHECK: [[targetBlock:%.*]] = call i1 @foo.cold.1(
; CHECK-NEXT: br i1 [[targetBlock]], label %exit1, label %[[return:.*]]
; CHECK-LABEL: exit1:
; CHECK: call {{.*}}@sideeffect(i32 1)
; CHECK: [[return]]:
; CHECK-NEXT: ret void
define void @foo(i32 %cond) {
entry:
  %tobool = icmp eq i32 %cond, 0
  br i1 %tobool, label %exit1, label %if.then

if.then:                                          ; preds = %entry
  tail call void (...) @sink()
  %cmp = icmp sgt i32 %cond, 10
  br i1 %cmp, label %exit1, label %exit2

exit1:                                            ; preds = %entry, %if.then
  call void @sideeffect(i32 1)
  br label %return

exit2:                                            ; preds = %if.then
  call void @sideeffect(i32 2)
  br label %return

return:                                           ; preds = %exit2, %exit1
  ret void
}
; CHECK-LABEL: define {{.*}}@foo.cold.1(
; CHECK: br
; CHECK: [[exit1Stub:.*]]:
; CHECK-NEXT: ret i1 true
; CHECK: [[returnStub:.*]]:
; CHECK-NEXT: ret i1 false
; CHECK: call {{.*}}@sink
; CHECK-NEXT: [[cmp:%.*]] = icmp
; CHECK-NEXT: br i1 [[cmp]], label %[[exit1Stub]], label %exit2
; CHECK-LABEL: exit2:
; CHECK-NEXT: call {{.*}}@sideeffect(i32 2)
; CHECK-NEXT: br label %[[returnStub]]
declare void @sink(...) cold
declare void @sideeffect(i32)