[ThinLTO] Support for reference graph in per-module and combined summary.
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
2016-03-12 02:52:24 +08:00
|
|
|
; Test to check both the callgraph and refgraph in summary
|
2016-04-13 05:35:18 +08:00
|
|
|
; RUN: opt -module-summary %s -o %t.o
|
[ThinLTO] Support for reference graph in per-module and combined summary.
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
2016-03-12 02:52:24 +08:00
|
|
|
; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s
|
|
|
|
|
|
|
|
; See if the calls and other references are recorded properly using the
|
|
|
|
; expected value id and other information as appropriate (callsite cout
|
|
|
|
; for calls). Use different linkage types for the various test cases to
|
|
|
|
; distinguish the test cases here (op1 contains the linkage type).
|
|
|
|
; Note that op3 contains the # non-call references.
|
|
|
|
; This also ensures that we didn't include a call or reference to intrinsic
|
|
|
|
; llvm.ctpop.i8.
|
|
|
|
; CHECK: <GLOBALVAL_SUMMARY_BLOCK
|
|
|
|
; Function main contains call to func, as well as address reference to func:
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; CHECK-DAG: <PERMODULE {{.*}} op0=[[MAINID:[0-9]+]] op1=0 {{.*}} op3=1 op4=[[FUNCID:[0-9]+]] op5=[[FUNCID]]/>
|
[ThinLTO] Support for reference graph in per-module and combined summary.
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
2016-03-12 02:52:24 +08:00
|
|
|
; Function W contains a call to func3 as well as a reference to globalvar:
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; CHECK-DAG: <PERMODULE {{.*}} op0=[[WID:[0-9]+]] op1=5 {{.*}} op3=1 op4=[[GLOBALVARID:[0-9]+]] op5=[[FUNC3ID:[0-9]+]]/>
|
[ThinLTO] Support for reference graph in per-module and combined summary.
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
2016-03-12 02:52:24 +08:00
|
|
|
; Function X contains call to foo, as well as address reference to foo
|
|
|
|
; which is in the same instruction as the call:
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; CHECK-DAG: <PERMODULE {{.*}} op0=[[XID:[0-9]+]] op1=1 {{.*}} op3=1 op4=[[FOOID:[0-9]+]] op5=[[FOOID]]/>
|
[ThinLTO] Support for reference graph in per-module and combined summary.
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
2016-03-12 02:52:24 +08:00
|
|
|
; Function Y contains call to func2, and ensures we don't incorrectly add
|
|
|
|
; a reference to it when reached while earlier analyzing the phi using its
|
|
|
|
; return value:
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; CHECK-DAG: <PERMODULE {{.*}} op0=[[YID:[0-9]+]] op1=8 {{.*}} op3=0 op4=[[FUNC2ID:[0-9]+]]/>
|
[ThinLTO] Support for reference graph in per-module and combined summary.
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
2016-03-12 02:52:24 +08:00
|
|
|
; Function Z contains call to func2, and ensures we don't incorrectly add
|
|
|
|
; a reference to it when reached while analyzing subsequent use of its return
|
|
|
|
; value:
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; CHECK-DAG: <PERMODULE {{.*}} op0=[[ZID:[0-9]+]] op1=3 {{.*}} op3=0 op4=[[FUNC2ID:[0-9]+]]/>
|
[ThinLTO] Support for reference graph in per-module and combined summary.
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
2016-03-12 02:52:24 +08:00
|
|
|
; Variable bar initialization contains address reference to func:
|
|
|
|
; CHECK-DAG: <PERMODULE_GLOBALVAR_INIT_REFS {{.*}} op0=[[BARID:[0-9]+]] op1=0 op2=[[FUNCID]]/>
|
|
|
|
; CHECK: </GLOBALVAL_SUMMARY_BLOCK>
|
|
|
|
|
|
|
|
; CHECK-NEXT: <VALUE_SYMTAB
|
|
|
|
; CHECK-DAG: <ENTRY {{.*}} op0=[[BARID]] {{.*}} record string = 'bar'
|
|
|
|
; CHECK-DAG: <ENTRY {{.*}} op0=[[FUNCID]] {{.*}} record string = 'func'
|
|
|
|
; CHECK-DAG: <ENTRY {{.*}} op0=[[FOOID]] {{.*}} record string = 'foo'
|
|
|
|
; CHECK-DAG: <FNENTRY {{.*}} op0=[[MAINID]] {{.*}} record string = 'main'
|
|
|
|
; CHECK-DAG: <FNENTRY {{.*}} op0=[[WID]] {{.*}} record string = 'W'
|
|
|
|
; CHECK-DAG: <FNENTRY {{.*}} op0=[[XID]] {{.*}} record string = 'X'
|
|
|
|
; CHECK-DAG: <FNENTRY {{.*}} op0=[[YID]] {{.*}} record string = 'Y'
|
|
|
|
; CHECK-DAG: <FNENTRY {{.*}} op0=[[ZID]] {{.*}} record string = 'Z'
|
|
|
|
; CHECK-DAG: <ENTRY {{.*}} op0=[[FUNC2ID]] {{.*}} record string = 'func2'
|
|
|
|
; CHECK-DAG: <ENTRY {{.*}} op0=[[FUNC3ID]] {{.*}} record string = 'func3'
|
|
|
|
; CHECK-DAG: <ENTRY {{.*}} op0=[[GLOBALVARID]] {{.*}} record string = 'globalvar'
|
|
|
|
; CHECK: </VALUE_SYMTAB>
|
|
|
|
|
|
|
|
; ModuleID = 'thinlto-function-summary-refgraph.ll'
|
|
|
|
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
|
|
|
|
target triple = "x86_64-unknown-linux-gnu"
|
|
|
|
|
|
|
|
@bar = global void (...)* bitcast (void ()* @func to void (...)*), align 8
|
|
|
|
|
|
|
|
@globalvar = global i32 0, align 4
|
|
|
|
|
|
|
|
declare void @func() #0
|
|
|
|
declare i32 @func2(...) #1
|
|
|
|
declare void @foo(i8* %F) #0
|
|
|
|
declare i32 @func3(i32* dereferenceable(4)) #2
|
|
|
|
|
|
|
|
; Function Attrs: nounwind uwtable
|
|
|
|
define weak_odr void @W() #0 {
|
|
|
|
entry:
|
|
|
|
%call = tail call i32 @func3(i32* nonnull dereferenceable(4) @globalvar)
|
|
|
|
ret void
|
|
|
|
}
|
|
|
|
|
|
|
|
; Function Attrs: nounwind uwtable
|
|
|
|
define available_externally void @X() #0 {
|
|
|
|
entry:
|
|
|
|
call void @foo(i8* bitcast (void (i8*)* @foo to i8*))
|
|
|
|
ret void
|
|
|
|
}
|
|
|
|
|
|
|
|
; Function Attrs: nounwind uwtable
|
|
|
|
define private i32 @Y(i32 %i) #0 {
|
|
|
|
entry:
|
|
|
|
%cmp3 = icmp slt i32 %i, 10
|
|
|
|
br i1 %cmp3, label %while.body.preheader, label %while.end
|
|
|
|
|
|
|
|
while.body.preheader: ; preds = %entry
|
|
|
|
br label %while.body
|
|
|
|
|
|
|
|
while.body: ; preds = %while.body.preheader, %while.body
|
|
|
|
%j.05 = phi i32 [ %add, %while.body ], [ 0, %while.body.preheader ]
|
|
|
|
%i.addr.04 = phi i32 [ %inc, %while.body ], [ %i, %while.body.preheader ]
|
|
|
|
%inc = add nsw i32 %i.addr.04, 1
|
|
|
|
%call = tail call i32 (...) @func2() #2
|
|
|
|
%add = add nsw i32 %call, %j.05
|
|
|
|
%exitcond = icmp eq i32 %inc, 10
|
|
|
|
br i1 %exitcond, label %while.end.loopexit, label %while.body
|
|
|
|
|
|
|
|
while.end.loopexit: ; preds = %while.body
|
|
|
|
%add.lcssa = phi i32 [ %add, %while.body ]
|
|
|
|
br label %while.end
|
|
|
|
|
|
|
|
while.end: ; preds = %while.end.loopexit, %entry
|
|
|
|
%j.0.lcssa = phi i32 [ 0, %entry ], [ %add.lcssa, %while.end.loopexit ]
|
|
|
|
ret i32 %j.0.lcssa
|
|
|
|
}
|
|
|
|
|
|
|
|
; Function Attrs: nounwind uwtable
|
|
|
|
define linkonce_odr i32 @Z() #0 {
|
|
|
|
entry:
|
|
|
|
%call = tail call i32 (...) @func2() #2
|
|
|
|
ret i32 %call
|
|
|
|
}
|
|
|
|
|
|
|
|
declare i8 @llvm.ctpop.i8(i8)
|
|
|
|
|
|
|
|
; Function Attrs: nounwind uwtable
|
|
|
|
define i32 @main() #0 {
|
|
|
|
entry:
|
|
|
|
%retval = alloca i32, align 4
|
|
|
|
%foo = alloca void (...)*, align 8
|
|
|
|
store i32 0, i32* %retval, align 4
|
|
|
|
store void (...)* bitcast (void ()* @func to void (...)*), void (...)** %foo, align 8
|
|
|
|
%0 = load void (...)*, void (...)** %foo, align 8
|
|
|
|
call void (...) %0()
|
|
|
|
call void @func()
|
|
|
|
call i8 @llvm.ctpop.i8( i8 10 )
|
|
|
|
ret i32 0
|
|
|
|
}
|