[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; Test to check the callgraph in summary when there is PGO
|
|
|
|
; RUN: opt -module-summary %s -o %t.o
|
|
|
|
; RUN: llvm-bcanalyzer -dump %t.o | FileCheck %s
|
2018-05-26 10:34:13 +08:00
|
|
|
; RUN: llvm-dis %t.o
|
|
|
|
; RUN: cat %t.o.ll | FileCheck %s --check-prefix=DIS
|
|
|
|
|
|
|
|
; Make sure the assembler doesn't error when parsing the summary
|
|
|
|
; RUN: llvm-as %t.o.ll
|
2018-06-26 21:56:49 +08:00
|
|
|
|
|
|
|
; Check assembled summary.
|
|
|
|
; RUN: llvm-dis %t.o.bc -o - | FileCheck %s --check-prefix=DIS
|
2018-05-26 10:34:13 +08:00
|
|
|
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; RUN: opt -module-summary %p/Inputs/thinlto-function-summary-callgraph-profile-summary.ll -o %t2.o
|
|
|
|
; RUN: llvm-lto -thinlto -o %t3 %t.o %t2.o
|
|
|
|
; RUN: llvm-bcanalyzer -dump %t3.thinlto.bc | FileCheck %s --check-prefix=COMBINED
|
2018-05-26 10:34:13 +08:00
|
|
|
; RUN: llvm-dis %t3.thinlto.bc
|
|
|
|
; RUN: cat %t3.thinlto.ll | FileCheck %s --check-prefix=COMBINED-DIS
|
2018-06-26 21:56:49 +08:00
|
|
|
; Round trip it through llvm-as
|
|
|
|
; RUN: cat %t3.thinlto.ll | llvm-as -o - | llvm-dis -o - | FileCheck %s --check-prefix=COMBINED-DIS
|
2018-05-26 10:34:13 +08:00
|
|
|
|
|
|
|
; Make sure the assembler doesn't error when parsing the combined summary
|
|
|
|
; RUN: llvm-as %t3.thinlto.ll -o %t3.thinlto.o
|
2018-06-26 21:56:49 +08:00
|
|
|
|
|
|
|
; Check assembled combined summary.
|
|
|
|
; RUN: llvm-dis %t3.thinlto.o -o - | FileCheck %s --check-prefix=COMBINED-DIS
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
|
|
|
|
|
2017-04-18 01:51:36 +08:00
|
|
|
; CHECK: <SOURCE_FILENAME
|
|
|
|
; "hot_function"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=0 op1=12
|
|
|
|
; "hot1"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=12 op1=4
|
|
|
|
; "hot2"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=16 op1=4
|
|
|
|
; "hot3"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=20 op1=4
|
|
|
|
; "hot4"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=24 op1=4
|
|
|
|
; "cold"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=28 op1=4
|
|
|
|
; "none1"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=32 op1=5
|
|
|
|
; "none2"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=37 op1=5
|
|
|
|
; "none3"
|
|
|
|
; CHECK-NEXT: <FUNCTION op0=42 op1=5
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; CHECK-LABEL: <GLOBALVAL_SUMMARY_BLOCK
|
|
|
|
; CHECK-NEXT: <VERSION
|
[LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
2019-01-12 02:31:57 +08:00
|
|
|
; CHECK-NEXT: <FLAGS
|
2017-04-18 01:51:36 +08:00
|
|
|
; CHECK-NEXT: <VALUE_GUID op0=25 op1=123/>
|
|
|
|
; op4=hot1 op6=cold op8=hot2 op10=hot4 op12=none1 op14=hot3 op16=none2 op18=none3 op20=123
|
2019-07-05 23:25:05 +08:00
|
|
|
; CHECK-NEXT: <PERMODULE_PROFILE {{.*}} op7=1 op8=3 op9=5 op10=1 op11=2 op12=3 op13=4 op14=1 op15=6 op16=2 op17=3 op18=3 op19=7 op20=2 op21=8 op22=2 op23=25 op24=4/>
|
2020-05-22 04:28:24 +08:00
|
|
|
; CHECK-NEXT: <BLOCK_COUNT op0=6/>
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; CHECK-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
|
2017-04-18 01:51:36 +08:00
|
|
|
|
|
|
|
; CHECK: <STRTAB_BLOCK
|
2017-06-28 07:50:11 +08:00
|
|
|
; CHECK-NEXT: blob data = 'hot_functionhot1hot2hot3hot4coldnone1none2none3{{.*}}'
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
|
|
|
|
; COMBINED: <GLOBALVAL_SUMMARY_BLOCK
|
|
|
|
; COMBINED-NEXT: <VERSION
|
2018-02-07 12:05:59 +08:00
|
|
|
; COMBINED-NEXT: <FLAGS
|
2017-04-18 01:51:36 +08:00
|
|
|
; COMBINED-NEXT: <VALUE_GUID
|
|
|
|
; COMBINED-NEXT: <VALUE_GUID
|
|
|
|
; COMBINED-NEXT: <VALUE_GUID
|
|
|
|
; COMBINED-NEXT: <VALUE_GUID
|
|
|
|
; COMBINED-NEXT: <VALUE_GUID
|
|
|
|
; COMBINED-NEXT: <VALUE_GUID
|
|
|
|
; COMBINED-NEXT: <VALUE_GUID
|
|
|
|
; COMBINED-NEXT: <VALUE_GUID
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
; COMBINED-NEXT: <COMBINED abbrevid=
|
|
|
|
; COMBINED-NEXT: <COMBINED abbrevid=
|
|
|
|
; COMBINED-NEXT: <COMBINED abbrevid=
|
|
|
|
; COMBINED-NEXT: <COMBINED abbrevid=
|
|
|
|
; COMBINED-NEXT: <COMBINED abbrevid=
|
|
|
|
; COMBINED-NEXT: <COMBINED abbrevid=
|
2019-07-05 23:25:05 +08:00
|
|
|
; COMBINED-NEXT: <COMBINED_PROFILE {{.*}} op9=[[HOT1:.*]] op10=3 op11=[[COLD:.*]] op12=1 op13=[[HOT2:.*]] op14=3 op15=[[NONE1:.*]] op16=2 op17=[[HOT3:.*]] op18=3 op19=[[NONE2:.*]] op20=2 op21=[[NONE3:.*]] op22=2/>
|
2020-05-22 04:28:24 +08:00
|
|
|
; COMBINED-NEXT: <COMBINED abbrevid=
|
|
|
|
; COMBINED-NEXT: <BLOCK_COUNT op0=13/>
|
|
|
|
; COMBINED-NEXT: </GLOBALVAL_SUMMARY_BLOCK>
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
|
|
|
|
|
|
|
|
; ModuleID = 'thinlto-function-summary-callgraph.ll'
|
|
|
|
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
|
|
|
|
target triple = "x86_64-unknown-linux-gnu"
|
|
|
|
|
|
|
|
; This function have high profile count, so entry block is hot.
|
|
|
|
define void @hot_function(i1 %a, i1 %a2) !prof !20 {
|
|
|
|
entry:
|
|
|
|
call void @hot1()
|
|
|
|
br i1 %a, label %Cold, label %Hot, !prof !41
|
|
|
|
Cold: ; 1/1000 goes here
|
|
|
|
call void @cold()
|
|
|
|
call void @hot2()
|
2017-03-22 01:22:35 +08:00
|
|
|
call void @hot4(), !prof !15
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
call void @none1()
|
|
|
|
br label %exit
|
|
|
|
Hot: ; 999/1000 goes here
|
|
|
|
call void @hot2()
|
|
|
|
call void @hot3()
|
|
|
|
br i1 %a2, label %None1, label %None2, !prof !42
|
|
|
|
None1: ; half goes here
|
|
|
|
call void @none1()
|
|
|
|
call void @none2()
|
|
|
|
br label %exit
|
|
|
|
None2: ; half goes here
|
|
|
|
call void @none3()
|
|
|
|
br label %exit
|
|
|
|
exit:
|
|
|
|
ret void
|
|
|
|
}
|
|
|
|
|
|
|
|
declare void @hot1() #1
|
|
|
|
declare void @hot2() #1
|
|
|
|
declare void @hot3() #1
|
2017-03-22 01:22:35 +08:00
|
|
|
declare void @hot4() #1
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
declare void @cold() #1
|
|
|
|
declare void @none1() #1
|
|
|
|
declare void @none2() #1
|
|
|
|
declare void @none3() #1
|
|
|
|
|
|
|
|
|
|
|
|
!41 = !{!"branch_weights", i32 1, i32 1000}
|
|
|
|
!42 = !{!"branch_weights", i32 1, i32 1}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
!llvm.module.flags = !{!1}
|
2017-03-01 02:09:44 +08:00
|
|
|
!20 = !{!"function_entry_count", i64 110, i64 123}
|
[thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
2016-09-27 04:37:32 +08:00
|
|
|
|
|
|
|
!1 = !{i32 1, !"ProfileSummary", !2}
|
|
|
|
!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
|
|
|
|
!3 = !{!"ProfileFormat", !"InstrProf"}
|
|
|
|
!4 = !{!"TotalCount", i64 10000}
|
|
|
|
!5 = !{!"MaxCount", i64 10}
|
|
|
|
!6 = !{!"MaxInternalCount", i64 1}
|
|
|
|
!7 = !{!"MaxFunctionCount", i64 1000}
|
|
|
|
!8 = !{!"NumCounts", i64 3}
|
|
|
|
!9 = !{!"NumFunctions", i64 3}
|
|
|
|
!10 = !{!"DetailedSummary", !11}
|
|
|
|
!11 = !{!12, !13, !14}
|
|
|
|
!12 = !{i32 10000, i64 100, i32 1}
|
|
|
|
!13 = !{i32 999000, i64 100, i32 1}
|
|
|
|
!14 = !{i32 999999, i64 1, i32 2}
|
2017-03-22 01:22:35 +08:00
|
|
|
!15 = !{!"branch_weights", i32 100}
|
2018-05-26 10:34:13 +08:00
|
|
|
|
2018-06-26 21:56:49 +08:00
|
|
|
; DIS: ^0 = module: (path: "{{.*}}thinlto-function-summary-callgraph-profile-summary.ll.tmp.o{{.*}}", hash: (0, 0, 0, 0, 0))
|
2018-05-26 10:34:13 +08:00
|
|
|
; DIS: ^1 = gv: (guid: 123)
|
|
|
|
; DIS: ^2 = gv: (name: "none2") ; guid = 3741006263754194003
|
|
|
|
; DIS: ^3 = gv: (name: "hot3") ; guid = 5026609803865204483
|
|
|
|
; DIS: ^4 = gv: (name: "hot2") ; guid = 8117347573235780485
|
|
|
|
; DIS: ^5 = gv: (name: "hot1") ; guid = 9453975128311291976
|
|
|
|
; DIS: ^6 = gv: (name: "cold") ; guid = 11668175513417606517
|
|
|
|
; DIS: ^7 = gv: (name: "hot4") ; guid = 13161834114071272798
|
|
|
|
; DIS: ^8 = gv: (name: "none3") ; guid = 16213681105727317812
|
[ThinLTO] Auto-hide prevailing linkonce_odr only when all copies eligible
Summary:
We hit undefined references building with ThinLTO when one source file
contained explicit instantiations of a template method (weak_odr) but
there were also implicit instantiations in another file (linkonce_odr),
and the latter was the prevailing copy. In this case the symbol was
marked hidden when the prevailing linkonce_odr copy was promoted to
weak_odr. It led to unsats when the resulting shared library was linked
with other code that contained a reference (expecting to be resolved due
to the explicit instantiation).
Add a CanAutoHide flag to the GV summary to allow the thin link to
identify when all copies are eligible for auto-hiding (because they were
all originally linkonce_odr global unnamed addr), and only do the
auto-hide in that case.
Most of the changes here are due to plumbing the new flag through the
bitcode and llvm assembly, and resulting test changes. I augmented the
existing auto-hide test to check for this situation.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, arphaman, dang, llvm-commits, steven_wu, wmi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59709
llvm-svn: 360466
2019-05-11 04:08:24 +08:00
|
|
|
; DIS: ^9 = gv: (name: "hot_function", summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 16, calls: ((callee: ^5, hotness: hot), (callee: ^6, hotness: cold), (callee: ^4, hotness: hot), (callee: ^7, hotness: cold), (callee: ^10, hotness: none), (callee: ^3, hotness: hot), (callee: ^2, hotness: none), (callee: ^8, hotness: none), (callee: ^1, hotness: critical))))) ; guid = 17381606045411660303
|
2018-05-26 10:34:13 +08:00
|
|
|
; DIS: ^10 = gv: (name: "none1") ; guid = 17712061229457633252
|
|
|
|
|
2018-05-26 11:50:29 +08:00
|
|
|
; COMBINED-DIS: ^0 = module: (path: "{{.*}}thinlto-function-summary-callgraph-profile-summary.ll.tmp.o", hash: (0, 0, 0, 0, 0))
|
|
|
|
; COMBINED-DIS: ^1 = module: (path: "{{.*}}thinlto-function-summary-callgraph-profile-summary.ll.tmp2.o", hash: (0, 0, 0, 0, 0))
|
[ThinLTO] Auto-hide prevailing linkonce_odr only when all copies eligible
Summary:
We hit undefined references building with ThinLTO when one source file
contained explicit instantiations of a template method (weak_odr) but
there were also implicit instantiations in another file (linkonce_odr),
and the latter was the prevailing copy. In this case the symbol was
marked hidden when the prevailing linkonce_odr copy was promoted to
weak_odr. It led to unsats when the resulting shared library was linked
with other code that contained a reference (expecting to be resolved due
to the explicit instantiation).
Add a CanAutoHide flag to the GV summary to allow the thin link to
identify when all copies are eligible for auto-hiding (because they were
all originally linkonce_odr global unnamed addr), and only do the
auto-hide in that case.
Most of the changes here are due to plumbing the new flag through the
bitcode and llvm assembly, and resulting test changes. I augmented the
existing auto-hide test to check for this situation.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, arphaman, dang, llvm-commits, steven_wu, wmi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59709
llvm-svn: 360466
2019-05-11 04:08:24 +08:00
|
|
|
; COMBINED-DIS: ^2 = gv: (guid: 3741006263754194003, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
|
|
|
|
; COMBINED-DIS: ^3 = gv: (guid: 5026609803865204483, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
|
|
|
|
; COMBINED-DIS: ^4 = gv: (guid: 8117347573235780485, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
|
|
|
|
; COMBINED-DIS: ^5 = gv: (guid: 9453975128311291976, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
|
|
|
|
; COMBINED-DIS: ^6 = gv: (guid: 11668175513417606517, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
|
|
|
|
; COMBINED-DIS: ^7 = gv: (guid: 16213681105727317812, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
|
|
|
|
; COMBINED-DIS: ^8 = gv: (guid: 17381606045411660303, summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 16, calls: ((callee: ^5, hotness: hot), (callee: ^6, hotness: cold), (callee: ^4, hotness: hot), (callee: ^9, hotness: none), (callee: ^3, hotness: hot), (callee: ^2, hotness: none), (callee: ^7, hotness: none)))))
|
|
|
|
; COMBINED-DIS: ^9 = gv: (guid: 17712061229457633252, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
|