Commit Graph

135 Commits

Author SHA1 Message Date
Reid Kleckner d72163947a [PGO] Update ICP pass for recent byval type changes
Fixes verifier errors encountered in PR42413.

Reviewers: xur, t.p.northover, inglorion, gbiv, george.burgess.iv

Differential Revision: https://reviews.llvm.org/D63842

llvm-svn: 364861
2019-07-01 22:43:39 +00:00
Fangrui Song ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Rong Xu e44fa83c37 [PGO] Handle cases of non-instrument BBs
As shown in PR41279, some basic blocks (such as catchswitch) cannot be
instrumented. This patch filters out these BBs in PGO instrumentation.
It also sets the profile count to the fail-to-instrument edge, so that we
can propagate the counts in the CFG.

Differential Revision: https://reviews.llvm.org/D62700

llvm-svn: 362995
2019-06-10 22:36:27 +00:00
Rong Xu e88173abc0 [PGO] Handle cases of failing to split critical edges
Fix PR41279 where critical edges to EHPad are not split.
The fix is to not instrument those critical edges. We used to be able to know
the size of counters right after MST is computed. With this, we have to
pre-collect the instrument BBs to know the size, and then instrument them.

Differential Revision: https://reviews.llvm.org/D62439

llvm-svn: 361882
2019-05-28 21:45:56 +00:00
Hiroshi Yamauchi dfeb797455 [PGO][CHR] Speed up following long use-def chains.
Summary: Avoid visiting an instruction more than once by using a map.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62262

llvm-svn: 361416
2019-05-22 18:37:34 +00:00
Hiroshi Yamauchi 1620104034 [PGO][CHR] A bug fix.
Summary: Fix a transformation bug where two scopes share a common instrution to hoist.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61405

llvm-svn: 359736
2019-05-01 22:49:52 +00:00
Dmitry Mikulin 312b5f86b7 The error message for mismatched value sites is very cryptic.
Make it more readable for an average user.

Differential Revision: https://reviews.llvm.org/D60896

llvm-svn: 359043
2019-04-23 22:26:55 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Rong Xu 959ef16859 [PGO] Better handling of profile hash mismatch
We currently assume profile hash conflicts will be caught by an upfront
check and we assert for the cases that escape the check. The assumption
is not always true as there are chances of conflict. This patch prints
a warning and skips annotating the function for the escaped cases,.

Differential Revision: https://reviews.llvm.org/D60154

llvm-svn: 358225
2019-04-11 20:54:17 +00:00
Rong Xu 3ee1524afc [PGO] Fix hexagon buildbot errors in r355541
Add "REQUIRES: x86-registered-target" to thinlto test cases.

llvm-svn: 355556
2019-03-06 22:16:47 +00:00
Rong Xu 05c0afe842 [PGO] Context sensitive PGO (part 4)
Part 4 of CSPGO changes:
(1) add support in cmake for cspgo build.
(2) fix an issue in big endian.
(3) test cases.

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 355541
2019-03-06 19:31:37 +00:00
Wei Mi c876e3d42b [PGO] Make pgo related options in opt more consistent.
Currently we have pgo options defined in PassManagerBuilder.cpp only for
instrument pgo, but not for sample pgo. We also have pgo options defined
in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
pgo. They have some inconsistency.

To make the options more consistent and make tests writing easier, the
patch let old pass manager to share the same pgo options with new pass
manager in opt, and removes the options in PassManagerBuilder.cpp.

Differential Revision: https://reviews.llvm.org/D56749

llvm-svn: 351392
2019-01-16 23:19:02 +00:00
Rong Xu fb4bcc452c [PGO] Exit early if all count values are zero
If all the edge counts for a function are zero, skip count population and
annotation, as nothing will happen. This can save some compile time.

Differential Revision: https://reviews.llvm.org/D54212

llvm-svn: 346370
2018-11-07 23:51:20 +00:00
Richard Smith 6c67662816 Add a flag to remap manglings when reading profile data information.
This can be used to preserve profiling information across codebase
changes that have widespread impact on mangled names, but across which
most profiling data should still be usable. For example, when switching
from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI,
or even from a 32-bit to a 64-bit build.

The user can provide a remapping file specifying parts of mangled names
that should be treated as equivalent (eg, std::__1 should be treated as
equivalent to std::__cxx11), and profile data will be treated as
applying to a particular function if its name is equivalent to the name
of a function in the profile data under the provided equivalences. See
the documentation change for a description of how this is configured.

Remapping is supported for both sample-based profiling and instruction
profiling. We do not support remapping indirect branch target
information, but all other profile data should be remapped
appropriately.

Support is only added for the new pass manager. If someone wants to also
add support for this for the old pass manager, doing so should be
straightforward.

This is the LLVM side of Clang r344199.

Reviewers: davidxl, tejohnson, dlj, erik.pilkington

Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51249

llvm-svn: 344200
2018-10-10 23:13:47 +00:00
Hiroshi Yamauchi 9775a620b0 [PGO] Control Height Reduction
Summary:
Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.

if (hot_cond1) { // Likely true.
  do_stg_hot1();
}
if (hot_cond2) { // Likely true.
  do_stg_hot2();
}

->

if (hot_cond1 && hot_cond2) { // Hot path.
  do_stg_hot1();
  do_stg_hot2();
} else { // Cold path.
  if (hot_cond1) {
    do_stg_hot1();
  }
  if (hot_cond2) {
    do_stg_hot2();
  }
}

This speeds up some internal benchmarks up to ~30%.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D50591

llvm-svn: 341386
2018-09-04 17:19:13 +00:00
Xinliang David Li bcf726a32d [PGO] add target md5sum in warning message for icall
Differential revision: http://reviews.llvm.org/D51193

llvm-svn: 340657
2018-08-24 21:38:24 +00:00
Reid Kleckner c6cf918f44 [InstrProf] Use comdats on COFF for available_externally functions
Summary:
r262157 added ELF-specific logic to put a comdat on the __profc_*
globals created for available_externally functions. We should be able to
generalize that logic to all object file formats that support comdats,
i.e. everything other than MachO. This fixes duplicate symbol errors,
since on COFF, linkonce_odr doesn't make the symbol weak.

Fixes PR38251.

Reviewers: davidxl, xur

Subscribers: hiraditya, dmajor, llvm-commits, aheejin

Differential Revision: https://reviews.llvm.org/D49882

llvm-svn: 338082
2018-07-26 22:59:17 +00:00
Chijun Sima 9e1e0c7b2a [PGOMemOPSize] Preserve the DominatorTree
Summary:
PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
When optimizing SQLite with -O3, this patch can decrease 3.8% of the numbers of nodes traversed by DFS and 5.7% of the times DominatorTreeBase::recalculation is called.

Reviewers: kuhar, davide, dmgreen

Reviewed By: dmgreen

Subscribers: mzolotukhin, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D48914

llvm-svn: 336522
2018-07-09 08:07:21 +00:00
Taewook Oh 923c216da5 [ICP] Do not attempt type matching for variable length arguments.
Summary:
When performing indirect call promotion, current implementation inspects "all" parameters of the callsite and attemps to match with the formal argument type of the callee function. However, it is not possible to find the type for variable length arguments, and the compiler crashes when it attemps to match the type for variable lenght argument.

It seems that the bug is introduced with D40658. Prior to that, the type matching is performed only for the parameters whose ID is less than callee->getFunctionNumParams(). The attached test case will crash without the patch.

Reviewers: mssimpso, davidxl, davide

Reviewed By: mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46026

llvm-svn: 330844
2018-04-25 17:19:21 +00:00
Rong Xu 662f38b16f [PGO] Fix branch probability remarks assert
Fixed counter/weight overflow that leads to an assertion. Also fixed the help
string for pgo-emit-branch-prob option.

Differential Revision: https://reviews.llvm.org/D44809

llvm-svn: 328653
2018-03-27 18:55:56 +00:00
Eugene Leviant 19e238746b [ThinLTO] Recommit of import global variables
This wasreverted in r326638 due to link problems and fixed
afterwards

llvm-svn: 327254
2018-03-12 10:30:50 +00:00
Chandler Carruth a4619d9944 [ThinLTO] Revert r325320: Import global variables
This caused some links to fail with ThinLTO due to missing symbols as
well as causing some binaries to have failures at runtime. We're working
with the author to get a test case, but want to get the tree green
again.

Further, it appears to introduce a data race. While the test usage of
threads was disabled in r325361 & r325362, that isn't an acceptable fix.
I've reverted both of these as well. This code needs to be thread safe.
Test cases for this are already on the original commit thread.

llvm-svn: 326638
2018-03-02 23:40:08 +00:00
Eugene Leviant 8c83b9b8c5 [ThinLTO] Fix data race in test #2
Switched to the right option (-thinlto-threads)

llvm-svn: 325362
2018-02-16 17:25:03 +00:00
Eugene Leviant c9724d9149 [ThinLTO] Fix data race in test
llvm-svn: 325361
2018-02-16 16:56:33 +00:00
Teresa Johnson 791c98e4c8 [ThinLTO] Remove dead and dropped symbol declarations when possible
Summary:
Removing the dropped symbols will prevent indirect call promotion in the
ThinLTO Backend from adding a new reference to a symbol, which can
result in linker unsats. This can happen when we compile with a sample
profile collected from one binary by used for another, which may have
profiled targets that aren't used in the new binary.

Note that until dropDeadSymbols handles variables and aliases (in
progress), we may not be able to remove the declaration and can still
have an issue.

Reviewers: grimar, davidxl

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D42816

llvm-svn: 324299
2018-02-06 00:43:39 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Matthew Simpson cb35c5d5c2 [ICP] Expose unconditional call promotion interface
This patch modifies the indirect call promotion utilities by exposing and using
an unconditional call promotion interface. The unconditional promotion
interface (i.e., call promotion without creating an if-then-else) can be used
if it's known that an indirect call has only one possible callee. The existing
conditional promotion interface uses this unconditional interface to promote an
indirect call after it has been versioned and placed within the "then" block.

A consequence of unconditional promotion is that the fix-up operations for phi
nodes in the normal destination of invoke instructions are changed. This is
necessary because the existing implementation assumed that an invoke had been
versioned, creating a "merge" block where a return value bitcast could be
placed. In the new implementation, the edge between a promoted invoke's parent
block and its normal destination is split if needed to add a bitcast for the
return value. If the invoke is also versioned, the phi node merging the return
value of the promoted and original invoke instructions is placed in the "merge"
block.

Differential Revision: https://reviews.llvm.org/D40751

llvm-svn: 321210
2017-12-20 19:26:37 +00:00
Xinliang David Li 19fb5b467b [PGO] add MST min edge selection heuristic to ensure non-zero entry count
Differential Revision: http://reviews.llvm.org/D41059

llvm-svn: 320998
2017-12-18 17:56:19 +00:00
Vitaly Buka a5376f393e [LTO] Make processing of combined module more consistent
Summary:
1. Use stream 0 only for combined module. Previously if combined module was not
processes ThinLTO used the stream for own output. However small changes in input,
could trigger combined module  and shuffle outputs making life of llvm::LTO harder.

2. Always process combined module and write output to stream 0. Processing empty
combined module is cheap and allows llvm::LTO users to avoid implementing processing
which is already done in llvm::LTO.

Subscribers: mehdi_amini, inglorion, eraman, hiraditya

Differential Revision: https://reviews.llvm.org/D41267

llvm-svn: 320905
2017-12-16 02:10:00 +00:00
Hiroshi Yamauchi f3bda1daa2 Split IndirectBr critical edges before PGO gen/use passes.
Summary:
The PGO gen/use passes currently fail with an assert failure if there's a
critical edge whose source is an IndirectBr instruction and that edge
needs to be instrumented.

To avoid this in certain cases, split IndirectBr critical edges in the PGO
gen/use passes. This works for blocks with single indirectbr predecessors,
but not for those with multiple indirectbr predecessors (splitting an
IndirectBr critical edge isn't always possible.)

Reviewers: davidxl, xur

Reviewed By: davidxl

Subscribers: efriedma, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D40699

llvm-svn: 320511
2017-12-12 19:07:43 +00:00
Xinliang David Li d91057bf52 Revert r320104: infinite loop profiling bug fix
Causes unexpected memory issue with New PM this time.
The new PM invalidates BPI but not BFI, leaving the
reference to BPI from BFI invalid.

Abandon this patch.  There is a more general solution
which also handles runtime infinite loop (but not statically).

llvm-svn: 320180
2017-12-08 19:38:07 +00:00
Xinliang David Li 4b0027f671 [PGO] detect infinite loop and form MST properly
Differential Revision: http://reviews.llvm.org/D40873

llvm-svn: 320104
2017-12-07 22:23:28 +00:00
Xinliang David Li 45c819063a Revert r319794: [PGO] detect infinite loop and form MST properly: memory leak problem
llvm-svn: 319841
2017-12-05 21:54:01 +00:00
Xinliang David Li cc35bc9efc [PGO] detect infinite loop and form MST properly
Differential Revision: http://reviews.llvm.org/D40702

llvm-svn: 319794
2017-12-05 17:19:41 +00:00
Xinliang David Li c23d2c6883 [PGO] Skip counter promotion for infinite loops
Differential Revision: http://reviews.llvm.org/D40662

llvm-svn: 319462
2017-11-30 19:16:25 +00:00
Hiroshi Yamauchi c94d4d70d8 Add heuristics for irreducible loop metadata under PGO
Summary:
Add the following heuristics for irreducible loop metadata:

- When an irreducible loop header is missing the loop header weight metadata,
  give it the minimum weight seen among other headers.
- Annotate indirectbr targets with the loop header weight metadata (as they are
  likely to become irreducible loop headers after indirectbr tail duplication.)

These greatly improve the accuracy of the block frequency info of the Python
interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python
performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to
better register allocation under PGO.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39980

llvm-svn: 318693
2017-11-20 21:03:38 +00:00
Hiroshi Yamauchi 69c233ac6c Simplify irreducible loop metadata test code.
Summary:
Shorten the irreducible loop metadata test code by removing insignificant
instructions.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40043

llvm-svn: 318182
2017-11-14 19:48:59 +00:00
Sean Fertile 4595a915f6 [LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local.
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.

Originally commited as r317374, but reverted in r317395 to update some missed
tests.

Differential Revision: https://reviews.llvm.org/D35702

llvm-svn: 317408
2017-11-04 17:04:39 +00:00
Sean Fertile 39770ca0a1 Revert "[LTO][ThinLTO] Use the linker resolutions to mark global values ..."
Changes more tests then expected on one of the build bots.
reverting to investigate.

This reverts https://llvm.org/svn/llvm-project/llvm/trunk@317374

llvm-svn: 317395
2017-11-04 01:54:20 +00:00
Sean Fertile 36528c2a9b [LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local.
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.

Differential Revision: https://reviews.llvm.org/D35702

llvm-svn: 317374
2017-11-03 21:45:55 +00:00
Hiroshi Yamauchi dce9def3dd Irreducible loop metadata for more accurate block frequency under PGO.
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.

The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.

This patch is a basic support for this.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D39028

llvm-svn: 317278
2017-11-02 22:26:51 +00:00
Teresa Johnson f625118ec7 [ThinLTO] Fix dead stripping analysis for SamplePGO
Summary:
The fix for dead stripping analysis in the case of SamplePGO indirect
calls to local functions (r313151) introduced the possibility of an
infinite loop.

Make sure we check for the value being already live after we update it
for SamplePGO indirect call handling.

Reviewers: danielcdh

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D38086

llvm-svn: 313766
2017-09-20 17:09:47 +00:00
Teresa Johnson b1bb468aa9 Fix bot failures by requiring x86 target in new test
The test added in r313151 requires a target triple since it is
running through code generation. Fix bot failures by requiring
an x86 target.

llvm-svn: 313153
2017-09-13 15:35:35 +00:00
Teresa Johnson 1958083d35 [ThinLTO] For SamplePGO, need to handle ICP targets consistently in thin link
Summary:
SamplePGO indirect call profiles record the target as the original GUID
for statics. The importer had special handling to map to the normal GUID
in that case. The dead global analysis needs the same treatment or
inconsistencies arise, resulting in linker unsats due to some dead
symbols being exported and kept, leaving in references to other dead
symbols that are removed.

This can happen when a SamplePGO profile collected by one binary is used
for a different binary, so the indirect call profiles may not accurately
reflect live targets.

Reviewers: danielcdh

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D37783

llvm-svn: 313151
2017-09-13 15:16:38 +00:00
Dehao Chen efd007f6f4 Add null check for promoted direct call
Summary: We originally assume that in pgo-icp, the promoted direct call will never be null after strip point casts. However, stripPointerCasts is so smart that it could possibly return the value of the function call if it knows that the return value is always an argument. In this case, the returned value cannot cast to Instruction. In this patch, null check is added to ensure null pointer will not be accessed.

Reviewers: tejohnson, xur, davidxl, djasper

Reviewed By: tejohnson

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D37252

llvm-svn: 312005
2017-08-29 15:28:12 +00:00
Taewook Oh 572f45a3c8 Create PHI node for the return value only when the return value has uses.
Summary:
Currently, a phi node is created in the normal destination to unify the return values from promoted calls and the original indirect call. This patch makes this phi node to be created only when the return value has uses.

This patch is necessary to generate valid code, as compiler crashes with the attached test case without this patch. Without this patch, an illegal phi node that has no incoming value from `entry`/`catch` is created in `cleanup` block.

I think existing implementation is good as far as there is at least one use of the original indirect call. `insertCallRetPHI` creates a new phi node in the normal destination block only when the original indirect call dominates its use and the normal destination block. Otherwise, `fixupPHINodeForNormalDest` will handle the unification of return values naturally without creating a new phi node. However, if there's no use, `insertCallRetPHI` still creates a new phi node even when the original indirect call does not dominate the normal destination block, because `getCallRetPHINode` returns false.

Reviewers: xur, davidxl, danielcdh

Reviewed By: xur

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37176

llvm-svn: 311906
2017-08-28 18:57:00 +00:00
Rong Xu 15848e5977 [PGO] Set edge weights for indirectbr instruction with profile counts
Current PGO only annotates the edge weight for branch and switch instructions
with profile counts. We should also annotate the indirectbr instruction as
all the information is there. This patch enables the annotating for indirectbr
instructions. Also uses this annotation in branch probability analysis.

Differential Revision: https://reviews.llvm.org/D37074

llvm-svn: 311604
2017-08-23 21:36:02 +00:00
Sam Elliott b0c9753691 Keep Optimization Remark Yaml in NewPM
Summary:
The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds.

So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33951

Reviewers: anemet, chandlerc

Reviewed By: anemet, chandlerc

Subscribers: javed.absar, chandlerc, fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D36906

llvm-svn: 311271
2017-08-20 01:30:45 +00:00
Ana Pazos 6210f27dfc [PGO] Fixed assertion due to mismatched memcpy size type.
Summary:
Memcpy intrinsics have size argument of any integer type, like i32 or i64.
Fixed size type along with its value when cloning the intrinsic.

Reviewers: davidxl, xur

Reviewed By: davidxl

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D36844

llvm-svn: 311188
2017-08-18 19:17:08 +00:00