llvm-project

Commit Graph

Author	SHA1	Message	Date
Taewook Oh	923c216da5	[ICP] Do not attempt type matching for variable length arguments. Summary: When performing indirect call promotion, the current implementation inspects "all" parameters of the callsite and attempts to match each with the formal argument type of the callee function. However, it is not possible to find the type for variable length arguments, and the compiler crashes when it attempts to match the type for a variable length argument. It seems that the bug was introduced with D40658. Prior to that, the type matching was performed only for the parameters whose ID is less than callee->getFunctionNumParams(). The attached test case will crash without the patch. Reviewers: mssimpso, davidxl, davide Reviewed By: mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46026 llvm-svn: 330844	2018-04-25 17:19:21 +00:00
Rong Xu	662f38b16f	[PGO] Fix branch probability remarks assert Fixed counter/weight overflow that leads to an assertion. Also fixed the help string for pgo-emit-branch-prob option. Differential Revision: https://reviews.llvm.org/D44809 llvm-svn: 328653	2018-03-27 18:55:56 +00:00
Eugene Leviant	19e238746b	[ThinLTO] Recommit of import global variables This was reverted in r326638 due to link problems and fixed afterwards. llvm-svn: 327254	2018-03-12 10:30:50 +00:00
Chandler Carruth	a4619d9944	[ThinLTO] Revert r325320: Import global variables This caused some links to fail with ThinLTO due to missing symbols as well as causing some binaries to have failures at runtime. We're working with the author to get a test case, but want to get the tree green again. Further, it appears to introduce a data race. While the test usage of threads was disabled in r325361 & r325362, that isn't an acceptable fix. I've reverted both of these as well. This code needs to be thread safe. Test cases for this are already on the original commit thread. llvm-svn: 326638	2018-03-02 23:40:08 +00:00
Eugene Leviant	8c83b9b8c5	[ThinLTO] Fix data race in test #2 Switched to the right option (-thinlto-threads) llvm-svn: 325362	2018-02-16 17:25:03 +00:00
Eugene Leviant	c9724d9149	[ThinLTO] Fix data race in test llvm-svn: 325361	2018-02-16 16:56:33 +00:00
Teresa Johnson	791c98e4c8	[ThinLTO] Remove dead and dropped symbol declarations when possible Summary: Removing the dropped symbols will prevent indirect call promotion in the ThinLTO Backend from adding a new reference to a symbol, which can result in linker unsats. This can happen when we compile with a sample profile collected from one binary but used for another, which may have profiled targets that aren't used in the new binary. Note that until dropDeadSymbols handles variables and aliases (in progress), we may not be able to remove the declaration and can still have an issue. Reviewers: grimar, davidxl Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D42816 llvm-svn: 324299	2018-02-06 00:43:39 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. 
s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void 
@llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. 
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Matthew Simpson	cb35c5d5c2	[ICP] Expose unconditional call promotion interface This patch modifies the indirect call promotion utilities by exposing and using an unconditional call promotion interface. The unconditional promotion interface (i.e., call promotion without creating an if-then-else) can be used if it's known that an indirect call has only one possible callee. The existing conditional promotion interface uses this unconditional interface to promote an indirect call after it has been versioned and placed within the "then" block. A consequence of unconditional promotion is that the fix-up operations for phi nodes in the normal destination of invoke instructions are changed. This is necessary because the existing implementation assumed that an invoke had been versioned, creating a "merge" block where a return value bitcast could be placed. In the new implementation, the edge between a promoted invoke's parent block and its normal destination is split if needed to add a bitcast for the return value. If the invoke is also versioned, the phi node merging the return value of the promoted and original invoke instructions is placed in the "merge" block. Differential Revision: https://reviews.llvm.org/D40751 llvm-svn: 321210	2017-12-20 19:26:37 +00:00
Xinliang David Li	19fb5b467b	[PGO] add MST min edge selection heuristic to ensure non-zero entry count Differential Revision: http://reviews.llvm.org/D41059 llvm-svn: 320998	2017-12-18 17:56:19 +00:00
Vitaly Buka	a5376f393e	[LTO] Make processing of combined module more consistent Summary: 1. Use stream 0 only for the combined module. Previously, if the combined module was not processed, ThinLTO used the stream for its own output. However, small changes in input could trigger the combined module and shuffle outputs, making life of llvm::LTO harder. 2. Always process the combined module and write output to stream 0. Processing an empty combined module is cheap and allows llvm::LTO users to avoid implementing processing which is already done in llvm::LTO. Subscribers: mehdi_amini, inglorion, eraman, hiraditya Differential Revision: https://reviews.llvm.org/D41267 llvm-svn: 320905	2017-12-16 02:10:00 +00:00
Hiroshi Yamauchi	f3bda1daa2	Split IndirectBr critical edges before PGO gen/use passes. Summary: The PGO gen/use passes currently fail with an assert failure if there's a critical edge whose source is an IndirectBr instruction and that edge needs to be instrumented. To avoid this in certain cases, split IndirectBr critical edges in the PGO gen/use passes. This works for blocks with single indirectbr predecessors, but not for those with multiple indirectbr predecessors (splitting an IndirectBr critical edge isn't always possible.) Reviewers: davidxl, xur Reviewed By: davidxl Subscribers: efriedma, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D40699 llvm-svn: 320511	2017-12-12 19:07:43 +00:00
Xinliang David Li	d91057bf52	Revert r320104: infinite loop profiling bug fix Causes unexpected memory issue with New PM this time. The new PM invalidates BPI but not BFI, leaving the reference to BPI from BFI invalid. Abandon this patch. There is a more general solution which also handles runtime infinite loop (but not statically). llvm-svn: 320180	2017-12-08 19:38:07 +00:00
Xinliang David Li	4b0027f671	[PGO] detect infinite loop and form MST properly Differential Revision: http://reviews.llvm.org/D40873 llvm-svn: 320104	2017-12-07 22:23:28 +00:00
Xinliang David Li	45c819063a	Revert r319794: [PGO] detect infinite loop and form MST properly: memory leak problem llvm-svn: 319841	2017-12-05 21:54:01 +00:00
Xinliang David Li	cc35bc9efc	[PGO] detect infinite loop and form MST properly Differential Revision: http://reviews.llvm.org/D40702 llvm-svn: 319794	2017-12-05 17:19:41 +00:00
Xinliang David Li	c23d2c6883	[PGO] Skip counter promotion for infinite loops Differential Revision: http://reviews.llvm.org/D40662 llvm-svn: 319462	2017-11-30 19:16:25 +00:00
Hiroshi Yamauchi	c94d4d70d8	Add heuristics for irreducible loop metadata under PGO Summary: Add the following heuristics for irreducible loop metadata: - When an irreducible loop header is missing the loop header weight metadata, give it the minimum weight seen among other headers. - Annotate indirectbr targets with the loop header weight metadata (as they are likely to become irreducible loop headers after indirectbr tail duplication.) These greatly improve the accuracy of the block frequency info of the Python interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to better register allocation under PGO. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39980 llvm-svn: 318693	2017-11-20 21:03:38 +00:00
Hiroshi Yamauchi	69c233ac6c	Simplify irreducible loop metadata test code. Summary: Shorten the irreducible loop metadata test code by removing insignificant instructions. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40043 llvm-svn: 318182	2017-11-14 19:48:59 +00:00
Sean Fertile	4595a915f6	[LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local. Now that we have a way to mark GlobalValues as local we can use the symbol resolutions that the linker plugin provides as part of lto/thinlto link step to refine the compiler's view on what symbols will end up being local. Originally committed as r317374, but reverted in r317395 to update some missed tests. Differential Revision: https://reviews.llvm.org/D35702 llvm-svn: 317408	2017-11-04 17:04:39 +00:00
Sean Fertile	39770ca0a1	Revert "[LTO][ThinLTO] Use the linker resolutions to mark global values ..." Changes more tests than expected on one of the build bots. Reverting to investigate. This reverts https://llvm.org/svn/llvm-project/llvm/trunk@317374 llvm-svn: 317395	2017-11-04 01:54:20 +00:00
Sean Fertile	36528c2a9b	[LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local. Now that we have a way to mark GlobalValues as local we can use the symbol resolutions that the linker plugin provides as part of lto/thinlto link step to refine the compilers view on what symbols will end up being local. Differential Revision: https://reviews.llvm.org/D35702 llvm-svn: 317374	2017-11-03 21:45:55 +00:00
Hiroshi Yamauchi	dce9def3dd	Irreducible loop metadata for more accurate block frequency under PGO. Summary: Currently the block frequency analysis is an approximation for irreducible loops. The new irreducible loop metadata is used to annotate the irreducible loop headers with their header weights based on the PGO profile (currently this is approximated to be evenly weighted) and to help improve the accuracy of the block frequency analysis for irreducible loops. This patch is a basic support for this. Reviewers: davidxl Reviewed By: davidxl Subscribers: mehdi_amini, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D39028 llvm-svn: 317278	2017-11-02 22:26:51 +00:00
Teresa Johnson	f625118ec7	[ThinLTO] Fix dead stripping analysis for SamplePGO Summary: The fix for dead stripping analysis in the case of SamplePGO indirect calls to local functions (r313151) introduced the possibility of an infinite loop. Make sure we check for the value being already live after we update it for SamplePGO indirect call handling. Reviewers: danielcdh Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D38086 llvm-svn: 313766	2017-09-20 17:09:47 +00:00
Teresa Johnson	b1bb468aa9	Fix bot failures by requiring x86 target in new test The test added in r313151 requires a target triple since it is running through code generation. Fix bot failures by requiring an x86 target. llvm-svn: 313153	2017-09-13 15:35:35 +00:00
Teresa Johnson	1958083d35	[ThinLTO] For SamplePGO, need to handle ICP targets consistently in thin link Summary: SamplePGO indirect call profiles record the target as the original GUID for statics. The importer had special handling to map to the normal GUID in that case. The dead global analysis needs the same treatment or inconsistencies arise, resulting in linker unsats due to some dead symbols being exported and kept, leaving in references to other dead symbols that are removed. This can happen when a SamplePGO profile collected by one binary is used for a different binary, so the indirect call profiles may not accurately reflect live targets. Reviewers: danielcdh Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D37783 llvm-svn: 313151	2017-09-13 15:16:38 +00:00
Dehao Chen	efd007f6f4	Add null check for promoted direct call Summary: We originally assumed that in pgo-icp, the promoted direct call will never be null after stripPointerCasts. However, stripPointerCasts is so smart that it could possibly return the value of the function call if it knows that the return value is always an argument. In this case, the returned value cannot be cast to Instruction. In this patch, a null check is added to ensure the null pointer will not be accessed. Reviewers: tejohnson, xur, davidxl, djasper Reviewed By: tejohnson Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D37252 llvm-svn: 312005	2017-08-29 15:28:12 +00:00
Taewook Oh	572f45a3c8	Create PHI node for the return value only when the return value has uses. Summary: Currently, a phi node is created in the normal destination to unify the return values from promoted calls and the original indirect call. This patch makes this phi node to be created only when the return value has uses. This patch is necessary to generate valid code, as compiler crashes with the attached test case without this patch. Without this patch, an illegal phi node that has no incoming value from `entry`/`catch` is created in `cleanup` block. I think existing implementation is good as far as there is at least one use of the original indirect call. `insertCallRetPHI` creates a new phi node in the normal destination block only when the original indirect call dominates its use and the normal destination block. Otherwise, `fixupPHINodeForNormalDest` will handle the unification of return values naturally without creating a new phi node. However, if there's no use, `insertCallRetPHI` still creates a new phi node even when the original indirect call does not dominate the normal destination block, because `getCallRetPHINode` returns false. Reviewers: xur, davidxl, danielcdh Reviewed By: xur Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37176 llvm-svn: 311906	2017-08-28 18:57:00 +00:00
Rong Xu	15848e5977	[PGO] Set edge weights for indirectbr instruction with profile counts Current PGO only annotates the edge weight for branch and switch instructions with profile counts. We should also annotate the indirectbr instruction as all the information is there. This patch enables the annotating for indirectbr instructions. Also uses this annotation in branch probability analysis. Differential Revision: https://reviews.llvm.org/D37074 llvm-svn: 311604	2017-08-23 21:36:02 +00:00
Sam Elliott	b0c9753691	Keep Optimization Remark Yaml in NewPM Summary: The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds. So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33951 Reviewers: anemet, chandlerc Reviewed By: anemet, chandlerc Subscribers: javed.absar, chandlerc, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D36906 llvm-svn: 311271	2017-08-20 01:30:45 +00:00
Ana Pazos	6210f27dfc	[PGO] Fixed assertion due to mismatched memcpy size type. Summary: Memcpy intrinsics have size argument of any integer type, like i32 or i64. Fixed size type along with its value when cloning the intrinsic. Reviewers: davidxl, xur Reviewed By: davidxl Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D36844 llvm-svn: 311188	2017-08-18 19:17:08 +00:00
Dehao Chen	34cfcb29aa	Make ICP use PSI to check for hotness. Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because the sampled count may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted. Reviewers: davidxl, tejohnson, eraman Reviewed By: davidxl Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36341 llvm-svn: 310416	2017-08-08 20:57:33 +00:00
Sam Elliott	67b0e589d0	Migrate PGOMemOptSizeOpt to use new OptimizationRemarkEmitter Pass Summary: Fixes PR33790. This patch still needs a yaml-style test, which I shall write tomorrow Reviewers: anemet Reviewed By: anemet Subscribers: anemet, llvm-commits Differential Revision: https://reviews.llvm.org/D35981 llvm-svn: 309497	2017-07-30 00:35:33 +00:00
Dehao Chen	f4240b5b91	Separate the ICP total threshold and remaining threshold. Summary: In the current implementation, isPromotionProfitable only checks if the call count to a direct target is no less than a certain percentage threshold of the remaining call counts that have not been promoted. This causes code size problems when the target count is small but greater than a large portion of remaining counts. E.g. target1 takes 99.9%, while target2 takes 0.1%. Both targets will be promoted and inlined, making the function size too large, which potentially prevents it from further inlining into its callers. This patch adds another percentage threshold against the total indirect call count. The target count needs to be no less than both thresholds in order to be promoted speculatively. Reviewers: davidxl, tejohnson Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D35962 llvm-svn: 309345	2017-07-28 01:02:54 +00:00
Adam Nemet	0d8b5d6f69	[ICP] Migrate to OptimizationRemarkEmitter This is a module pass so for the old PM, we can't use ORE, the function analysis pass. Instead ORE is created on the fly. A few notes: - isPromotionLegal is folded in the caller since we want to emit the Function in the remark but we can only do that if the symbol table look-up succeeded. - There was good test coverage for remarks in this pass. - promoteIndirectCall uses ORE conditionally since it's also used from SampleProfile which does not use ORE yet. Fixes PR33792. Differential Revision: https://reviews.llvm.org/D35929 llvm-svn: 309294	2017-07-27 16:54:15 +00:00
Davide Italiano	0c8d26c312	[PGO] Move the PGOInstrumentation pass to new OptRemark API. This fixes PR33791. llvm-svn: 308668	2017-07-20 20:43:05 +00:00
Xinliang David Li	f564c6959e	[PGO] Enhance pgo counter promotion This is an incremental change to the promotion feature. There are two problems with the current behavior: 1) loops with multiple exiting blocks are totally disabled 2) a counter update can only be promoted one level up in the loop nest -- which does not help much for short trip count inner loops inside high trip-count outer loops. Due to this limitation, we still saw very large profile count fluctuations from run to run for the affected loops which are usually very hot. This patch adds the support for promoting counters iteratively across the loop nest. It also turns on the promotion for loops with multiple exiting blocks (with a limit). For single-threaded applications, the performance impact is flat on average. For instance, dealII improves, but povray regresses. llvm-svn: 307863	2017-07-12 23:27:44 +00:00
Xinliang David Li	b67530e9b9	[PGO] Implement profile counter register promotion Differential Revision: http://reviews.llvm.org/D34085 llvm-svn: 306231	2017-06-25 00:26:43 +00:00
Ana Pazos	f731bde064	[PATCH] [PGO] Fixed cast operation in MemIntrinsicVisitor::instrumentOneMemIntrinsic. Reviewers: xur, efriedma, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34293 llvm-svn: 305737	2017-06-19 20:04:33 +00:00
Teresa Johnson	8015f88525	[PGO] Update VP metadata after memory intrinsic optimization Summary: Leave an updated VP metadata on the fallback memcpy intrinsic after specialization. This can be used for later possible expansion based on the average of the remaining values. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34164 llvm-svn: 305321	2017-06-13 20:44:08 +00:00
Xinliang David Li	0a0acbcf78	[PartialInlining] Emit branch info and profile data as remarks This allows us to collect profile statistics to tune static branch prediction. Differential Revision: http://reviews.llvm.org/D33746 llvm-svn: 304452	2017-06-01 18:58:50 +00:00
Teresa Johnson	51177295c4	Memory intrinsic value profile optimization: Avoid divide by 0 Summary: Skip memops if the total value profiled count is 0, we can't correctly scale up the counts and there is no point anyway. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32624 llvm-svn: 301645	2017-04-28 14:30:54 +00:00
Teresa Johnson	b2c390e9f5	Update profile during memory intrinsic optimization Summary: Ensure that the new merge BB (which contains the rest of the original BB after the mem op being optimized) gets a profile frequency, in case there are additional mem ops later in the BB. Otherwise they get skipped as the merge BB looks cold. Reviewers: davidxl, xur Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32447 llvm-svn: 301244	2017-04-24 20:30:42 +00:00
Rong Xu	48596b6f7a	[PGO] Memory intrinsic calls optimization based on profiled size This patch optimizes two memory intrinsic operations: memset and memcpy based on the profiled size of the operation. The high level transformation is like: mem_op(..., size) ==> switch (size) { case s1: mem_op(..., s1); goto merge_bb; case s2: mem_op(..., s2); goto merge_bb; ... default: mem_op(..., size); goto merge_bb; } merge_bb: Differential Revision: http://reviews.llvm.org/D28966 llvm-svn: 299446	2017-04-04 16:42:20 +00:00
Rong Xu	661ffe104e	[PGO] Add omitted test cases. llvm-svn: 298115	2017-03-17 20:05:13 +00:00
Rong Xu	e60343d6b0	[PGO] Value profile for size of memory intrinsic calls This patch annotates the valuesites profile to memory intrinsics. Differential Revision: http://reviews.llvm.org/D31002 llvm-svn: 298110	2017-03-17 18:07:26 +00:00
Rong Xu	60faea19f8	Resubmit r297897: [PGO] Value profile for size of memory intrinsic calls R297897 inadvertently enabled annotation for memop profiling. This new patch fixed it. llvm-svn: 297996	2017-03-16 21:15:48 +00:00
Eric Liu	971de62291	Revert "[PGO] Value profile for size of memory intrinsic calls" This commit reverts r297897 and r297909. llvm-svn: 297951	2017-03-16 13:16:35 +00:00
Rong Xu	4ed52798ce	[PGO] Value profile for size of memory intrinsic calls This patch adds the value profile support to profile the size parameter of memory intrinsic calls: memcpy, memcmp, and memmov. Differential Revision: http://reviews.llvm.org/D28965 llvm-svn: 297897	2017-03-15 21:47:27 +00:00
Dehao Chen	4a435e0896	SamplePGO ThinLTO ICP fix for local functions. Summary: In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places of where the mismatch can happen: 1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName. 2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName This patch tries at the best effort to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining: 1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import. 2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName. 3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote. Reviewers: mehdi_amini, tejohnson Reviewed By: tejohnson Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30754 llvm-svn: 297757	2017-03-14 17:33:01 +00:00
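
The two-threshold profitability check described in commit f4240b5b91 ("Separate the ICP total threshold and remaining threshold") can be illustrated with a small sketch. This is not LLVM's code, which is written in C++; it is a Python model of the rule as the commit message states it. The function names, the greedy hottest-first loop, and the default percentages (5 and 30) are assumptions chosen for illustration only.

```python
def is_promotion_profitable(count, total_count, remaining_count,
                            total_percent_threshold=5,
                            remaining_percent_threshold=30):
    """Return True only if `count` clears BOTH percentage thresholds.

    The threshold defaults are illustrative assumptions, not values
    taken from the commit message.
    """
    if total_count <= 0 or remaining_count <= 0:
        return False
    # Threshold against the total indirect call count (the new check).
    clears_total = count * 100 >= total_percent_threshold * total_count
    # Threshold against the counts not yet promoted (the older check).
    clears_remaining = (count * 100
                        >= remaining_percent_threshold * remaining_count)
    return clears_total and clears_remaining


def select_promotion_targets(target_counts, total_count, **thresholds):
    """Greedily pick targets, hottest first, until one is unprofitable."""
    remaining = total_count
    promoted = []
    ordered = sorted(target_counts.items(),
                     key=lambda item: item[1], reverse=True)
    for name, count in ordered:
        if not is_promotion_profitable(count, total_count, remaining,
                                       **thresholds):
            break
        promoted.append(name)
        remaining -= count
    return promoted


# The commit's example: target1 takes 99.9% of calls, target2 takes 0.1%.
counts = {"target1": 999, "target2": 1}
print(select_promotion_targets(counts, 1000))
# With the total threshold disabled, only the remaining-count check applies.
print(select_promotion_targets(counts, 1000, total_percent_threshold=0))
```

In this model, target2 passes the remaining-count check once target1 has been promoted (it accounts for all of the remaining calls) but fails the total-count check, which is the code size situation the commit message describes.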


116 Commits