llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Kuperstein	76e06c8858	[LICM] When promoting scalars, allow inserting stores to thread-local allocas. This is similar to the allocfn case - if an alloca is not captured, then it's necessarily thread-local. Differential Revision: https://reviews.llvm.org/D28170 llvm-svn: 290738	2016-12-30 01:03:17 +00:00
Dehao Chen	cc76344ef5	Use continuous boosting factor for complete unroll. Summary: The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously. The patch also simplified the code by reducing one parameter in UP. The patch only affects code-gen of two speccpu2006 benchmark: 445.gobmk binary size decreases 0.08%, no performance change. 464.h264ref binary size increases 0.24%, no performance change. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26989 llvm-svn: 290737	2016-12-30 00:50:28 +00:00
Michael Kuperstein	4a86a1921a	[LICM] Remove unneeded tracking of whether changes were made. NFC. "Changed" doesn't actually change within the loop, so there's no reason to keep track of it - we always return false during analysis and true after the transformation is made. llvm-svn: 290735	2016-12-30 00:43:22 +00:00
Michael Kuperstein	62b98c3977	[LICM] Make logic in promoteLoopAccessesToScalars easier to follow. NFC. llvm-svn: 290734	2016-12-30 00:39:00 +00:00
David Majnemer	a1cfd7c5f8	[InstCombine] More thoroughly canonicalize the position of zexts We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y)) where possible. However, we didn't perform the same canonicalization for zexts or for muls. llvm-svn: 290733	2016-12-30 00:28:58 +00:00
Dylan McKay	453d042969	[AVR] Optimize 16-bit ORs with '0' Summary: Fixes PR 31344 Authored by Anmol P. Paralkar Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28121 llvm-svn: 290732	2016-12-30 00:21:56 +00:00
Reid Kleckner	0e7c84c682	Simplify FunctionLoweringInfo.cpp with range for loops I'm preparing to add some pattern matching code here, so simplify the code before I do. NFC llvm-svn: 290731	2016-12-30 00:21:38 +00:00
Reid Kleckner	e8ee89f8b0	Include <algorithm> for std::max etc llvm-svn: 290730	2016-12-30 00:15:40 +00:00
Michael Kuperstein	ff36baefe7	[LICM] Compute exit blocks for promotion eagerly. NFC. This moves the exit block and insertion point computation to be eager, instead of after seeing the first scalar we can promote. The cost is relatively small (the computation happens anyway, see discussion on D28147), and the code is easier to follow, and can bail out earlier if there's a catchswitch present. llvm-svn: 290729	2016-12-29 23:11:19 +00:00
Michael Kuperstein	5566092963	[LICM] Don't try to promote in loops where we have no chance to promote. NFC. We would check whether we have a prehader or dedicated exit blocks, and go into the promotion loop. Then, for each alias set we'd check if we have a preheader and dedicated exit blocks, and bail if not. Instead, bail immediately if we don't have both. llvm-svn: 290728	2016-12-29 22:51:22 +00:00
Michael Kuperstein	b6da9cf3b7	[LICM] Only recompute LCSSA when we actually promoted something. We want to recompute LCSSA only when we actually promoted a value. This means we only need to look at changes made by promotion when deciding whether to recompute it or not, not at regular sinking/hoisting. (This was what the code was documented as doing, just not what it did) Hopefully NFC. llvm-svn: 290726	2016-12-29 22:37:13 +00:00
Daniel Berlin	e0bd37e78f	NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. Change to one based numbering so we can assert we don't cause the same bug again. llvm-svn: 290724	2016-12-29 22:15:12 +00:00
Justin Lebar	175ab74dc5	[ADT] Delete RefCountedBaseVPTR. Summary: This class is unnecessary. Its comment indicated that it was a compile error to allocate an instance of a class that inherits from RefCountedBaseVPTR on the stack. This may have been true at one point, but it's not today. Moreover you really do not want to allocate any refcounted object on the stack, vptrs or not, so if we did have a way to prevent these objects from being stack-allocated, we'd want to apply it to regular RefCountedBase too, obviating the need for a separate RefCountedBaseVPTR class. It seems that the main way RefCountedBaseVPTR provides safety is by making its subclass's destructor virtual. This may have been helpful at one point, but these days clang will emit an error if you define a class with virtual functions that inherits from RefCountedBase but doesn't have a virtual destructor. Reviewers: compnerd, dblaikie Subscribers: cfe-commits, klimek, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D28162 llvm-svn: 290717	2016-12-29 19:59:26 +00:00
Reid Kleckner	cd46c1df80	Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64" This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714	2016-12-29 17:07:10 +00:00
Sanjoy Das	00d76a5754	[TBAAVerifier] Be stricter around verifying scalar nodes This fixes the issue exposed in PR31393, where we weren't trying sufficiently hard to diagnose bad TBAA metadata. This does reduce the variety in the error messages we print out, but I think the tradeoff of verifying more, simply and quickly overrules the need for more helpful error messags here. llvm-svn: 290713	2016-12-29 15:47:05 +00:00
Sanjoy Das	600d2a5a6b	[TBAAVerifier] Make things const-consistent; NFC llvm-svn: 290712	2016-12-29 15:47:01 +00:00
Sanjoy Das	55f12d9de9	[TBAAVerifier] Memoize validity of scalar tbaa nodes; NFCI llvm-svn: 290711	2016-12-29 15:46:57 +00:00
Artem Tamazov	25478d821b	[AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive Among other stuff, this allows to use predefined .option.machine_version_major /minor/stepping symbols in the directive. Relevant test expanded at once (also file renamed for clarity). Differential Revision: https://reviews.llvm.org/D28140 llvm-svn: 290710	2016-12-29 15:41:52 +00:00
Igor Laevsky	4f31e52f94	Introduce element-wise atomic memcpy intrinsic This change adds a new intrinsic which is intended to provide memcpy functionality with additional atomicity guarantees. Please refer to the review thread or language reference for further details. Differential Revision: https://reviews.llvm.org/D27133 llvm-svn: 290708	2016-12-29 14:31:07 +00:00
Craig Topper	17b5568bc7	[InstCombine] Use getVectorNumElements instead of explicitly casting to VectorType and calling getNumElements. NFC llvm-svn: 290707	2016-12-29 07:03:18 +00:00
Craig Topper	62f06e241b	[InstCombine] Fix typo in comment. NFC llvm-svn: 290706	2016-12-29 05:38:31 +00:00
Craig Topper	2e18bcfc60	[InstCombine] Use a 32-bits instead of 64-bits for storing the number of elements in VectorType for a ShuffleVector. While there getVectorNumElements to avoid an explicit cast. NFC llvm-svn: 290705	2016-12-29 04:24:32 +00:00
Craig Topper	1a8a3377cc	[InstCombine][X86] If the lowest element of a scalar intrinsic isn't used make sure we add it to the worklist so we can DCE it sooner. We bypassed the intrinsic and returned the passthru operand, but we should also add the intrinsic to the worklist since its now dead. This can allow DCE to find it sooner and remove it. Similar was done for InsertElement when the inserted element isn't demanded. llvm-svn: 290704	2016-12-29 03:30:17 +00:00
Kostya Serebryany	d723804fa2	[libFuzzer] make __sanitizer_cov_trace_switch more predictable llvm-svn: 290703	2016-12-29 02:50:35 +00:00
Daniel Berlin	6658cc9ead	NewGVN: Sort Dominator Tree in RPO order, and use that for generating order. Summary: The optimal iteration order for this problem is RPO order. We want to process as many preds of a backedge as we can before we process the backedge. At the same time, as we add predicate handling, we want to be able to touch instructions that are dominated by a given block by ranges (because a change in value numbering a predicate possibly affects all users we dominate that are using that predicate). If we don't do it this way, we can't do value inference over backedges (the paper covers this in depth). The newgvn branch currently overshoots the last part, and guarantees that it will touch at least the right set of instructions, but it does touch more. This is because the bitvector instruction ranges are currently generated in RPO order (so we take the max and the min of the ranges of dominated blocks, which means there are some in the middle we didn't have to touch that we did). We can do better by sorting the dominator tree, and then just using dominator tree order. As a preliminary, the dominator tree has some RPO guarantees, but not enough. It guarantees that for a given node, your idom must come before you in the RPO ordering. It guarantees no relative RPO ordering for siblings. We add siblings in whatever order they appear in the module. So that is what we fix. We sort the children array of the domtree into RPO order, and then use the dominator tree for ordering, instead of RPO, since the dominator tree is now a valid RPO ordering. Note: This would help any other pass that iterates a forward problem in dominator tree order. Most of them are single pass. It will still maximize whatever result they compute. We could also build the dominator tree in this order, but our incremental updates would still put it out of sort order, and recomputing the sort order is almost as hard as general incremental updates of the domtree. Also note that the sorting does not affect any tests, etc. Nothing depends on domtree order, including the verifier, the equals functions for domtree nodes, etc. How much could this matter, you ask? Here are the current numbers. This is generated by running NewGVN over all files in LLVM. Note that once we propagate equalities, the differences go up by an order of magnitude or two (IE instead of 29, the max ends up in the thousands, since the worst case we add a factor of N, where N is the number of branch predicates). So while it doesn't look that stark for the default ordering, it gets much much worse. There are also programs in the wild where the difference is already pretty stark (2 iterations vs hundreds). RPO ordering: 759040 Number of iterations is 1 112908 Number of iterations is 2 Default dominator tree ordering: 755081 Number of iterations is 1 116234 Number of iterations is 2 603 Number of iterations is 3 27 Number of iterations is 4 2 Number of iterations is 5 1 Number of iterations is 7 Dominator tree sorted: 759040 Number of iterations is 1 112908 Number of iterations is 2 <yay!> Really bad ordering (sort domtree siblings in postorder. not quite the worst possible, but yeah): 754008 Number of iterations is 1 21 Number of iterations is 10 8 Number of iterations is 11 6 Number of iterations is 12 5 Number of iterations is 13 2 Number of iterations is 14 2 Number of iterations is 15 3 Number of iterations is 16 1 Number of iterations is 17 2 Number of iterations is 18 96642 Number of iterations is 2 1 Number of iterations is 20 2 Number of iterations is 21 1 Number of iterations is 22 1 Number of iterations is 29 17266 Number of iterations is 3 2598 Number of iterations is 4 798 Number of iterations is 5 273 Number of iterations is 6 186 Number of iterations is 7 80 Number of iterations is 8 42 Number of iterations is 9 Reviewers: chandlerc, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28129 llvm-svn: 290699	2016-12-29 01:12:36 +00:00
Reid Kleckner	e9c8d7f87b	Add a static_assert about the sizeof(GlobalValue) I added one for Value back in r262045, and I'm starting to think we should have these for any class with bitfields whose memory efficiency really matters. llvm-svn: 290698	2016-12-29 00:55:51 +00:00
Daniel Berlin	7ad1ea0984	Update equalsStoreHelper for the fact that only one branch can be true llvm-svn: 290697	2016-12-29 00:49:32 +00:00
Reid Kleckner	c9e0a153cf	[COFF] Use 32-bit jump table entries in .rdata for Win64 Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694	2016-12-29 00:12:39 +00:00
Mehdi Amini	5022bb7238	Change Metadata Index emission in the bitcode to use 2x32 bits for the placeholder The Bitstream reader and writer are limited to handle a "size_t" at most, which means that we can't backpatch and read back a 64bits value on 32 bits platform. llvm-svn: 290693	2016-12-28 23:45:54 +00:00
Piotr Padlewski	6c37d298d9	Revert "[NewGVN] replace emplace_back with push_back" llvm-svn: 290692	2016-12-28 23:24:02 +00:00
Justin Lebar	291abd3ebb	Speed up Function::isIntrinsic() by adding a bit to GlobalValue. NFC Summary: Previously isIntrinsic() called getName(). This involves a hashtable lookup, so is nontrivially expensive. And isIntrinsic() is called frequently, particularly by dyn_cast<IntrinsicInstr>. This patch steals a bit of IntID and uses that to store whether or not getName() starts with "llvm." Reviewers: bogner, arsenm, joker-eph Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D22949 llvm-svn: 290691	2016-12-28 22:59:45 +00:00
Mehdi Amini	e98f925834	Add an index for Module Metadata record in the bitcode This index record the position for each metadata record in the bitcode, so that the reader will be able to lazy-load on demand each individual record. We also make sure that every abbrev is emitted upfront so that the block can be skipped while reading. I don't plan to commit this before having the reader counterpart, but I figured this can be reviewed mostly independently. Recommit r290684 (was reverted in r290686 because a test was broken) after adding a threshold to avoid emitting the index when unnecessary (little amount of metadata). This optimization "hides" a limitation of the ability to backpatch in the bitstream: we can only backpatch safely when the position has been flushed. So if we emit an index for one metadata, it is possible that (part of) the offset placeholder hasn't been flushed and the backpatch will fail. Differential Revision: https://reviews.llvm.org/D28083 llvm-svn: 290690	2016-12-28 22:30:28 +00:00
Saleem Abdulrasool	2b59eca1f7	Revert "Add an index for Module Metadata record in the bitcode" This reverts commit a0ca6ae2d38339e4ede0dfa588086fc23d87e836. Revert at Mehdi's request as it is breaking bots. llvm-svn: 290686	2016-12-28 20:37:22 +00:00
Piotr Padlewski	629a7f2cc0	[NewGVN] replace emplace_back with push_back emplace_back is not faster if it is equivalent to push_back. In this cases emplaced value had the same type that the one stored in container. It is ugly and it might be even slower (see Scott Meyers presentation about emplacement). llvm-svn: 290685	2016-12-28 20:36:08 +00:00
Mehdi Amini	32ca148198	Add an index for Module Metadata record in the bitcode Summary: This index record the position for each metadata record in the bitcode, so that the reader will be able to lazy-load on demand each individual record. We also make sure that every abbrev is emitted upfront so that the block can be skipped while reading. I don't plan to commit this before having the reader counterpart, but I figured this can be reviewed mostly independently. Reviewers: pcc, tejohnson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28083 llvm-svn: 290684	2016-12-28 19:44:19 +00:00
Piotr Padlewski	26dada79ff	[NewGVN] Simplyfy loop NFC llvm-svn: 290683	2016-12-28 19:42:49 +00:00
Mehdi Amini	cc7fbf718d	[ThinLTO] Honor -O{0,1,2,4} passed through the libLTO interface for ThinLTO This was hardcoded to be O3 till now, without any way to change it without changing the code. llvm-svn: 290682	2016-12-28 19:37:16 +00:00
Piotr Padlewski	e4047b89ad	[NewGVN] replace typedefs with usings llvm-svn: 290680	2016-12-28 19:29:26 +00:00
Piotr Padlewski	fc5727b2a2	[NewGVN] NFC fixes llvm-svn: 290679	2016-12-28 19:17:17 +00:00
Reid Kleckner	92647369fc	[WinEH] Don't assume endFunction is called while in .text Jump table emission can switch to .rdata before WinException::endFunction gets called. Just remember the appropriate text section we started in and reset back to it when we end the function. We were already switching sections back from .xdata anyway. Fixes the first problem in PR31488, so that now COFF switch tables can live in .rdata if we want them to. llvm-svn: 290678	2016-12-28 19:05:12 +00:00
Davide Italiano	0e71480523	[NewGVN] Global sweep replacing NULL with nullptr. NFCI. llvm-svn: 290670	2016-12-28 14:00:11 +00:00
Davide Italiano	0fb3c7cde5	[NewGVN] Remove redundant code. NFCI. llvm-svn: 290669	2016-12-28 13:54:16 +00:00
Davide Italiano	b111409015	[NewGVN] equals() for loads/stores is the same. Unify. Differential Revision: https://reviews.llvm.org/D28116 llvm-svn: 290667	2016-12-28 13:37:17 +00:00
Chandler Carruth	05ca5acc9e	[PM] Introduce a devirtualization iteration layer for the new PM. This is an orthogonal and separated layer instead of being embedded inside the pass manager. While it adds a small amount of complexity, it is fairly minimal and the composability and control seems worth the cost. The logic for this ends up being nicely isolated and targeted. It should be easy to experiment with different iteration strategies wrapped around the CGSCC bottom-up walk using this kind of facility. The mechanism used to track devirtualization is the simplest one I came up with. I think it handles most of the cases the existing iteration machinery handles, but I haven't done a very in depth analysis. It does however match the basic intended semantics, and we can tweak or tune its exact behavior incrementally as necessary. One thing that we may want to revisit is freshly building the value handle set on each iteration. While I don't think this will be a significant cost (it is strictly fewer value handles but more churn of value handes than the old call graph), it is conceivable that we'll want a somewhat more clever tracking mechanism. My hope is to layer that on as a follow up patch with data supporting any implementation complexity it adds. This code also provides for a basic count heuristic: if the number of indirect calls decreases and the number of direct calls increases for a given function in the SCC, we assume devirtualization is responsible. This matches the heuristics currently used in the legacy pass manager. Differential Revision: https://reviews.llvm.org/D23114 llvm-svn: 290665	2016-12-28 11:07:33 +00:00
Chandler Carruth	443e57e01d	[PM] Teach the CGSCC's CG update utility to more carefully invalidate analyses when we're about to break apart an SCC. We can't wait until after breaking apart the SCC to invalidate things: 1) Which SCC do we then invalidate? All of them? 2) Even if we invalidate all of them, a newly created SCC may not have a proxy that will convey the invalidation to functions! Previously we only invalidated one of the SCCs and too late. This led to stale analyses remaining in the cache. And because the caching strategy actually works, they would get used and chaos would ensue. Doing invalidation early is somewhat pessimizing though if we know that the SCC structure won't change. So it turns out that the design to make the mutation API force the caller to know the kind of mutation in advance was indeed 100% correct and we didn't do enough of it. So this change also splits two cases of switching a call edge to a ref edge into two separate APIs so that callers can clearly test for this and take the easy path without invalidating when appropriate. This is particularly important in this case as we expect most inlines to be between functions in separate SCCs and so the common case is that we don't have to so aggressively invalidate analyses. The LCG API change in turn needed some basic cleanups and better testing in its unittest. No interesting functionality changed there other than more coverage of the returned sequence of SCCs. While this seems like an obvious improvement over the current state, I'd like to revisit the core concept of invalidating within the CG-update layer at all. I'm wondering if we would be better served forcing the callers to handle the invalidation beforehand in the cases that they can handle it. An interesting example is when we want to teach the inliner to update and preserve analyses. But we can cross that bridge when we get there. With this patch, the new pass manager an build all of the LLVM test suite at -O3 and everything passes. =D I haven't bootstrapped yet and I'm sure there are still plenty of bugs, but this gives a nice baseline so I'm going to increasingly focus on fleshing out the missing functionality, especially the bits that are just turned off right now in order to let us establish this baseline. llvm-svn: 290664	2016-12-28 10:34:50 +00:00
Gadi Haber	19c4fc5e62	This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663	2016-12-28 10:12:48 +00:00
Chandler Carruth	9900d18bab	[PM] Teach the inliner's call graph update to handle inserting new edges when they are call edges at the leaf but may (transitively) be reached via ref edges. It turns out there is a simple rule: insert everything as a ref edge which is a safe conservative default. Then we let the existing update logic handle promoting some of those to call edges. Note that it would be fairly cheap to make these call edges right away if that is desirable by testing whether there is some existing call path from the source to the target. It just seemed like slightly more complexity in this code path that isn't strictly necessary. If anyone feels strongly about handling this differently I'm happy to change it. llvm-svn: 290649	2016-12-28 03:13:12 +00:00
Craig Topper	28ec3460e4	[InstCombine] Remove a piece of a comment that said that InstCombiner contains pass infrastructure. That hasn't been true since r226618. NFC llvm-svn: 290648	2016-12-28 03:12:42 +00:00
Chandler Carruth	c6334579e9	[LCG] Teach the ref edge removal to handle a ref edge that is trivial due to a call cycle. This actually crashed the ref removal before. I've added a unittest that covers this kind of interesting graph structure and mutation. llvm-svn: 290645	2016-12-28 02:24:58 +00:00
Chandler Carruth	e635289ee2	[PM] Disable the loop vectorizer from the new PM's pipeline as it currenty relies on the old PM's dependency system forming LCSSA. The new PM will require a different design for this, and for now this is causing most of the issues I'm currently seeing in testing. I'd like to get to a testable baseline and then work on re-enabling things one at a time. llvm-svn: 290644	2016-12-28 02:24:55 +00:00

1 2 3 4 5 ...

98026 Commits