llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjoy Das	2aacc0ecca	[SCEV] Introduce ScalarEvolution::getOne and getZero. Summary: It is fairly common to call SE->getConstant(Ty, 0) or SE->getConstant(Ty, 1); this change makes such uses a little bit briefer. I've refactored the call sites I could find easily to use getZero / getOne. Reviewers: hfinkel, majnemer, reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12947 llvm-svn: 248362	2015-09-23 01:59:04 +00:00
James Molloy	50a4c27f97	[LoopUtils,LV] Propagate fast-math flags on generated FCmp instructions We're currently losing any fast-math flags when synthesizing fcmps for min/max reductions. In LV, make sure we copy over the scalar inst's flags. In LoopUtils, we know we only ever match patterns with hasUnsafeAlgebra, so apply that to any synthesized ops. llvm-svn: 248201	2015-09-21 19:41:19 +00:00
Chandler Carruth	7b560d40bd	[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are within a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule are LoopPass instances which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loop holes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 llvm-svn: 247167	2015-09-09 17:55:00 +00:00
James Molloy	520838977b	Rename ExitCount to BackedgeTakenCount, because that's what it is. We called a variable ExitCount, stored the backedge count in it, then redefined it to be the exit count again. llvm-svn: 247140	2015-09-09 12:51:10 +00:00
James Molloy	89eccee4db	Delay predication of stores until near the end of vector code generation Predicating stores requires creating extra blocks. It's much cleaner if we do this in one pass instead of mutating the CFG while writing vector instructions. Besides which we can make use of helper functions to update domtree for us, reducing the work we need to do. llvm-svn: 247139	2015-09-09 12:51:06 +00:00
James Molloy	1e583704f5	[LV] Don't bail to MiddleBlock if a runtime check fails, bail to ScalarPH instead We were bailing to two places if our runtime checks failed. If the initial overflow check failed, we'd go to ScalarPH. If any other check failed, we'd go to MiddleBlock. This caused us to have to have an extra PHI per induction and reduction as the vector loop's exit block was not dominated by its latch. There's no need to have this behavior - if we just always go to ScalarPH we can get rid of a bunch of complexity. llvm-svn: 246637	2015-09-02 10:15:39 +00:00
James Molloy	f2523e38d8	[LV] Move some code around slightly to make the intent of the function more clear. NFC. llvm-svn: 246636	2015-09-02 10:15:32 +00:00
James Molloy	aca2f400ba	[LV] Cleanup: Sink an IRBuilder closer to its uses. NFC. llvm-svn: 246635	2015-09-02 10:15:27 +00:00
James Molloy	cba9230507	[LV] Refactor all runtime check emissions into helper functions. This reduces the complexity of createEmptyBlock() and will open the door to further refactoring. The test change is simply because we're now constant folding a trivial test. llvm-svn: 246634	2015-09-02 10:15:22 +00:00
James Molloy	ff623dce39	[LV] Pull creation of trip counts into a helper function. ... and do a tad of tidyup while we're at it. Because StartIdx must now be zero, there's no difference between Count and EndIdx. llvm-svn: 246633	2015-09-02 10:15:16 +00:00
James Molloy	239ff5d193	[LV] Factor the creation of the loop induction variable out of createEmptyLoop() It makes things easier to understand if this is in a helper method. This is part of my ongoing spaghetti-removal operation on createEmptyLoop. llvm-svn: 246632	2015-09-02 10:15:09 +00:00
James Molloy	a860a2216a	[LV] Never widen an induction variable. There's no need to widen canonical induction variables. It's just as efficient to create a new, wide, induction variable. Consider, if we widen an indvar, then we'll have to truncate it before its uses anyway (1 trunc). If we create a new indvar instead, we'll have to truncate that instead (1 trunc) [besides which IndVars should go and clean up our mess after us anyway on principle]. This lets us remove a ton of special-casing code. llvm-svn: 246631	2015-09-02 10:15:05 +00:00
James Molloy	c07701b017	[LV] Switch to using canonical induction variables. Vectorized loops only ever have one induction variable. All induction PHIs from the scalar loop are rewritten to be in terms of this single indvar. We were trying very hard to pick an indvar that already existed, even if that indvar wasn't canonical (didn't start at zero). But trying so hard is really fruitless - creating a new, canonical, indvar only results in one extra add in the worst case and that add is trivially easy to push through the PHI out of the loop by instcombine. If we try and be less clever here and instead let instcombine clean up our mess (as we do in many other places in LV), we can remove unneeded complexity. llvm-svn: 246630	2015-09-02 10:14:54 +00:00
Tyler Nowicki	5eaa5a9d26	Improve vectorization diagnostic messages and extend vectorize(enable) pragma. This patch changes the analysis diagnostics produced when loops with floating-point recurrences or memory operations are identified. The new messages say "cannot prove it is safe to reorder * operations; allow reordering by specifying #pragma clang loop vectorize(enable)". Depending on the type of diagnostic the message will include additional options such as ffast-math or __restrict__. This patch also allows the vectorize(enable) pragma to override the low pointer memory check threshold. When the hint is given a higher threshold is used. See the clang patch for the options produced for each diagnostic. llvm-svn: 246187	2015-08-27 18:56:49 +00:00
Chad Rosier	c94f8e2906	[LoopVectorize] Add Support for Small Size Reductions. Unlike scalar operations, we can perform vector operations on element types that are smaller than the native integer types. We type-promote scalar operations if they are smaller than a native type (e.g., i8 arithmetic is promoted to i32 arithmetic on Arm targets). This patch detects and removes type-promotions within the reduction detection framework, enabling the vectorization of small size reductions. In the legality phase, we look through the ANDs and extensions that InstCombine creates during promotion, keeping track of the smaller type. In the profitability phase, we use the smaller type and ignore the ANDs and extensions in the cost model. Finally, in the code generation phase, we truncate the result of the reduction to allow InstCombine to rewrite the entire expression in the smaller type. This fixes PR21369. http://reviews.llvm.org/D12202 Patch by Matt Simpson <mssimpso@codeaurora.org>! llvm-svn: 246149	2015-08-27 14:12:17 +00:00
James Molloy	1bbf15c57c	[LoopVectorize] Extract InductionInfo into a helper class... ... and move it into LoopUtils where it can be used by other passes, just like ReductionDescriptor. The API is very similar to ReductionDescriptor - that is, not very nice at all. Sorting these both out will come in a followup. NFC llvm-svn: 246145	2015-08-27 09:53:00 +00:00
Tyler Nowicki	e0f400feaa	Improved printing of analysis diagnostics in the loop vectorizer. This patch ensures that every analysis diagnostic produced by the vectorizer will be printed if the loop has a vectorization hint on it. The condition has also been improved to prevent printing when a disabling hint is specified. llvm-svn: 246132	2015-08-27 01:02:04 +00:00
Wei Mi	edae87d819	The patch replace the overflow check in loop vectorization with the minimum loop iterations check. The loop minimum iterations check below ensures the loop has enough trip count so the generated vector loop will likely be executed, and it covers the overflow check. Differential Revision: http://reviews.llvm.org/D12107. llvm-svn: 245952	2015-08-25 16:43:47 +00:00
Tyler Nowicki	552a62fabc	Standardized 'failed' to 'Failed' in LoopVectorizationRequirements. llvm-svn: 245759	2015-08-21 23:03:24 +00:00
Michael Zolotukhin	2a3d99fedf	[LoopVectorize] Propagate 'nontemporal' attribute into vectorized instructions. llvm-svn: 245632	2015-08-20 22:27:38 +00:00
Chandler Carruth	2f1fd1658f	[PM] Port ScalarEvolution to the new pass manager. This change makes ScalarEvolution a stand-alone object and just produces one from a pass as needed. Making this work well requires making the object movable, using references instead of overwritten pointers in a number of places, and other refactorings. I've also wired it up to the new pass manager and added a RUN line to a test to exercise it under the new pass manager. This includes basic printing support much like with other analyses. But there is a big and somewhat scary change here. Prior to this patch ScalarEvolution was never actually invalidated!!! Re-running the pass just re-wired up the various other analyses and didn't remove any of the existing entries in the SCEV caches or clear out anything at all. This might seem OK as everything in SCEV that can uses ValueHandles to track updates to the values that serve as SCEV keys. However, this still means that as we ran SCEV over each function in the module, we kept accumulating more and more SCEVs into the cache. At the end, we would have a SCEV cache with every value that we ever needed a SCEV for in the entire module!!! Yowzers. The releaseMemory routine would dump all of this, but that isn't realy called during normal runs of the pipeline as far as I can see. To make matters worse, there is actually a key that we don't update with value handles -- there is a map keyed off of Loops. Because LoopInfo does* release its memory from run to run, it is entirely possible to run SCEV over one function, then over another function, and then lookup a Loop* from the second function but find an entry inserted for the first function! Ouch. To make matters still worse, there are plenty of updates that don't trip a value handle. It seems incredibly unlikely that today GVN or another pass that invalidates SCEV can update values in just such a way that a subsequent run of SCEV will incorrectly find lookups in a cache, but it is theoretically possible and would be a nightmare to debug. With this refactoring, I've fixed all this by actually destroying and recreating the ScalarEvolution object from run to run. Technically, this could increase the amount of malloc traffic we see, but then again it is also technically correct. ;] I don't actually think we're suffering from tons of malloc traffic from SCEV because if we were, the fact that we never clear the memory would seem more likely to have come up as an actual problem before now. So, I've made the simple fix here. If in fact there are serious issues with too much allocation and deallocation, I can work on a clever fix that preserves the allocations (while clearing the data) between each run, but I'd prefer to do that kind of optimization with a test case / benchmark that shows why we need such cleverness (and that can test that we actually make it faster). It's possible that this will make some things faster by making the SCEV caches have higher locality (due to being significantly smaller) so until there is a clear benchmark, I think the simple change is best. Differential Revision: http://reviews.llvm.org/D12063 llvm-svn: 245193	2015-08-17 02:08:17 +00:00
Sanjay Patel	fec7965b36	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244617	2015-08-11 15:56:31 +00:00
Sanjay Patel	2a3eb41deb	fix code that was accidentally commented out in previous commit llvm-svn: 244610	2015-08-11 15:08:29 +00:00
Sanjay Patel	320217668e	fix typos in comments; NFC llvm-svn: 244609	2015-08-11 15:04:51 +00:00
Sanjay Patel	25b2601bca	fix typo in comment; NFC llvm-svn: 244607	2015-08-11 14:45:08 +00:00
Tyler Nowicki	c94d6ad241	Print vectorization analysis when loop hint is specified. This patch and a relatec clang patch solve the problem of having to explicitly enable analysis when specifying a loop hint pragma to get the diagnostics. Passing AlwasyPrint as the pass name (see below) causes the front-end to print the diagnostic if the user has specified '-Rpass-analysis' without an '=<target-pass>’. Users of loop hints can pass that compiler option without having to specify the pass and they will get diagnostics for only those loops with loop hints. llvm-svn: 244555	2015-08-11 01:09:15 +00:00
Tyler Nowicki	233773837e	Moved LoopVectorizeHints and related functions before LoopVectorizationLegality and LoopVectorizationCostModel. llvm-svn: 244552	2015-08-11 00:52:54 +00:00
Tyler Nowicki	2d5802f38d	Simplify processLoop() by moving loop hint verification into Hints::allowVectorization(). llvm-svn: 244550	2015-08-11 00:35:44 +00:00
Adam Nemet	5b0a479541	[LAA] Change name from addRuntimeCheck to addRuntimeChecks, NFC This was requested by Hal in D11205. llvm-svn: 244540	2015-08-11 00:09:37 +00:00
Tyler Nowicki	652b0dabe6	Extend late diagnostics to include late test for runtime pointer checks. This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization. llvm-svn: 244523	2015-08-10 23:01:55 +00:00
Tyler Nowicki	c1a86f5866	Late evaluation of the fast-math vectorization requirement. This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint. llvm-svn: 244489	2015-08-10 19:51:46 +00:00
Tyler Nowicki	4d62f2e039	Modify diagnostic messages to clearly indicate the why interleaving wasn't done. Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done. llvm-svn: 244485	2015-08-10 19:14:16 +00:00
Silviu Baranga	61bdc51339	[TTI] Add a hook for specifying per-target defaults for Interleaved Accesses Summary: This adds a hook to TTI which enables us to selectively turn on by default interleaved access vectorization for targets on which we have have performed the required benchmarking. Reviewers: rengolin Subscribers: rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D11901 llvm-svn: 244449	2015-08-10 14:50:54 +00:00
Sanjay Patel	924879ad2c	wrap OptSize and MinSize attributes for easier and consistent access (NFCI) Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of "or'ing" them together when optimizing just for size (-Os). Currently, we are not consistent about this and rely on a front-end to always set OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here that should be added as follow-on patches with regression tests. This patch is NFC-intended: it just replaces existing direct accesses of the attributes by the equivalent wrapper call. Differential Revision: http://reviews.llvm.org/D11734 llvm-svn: 243994	2015-08-04 15:49:57 +00:00
Wei Mi	deee61e434	Create a wrapper pass for BlockFrequencyInfo. This is useful when we want to do block frequency analysis conditionally (e.g. only in PGO mode) but don't want to add one more pass dependence. Patch by congh. Approved by dexonsmith. Differential Revision: http://reviews.llvm.org/D11196 llvm-svn: 242248	2015-07-14 23:40:50 +00:00
Adam Nemet	7cdebac0c8	[LAA] Lift RuntimePointerCheck out of LoopAccessInfo, NFC I am planning to add more nested classes inside RuntimePointerCheck so all these triple-nesting would be hard to follow. Also rename it to RuntimePointerChecking (i.e. append 'ing'). llvm-svn: 242218	2015-07-14 22:32:44 +00:00
Benjamin Kramer	e448b5be05	Avoid using Loop::getSubLoopsVector. Passes should never modify it, just use the const version. While there reduce copying in LoopInterchange. No functional change intended. llvm-svn: 242041	2015-07-13 17:21:14 +00:00
Hal Finkel	9cf58c4095	Move getStrideFromPointer and friends from LoopVectorize to VectorUtils The following functions are moved from the LoopVectorizer to VectorUtils: - getGEPInductionOperand - stripGetElementPtr - getUniqueCastUse - getStrideFromPointer These used to be static functions in LoopVectorize, but will also be used by the upcoming loop versioning LICM transformation. Patch by Ashutosh Nema! llvm-svn: 241980	2015-07-11 10:52:42 +00:00
Tyler Nowicki	3960d85262	Renamed some uses of unroll to interleave in the vectorizer. llvm-svn: 241971	2015-07-11 00:31:11 +00:00
Jingyue Wu	a277561922	[TTI] BasicTTIImpl assumes no vector registers Summary: Following the discussion on r241884, it's more reasonable to assume that a target has no vector registers by default instead of letting every such target overrides getNumberOfRegisters. Therefore, this patch modifies BasicTTIImpl::getNumberOfRegisters to return 0 when Vector is true, and partially reverts r241884 which modifies NVPTXTTIImpl::getNumberOfRegisters. It also fixes a performance bug in LoopVectorizer. Even if a target has no vector registers, vectorization may still help ILP. So, we need both checks to be false before disabling loop vectorization all together. Reviewers: hfinkel Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11108 llvm-svn: 241942	2015-07-10 21:14:54 +00:00
Michael Zolotukhin	97295ea7dd	[LoopVectorizer] Rename BypassBlock to VectorPH, and CheckBlock to NewVectorPH. NFCI. llvm-svn: 241742	2015-07-08 21:48:03 +00:00
Michael Zolotukhin	8c874bb2f1	[LoopVectorizer] Restructurize code for emitting RT checks. NFCI. Place all code corresponding to a run-time check in one place. Previously we generated some code, then proceeded to a next check, then finished the code for the first check (like splitting blocks and generating branches). Now the code for generating a check is self-contained. llvm-svn: 241741	2015-07-08 21:47:59 +00:00
Michael Zolotukhin	66f5591f9b	[LoopVectorizer] Remove redundant variables PastOverflowCheck and OverflowCheckAnchor. NFCI. llvm-svn: 241740	2015-07-08 21:47:56 +00:00
Michael Zolotukhin	00345cadd5	[LoopVectorizer] Move some code around to ease further refactoring. NFCI. llvm-svn: 241739	2015-07-08 21:47:53 +00:00
Michael Zolotukhin	7db3063f87	[LoopVectorizer] Remove redundant variable LastBypassBlock. NFC. llvm-svn: 241738	2015-07-08 21:47:47 +00:00
Alexey Samsonov	958dab71b3	[LoopVectorize] Use ReplaceInstWithInst() helper where appropriate. This is mostly an NFC, which increases code readability (instead of saving old terminator, generating new one in front of old, and deleting old, we just call a function). However, it would additionaly copy the debug location from old instruction to replacement, which would help PR23837. llvm-svn: 241197	2015-07-01 22:18:30 +00:00
David Majnemer	9f3979fd78	[LoopVectorize] Pointer indicies may be wider than the pointer If we are dealing with a pointer induction variable, isInductionPHI gives back a step value of Stride / size of pointer. However, we might be indexing with a legal type wider than the pointer width. Handle this by inserting casts where appropriate instead of crashing. This fixes PR23954. llvm-svn: 240877	2015-06-27 08:38:17 +00:00
David Blaikie	b447ac6435	Move VectorUtils from Transforms to Analysis to correct layering violation llvm-svn: 240804	2015-06-26 18:02:52 +00:00
Michael Zolotukhin	79ff564ef3	[LoopVectorizer] Fix bailing-out condition for OptForSize case. With option OptForSize enabled, the Loop Vectorizer is not supposed to create tail loop. The condition checking that was invalid and was not matching to the comment above. Patch by Marianne Mailhot-Sarrasin. llvm-svn: 240556	2015-06-24 17:26:24 +00:00
Alexander Kornienko	f00654e31b	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00

1 2 3 4 5 ...

509 Commits