llvm-project

Commit Graph

Author	SHA1	Message	Date
Yi Jiang	edf2d9179e	set the cost of tiny trees to INT_MAX in SLP vectorizer to disable vectorization on them llvm-svn: 191314	2013-09-24 17:26:43 +00:00
Arnold Schwaighofer	22639407d7	Revert "LoopVectorizer: Only allow vectorization of intrinsics." Revert 191122 - with extra checks we are allowed to vectorize math library function calls. Standard library identifiers are reserved names, so functions with external linkage must not override them. However, functions with internal linkage can. Therefore, we can vectorize calls to math library functions with a check for external linkage and matching signature. This matches what we do during SelectionDAG building. llvm-svn: 191206	2013-09-23 14:54:39 +00:00
Arnold Schwaighofer	d743feef81	SLPVectorizer: Fix multiline comment warning llvm-svn: 191135	2013-09-21 05:37:30 +00:00
Arnold Schwaighofer	500242d4fe	Reapply "SLPVectorizer: Handle more horizontal reductions (disabled)"" Reapply r191108 with a fix for a memory corruption error I introduced. Of course, we can't reference the scalars that we replace by vectorizing and then call their eraseFromParent method. I only 'needed' the scalars to get the DebugLoc. Just store the DebugLoc before actually vectorizing instead. As a nice side effect, this also simplifies the interface between BoUpSLP and the HorizontalReduction class to returning a value pointer (the vectorized tree root). radar://14607682 llvm-svn: 191123	2013-09-21 01:06:00 +00:00
Nadav Rotem	3371172a67	LoopVectorizer: Only allow vectorization of intrinsics. We can't know for sure that the functions 'abs' or 'round' are the functions from libm. rdar://15012650 llvm-svn: 191122	2013-09-21 00:27:05 +00:00
Arnold Schwaighofer	f1dfbfdde1	Revert "SLPVectorizer: Handle more horizontal reductions (disabled)" This reverts commit r191108. The horizontal.ll test case fails under libgmalloc. Thanks Shuxin for pointing this out to me. llvm-svn: 191121	2013-09-21 00:06:20 +00:00
Arnold Schwaighofer	4724963112	SLPVectorizer: Handle more horizontal reductions (disabled) Match reductions starting at binary operation feeding into a phi. The code handles trees like r += v1 + v2 + v3 ... and r += v1 r += v2 ... and r *= v1 + v2 + ... We currently only handle associative operations (add, fadd fast). The code can now also handle reductions feeding into stores. a[i] = v1 + v2 + v3 + ... The code is currently disabled behind the flag "-slp-vectorize-hor". The cost model for most architectures is not there yet. I found one opportunity of a horizontal reduction feeding a phi in TSVC (LoopRerolling-flt) and there are several opportunities where reductions feed into stores. radar://14607682 llvm-svn: 191108	2013-09-20 21:18:20 +00:00
Robert Lytton	f637e2cb23	Prevent LoopVectorizer and SLPVectorizer running if the target has no vector registers. XCore target: Add XCoreTargetTransformInfo This is where getNumberOfRegisters() resides, which in turn returns the number of vector registers (=0). llvm-svn: 190936	2013-09-18 12:43:35 +00:00
Craig Topper	be3e01e61f	Revert accidental commit I had to make to get the test case in PR17268 to still work correctly. llvm-svn: 190917	2013-09-18 04:10:17 +00:00
Craig Topper	98064b9f4d	Lift alignment restrictions for load/store folding on VINSERTF128/VEXTRACTF128. Fixes PR17268. llvm-svn: 190916	2013-09-18 03:55:53 +00:00
Arnold Schwaighofer	4a3dcaa193	SLPVectorizer: Don't vectorize phi nodes that use invoke values We can't insert an insertelement after an invoke. We would have to split a critical edge. So when we see a phi node that uses an invoke we just give up. radar://14990770 llvm-svn: 190871	2013-09-17 17:03:29 +00:00
Arnold Schwaighofer	53e622cef4	Don't vectorize if there are outside loop users of the induction variable. We would have to compute the pre increment value, either by computing it on every loop iteration or by splitting the edge out of the loop and inserting a computation for it there. For now, just give up vectorizing such loops. Fixes PR17179. llvm-svn: 190790	2013-09-16 16:17:24 +00:00
Eli Friedman	05906faa4d	Don't assert on invalid loop vectorization hint. llvm-svn: 190450	2013-09-10 23:45:25 +00:00
Benjamin Kramer	934f6f39f4	LoopVectorize: PHI nodes are always at the beginning of a block, no need to scan the whole block. llvm-svn: 190422	2013-09-10 18:46:15 +00:00
Yi Jiang	aeb5b46a85	In this patch we are trying to do two things: 1) If the width of a vectorization list candidate is bigger than the vector register width, we break it down to fit the vector register. 2) We do not vectorize widths that are not a power of two. The performance results show that this helps some SPEC benchmarks: mesa improved 6.97% and ammp improved 1.54%. llvm-svn: 189830	2013-09-03 17:26:04 +00:00
Hal Finkel	6d09904cc9	Disable unrolling in the loop vectorizer when disabled in the pass manager When unrolling is disabled in the pass manager, the loop vectorizer should also not unroll loops. This will allow the -fno-unroll-loops option in Clang to behave as expected (even for vectorizable loops). The loop vectorizer's -force-vector-unroll option will (continue to) override the pass-manager setting (including -force-vector-unroll=0 to force use of the internal auto-selection logic). In order to test this, I added a flag to opt (-disable-loop-unrolling) to force disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also, this fixes a small bug in opt where the loop vectorizer was enabled only after the pass manager populated the queue of passes (the global_alias.ll test needed a slight update to the RUN line as a result of this fix). llvm-svn: 189499	2013-08-28 18:33:10 +00:00
Nadav Rotem	6b41f7cc4c	Refactor 'vectorizeLoop'; no functionality change. This patch merges the InnerLoopVectorizer and InnerLoopUnroller code in LoopVectorize by adding checks for VF=1. This helps in erasing the Unroller code that is almost identical to the InnerLoopVectorizer code. llvm-svn: 189391	2013-08-27 18:52:47 +00:00
Matt Arsenault	ed9f76d37b	Fix inserting instructions before last in bundle. The builder inserts from before the insert point, not after, so this would insert before the last instruction in the bundle instead of after it. I'm not sure if this can actually be a problem with any of the current insertions. llvm-svn: 189285	2013-08-26 23:08:37 +00:00
Nadav Rotem	bdc9ff4498	LoopVectorize: Implement partial loop unrolling when vectorization is not profitable. This patch enables unrolling of loops when vectorization is legal but not profitable. We add a new class InnerLoopUnroller, that extends InnerLoopVectorizer and replaces some of the vector-specific logic with scalars. This patch does not introduce any runtime regressions and improves the following workloads: SingleSource/Benchmarks/Shootout/matrix -22.64% SingleSource/Benchmarks/Shootout-C++/matrix -13.06% External/SPEC/CINT2006/464_h264ref/464_h264ref -3.99% SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -1.95% llvm-svn: 189281	2013-08-26 22:33:26 +00:00
Yi Jiang	7107d41574	test commit. Remove blank line llvm-svn: 189265	2013-08-26 18:57:55 +00:00
Matt Arsenault	bcd8c577d7	Fix unused variable in release build llvm-svn: 189264	2013-08-26 18:38:29 +00:00
Matt Arsenault	8f21c838c0	Constify functions llvm-svn: 189234	2013-08-26 17:56:38 +00:00
Matt Arsenault	39274be65f	Vectorize starting from insertelements building a vector llvm-svn: 189233	2013-08-26 17:56:35 +00:00
Matt Arsenault	8405888af1	Check if in set on insertion instead of separately llvm-svn: 189179	2013-08-24 19:55:38 +00:00
Chandler Carruth	1c34afcb61	Teach the SLP vectorizer the correct way to check for consecutive access using GEPs. Previously, it used a number of different heuristics for analyzing the GEPs. Several of these were conservatively correct, but failed to fall back to SCEV even when SCEV might have given a reasonable answer. One was simply incorrect in how it was formulated. There was good code already to recursively evaluate the constant offsets in GEPs, look through pointer casts, etc. In a previous commit, I gathered this into a form that code like the SLP code can use, which allows all of this code to become quite simple. There is some performance (compile time) concern here at first glance as we're directly attempting to walk both pointers' constant GEP chains. However, a couple of thoughts: 1) In the very common case where there is a dynamic pointer and a second pointer at a constant offset (usually a stride) from it, this code will actually not do any unnecessary work. 2) InstCombine and other passes work very hard to collapse constant GEPs, so it will be rare that we iterate here for a long time. That said, if there remain performance problems here, there are some obvious things that can improve the situation immensely. Doing a vectorizer-pass-wide memoizer for each individual layer of pointer values, their base values, and the constant offset is likely to be able to completely remove redundant work and strictly limit the scaling of the work to scrape these GEPs. Since this optimization was not done on the prior version (which would still benefit from it), I've not done it here. But if folks have benchmarks that slow down it should be straightforward for them to add. I've added a test case, but I'm not really confident of the amount of testing done for different access patterns, strides, and pointer manipulation. llvm-svn: 189007	2013-08-22 12:45:17 +00:00
Matt Arsenault	f599d97449	Teach LoopVectorize about address space sizes llvm-svn: 188980	2013-08-22 02:42:55 +00:00
Matt Arsenault	745832dcc9	Use attribute helper function llvm-svn: 188916	2013-08-21 18:54:50 +00:00
Matt Arsenault	3c71dabd88	Fix typo llvm-svn: 188915	2013-08-21 18:54:47 +00:00
Arnold Schwaighofer	e1f3ab69d1	SLPVectorizer: Fix invalid iterator errors Update iterator when the SLP vectorizer changes the instructions in the basic block by restarting the traversal of the basic block. Patch by Yi Jiang! Fixes PR 16899. llvm-svn: 188832	2013-08-20 21:21:45 +00:00
Hal Finkel	0c5c01aa4a	Add an llvm.copysign intrinsic This adds an llvm.copysign intrinsic. We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-valued FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728	2013-08-19 23:35:46 +00:00
Joerg Sonnenberger	8e3050db51	PR 16899: Do not modify the basic block using the iterator, but keep the next value. This avoids crashes due to invalidation. Patch by Joey Gouly. llvm-svn: 188605	2013-08-17 11:04:47 +00:00
Matt Arsenault	5cae894a13	Fix spelling llvm-svn: 188506	2013-08-15 23:11:03 +00:00
Hal Finkel	1a61f621da	BBVectorize: Add initial stores to the write set when tracking uses When computing the use set of a store, we need to add the store to the write set prior to iterating over later instructions. Otherwise, if there is a later aliasing load of that store, that load will not be tagged as a use, and bad things will happen. trackUsesOfI still adds later dependent stores of an instruction to that instruction's write set, but it never sees the original instruction, and so when tracking uses of a store, the store must be added to the write set by the caller. Fixes PR16834. llvm-svn: 188329	2013-08-13 23:34:32 +00:00
Nadav Rotem	e23147bbd4	Fix PR16797 - Support PHINodes with multiple inputs from the same basic block. Do not generate new vector values for the same entries because we know that the incoming values from the same block must be identical. llvm-svn: 188185	2013-08-12 17:46:44 +00:00
Hal Finkel	171817ee8a	Add ISD::FROUND for libm round() All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well. For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit. llvm-svn: 187926	2013-08-07 22:49:12 +00:00
Arnold Schwaighofer	a7cd6bf3bb	LoopVectorize: Allow vectorization of loops with lifetime markers Patch by Marc Jessome! llvm-svn: 187825	2013-08-06 22:37:52 +00:00
Nadav Rotem	5defea90e6	SLPVectorizer: Fix PR16777. PHInodes may use multiple extracted values that come from different blocks. Thanks Alexey Samsonov. llvm-svn: 187663	2013-08-02 18:40:24 +00:00
Nadav Rotem	25f15358d2	80-col llvm-svn: 187535	2013-07-31 22:17:45 +00:00
Nadav Rotem	d9c74cc6d3	SLPVectorizer: update the debug location for the new instructions. llvm-svn: 187363	2013-07-29 18:18:46 +00:00
Nadav Rotem	750e42cba3	Don't vectorize when the attribute NoImplicitFloat is used. llvm-svn: 187340	2013-07-29 05:13:00 +00:00
Nadav Rotem	3e50c68956	Update the comment llvm-svn: 187316	2013-07-27 23:28:47 +00:00
Nadav Rotem	cfd40da9b1	SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. llvm-svn: 187267	2013-07-26 23:07:55 +00:00
Nadav Rotem	9ce0f779bc	SLP Vectorizer: Disable the vectorization of non-power-of-two chains, such as <3 x float>, because we don't have a good cost model for these types. llvm-svn: 187265	2013-07-26 22:53:11 +00:00
Nadav Rotem	cf0dcdc71c	When we vectorize across multiple basic blocks we may vectorize PHINodes that create a cycle. We already break the cycle on phi-nodes, but arithmetic operations are still duplicated. This patch adds code that checks if the operation that we are vectorizing was vectorized during the visit of the operands and uses this value if it can. llvm-svn: 186883	2013-07-22 22:18:07 +00:00
Nadav Rotem	8c45d4b27f	Fix an obvious typo in the loop vectorizer where the cost model uses the wrong variable. The variable BlockCost is ignored. We don't have tests for the effect of if-conversion loops because it requires a big test (that includes if-converted loops) and it is difficult to find and balance a loop to do the right thing. llvm-svn: 186845	2013-07-22 17:10:48 +00:00
Nadav Rotem	d7ff88a8d9	Delete unused helper functions. llvm-svn: 186808	2013-07-22 05:19:22 +00:00
Nadav Rotem	f6bb6a464c	Revert a part of r186420. Don't forbid multiple store chains that merge. llvm-svn: 186786	2013-07-21 06:12:57 +00:00
Nadav Rotem	e210839f5b	fix an 80-col line. llvm-svn: 186733	2013-07-19 23:14:01 +00:00
Nadav Rotem	c069c25518	Use LLVMs ADTs that improve the compile time of this pass. llvm-svn: 186732	2013-07-19 23:12:19 +00:00
Nadav Rotem	5c9a193a65	SLPVectorizer: Improve the compile time of isConsecutive by reordering the conditions that check GEPs and eliminate two of the calls to accumulateConstantOffset. llvm-svn: 186731	2013-07-19 23:11:15 +00:00
Nadav Rotem	bb3398f000	Handle constants without going through SCEV. llvm-svn: 186593	2013-07-18 18:34:21 +00:00
Nadav Rotem	de2815a5f7	SLPVectorizer: Speedup isConsecutive by manually checking GEPs with multiple indices. This brings the compile time of the SLP-Vectorizer to about 2.5% of OPT for my testcase. llvm-svn: 186592	2013-07-18 18:20:45 +00:00
Nadav Rotem	7d7036b8c6	SLPVectorizer: Speedup isConsecutive (that checks if two addresses are consecutive in memory) by checking for additional patterns that don't need to go through SCEV. llvm-svn: 186563	2013-07-18 04:33:20 +00:00
Nadav Rotem	43639e8492	Fix a comment. llvm-svn: 186541	2013-07-17 22:41:16 +00:00
Nadav Rotem	3072baeb9c	Add a micro optimization to catch cases where the PtrA equals PtrB. llvm-svn: 186531	2013-07-17 19:52:25 +00:00
Nadav Rotem	2202317fce	SLPVectorizer: Accelerate the isConsecutive check by replacing the subtraction of the two values with a simple SCEV expression that adds the offset to one of the pointers that we compare. llvm-svn: 186479	2013-07-17 00:48:31 +00:00
Nadav Rotem	d2e8c4cdea	flip the scev minus direction to simplify the code. llvm-svn: 186466	2013-07-16 22:57:06 +00:00
Nadav Rotem	8f924f3891	SLPVectorizer: Improve the compile time of isConsecutive by adding a simple constant-gep check before using SCEV. This check does not always work because not all of the GEPs use a constant offset, but it happens often enough to reduce the number of times we use SCEV. llvm-svn: 186465	2013-07-16 22:51:07 +00:00
Nadav Rotem	26bf9a0c75	SLPVectorizer: Reduce the compile time of the consecutive store lookup. Process groups of stores in chunks of 16. llvm-svn: 186420	2013-07-16 15:25:17 +00:00
Nadav Rotem	1c1d6c1666	PR16628: Fix a bug in the code that merges compares. Compares return i1 but they compare different types. llvm-svn: 186359	2013-07-15 22:52:48 +00:00
Nadav Rotem	d9f3f4548e	SLPVectorizer: change the order in which we search for vectorization candidates. Do stores first and PHIs second. llvm-svn: 186277	2013-07-14 06:15:46 +00:00
Craig Topper	b94011fd28	Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. llvm-svn: 186274	2013-07-14 04:42:23 +00:00
Arnold Schwaighofer	a92eeebde8	LoopVectorizer: Disallow reductions whose header phi is used outside the loop If an outside loop user of the reduction value uses the header phi node we cannot just reduce the vectorized phi value in the vector code epilog because we would lose VF-1 reductions. lp: p = phi (0, lv) lv = lv + 1 ... brcond , lp, outside outside: usr = add 0, p (Say the loop iterates two times, the value of p coming out of the loop is one). We cannot just transform this to: vlp: p = phi (<0,0>, lv) lv = lv + <1,1> .. brcond , lp, outside outside: p_reduced = p[0] + p[1]; usr = add 0, p_reduced (Because the original loop iterated two times the vectorized loop would iterate one time, but p_reduced ends up being zero instead of one). We would have to execute VF-1 iterations in the scalar remainder loop in such cases. For now, just disable vectorization. PR16522 llvm-svn: 186256	2013-07-13 19:09:29 +00:00
Andrew Trick	0ae8c94f8f	LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander. In general, one should always complete CFG modifications first, update CFG-based analyses, like Dominators and LoopInfo, then generate instruction sequences. LoopVectorizer was creating a new loop, calling SCEVExpander to generate checks, then updating LoopInfo. I just changed the order. llvm-svn: 186241	2013-07-13 06:20:06 +00:00
Arnold Schwaighofer	9da9a43af8	TargetTransformInfo: address calculation parameter for gather/scatter Address calculation for gather/scatter in vectorized code can incur a significant cost making vectorization unbeneficial. Add infrastructure to add cost. Tests and cost model for targets will be in follow-up commits. radar://14351991 llvm-svn: 186187	2013-07-12 19:16:02 +00:00
Nadav Rotem	89c41bf06a	SLPVectorizer: Sink and enable CSE for ExtractElements. llvm-svn: 186145	2013-07-12 06:09:24 +00:00
Nadav Rotem	fa3c2db211	SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes. Before we could vectorize PHINodes scanning successors was a good way of finding candidates. Now we can vectorize the phinodes which is simpler. llvm-svn: 186139	2013-07-12 00:04:18 +00:00
Nadav Rotem	db06b139fd	Remove an argument that we don't use anymore. llvm-svn: 186116	2013-07-11 20:56:13 +00:00
Arnold Schwaighofer	e97c71b8fd	LoopVectorize: Vectorize all accesses in address space zero with unit stride We can vectorize them because in the case where we wrap in the address space the unvectorized code would have had to access a pointer value of zero which is undefined behavior in address space zero according to the LLVM IR semantics. (Thank you Duncan, for pointing this out to me). Fixes PR16592. llvm-svn: 186088	2013-07-11 15:21:55 +00:00
Nadav Rotem	08efb262a9	Fix a warning. llvm-svn: 186064	2013-07-11 05:39:02 +00:00
Nadav Rotem	b8dd66f655	SLPVectorizer: refactor the code that places extracts. Place the code that decides where to put extracts in the build-tree phase. This allows us to take the cost of the extracts into account. llvm-svn: 186058	2013-07-11 04:54:05 +00:00
Nadav Rotem	d7b574e5b3	Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform. llvm-svn: 185970	2013-07-09 21:38:08 +00:00
Nadav Rotem	861bef7dd0	Set the default insert point to the first instruction, and not to end() llvm-svn: 185953	2013-07-09 17:55:36 +00:00
Nadav Rotem	c9c57518ab	This patch changes the saved IRBuilder insert point from BasicBlock::iterator to AssertingVH. Commit 185883 fixes a bug in the IRBuilder that should fix the ASan bot. AssertingVH can help in exposing some RAUW problems. Thanks Ben and Alexey! llvm-svn: 185886	2013-07-08 23:31:13 +00:00
Nadav Rotem	2ee35771a8	Clear the builder insert point between tree-vectorization phases. llvm-svn: 185777	2013-07-07 14:57:18 +00:00
Nadav Rotem	2041b742d4	SLPVectorizer: Implement DCE as part of vectorization. This is a complete rewrite of the bottom-up vectorization class. Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization. There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design. In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree. llvm-svn: 185774	2013-07-07 06:57:07 +00:00
Craig Topper	af0dea1347	Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. llvm-svn: 185606	2013-07-04 01:31:24 +00:00
Arnold Schwaighofer	ef51cf202b	LoopVectorize: Math functions only read rounding mode Math functions are marked as readonly because they read the floating point rounding mode. Because we don't vectorize loops that would contain function calls that set the rounding mode it is safe to ignore this memory read. llvm-svn: 185299	2013-07-01 00:54:44 +00:00
Benjamin Kramer	4ab72f9b9a	LoopVectorizer: Pack MemAccessInfo pairs. llvm-svn: 185263	2013-06-29 17:52:08 +00:00
Benjamin Kramer	53545693d7	Move helper classes into anonymous namespaces. llvm-svn: 185262	2013-06-29 17:02:06 +00:00
Nadav Rotem	0a25727f31	We preserve the CFG and some of the analysis passes. llvm-svn: 185251	2013-06-29 05:38:15 +00:00
Nadav Rotem	e00343446c	Update docs. llvm-svn: 185250	2013-06-29 05:37:19 +00:00
Nadav Rotem	060be733a5	SLP Vectorizer: Add support for trees with external users. To support this we have to insert 'extractelement' instructions to pick the right lane. We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated. llvm-svn: 185230	2013-06-28 22:07:09 +00:00
Nadav Rotem	9ce3fedcdd	LoopVectorizer: Refactor the code that checks if it is safe to predicate blocks. In this code we keep track of pointers that we are allowed to read from, if they are accessed by non-predicated blocks. We use this list to allow vectorization of conditional loads in predicated blocks because we know that these addresses don't segfault. llvm-svn: 185214	2013-06-28 20:46:27 +00:00
Arnold Schwaighofer	ce2c766f61	LoopVectorize: Pull dyn_cast into setDebugLocFromInst llvm-svn: 185168	2013-06-28 17:14:48 +00:00
Arnold Schwaighofer	3b27b992ca	LoopVectorize: Use static function instead of DebugLocSetter class I used the class to safely reset the state of the builder's debug location. I think I have caught all places where we need to set the debug location to a new one. Therefore, we can replace the class by a function that just sets the debug location. llvm-svn: 185165	2013-06-28 16:26:54 +00:00
Arnold Schwaighofer	12ecb331af	LoopVectorize: Preserve debug location info radar://14169017 llvm-svn: 185122	2013-06-28 00:38:54 +00:00
Arnold Schwaighofer	38de7cd464	LoopVectorize: Cache edge masks created during if-conversion Otherwise, we end up with an exponential IR blowup. Fixes PR16472. llvm-svn: 185097	2013-06-27 20:31:06 +00:00
Arnold Schwaighofer	a2dd195fb3	LoopVectorize: Use vectorized loop invariant gep index anchored in loop Use vectorized instruction instead of original instruction anchored in the original loop. Fixes PR16452 and t2075.c of PR16455. llvm-svn: 185081	2013-06-27 15:11:55 +00:00
Arnold Schwaighofer	ccd6c9929b	LoopVectorize: Don't store a reversed value in the vectorized value map When we store values for reversed induction stores we must not store the reversed value in the vectorized value map. Another instruction might use this value. This fixes 3 test cases of PR16455. llvm-svn: 185051	2013-06-27 00:45:41 +00:00
Nadav Rotem	8edefb3665	No need to use a Set when a vector would do. llvm-svn: 185047	2013-06-27 00:14:13 +00:00
Nadav Rotem	93f880fb77	SLP: When searching for vectorization opportunities scan the blocks in post-order because we grow chains upwards. llvm-svn: 185041	2013-06-26 23:44:45 +00:00
Nadav Rotem	7f0d6d7975	SLP: Don't erase instructions during vectorization because it prevents the outer loops from iterating over the instructions. llvm-svn: 185040	2013-06-26 23:43:23 +00:00
Nadav Rotem	4c5b2d1de6	Erase all of the instructions that we RAUWed llvm-svn: 184969	2013-06-26 17:16:09 +00:00
Nadav Rotem	f4ca3994b8	Do not add cse-ed instructions into the visited map because we don't want to consider them as a candidate for replacement of instructions to be visited. llvm-svn: 184966	2013-06-26 16:54:53 +00:00
Nadav Rotem	0794acc1da	SLPVectorizer: support slp-vectorization of PHINodes between basic blocks llvm-svn: 184888	2013-06-25 23:04:09 +00:00
Nadav Rotem	3de032a3b6	Fix a typo in the code that collected the costs recursively. llvm-svn: 184827	2013-06-25 05:30:56 +00:00
Nadav Rotem	9c7c997a7e	Rename the variable to fix a warning. Thanks Andy Gibbs. llvm-svn: 184749	2013-06-24 15:59:47 +00:00
Arnold Schwaighofer	b252c11ccc	Reapply 184685 after the SetVector iteration order fix. This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg testers. "LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us not to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. A few TSVC loop kernels speed up. radar://13681598" llvm-svn: 184724	2013-06-24 12:09:15 +00:00
Arnold Schwaighofer	91472fa4fc	LoopVectorize: Use SetVector for the access set We are creating the runtime checks using this set so we need a deterministic iteration order. llvm-svn: 184723	2013-06-24 12:09:12 +00:00
Arnold Schwaighofer	58ca945f38	Revert "LoopVectorize: Use the dependence test utility class" This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac. We are seeing a stage2 and stage3 miscompare on some dragonegg bots. llvm-svn: 184690	2013-06-24 06:10:41 +00:00
Arnold Schwaighofer	b914a7e2ef	LoopVectorize: Use the dependence test utility class We now no longer need alias analysis - the cases that alias analysis would handle are now handled as accesses with a large dependence distance. We can now vectorize loops with simple constant dependence distances. for (i = 8; i < 256; ++i) { a[i] = a[i+4] * a[i+8]; } for (i = 8; i < 256; ++i) { a[i] = a[i-4] * a[i-8]; } We would be able to vectorize about 200 more loops (in many cases the cost model instructs us not to) in the test suite now. Results on x86-64 are a wash. I have seen one degradation in ammp. Interestingly, the function in which we now vectorize a loop is never executed so we probably see some instruction cache effects. There is a 2% improvement in h264ref. A few TSVC loop kernels speed up. radar://13681598 llvm-svn: 184685	2013-06-24 03:55:48 +00:00
Arnold Schwaighofer	d517976758	LoopVectorize: Add utility class for checking dependency among accesses This class checks dependences by subtracting two Scalar Evolution access functions allowing us to catch very simple linear dependences. The checker assumes source order in determining whether vectorization is safe. We currently don't reorder accesses. Positive true dependencies need to be a multiple of VF otherwise we impede store-load forwarding. llvm-svn: 184684	2013-06-24 03:55:45 +00:00
Arnold Schwaighofer	d57419696d	LoopVectorize: Add utility class for building sets of dependent accesses Sets of dependent accesses are built by unioning sets based on underlying objects. This class will be used by the upcoming dependence checker. llvm-svn: 184683	2013-06-24 03:55:44 +00:00
Nadav Rotem	210e86d7c4	SLP Vectorizer: Add support for vectorizing parts of the tree. Until now we detected the vectorizable tree and evaluated the cost of the entire tree. With this patch we can decide to trim-out branches of the tree that are not profitable to vectorize. Also, increase the max depth from 6 to 12. In the worst possible case where all of the code is made of diamond-shaped graphs this can bring the cost to 2**10, but diamonds are not very common. llvm-svn: 184681	2013-06-24 02:52:43 +00:00
Nadav Rotem	0323925d51	SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences. Make sure that we don't replace and RAUW two sequences if one does not dominate the other. llvm-svn: 184674	2013-06-23 21:57:27 +00:00
Nadav Rotem	78428401e9	SLP Vectorizer: Erase instructions outside the vectorizeTree method. The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization. llvm-svn: 184671	2013-06-23 19:38:56 +00:00
Nadav Rotem	eb65e67eea	SLP Vectorizer: Implement a simple CSE optimization for the gather sequences. llvm-svn: 184660	2013-06-23 06:15:46 +00:00
Nadav Rotem	80de0a28f1	SLP Vectorizer: Implement multi-block slp-vectorization. Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks. It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function. I removed the support for extracting values from trees. We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2). llvm-svn: 184647	2013-06-22 21:34:10 +00:00
Nadav Rotem	e1713e5fcf	SLP Vectorizer: do not search for store-chains that are wider than the vector-register size. llvm-svn: 184527	2013-06-21 04:18:13 +00:00
Nadav Rotem	b488beefeb	Clang-format the SLP vectorizer. No functionality change. llvm-svn: 184446	2013-06-20 17:54:36 +00:00
Nadav Rotem	14a89c5428	SLPVectorization: Add a basic support for cross-basic block slp vectorization. We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent hints for vectorization of other basic blocks. llvm-svn: 184444	2013-06-20 17:41:45 +00:00
Nadav Rotem	c41028a013	Change the debug type to match the debug type that is used by vecutils.cpp. This change makes it easier to filter debug messages. llvm-svn: 184440	2013-06-20 16:38:05 +00:00
Nadav Rotem	1e9668ea81	SLPVectorizer: handle scalars that are extracted from vectors (using ExtractElementInst). llvm-svn: 184325	2013-06-19 17:33:16 +00:00
Nadav Rotem	86e848c849	SLPVectorizer: start constructing chains at stores that are not a power of two. The type <3 x i8> is common in graphics and we want to be able to vectorize it. This change accelerates bullet by 12% and 471_omnetpp by 5%. llvm-svn: 184317	2013-06-19 15:57:29 +00:00
Nadav Rotem	e98da7f548	SLPVectorizer: vectorize compares and selects. llvm-svn: 184282	2013-06-19 05:49:52 +00:00
Nadav Rotem	4f3224f3ed	Document the return value and fix a typo. llvm-svn: 184281	2013-06-19 05:47:33 +00:00
Nadav Rotem	1f96427da0	Scan the successor blocks and use the PHI nodes as a hint for possible chain roots. llvm-svn: 184201	2013-06-18 15:58:05 +00:00
Nadav Rotem	3349feac4e	Add a return value to make this function more useful. llvm-svn: 184200	2013-06-18 15:57:12 +00:00
Pekka Jaaskelainen	eb90fd1c3b	Fix for a regression caused by the LoopVectorizer when vectorizing loops with memory accesses to non-zero address spaces. It simply dropped the AS info. Fixes PR16306. llvm-svn: 184103	2013-06-17 18:49:06 +00:00
Arnold Schwaighofer	7b1b4db35e	LoopVectorize: Change API call to get the backedge taken count Use ScalarEvolution's getBackedgeTakenCount API instead of getExitCount since that is really what we want to know. Using the more specific getExitCount was safe because we made sure that there is only one exiting block. No functionality change. llvm-svn: 183047	2013-05-31 21:48:56 +00:00
Arnold Schwaighofer	70a9be5297	LoopVectorize: PHIs with only outside users should prevent vectorization We check that instructions in the loop don't have outside users (except if they are reduction values). Unfortunately, we skipped this check for if-convertible PHIs. Fixes PR16184. llvm-svn: 183035	2013-05-31 19:53:50 +00:00
NAKAMURA Takumi	d11b42aaad	LoopVectorize.cpp: Fix abuse of StringRef on Twine. Twine captures the pointer of StringRef. llvm-svn: 182820	2013-05-29 03:13:47 +00:00
NAKAMURA Takumi	d57ea87080	Whitespace. llvm-svn: 182819	2013-05-29 03:13:41 +00:00
Paul Redmond	5fdf836ba4	Add support for llvm.vectorizer metadata - llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic by making the root of additional loop metadata. - Loop::isAnnotatedParallel now looks for llvm.loop and associated llvm.mem.parallel_loop_access - document llvm.loop and update llvm.mem.parallel_loop_access - add support for llvm.vectorizer.width and llvm.vectorizer.unroll - document llvm.vectorizer.* metadata - add utility class LoopVectorizerHints for getting/setting loop metadata - use llvm.vectorizer.width=1 to indicate already vectorized instead of already_vectorized - update existing tests that used llvm.loop.parallel and llvm.vectorizer.already_vectorized Reviewed by: Nadav Rotem llvm-svn: 182802	2013-05-28 20:00:34 +00:00
Benjamin Kramer	6ac1e62377	LoopVectorize: LoopSimplify can't canonicalize loops with an indirectbr in it, don't assert on those cases. Fixes PR16139. llvm-svn: 182656	2013-05-24 18:05:35 +00:00
Nadav Rotem	9e00eb38a2	SLPVectorizer: Change the order in which new instructions are added to the function. We are not working on a DAG and I ran into a number of problems when I enabled the vectorization of 'diamond-trees' (trees that share leaves). * Improved the numbering API. * Changed the placement of new instructions to the last root. * Fixed a bug with external tree users with non-zero lane. * Fixed a bug in the placement of in-tree users. llvm-svn: 182508	2013-05-22 19:47:32 +00:00
Arnold Schwaighofer	12b0d1cda0	LoopVectorize: Make Value pointers that could be RAUW'ed a VH The Value pointers we store in the induction variable list can be RAUW'ed by a call to SCEVExpander::expandCodeFor, use a TrackingVH instead. Do the same thing in some other places where we store pointers that could potentially be RAUW'ed. Fixes PR16073. llvm-svn: 182485	2013-05-22 16:54:56 +00:00
Arnold Schwaighofer	693a1ca628	LoopVectorize: Handle single edge PHIs We might encounter single edge PHIs - handle them with an identity select. Fixes PR15990. llvm-svn: 182199	2013-05-18 18:38:34 +00:00
Benjamin Kramer	d84a63398e	LoopVectorize: Simplify code. No functionality change. llvm-svn: 182100	2013-05-17 14:48:17 +00:00
Arnold Schwaighofer	88e7fddc8c	LoopVectorize: Move call of canHoistAllLoads to canVectorizeWithIfConvert We only want to check this once, not for every conditional block in the loop. No functionality change (except that we don't perform a check redundantly anymore). llvm-svn: 181942	2013-05-15 22:38:14 +00:00
Arnold Schwaighofer	09cee97270	LoopVectorize: Fix comments No functionality change. llvm-svn: 181862	2013-05-15 02:02:45 +00:00
Arnold Schwaighofer	2d920477a4	LoopVectorize: Hoist conditional loads if possible InstCombine can be uncooperative to vectorization and sink loads into conditional blocks. This prevents vectorization. Undo this optimization if there are unconditional memory accesses to the same addresses in the loop. radar://13815763 llvm-svn: 181860	2013-05-15 01:44:30 +00:00
Arnold Schwaighofer	2e7a922a15	LoopVectorize: Handle loops with multiple forward inductions We used to give up if we saw two integer inductions. After this patch, we base further induction variables on the chosen one like we do in the reverse induction and pointer induction case. Fixes PR15720. radar://13851975 llvm-svn: 181746	2013-05-14 00:21:18 +00:00
Duncan Sands	0480b9b54e	Suppress GCC compiler warnings in release builds about variables that are only read in asserts. llvm-svn: 181689	2013-05-13 07:50:47 +00:00
Nadav Rotem	33dcf0a70f	SLPVectorizer: Swap LHS and RHS. No functionality change. llvm-svn: 181684	2013-05-13 05:13:13 +00:00
Nadav Rotem	ce42cc6d4d	SLPVectorizer: Fix a bug in the code that generates extracts for values with multiple users. The external user does not have to be in lane #0. We have to save the lane for each scalar so that we know which vector lane to extract. llvm-svn: 181674	2013-05-12 22:58:45 +00:00
Nadav Rotem	cbf6d24d50	SLPVectorizer: Clear the map from scalars to vectors after each round of vectorization. Testcase in the next commit. llvm-svn: 181673	2013-05-12 22:55:57 +00:00
Arnold Schwaighofer	f2305e4467	LoopVectorize: Use the widest induction variable type Use the widest induction type encountered for the canonical induction variable. We used to turn the following loop into an empty loop because we used i8 as induction variable type and truncated 1024 to 0 as trip count. int a[1024]; void fail() { int reverse_induction = 1023; unsigned char forward_induction = 0; while ((reverse_induction) >= 0) { forward_induction++; a[reverse_induction] = forward_induction; --reverse_induction; } } radar://13862901 llvm-svn: 181667	2013-05-11 23:04:28 +00:00
Arnold Schwaighofer	a544fefa32	LoopVectorize: Use variable instead of repeated function call No functionality change intended. llvm-svn: 181666	2013-05-11 23:04:26 +00:00
Arnold Schwaighofer	1ba84df437	LoopVectorize: Use IRBuilder interface in more places No functionality change intended. llvm-svn: 181665	2013-05-11 23:04:24 +00:00
Nadav Rotem	cdfb48d2fe	SLPVectorizer: Add support for trees with external users. For example: bar() { int a = A[i]; int b = A[i+1]; B[i] = a; B[i+1] = b; foo(a); <--- a is used outside the vectorized expression. } llvm-svn: 181648	2013-05-10 22:59:33 +00:00
Nadav Rotem	0686e5cb05	Add a debug print llvm-svn: 181647	2013-05-10 22:56:18 +00:00
Arnold Schwaighofer	2e8c69cf97	LoopVectorizer: Don't assert on the absence of induction variables A computable loop exit count does not imply the presence of an induction variable. Scalar evolution can return a value for an infinite loop. Fixes PR15926. llvm-svn: 181495	2013-05-09 00:32:18 +00:00
Arnold Schwaighofer	3610139ac5	LoopVectorizer: Improve reduction variable identification The two nested loops were confusing and also conservative in identifying reduction variables. This patch replaces them by a worklist based approach. llvm-svn: 181369	2013-05-07 21:55:37 +00:00
Arnold Schwaighofer	e78b76fbed	LoopVectorize: getConsecutiveVector must respect signed arithmetic We were passing an i32 to ConstantInt::get where an i64 was needed and we must also pass the sign if we pass negative numbers. The start index passed to getConsecutiveVector must also be signed. Should fix PR15882. llvm-svn: 181286	2013-05-07 04:37:05 +00:00
Nadav Rotem	632b25b743	Update the comment to mention that we use TTI. llvm-svn: 181178	2013-05-06 03:06:36 +00:00
Benjamin Kramer	3e3f2a4b8d	LoopVectorize: Print values instead of pointers in debug output. llvm-svn: 181157	2013-05-05 14:54:52 +00:00
Arnold Schwaighofer	d96e427eac	LoopVectorize: Add support for floating point min/max reductions Add support for min/max reductions when "no-nans-float-math" is enabled. This allows us to assume we have ordered floating point math and treat ordered and unordered predicates equally. radar://13723044 llvm-svn: 181144	2013-05-05 01:54:48 +00:00
Arnold Schwaighofer	f5183729db	LoopVectorizer: Cleanup of minimum/maximum pattern match code No need for setting the operands. The pointers are going to be bound by the matcher. radar://13723044 llvm-svn: 181142	2013-05-05 01:54:44 +00:00


543 Commits