The only difference between these two is that VectorizerReport adds a
vectorizer-specific prefix to its messages. When LAA is used in the
vectorizer context the prefix is added when we promote the
LoopAccessReport into a VectorizerReport via one of the constructors.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229897
When I split out LoopAccessReport from this, I need to create some
temporaries, so constness becomes necessary.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229896
Also add the pass name as an argument to VectorizationReport::emitAnalysis.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229894
This is a function pass that runs the analysis on demand. The analysis
can be initiated by querying the loop access info via LAA::getInfo. It
either returns the cached info or runs the analysis.
Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now. The idea is that Loop Distribution won't support run-time stride
checking at least initially.
This means that when querying the analysis, symbolic stride information
can be provided optionally. A change in whether stride information is
used invalidates the cache entry and reruns the analysis. Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.
Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.
On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass. A large chunk of the
diff is due to LAI becoming a pointer from a reference.
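The caching logic is roughly of this shape (a sketch; the member and
accessor names here are illustrative, not the exact interface):

  const LoopAccessInfo &LoopAccessAnalysis::getInfo(Loop *L,
                                                    const ValueToValueMap &Strides) {
    std::unique_ptr<LoopAccessInfo> &Entry = LoopAccessInfoMap[L];
    // Rerun the analysis if there is no cached result or if the symbolic
    // stride information differs from what the cached result was computed
    // with; otherwise reuse the cache.
    if (!Entry || Entry->getSymbolicStrides() != Strides)
      Entry.reset(new LoopAccessInfo(L, SE, DL, TLI, AA, DT, Strides));
    return *Entry;
  }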
A test will be added as part of the -analyze patch.
Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229893
LAA will be an on-demand analysis pass, so we need to cache the result
of the analysis. canVectorizeMemory is renamed to analyzeLoop which
computes the result. canVectorizeMemory becomes the query function for
the cached result.
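In other words, the split is roughly this (a sketch; CanVecMem stands in
for whatever member caches the result):

  void LoopAccessInfo::analyzeLoop() {
    // ... run the memory dependence checks ...
    CanVecMem = DependencesAreSafe;   // illustrative
  }

  bool LoopAccessInfo::canVectorizeMemory() const { return CanVecMem; }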
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229892
The transformation passes will query the stashed report and then emit it
as part of their own report. LV, currently the only user, is modified to
do just that.
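Sketched (member names illustrative):

  Optional<LoopAccessReport> Report;

  void LoopAccessInfo::emitAnalysis(LoopAccessReport Message) {
    Report = std::move(Message);   // stash instead of emitting immediately
  }

The vectorizer can then fetch the stashed report via an accessor and emit
it under its own pass name.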
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229891
As LAA is becoming a pass, we can no longer pass the params to its
constructor. This changes the command line flags to have external
storage. These can now be accessed both from LV and LAA.
VectorizerParams is moved out of LoopAccessInfo in order to shorten the
code to access it.
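The external-storage pattern looks roughly like this (the exact flag
shown is illustrative):

  struct VectorizerParams {
    // Storage lives here so that both LV and LAA can read it.
    static unsigned VectorizationFactor;
  };
  unsigned VectorizerParams::VectorizationFactor;

  static cl::opt<unsigned, true> VectorizationFactorOpt(
      "force-vector-width", cl::Hidden,
      cl::location(VectorizerParams::VectorizationFactor),
      cl::desc("Sets the SIMD width. Zero is autoselect."));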
This commit also has the fix (D7731) to break the dependence cycle
between the analysis and vector libraries.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229890
This reverts commit r229651.
I'd like to ultimately revert r229650 but this reformat stands in the
way. I'll reformat the affected files once the loop-access pass is
fully committed.
llvm-svn: 229889
r229622: "[LoopAccesses] Make VectorizerParams global"
r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it"
r229624: "[LoopAccesses] Cache the result of canVectorizeMemory"
r229626: "[LoopAccesses] Create the analysis pass"
r229628: "[LoopAccesses] Change debug messages from LV to LAA"
r229630: "[LoopAccesses] Add canAnalyzeLoop"
r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport"
r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport"
r229633: "[LoopAccesses] Add -analyze support"
r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference"
r229638: "Analysis: fix buildbots"
llvm-svn: 229650
The only difference between these two is that VectorizerReport adds a
vectorizer-specific prefix to its messages. When LAA is used in the
vectorizer context the prefix is added when we promote the
LoopAccessReport into a VectorizerReport via one of the constructors.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229632
When I split out LoopAccessReport from this, I need to create some
temporaries, so constness becomes necessary.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229631
Also add the pass name as an argument to VectorizationReport::emitAnalysis.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229628
This is a function pass that runs the analysis on demand. The analysis
can be initiated by querying the loop access info via LAA::getInfo. It
either returns the cached info or runs the analysis.
Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now. The idea is that Loop Distribution won't support run-time stride
checking at least initially.
This means that when querying the analysis, symbolic stride information
can be provided optionally. A change in whether stride information is
used invalidates the cache entry and reruns the analysis. Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.
Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.
On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass. A large chunk of the
diff is due to LAI becoming a pointer from a reference.
A test will be added as part of the -analyze patch.
Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229626
blockNeedsPredication is in LoopAccess in order to share it with the
vectorizer. It's a utility needed by LoopAccess rather than strictly
provided by it, but it's a good place to share it. This makes the
function static so that it is no longer required to create a
LoopAccessInfo instance in order to access it from LV.
This was actually causing problems because it would have required
creating LAI much earlier than LV::canVectorizeMemory().
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229625
LAA will be an on-demand analysis pass, so we need to cache the result
of the analysis. canVectorizeMemory is renamed to analyzeLoop which
computes the result. canVectorizeMemory becomes the query function for
the cached result.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229624
The transformation passes will query the stashed report and then emit it
as part of their own report. LV, currently the only user, is modified to
do just that.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229623
As LAA is becoming a pass, we can no longer pass the params to its
constructor. This changes the command line flags to have external
storage. These can now be accessed both from LV and LAA.
VectorizerParams is moved out of LoopAccessInfo in order to shorten the
code to access it.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229622
LoopAccessAnalysis will be used as the name of the pass.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229621
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
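For example:

  // before
  if (F->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
                                      Attribute::NoInline))
    ...
  // after
  if (F->hasFnAttribute(Attribute::NoInline))
    ...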
llvm-svn: 229202
This removes the old 'llvm/PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
llvm-svn: 229094
Apparently some code finally started to tickle this after my
canonicalization changes to instcombine.
The bug stems from trying to form a vector type out of scalars that
aren't compatible at all. In this example, from x86_mmx values. The code
in the vectorizer that checks for reasonable types was checking for
aggregates or vectors, but there are lots of other types that should
just never reach the vectorizer.
Debugging this was made more confusing by the lie in an assert in
VectorType::get() -- it isn't that the types are *primitive*. The types
must be integer, pointer, or floating point types. No other types are
allowed.
I've improved the assert and added a helper to the vectorizer to handle
the element type validity checks. It now re-uses the VectorType static
function and then further excludes weird target-specific types that we
probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of
these are really reachable anyways (neither 80-bit nor 128-bit things
will get vectorized) but it seems better to just eagerly exclude such
nonsense.
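The helper is roughly of this shape:

  // Element types must be integer, pointer, or floating point, and not
  // one of the weird target-specific types we never want to vectorize.
  static bool isValidElementType(Type *Ty) {
    return VectorType::isValidElementType(Ty) && !Ty->isX86_FP80Ty() &&
           !Ty->isPPC_FP128Ty();
  }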
I've added a test case, but while it definitely covers two of the paths
through this code there may be more paths that would benefit from test
coverage. I'm not familiar enough with the SLP vectorizer to synthesize
test cases for all of these, but was able to update the code itself by
inspection.
llvm-svn: 228899
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.
Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman
llvm-svn: 228798
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.
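A standalone sketch of the rule, modeling scope lists as sets of strings:

  #include <set>
  #include <string>

  using ScopeList = std::set<std::string>;

  // Combining alias.scope metadata must take the union: the combined
  // instruction might alias with anything either original aliased with.
  ScopeList combineAliasScopes(const ScopeList &A, const ScopeList &B) {
    ScopeList Result = A;
    Result.insert(B.begin(), B.end());
    return Result;
  }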
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7490
llvm-svn: 228525
This will allow it to be shared with the new Loop Distribution pass.
getFirstInst is currently duplicated across LoopVectorize.cpp and
LoopAccessAnalysis.cpp. This is a short-term work-around until we figure out
a better solution.
NFC. (The code moved is adjusted a bit for the name of the Loop member and
that PtrRtCheck is now a reference rather than a pointer.)
llvm-svn: 228418
I've noticed this while trying to move addRuntimeCheck to LoopAccessAnalysis.
I think that the intention was to early exit from the overflow checking before
the code for the memchecks. This is the entire reason why we compute
FirstCheckInst, but then we split at the final check instead of at
FirstCheckInst. Looks like an oversight.
llvm-svn: 228056
The commit r225977 uncovered this bug. The problem was that the vectorizer tried to
read the second operand of an already deleted instruction.
The bug didn't show up before r225977 because the freed memory still contained a non-null pointer.
With r225977 deletion of instructions is delayed and the read operand pointer is always null.
llvm-svn: 227800
Other than moving code and adding the boilerplate for the new files, the code
being moved is unchanged.
There are a few global functions that are shared with the rest of the
LoopVectorizer. I moved these to the new module as well (emitLoopAnalysis,
stripIntegerCast, replaceSymbolicStrideSCEV) along with the Report class used
by emitLoopAnalysis. There is probably room for further improvement in this
area.
I kept DEBUG_TYPE "loop-vectorize" because it's used as the PassName with
emitOptimizationRemarkAnalysis. This will obviously have to change.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227756
This class needs to remain public because it's used by
LoopVectorizationLegality::addRuntimeCheck.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227755
Rather than using globals use a structure to pass parameters from the
vectorizer. This prepares the class to be moved outside the LoopVectorizer.
It's not great how all this is passed through in LoopAccessAnalysis but this
is all expected to change once the class starts servicing the Loop Distribution
pass as well where some of these parameters make no sense.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227754
Move the canVectorizeMemory functionality from LoopVectorizationLegality to a
new class LoopAccessAnalysis and forward users.
Currently the collection of the symbolic stride information is kept with
LoopVectorizationLegality and it becomes an input to LoopAccessAnalysis.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227751
These members are moving to LoopAccessAnalysis. The accessors help to hide
this.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227750
This class will become public in the new LoopAccessAnalysis header so the name
needs to be more global.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227749
The logic in emitAnalysis is duplicated across multiple functions. This
splits it into a function. Another use will be added by the patchset.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227748
RuntimePointerCheck will be used through LoopAccessAnalysis in
LoopVectorizationLegality. Later in the patchset it will become a local class
of LoopAccessAnalysis.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227747
This threads a function argument through all the callers of the
getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.
The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.
llvm-svn: 227730
This changes the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
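A standalone sketch of the type-erasure pattern (not LLVM's actual
classes):

  #include <memory>
  #include <utility>

  class TTI {
    struct Concept {                          // type-erased interface
      virtual ~Concept() = default;
      virtual unsigned getOpCost(unsigned Opcode) = 0;
    };
    template <typename ImplT> struct Model final : Concept {
      ImplT Impl;
      Model(ImplT I) : Impl(std::move(I)) {}
      unsigned getOpCost(unsigned Opcode) override {
        return Impl.getOpCost(Opcode);        // delegate to the concrete impl
      }
    };
    std::unique_ptr<Concept> Impl;

  public:
    // Wrap either a target-specific implementation or the IR-level dummy.
    template <typename ImplT>
    TTI(ImplT I) : Impl(new Model<ImplT>(std::move(I))) {}
    unsigned getOpCost(unsigned Opcode) { return Impl->getOpCost(Opcode); }
  };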
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
Previously, only -1 and +1 step values were supported for induction variables. This patch extends LV to support
arbitrary constant steps.
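For example, a loop like this (illustrative) can now be vectorized even
though its induction step is neither -1 nor +1:

  void scale(float *a, const float *b, int n) {
    for (int i = 0; i < n; i += 3)   // constant step of +3
      a[i] = b[i] * 2.0f;
  }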
Initial patch by Alexey Volkov. Some bug fixes are added in the following version.
Differential Revision: http://reviews.llvm.org/D6051 and http://reviews.llvm.org/D7193
llvm-svn: 227557
Even with the current limit on the number of alias checks, the containing loop has quadratic complexity.
This begins to hurt for blocks containing > 1K load/store instructions.
This commit introduces a limit for the loop count. It reduces the runtime for such very large blocks.
llvm-svn: 226792
This patch fixes 2 issues in reorderInputsAccordingToOpcode:
1) AllSameOpcodeLeft and AllSameOpcodeRight were being calculated incorrectly, resulting in code not being vectorized in a few cases.
2) Adds logic to reorder operands if we get a longer chain of consecutive loads, enabling vectorization. Handled the same for cases where we have AltOpcode.
Thanks Michael for inputs and review.
Review: http://reviews.llvm.org/D6677
llvm-svn: 226547
In case of blocks with many memory-accessing instructions, alias checking can take a lot of time
(because calculating the memory dependencies has quadratic complexity).
I chose a limit which resulted in no changes when running the benchmarks.
llvm-svn: 226439
It is now cleaner to derive from the generic base.
This removes a ton of boilerplate code and somewhat strange and
pointless indirections. It also removes a bunch of the previously needed
friend declarations. To fully remove these, I also lifted the verify
logic into the generic LoopInfoBase, which seems good anyways -- it is
generic and useful logic even for the machine side.
llvm-svn: 226385
This splits the LoopInfo object apart from the legacy pass, creating
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.
This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.
llvm-svn: 226373
This patch was generated by a clang tidy checker that is being open sourced.
The documentation of that checker is the following:
/// The emptiness of a container should be checked using the empty method
/// instead of the size method. It is not guaranteed that size is a
/// constant-time function, and it is generally more efficient and also shows
/// clearer intent to use empty. Furthermore some containers may implement the
/// empty method but not implement the size method. Using empty whenever
/// possible makes it easier to switch to another container in the future.
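For example:

  // before
  if (Worklist.size() == 0)
    return;
  // after
  if (Worklist.empty())
    return;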
Patch by Gábor Horváth!
llvm-svn: 226161
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.
Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.
llvm-svn: 226157
This speeds up the dependency calculations for blocks with many load/store/call instructions.
Besides the improved runtime, there is no functional change.
Compared to the original commit, this re-applied commit contains a bug fix which ensures that there are
no incorrect collisions in the alias cache.
llvm-svn: 225977
The issue was introduced in r214638:
+ for (auto &BSIter : BlocksSchedules) {
+ scheduleBlock(BSIter.second.get());
+ }
Because BlocksSchedules is a DenseMap with BasicBlock* keys, blocks are
scheduled in non-deterministic order, resulting in unpredictable IR.
Patch by Daniel Reynaud!
llvm-svn: 225821
The alias cache has a problem of incorrect collisions in case a new instruction is allocated at the same address as a previously deleted instruction.
llvm-svn: 225790
This speeds up the dependency calculations for blocks with many load/store/call instructions.
Besides the improved runtime, there is no functional change.
llvm-svn: 225786
{code}
// loop body
... = a[i] (1)
... = a[i+1] (2)
.......
a[i+1] = .... (3)
a[i] = ... (4)
{code}
The algorithm tries to collect memory access candidates from AliasSetTracker, and then checks memory dependences against one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked the 'write' entry in Accesses if only a 'write' existed. This is incorrect, and the consequence is that it ignored all read accesses and finally missed some RAW and WAR dependences.
For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized.
The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile time visibly.
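The shape of the fix is roughly (a sketch; the container and variable
names are illustrative):

  // Visit every (pointer, is-write) entry for this pointer, not just the
  // write entry, so that read accesses -- and thus RAW and WAR
  // dependences -- are no longer skipped.
  for (const MemAccessInfo &A : Accesses) {
    if (A.getPointer() != Ptr)   // cheap filter on the Value pointer
      continue;
    CandidateAccesses.push_back(A);
  }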
llvm-svn: 225159
This splits the assumption tracking into two separate APIs:
a cache of assumptions for a single function, and an immutable pass that
manages those caches.
The motivation for this change is twofold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating a separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.
Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.
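Sketched (member names illustrative):

  class AssumptionCache {
    // Weak value handles: an entry becomes null when its assumption is
    // deleted, and callers simply skip the null handles.
    std::vector<WeakVH> Assumptions;

  public:
    void registerAssumption(CallInst *CI) { Assumptions.push_back(CI); }
    std::vector<WeakVH> &assumptions() { return Assumptions; }
  };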
For now, this adds boilerplate to the various passes, but this kind of
boilerplate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.
llvm-svn: 225131
The loop vectorizer optimizes loops containing conditional memory
accesses by generating masked load and store intrinsics.
This decision is target dependent.
http://reviews.llvm.org/D6527
llvm-svn: 224334
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532. Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.
I have a follow-up patch prepared for `clang`. If this breaks other
sub-projects, I apologize in advance :(. Help me compile it on Darwin
and I'll try to fix it. FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.
This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.
Here's a quick guide for updating your code:
- `Metadata` is the root of a class hierarchy with three main classes:
`MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from
the `Value` class hierarchy. It is typeless -- i.e., instances do
*not* have a `Type`.
- `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
- `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
If you're referring solely to resolved `MDNode`s -- post graph
construction -- just use `MDNode*`.
- `MDNode` (and the rest of `Metadata`) have only limited support for
`replaceAllUsesWith()`.
As long as an `MDNode` is pointing at a forward declaration -- the
result of `MDNode::getTemporary()` -- it maintains a side map of its
uses and can RAUW itself. Once the forward declarations are fully
resolved RAUW support is dropped on the ground. This means that
uniquing collisions on changing operands cause nodes to become
"distinct". (This already happened fairly commonly, whenever an
operand went to null.)
If you're constructing complex (non self-reference) `MDNode` cycles,
you need to call `MDNode::resolveCycles()` on each node (or on a
top-level node that somehow references all of the nodes). Also,
don't do that. Metadata cycles (and the RAUW machinery needed to
construct them) are expensive.
- An `MDNode` can only refer to a `Constant` through a bridge called
`ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
As a side effect, accessing an operand of an `MDNode` that is known
to be, e.g., `ConstantInt`, takes three steps: first, cast from
`Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
third, cast down to `ConstantInt`.
The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
metadata schema owners transition away from using `Constant`s when
the type isn't important (and they don't care about referring to
`GlobalValue`s).
In the meantime, I've added transitional API to the `mdconst`
namespace that matches semantics with the old code, in order to
avoid adding the error-prone three-step equivalent to every call
site. If your old code was:
MDNode *N = foo();
bar(isa <ConstantInt>(N->getOperand(0)));
baz(cast <ConstantInt>(N->getOperand(1)));
bak(cast_or_null <ConstantInt>(N->getOperand(2)));
bat(dyn_cast <ConstantInt>(N->getOperand(3)));
bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
you can trivially match its semantics with:
MDNode *N = foo();
bar(mdconst::hasa <ConstantInt>(N->getOperand(0)));
baz(mdconst::extract <ConstantInt>(N->getOperand(1)));
bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2)));
bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3)));
bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
and when you transition your metadata schema to `MDInt`:
MDNode *N = foo();
bar(isa <MDInt>(N->getOperand(0)));
baz(cast <MDInt>(N->getOperand(1)));
bak(cast_or_null <MDInt>(N->getOperand(2)));
bat(dyn_cast <MDInt>(N->getOperand(3)));
bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
- A `CallInst` -- specifically, intrinsic instructions -- can refer to
metadata through a bridge called `MetadataAsValue`. This is a
subclass of `Value` where `getType()->isMetadataTy()`.
`MetadataAsValue` is the *only* class that can legally refer to a
`LocalAsMetadata`, which is a bridged form of non-`Constant` values
like `Argument` and `Instruction`. It can also refer to any other
`Metadata` subclass.
(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)
llvm-svn: 223802
Remove an unnecessary `MDNode::replaceAllUsesWith()`. In the preceding
line, `TheLoop->setLoopID()` visits all backedges and sets the new loop
ID. This sufficiently updates the loop metadata.
Metadata RAUW is going away as part of PR21532.
llvm-svn: 223210
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
llvm-svn: 222936
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
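An example (illustrative) of the kind of loop this enables:

  void foo(const int *trigger, int *a, const int *b, int n) {
    for (int i = 0; i < n; ++i)
      if (trigger[i] > 0)   // the conditional access becomes a masked
        a[i] = b[i] + 1;    // load/store instead of blocking vectorization
  }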
http://reviews.llvm.org/D6191
llvm-svn: 222632
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This led to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
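For example:

  SmallPtrSet<Value *, 8> Visited;
  if (Visited.insert(V).second) {
    // V was newly inserted; this is the first time we have seen it.
  }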
llvm-svn: 222334
Hide the fact that `MDString`'s string is stored in `Value::Name` --
that's going to change soon. Update the only in-tree client that was
using it instead of `Value::getString()`.
Part of PR21532.
llvm-svn: 221951
Instead, we're going to separate metadata from the Value hierarchy. See
PR21532.
This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.
llvm-svn: 221711
A pointer's pointee might not be sized: the pointee could be a function.
Report this as IK_NoInduction when calculating isInductionVariable.
This fixes PR21508.
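The guard is roughly (a sketch):

  Type *PointeeTy = PtrTy->getPointerElementType();
  if (!PointeeTy->isSized())
    return IK_NoInduction;   // e.g. a pointer to a function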
llvm-svn: 221501
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.
Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.
llvm-svn: 221024
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax, to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which returns NaN if either operand is
NaN, in order to propagate errors. These implement the IEEE-754
semantics of returning the other operand if either is a NaN,
representing missing data.
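A standalone model of the semantics (minnum shown; maxnum is symmetric):

  #include <cmath>

  double minnum(double A, double B) {
    if (std::isnan(A)) return B;   // a NaN stands for missing data,
    if (std::isnan(B)) return A;   // so prefer the other operand
    return A < B ? A : B;
  }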
llvm-svn: 220341
The SLP vectorizer should not vectorize ephemeral values. These are used to
express information to the optimizer, and vectorizing them does not lead to
faster code (because the ephemeral values are dropped prior to code generation,
vectorized or not), and obscures the information the instructions are
attempting to communicate (the logic that interprets the arguments to
@llvm.assume generically does not understand vectorized conditions).
Also, uses by ephemeral values are free (because they, and the necessary
extractelement instructions, will be dropped prior to code generation).
llvm-svn: 219816
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).
Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).
llvm-svn: 219741
This fixes one more caller blindly passing the latch block to
getSmallConstantTripCount even when it isn't the exiting block.
I missed this in my first audit, very sorry. This was found in LNT and
elsewhere. I don't have a test case, but it was completely obvious from
inspection that this was the problem. I'll see if I can reduce a test
case, but I'm not really hopeful, and the value seems quite low.
llvm-svn: 219562
Add asserts to the trip count computation
routines and fix all of the bugs they expose.
I hit a test case that crashed even without these asserts due to passing
a non-exiting latch to the ExitingBlock parameter of the trip count
computation machinery. However, when I add the nice asserts, it turns
out we have plenty of coverage of these bugs, they just didn't manifest
in crashers.
The core problem seems to stem from an assumption that the latch *is*
the exiting block. While this is often true, and somewhat the "normal"
way to think about loops, it isn't necessarily true. The correct way to
call the trip count routines in a *generic* fashion (that is, without
a particular exit in mind) is to just use the loop's single exiting
block if it has one. The trip count can't be computed generically unless
it does. This works great for the loop vectorizer. The loop unroller
actually *wants* to select the latch when it has to choose between
multiple exits because for unrolling it is the latch trips that matter.
But if this is the desire, it needs to explicitly guard for non-exiting
latches and check for the generic trip count in that case.
I've added the asserts, and added convenience APIs for querying the trip
count generically that check for a single exit block. I've kept the APIs
consistent between computing trip count and trip multiples.
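The generic query is roughly (a sketch of the convenience API described):

  unsigned ScalarEvolution::getSmallConstantTripCount(Loop *L) {
    // Only a single exiting block allows a generic trip count.
    BasicBlock *ExitingBB = L->getExitingBlock();   // null if not unique
    if (!ExitingBB)
      return 0;
    return getSmallConstantTripCount(L, ExitingBB);
  }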
Thanks to Mark for the help debugging and tracking down the *right* fix
here!
llvm-svn: 219550
"Unroll" is not the appropriate name for this variable. Clang already uses
the term "interleave" in pragmas and metadata for this.
Differential Revision: http://reviews.llvm.org/D5066
llvm-svn: 217528
The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math)
when converting scalar instructions into vectors. But this isn't a simple copy - we need to take
the intersection (the logical 'and') of the sets of flags on the scalars.
The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after:
http://reviews.llvm.org/D4015 and http://llvm.org/viewvc/llvm-project?view=revision&revision=211339
The vast majority of changed files are existing tests that were not propagating IR flags, but I've
also added a new test file for focused testing of IR flag possibilities.
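The intersection step can be written with the 'andIRFlags' helper
mentioned in the next entry (illustrative):

  static void propagateIRFlags(Instruction *VecOp, ArrayRef<Value *> Scalars) {
    for (Value *V : Scalars)
      if (auto *I = dyn_cast<Instruction>(V))
        VecOp->andIRFlags(I);   // keep only the flags common to all scalars
  }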
Differential Revision: http://reviews.llvm.org/D5172
llvm-svn: 217051
Adding 'IR' to the names in an attempt to be less ambiguous about the flags we're dealing with here.
The 'and' method is needed by the SLPVectorizer (PR20802) and possibly other passes.
llvm-svn: 217004
The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions.
This patch adds a convenience method to make that operation easier because we need to do this
in the loop vectorizer, SLP vectorizer, and possibly other places.
Although this is a 'no functional change' patch, I've added a testcase to verify that the exact
flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked
in existing testcases.
Differential Revision: http://reviews.llvm.org/D5138
llvm-svn: 216886
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.
This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.
Re-applying again due to MSVC 2013 being minimum requirement, and this patch
having C++11 that MSVC 2012 didn't support.
Fixes PR20655.
llvm-svn: 216870
For a detailed description of the problem see the comment in the test file.
The problematic moveBefore() calls are not required anymore because the new
scheduling algorithm ensures a correct ordering anyway.
llvm-svn: 216656
This patch adds support to recognize division by a uniform power of 2 and modifies the cost table to vectorize division by a uniform power of 2 whenever possible.
Updates the cost model for the Loop and SLP Vectorizers. The cost table is currently only updated for the X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)
llvm-svn: 216371
In unreachable blocks it's legal to have instructions like "%x = op %x".
Such instructions are not schedulable. Therefore the SLPVectorizer has to check for
unreachable blocks and ignore them.
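The check can be as simple as (a sketch):

  // Blocks not reachable from the entry cannot be scheduled; ignore them.
  if (!DT->isReachableFromEntry(BB))
    return;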
Fixes bug 20646.
llvm-svn: 216256
If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit it, by default, to 2, so the critical path only gets increased by one reduction operation.
llvm-svn: 216140
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.
This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.
llvm-svn: 215994
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.
llvm-svn: 214859
When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision.
Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive.
Reviewed by Arnold Schwaighofer
llvm-svn: 214599
The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss.
Reviewed by Arnold Schwaighofer
llvm-svn: 214445
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
1. To preserve noalias function attribute information when inlining
2. To provide the ability to model block-scope C99 restrict pointers
Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.
What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:
!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }
Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:
... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }
When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.
Note that if the first element of the scope metadata is a string, then it can
be combined across functions and translation units. The string can be replaced
by a self-reference to create globally unique scope identifiers.
[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]
Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functions have the metadata imply that a1 does not alias with b2.
llvm-svn: 213864
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).
This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.
No functionality change intended.
llvm-svn: 213859
Prior to this change, the loop vectorizer did not make use of the alias
analysis infrastructure. Instead, it performed memory dependence analysis using
ScalarEvolution-based linear dependence checks within equivalence classes
derived from the results of ValueTracking's GetUnderlyingObjects.
Unfortunately, this meant that:
1. The loop vectorizer had logic that essentially duplicated that in BasicAA
for aliasing based on identified objects.
2. The loop vectorizer could not partition the space of dependency checks
based on information only easily available from within AA (TBAA metadata is
currently the prime example).
This means, for example, regardless of whether -fno-strict-aliasing was
provided, the vectorizer would only vectorize this loop with a runtime
memory-overlap check:
void foo(int *a, float *b) {
for (int i = 0; i < 1600; ++i)
a[i] = b[i];
}
This is suboptimal because the TBAA metadata already provides the information
necessary to show that this check is unnecessary. Of course, the vectorizer has a
limit on the number of such checks it will insert, so in practice, ignoring
TBAA means not vectorizing more-complicated loops that we should.
This change causes the vectorizer to use an AliasSetTracker to keep track of
the pointers in the loop. The resulting alias sets are then used to partition
the space of dependency checks, and potential runtime checks; this results in
more-efficient vectorizations.
When pointer locations are added to the AliasSetTracker, two things are done:
1. The location size is set to UnknownSize (otherwise you'd not catch
inter-iteration dependencies)
2. For instructions in blocks that would need to be predicated, TBAA is
removed (because the metadata might have a control dependency on the condition
being speculated).
For non-predicated blocks, you can leave the TBAA metadata. This is safe
because you can't have an iteration dependency on the TBAA metadata (if you
did, and you unrolled sufficiently, you'd end up with the same pointer value
used by two accesses that TBAA says should not alias, and that would yield
undefined behavior).
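In code, registering a pointer then looks roughly like (a sketch):

  // UnknownSize makes the location conservatively large so that accesses
  // from different iterations still fall into the same alias set.
  AST.add(Ptr, AliasAnalysis::UnknownSize, TBAAInfo);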
llvm-svn: 213486
Summary: This patch introduces two new iterator ranges and updates existing code to use them. No functional change intended.
Test Plan: All tests (make check-all) still pass.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4481
llvm-svn: 213474
IRBuilder has CreateAligned(Load|Store) functions; use them and we don't need
to make a second call to setAlignment.
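For example:

  // before
  LoadInst *LI = Builder.CreateLoad(Ptr);
  LI->setAlignment(Alignment);
  // after
  Value *V = Builder.CreateAlignedLoad(Ptr, Alignment);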
No functionality change intended.
llvm-svn: 213453
There are some kinds of metadata that are safe to propagate from the scalar
instructions to the vector instructions (fpmath and tbaa currently).
Regarding TBAA, one might worry about propagating it on if-converted loads and
stores, because the metadata might have had a control dependency on the
condition, and thus actually aliased with some other non-speculated memory
access when the condition was false. However, this would be caught by the
runtime overlap checks.
llvm-svn: 213452
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is inherited to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.
Reviewed by: Arnold Schwaighofer
llvm-svn: 213110
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.
small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.
This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.
The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.
llvm-svn: 211749
[LLVM part]
These patches rename the loop unrolling and loop vectorizer metadata
such that they have a common 'llvm.loop.' prefix. Metadata name
changes:
llvm.vectorizer.* => llvm.loop.vectorize.*
llvm.loopunroll.* => llvm.loop.unroll.*
This was a suggestion from an earlier review
(http://reviews.llvm.org/D4090) which added the loop unrolling
metadata.
Patch by Mark Heffernan.
llvm-svn: 211710
This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and
vectorizes them as vector shuffles if they are profitable.
These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86.
Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015
llvm-svn: 211339
If we have common uses on separate paths in the tree, process the one with greater common depth first.
This makes sure that we do not assume we need to extract a load when it is actually going to be part of a vectorized tree.
Review: http://reviews.llvm.org/D3800
llvm-svn: 210310
This patch adds support to vectorize intrinsics such as powi, cttz and ctlz in the Vectorizer. These intrinsics are different from other
intrinsics, as the second argument to these functions must be the same in order to vectorize them, and it should be represented as a scalar.
Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857
llvm-svn: 209873
The loop vectorizer instantiates backedge-taken-count + 1 as the loop iteration count.
If this expression overflowed, the generated code was invalid.
In case of overflow the code now jumps to the scalar loop.
Fixes PR17288.
llvm-svn: 209854
Summary:
This adds two new diagnostics: -pass-remarks-missed and
-pass-remarks-analysis. They take the same values as -pass-remarks but
are intended to be triggered in different contexts.
-pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
which passes call when they tried to apply a transformation but
couldn't.
-pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
which passes call when they want to inform the user about analysis
results.
The patch also:
1- Adds support in the inliner for the two new remarks and a
test case.
2- Moves emitOptimizationRemark* functions to the llvm namespace.
3- Adds an LLVMContext argument instead of making them member functions
of LLVMContext.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3682
llvm-svn: 209442
Turns out that there is a very cheap way of testing whether a block is dead,
just look it up in the DomTree. We have to do this anyways so just ignore
unreachable blocks before sorting by domination. This restores a proper
ordering for std::stable_sort when dead code is present.
Covered by existing tests & buildbots running in STL debug mode (MSVC).
llvm-svn: 208492
There is no total ordering if the CFG is disconnected. We don't care if we
catch all CSE opportunities in dead code either, so just ignore them in
the assert.
PR19646
llvm-svn: 208461
1) Fix for printing debug locations for absolute paths.
2) Location printing is moved into public method DebugLoc::print() to avoid re-inventing the wheel.
Differential Revision: http://reviews.llvm.org/D3513
llvm-svn: 208177
We can't assume a vectorized tree is rooted in an instruction. The IRBuilder
could have constant folded it. When we rebuild the build_vector (the series of
InsertElement instructions) use the last original InsertElement instruction. The
vectorized tree root is guaranteed to be before it.
Also, we can't assume that the n-th InsertElement inserts the n-th element into
a vector.
This reverts r207746 which reverted the revert of the revert of r205018 or so.
Fixes the test case in PR19621.
llvm-svn: 207939
There is no point in creating it if we're not going to vectorize
anything. Creating the map is expensive as it creates large values.
No functionality change.
llvm-svn: 207916
There are public functions that mutate various members as well as
another private member already, so make all the members private to
avoid the discontinuity and add accessors for the values. Should
be no functional change.
llvm-svn: 207868
=[
Turns out that this was the root cause of PR19621. We found a crasher
only recently (likely due to improvements elsewhere in the SLP
vectorizer) but the reduced test case failed all the way back to here.
I've confirmed that reverting this patch both fixes the reduced test
case in PR19621 and the actual source file that led to it, so it seems
to really be rooted here. I've replied to the commit thread with
discussion of my (feeble) attempts to debug this. Didn't make it very
far, so reverting now that we have a good test case so that things can
get back to healthy while the debugging carries on.
llvm-svn: 207746
This patch changes the vectorization remarks to also inform when
vectorization is possible but not beneficial.
Added tests to exercise some loop remarks.
llvm-svn: 207574
Summary:
This calls emitOptimizationRemark from the loop unroller and vectorizer
at the point where they make a positive transformation. For the
vectorizer, it reports vectorization and interleave factors. For the
loop unroller, it reports all the different supported types of
unrolling.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3456
llvm-svn: 207528
Use -stats to see how many loops were analyzed for possible vectorization and how many of them were actually vectorized.
Patch by Zinovy Nis
Differential Revision: http://reviews.llvm.org/D3438
llvm-svn: 206956
This moves the DEBUG_TYPE definition below all of the header #include
lines, lib/Transforms/... edition.
This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.
Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.
llvm-svn: 206844
Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.
llvm-svn: 205424
For the purpose of calculating the cost of the loop at various vectorization
factors, we need to count dependencies of consecutive pointers as uniforms
(which means that the VF = 1 cost is used for all overall VF values).
For example, the TSVC benchmark function s173 has:
...
%3 = add nsw i64 %indvars.iv, 16000
%arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
...
and we must realize that the add will be a scalar in order to correctly deduce
it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all
dependencies of a consecutive pointer must be a scalar (uniform), and so we
simply need to add all consecutive pointers to the worklist that currently
collects uniforms.
Fixes PR19296.
llvm-svn: 205387
This reverts commit r205018.
Conflicts:
lib/Transforms/Vectorize/SLPVectorizer.cpp
test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
This is breaking libclc build.
llvm-svn: 205260
Extracts coming from phis were being hoisted, while all others were
sunk to their uses. This was inconsistent and didn't seem to serve a
purpose. Changing all extracts to be sunk to uses is a prerequisite
for adding block frequency to the SLP vectorizer's cost model.
I benchmarked the change in isolation (without block frequency). I
only saw noise on x86 and some potentially significant improvements on
ARM. No major regressions, which is good enough for me.
llvm-svn: 204699
noise.
Original commit log:
Replace some dead code with an assert. When I first ported this pass
from a loop pass to a function pass I did so in the naive, recursive
way. It doesn't actually work, we need a worklist instead. When
I switched to the worklist I didn't delete the naive recursion. That
recursion was also buggy because it was dead and never really exercised.
llvm-svn: 204187
pass from a loop pass to a function pass I did so in the naive,
recursive way. It doesn't actually work, we need a worklist instead.
When I switched to the worklist I didn't delete the naive recursion.
That recursion was also buggy because it was dead and never really
exercised.
llvm-svn: 204184
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
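Concretely, the two iteration styles now look like this (V is some
Value pointer; a sketch, not code from the patch):
  for (Use &U : V->uses()) {
    User *Usr = U.getUser(); // explicitly dig out the User when needed
    // ... inspect U itself, e.g. its operand number ...
  }
  for (User *Usr : V->users()) {
    // ... the Use itself stays opaque here ...
  }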
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lines of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
llvm-svn: 203364
Move the test for this class into the IR unittests as well.
This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.
llvm-svn: 202821
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.
llvm-svn: 201827
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.
The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.
With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, on X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.
This patch applies the following changes:
- The cost model computation now takes into account non-uniform constants;
- The cost of vector shift instructions has been improved in
X86TargetTransformInfo analysis pass;
- BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
between non-uniform and uniform constant operands.
Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.
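A sketch of the classification using the names above (the helper itself
is hypothetical, not the patch's code):
  TargetTransformInfo::OperandValueKind getOperandKind(Value *Op) {
    if (isa<ConstantInt>(Op))
      return TargetTransformInfo::OK_UniformConstantValue;
    if (isa<ConstantVector>(Op) || isa<ConstantDataVector>(Op)) {
      // A splat (all lanes equal) is still a uniform constant.
      if (cast<Constant>(Op)->getSplatValue())
        return TargetTransformInfo::OK_UniformConstantValue;
      return TargetTransformInfo::OK_NonUniformConstValue;
    }
    return TargetTransformInfo::OK_AnyValue;
  }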
llvm-svn: 201272
Before conditional store vectorization/unrolling we had only one
vectorized/unrolled basic block. After adding support for conditional store
vectorization this will not only be one block but multiple basic blocks. The
last block would have the back-edge. I updated the code to use a vector of basic
blocks instead of a single basic block and fixed the users to use the last entry
in this vector. But I forgot to add the basic blocks to this vector!
Fixes PR18724.
llvm-svn: 201028
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
llvm-svn: 200892
unrolling heuristic by default
Benchmarking on x86_64 (thanks Chandler!) and ARM has shown that those
options speed up some benchmarks while not causing any interesting
regressions.
llvm-svn: 200621
This reverts commit r200576. It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.
llvm-svn: 200602
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and will form calls to vector library functions.
Patch by Raul Silvera! (Much delayed due to my not running dcommit)
llvm-svn: 200576
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.
I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling, and commented clearly on why we are or aren't
following the pattern.
llvm-svn: 200530
When estimating register pressure, don't count the induction variable
multiple times. It is unlikely to be unrolled. This is currently disabled and hidden
behind a flag ("enable-ind-var-reg-heur").
llvm-svn: 200371
vectorizer, placing it behind an off-by-default flag.
It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.
I'm planning to email the dev list with a summary of why it's not really
useful and a couple of ideas about how to better structure these types
of heuristics.
llvm-svn: 200294
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.
for (i = 0; i < 128; ++i) {
if (a[i] < 10)
a[i] += val;
}
which, in pseudo-code, is turned into:
for (i = 0; i < 128; i += 2) {
  v = a[i:i+1];
  v0 = (extract v, 0) + val;
  v1 = (extract v, 1) + val;
  if ((extract v, 0) < 10)
    a[i] = v0;
  if ((extract v, 1) < 10)
    a[i+1] = v1;
}
The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.
The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled by
default for now.
This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.
I also added a second flag -enable-cond-stores-vec. It will enable vectorization
of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet, so this will not do much good at the moment.
rdar://15892953
Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):
Performance Regressions:
Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
Applications/siod/siod 2.18%
Performance Improvements:
mesa -4.42%
libquantum -4.15%
With a patch that slightly changes the register heuristics (by subtracting the
induction variable on both sides of the register pressure equation, as the
induction variable is probably not really unrolled):
Performance Regressions:
Benchmarks/Ptrdist/yacr2/yacr2 7.73%
Applications/siod/siod 1.97%
Performance Improvements:
libquantum -13.05% (we now also unroll quantum_toffoli)
mesa -4.27%
llvm-svn: 200270
cold loops as if they were being optimized for size.
Nothing fancy here. Simple test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.
The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:
1) Don't disable the entire pass when the target is lacking vector
registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
non-if-converted loops. This is trivial for the unroller but hard for
the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
various places that make cost tradeoffs (very likely only the
unroller makes sense here, and then only when dealing with loops that
are small enough for unrolling to not completely blow out the LSD).
I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.
One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.
llvm-svn: 200219
object and fewer pointless variables.
Also, add a clarifying comment and a FIXME because the code which
disables *all* vectorization if we can't use implicit floating point
instructions just makes no sense at all.
llvm-svn: 200214
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.
Enhance a test case to actually exercise more of the unroll machinery
when using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).
While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.
llvm-svn: 200213
with the unrolling behavior in the loop vectorizer. No functionality
changed at this point.
These are a bit hack-y, but talking with Hal, there doesn't seem to be
a cleaner way to easily experiment with different thresholds here and he
was also interested in them so I wanted to commit them. Suggestions for
improvement are very welcome here.
llvm-svn: 200212
number of vector registers rather than toggling between vector and
scalar register number based on VF. I don't have a test case as
I spotted this by inspection and on X86 it only makes a difference if
your target is lacking SSE and thus has *no* vector registers.
If someone wants to add a test case for this for ARM or somewhere else
where this is more significant, that would be awesome.
Also made the variable name a bit more sensible while I'm here.
llvm-svn: 200211
LoopVectorize pass.
The logic here doesn't make much sense. We *only* unrolled if the
unvectorized loop was a reduction loop with a single basic block *and*
small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic,
it makes more sense to unroll if there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* if the loop is small.
This is mostly a cleanup and nothing in the test suite really exercises
this, but I did run benchmarks across this change and saw no really
significant changes.
llvm-svn: 200198
a FunctionPass. With this change the loop vectorizer no longer is a loop
pass and can readily depend on function analyses. In particular, with
this change we no longer have to form a loop pass manager to run the
loop vectorizer which simplifies the entire pass management of LLVM.
The next step here is to teach the loop vectorizer to leverage profile
information through the profile information providing analysis passes.
llvm-svn: 200074
a reduction.
Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.
Fixes PR18526.
radar://15851149
llvm-svn: 199570
can be used by both the new pass manager and the old.
This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.
The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.
Also in numerous places things became cleaner because I switched from
re-running the pass (!!! midway through some other pass's run !!!) to
directly recomputing the domtree.
llvm-svn: 199104
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.
Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that even Clang's CFGBlocks can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which caches,
reconstructs, and supports a nice update API.
But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.
llvm-svn: 199082
for (i = 0; i < N; ++i)
A[i * Stride1] += B[i * Stride2];
We take loops like this and check that the symbolic strides 'Stride1/2' are one
and drop to the scalar loop if they are not.
This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.
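Conceptually the versioned loop looks like this (same C-like notation
as above):
  if (Stride1 == 1 && Stride2 == 1) {
    // vectorized loop; both strides known to be one
  } else {
    for (i = 0; i < N; ++i)
      A[i * Stride1] += B[i * Stride2];
  }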
radar://13075509
llvm-svn: 198950
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.
Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.
llvm-svn: 198685
A phi node operand or an instruction operand could be a constant expression that
can trap (division). Check that we don't vectorize such cases.
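A minimal sketch of the check, assuming Constant::canTrap does the
per-expression work:
  for (Value *Op : I->operands())
    if (auto *C = dyn_cast<Constant>(Op))
      if (C->canTrap())
        return false; // refuse to vectorize a value that may trap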
PR16729
radar://15653590
llvm-svn: 197449
The intended behaviour is to force vectorization in the presence
of the flag (either turning it on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure that all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.
This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.
The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.
Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.
llvm-svn: 196537
We use CSEBlocks to initialize a worklist:
SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());
so it must have a deterministic order.
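One way to get that order (a sketch; not necessarily the exact
container picked here) is a SetVector, which keeps set semantics but
iterates in insertion order:
  SetVector<BasicBlock *> CSEBlocks;
  // ... populate during vectorization ...
  SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());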
llvm-svn: 196520
We were creating external uses for scalar values in MustGather entries that
also had a ScalarToTreeEntry (they are also present in a vectorized tuple).
This meant we would keep a value 'alive' both as a scalar and as a
vectorized value, causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.
Fixes PR18129.
radar://15582184
llvm-svn: 196508
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to an
i32 value.
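A sketch of the safe form (Builder, Ctx and the trip count value are
illustrative):
  // TC64 is the i64 trip count, known not to wrap past i32.
  Value *TC32 = Builder.CreateTrunc(TC64, Type::getInt32Ty(Ctx));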
Fixes PR18049.
llvm-svn: 195787
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions makes
sure that all of the RAUWed instructions are the same.
llvm-svn: 195773
SLP vectorization. Based on the code in BBVectorizer.
Fixes PR17741.
Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]
llvm-svn: 195528