llvm-project

Commit Graph

Author	SHA1	Message	Date
Dehao Chen	79655792cc	Enable vectorizer-maximize-bandwidth by default. Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306336	2017-06-26 21:41:09 +00:00
Ayal Zaks	e7e15d186b	[LV] Changing the interface of ValueMap, NFC. Instead of providing access to the internal MapStorage holding all Values associated with a given Key, used for setting or resetting them all together, ValueMap keeps its MapStorage internal; its new interface allows getting, setting or resetting a single Value, per part or per part-and-lane. Follows the discussion in https://reviews.llvm.org/D32871. Differential Revision: https://reviews.llvm.org/D34473 llvm-svn: 306331	2017-06-26 21:03:51 +00:00
Diana Picus	b512e91515	Revert "Enable vectorizer-maximize-bandwidth by default." This reverts commit r305960 because it broke self-hosting on AArch64. llvm-svn: 305990	2017-06-22 10:00:28 +00:00
Dehao Chen	014db29b89	Enable vectorizer-maximize-bandwidth by default. Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 305960	2017-06-21 22:01:32 +00:00
Taewook Oh	9083547ae3	Improve profile-guided heuristics to use estimated trip count. Summary: Existing heuristic uses the ratio between the function entry frequency and the loop invocation frequency to find cold loops. However, even if the loop executes frequently, if it has a small trip count per each invocation, vectorization is not beneficial. On the other hand, even if the loop invocation frequency is much smaller than the function invocation frequency, if the trip count is high it is still beneficial to vectorize the loop. This patch uses estimated trip count computed from the profile metadata as a primary metric to determine coldness of the loop. If the estimated trip count cannot be computed, it falls back to the original heuristics. Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson Reviewed By: tejohnson Subscribers: tejohnson, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D32451 llvm-svn: 305729	2017-06-19 18:48:58 +00:00
Dinar Temirbulatov	e2c6991c07	Remove brackets, NFC. llvm-svn: 305706	2017-06-19 16:44:07 +00:00
George Burgess IV	a20352e13e	[LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops. If we're shrinking a binary operation, it may be the case that the new operations wraps where the old didn't. If this happens, the behavior should be well-defined. So, we can't always carry wrapping flags with us when we shrink operations. If we do, we get incorrect optimizations in cases like: void foo(const unsigned char from, unsigned char to, int n) { for (int i = 0; i < n; i++) to[i] = from[i] - 128; } which gets optimized to: void foo(const unsigned char from, unsigned char to, int n) { for (int i = 0; i < n; i++) to[i] = from[i] \| 128; } Because: - InstCombine turned `sub i32 %from.i, 128` into `add nuw nsw i32 %from.i, 128`. - LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a vector full of `i8 128`s - InstCombine took advantage of the fact that the newly-shrunken add "couldn't wrap", and changed the `add` to an `or`. InstCombine seems happy to figure out whether we can add nuw/nsw on its own, so I just decided to drop the flags. There are already a number of places in LoopVectorize where we rely on InstCombine to clean up. llvm-svn: 305053	2017-06-09 03:56:15 +00:00
Chandler Carruth	6bda14b313	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Ayal Zaks	ab32aff838	[LV] Make scalarizeInstruction() non-virtual. NFC. Following the request made in https://reviews.llvm.org/D32871, scalarizeInstruction() which is no longer overridden by InnerLoopUnroller is hereby made non-virtual in InnerLoopVectorizer. Should have been part of r297580 originally. llvm-svn: 304685	2017-06-04 13:29:51 +00:00
Galina Kistanova	96d51f5bcb	Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC. llvm-svn: 304636	2017-06-03 05:18:46 +00:00
Alexey Bataev	e4e5923ef1	[SLP] Improve comments and naming of functions/variables/members, NFC. Fixed some comments, added an additional description of the algorithms, improved readability of the code. Differential revision: https://reviews.llvm.org/D33320 llvm-svn: 304616	2017-06-03 00:08:21 +00:00
Alexey Bataev	03ca396b95	Revert "[SLP] Improve comments and naming of functions/variables/members, NFC." This reverts commit 6e311de8b907aa20da9a1a13ab07c3ce2ef4068a. llvm-svn: 304609	2017-06-02 23:09:15 +00:00
Alexey Bataev	2c08fde9e5	[SLP] Improve comments and naming of functions/variables/members, NFC. Summary: Fixed some comments, added an additional description of the algorithms, improved readability of the code. Reviewers: anemet Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33320 llvm-svn: 304593	2017-06-02 20:39:27 +00:00
Matthew Simpson	646475a9bc	[LV] Reapply r303763 with fix for PR33193 r303763 caused build failures in some out-of-tree tests due to an assertion in TTI. The original patch updated cost estimates for induction variable update instructions marked for scalarization. However, it didn't consider that the incoming value of an induction variable phi node could be a cast instruction. This caused queries for cast instruction costs with a mix of vector and scalar types. This patch includes a fix for cast instructions and the test case from PR33193. The fix was suggested by Jonas Paulsson <paulsson@linux.vnet.ibm.com>. Reference: https://bugs.llvm.org/show_bug.cgi?id=33193 Original Differential Revision: https://reviews.llvm.org/D33457 llvm-svn: 304235	2017-05-30 19:55:57 +00:00
Joerg Sonnenberger	9375a25342	Revert r303763, results in asserts i.e. while building Ruby. llvm-svn: 304179	2017-05-29 22:52:17 +00:00
Matthew Simpson	d6f179cad6	[LV] Update type in cost model for scalarization For non-uniform instructions marked for scalarization, we should update `VectorTy` when computing instruction costs to reflect the scalar type. In addition to determining instruction costs, this type is also used to signal that all instructions in the loop will be scalarized. This currently affects memory instructions and non-pointer induction variables and their updates. (We also mark GEPs scalar after vectorization, but their cost is computed together with memory instructions.) For scalarized induction updates, this patch also scales the scalar cost by the vectorization factor, corresponding to each induction step. llvm-svn: 303763	2017-05-24 15:26:15 +00:00
Jonas Paulsson	8624b7e1ce	[LoopVectorizer] Let target prefer scalar addressing computations. The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking show improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744	2017-05-24 13:42:56 +00:00
Ayal Zaks	589e1d9610	[LV] Report multiple reasons for not vectorizing under allowExtraAnalysis The default behavior of -Rpass-analysis=loop-vectorizer is to report only the first reason encountered for not vectorizing, if one is found, at which time the vectorizer aborts its handling of the loop. This patch allows multiple reasons for not vectorizing to be identified and reported, at the potential expense of additional compile-time, under allowExtraAnalysis which can currently be turned on by Clang's -fsave-optimization-record and opt's -pass-remarks-missed. Removed from LoopVectorizationLegality::canVectorize() the redundant checking and reporting if we CantComputeNumberOfIterations, as LAI::canAnalyzeLoop() also does that. This redundancy is caught by a lit test once multiple reasons are reported. Patch initially developed by Dror Barak. Differential Revision: https://reviews.llvm.org/D33396 llvm-svn: 303613	2017-05-23 07:08:02 +00:00
Amara Emerson	4d33c86359	Fix vector pass-through value being unused in IRBuilder::CreateMaskedGather Also s/0/nullptr in the call site in LV. llvm-svn: 303416	2017-05-19 10:40:18 +00:00
Reid Kleckner	96ab8726a3	[IR] De-virtualize ~Value to save a vptr Summary: Implements PR889 Removing the virtual table pointer from Value saves 1% of RSS when doing LTO of llc on Linux. The impact on time was positive, but too noisy to conclusively say that performance improved. Here is a link to the spreadsheet with the original data: https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing This change makes it invalid to directly delete a Value, User, or Instruction pointer. Instead, such code can be rewritten to a null check and a call Value::deleteValue(). Value objects tend to have their lifetimes managed through iplist, so for the most part, this isn't a big deal. However, there are some places where LLVM deletes values, and those places had to be migrated to deleteValue. I have also created llvm::unique_value, which has a custom deleter, so it can be used in place of std::unique_ptr<Value>. I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which derives from User outside of lib/IR. Code in IR cannot include MemorySSA headers or call the MemoryAccess object destructors without introducing a circular dependency, so we need some level of indirection. Unfortunately, no class derived from User may have any virtual methods, because adding a virtual method would break User::getHungOffOperands(), which assumes that it can find the use list immediately prior to the User object. I've added a static_assert to the appropriate OperandTraits templates to help people avoid this trap. Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv Reviewed By: chandlerc Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D31261 llvm-svn: 303362	2017-05-18 17:24:10 +00:00
Matthew Simpson	af60af1ed5	Revert 303174, 303176, and 303178 These commits are breaking the bots. Reverting to investigate. llvm-svn: 303182	2017-05-16 15:50:30 +00:00
Matthew Simpson	b7b5d55c38	[LV] Avoid potentential division by zero when selecting IC llvm-svn: 303174	2017-05-16 14:43:55 +00:00
Adam Nemet	e29686e5c1	[SLP] Enable 64-bit wide vectorization on AArch64 ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116	2017-05-15 21:15:01 +00:00
Craig Topper	1a36b7d836	[ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits. This patch finishes off the conversion of ComputeSignBit to computeKnownBits. Differential Revision: https://reviews.llvm.org/D33166 llvm-svn: 303035	2017-05-15 06:39:41 +00:00
Simon Pilgrim	7d62e4b455	[LoopOptimizer][Fix]PR32859, PR24738 The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form. Here it is: before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ] and after this change we have: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ] Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D33055 llvm-svn: 302988	2017-05-13 13:25:57 +00:00
Craig Topper	8df66c602a	[KnownBits] Add bit counting methods to KnownBits struct and use them where possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925	2017-05-12 17:20:30 +00:00
Adam Nemet	0aca09fc6c	[SLP] Emit optimization remarks The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense how far the tree is spanning I've include the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811	2017-05-11 17:06:17 +00:00
Ayal Zaks	58b28d549a	[LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from ILV to LVP. These changes facilitate building VPlans and using them to generate code, following https://reviews.llvm.org/D28975 and its tentative breakdown. Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to improve clarity; it's contents remain intact. Differential Revision: https://reviews.llvm.org/D32200 llvm-svn: 302790	2017-05-11 11:36:33 +00:00
Matthew Simpson	78fd46b230	[AArch64] Consider widening instructions in cost calculations The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions. Differential Revision: https://reviews.llvm.org/D32706 llvm-svn: 302582	2017-05-09 20:18:12 +00:00
Anna Thomas	0691483435	[LV] Fix insertion point for shuffle vectors in first order recurrence Summary: In first order recurrence vectorization, when the previous value is a phi node, we need to set the insertion point to the first non-phi node. We can have the previous value being a phi node, due to the generation of new IVs as part of trunc optimization [1]. [1] https://reviews.llvm.org/rL294967 Reviewers: mssimpso, mkuper Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D32969 llvm-svn: 302532	2017-05-09 14:29:33 +00:00
Amara Emerson	cf9daa33a7	Introduce experimental generic intrinsics for horizontal vector reductions. - This change allows targets to opt-in to using them instead of the log2 shufflevector algorithm. - The SLP and Loop vectorizers have the common code to do shuffle reductions factored out into LoopUtils, and now have a unified interface for generating reductions regardless of the preference of the target. LoopUtils now uses TTI to determine what kind of reductions the target wants to handle. - For CodeGen, basic legalization support is added. Differential Revision: https://reviews.llvm.org/D30086 llvm-svn: 302514	2017-05-09 10:43:25 +00:00
Jonas Paulsson	8bf1fdcc91	Use right function in LoopVectorize. - unsigned AS = getMemInstAlignment(I); + unsigned AS = getMemInstAddressSpace(I); Review: Hal Finkel llvm-svn: 302114	2017-05-04 05:31:56 +00:00
Sanjoy Das	e6bca0eecb	Rename WeakVH to WeakTrackingVH; NFC This relands r301424. llvm-svn: 301812	2017-05-01 17:07:49 +00:00
Craig Topper	b45eabcf82	[ValueTracking] Introduce a KnownBits struct to wrap the two APInts for computeKnownBits This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit. Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch. I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases. Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero\|One) so we don't write it out everywhere. Maybe a method for (Zero\|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with. Differential Revision: https://reviews.llvm.org/D32376 llvm-svn: 301432	2017-04-26 16:39:58 +00:00
Sanjoy Das	2cbeb00f38	Reverts commit r301424, r301425 and r301426 Commits were: "Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts" "Add a new WeakVH value handle; NFC" "Rename WeakVH to WeakTrackingVH; NFC" The changes assumed pointers are 8 byte aligned on all architectures. llvm-svn: 301429	2017-04-26 16:37:05 +00:00
Matthew Simpson	9eed0bee3d	[LV] Handle external uses of floating-point induction variables Reference: https://bugs.llvm.org/show_bug.cgi?id=32758 Differential Revision: https://reviews.llvm.org/D32445 llvm-svn: 301428	2017-04-26 16:23:02 +00:00
Sanjoy Das	01de557738	Rename WeakVH to WeakTrackingVH; NFC Summary: I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit. Reviewers: dblaikie, davide Reviewed By: davide Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D32266 llvm-svn: 301424	2017-04-26 16:20:52 +00:00
Stanislav Mekhanoshin	f2db5434be	Skip bitcasts while looking for GEP in LoadStoreVectorizer Differential Revisison: https://reviews.llvm.org/D32101 llvm-svn: 301343	2017-04-25 18:00:08 +00:00
Craig Topper	f3dbd17d0a	[APInt] Use isSubsetOf, intersects, and bit counting methods to reduce temporary APInts This patch uses various APInt methods to reduce temporary APInt creation. This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking. I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time. Differential Revision: https://reviews.llvm.org/D32495 llvm-svn: 301338	2017-04-25 17:46:30 +00:00
Gil Rapaport	860f0a2bad	[LV] Remove redundant basic block split This patch is part of D28975's breakdown. Genreating the control-flow to guard predicated instructions modified to only use SplitBlockAndInsertIfThen() for producing the if-then construct. Differential Revision: https://reviews.llvm.org/D32224 llvm-svn: 301293	2017-04-25 05:57:22 +00:00
Matthew Simpson	e2037d24f9	[LV] Model if-converted phi node costs Phi nodes in non-header blocks are converted to select instructions after if-conversion. This patch updates the cost model to account for the selects. Differential Revision: https://reviews.llvm.org/D31906 llvm-svn: 300980	2017-04-21 14:14:54 +00:00
Easwaran Raman	76aba5f6d7	[SLP vectorizer] Allow phi node reordering in tryToVectorizeList. In tryToVectorizeList, under a very limited circumstance (when entered from tryToVectorizePair), the values may be reordered (swapped) and the SLP tree is built with the new order. This extends that to the case when starting from phis in vectorizeChainsInBlock when there are exactly two phis. The textual order of phi nodes shouldn't really matter. Without this change, the loop body in the accompnaying test case is fully vectorized when we swap the orde of the phis but not with this order. While this doesn't solve the phi-ordering problem in a general way (for more than 2 phis), this is simple fix that piggybacks on an existing mechanism and is useful in cases like multiplying two complex numbers. Differential revision: https://reviews.llvm.org/D32065 llvm-svn: 300574	2017-04-18 18:16:57 +00:00
Gil Rapaport	fb1d915ab2	[LV] Cache block mask values This patch is part of D28975's breakdown. Add caching for block masks similar to the cache already used for edge masks, replacing generation per user with reusing the first generated value which dominates all uses. Differential Revision: https://reviews.llvm.org/D32054 llvm-svn: 300557	2017-04-18 14:43:43 +00:00
Gil Rapaport	334f8fbe47	[LV] Remove implicit single basic block assumption This patch is part of D28975's breakdown - no change in output intended. LV's code currently assumes the vectorized loop is a single basic block up until predicateInstructions() is called. This patch removes two manifestations of this assumption (loop phi incoming values, dominator tree update) by replacing the use of vectorLoopBody with the vectorized loop's latch/header. Differential Revision: https://reviews.llvm.org/D32040 llvm-svn: 300310	2017-04-14 07:30:23 +00:00
Anna Thomas	dcdb325fee	[LV] Fix the vector code generation for first order recurrence Summary: In first order recurrences where phi's are used outside the loop, we should generate an additional vector.extract of the second last element from the vectorized phi update. This is because we require the phi itself (which is the value at the second last iteration of the vector loop) and not the phi's update within the loop. Also fix the code gen when we just unroll, but don't vectorize. Fixes PR32396. Reviewers: mssimpso, mkuper, anemet Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D31979 llvm-svn: 300238	2017-04-13 18:59:25 +00:00
Ayal Zaks	cd712b6c49	[LV] Refactor ILV to provide vectorizeInstruction(); NFC Refactoring InnerLoopVectorizer's vectorizeBlockInLoop() to provide vectorizeInstruction(). Aligning DeadInstructions with its only user. Facilitates driving the transformation by VPlan - follows https://reviews.llvm.org/D28975 and its tentative breakdown. Differential Revision: https://reviews.llvm.org/D31997 llvm-svn: 300183	2017-04-13 09:07:23 +00:00
Jonas Paulsson	22776892c9	[SLPVectorizer] Pass the right type argument to getCmpSelInstrCost() In getEntryCost(), make the scalar type for a compare instruction that of the operands, not i1. This is needed in order to call getCmpSelInstrCost() for a compare in a sensible way, the same way as the LoopVectorizer does. New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll Review: Matthew Simpson https://reviews.llvm.org/D31601 llvm-svn: 300061	2017-04-12 13:29:25 +00:00
Jonas Paulsson	592dbea779	[LoopVectorizer] Improve handling of branches during cost estimation. The cost for a branch after vectorization is very different depending on if the vectorizer will if-convert the block (branch is eliminated), or if scalarized and predicated blocks will be produced (branch duplicated before each block). There is also the case of remaining scalar branches, such as the back-edge branch. This patch handles these cases differently with TTI based cost estimates. Review: Matthew Simpson https://reviews.llvm.org/D31175 llvm-svn: 300058	2017-04-12 13:13:15 +00:00
Jonas Paulsson	da74ed42da	[LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore() Since SystemZ supports vector element load/store instructions, there is no need for extracts/inserts if a vector load/store gets scalarized. This patch lets Target specify that it supports such instructions by means of a new TTI hook that defaults to false. The use for this is in the LoopVectorizer getScalarizationOverhead() method, which will with this patch produce a smaller sum for a vector load/store on SystemZ. New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll Review: Adam Nemet https://reviews.llvm.org/D30680 llvm-svn: 300056	2017-04-12 12:41:37 +00:00
Jonas Paulsson	fccc7d66c3	[SystemZ] TargetTransformInfo cost functions implemented. getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052	2017-04-12 11:49:08 +00:00
Anna Thomas	00dc1b74b7	[LV] Avoid vectorizing first order recurrence when phi uses are outside loop In the vectorization of first order recurrence, we vectorize such that the last element in the vector will be the one extracted to pass into the scalar remainder loop. However, this is not true when there is a phi (other than the primary induction variable) is used outside the loop. In such a case, we need the value from the second last iteration (i.e. the phi value), not the last iteration (which would be the phi update). I've added a test case for this. Also see PR32396. A follow up patch would generate the correct code gen for such cases, and turn this vectorization on. Differential Revision: https://reviews.llvm.org/D31910 Reviewers: mssimpso llvm-svn: 299985	2017-04-11 21:02:00 +00:00
Matthew Simpson	11fe2e9f2b	Reapply r298620: [LV] Vectorize GEPs This patch reapplies r298620. The original patch was reverted because of two issues. First, the patch exposed a bug in InstCombine that caused the Chromium builds to fail (PR32414). This issue was fixed in r299017. Second, the patch introduced a bug in the vectorizer's scalars analysis that caused test suite builds to fail on SystemZ. The scalars analysis was too aggressive and marked a memory instruction scalar, even though it was going to be vectorized. This issue has been fixed in the current patch and several new test cases for the scalars analysis have been added. llvm-svn: 299770	2017-04-07 14:15:34 +00:00
Matthew Simpson	b8ff4a4a70	[LV] Transform truncations of non-primary induction variables The vectorizer tries to replace truncations of induction variables with new induction variables having the smaller type. After r295063, this optimization was applied to all integer induction variables, including non-primary ones. When optimizing the truncation of a non-primary induction variable, we still need to transform the new induction so that it has the correct start value. This should fix PR32419. Reference: https://bugs.llvm.org/show_bug.cgi?id=32419 llvm-svn: 298882	2017-03-27 20:07:38 +00:00
Ivan Krasin	c2124e185c	Revert r298620: [LV] Vectorize GEPs Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes) LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413 Original change description: [LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Original Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298735	2017-03-24 20:49:43 +00:00
Matthew Simpson	4e7b71bc86	[LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298620	2017-03-23 16:29:58 +00:00
Matthew Simpson	1fb4064531	[LV] Delete unneeded scalar GEP creation code The code for generating scalar base pointers in vectorizeMemoryInstruction is not needed. We currently scalarize all GEPs and maintain the scalarized values in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as that in scalarizeInstruction. The test cases that changed as a result of this patch changed because we were able to reuse the scalarized GEP that we previously generated instead of cloning a new one. Differential Revision: https://reviews.llvm.org/D30587 llvm-svn: 298615	2017-03-23 16:07:21 +00:00
Yi Kong	019e1c4f99	Test commit access Remove some trailing whitespaces. llvm-svn: 298379	2017-03-21 14:49:19 +00:00
Gil Rapaport	28d0f8ddf7	[LV] Refactor cross-iteration phi's back-patching; NFC This patch refactors the PHisToFix loop as follows: - The loop itself now resides in its own method. - The new method iterates on scalar-loop's header; the PHIsToFix map formerly propagated as an output parameter and filled during phi widening is removed. - The code handling reductions is moved into its own method, similar to the existing fixFirstOrderRecurrence(). Differential Revision: https://reviews.llvm.org/D30755 llvm-svn: 297740	2017-03-14 13:50:47 +00:00
Ayal Zaks	928ec40584	[LV] Refactor Cost Model's selectVectorizationFactor(); NFC Refactoring Cost Model's selectVectorizationFactor() so that it handles only the selection of the best VF from a pre-computed range of candidate VF's, extracting early-exit criteria and the computation of a MaxVF upper-bound to other methods, all driven by a newly introduced LoopVectorizationPlanner. Differential Revision: https://reviews.llvm.org/D30653 llvm-svn: 297737	2017-03-14 13:07:04 +00:00
Jonas Paulsson	a48ea231c0	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705	2017-03-14 06:35:36 +00:00
Gil Rapaport	00cb43908c	[LV] Set memcheck metadata also for VF==1 This commit is a follow-up on r297580. It fixes the FIXME added temporarily by that commit to keep the removal of Unroller's specialized version of scalarizeInstruction() an NFC. See https://reviews.llvm.org/D30715 for details. llvm-svn: 297610	2017-03-13 10:23:46 +00:00
Gil Rapaport	a1e5a37d3f	[LV] A unified scalarizeInstruction() for Vectorizer and Unroller; NFC Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's variant. OTOH Vectorizer's scalarizeInstruction() already supports the special case of VF==1 except for avoiding mask-bit extraction in that case. This patch removes Unroller's specialized version in favor of a unified method. The only functional difference between the two variants seems to be setting memcheck metadata for loads and stores only in Vectorizer's variant, which is a bug in Unroller. To keep this patch an NFC the unified method doesn't set memcheck metadata for VF==1. Differential Revision: https://reviews.llvm.org/D30715 llvm-svn: 297580	2017-03-12 12:31:38 +00:00
Ayal Zaks	09cf3121d8	Test commit. llvm-svn: 297579	2017-03-12 09:48:06 +00:00
Michael Kuperstein	5fb39a7966	[SLP] Revert everything that has to do with memory access sorting. This reverts r293386, r294027, r294029 and r296411. Turns out the SLP tree isn't actually a "tree" and we don't handle accessing the same packet of loads in several different orders well, causing miscompiles. Revert until we can fix this properly. llvm-svn: 297493	2017-03-10 18:59:07 +00:00
Daniel Berlin	04d9e746f1	Add support for DenseMap/DenseSet count and find using const pointers Summary: Similar to SmallPtrSet, this makes find and count work with both const referneces and const pointers. Reviewers: dblaikie Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30713 llvm-svn: 297424	2017-03-10 00:25:26 +00:00
Adam Nemet	8c83386f89	[SLP] Mark values in Dot that need to be extracted llvm-svn: 297361	2017-03-09 05:48:03 +00:00
Adam Nemet	95da05c3f5	[SLP] Visualize SLP trees with -view-slp-tree Analyzing larger trees is extremely difficult with the current debug output so this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data structure. We can now display the SLP trees with Graphviz as in https://reviews.llvm.org/F3132765. I decorated the graph where a value needs to be gathered for one reason or another. These are the red nodes. There are other improvement I am planning to make as I work through my case here. For example, I would also like to mark nodes that need to be extracted. Differential Revision: https://reviews.llvm.org/D30731 llvm-svn: 297303	2017-03-08 18:47:50 +00:00
Matthew Simpson	3388de1349	[LV] Select legal insert point when fixing first-order recurrences Because IRBuilder performs constant-folding, it's not guaranteed that an instruction in the original loop map to an instruction in the vector loop. It could map to a constant vector instead. The handling of first-order recurrences was incorrectly making this assumption when setting the IRBuilder's insert point. llvm-svn: 297302	2017-03-08 18:18:20 +00:00
Matthew Simpson	c86b2134c7	[LV] Consider users that are memory accesses in uniforms expansion step When expanding the set of uniform instructions beyond the seed instructions (e.g., consecutive pointers), we mark a new instruction uniform if all its loop-varying users are uniform. We should also allow users that are consecutive or interleaved memory accesses. This fixes cases where we have an instruction that is used as the pointer operand of a consecutive access but also used by a non-memory instruction that later becomes uniform as part of the expansion. llvm-svn: 297179	2017-03-07 18:47:30 +00:00
Michael Kuperstein	768d013a03	[SLP] Revert r296863 due to miscompiles. Details and reproducer are on the email thread for r296863. llvm-svn: 297103	2017-03-06 23:54:51 +00:00
Mohammad Shahid	bdac9f30c0	[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for external use of vectorizable scalars based on shuffle mask. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d llvm-svn: 296863	2017-03-03 10:02:47 +00:00
Matthew Simpson	455c2ee394	[LV] Considier non-consecutive but vectorizable accesses for VF selection When computing the smallest and largest types for selecting the maximum vectorization factor, we currently ignore loads and stores of pointer types if the memory access is non-consecutive. We do this because such accesses must be scalarized regardless of vectorization factor, and thus shouldn't be considered when determining the factor. This patch makes this check less aggressive by also considering non-consecutive accesses that may be vectorized, such as interleaved accesses. Because we don't know at the time of the check if an accesses will certainly be vectorized (this is a cost model decision given a particular VF), we consider all accesses that can potentially be vectorized. Differential Revision: https://reviews.llvm.org/D30305 llvm-svn: 296747	2017-03-02 13:55:05 +00:00
Hans Wennborg	cc4ff78c9d	Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available" It caused miscompiles, e.g. in Chromium (PR32109). llvm-svn: 296654	2017-03-01 18:57:16 +00:00
Alexey Bataev	4a45efa431	[SLP] Preserve IR flags when vectorizing horizontal reductions. Summary: The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar horizontal reductions instructions into vectors, just like for other vectorized instructions. It doe not include IR propagation for extra arguments, we need to handle original scalar operations for extra args to propagate correct flags. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30418 llvm-svn: 296614	2017-03-01 12:43:39 +00:00
Alexey Bataev	74e5a36856	[SLP] Preserve IR flags for extra args. Summary: We should preserve IR flags for extra args. These IR flags should be taken from original scalar operations, not from the reduction operations. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30447 llvm-svn: 296613	2017-03-01 12:22:33 +00:00
Alexey Bataev	dfec81107f	[SLP] Fix for PR32038: extra add of PHI node when it is not required. Summary: If horizontal reduction tree starts from the binary operation that is used in PHI node, but this PHI is not used in horizontal reduction, we may end up with extra addition of this PHI node after vectorization. Here is an example: ``` %phi = phi i32 [ %tmp, %end], ... ... %tmp = add i32 %tmp1, %tmp2 end: ``` after vectorization we always have something like: ``` %phi = phi i32 [ %tmp, %end], ... ... %red = extractelement <8 x 32> %vec.red, 0 %tmp = add i32 %red, %phi end: ``` even if `%phi` is not used in reduction tree. Patch considers these PHI nodes as extra arguments and considers them in the final result iff they really used in reduction. Reviewers: mkuper, hfinkel, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30409 llvm-svn: 296606	2017-03-01 10:50:44 +00:00
Adam Nemet	15032a0455	[LV] These remark should have been missed remarks The practice in LV is that we emit analysis remarks and then finally report either a missed or applied remark on the final decision whether vectorization is taking place. On this code path, we were closing with an analysis remark. llvm-svn: 296578	2017-03-01 04:31:15 +00:00
Mohammad Shahid	175ffa8c35	[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d llvm-svn: 296575	2017-03-01 03:51:54 +00:00
Adam Nemet	abe46080d4	Revert "(HEAD, origin/master, origin/HEAD, master) [LV] These should missed remarks" This reverts commit r296544. This got committed by accident. llvm-svn: 296546	2017-02-28 23:54:27 +00:00
Adam Nemet	e193da191d	[LV] These should missed remarks llvm-svn: 296544	2017-02-28 23:48:58 +00:00
Matthew Simpson	bdc9c78880	[LV] Merge floating-point and integer induction widening code This patch merges the existing floating-point induction variable widening code into the integer induction variable widening code, creating a single set of functions for both kinds of inductions. The primary motivation for doing this is to enable vector phi node creation for floating-point induction variables. Differential Revision: https://reviews.llvm.org/D30211 llvm-svn: 296145	2017-02-24 18:20:12 +00:00
Adam Nemet	41b019a39c	[LAA] Remove unused LoopAccessReport The need for this removed when I converted everything to use the opt-remark classes directly with the streaming interface. llvm-svn: 296017	2017-02-23 21:17:36 +00:00
Adam Nemet	bf0e69532c	[LV] Remove unused VectorizationReport The need for this removed when I converted everything to use the opt-remark classes directly with the streaming interface. llvm-svn: 296016	2017-02-23 21:17:31 +00:00
Alexey Bataev	f77d1656af	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295972	2017-02-23 13:37:09 +00:00
Alexey Bataev	14b370c1bf	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84. llvm-svn: 295957	2017-02-23 11:09:35 +00:00
Alexey Bataev	7ae653285d	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295956	2017-02-23 10:57:15 +00:00
Alexey Bataev	7337212e83	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d. llvm-svn: 295951	2017-02-23 09:59:29 +00:00
Alexey Bataev	68f2402c61	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295949	2017-02-23 09:40:38 +00:00
Matt Arsenault	f0a88dbaab	LoadStoreVectorizer: Split even sized illegal chains properly Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933	2017-02-23 03:58:53 +00:00
Michael Kuperstein	6181c62b95	Revert r295868 because it breaks a different SLP lit test. llvm-svn: 295906	2017-02-22 23:35:13 +00:00
Alexey Bataev	b551a81c28	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295868	2017-02-22 20:06:40 +00:00
Karl-Johan Karlsson	6eaed7aceb	[LoopVectorize] Added address space check when analysing interleaved accesses Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 This reverts r295042 (re-applies r295038) with an additional fix for the buildbot problem. llvm-svn: 295858	2017-02-22 18:37:36 +00:00
Alexey Bataev	2e0031b371	[SLP] Remove unused initial value from the variable, NFC. llvm-svn: 295826	2017-02-22 12:57:58 +00:00
Alexey Bataev	f96465b9b8	[SLP] nullptr'ize initial value in `findBuildAggregate()`, NFC. Initial value of V is sett nullptr, as it is not used. llvm-svn: 295642	2017-02-20 08:04:11 +00:00
Alexey Bataev	2f6b124e01	[SLP] Rework `findBuildAggregate()` from ercursive form to iterative, NFC. Reviewers: mkuper Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30103 llvm-svn: 295641	2017-02-20 07:49:39 +00:00
Matthew Simpson	f68e183f91	[LV] Remove constant restriction for vector phi creation We previously only created a vector phi node for an induction variable if its step had a constant integer type. However, the step actually only needs to be loop-invariant. We only handle inductions having loop-invariant steps, so this patch should enable vector phi node creation for all integer induction variables that will be vectorized. Differential Revision: https://reviews.llvm.org/D29956 llvm-svn: 295456	2017-02-17 16:09:07 +00:00
Michael Kuperstein	569162fefe	[LV] Rename Induction to PrimaryInduction. NFC. llvm-svn: 295111	2017-02-14 22:14:01 +00:00
Matthew Simpson	f09d13e5cc	Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps" This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063	2017-02-14 16:28:32 +00:00
Alexey Bataev	2a2f35d59c	[SLP] Fix for PR31879: vectorize repeated scalar ops that don't get put back into a vector Previously the cost of the existing ExtractElement/ExtractValue instructions was considered as a dead cost only if it was detected that they have only one use. But these instructions may be considered dead also if users of the instructions are also going to be vectorized, like: ``` %x0 = extractelement <2 x float> %x, i32 0 %x1 = extractelement <2 x float> %x, i32 1 %x0x0 = fmul float %x0, %x0 %x1x1 = fmul float %x1, %x1 %add = fadd float %x0x0, %x1x1 ``` This can be transformed to ``` %1 = fmul <2 x float> %x, %x %2 = extractelement <2 x float> %1, i32 0 %3 = extractelement <2 x float> %1, i32 1 %add = fadd float %2, %3 ``` because though `%x0` and `%x1` have 2 users each other, these users are part of the vectorized tree and we can consider these `extractelement` instructions as dead. Differential Revision: https://reviews.llvm.org/D29900 llvm-svn: 295056	2017-02-14 15:20:48 +00:00
Karl-Johan Karlsson	ec21b769ec	Revert "[LoopVectorize] Added address space check when analysing interleaved accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042	2017-02-14 10:06:16 +00:00
Karl-Johan Karlsson	2ec409cca2	[LoopVectorize] Added address space check when analysing interleaved accesses Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038	2017-02-14 08:14:06 +00:00
Matthew Simpson	659f92e2aa	Revert "[LV] Extend trunc optimization to all IVs with constant integer steps" This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973	2017-02-13 18:02:35 +00:00
Matthew Simpson	7b7f40297f	[LV] Extend trunc optimization to all IVs with constant integer steps This patch extends the optimization of truncations whose operand is an induction variable with a constant integer step. Previously we were only applying this optimization to the primary induction variable. However, the cost model assumes the optimization is applied to the truncation of all integer induction variables (even regardless of step type). The transformation is now applied to the other induction variables, and I've updated the cost model to ensure it is better in sync with the transformation we actually perform. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 294967	2017-02-13 16:48:00 +00:00
Alexey Bataev	e8b1536e21	[SLP] Fix for PR31690: Allow using of extra values in horizontal reductions. Currently, LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also, it supports a vectorization of instructions with some extra arguments, like: ``` float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } ``` Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D29727 llvm-svn: 294934	2017-02-13 08:01:26 +00:00
Dehao Chen	fb02f7140a	Encode duplication factor from loop vectorization and loop unrolling to discriminator. Summary: This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations. The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default. Reviewers: probinson, aprantl, davidxl, hfinkel, echristo Reviewed By: hfinkel Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26420 llvm-svn: 294782	2017-02-10 21:09:07 +00:00
Matthew Simpson	df124a7569	[LV] Remove type restriction for vector phi creation We previously only created a vector phi node for an induction variable if its type matched the type of the canonical induction variable. Differential Revision: https://reviews.llvm.org/D29776 llvm-svn: 294755	2017-02-10 16:15:26 +00:00
Elena Demikhovsky	5267edd3e3	[Loop Vectorizer] Cost-based decision for vectorization form of memory instruction. Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction. The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503	2017-02-08 19:25:23 +00:00
Michael Kuperstein	7a86bb2589	[SLP] Revert "Allow using of extra values in horizontal reductions." This breaks when one of the extra values is also a scalar that participates in the same vectorization tree which we'll end up reducing. llvm-svn: 294245	2017-02-06 21:50:59 +00:00
Michael Kuperstein	2a735b71b6	[SLP] Make sortMemAccesses explicitly return an error. NFC. llvm-svn: 294029	2017-02-03 19:32:50 +00:00
Alexey Bataev	a16cfe6fa9	[SLP] Fix for PR31690: Allow using of extra values in horizontal reductions. Currently LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also it supports a vectorization of instructions with some extra arguments, like: float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D28961 llvm-svn: 293994	2017-02-03 08:08:50 +00:00
Adam Nemet	0bf1b863b9	[LV] Also port failure remarks to new OptimizationRemarkEmitter API llvm-svn: 293866	2017-02-02 05:41:51 +00:00
Matthew Simpson	ba5cf9dfee	[LV] Move interleaved access helper functions to VectorUtils (NFC) This patch moves some helper functions related to interleaved access vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like to use these functions in a follow-on patch that improves interleaved load and store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was already duplicated there and has been removed. Differential Revision: https://reviews.llvm.org/D29398 llvm-svn: 293788	2017-02-01 17:45:46 +00:00
Jonas Paulsson	3f71d6a38e	[LoopVectorize] Improve getVectorCallCost() getScalarizationOverhead() call. By calling getScalarizationOverhead with the CallInst instead of the types of its arguments, we make sure that only unique call arguments are added to the scalarization cost. getScalarizationOverhead() is extended to handle calls by only passing on the actual call arguments (which is not all the operands). This also eliminates a wrapper function with the same name. review: Hal Finkel llvm-svn: 293459	2017-01-30 05:38:05 +00:00
Mohammad Shahid	3121334d32	[SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way. The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask. Reviewers: hfinkel, mssimpso, mkuper Differential Revision: https://reviews.llvm.org/D26905 Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad llvm-svn: 293386	2017-01-28 17:59:44 +00:00
Alexey Bataev	4015bf8372	[SLP] Refactoring of horizontal reduction analysis, NFC. Some checks in SLP horizontal reduction analysis function are performed several times, though it is enough to perform these checks only once during an initial attempt at adding candidate for the reduction instruction/reduced value. Differential Revision: https://reviews.llvm.org/D29175 llvm-svn: 293274	2017-01-27 10:54:04 +00:00
Chandler Carruth	6f4ed077d0	[LV] Fix an issue where forming LCSSA in the place that we did would change the set of uniform instructions in the loop causing an assert failure. The problem is that the legalization checking also builds data structures mapping various facts about the loop body. The immediate cause was the set of uniform instructions. If these then change when LCSSA is formed, the data structures would already have been built and become stale. The included test case triggered an assert in loop vectorize that was reduced out of the new PM's pipeline. The solution is to form LCSSA early enough that no information is cached across the changes made. The only really obvious position is outside of the main logic to vectorize the loop. This also has the advantage of removing one case where forming LCSSA could mutate the loop but we wouldn't track that as a "Changed" state. If it is significantly advantageous to do some legalization checking prior to this, we can do a more careful positioning but it seemed best to just back off to a safe position first. llvm-svn: 293168	2017-01-26 10:41:09 +00:00
Jonas Paulsson	8e2f948ef0	[TargetTransformInfo] Refactor and improve getScalarizationOverhead() Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and add extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start of improving on this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155	2017-01-26 07:03:25 +00:00
Alexey Bataev	d28ab559a7	[SLP] Improve horizontal vectorization for non-power-of-2 number of instructions. If number of instructions in horizontal reduction list is not power of 2 then only PowerOf2Floor(NumberOfInstructions) last elements are actually vectorized, other instructions remain scalar. Patch tries to vectorize the remaining elements either. Differential Revision: https://reviews.llvm.org/D28959 llvm-svn: 293042	2017-01-25 09:54:38 +00:00
Alexey Bataev	9f8bb384af	[SLP] Refactoring of HorizontalReduction class, NFC. Removed data members ReduxWidth and MinVecRegSize + some C++11 stylish improvements. Differential Revision: https://reviews.llvm.org/D29010 llvm-svn: 292899	2017-01-24 08:57:17 +00:00
Michael Kuperstein	807982359d	[SLP] Make ReductionOpcode have the right (enum) type. NFC. llvm-svn: 292703	2017-01-21 02:03:03 +00:00
Michael Kuperstein	f8458593cf	[SLP] Delete useless helper. NFC. The helper contained a branch for a special case that is unnecessary, and a cast. llvm-svn: 292698	2017-01-21 01:33:25 +00:00
Mikael Holmen	8bf15614fb	Test commit access, remove trailing whitespace llvm-svn: 292482	2017-01-19 13:35:13 +00:00
Michael Kuperstein	230867e583	[LV] Run loop-simplify and LCSSA explicitly instead of "requiring" them This changes the vectorizer to explicitly use the loopsimplify and lcssa utils, instead of "requiring" the transformations as if they were analyses. This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA for all loops, but rather only for the loops we expect to modify. Differential Revision: https://reviews.llvm.org/D28868 llvm-svn: 292456	2017-01-19 00:42:28 +00:00
Michael Kuperstein	7cefb409b0	[LV] Allow reductions that have several uses outside the loop We currently check whether a reduction has a single outside user. We don't really need to require that - we just need to make sure a single value is used externally. The number of external users of that value shouldn't actually matter. Differential Revision: https://reviews.llvm.org/D28830 llvm-svn: 292424	2017-01-18 19:02:52 +00:00
Matthew Simpson	3fbdaa5906	[LV] Mark non-consecutive-like pointers non-uniform If a memory instruction will be vectorized, but it's pointer operand is non-consecutive-like, the instruction is a gather or scatter operation. Its pointer operand will be non-uniform. This should fix PR31671. Reference: https://llvm.org/bugs/show_bug.cgi?id=31671 Differential Revision: https://reviews.llvm.org/D28819 llvm-svn: 292254	2017-01-17 20:51:39 +00:00
Chandler Carruth	ca68a3ec47	[PM] Introduce an analysis set used to preserve all analyses over a function's CFG when that CFG is unchanged. This allows transformation passes to simply claim they preserve the CFG and analysis passes to check for the CFG being preserved to remove the fanout of all analyses being listed in all passes. I've gone through and removed or cleaned up as many of the comments reminding us to do this as I could. Differential Revision: https://reviews.llvm.org/D28627 llvm-svn: 292054	2017-01-15 06:32:49 +00:00
Eric Fiselier	0a9eb89cf9	Give comparator const call operator llvm-svn: 292043	2017-01-15 02:06:44 +00:00
Malcolm Parsons	17d266bc96	Remove unused lambda captures. NFC llvm-svn: 291916	2017-01-13 17:12:16 +00:00
Michael Kuperstein	f69e64662b	[SLP] Remove bogus assert. The removed assert seems bogus - it's perfectly legal for the roots of the vectorized subtrees to be equal even if the original scalar values aren't, if the original scalars happen to be equivalent. This fixes PR31599. Differential Revision: https://reviews.llvm.org/D28539 llvm-svn: 291692	2017-01-11 19:23:57 +00:00
Mohammed Agabaria	2c96c43388	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Chandler Carruth	410eaeb064	[PM] Rewrite the loop pass manager to use a worklist and augmented run arguments much like the CGSCC pass manager. This is a major redesign following the pattern establish for the CGSCC layer to support updates to the set of loops during the traversal of the loop nest and to support invalidation of analyses. An additional significant burden in the loop PM is that so many passes require access to a large number of function analyses. Manually ensuring these are cached, available, and preserved has been a long-standing burden in LLVM even with the help of the automatic scheduling in the old pass manager. And it made the new pass manager extremely unweildy. With this design, we can package the common analyses up while in a function pass and make them immediately available to all the loop passes. While in some cases this is unnecessary, I think the simplicity afforded is worth it. This does not (yet) address loop simplified form or LCSSA form, but those are the next things on my radar and I have a clear plan for them. While the patch is very large, most of it is either mechanically updating loop passes to the new API or the new testing for the loop PM. The code for it is reasonably compact. I have not yet updated all of the loop passes to correctly leverage the update mechanisms demonstrated in the unittests. I'll do that in follow-up patches along with improved FileCheck tests for those passes that ensure things work in more realistic scenarios. In many cases, there isn't much we can do with these until the loop simplified form and LCSSA form are in place. Differential Revision: https://reviews.llvm.org/D28292 llvm-svn: 291651	2017-01-11 06:23:21 +00:00
Matthew Simpson	cf796478e9	[LV] Fix-up external IV users after updating dominator tree This patch delays the fix-up step for external induction variable users until after the dominator tree has been properly updated. This should fix PR30742. The SCEVExpander in InductionDescriptor::transform can generate code in the wrong location if the dominator tree is not up-to-date. We should work towards keeping the dominator tree up-to-date throughout the transformation. Reference: https://llvm.org/bugs/show_bug.cgi?id=30742 Differential Revision: https://reviews.llvm.org/D28168 llvm-svn: 291462	2017-01-09 19:05:29 +00:00
Jonas Paulsson	cf7543c44b	Remove unused method in LoopVectorize.cpp. computeInterleaveCount() is not defined/used and is therefore removed. Review: Davide Italiano llvm-svn: 291423	2017-01-09 06:13:21 +00:00
Mohammed Agabaria	23599ba794	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling. This code seems to be target dependent which may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106	2017-01-05 14:03:41 +00:00
Michael Kuperstein	fb7dd86fd6	[LV] Sink tripcount query to where it's actually used. NFC. llvm-svn: 290142	2016-12-19 22:47:52 +00:00
Daniel Jasper	aec2fa352f	Revert @llvm.assume with operator bundles (r289755-r289757) This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086	2016-12-19 08:22:17 +00:00
Matthew Simpson	a4964f291a	Reapply "[LV] Enable vectorization of loops with conditional stores by default" This patch reapplies r289863. The original patch was reverted because it exposed a bug causing the loop vectorizer to crash in the Python runtime on PPC. The underlying issue was fixed with r289958. llvm-svn: 289975	2016-12-16 19:12:02 +00:00
Matthew Simpson	099af810de	[LV] Don't attempt to type-shrink scalarized instructions After r288909, instructions feeding predicated instructions may be scalarized if profitable. Since these instructions will remain scalar, we shouldn't attempt to type-shrink them. We should only truncate vector types to their minimal bit widths. This bug was exposed by enabling the vectorization of loops containing conditional stores by default. llvm-svn: 289958	2016-12-16 16:52:35 +00:00
Chandler Carruth	48b4e614d8	Revert r289863: [LV] Enable vectorization of loops with conditional stores by default This uncovers a crasher in the loop vectorizer on PPC when building the Python runtime. I'll send the testcase to the review thread for the original commit. llvm-svn: 289934	2016-12-16 11:31:39 +00:00
Matthew Simpson	6a98bcfe33	[LV] Enable vectorization of loops with conditional stores by default This patch sets the default value of the "-enable-cond-stores-vec" command line option to "true". Differential Revision: https://reviews.llvm.org/D27814 llvm-svn: 289863	2016-12-15 20:11:05 +00:00
Hal Finkel	3ca4a6bcf1	Remove the AssumptionCache After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756	2016-12-15 03:02:15 +00:00
Michael Kuperstein	3d23d4a234	[LV] Don't vectorize when we have a small static bound on trip count We currently check if the exact trip count is known and is smaller than the "tiny loop" bound. We should be checking the maximum bound on the trip count instead. Differential Revision: https://reviews.llvm.org/D27690 llvm-svn: 289583	2016-12-13 20:38:18 +00:00
Matthew Simpson	92ce0230b5	[SLP] Fix sign-extends for type-shrinking This patch ensures the correct minimum bit width during type-shrinking. Previously when type-shrinking, we always sign-extended values back to their original width. However, if we are going to sign-extend, and the sign bit is unknown, we have to increase the minimum bit width by one bit so the sign-extend will fill the upper bits correctly. If the sign bit is known to be zero, we can perform a zero-extend instead. This should fix PR31243. Reference: https://llvm.org/bugs/show_bug.cgi?id=31243 Differential Revision: https://reviews.llvm.org/D27466 llvm-svn: 289470	2016-12-12 21:11:04 +00:00
Alexey Bataev	4f0d469d45	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043	2016-12-08 11:57:51 +00:00
Matthew Simpson	364da7e527	[LV] Scalarize operands of predicated instructions This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 llvm-svn: 288909	2016-12-07 15:03:32 +00:00
Renato Golin	5b8e7ecdb3	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508	2016-12-02 16:56:26 +00:00
Alexey Bataev	e8e94a7176	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497	2016-12-02 12:20:22 +00:00
Artem Belevich	704395a25a	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431	2016-12-01 22:52:15 +00:00
Alexey Bataev	2c01af5904	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288412	2016-12-01 20:06:53 +00:00
Alexey Bataev	62af7252f1	[SLP] Fixed cost model for horizontal reduction. Currently when cost of scalar operations is evaluated the vector type is used for scalar operations. Patch fixes this issue and fixes evaluation of the vector operations cost. Several test showed that vector cost model is too optimistic. It allowed vectorization of 8 or less add/fadd operations, though scalar code is faster. Actually, only for 16 or more operations vector code provides better performance. Differential Revision: https://reviews.llvm.org/D26277 llvm-svn: 288398	2016-12-01 18:42:42 +00:00

1 2 3 4 5 ...

1406 Commits