llvm-project

Commit Graph

Author	SHA1	Message	Date
Anton Afanasyev	ce28791e20	Test commit Fix typos. llvm-svn: 349644	2018-12-19 17:18:40 +00:00
Michael Kruse	d4eb13c880	[LoopVectorize] Rename pass options. NFC. Rename: NoUnrolling to InterleaveOnlyWhenForced and AlwaysVectorize to !VectorizeOnlyWhenForced Contrary to what the name 'AlwaysVectorize' suggests, it does not unconditionally vectorize all loops, but applies a cost model to determine whether vectorization is profitable to all loops. Hence, passing false will disable the cost model, except when a loop is marked with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested by @hfinkel in D55716) better matches this behavior. Similarly, 'NoUnrolling' disables the profitability cost model for interleaving (a term to distinguish it from unrolling by the LoopUnrollPass); rename it for consistency. Differential Revision: https://reviews.llvm.org/D55785 llvm-svn: 349513	2018-12-18 17:46:09 +00:00
Michael Kruse	7244852557	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g. #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 llvm-svn: 348944	2018-12-12 17:32:52 +00:00
Markus Lavin	4dc4ebd606	[PM] Port LoadStoreVectorizer to the new pass manager. Differential Revision: https://reviews.llvm.org/D54848 llvm-svn: 348570	2018-12-07 08:23:37 +00:00
Alexey Bataev	3689747619	[SLP]PR39774: Update references of the replaced external instructions. Summary: An additional fix for PR39774. Need to update the references for the RedcutionRoot instruction when it is replaced during the vectorization phase to avoid compiler crash on reduction vectorization. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55017 llvm-svn: 347997	2018-11-30 15:14:20 +00:00
Alexey Bataev	579c2d9d64	[SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized. Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759	2018-11-28 14:34:11 +00:00
Vedant Kumar	4de31bba51	[IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock Add methods to BasicBlock which make it easier to efficiently check whether a block has N (or more) predecessors. This can be more efficient than using pred_size(), which is a linear time operation. We might consider adding similar methods for successors. I haven't done so in this patch because succ_size() is already O(1). With this patch applied, I measured a 0.065% compile-time reduction in user time for running `opt -O3` on the sqlite3 amalgamation (30 trials). The change in mergeStoreIntoSuccessor alone saves 45 million linked list iterations in a stage2 Release build of llc. See llvm.org/PR39702 for a harder but more general way of achieving similar results. Differential Revision: https://reviews.llvm.org/D54686 llvm-svn: 347256	2018-11-19 19:54:27 +00:00
Anna Thomas	5e9215f02b	[LV] Avoid vectorizing unsafe dependencies in uniform address Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220	2018-11-19 15:39:59 +00:00
Florian Hahn	6df11868b5	[VPlan, SLP] Use SmallPtrSet for Candidates. This slightly improves the candidate handling in getBest(). llvm-svn: 346870	2018-11-14 15:58:40 +00:00
Florian Hahn	02cb67deb9	[VPlan] Remove LLVM_DEBUG from VPlanSlp::dumpBundle. The caller should take care of only calling it with debug enabled. llvm-svn: 346860	2018-11-14 13:33:44 +00:00
Florian Hahn	2eca3728ee	[VPlan] Update ifdef. llvm-svn: 346858	2018-11-14 13:21:26 +00:00
Florian Hahn	09e516c54b	[VPlan, SLP] Add simple SLP analysis on top of VPlan. This patch adds an initial implementation of the look-ahead SLP tree construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha, Luís F. W. Góes'. It returns an SLP tree represented as VPInstructions, with combined instructions represented as a single, wider VPInstruction. This initial version does not support instructions with multiple different users (either inside or outside the SLP tree) or non-instruction operands; it won't generate any shuffles or insertelement instructions. It also just adds the analysis that builds an SLP tree rooted in a set of stores. It does not include any cost modeling or memory legality checks. The plan is to integrate it with VPlan based cost modeling, once available and to only apply it to operations that can be widened. A follow-up patch will add a support for replacing instructions in a VPlan with their SLP counter parts. Reviewers: Ayal, mssimpso, rengolin, mkuper, hfinkel, hsaito, dcaballe, vporpo, RKSimon, ABataev Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D4949 llvm-svn: 346857	2018-11-14 13:11:49 +00:00
Florian Hahn	a4dc7feeea	[VPlan] VPlan version of InterleavedAccessInfo. This patch turns InterleaveGroup into a template with the instruction type being a template parameter. It also adds a VPInterleavedAccessInfo class, which only contains a mapping from VPInstructions to their respective InterleaveGroup. As we do not have access to scalar evolution in VPlan, we can re-use convert InterleavedAccessInfo to VPInterleavedAccess info. Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D49489 llvm-svn: 346758	2018-11-13 15:58:18 +00:00
Simon Pilgrim	631f2bf51e	[CostModel] Add more realistic SK_ExtractSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type. llvm-svn: 346656	2018-11-12 14:25:23 +00:00
Ayal Zaks	45a3ca7be7	[LV] Avoid vectorizing loops under opt for size that involve SCEV checks Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PR's showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 llvm-svn: 345959	2018-11-02 09:16:12 +00:00
Dorit Nuzman	34da6dd696	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705	2018-10-31 09:57:56 +00:00
Simon Pilgrim	44a9a71d2a	[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368) Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617	2018-10-30 18:10:02 +00:00
Jonas Paulsson	1f067c94dc	[LoopVectorizer] Fix for cost values of memory accesses. This commit is a combination of two patches: * "Fix in getScalarizationOverhead()" If target returns false in TTI.prefersVectorizedAddressing(), it means the address registers will not need to be extracted. Therefore, there should be no operands scalarization overhead for a load instruction. * "Don't pass the instruction pointer from getMemInstScalarizationCost." Since VF is always > 1, this is a cost query for an instruction in the vectorized loop and it should not be evaluated within the scalar context of the instruction. Review: Ulrich Weigand, Hal Finkel https://reviews.llvm.org/D52351 https://reviews.llvm.org/D52417 llvm-svn: 345603	2018-10-30 14:34:15 +00:00
Dorit Nuzman	5114390e48	[LV] Don't have fold-tail under optsize invalidate interleave-groups when masked-interleaving is enabled Enable interleave-groups under fold-tail scenario for Opt for size compilation; D50480 added support for vectorizing loops of arbitrary trip-count without a remiander, which in turn makes everything in the loop conditional, including interleave-groups if any. It therefore invalidated all interleave-groups because we didn't have support for vectorizing predicated interleaved-groups at the time. In the meantime, D53011 introduced this support, so we don't have to invalidate interleave-groups when masked-interleaved support is enabled. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D53559 llvm-svn: 345115	2018-10-24 07:11:38 +00:00
Simon Pilgrim	532a0f122e	[SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions Expand arithmetic reduction to include mul/and/or/xor instructions. This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests. This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason. Differential Revision: https://reviews.llvm.org/D53473 llvm-svn: 345037	2018-10-23 15:13:09 +00:00
Dorit Nuzman	da5dc13355	Leftover bits from https://reviews.llvm.org/D53420 that were accidentally left out of revision 344883 llvm-svn: 345021	2018-10-23 11:51:55 +00:00
Dorit Nuzman	3ec99fe21b	[IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size LV is careful to respect -Os and not to create a scalar epilog in all cases (runtime tests, trip-counts that require a remainder loop) except for peeling due to gaps in interleave-groups. This patch fixes that; -Os will now have us invalidate such interleave-groups and vectorize without an epilog. The patch also removes a related FIXME comment that is now obsolete, and was also inaccurate: "FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller MaxVF that does not require a scalar epilog." (requiresScalarEpilog() has nothing to do with VF). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53420 llvm-svn: 344883	2018-10-22 06:17:09 +00:00
Ayal Zaks	b0b5312e67	[LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip-count TC does not overflow, and if TC is a known constant that is a multiple of the VF. The last two TC-related conditions can be overcome by 1. rounding the trip-count of the vector loop up from TC to a multiple of VF; 2. masking the vector body under a newly introduced "if (i <= TC-1)" condition. The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os. The patch does not handle loops with reductions, live-outs, or w/o a primary induction variable, and disallows interleave groups. (Third, final and main part of -) Differential Revision: https://reviews.llvm.org/D50480 llvm-svn: 344743	2018-10-18 15:03:15 +00:00
Anna Thomas	6f732bfb79	[LV] Teach vectorizer about variant value store into uniform address Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 llvm-svn: 344613	2018-10-16 15:46:26 +00:00
Chandler Carruth	e303c87e19	[TI removal] Make `getTerminator()` return a generic `Instruction`. This removes the primary remaining API producing `TerminatorInst` which will reduce the rate at which code is introduced trying to use it and generally make it much easier to remove the remaining APIs across the codebase. Also clean up some of the stragglers that the previous mechanical update of variables missed. Users of LLVM and out-of-tree code generally will need to update any explicit variable types to handle this. Replacing `TerminatorInst` with `Instruction` (or `auto`) almost always works. Most of these edits were made in prior commits using the perl one-liner: ``` perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g' ``` This also my break some rare use cases where people overload for both `Instruction` and `TerminatorInst`, but these should be easily fixed by removing the `TerminatorInst` overload. llvm-svn: 344504	2018-10-15 10:42:50 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Ayal Zaks	e567b5b526	[LV] Fix comments reported when not vectorizing single iteration loops; NFC Landing this as a separate part of https://reviews.llvm.org/D50480, being a seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). llvm-svn: 344483	2018-10-14 17:53:02 +00:00
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Florian Hahn	18e07bb822	[LV] Use SmallVector instead of DenseMap in calculateRegisterUsage (NFC). We assign indices sequentially for seen instructions, so we can just use a vector and push back the seen instructions. No need for using a DenseMap. Reviewers: hsaito, rengolin, nadav, dcaballe Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D53089 llvm-svn: 344233	2018-10-11 09:46:25 +00:00
Florian Hahn	7eb5cb4ebc	[LV] Ignore more debug info. We can avoid doing some unnecessary work by skipping debug instructions in a few loops. It also helps to ensure debug instructions do not prevent vectorization, although I do not have any concrete test cases for that. Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk Reviewed By: rengolin, dcaballe Differential Revision: https://reviews.llvm.org/D53091 llvm-svn: 344232	2018-10-11 09:27:24 +00:00
Renato Golin	d8e7ca4a32	[VPlan] Fix CondBit quoting in dumpBasicBlock Quotes were being printed for VPInstructions but not the rest. llvm-svn: 344161	2018-10-10 17:55:21 +00:00
Max Kazantsev	b07369651e	[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160 At the point when we perform `emitTransformedIndex`, we have a broken IR (in particular, we have Phis for which not every incoming value is properly set). On such IR, it is illegal to create SCEV expressions, because their internal simplification process may try to prove some predicates and break when it stumbles across some broken IR. The only purpose of using SCEV in this particular place is attempt to simplify the generated code slightly. It seems that the result isn't worth it, because some trivial cases (like addition of zero and multiplication by 1) can be handled separately if needed, but more generally InstCombine is able to achieve the goals we want to achieve by using SCEV. This patch fixes a functional crash described in PR39160, and as side-effect it also generates a bit smarter code in some simple cases. It also may cause some optimality loss (i.e. we will now generate `mul` by power of `2` instead of shift etc), but there is nothing what InstCombine could not handle later. In case of dire need, we can support more trivial cases just in place. Note that this patch only fixes one particular case of the general problem that LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR. The general solution, however, seems complex enough. Differential Revision: https://reviews.llvm.org/D52881 Reviewed By: fhahn, hsaito llvm-svn: 343954	2018-10-08 05:46:29 +00:00
Jonas Paulsson	29d80f07ee	[LoopVectorizer] Use TTI.getOperandInfo() Call getOperandInfo() instead of using (near) duplicated code in LoopVectorizationCostModel::getInstructionCost(). This gets the OperandValueKind and OperandValueProperties values for a Value passed as operand to an arithmetic instruction. getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but is now instead a public member. Review: Florian Hahn https://reviews.llvm.org/D52883 llvm-svn: 343852	2018-10-05 14:34:04 +00:00
Anna Thomas	b1e3d45318	[LV][LAA] Vectorize loop invariant values stored into loop invariant address Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 llvm-svn: 343028	2018-09-25 20:57:20 +00:00
Warren Ristow	4f27730eaf	[Loop Vectorizer] Abandon vectorization when no integer IV found Support for vectorizing loops with secondary floating-point induction variables was added in r276554. A primary integer IV is still required for vectorization to be done. If an FP IV was found, but no integer IV was found at all (primary or secondary), the attempt to vectorize still went forward, causing a compiler-crash. This change abandons that attempt when no integer IV is found. (Vectorizing FP-only cases like this, rather than bailing out, is discussed as possible future work in D52327.) See PR38800 for more information. Differential Revision: https://reviews.llvm.org/D52327 llvm-svn: 342786	2018-09-21 23:03:50 +00:00
Matt Arsenault	c640798597	LSV: Fix adjust alloca alignment trick for AMDGPU This was checking the hardcoded address space 0 for the stack. Additionally, this should be checking for legality with the adjusted alignment, so defer the alignment check. Also try to split if the unaligned access isn't allowed. llvm-svn: 342442	2018-09-18 02:05:44 +00:00
Hideki Saito	d19851ac7e	Fix for the buildbot failure http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/23635 from the commit (r342197) of https://reviews.llvm.org/D50820. llvm-svn: 342201	2018-09-14 02:02:57 +00:00
Hideki Saito	ea7f3035a0	[VPlan] Implement initial vector code generation support for simple outer loops. Summary: [VPlan] Implement vector code generation support for simple outer loops. Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following: - force vector code generation using explicit vectorize_width - add conservative early returns in cost model and other places for VPlanNativePath - add code for setting up outer loop inductions - support for widening non-induction PHIs that can result from inner loops and uniform conditional branches - support for generating uniform inner branches We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer. Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal Reviewed By: fhahn, hsaito Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D50820 llvm-svn: 342197	2018-09-14 00:36:00 +00:00
Florian Hahn	1086ce2397	[LV] Move InterleaveGroup and InterleavedAccessInfo to VectorUtils.h (NFC) Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use them for VPlan outside LoopVectorize.cpp Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00 Reviewed By: rengolin, xbolva00 Differential Revision: https://reviews.llvm.org/D49488 llvm-svn: 342027	2018-09-12 08:01:57 +00:00
Vikram TV	09be521d4d	Move a transformation routine from LoopUtils to LoopVectorize. Summary: Move InductionDescriptor::transform() routine from LoopUtils to its only uses in LoopVectorize.cpp. Specifically, the function is renamed as InnerLoopVectorizer::emitTransformedIndex(). This is a child to D51153. Reviewers: dmgreen, llvm-commits Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D51837 llvm-svn: 341776	2018-09-10 06:16:44 +00:00
Vikram TV	6594dc377d	Move createMinMaxOp() out of RecurrenceDescriptor. Reviewers: dmgreen, llvm-commits Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D51838 llvm-svn: 341773	2018-09-10 05:05:08 +00:00
Anna Thomas	110df11a1a	[LV] Fix code gen for conditionally executed loads and stores Fix a latent bug in loop vectorizer which generates incorrect code for memory accesses that are executed conditionally. As pointed in review, this bug definitely affects uniform loads and may affect conditional stores that should have turned into scatters as well). The code gen for conditionally executed uniform loads on architectures that support masked gather instructions is broken. Without this patch, we were unconditionally executing the conditional load in the vectorized version. This patch does the following: 1. Uniform conditional loads on architectures with gather support will have correct code generated. In particular, the cost model (setCostBasedWideningDecision) is fixed. 2. For the recipes which are handled after the widening decision is set, we use the isScalarWithPredication(I, VF) form which is added in the patch. 3. Fix the vectorization cost model for scalarization (getMemInstScalarizationCost): implement and use isPredicatedInst to identify all predicated instructions, not just scalar+predicated. So, now the cost for scalarization will be increased for maskedloads/stores and gather/scatter operations. In short, we should be choosing the gather/scatter in place of scalarization on archs where it is profitable. 4. We needed to weaken the assert in useEmulatedMaskMemRefHack. Reviewers: Ayal, hsaito, mkuper Differential Revision: https://reviews.llvm.org/D51313 llvm-svn: 341673	2018-09-07 15:53:48 +00:00
Anna Thomas	dbacea188b	[LV] First order recurrence phis should not be treated as uniform This is fix for PR38786. First order recurrence phis were incorrectly treated as uniform, which caused them to be vectorized as uniform instructions. Patch by Ayal Zaks and Orivej Desh! Reviewed by: Anna Differential Revision: https://reviews.llvm.org/D51639 llvm-svn: 341416	2018-09-04 22:12:23 +00:00
Matt Arsenault	c807ce0ee4	SLPVectorizer: Fix assert with different sized address spaces llvm-svn: 341215	2018-08-31 14:34:53 +00:00
David Bolvansky	589bb484f6	[LoopVectorize][NFCI] Use find instead of count Summary: Avoid "count" if possible -> use "find" to check for the existence of keys. Passed llvm test suite. Reviewers: fhahn, dcaballe, mkuper, rengolin Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51054 llvm-svn: 340563	2018-08-23 18:34:58 +00:00
Anna Thomas	b02b0ad8c7	[LV] Vectorize loops where non-phi instructions used outside loop Summary: Follow up change to rL339703, where we now vectorize loops with non-phi instructions used outside the loop. Note that the cyclic dependency identification occurs when identifying reduction/induction vars. We also need to identify that we do not allow users where the PSCEV information within and outside the loop are different. This was the fix added in rL307837 for PR33706. Reviewers: Ayal, mkuper, fhahn Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D50778 llvm-svn: 340278	2018-08-21 14:40:27 +00:00
Anna Thomas	6a1dd77f5d	NFC: Clarify comment in loop vectorization legality Clarifying the comment about PSCEV and external IV users by referencing the bug in question. llvm-svn: 339722	2018-08-14 20:25:13 +00:00
Anna Thomas	60a1e4dddc	[LV] Teach about non header phis that have uses outside the loop Summary: This patch teaches the loop vectorizer to vectorize loops with non header phis that have have outside uses. This is because the iteration dependence distance for these phis can be widened upto VF (similar to how we do for induction/reduction) if they do not have a cyclic dependence with header phis. When identifying reduction/induction/first order recurrence header phis, we already identify if there are any cyclic dependencies that prevents vectorization. The vectorizer is taught to extract the last element from the vectorized phi and update the scalar loop exit block phi to contain this extracted element from the vector loop. This patch can be extended to vectorize loops where instructions other than phis have outside uses. Reviewers: Ayal, mkuper, mssimpso, efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50579 llvm-svn: 339703	2018-08-14 18:22:19 +00:00

1 2 3 4 5 ...

1621 Commits