Commit Graph

467 Commits

Author SHA1 Message Date
Michael Zolotukhin 66f5591f9b [LoopVectorizer] Remove redundant variables PastOverflowCheck and OverflowCheckAnchor. NFCI.
llvm-svn: 241740
2015-07-08 21:47:56 +00:00
Michael Zolotukhin 00345cadd5 [LoopVectorizer] Move some code around to ease further refactoring. NFCI.
llvm-svn: 241739
2015-07-08 21:47:53 +00:00
Michael Zolotukhin 7db3063f87 [LoopVectorizer] Remove redundant variable LastBypassBlock. NFC.
llvm-svn: 241738
2015-07-08 21:47:47 +00:00
Alexey Samsonov 958dab71b3 [LoopVectorize] Use ReplaceInstWithInst() helper where appropriate.
This is mostly an NFC, which increases code readability (instead of
saving old terminator, generating new one in front of old, and deleting
old, we just call a function). However, it would additionaly copy
the debug location from old instruction to replacement, which
would help PR23837.

llvm-svn: 241197
2015-07-01 22:18:30 +00:00
David Majnemer 9f3979fd78 [LoopVectorize] Pointer indicies may be wider than the pointer
If we are dealing with a pointer induction variable, isInductionPHI
gives back a step value of Stride / size of pointer.  However, we might
be indexing with a legal type wider than the pointer width.
Handle this by inserting casts where appropriate instead of crashing.

This fixes PR23954.

llvm-svn: 240877
2015-06-27 08:38:17 +00:00
David Blaikie b447ac6435 Move VectorUtils from Transforms to Analysis to correct layering violation
llvm-svn: 240804
2015-06-26 18:02:52 +00:00
Michael Zolotukhin 79ff564ef3 [LoopVectorizer] Fix bailing-out condition for OptForSize case.
With option OptForSize enabled, the Loop Vectorizer is not supposed to
create tail loop. The condition checking that was invalid and was not
matching to the comment above.

Patch by Marianne Mailhot-Sarrasin.

llvm-svn: 240556
2015-06-24 17:26:24 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Tyler Nowicki 27b2c39eb3 Refactor RecurrenceInstDesc
Moved RecurrenceInstDesc into RecurrenceDescriptor to simplify the namespaces.

llvm-svn: 239862
2015-06-16 22:59:45 +00:00
Tyler Nowicki 0a91310c7f Rename Reduction variables/structures to Recurrence.
A reduction is a special kind of recurrence. In the loop vectorizer we currently
identify basic reductions. Future patches will extend this to identifying basic
recurrences.

llvm-svn: 239835
2015-06-16 18:07:34 +00:00
Hao Liu 405f1d1651 [LoopVectorize] Revert the enabling of interleaved memory access in Loop Vectorizor, which was wrongly committed in r239514.
llvm-svn: 239515
2015-06-11 09:18:07 +00:00
Hao Liu 4566d18e89 [AArch64] Match interleaved memory accesses into ldN/stN instructions.
Add a pass AArch64InterleavedAccess to identify and match interleaved memory accesses. This pass transforms an interleaved load/store into ldN/stN intrinsic. As Loop Vectorizor disables optimization on interleaved accesses by default, this optimization is also disabled by default. To enable it by "-aarch64-interleaved-access-opt=true"

E.g. Transform an interleaved load (Factor = 2):
       %wide.vec = load <8 x i32>, <8 x i32>* %ptr
       %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>  ; Extract even elements
       %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>  ; Extract odd elements
     Into:
       %ld2 = { <4 x i32>, <4 x i32> } call aarch64.neon.ld2(%ptr)
       %v0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0
       %v1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1

E.g. Transform an interleaved store (Factor = 2):
       %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7>  ; Interleaved vec
       store <8 x i32> %i.vec, <8 x i32>* %ptr
     Into:
       %v0 = shuffle %i.vec, undef, <0, 1, 2, 3>
       %v1 = shuffle %i.vec, undef, <4, 5, 6, 7>
       call void aarch64.neon.st2(%v0, %v1, %ptr)

llvm-svn: 239514
2015-06-11 09:05:02 +00:00
Hao Liu 32c0539691 [LoopVectorize] Teach Loop Vectorizor about interleaved memory accesses.
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
       a = A[i];         // load of even element
       b = A[i+1];       // load of odd element
       ...               // operations on a, b, c, d
       A[i] = c;         // store of even element
       A[i+1] = d;       // store of odd element
     }

  The loads of even and odd elements are identified as an interleave load group, which will be transfered into vectorized IRs like:
     %wide.vec = load <8 x i32>, <8 x i32>* %ptr
     %vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
     %vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>

  The stores of even and odd elements are identified as an interleave store group, which will be transfered into vectorized IRs like:
     %interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7> 
     store <8 x i32> %interleaved.vec, <8 x i32>* %ptr

This optimization is currently disabled by defaut. To try it by adding '-enable-interleaved-mem-accesses=true'. 

llvm-svn: 239291
2015-06-08 06:39:56 +00:00
Wei Mi 062c74484d [X86] Disable loop unrolling in loop vectorization pass when VF is 1.
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.

Differential Revision: http://reviews.llvm.org/D9515

llvm-svn: 236613
2015-05-06 17:12:25 +00:00
Michael Zolotukhin d98330c424 Fix a couple of typos in comments.
llvm-svn: 235674
2015-04-24 00:10:27 +00:00
Karthik Bhat 24e6cc2de4 Move common loop utility function isInductionPHI into LoopUtils.cpp
This patch refactors the definition of common utility function "isInductionPHI" to LoopUtils.cpp.
This fixes compilation error when configured with -DBUILD_SHARED_LIBS=ON

llvm-svn: 235577
2015-04-23 08:29:20 +00:00
Karthik Bhat 76aa662cf0 [NFC] Refactor identification of reductions as common utility function.
This patch refactors reduction identification code out of LoopVectorizer and
exposes them as common utilities.
No functional change.
Review: http://reviews.llvm.org/D9046

llvm-svn: 235284
2015-04-20 04:38:33 +00:00
Adam Nemet ce48250f11 [LoopAccesses] Allow analysis to complete in the presence of uniform stores
(Re-apply r234361 with a fix and a testcase for PR23157)

Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.

Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).

In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.

When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).

The changes also adds support to query this property of the loop and
modify the vectorizer to use this.

Patch by Ashutosh Nema!

llvm-svn: 234424
2015-04-08 17:48:40 +00:00
Adam Nemet e09a928c80 Revert "[LoopAccesses] Allow analysis to complete in the presence of uniform stores"
This reverts commit r234361.

It caused PR23157.

llvm-svn: 234387
2015-04-08 04:16:55 +00:00
Adam Nemet 0515c33b70 [LoopAccesses] Allow analysis to complete in the presence of uniform stores
Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.

Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).

In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.

When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).

The changes also adds support to query this property of the loop and
modify the vectorizer to use this.

Patch by Ashutosh Nema!

llvm-svn: 234361
2015-04-07 21:46:16 +00:00
David Blaikie 93c5444fe0 [opaque pointer type] More GEP API migrations in IRBuilder uses
The plan here is to push the API changes out from the common components
(like Constant::getGetElementPtr and IRBuilder::CreateGEP related
functions) and just update callers to either pass the type if it's
obvious, or pass null.

Do this with LoadInst as well and anything else that comes up, then to
start porting specific uses to not pass null anymore - this may require
some refactoring in each case.

llvm-svn: 234042
2015-04-03 19:41:44 +00:00
Duncan P. N. Exon Smith ec819c096b Transforms: Use the new DebugLoc API, NFC
Update lib/Analysis and lib/Transforms to use the new `DebugLoc` API.

llvm-svn: 233587
2015-03-30 19:49:49 +00:00
Karthik Bhat 0f8c908934 Refactor Code inside LoopVectorizer's function isInductionVariable.
This patch exposes LoopVectorizer's isInductionVariable function as common
a functionality.
http://reviews.llvm.org/D8608

llvm-svn: 233352
2015-03-27 03:44:15 +00:00
Michael Zolotukhin 9ef5671d36 Try to fix a test broken by one of my previous commits.
llvm-svn: 232536
2015-03-17 20:31:56 +00:00
Michael Zolotukhin 9b3cf604ce LoopVectorize: teach loop vectorizer to vectorize calls.
The tests would be committed in a commit for http://reviews.llvm.org/D8131

Review: http://reviews.llvm.org/D8095
llvm-svn: 232530
2015-03-17 19:46:50 +00:00
Michael Zolotukhin 1d4e52512c LoopVectorizer: Add TargetTransformInfo.
Review: http://reviews.llvm.org/D8092
llvm-svn: 232522
2015-03-17 19:17:18 +00:00
Adam Nemet 98c4c5dd78 [LAA-memchecks 2/3] Move number of memcheck threshold checking to LV
Now the analysis won't "fail" if the memchecks exceed the threshold.  It
is the transform pass' responsibility to perform the check.

This allows the transform pass to further analyze/eliminate the
memchecks.  E.g. in Loop distribution we only need to check pointers
that end up in different partitions.

Note that there is a slight change of functionality here.  The logic in
analyzeLoop is that if dependence checking fails due to non-constant
distance between the pointers, another attempt is made to prove safety
of the dependences purely using run-time checks.

Before this patch we could fail the loop due to exceeding the memcheck
threshold after the first step, now we only check the threshold in the
client after the full analysis.  There is no measurable compile-time
effect but I wanted to record this here.

llvm-svn: 231817
2015-03-10 18:54:23 +00:00
Mehdi Amini a28d91d81b DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.

I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
Benjamin Kramer f044d3f93b Make helper functions static.
Found by -Wmissing-prototypes. NFC.

llvm-svn: 231664
2015-03-09 16:23:46 +00:00
Kevin Qin 715b01e979 Introduce runtime unrolling disable matadata and use it to mark the scalar loop from vectorization.
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.

llvm-svn: 231631
2015-03-09 06:14:18 +00:00
Olivier Sallenave 049d803ce0 Do not restrict interleaved unrolling to small loops, depending on the target.
llvm-svn: 231528
2015-03-06 23:12:04 +00:00
Mehdi Amini 46a43556db Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never returns nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
Michael Zolotukhin 9302236680 Make ToVectorTy static.
llvm-svn: 231007
2015-03-02 20:43:24 +00:00
Eric Christopher b9f0009b5a Remove DebugLoc::print(LLVMContext, raw_ostream), it was just
forwarding to the one that didn't take a context.

llvm-svn: 230700
2015-02-26 23:32:17 +00:00
Adam Nemet 57ac766ee9 [LoopAccesses] Change LAA:getInfo to return a constant reference
As expected, this required a few more const-correctness fixes.

Based on Hal's feedback on D7684.

llvm-svn: 229899
2015-02-19 19:15:21 +00:00
Adam Nemet 2bd6e984ef [LoopAccesses] Split out LoopAccessReport from VectorizerReport
The only difference between these two is that VectorizerReport adds a
vectorizer-specific prefix to its messages.  When LAA is used in the
vectorizer context the prefix is added when we promote the
LoopAccessReport into a VectorizerReport via one of the constructors.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229897
2015-02-19 19:15:15 +00:00
Adam Nemet 3e87634fd8 [LoopAccesses] Add missing const to APIs in VectorizationReport
When I split out LoopAccessReport from this, I need to create some temps
so constness becomes necessary.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229896
2015-02-19 19:15:13 +00:00
Adam Nemet 339f42b396 [LoopAccesses] Change debug messages from LV to LAA
Also add pass name as an argument to VectorizationReport::emitAnalysis.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229894
2015-02-19 19:15:07 +00:00
Adam Nemet 3bfd93d789 [LoopAccesses] Create the analysis pass
This is a function pass that runs the analysis on demand.  The analysis
can be initiated by querying the loop access info via LAA::getInfo.  It
either returns the cached info or runs the analysis.

Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now.  The idea is that Loop Distribution won't support run-time stride
checking at least initially.

This means that when querying the analysis, symbolic stride information
can be provided optionally.  Whether stride information is used can
invalidate the cache entry and rerun the analysis.  Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.

Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.

On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass.  A large chunk of the
diff is due to LAI becoming a pointer from a reference.

A test will be added as part of the -analyze patch.

Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229893
2015-02-19 19:15:04 +00:00
Adam Nemet 436018c3ff [LoopAccesses] Cache the result of canVectorizeMemory
LAA will be an on-demand analysis pass, so we need to cache the result
of the analysis.  canVectorizeMemory is renamed to analyzeLoop which
computes the result.  canVectorizeMemory becomes the query function for
the cached result.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229892
2015-02-19 19:15:00 +00:00
Adam Nemet c922853b93 [LoopAccesses] Stash the report from the analysis rather than emitting it
The transformation passes will query this and then emit them as part of
their own report.  The currently only user LV is modified to do just
that.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229891
2015-02-19 19:14:56 +00:00
Adam Nemet f219c64723 [LoopAccesses] Make VectorizerParams global + fix for cyclic dep
As LAA is becoming a pass, we can no longer pass the params to its
constructor.  This changes the command line flags to have external
storage.  These can now be accessed both from LV and LAA.

VectorizerParams is moved out of LoopAccessInfo in order to shorten the
code to access it.

This commits also has the fix (D7731) to the break dependence cycle
between the analysis and vector libraries.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229890
2015-02-19 19:14:52 +00:00
Adam Nemet 04d4163e95 Revert "Reformat."
This reverts commit r229651.

I'd like to ultimately revert r229650 but this reformat stands in the
way.  I'll reformat the affected files once the the loop-access pass is
fully committed.

llvm-svn: 229889
2015-02-19 19:14:34 +00:00
NAKAMURA Takumi a250484c4c Reformat.
llvm-svn: 229651
2015-02-18 08:36:14 +00:00
NAKAMURA Takumi fa520c5f49 Revert r229622: "[LoopAccesses] Make VectorizerParams global" and others. r229622 brought cyclic dependencies between Analysis and Vector.
r229622: "[LoopAccesses] Make VectorizerParams global"
  r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it"
  r229624: "[LoopAccesses] Cache the result of canVectorizeMemory"
  r229626: "[LoopAccesses] Create the analysis pass"
  r229628: "[LoopAccesses] Change debug messages from LV to LAA"
  r229630: "[LoopAccesses] Add canAnalyzeLoop"
  r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport"
  r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport"
  r229633: "[LoopAccesses] Add -analyze support"
  r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference"
  r229638: "Analysis: fix buildbots"

llvm-svn: 229650
2015-02-18 08:34:47 +00:00
Adam Nemet 85fd9f8d09 [LoopAccesses] Change LAA:getInfo to return a constant reference
As expected, this required a few more const-correctness fixes.

Based on Hal's feedback on D7684.

llvm-svn: 229634
2015-02-18 03:44:33 +00:00
Adam Nemet d7350dbb85 [LoopAccesses] Split out LoopAccessReport from VectorizerReport
The only difference between these two is that VectorizerReport adds a
vectorizer-specific prefix to its messages.  When LAA is used in the
vectorizer context the prefix is added when we promote the
LoopAccessReport into a VectorizerReport via one of the constructors.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229632
2015-02-18 03:44:25 +00:00
Adam Nemet 8b12afbeee [LoopAccesses] Add missing const to APIs in VectorizationReport
When I split out LoopAccessReport from this, I need to create some temps
so constness becomes necessary.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229631
2015-02-18 03:44:20 +00:00
Adam Nemet d0db4c1395 [LoopAccesses] Change debug messages from LV to LAA
Also add pass name as an argument to VectorizationReport::emitAnalysis.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229628
2015-02-18 03:43:37 +00:00