Commit Graph

1156 Commits

Author SHA1 Message Date
Elena Demikhovsky 376a18bd92 [Loop Vectorizer] Handling loops FP induction variables.
Allowed loop vectorization with secondary FP IVs. Like this:
float *A;
float x = init;
for (int i=0; i < N; ++i) {
  A[i] = x;
  x -= fp_inc;
}

The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute.

Differential Revision: https://reviews.llvm.org/D21330

llvm-svn: 276554
2016-07-24 07:24:54 +00:00
Michael Kuperstein 38e7298093 [SLPVectorizer] Vectorize reverse-order loads in horizontal reductions
When vectorizing a tree rooted at a store bundle, we currently try to sort the
stores before building the tree, so that the stores can be vectorized. For other
trees, the order of the root bundle - which determines the order of all other
bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads
happens to appear in the wrong order, we will not vectorize it.

This is partially mitigated when the root is a binary operator, by trying to
build a "reversed" tree when that's considered profitable. This patch extends the
workaround we have for binops to trees rooted in a horizontal reduction.

This fixes PR28474.

Differential Revision: https://reviews.llvm.org/D22554

llvm-svn: 276477
2016-07-22 21:28:48 +00:00
Matthew Simpson 102729cf1b [LV] Move vector int induction update to end of latch
This patch moves the update instruction for vectorized integer induction phi
nodes to the end of the latch block. This ensures consistent placement of all
induction updates across all the kinds of int inductions we create (scalar,
splat vector, or vector phi).

Differential Revision: https://reviews.llvm.org/D22416

llvm-svn: 276339
2016-07-21 21:20:15 +00:00
Adam Nemet 7cfd5971ab [OptDiag,LV] Add hotness attribute to applied-optimization remarks
Test coverage is provided by modifying the function in the FP-math
testcase that we are allowed to vectorize.

llvm-svn: 276223
2016-07-21 01:07:13 +00:00
Adam Nemet 0e0e2d5d26 [OptDiag,LV] Add hotness attribute to the derived analysis remarks
This includes FPCompute and Aliasing.

Testcase is based on no_fpmath.ll.

llvm-svn: 276211
2016-07-20 23:50:32 +00:00
Adam Nemet 5b3a5cf6b0 [OptDiag,LV] Add hotness attribute to analysis remarks
The earlier change added hotness attribute to missed-optimization
remarks.  This follows up with the analysis remarks (the ones explaining
the reason for the missed optimization).

llvm-svn: 276192
2016-07-20 21:44:26 +00:00
Justin Lebar a272c12b73 [LSV] Don't move stores across may-load instrs, and loosen restrictions on moving loads.
Summary:
Previously we wouldn't move loads/stores across instructions that had
side-effects, where that was defined as may-write or may-throw.  But
this is not sufficiently restrictive: Stores can't safely be moved
across instructions that may load.

This patch also adds a DEBUG check that all instructions in our chain
are either loads or stores.

Reviewers: asbirlea

Subscribers: llvm-commits, jholewinski, arsenm, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22547

llvm-svn: 276171
2016-07-20 20:07:37 +00:00
Justin Lebar 62b03e344e [LSV] Vectorize up to side-effecting instructions.
Summary:
Previously if we had a chain that contained a side-effecting
instruction, we wouldn't vectorize it at all.  Now we'll vectorize
everything that comes before the side-effecting instruction.

Reviewers: asbirlea

Subscribers: arsenm, jholewinski, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22536

llvm-svn: 276170
2016-07-20 20:07:34 +00:00
Adam Nemet 67c8929a2c [LV] Add hotness attribute to missed-optimization remarks
The new OptimizationRemarkEmitter analysis pass is hooked up to both new
and old PM passes.

llvm-svn: 276080
2016-07-20 04:03:43 +00:00
Justin Lebar 6114b37838 [LSV] Don't assume that loads/stores appear in address order in the BB.
Summary:
getVectorizablePrefix previously didn't work properly in the face of
aliasing loads/stores.  It unwittingly assumed that the loads/stores
appeared in the BB in address order.  If they didn't, it would do the
wrong thing.

Reviewers: asbirlea, tstellarAMD

Subscribers: arsenm, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22535

llvm-svn: 276072
2016-07-20 00:55:12 +00:00
Justin Lebar 8778c62629 [LSV] Insert stores at the right point.
Summary:
Previously, the insertion point for stores was the last instruction in
Chain *before calling getVectorizablePrefixEndIdx*.  Thus if
getVectorizablePrefixEndIdx didn't return Chain.size(), we still would
insert at the last instruction in Chain.

This patch changes our internal API a bit in an attempt to make it less
prone to this sort of error.  As a result, we end up recalculating the
Chain's boundary instructions, but I think worrying about the speed hit
of this is a premature optimization right now.

Reviewers: asbirlea, tstellarAMD

Subscribers: mzolotukhin, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D22534

llvm-svn: 276056
2016-07-19 23:19:20 +00:00
Justin Lebar 2cf2c22870 [LSV] Use make_range, and reformat a DEBUG message. NFC
Summary:
The DEBUG message was hard to read because two Values were being printed
on the same line with only the delimiter "aliases".  This change makes
us print each Value on its own line.

Reviewers: asbirlea

Subscribers: llvm-commits, arsenm, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22533

llvm-svn: 276055
2016-07-19 23:19:18 +00:00
Justin Lebar 4ee8a2d024 [LSV] Nix two global (ish) variables in the LoadStoreVectorizer. NFC
Reviewers: asbirlea

Subscribers: mzolotukhin, llvm-commits, arsenm

Differential Revision: https://reviews.llvm.org/D22532

llvm-svn: 276054
2016-07-19 23:19:16 +00:00
Wei Mi 79997a24d7 Recommit the patch "Use uniforms set to populate VecValuesToIgnore".
For instructions in uniform set, they will not have vector versions so
add them to VecValuesToIgnore.
For induction vars, those only used in uniform instructions or consecutive
ptrs instructions have already been added to VecValuesToIgnore above. For
those induction vars which are only used in uniform instructions or
non-consecutive/non-gather scatter ptr instructions, the related phi and
update will also be added into VecValuesToIgnore set.

The change will make the vector RegUsages estimation less conservative.

Differential Revision: https://reviews.llvm.org/D20474

The recommit fixed the testcase global_alias.ll.

llvm-svn: 275936
2016-07-19 00:50:43 +00:00
Wei Mi f9afff71a2 Revert rL275912.
llvm-svn: 275915
2016-07-18 21:14:43 +00:00
Wei Mi 1fd25726af Use uniforms set to populate VecValuesToIgnore.
For instructions in uniform set, they will not have vector versions so
add them to VecValuesToIgnore.
For induction vars, those only used in uniform instructions or consecutive
ptrs instructions have already been added to VecValuesToIgnore above. For
those induction vars which are only used in uniform instructions or
non-consecutive/non-gather scatter ptr instructions, the related phi and
update will also be added into VecValuesToIgnore set.

The change will make the vector RegUsages estimation less conservative.

Differential Revision: https://reviews.llvm.org/D20474

llvm-svn: 275912
2016-07-18 20:59:53 +00:00
Matthew Simpson f855346f0b [LV] Swap A and B in interleaved access analysis (NFC)
This patch swaps A and B in the interleaved access analysis and clarifies
related comments. The algorithm is more intuitive if we let access A precede
access B in program order rather than the reverse. This change was requested in
the review of D19984.

llvm-svn: 275567
2016-07-15 15:22:43 +00:00
Matthew Simpson 96e881deb5 [LV] Rename StrideAccesses to AccessStrideInfo (NFC)
We now collect all accesses with a constant stride, not just the ones with a
stride greater than one. This change was requested in the review of D19984.

llvm-svn: 275473
2016-07-14 21:05:08 +00:00
Matthew Simpson 65ca32b83c [LV] Allow interleaved accesses in loops with predicated blocks
This patch allows the formation of interleaved access groups in loops
containing predicated blocks. However, the predicated accesses are prevented
from forming groups.

Differential Revision: https://reviews.llvm.org/D19694

llvm-svn: 275471
2016-07-14 20:59:47 +00:00
Matthew Simpson 3c3b4a257b [LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversions
This patch prevents increases in the number of instructions, pre-instcombine,
due to induction variable scalarization. An increase in instructions can lead
to an increase in the compile-time required to simplify the induction
variables. We now maintain a new map for scalarized induction variables to
prevent us from converting between the scalar and vector forms.

This patch should resolve compile-time regressions seen after r274627.

llvm-svn: 275419
2016-07-14 14:36:06 +00:00
Alina Sbirlea 640a61cd8b Extended LoadStoreVectorizer to vectorize subchains.
Summary:
LSV used to abort vectorizing a chain for interleaved load/store accesses that alias.
Allow a valid prefix of the chain to be vectorized, mark just the prefix and retry vectorizing the remaining chain.

Reviewers: llvm-commits, jlebar, arsenm

Subscribers: mzolotukhin

Differential Revision: http://reviews.llvm.org/D22119

llvm-svn: 275317
2016-07-13 21:20:01 +00:00
David Majnemer 81d877b392 [LoopVectorize] Further cleanups
No functional change is intended, just a minor cleanup.

llvm-svn: 275243
2016-07-13 03:24:38 +00:00
Michael Kuperstein 51078b81ca [LV] Do not invalidate use-lists we're iterating over.
Should make sanitizers happier.

llvm-svn: 275230
2016-07-12 23:11:34 +00:00
Michael Kuperstein a99c46cc73 [LV] Remove wrong assumption about LCSSA
The LCSSA pass itself will not generate several redundant PHI nodes in a single
exit block. However, such redundant PHI nodes don't violate LCSSA form, and may
be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass
itself. So, assuming a single PHI node per exit block is not safe.

llvm-svn: 275217
2016-07-12 21:24:06 +00:00
David Majnemer 9330b78431 [LoopVectorize] Assorted cleanups
Use range-based for loops instead of doing everything manually.
Use auto when appropriate.

No functional change is intended.

llvm-svn: 275205
2016-07-12 19:35:15 +00:00
Alina Sbirlea cbc6ac2afd Correct ordering of loads/stores.
Summary:
Aiming to correct the ordering of loads/stores. This patch changes the
insert point for loads to the position of the first load.
It updates the ordering method for loads to insert before, rather than after.

Before this patch the following sequence:
"load a[1], store a[1], store a[0], load a[2]"
Would incorrectly vectorize to "store a[0,1], load a[1,2]".
The correctness check was assuming the insertion point for loads is at
the position of the first load, when in practice it was at the last
load. An alternative fix would have been to invert the correctness check.
The current fix changes insert position but also requires reordering of
instructions before the vectorized load.

Updated testcases to reflect the changes.

Reviewers: tstellarAMD, llvm-commits, jlebar, arsenm

Subscribers: mzolotukhin

Differential Revision: http://reviews.llvm.org/D22071

llvm-svn: 275117
2016-07-11 22:34:29 +00:00
Alina Sbirlea 327955e057 Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer
Summary: Extend TTI to access TLI.allowsMisalignedMemoryAccesses(). Check condition when vectorizing load and store chains.
Add additional parameters: AddressSpace, Alignment, Fast.

Reviewers: llvm-commits, jlebar

Subscribers: arsenm, mzolotukhin

Differential Revision: http://reviews.llvm.org/D21935

llvm-svn: 275100
2016-07-11 20:46:17 +00:00
Benjamin Kramer 4d09892e9a Give helper classes/functions internal linkage. NFC.
llvm-svn: 275014
2016-07-10 11:28:51 +00:00
Sean Silva db90d4d9c1 [PM] Port LoopVectorize to the new PM.
llvm-svn: 275000
2016-07-09 22:56:50 +00:00
Sean Silva 0dacbd8f31 [PM] Fix a think-o. mv {Scalar,Vectorize}/SLPVectorize.h
llvm-svn: 274960
2016-07-09 03:11:29 +00:00
Xinliang David Li 7853c1dd73 Rename LoopAccessAnalysis to LoopAccessLegacyAnalysis /NFC
llvm-svn: 274927
2016-07-08 20:55:26 +00:00
Xinliang David Li 8c3554fa69 Remove duplicate inclusion /NFC
llvm-svn: 274921
2016-07-08 20:21:32 +00:00
Rui Ueyama a7e11a5d34 Add a missing semicolon.
llvm-svn: 274794
2016-07-07 20:21:50 +00:00
Alina Sbirlea 598f8aad98 Clang-format LoadStoreVectorizer
Reviewers: llvm-commits, jlebar, arsenm

Subscribers: mzolotukhin

Differential Revision: http://reviews.llvm.org/D22107

llvm-svn: 274792
2016-07-07 20:10:35 +00:00
Elena Demikhovsky fc1e969dfc Fixed a bug in vectorizing GEP before gather/scatter intrinsic.
Vectorizing GEP was incorrect and broke SSA in some cases.
 
The patch fixes PR27997 https://llvm.org/bugs/show_bug.cgi?id=27997.

Differential revision: http://reviews.llvm.org/D22035

llvm-svn: 274735
2016-07-07 06:06:46 +00:00
Matthew Simpson 433cb1dfe3 [LV] Don't widen trivial induction variables
We currently always vectorize induction variables. However, if an induction
variable is only used for counting loop iterations or computing addresses with
getelementptr instructions, we don't need to do this. Vectorizing these trivial
induction variables can create vector code that is difficult to simplify later
on. This is especially true when the unroll factor is greater than one, and we
create vector arithmetic when computing step vectors. With this patch, we check
if an induction variable is only used for counting iterations or computing
addresses, and if so, scalarize the arithmetic when computing step vectors
instead. This allows for greater simplification.

This patch addresses the suboptimal pointer arithmetic sequence seen in
PR27881.

Reference: https://llvm.org/bugs/show_bug.cgi?id=27881
Differential Revision: http://reviews.llvm.org/D21620

llvm-svn: 274627
2016-07-06 14:26:59 +00:00
Matthew Simpson 89188729c3 [LV] Refactor integer induction widening (NFC)
This patch also removes the SCEV variants of getStepVector() since they have no
uses after the refactoring.

Differential Revision: http://reviews.llvm.org/D21903

llvm-svn: 274558
2016-07-05 15:41:28 +00:00
Matt Arsenault 3add3a40a4 LoadStoreVectorizer: Fix warning about extra semicolon
llvm-svn: 274406
2016-07-01 23:26:54 +00:00
Alina Sbirlea 8d8aa5dd6c Address two correctness issues in LoadStoreVectorizer
Summary:
GetBoundryInstruction returns the last instruction as the instruction which follows or end(). Otherwise the last instruction in the boundry set is not being tested by isVectorizable().
Partially solve reordering of instructions. More extensive solution to follow.

Reviewers: tstellarAMD, llvm-commits, jlebar

Subscribers: escha, arsenm, mzolotukhin

Differential Revision: http://reviews.llvm.org/D21934

llvm-svn: 274389
2016-07-01 21:44:12 +00:00
Xinliang David Li 94734eef33 [PM] refactor LoopAccessInfo code part-2
Differential Revision: http://reviews.llvm.org/D21636

llvm-svn: 274334
2016-07-01 05:59:55 +00:00
Matt Arsenault a8576706e3 LoadStoreVectorizer: improvements: better pointer analysis
If OpB has an ADD NSW/NUW, we can use that to prove that adding 1
to OpA won't wrap if OpA + 1 == OpB.

Patch by Fiona Glaser

llvm-svn: 274324
2016-07-01 02:16:24 +00:00
Matt Arsenault 0101ecade0 LoadStoreVectorizer: Don't increase alignment with no align set
If no alignment was set on the load/stores, it would vectorize
to the new type even though this increases the default alignment.

llvm-svn: 274323
2016-07-01 02:09:38 +00:00
Matt Arsenault 370e8226c7 LoadStoreVectorizer: Check TTI for vec reg bit width
llvm-svn: 274322
2016-07-01 02:07:22 +00:00
Matt Arsenault 42ad17059a LoadStoreVectorizer: Fix assert when merging pointer ops
This needs to use inttoptr/ptrtoint if combining an int and pointer
load. If a pointer is used always do an integer load.

llvm-svn: 274321
2016-07-01 01:55:52 +00:00
Matt Arsenault 241f34cde8 LoadStoreVectorizer: Use AA metadata
This was not passing the full instruction with metadata
to the alias query.

llvm-svn: 274318
2016-07-01 01:47:46 +00:00
Matt Arsenault d7e8898bdd LoadStoreVectorizer: if one element of a vector is integer, default to
integer.

Fixes issues on some architectures where we use arithmetic ops to build
vectors, which can cause bad things to happen for loads/stores of mixed
types.

Patch by Fiona Glaser

llvm-svn: 274307
2016-07-01 00:37:01 +00:00
Matt Arsenault 8a4ab5e19f LoadStoreVectorizer: Fix crashes on sub-byte types
llvm-svn: 274306
2016-07-01 00:36:54 +00:00
Matt Arsenault 079d0f19a2 LoadStoreVectorizer: Check skipFunction first.
Also add test I forgot to add to r274296.

llvm-svn: 274299
2016-06-30 23:50:18 +00:00
Matt Arsenault 2cbe52b990 LoadStoreVectorizer: Skip optnone functions
llvm-svn: 274296
2016-06-30 23:30:29 +00:00
Matt Arsenault 08debb0244 Add LoadStoreVectorizer pass
This was contributed by Apple, and I've been working on
minimal cleanups and generalizing it.

llvm-svn: 274293
2016-06-30 23:11:38 +00:00
Matt Arsenault 2ec640a62f Don't use unchecked dyn_cast
llvm-svn: 274282
2016-06-30 21:18:06 +00:00
Matt Arsenault 727e279ac4 SLPVectorizer: Move propagateMetadata to VectorUtils
This will be re-used by the LoadStoreVectorizer.

Fix handling of range metadata and testcase by Justin Lebar.

llvm-svn: 274281
2016-06-30 21:17:59 +00:00
Wei Mi 95685faeee Refine the set of UniformAfterVectorization instructions.
Except the seed uniform instructions (conditional branch and consecutive ptr
instructions), dependencies to be added into uniform set should only be used
by existing uniform instructions or intructions outside of current loop.

Differential Revision: http://reviews.llvm.org/D21755

llvm-svn: 274262
2016-06-30 18:42:56 +00:00
Adam Nemet e1af3c635c [LV] Improve accuracy and formatting of function comment
llvm-svn: 274182
2016-06-29 22:04:10 +00:00
Elena Demikhovsky 5e21c94f25 Reverted patch 273864
llvm-svn: 274115
2016-06-29 10:01:06 +00:00
Adam Nemet ad437fff53 [Diag] Add getter shouldAlwaysPrint. NFC
For the new hotness attribute, the API will take the pass rather than
the pass name so we can no longer play the trick of AlwaysPrint being a
special pass name. This adds a getter to help the transition.

There is also a corresponding clang patch.

llvm-svn: 274100
2016-06-29 04:55:19 +00:00
Elena Demikhovsky 6f2ec8104a Fixed crash of SLP Vectorizer on KNL
The bug is connected to vector GEPs.
https://llvm.org/bugs/show_bug.cgi?id=28313

llvm-svn: 273919
2016-06-27 20:07:00 +00:00
Elena Demikhovsky 4c58b2761a Fixed consecutive memory access detection in Loop Vectorizer.
It did not handle correctly cases without GEP.

The following loop wasn't vectorized:

for (int i=0; i<len; i++)

  *to++ = *from++;

I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.

Re-commit rL273257 - revision: http://reviews.llvm.org/D20789

llvm-svn: 273864
2016-06-27 11:19:23 +00:00
Benjamin Kramer 135f735af1 Apply clang-tidy's modernize-loop-convert to most of lib/Transforms.
Only minor manual fixes. No functionality change intended.

llvm-svn: 273808
2016-06-26 12:28:59 +00:00
Matthew Simpson e794678404 [LV] Preserve order of dependences in interleaved accesses analysis
The interleaved access analysis currently assumes that the inserted run-time
pointer aliasing checks ensure the absence of dependences that would prevent
its instruction reordering. However, this is not the case.

Issues can arise from how code generation is performed for interleaved groups.
For a load group, all loads in the group are essentially moved to the location
of the first load in program order, and for a store group, all stores in the
group are moved to the location of the last store. For groups having members
involved in a dependence relation with any other instruction in the loop, this
reordering can violate the dependence.

This patch teaches the interleaved access analysis how to avoid breaking such
dependences, and should fix PR27626.

An assumption of the original analysis was that the accesses had been collected
in "program order". The analysis was then simplified by visiting the accesses
bottom-up. However, this ordering was never guaranteed for anything other than
single basic block loops. Thus, this patch also enforces the desired ordering.

Reference: https://llvm.org/bugs/show_bug.cgi?id=27626
Differential Revision: http://reviews.llvm.org/D19984

llvm-svn: 273687
2016-06-24 15:33:25 +00:00
Elena Demikhovsky a266cf0518 reverted the prev commit due to assertion failure
llvm-svn: 273258
2016-06-21 12:10:11 +00:00
Elena Demikhovsky 9823c995bc Fixed consecutive memory access detection in Loop Vectorizer.
It did not handle correctly cases without GEP.

The following loop wasn't vectorized:

for (int i=0; i<len; i++)
  *to++ = *from++;

I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.

Differential revision: http://reviews.llvm.org/D20789

llvm-svn: 273257
2016-06-21 11:32:01 +00:00
Adam Nemet a9f09c6245 [LAA] Enable symbolic stride speculation for all LAA clients
This is a functional change for LLE and LDist.  The other clients (LV,
LVerLICM) already had this explicitly enabled.

The temporary boolean parameter to LAA is removed that allowed turning
off speculation of symbolic strides.  This makes LAA's caching interface
LAA::getInfo only take the loop as the parameter.  This makes the
interface more friendly to the new Pass Manager.

The flag -enable-mem-access-versioning is moved from LV to a LAA which
now allows turning off speculation globally.

llvm-svn: 273064
2016-06-17 22:35:41 +00:00
Benjamin Kramer 1afc1de406 Apply another batch of fixes from clang-tidy's performance-unnecessary-value-param.
Contains some manual fixes. No functionality change intended.

llvm-svn: 273047
2016-06-17 20:41:14 +00:00
Adam Nemet c953bb9953 [LV] Move management of symbolic strides to LAA. NFCI
This is still NFCI, so the list of clients that allow symbolic stride
speculation does not change (yes: LV and LoopVersioningLICM, no: LLE,
LDist).  However since the symbolic strides are now managed by LAA
rather than passed by client a new bool parameter is used to enable
symbolic stride speculation.

The existing test Transforms/LoopVectorize/version-mem-access.ll checks
that stride speculation is performed for LV.

The previously added test Transforms/LoopLoadElim/symbolic-stride.ll
ensures that no speculation is performed for LLE.

The next patch will change the functionality and turn on symbolic stride
speculation in all of LAA's clients and remove the bool parameter.

llvm-svn: 272970
2016-06-16 22:57:55 +00:00
Adam Nemet 886e0617a2 [LV] Make getSymbolicStrides return a pointer rather than a reference. NFC
Turns out SymbolicStrides is actually used in canVectorizeWithIfConvert
before it gets set up in canVectorizeMemory.

This works fine as long as SymbolicStrides resides in LV since we just
have an empty map.  Based on this the conclusion is made that there are
no symbolic strides which is conservatively correct.

However once SymbolicStrides becomes part of LAI, LAI is nullptr at this
point so we need to differentiate the uninitialized state by returning a
nullptr for SymbolicStrides.

llvm-svn: 272966
2016-06-16 21:55:10 +00:00
Sean Silva a4cfb620df Attempt to define friend function more portably.
Patch written by Reid. I verified it locally with clang.

llvm-svn: 272875
2016-06-16 07:00:19 +00:00
Adam Nemet 76a41d3a25 [LV] Make the new getter return a const reference. NFC
LoopVectorizationLegality holds a constant reference to LAI, so this
will have to be const as well.

Also added missed function comment.

llvm-svn: 272851
2016-06-15 22:58:27 +00:00
Adam Nemet 82b9d2a72c [LV] Add getter function for LoopVectorizationLegality::Strides. NFC
This should help moving Strides to LAA later.

llvm-svn: 272796
2016-06-15 15:49:46 +00:00
Adam Nemet 927b54e48a [LV] Remove more unused functions. NFC
LoopVectorizationLegality::strides_begin/end are also unused.

llvm-svn: 272781
2016-06-15 12:26:15 +00:00
Adam Nemet b1973be8e2 [LV] Remove unused function. NFC
LoopVectorizationLegality::mustCheckStrides is unused.

llvm-svn: 272780
2016-06-15 12:26:11 +00:00
Sean Silva 7eeda20c72 Work around MSVC "friend" semantics.
The error on clang-x86-win2008-selfhost is:

C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(955) : error C2248: 'llvm::slpvectorizer::BoUpSLP::ScheduleData' : cannot access private struct declared in class 'llvm::slpvectorizer::BoUpSLP'
        C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(608) : see declaration of 'llvm::slpvectorizer::BoUpSLP::ScheduleData'
        C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(337) : see declaration of 'llvm::slpvectorizer::BoUpSLP'

I reproduced this locally with both MSVC 2013 and MSVC 2015.

llvm-svn: 272772
2016-06-15 10:51:40 +00:00
Sean Silva ec3ed2097b Speculative buildbot fix.
This wasn't failing for me with clang as the compiler. I think GCC may
disagree with clang about whether a friend declaration introduces a
declaration in the enclosing namespace (or something).

Example error:

/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:950:77: error: ‘llvm::raw_ostream& llvm::slpvectorizer::operator<<(llvm::raw_ostream&, const llvm::slpvectorizer::BoUpSLP::ScheduleData&)’ should have been declared inside ‘llvm::slpvectorizer’
                                              const BoUpSLP::ScheduleData &SD) {
                                                                             ^

llvm-svn: 272767
2016-06-15 09:00:33 +00:00
Sean Silva e0a9e66040 [PM] Port SLPVectorizer to the new PM
This uses the "runImpl" approach to share code with the old PM.

Porting to the new PM meant abandoning the anonymous namespace enclosing
most of SLPVectorizer.cpp which is a bit of a bummer (but not a big deal
compared to having to pull the pass class into a header which the new PM
requires since it calls the constructor directly).

llvm-svn: 272766
2016-06-15 08:43:40 +00:00
Michael Kuperstein 3277a05fcf Recommit [LV] Enable vectorization of loops where the IV has an external use
r272715 broke libcxx because it did not correctly handle cases where the
last iteration of one IV is the second-to-last iteration of another.

Original commit message:
Vectorizing loops with "escaping" IVs has been disabled since r190790, due to
PR17179. This re-enables it, with support for external use of both
"post-increment" (last iteration) and "pre-increment" (second-to-last iteration)
IVs.

llvm-svn: 272742
2016-06-15 00:35:26 +00:00
Michael Kuperstein d4bd3ab5fe Reverting r272715 since it broke libcxx.
llvm-svn: 272730
2016-06-14 22:30:41 +00:00
Michael Kuperstein 23b6d6adc9 [LV] Enable vectorization of loops where the IV has an external use
Vectorizing loops with "escaping" IVs has been disabled since r190790, due to
PR17179. This re-enables it, with support for external use of both
"post-increment" (last iteration) and "pre-increment" (second-to-last iteration)
IVs.

Differential Revision: http://reviews.llvm.org/D21048

llvm-svn: 272715
2016-06-14 21:27:27 +00:00
Easwaran Raman e12c487b8c [PM] Port LCSSA to the new PM.
Differential Revision: http://reviews.llvm.org/D21090

llvm-svn: 272294
2016-06-09 19:44:46 +00:00
Michael Kuperstein c5edcdeb0e [LV] Use vector phis for some secondary induction variables
Previously, we materialized secondary vector IVs from the primary scalar IV,
by offseting the primary to match the correct start value, and then broadcasting
it - inside the loop body. Instead, we can use a real vector IV, like we do for
the primary.

This enables using vector IVs for secondary integer IVs whose type matches the
type of the primary.

Differential Revision: http://reviews.llvm.org/D20932

llvm-svn: 272283
2016-06-09 18:03:15 +00:00
Xinliang David Li ecde1c7f3d Revert r272194 No need for it if loop Analysis Manager is used
llvm-svn: 272243
2016-06-09 03:22:39 +00:00
Michael Zolotukhin 987ab631fa [SLPVectorizer] Handle GEP with differing constant index types
Summary:
This fixes PR27617.

Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64.

The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert.

Reviewers: nadav, mzolotukhin

Subscribers: JesperAntonsson, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D20685

Patch by Jesper Antonsson!

llvm-svn: 272206
2016-06-08 21:55:16 +00:00
Xinliang David Li 572135f717 [PM] Refector LoopAccessInfo analysis code
This is the preparation patch to port the analysis to new PM

Differential Revision: http://reviews.llvm.org/D20560

llvm-svn: 272194
2016-06-08 20:15:37 +00:00
Michael Kuperstein 3a3c64d23e [LV] For some IVs, use vector phis instead of widening in the loop body
Previously, whenever we needed a vector IV, we would create it on the fly,
by splatting the scalar IV and adding a step vector. Instead, we can create a
real vector IV. This tends to save a couple of instructions per iteration.

This only changes the behavior for the most basic case - integer primary
IVs with a constant step.

Differential Revision: http://reviews.llvm.org/D20315

llvm-svn: 271410
2016-06-01 17:16:46 +00:00
Guozhi Wei b994f4cdbc [SLP] Pass in correct alignment when query memory access cost
This patch fixes bug https://llvm.org/bugs/show_bug.cgi?id=27897.

When query memory access cost, current SLP always passes in alignment value of 1 (unaligned), so it gets a very high cost of scalar memory access, and wrongly vectorize memory loads in the test case.

It can be fixed by simply giving correct alignment.

llvm-svn: 271333
2016-05-31 20:41:19 +00:00
Michael Kuperstein 9a81b62a01 [BBVectorize] Don't vectorize selects with a scalar condition and vector operands.
This fixes PR27879.

Differential Revision: http://reviews.llvm.org/D20659

llvm-svn: 270888
2016-05-26 18:43:57 +00:00
Sanjay Patel 6be09ee827 fix typo; NFC
llvm-svn: 270760
2016-05-25 21:03:31 +00:00
Sanjay Patel 929ebf5a54 fix typos; NFC
llvm-svn: 270579
2016-05-24 16:51:26 +00:00
Wei Mi 0456d9dd18 Recommit r255691 since PR26509 has been fixed.
llvm-svn: 270113
2016-05-19 20:38:03 +00:00
James Molloy a854c0a0c3 [VectorUtils] Fix nasty use-after-free
In truncateToMinimalBitwidths() we were RAUW'ing an instruction then erasing it. However, that intruction could be cached in the map we're iterating over. The first check is "I->use_empty()" which in most cases would return true, as the (deleted) object was RAUW'd first so would have zero use count. However in some cases the object could have been polluted or written over and this wouldn't be the case. Also it makes valgrind, asan and traditionalists who don't like their compiler to crash sad.

No testcase as there are no externally visible symptoms apart from a crash if the stars align.

Fixes PR26509.

llvm-svn: 269908
2016-05-18 11:57:58 +00:00
Matthew Simpson e43198dc4b [LV] Ensure safe VF for loops with interleaved accesses
The selection of the vectorization factor currently doesn't consider
interleaved accesses. The vectorization factor is based on the maximum safe
dependence distance computed by LAA. However, for loops with interleaved
groups, we should instead base the vectorization factor on the maximum safe
dependence distance divided by the maximum interleave factor of all the
interleaved groups. Interleaved accesses not in a group will be scalarized.

Differential Revision: http://reviews.llvm.org/D20241

llvm-svn: 269659
2016-05-16 15:08:20 +00:00
Matthew Simpson c326d050ca Correct spelling in comment (NFC)
llvm-svn: 269482
2016-05-13 21:01:07 +00:00
Simon Pilgrim 3ac4d831ee Tidied up switch cases. NFCI.
Split FCMP//ICMP/SEL from the basic arithmetic cost functions. They were not sharing any notable code path (just the return) and were repeatedly testing the opcode.

llvm-svn: 269348
2016-05-12 21:01:20 +00:00
Michael Kuperstein 82e7df5a58 [LoopVectorizer] LoopVectorBody doesn't need to be a vector. NFC.
LoopVectorBody was changed from a single pointer to a SmallVector when
store predication was introduced in r200270. Since r247139, store predication
no longer splits the vector loop body in-place, so we can go back to having
a single LoopVectorBody block.

This reverts the no-longer-needed changes from r200270.

llvm-svn: 269321
2016-05-12 18:44:51 +00:00
Elena Demikhovsky c434d091c5 [LoopVectorize] Handling induction variable with non-constant step.
Allow vectorization when the step is a loop-invariant variable.
This is the loop example that is getting vectorized after the patch:

 int int_inc;
 int bar(int init, int *restrict A, int N) {

  int x = init;
  for (int i=0;i<N;i++){
    A[i] = x;
    x += int_inc;
  }
  return x;
 }

"x" is an induction variable with *loop-invariant* step.
But it is not a primary induction. Primary induction variable with non-constant step is not handled yet.

Differential Revision: http://reviews.llvm.org/D19258

llvm-svn: 269023
2016-05-10 07:33:35 +00:00
Denis Zobnin 15d1e64b2b [LAA] Rename "isStridedPtr" with "getPtrStride". NFC.
Changing misleading function name was approved in http://reviews.llvm.org/D17268.
Patch by Roman Shirokiy.

llvm-svn: 269021
2016-05-10 05:55:16 +00:00
Chad Rosier b438a327d7 Remove dead include. NFC.
llvm-svn: 268655
2016-05-05 17:55:51 +00:00
Silviu Baranga 28eb344140 Fix unused variable warning after r268632
llvm-svn: 268634
2016-05-05 15:27:57 +00:00
Silviu Baranga c05bab8a9c [LV] Identify more induction PHIs by coercing expressions to AddRecExprs
Summary:
Some PHIs can have expressions that are not AddRecExprs due to the presence
of sext/zext instructions. In order to prevent the Loop Vectorizer from
bailing out when encountering these PHIs, we now coerce the SCEV
expressions to AddRecExprs using SCEV predicates (when possible).

We only do this when the alternative would be to not vectorize.

Reviewers: mzolotukhin, anemet

Subscribers: mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17153

llvm-svn: 268633
2016-05-05 15:20:39 +00:00
Silviu Baranga 7e0d4353f2 [LV] Refactor the validation of PHI inductions. NFC
This moves the validation of PHI inductions into a
separate method, making it easier to reuse this
logic.

llvm-svn: 268632
2016-05-05 15:14:01 +00:00
Dehao Chen d55bc4c7ab clang-format some files in preparation of coming patch reviews.
llvm-svn: 268583
2016-05-05 00:54:54 +00:00
David Majnemer 13d5526392 [SLPVectorizer] Add operand bundles to vectorized functions
SLPVectorizing a call site should result in further propagation of its
bundles.

llvm-svn: 268004
2016-04-29 07:09:51 +00:00
David Majnemer 50ddc0e1b6 [LoopVectorize] Add operand bundles to vectorized functions
Also, do not crash when calculating a cost model for loop-invariant
token values.

llvm-svn: 268003
2016-04-29 07:09:48 +00:00
Michael Zolotukhin 1816d03b7d [PR25281] Remove AAResultsWrapper from preserved analyses of loop vectorizer.
We don't preserve AAResults, because, for one, we don't preserve SCEV-AA.
That fixes PR25281.

llvm-svn: 267980
2016-04-29 03:31:25 +00:00
Hal Finkel 1b66f7e3c8 [LoopVectorize] Keep hints from original loop on the vector loop
We need to keep loop hints from the original loop on the new vector loop.
Failure to do this meant that, for example:

  void foo(int *b) {
  #pragma clang loop unroll(disable)
    for (int i = 0; i < 16; ++i)
      b[i] = 1;
  }

this loop would be unrolled. Why? Because we'd vectorize it, thus dropping the
hints that unrolling should be disabled, and then we'd unroll it.

llvm-svn: 267970
2016-04-29 01:27:40 +00:00
Arch D. Robison 0e61034018 [SLPVectorizer] Extend SLP Vectorizer to deal with aggregates.
The refactoring portion part was done as r267748.

http://reviews.llvm.org/D14185

llvm-svn: 267899
2016-04-28 16:11:45 +00:00
Matthew Simpson 622b95be7b [LV] Reallow positive-stride interleaved load groups with gaps
We previously disallowed interleaved load groups that may cause us to
speculatively access memory out-of-bounds (r261331). We did this by ensuring
each load group had an access corresponding to the first and last member.
Instead of bailing out for these interleaved groups, this patch enables us to
peel off the last vector iteration, ensuring that we execute at least one
iteration of the scalar remainder loop. This solution was proposed in the
review of the previous patch.

Differential Revision: http://reviews.llvm.org/D19487

llvm-svn: 267751
2016-04-27 18:21:36 +00:00
Arch D. Robison aca7c412b4 [SLPVectorizer] Refactor where MinVecRegSize and MaxVecRegSize live.
This is the first of two commits for extending SLP Vectorizer to deal with aggregates.
This commit merely refactors existing logic.

http://reviews.llvm.org/D14185

llvm-svn: 267748
2016-04-27 17:46:25 +00:00
Matthew Simpson e5dfb08fcb [TTI] Add hook for vector extract with extension
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.

Differential Revision: http://reviews.llvm.org/D18523

llvm-svn: 267725
2016-04-27 15:20:21 +00:00
Elena Demikhovsky 308a7eb0d2 Masked Store in Loop Vectorizer - bugfix
Fixed a bug in loop vectorization with conditional store.

Differential Revision: http://reviews.llvm.org/D19532

llvm-svn: 267597
2016-04-26 20:18:04 +00:00
Hal Finkel 411d31ad72 [LoopVectorize] Don't consider conditional-load dereferenceability for marked parallel loops
I really thought we were doing this already, but we were not. Given this input:

void Test(int *res, int *c, int *d, int *p) {
  for (int i = 0; i < 16; i++)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

we did not vectorize the loop. Even with "assume_safety" the check that we
don't if-convert conditionally-executed loads (to protect against
data-dependent deferenceability) was not elided.

One subtlety: As implemented, it will still prefer to use a masked-load
instrinsic (given target support) over the speculated load. The choice here
seems architecture specific; the best option depends on how expensive the
masked load is compared to a regular load. Ideally, using the masked load still
reduces unnecessary memory traffic, and so should be preferred. If we'd rather
do it the other way, flipping the order of the checks is easy.

The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also
implies that if conversion is okay.

Differential Revision: http://reviews.llvm.org/D19512

llvm-svn: 267514
2016-04-26 02:00:36 +00:00
Andrew Kaylor aa641a5171 Re-commit optimization bisect support (r267022) without new pass manager support.
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267231
2016-04-22 22:06:11 +00:00
Vedant Kumar 6013f45f92 Revert "Initial implementation of optimization bisect support."
This reverts commit r267022, due to an ASan failure:

  http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549

llvm-svn: 267115
2016-04-22 06:51:37 +00:00
Andrew Kaylor f0f279291c Initial implementation of optimization bisect support.
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.

The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.

The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267022
2016-04-21 17:58:54 +00:00
David Majnemer b4b27230bf [ValueTracking, VectorUtils] Refactor getIntrinsicIDForCall
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic.  If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.

Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.

llvm-svn: 266801
2016-04-19 19:10:21 +00:00
Michael Kuperstein de16b44f74 Port DemandedBits to the new pass manager.
Differential Revision: http://reviews.llvm.org/D18679

llvm-svn: 266699
2016-04-18 23:55:01 +00:00
Mehdi Amini b550cb1750 [NFC] Header cleanup
Removed some unused headers, replaced some headers with forward class declarations.

Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'

Patch by Eugene Kosov <claprix@yandex.ru>

Differential Revision: http://reviews.llvm.org/D19219

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
2016-04-18 09:17:29 +00:00
Renato Golin 5cb666add7 [ARM] Adding IEEE-754 SIMD detection to loop vectorizer
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.

This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).

For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.

This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).

As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

Fixes PR16275.

llvm-svn: 266363
2016-04-14 20:42:18 +00:00
David Majnemer 0f26b0aeb4 [CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.

It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM.  Notably, ARMv7 NEON's vmin has the
-NAN semantics.

N.B.  Of course, it is only safe to do this if the intrinsic call is
marked nnan.

llvm-svn: 266279
2016-04-14 07:13:24 +00:00
Elena Demikhovsky 751ed0a06a Loop vectorization with uniform load
Vectorization cost of uniform load wasn't correctly calculated.
As a result, a simple loop that loads a uniform value wasn't vectorized.

Differential Revision: http://reviews.llvm.org/D18940

llvm-svn: 265901
2016-04-10 16:53:19 +00:00
David Majnemer 60c6abc3cc [LoopVectorize] Register cloned assumptions
InstCombine cannot effectively remove redundant assumptions without them
registered in the assumption cache.  The vectorizer can create identical
assumptions but doesn't register them with the cache, resulting in
slower compile times because InstCombine tries to reason about a lot
more assumptions.

Fix this by registering the cloned assumptions.

llvm-svn: 265800
2016-04-08 16:37:10 +00:00
Silviu Baranga 6f444dfd55 Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerToIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.

Original commit message:

Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265786
2016-04-08 14:29:09 +00:00
Silviu Baranga a393baf1fd Revert r265535 until we know how we can fix the bots
llvm-svn: 265541
2016-04-06 14:06:32 +00:00
Silviu Baranga 72b4a4a330 [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265535
2016-04-06 13:18:26 +00:00
David Majnemer 6f1f85f0e1 [SLPVectorizer] Don't insert an extractelement before a catchswitch
A catchswitch cannot be preceded by another instruction in the same
basic block (other than a PHI node).

Instead, insert the extract element right after the materialization of
the vectorized value.  This isn't optimal but is a reasonable compromise
given the constraints of WinEH.

This fixes PR27163.

llvm-svn: 265157
2016-04-01 17:28:15 +00:00
Hal Finkel 2e0ff2b244 [LoopVectorize] Don't vectorize loops when everything will be scalarized
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:

  LV: The Smallest and Widest types: 32 / 32 bits.
  LV: The Widest register is: 256 bits.
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 1 for VF 1 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Scalar loop costs: 3.
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 2 for VF 2 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 2 costs: 2.
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 4 for VF 4 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 4 costs: 1.
  ...
  LV: Selecting VF: 8.
  LV: The target has 32 registers
  LV(REG): Calculating max register usage:
  LV(REG): At #0 Interval # 0
  LV(REG): At #1 Interval # 1
  LV(REG): At #2 Interval # 2
  LV(REG): At #4 Interval # 1
  LV(REG): At #5 Interval # 1
  LV(REG): VF = 8

The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their cost (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.

The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There's aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.

While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.

This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.

Differential Revision: http://reviews.llvm.org/D18537

llvm-svn: 264904
2016-03-30 19:37:08 +00:00
Nirav Dave 8dd66e5753 Remove HasFnAttribute guards to getFnAttribute calls
These checks are redundant and can be removed

Reviewers: hans

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D18564

llvm-svn: 264872
2016-03-30 15:41:12 +00:00
Chad Rosier 2e5c526bb1 [SLP] Remove unnecessary member variables by using container APIs.
This changes the debug output, but still retains its usefulness.
Differential Revision: http://reviews.llvm.org/D18324

llvm-svn: 263975
2016-03-21 19:47:44 +00:00
Adam Nemet b0c4eae073 [LoopVectorize] Annotate versioned loop with noalias metadata
Summary:
Use the new LoopVersioning facility (D16712) to add noalias metadata in
the vector loop if we versioned with memchecks.  This can enable some
optimization opportunities further down the pipeline (see the included
test or the benchmark improvement quoted in D16712).

The test also covers the bug I had in the initial version in D16712.

The vectorizer did not previously use LoopVersioning.  The reason is
that the vectorizer performs its transformations in single shot.  It
creates an empty single-block vector loop that it then populates with
the widened, if-converted instructions.  Thus creating an intermediate
versioned scalar loop seems wasteful.

So this patch (rather than bringing in LoopVersioning fully) adds a
special interface to LoopVersioning to allow the vectorizer to add
no-alias annotation while still performing its own versioning.

As the vectorizer propagates metadata from the instructions in the
original loop to the vector instructions we also check the pointer in
the original instruction and see if LoopVersioning can add no-alias
metadata based on the issued memchecks.

Reviewers: hfinkel, nadav, mzolotukhin

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17191

llvm-svn: 263744
2016-03-17 20:32:37 +00:00
Chad Rosier fea398188c [SLP] Make DataLayout a member variable.
llvm-svn: 263656
2016-03-16 19:48:42 +00:00
Adam Nemet fdb20595a1 [LV] Preserve LoopInfo when store predication is used
This was a latent bug that got exposed by the change to add LoopSimplify
as a dependence to LoopLoadElimination.  Since LoopInfo was corrupted
after LV, LoopSimplify mis-compiled nbench in the test-suite (more
details in the PR).

The problem was that when we create the blocks for predicated stores we
didn't add those to any loops.

The original testcase for store predication provides coverage for this
assuming we verify LI on the way out of LV.

Fixes PR26952.

llvm-svn: 263565
2016-03-15 18:06:20 +00:00
Chad Rosier ebe559019b [SLP] Update comment to reflect reality. NFC.
llvm-svn: 263548
2016-03-15 13:27:58 +00:00
Keno Fischer a91ae8336b [SLPVectorizer] Fix dependency list
Summary:
DemandedBits was added to the requirements of SLPVectorizer in rL261212
(and various earlier version of it), but the appropriate initialization
statement was accidentally forgotten.

Ref [[ https://github.com/JuliaLang/julia/issues/14998 | JuliaLang/julia#14998 ]].

Patch by Yichao Yu.
Reviewers: mssimpso
Differential Revision: http://reviews.llvm.org/D18152

llvm-svn: 263476
2016-03-14 20:04:24 +00:00
Mehdi Amini ba9fba81d6 Remove PreserveNames template parameter from IRBuilder
This reapplies r263258, which was reverted in r263321 because
of issues on Clang side.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263393
2016-03-13 21:05:13 +00:00
Eric Christopher 35abd051c0 Temporarily revert:
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date:   Fri Mar 11 17:15:50 2016 +0000

    Remove PreserveNames template parameter from IRBuilder

    Summary:
    Following r263086, we are now relying on a flag on the Context to
    discard Value names in release builds.

    Reviewers: chandlerc

    Subscribers: mzolotukhin, llvm-commits

    Differential Revision: http://reviews.llvm.org/D18023

    From: Mehdi Amini <mehdi.amini@apple.com>

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
    91177308-0d34-0410-b5e6-96231b3b80d8

until we can figure out what to do about clang and Release build testing.

This reverts commit 263258.

llvm-svn: 263321
2016-03-12 01:47:22 +00:00
Mehdi Amini 99eab3dd06 Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.

Reviewers: chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18023

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
2016-03-11 17:15:50 +00:00
Michael Zolotukhin b88fbe08fc [SLP] Add -slp-min-reg-size command line option.
MinVecRegSize is currently hardcoded to 128; this patch adds a cl::opt
to allow changing it. I tried not to change any existing behavior for the default
case.

Differential revision: http://reviews.llvm.org/D13278

llvm-svn: 263089
2016-03-10 02:49:47 +00:00
Duncan P. N. Exon Smith e9bc579c37 ADT: Remove == and != comparisons between ilist iterators and pointers
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.

Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators.  (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)

There should be NFC here.

llvm-svn: 261498
2016-02-21 20:39:50 +00:00
Hans Wennborg a0f7090563 Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions."
It caused PR26509.

llvm-svn: 261368
2016-02-19 21:40:12 +00:00
Matthew Simpson 29c997c1a1 [LV] Vectorize first-order recurrences
This patch enables the vectorization of first-order recurrences. A first-order
recurrence is a non-reduction recurrence relation in which the value of the
recurrence in the current loop iteration equals a value defined in the previous
iteration. The load PRE of the GVN pass often creates these recurrences by
hoisting loads from within loops.

In this patch, we add a new recurrence kind for first-order phi nodes and
attempt to vectorize them if possible. Vectorization is performed by shuffling
the values for the current and previous iterations. The vectorization cost
estimate is updated to account for the added shuffle instruction.

Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16197

llvm-svn: 261346
2016-02-19 17:56:08 +00:00
Silviu Baranga ad1dafb2c3 [LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization
Summary:
If we don't have the first and last access of an interleaved load group,
the first and last wide load in the loop can do an out of bounds
access. Even though we discard results from speculative loads,
this can cause problems, since it can technically generate page faults
(or worse).

We now discard interleaved load groups that don't have the first and
load in the group.

Reviewers: hfinkel, rengolin

Subscribers: rengolin, llvm-commits, mzolotukhin, anemet

Differential Revision: http://reviews.llvm.org/D17332

llvm-svn: 261331
2016-02-19 15:46:10 +00:00
Matthew Simpson 92821cb4a8 Reapply commit r259357 with a fix for PR26629
Commit r259357 was reverted because it caused PR26629. We were assuming all
roots of a vectorizable tree could be truncated to the same width, which is not
the case in general. This commit reapplies the patch along with a fix and a new
test case to ensure we don't regress because of this issue again. This should
fix PR26629.

llvm-svn: 261212
2016-02-18 14:14:40 +00:00
Elena Demikhovsky 88e76cad16 Create masked gather and scatter intrinsics in Loop Vectorizer.
Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access.

The feature is enabled on AVX-512 target.
Differential Revision: http://reviews.llvm.org/D15690

llvm-svn: 261140
2016-02-17 19:23:04 +00:00
David Majnemer f48bcb2bd9 Revert "Reapply commit r258404 with fix."
This reverts commit r259357, it caused PR26629.

llvm-svn: 261137
2016-02-17 19:02:36 +00:00
Silviu Baranga ec7063ac77 [LV] Add support for insertelt/extractelt processing during type truncation
Summary:
While shrinking types according to the required bits, we can
encounter insert/extract element instructions. This will cause us to
reach an llvm_unreachable statement.

This change adds support for truncating insert/extract element
operations, and adds a regression test.

Reviewers: jmolloy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17078

llvm-svn: 260893
2016-02-15 15:38:17 +00:00
Matthew Simpson a4e43c5b51 [SLP] Add debug output for extract cost (NFC)
llvm-svn: 260614
2016-02-11 23:06:40 +00:00
Silviu Baranga ea63a7f512 [SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory
sanitizer issue. The PredicatedScalarEvolution's copy constructor
wasn't copying the Generation value, and was leaving it un-initialized.

Original commit message:

[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection

Summary:
This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      (sext ({x,+,y}) -> {sext(x),+,sext(y)}
      (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Reviewers: mzolotukhin, sanjoy, anemet

Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

Differential Revision: http://reviews.llvm.org/D15412

llvm-svn: 260112
2016-02-08 17:02:45 +00:00
Igor Breger 1a39a34eae [SLP] Fix placement of debug statement (NFC)
By Ayal Zaks (ayal.zaks@intel.com)

Differential Revision: http://reviews.llvm.org/D16976

llvm-svn: 260094
2016-02-08 14:11:39 +00:00
Silviu Baranga 41b4973329 Revert r260086 and r260085. They have broken the memory
sanitizer bots.

llvm-svn: 260087
2016-02-08 11:56:15 +00:00
Silviu Baranga a35fadc7c4 [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
Summary:
This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      (sext ({x,+,y}) -> {sext(x),+,sext(y)}
      (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Reviewers: mzolotukhin, sanjoy, anemet

Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

Differential Revision: http://reviews.llvm.org/D15412

llvm-svn: 260085
2016-02-08 10:45:50 +00:00
Wei Mi a49559befb [SCEV] Try to reuse existing value during SCEV expansion
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.

This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.

1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
   In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
   The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.

2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
   The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.

Differential Revision: http://reviews.llvm.org/D12090

llvm-svn: 259736
2016-02-04 01:27:38 +00:00