Commit Graph

633 Commits

Author SHA1 Message Date
Erik Eckstein 8354cfaf95 Fix: SLPVectorizer tried to move an instruction which was replaced by a vector instruction.
For a detailed description of the problem see the comment in the test file.
The problematic moveBefore() calls are not required anymore because the new
scheduling algorithm ensures a correct ordering anyway.

llvm-svn: 216656
2014-08-28 07:04:02 +00:00
Michael Zolotukhin 5dc466b863 [SLP] Re-enable vectorization of GEP expressions (re-apply r210342 with a fix).
llvm-svn: 216549
2014-08-27 15:01:18 +00:00
Craig Topper e1d1294853 Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created.
llvm-svn: 216525
2014-08-27 05:25:25 +00:00
Joerg Sonnenberger cb5674b9c2 Revert r210342 and r210343, add test case for the crasher.
PR 20642.

llvm-svn: 216475
2014-08-26 19:06:41 +00:00
Sanjay Patel 4e31cdabd1 fix typos in comments
llvm-svn: 216424
2014-08-26 00:59:15 +00:00
Karthik Bhat 7f33ff7dea Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)

llvm-svn: 216371
2014-08-25 04:56:54 +00:00
Erik Eckstein b49d7abb7b fix: SLPVectorizer crashes for unreachable blocks containing not schedulable instructions.
In unreachable blocks it's legal to have instructions like "%x = op %x".
Such instuctions are not schedulable. Therefore the SLPVectorizer has to check for
unreachable blocks and ignore them.

Fixes bug 20646.

llvm-svn: 216256
2014-08-22 01:18:39 +00:00
Craig Topper 71b7b68b74 Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 216158
2014-08-21 05:55:13 +00:00
James Molloy 82c995d450 [LoopVectorizer] Limit unroll factor in the presence of nested reductions.
If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit, by default to 2, so the critical path only gets increased by one reduction operation.

llvm-svn: 216140
2014-08-20 23:53:52 +00:00
Renato Golin 06d601fb3e Revert "Small refactor on VectorizerHint for deduplication"
This reverts commit r215994 because MSVC 2012 can't cope with its C++11 goodness.

llvm-svn: 215999
2014-08-19 18:08:50 +00:00
Renato Golin dd6394d833 Small refactor on VectorizerHint for deduplication
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.

This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.

llvm-svn: 215994
2014-08-19 17:30:43 +00:00
Craig Topper 6230691c91 Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size."
Getting a weird buildbot failure that I need to investigate.

llvm-svn: 215870
2014-08-18 00:24:38 +00:00
Craig Topper 5229cfd163 Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 215868
2014-08-17 23:47:00 +00:00
Rafael Espindola ea46c32f81 Introduce a helper to combine instruction metadata.
Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use
it.

Patch by Björn Steinbrink!

llvm-svn: 215723
2014-08-15 15:46:38 +00:00
James Molloy 65b08f5e46 [LoopVectorizer] Enable support for floating-point subtraction reductions
llvm-svn: 215200
2014-08-08 12:41:08 +00:00
Arnold Schwaighofer 4fb3c47456 SLPVectorizer: Use the type of the value loaded/stored to get the ABI alignment
We were using the pointer type which is incorrect.

llvm-svn: 215162
2014-08-07 22:47:27 +00:00
James Molloy 2b8933c354 Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.

llvm-svn: 214859
2014-08-05 12:30:34 +00:00
Erik Eckstein 26a1bf7d84 fix bug 20513 - Crash in SLP Vectorizer
llvm-svn: 214638
2014-08-02 19:39:42 +00:00
Tyler Nowicki 064896bbc5 Add diagnostics to the vectorizer cost model.
When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision.

Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive.

Reviewed by Arnold Schwaighofer

llvm-svn: 214599
2014-08-02 00:14:03 +00:00
Erik Eckstein 690dd037d9 SLPVectorizer: fix build problem in Release configuration
llvm-svn: 214496
2014-08-01 09:47:38 +00:00
Erik Eckstein c80e1dc081 SLPVectorizer: improved scheduling algorithm.
llvm-svn: 214494
2014-08-01 09:20:42 +00:00
Erik Eckstein f16a808292 SLP Vectorizer: added statistics counter
llvm-svn: 214487
2014-08-01 08:14:28 +00:00
Erik Eckstein 4944b2ff94 SLP Vectorizer: improve canonicalize tree operands of commutitive binary operands.
This reverts r214338 (except the test file) and replaces it with a more general algorithm.

llvm-svn: 214485
2014-08-01 08:05:55 +00:00
Tyler Nowicki b5a65395cc Improve the remark generated for -Rpass-missed.
The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss.

Reviewed by Arnold Schwaighofer`

llvm-svn: 214445
2014-07-31 21:22:22 +00:00
Tyler Nowicki 9fe497fcac Improve the remark generated when a variable that is used outside the loop is not a reduction or induction variable.
Reviewed by Arnold Schwaighofer

llvm-svn: 214440
2014-07-31 21:02:40 +00:00
Chad Rosier 78f41b3ca7 SLP Vectorizer: Canonicalize tree operands of commutitive binary operands.
llvm-svn: 214338
2014-07-30 21:07:56 +00:00
Hal Finkel 9414665a3b Add scoped-noalias metadata
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
  1. To preserve noalias function attribute information when inlining
  2. To provide the ability to model block-scope C99 restrict pointers

Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.

What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:

!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }

Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:

... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }

When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.

Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.

[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]

Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.

llvm-svn: 213864
2014-07-24 14:25:39 +00:00
Hal Finkel cc39b67530 AA metadata refactoring (introduce AAMDNodes)
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).

This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.

No functionality change intended.

llvm-svn: 213859
2014-07-24 12:16:19 +00:00
Mark Heffernan 9d20e42765 Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave.
llvm-svn: 213588
2014-07-21 23:11:03 +00:00
Duncan P. N. Exon Smith 6c99015fe2 Revert "[C++11] Add predecessors(BasicBlock *) / successors(BasicBlock *) iterator ranges."
This reverts commit r213474 (and r213475), which causes a miscompile on
a stage2 LTO build.  I'll reply on the list in a moment.

llvm-svn: 213562
2014-07-21 17:06:51 +00:00
Hal Finkel 07c9bb3d87 [LoopVectorize] Remove an unused private AA pointer
Thanks to the lld-x86_64-darwin13 builder for catching this first.

llvm-svn: 213488
2014-07-20 23:28:25 +00:00
Hal Finkel 7ae00a1282 [LoopVectorize] Use AA to partition potential dependency checks
Prior to this change, the loop vectorizer did not make use of the alias
analysis infrastructure. Instead, it performed memory dependence analysis using
ScalarEvolution-based linear dependence checks within equivalence classes
derived from the results of ValueTracking's GetUnderlyingObjects.

Unfortunately, this meant that:
  1. The loop vectorizer had logic that essentially duplicated that in BasicAA
     for aliasing based on identified objects.
  2. The loop vectorizer could not partition the space of dependency checks
     based on information only easily available from within AA (TBAA metadata is
     currently the prime example).

This means, for example, regardless of whether -fno-strict-aliasing was
provided, the vectorizer would only vectorize this loop with a runtime
memory-overlap check:

void foo(int *a, float *b) {
  for (int i = 0; i < 1600; ++i)
    a[i] = b[i];
}

This is suboptimal because the TBAA metadata already provides the information
necessary to show that this check unnecessary. Of course, the vectorizer has a
limit on the number of such checks it will insert, so in practice, ignoring
TBAA means not vectorizing more-complicated loops that we should.

This change causes the vectorizer to use an AliasSetTracker to keep track of
the pointers in the loop. The resulting alias sets are then used to partition
the space of dependency checks, and potential runtime checks; this results in
more-efficient vectorizations.

When pointer locations are added to the AliasSetTracker, two things are done:
  1. The location size is set to UnknownSize (otherwise you'd not catch
     inter-iteration dependencies)
  2. For instructions in blocks that would need to be predicated, TBAA is
     removed (because the metadata might have a control dependency on the condition
     being speculated).

For non-predicated blocks, you can leave the TBAA metadata. This is safe
because you can't have an iteration dependency on the TBAA metadata (if you
did, and you unrolled sufficiently, you'd end up with the same pointer value
used by two accesses that TBAA says should not alias, and that would yield
undefined behavior).

llvm-svn: 213486
2014-07-20 23:07:52 +00:00
Manuel Jacob d11beffef4 [C++11] Add predecessors(BasicBlock *) / successors(BasicBlock *) iterator ranges.
Summary: This patch introduces two new iterator ranges and updates existing code to use it.  No functional change intended.

Test Plan: All tests (make check-all) still pass.

Reviewers: dblaikie

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4481

llvm-svn: 213474
2014-07-20 09:10:11 +00:00
Hal Finkel aac5fc9cf8 [LoopVectorize] Use CreateAligned(Load|Store)
IRBuilder has CreateAligned(Load|Store) functions; use them and we don't need
to make a second call to setAlignment.

No functionality change intended.

llvm-svn: 213453
2014-07-19 13:39:45 +00:00
Hal Finkel 4f7d55aac8 [LoopVectorize] Propagate known metadata to vectorized instructions
There are some kinds of metadata that are safe to propagate from the scalar
instructions to the vector instructions (fpmath and tbaa currently).

Regarding TBAA, one might worry about propagating it on if-converted loads and
stores, because the metadata might have had a control dependency on the
condition, and thus actually aliased with some other non-speculated memory
access when the condition was false. However, this would be caught by the
runtime overlap checks.

llvm-svn: 213452
2014-07-19 13:33:16 +00:00
Tyler Nowicki 641d8a06bd Emit warnings if vectorization is forced and fails.
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is inherited to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.

Reviewed by: Arnold Schwaighofer

llvm-svn: 213110
2014-07-16 00:36:00 +00:00
Alp Toker e69170a110 Revert "Introduce a string_ostream string builder facilty"
Temporarily back out commits r211749, r211752 and r211754.

llvm-svn: 211814
2014-06-26 22:52:05 +00:00
Alp Toker 614717388c Introduce a string_ostream string builder facilty
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.

small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.

This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.

The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.

llvm-svn: 211749
2014-06-26 00:00:48 +00:00
Tyler Nowicki 4b07b00786 Add Rpass-missed and Rpass-analysis reports to the loop vectorizer. The remarks give the vector width of vectorized loops and a brief analysis of loops that fail to be vectorized. For example, an analysis will be generated for loops containing control flow that cannot be simplified to a select. The optimization remarks also give the debug location of expressions that cannot be vectorized, for example the location of an unvectorizable call.
Reviewed by: Arnold Schwaighofer

llvm-svn: 211721
2014-06-25 17:50:15 +00:00
Eli Bendersky 5d5e18da3e Rename loop unrolling and loop vectorizer metadata to have a common prefix.
[LLVM part]

These patches rename the loop unrolling and loop vectorizer metadata
such that they have a common 'llvm.loop.' prefix.  Metadata name
changes:

llvm.vectorizer.* => llvm.loop.vectorizer.*
llvm.loopunroll.* => llvm.loop.unroll.*

This was a suggestion from an earlier review
(http://reviews.llvm.org/D4090) which added the loop unrolling
metadata. 

Patch by Mark Heffernan.

llvm-svn: 211710
2014-06-25 15:41:00 +00:00
Arnold Schwaighofer c11107cb1e LoopVectorizer: Fix a dominance issue
The induction variables start value needs to be defined before we branch
(overflow check) to the scalar preheader where we used it.

llvm-svn: 211460
2014-06-22 03:38:59 +00:00
Karthik Bhat e03a25da70 Add Support to Recognize and Vectorize NON SIMD instructions in SLPVectorizer.
This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and
vectorizes them as vector shuffles if they are profitable.
These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86.
Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015 

llvm-svn: 211339
2014-06-20 04:32:48 +00:00
Michael Zolotukhin 1c51612d42 [SLP] Enable vectorization of GEP expressions.
The use cases look like the following:
    x->a = y->a + 10
    x->b = y->b + 12

llvm-svn: 210342
2014-06-06 15:34:24 +00:00
Karthik Bhat bf56d44cab Fix PR19657 (scalar loads not combined into vector load)
If we have common uses on separate paths in the tree; process the one with greater common depth first.
This makes sure that we do not assume we need to extract a load when it is actually going to be part of a vectorized tree.

Review: http://reviews.llvm.org/D3800
llvm-svn: 210310
2014-06-06 06:20:08 +00:00
Karthik Bhat 5ab7795649 Allow vectorization of intrinsics such as powi,cttz and ctlz in Loop and SLP Vectorizer.
This patch adds support to vectorize intrinsics such as powi, cttz and ctlz in Vectorizer. These intrinsics are different from other
intrinsics as second argument to these function must be same in order to vectorize them and it should be represented as a scalar.
Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857

llvm-svn: 209873
2014-05-30 04:31:24 +00:00
Arnold Schwaighofer e2067680a6 LoopVectorizer: Add a check that the backedge taken count + 1 does not overflow
The loop vectorizer instantiates be-taken-count + 1 as the loop iteration count.
If this expression overflows the generated code was invalid.

In case of overflow the code now jumps to the scalar loop.

Fixes PR17288.

llvm-svn: 209854
2014-05-29 22:10:01 +00:00
Diego Novillo 7f8af8bf91 Add support for missed and analysis optimization remarks.
Summary:
This adds two new diagnostics: -pass-remarks-missed and
-pass-remarks-analysis. They take the same values as -pass-remarks but
are intended to be triggered in different contexts.

-pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
which passes call when they tried to apply a transformation but
couldn't.

-pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
which passes call when they want to inform the user about analysis
results.

The patch also:

1- Adds support in the inliner for the two new remarks and a
   test case.

2- Moves emitOptimizationRemark* functions to the llvm namespace.

3- Adds an LLVMContext argument instead of making them member functions
   of LLVMContext.

Reviewers: qcolombet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3682

llvm-svn: 209442
2014-05-22 14:19:46 +00:00
Eric Christopher 650c8f2a06 Clean up language and grammar.
Based on a patch by jfcaron3@gmail.com!
PR19806

llvm-svn: 209216
2014-05-20 17:11:11 +00:00
Zinovy Nis abdf44e7f3 [LV][REFACTOR] One more tiny fix for printing debug locations in loop vectorizer. Now consistent with the remarks emitter.
Differential Revision: http://reviews.llvm.org/D3821

llvm-svn: 209197
2014-05-20 08:26:20 +00:00
Benjamin Kramer 1fef64214c SLPVectorizer: Instead of just performing CSE on dead blocks ignore them completely.
Turns out that there is a very cheap way of testing whether a block is dead,
just look it up in the DomTree. We have to do this anyways so just ignore
unreachable blocks before sorting by domination. This restores a proper
ordering for std::stable_sort when dead code is present.

Covered by existing tests & buildbots running in STL debug mode (MSVC).

llvm-svn: 208492
2014-05-11 10:28:58 +00:00