Commit Graph

261 Commits

Author SHA1 Message Date
Mohammad Shahid dbd30edb7f [SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh

Reviewed By: Ayal

Subscribers: mgrang, dcaballe, hans, mzolotukhin

Differential Revision: https://reviews.llvm.org/D36130

llvm-svn: 320548
2017-12-13 03:08:29 +00:00
Dorit Nuzman eb13dd3eac [LV/LAA] Avoid specializing a loop for stride=1 when this predicate implies a
single-iteration loop

This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that
Stride >= Trip-Count. Such a predicate will effectively optimize a single
or zero iteration loop, as Trip-Count <= Stride == 1.

Differential Revision: https://reviews.llvm.org/D38785

llvm-svn: 317438
2017-11-05 16:53:15 +00:00
Adam Nemet 0965da2055 Rename OptimizationDiagnosticInfo.* to OptimizationRemarkEmitter.*
Sync it up with the name of the class actually defined here.  This has been
bothering me for a while...

llvm-svn: 315249
2017-10-09 23:19:02 +00:00
Hans Wennborg 9a9048e19f Revert r314806 "[SLP] Vectorize jumbled memory loads."
All the buildbots are red, e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/

> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask' of
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
>
> Reviewed By: Ayal
>
> Subscribers: hans, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D36130

llvm-svn: 314824
2017-10-03 18:32:29 +00:00
Mohammad Shahid 1d5422f27f [SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh

Reviewed By: Ayal

Subscribers: hans, mzolotukhin

Differential Revision: https://reviews.llvm.org/D36130

llvm-svn: 314806
2017-10-03 15:28:48 +00:00
Hans Wennborg 57c3341ada Revert r313771 "[SLP] Vectorize jumbled memory loads."
This broke the buildbots, e.g.
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391

> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask'
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Subscribers: mzolotukhin
>
> Reviewed By: ayal
>
> Differential Revision: https://reviews.llvm.org/D36130
>
> Review comments updated accordingly
>
> Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
>
> Added a TODO for sortLoadAccesses API
>
> Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
>
> Modified the TODO for sortLoadAccesses API
>
> Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
>
> Review comment update for using OpdNum to insert the mask in respective location
>
> Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
>
> Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
>
> Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b

llvm-svn: 313781
2017-09-20 18:00:03 +00:00
Mohammad Shahid 2b281de576 [SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask'
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Subscribers: mzolotukhin

Reviewed By: ayal

Differential Revision: https://reviews.llvm.org/D36130

Review comments updated accordingly

Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0

Added a TODO for sortLoadAccesses API

Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58

Modified the TODO for sortLoadAccesses API

Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565

Review comment update for using OpdNum to insert the mask in respective location

Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce

Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase

Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
2017-09-20 17:19:57 +00:00
Alexander Kornienko 6a140234ed Revert r313736: "[SLP] Vectorize jumbled memory loads."
The revision breaks buildbots:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/6694/steps/test/logs/stdio

llvm-svn: 313758
2017-09-20 14:53:07 +00:00
Alexander Kornienko 7302344bdf Revert r313753: "Fix a -Wsign-compare warning in LoopAccessAnalysis.cpp"
llvm-svn: 313757
2017-09-20 14:52:56 +00:00
Alexander Kornienko 6c629b5728 Fix a -Wsign-compare warning in LoopAccessAnalysis.cpp
llvm-svn: 313753
2017-09-20 12:18:22 +00:00
Mohammad Shahid f8db9bd857 [SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.

This patch fixes the mask representation by recording the 'use mask' in the usertree entry.

Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df

Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh

Reviewed By: Ayal

Subscribers: mzolotukhin

Differential Revision: https://reviews.llvm.org/D36130

Commit after rebase for patch D36130

Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
llvm-svn: 313736
2017-09-20 08:18:28 +00:00
Alon Kom 682cfc1d4c [LV] Fix maximum legal VF calculation
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop,
however not all strided accesses in the loop have an interleave-group
associated with them.
Instead of recording the maximum stride in the loop, which can be over
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.

Differential Revision: https://reviews.llvm.org/D37507

llvm-svn: 313237
2017-09-14 07:40:02 +00:00
Silviu Baranga ac920f7716 [LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs
Summary:
LAA can only emit run-time alias checks for pointers with affine AddRec
SCEV expressions. However, non-AddRecExprs can be now be converted to
affine AddRecExprs using SCEV predicates.

This change tries to add the minimal set of SCEV predicates in order
to enable run-time alias checking.

Reviewers: anemet, mzolotukhin, mkuper, sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: mssimpso, Ayal, dorit, roman.shirokiy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D17080

llvm-svn: 313012
2017-09-12 07:48:22 +00:00
James Molloy 37dd4d7aaa [LAA] Correctly return a half-open range in expandBounds
This is a latent bug that's been hanging around for a while. For a loop-invariant
pointer, expandBounds would return the range {Ptr, Ptr}, but this was interpreted
as a half-open range, not a closed range. So we ended up planting incorrect
bounds checks. Even worse, they were tautological, so we ended up incorrectly
executing the optimized loop.

llvm-svn: 299526
2017-04-05 09:24:26 +00:00
Michael Kuperstein 5fb39a7966 [SLP] Revert everything that has to do with memory access sorting.
This reverts r293386, r294027, r294029 and r296411.

Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.

Revert until we can fix this properly.

llvm-svn: 297493
2017-03-10 18:59:07 +00:00
Amjad Aboud 5448e989a5 [SLP] Fixed non-deterministic behavior in Loop Vectorizer.
Differential Revision: https://reviews.llvm.org/D30638

llvm-svn: 297257
2017-03-08 05:09:10 +00:00
Michael Kuperstein 768d013a03 [SLP] Revert r296863 due to miscompiles.
Details and reproducer are on the email thread for r296863.

llvm-svn: 297103
2017-03-06 23:54:51 +00:00
Mohammad Shahid bdac9f30c0 [SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for
external use of vectorizable scalars based on shuffle mask.

Reviewers: mkuper

Differential Revision: https://reviews.llvm.org/D30159

Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d
llvm-svn: 296863
2017-03-03 10:02:47 +00:00
Hans Wennborg cc4ff78c9d Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available"
It caused miscompiles, e.g. in Chromium (PR32109).

llvm-svn: 296654
2017-03-01 18:57:16 +00:00
Mohammad Shahid 175ffa8c35 [SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.

Reviewers: mkuper

Differential Revision: https://reviews.llvm.org/D30159

Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d
llvm-svn: 296575
2017-03-01 03:51:54 +00:00
Michael Kuperstein c07cca85fb [SLP] Load sorting should not try to sort things that aren't loads.
We may get a VL where the first element is a load, but the others
aren't. Trying to sort such VLs can only lead to sorrow.

llvm-svn: 296411
2017-02-27 23:18:11 +00:00
Adam Nemet 41b019a39c [LAA] Remove unused LoopAccessReport
The need for this removed when I converted everything to use the opt-remark
classes directly with the streaming interface.

llvm-svn: 296017
2017-02-23 21:17:36 +00:00
Matthew Simpson a899f86054 [LAA] Remove unused code (NFC)
llvm-svn: 295493
2017-02-17 20:46:52 +00:00
Dorit Nuzman eac89d736c [LV/LoopAccess] Check statically if an unknown dependence distance can be
proven larger than the loop-count

This fixes PR31098: Try to resolve statically data-dependences whose
compile-time-unknown distance can be proven larger than the loop-count, 
instead of resorting to runtime dependence checking (which are not always 
possible).

For vectorization it is sufficient to prove that the dependence distance 
is >= VF; But in some cases we can prune unknown dependence distances early,
and even before selecting the VF, and without a runtime test, by comparing 
the distance against the loop iteration count. Since the vectorized code 
will be executed only if LoopCount >= VF, proving distance >= LoopCount 
also guarantees that distance >= VF. This check is also equivalent to the 
Strong SIV Test.

Reviewers: mkuper, anemet, sanjoy

Differential Revision: https://reviews.llvm.org/D28044

llvm-svn: 294892
2017-02-12 09:32:53 +00:00
Michael Kuperstein 2a735b71b6 [SLP] Make sortMemAccesses explicitly return an error. NFC.
llvm-svn: 294029
2017-02-03 19:32:50 +00:00
Michael Kuperstein 723999d4aa [SLP] Use SCEV to sort memory accesses.
This generalizes memory access sorting to use differences between SCEVs,
instead of relying on constant offsets. That allows us to properly do
SLP vectorization of non-sequentially ordered loads within loops bodies.

Differential Revision: https://reviews.llvm.org/D29425

llvm-svn: 294027
2017-02-03 19:09:45 +00:00
Mohammad Shahid 3121334d32 [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.
The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask.

Reviewers: hfinkel, mssimpso, mkuper

Differential Revision: https://reviews.llvm.org/D26905

Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad
llvm-svn: 293386
2017-01-28 17:59:44 +00:00
Chandler Carruth 3bab7e1a79 [PM] Separate the LoopAnalysisManager from the LoopPassManager and move
the latter to the Transforms library.

While the loop PM uses an analysis to form the IR units, the current
plan is to have the PM itself establish and enforce both loop simplified
form and LCSSA. This would be a layering violation in the analysis
library.

Fundamentally, the idea behind the loop PM is to *transform* loops in
addition to running passes over them, so it really seemed like the most
natural place to sink this was into the transforms library.

We can't just move *everything* because we also have loop analyses that
rely on a subset of the invariants. So this patch splits the the loop
infrastructure into the analysis management that has to be part of the
analysis library, and the transform-aware pass manager.

This also required splitting the loop analyses' printer passes out to
the transforms library, which makes sense to me as running these will
transform the code into LCSSA in theory.

I haven't split the unittest though because testing one component
without the other seems nearly intractable.

Differential Revision: https://reviews.llvm.org/D28452

llvm-svn: 291662
2017-01-11 09:43:56 +00:00
Chandler Carruth 410eaeb064 [PM] Rewrite the loop pass manager to use a worklist and augmented run
arguments much like the CGSCC pass manager.

This is a major redesign following the pattern establish for the CGSCC layer to
support updates to the set of loops during the traversal of the loop nest and
to support invalidation of analyses.

An additional significant burden in the loop PM is that so many passes require
access to a large number of function analyses. Manually ensuring these are
cached, available, and preserved has been a long-standing burden in LLVM even
with the help of the automatic scheduling in the old pass manager. And it made
the new pass manager extremely unweildy. With this design, we can package the
common analyses up while in a function pass and make them immediately available
to all the loop passes. While in some cases this is unnecessary, I think the
simplicity afforded is worth it.

This does not (yet) address loop simplified form or LCSSA form, but those are
the next things on my radar and I have a clear plan for them.

While the patch is very large, most of it is either mechanically updating loop
passes to the new API or the new testing for the loop PM. The code for it is
reasonably compact.

I have not yet updated all of the loop passes to correctly leverage the update
mechanisms demonstrated in the unittests. I'll do that in follow-up patches
along with improved FileCheck tests for those passes that ensure things work in
more realistic scenarios. In many cases, there isn't much we can do with these
until the loop simplified form and LCSSA form are in place.

Differential Revision: https://reviews.llvm.org/D28292

llvm-svn: 291651
2017-01-11 06:23:21 +00:00
Keno Fischer 92f377bd74 [LAA] Prevent invalid IR for loop-invariant bound in loop body
Summary:
If LAA expands a bound that is loop invariant, but not hoisted out
of the loop body, it used to use that value anyway, causing a
non-domination error, because the memcheck block is of course not
dominated by the scalar loop body. Detect this situation and expand
the SCEV expression instead.

Fixes PR31251

Reviewers: anemet
Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D27397

llvm-svn: 288705
2016-12-05 21:25:03 +00:00
Eugene Zelenko a3fe70d233 Fix some Clang-tidy and Include What You Use warnings; other minor fixes (NFC).
This preparation to remove SetVector.h dependency on SmallSet.h.

llvm-svn: 288256
2016-11-30 17:48:10 +00:00
Chandler Carruth dab4eae274 [PM] Change the static object whose address is used to uniquely identify
analyses to have a common type which is enforced rather than using
a char object and a `void *` type when used as an identifier.

This has a number of advantages. First, it at least helps some of the
confusion raised in Justin Lebar's code review of why `void *` was being
used everywhere by having a stronger type that connects to documentation
about this.

However, perhaps more importantly, it addresses a serious issue where
the alignment of these pointer-like identifiers was unknown. This made
it hard to use them in pointer-like data structures. We were already
dodging this in dangerous ways to create the "all analyses" entry. In
a subsequent patch I attempted to use these with TinyPtrVector and
things fell apart in a very bad way.

And it isn't just a compile time or type system issue. Worse than that,
the actual alignment of these pointer-like opaque identifiers wasn't
guaranteed to be a useful alignment as they were just characters.

This change introduces a type to use as the "key" object whose address
forms the opaque identifier. This both forces the objects to have proper
alignment, and provides type checking that we get it right everywhere.
It also makes the types somewhat less mysterious than `void *`.

We could go one step further and introduce a truly opaque pointer-like
type to return from the `ID()` static function rather than returning
`AnalysisKey *`, but that didn't seem to be a clear win so this is just
the initial change to get to a reliably typed and aligned object serving
is a key for all the analyses.

Thanks to Richard Smith and Justin Lebar for helping pick plausible
names and avoid making this refactoring many times. =] And thanks to
Sean for the super fast review!

While here, I've tried to move away from the "PassID" nomenclature
entirely as it wasn't really helping and is overloaded with old pass
manager constructs. Now we have IDs for analyses, and key objects whose
address can be used as IDs. Where possible and clear I've shortened this
to just "ID". In a few places I kept "AnalysisID" to make it clear what
was being identified.

Differential Revision: https://reviews.llvm.org/D27031

llvm-svn: 287783
2016-11-23 17:53:26 +00:00
Adam Nemet 877ccee8cc [LAA, LV] Port to new streaming interface for opt remarks. Update LV
(Recommit after making sure IsVerbose gets properly initialized in
DiagnosticInfoOptimizationBase.  See previous commit that takes care of
this.)

OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.

Then we need the magic to be able to turn an LAA remark into an LV
remark.  This is done via a new OptimizationRemark ctor.

llvm-svn: 282813
2016-09-30 00:01:30 +00:00
Adam Nemet 556a06b1ee Revert "[LAA, LV] Port to new streaming interface for opt remarks. Update LV"
This reverts commit r282758.

There are some clang failures I haven't seen.

llvm-svn: 282759
2016-09-29 20:17:37 +00:00
Adam Nemet c1d21817d1 [LAA, LV] Port to new streaming interface for opt remarks. Update LV
OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.

Then we need the magic to be able to turn an LAA remark into an LV
remark.  This is done via a new OptimizationRemark ctor.

llvm-svn: 282758
2016-09-29 20:12:18 +00:00
Adam Nemet 69330e0bc5 [LAA] Rename emitAnalysis to recordAnalys. NFC
Ever since LAA was split out into an analysis on its own, this function
stopped emitting the report directly.  Instead it stores it to be
retrieved by the client which can then emit it as its own report
(e.g. -Rpass-analysis=loop-vectorize).

llvm-svn: 282561
2016-09-28 00:58:36 +00:00
Adam Nemet e3cef93727 [LV] When reporting about a specific instruction without debug location use loop's
This can occur for example if some optimization drops the debug location.

llvm-svn: 282048
2016-09-21 03:14:20 +00:00
Elena Demikhovsky 5f8cc0c346 [Loop Vectorizer] Consecutive memory access - fixed and simplified
Amended consecutive memory access detection in Loop Vectorizer.
Load/Store were not handled properly without preceding GEP instruction.

Differential Revision: https://reviews.llvm.org/D20789

llvm-svn: 281853
2016-09-18 13:56:08 +00:00
Chad Rosier 83a120337a Fix indent. NFC.
llvm-svn: 280270
2016-08-31 18:37:52 +00:00
Elena Demikhovsky 3622fbfc68 [Loop Vectorizer] Fixed memory confilict checks.
Fixed a bug in run-time checks for possible memory conflicts inside loop.
The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size".

Differential revision: https://reviews.llvm.org/D23176

llvm-svn: 279930
2016-08-28 08:53:53 +00:00
David Majnemer 2d006e7673 Use the range variant of transform instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278476
2016-08-12 04:32:42 +00:00
Sean Silva 0746f3bfa4 Consistently use LoopAnalysisManager
One exception here is LoopInfo which must forward-declare it (because
the typedef is in LoopPassManager.h which depends on LoopInfo).

Also, some includes for LoopPassManager.h were needed since that file
provides the typedef.

Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.

Thanks to David for the suggestion.

llvm-svn: 278079
2016-08-09 00:28:52 +00:00
Sean Silva 36e0d01e13 Consistently use FunctionAnalysisManager
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.

Thanks to David for the suggestion.

llvm-svn: 278077
2016-08-09 00:28:15 +00:00
Michael Kuperstein 3ceac2bbd5 [LV, X86] Be more optimistic about vectorizing shifts.
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.

Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
to take this into account by making shifts by a uniform cheap again.

Differential Revision: https://reviews.llvm.org/D23049

llvm-svn: 277782
2016-08-04 22:48:03 +00:00
Adam Nemet 5b3a5cf6b0 [OptDiag,LV] Add hotness attribute to analysis remarks
The earlier change added hotness attribute to missed-optimization
remarks.  This follows up with the analysis remarks (the ones explaining
the reason for the missed optimization).

llvm-svn: 276192
2016-07-20 21:44:26 +00:00
Adam Nemet 7da74abf3d [LAA] Don't hold on to DominatorTree in the analysis result
llvm-svn: 275335
2016-07-13 22:36:35 +00:00
Adam Nemet b49d9a56eb [LAA] Don't hold on to TargetLibraryInfo in the analysis result
llvm-svn: 275334
2016-07-13 22:36:27 +00:00
Adam Nemet 1824e411c6 [LAA] Don't hold on to DataLayout in the analysis result
In fact, don't even pass this to the ctor since we can get it from the
module.

llvm-svn: 275326
2016-07-13 22:18:51 +00:00
Adam Nemet 6616ad08f6 [LAA] Don't hold on to LoopInfo in the analysis result
llvm-svn: 275325
2016-07-13 22:18:48 +00:00
Adam Nemet 1556357677 [LAA] Don't hold on to AliasAnalysis in the analysis result
llvm-svn: 275322
2016-07-13 21:39:09 +00:00
David Majnemer 8b401013c1 [LoopAccessAnalysis] Some minor cleanups
Use range-base for loops.
Use auto when appropriate.

No functional change is intended.

llvm-svn: 275213
2016-07-12 20:31:46 +00:00
Xinliang David Li 07e08fa36b [PM] name the new PM LAA class LoopAccessAnalysis (LAA) /NFC
llvm-svn: 274934
2016-07-08 21:21:44 +00:00
Xinliang David Li 7853c1dd73 Rename LoopAccessAnalysis to LoopAccessLegacyAnalysis /NFC
llvm-svn: 274927
2016-07-08 20:55:26 +00:00
David Majnemer 7afb46d3c8 [LoopAccessAnalysis] Fix an integer overflow
We were inappropriately using 32-bit types to account for quantities
that can be far larger.

Fixed in PR28443.

llvm-svn: 274737
2016-07-07 06:24:36 +00:00
Sean Silva 284b0324e2 [PM] Avoid getResult on a higher level in LoopAccessAnalysis
Note that require<domtree> and require<loops> aren't needed because they
come in implicitly via the loop pass manager.

llvm-svn: 274712
2016-07-07 01:01:53 +00:00
Xinliang David Li 8a021317a2 [PM] Port LoopAccessInfo analysis to new PM
It is implemented as a LoopAnalysis pass as 
discussed and agreed upon.

llvm-svn: 274452
2016-07-02 21:18:40 +00:00
Xinliang David Li 94734eef33 [PM] refactor LoopAccessInfo code part-2
Differential Revision: http://reviews.llvm.org/D21636

llvm-svn: 274334
2016-07-01 05:59:55 +00:00
Adam Nemet f45594c912 [LAA] Fix alphabetical sorting of headers. NFC
llvm-svn: 274302
2016-07-01 00:09:02 +00:00
Elena Demikhovsky 5e21c94f25 Reverted patch 273864
llvm-svn: 274115
2016-06-29 10:01:06 +00:00
Elena Demikhovsky 4c58b2761a Fixed consecutive memory access detection in Loop Vectorizer.
It did not handle correctly cases without GEP.

The following loop wasn't vectorized:

for (int i=0; i<len; i++)

  *to++ = *from++;

I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.

Re-commit rL273257 - revision: http://reviews.llvm.org/D20789

llvm-svn: 273864
2016-06-27 11:19:23 +00:00
Xinliang David Li ce030acb4e [PM]: LoopAccessInfo simple refactoring
To make definition of mov ctors easier.
Differential Revision: http://reviews.llvm.org/D21563

llvm-svn: 273506
2016-06-22 23:20:59 +00:00
Elena Demikhovsky a266cf0518 reverted the prev commit due to assertion failure
llvm-svn: 273258
2016-06-21 12:10:11 +00:00
Elena Demikhovsky 9823c995bc Fixed consecutive memory access detection in Loop Vectorizer.
It did not handle correctly cases without GEP.

The following loop wasn't vectorized:

for (int i=0; i<len; i++)
  *to++ = *from++;

I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.

Differential revision: http://reviews.llvm.org/D20789

llvm-svn: 273257
2016-06-21 11:32:01 +00:00
Adam Nemet a9f09c6245 [LAA] Enable symbolic stride speculation for all LAA clients
This is a functional change for LLE and LDist.  The other clients (LV,
LVerLICM) already had this explicitly enabled.

The temporary boolean parameter to LAA is removed that allowed turning
off speculation of symbolic strides.  This makes LAA's caching interface
LAA::getInfo only take the loop as the parameter.  This makes the
interface more friendly to the new Pass Manager.

The flag -enable-mem-access-versioning is moved from LV to a LAA which
now allows turning off speculation globally.

llvm-svn: 273064
2016-06-17 22:35:41 +00:00
Adam Nemet c953bb9953 [LV] Move management of symbolic strides to LAA. NFCI
This is still NFCI, so the list of clients that allow symbolic stride
speculation does not change (yes: LV and LoopVersioningLICM, no: LLE,
LDist).  However since the symbolic strides are now managed by LAA
rather than passed by client a new bool parameter is used to enable
symbolic stride speculation.

The existing test Transforms/LoopVectorize/version-mem-access.ll checks
that stride speculation is performed for LV.

The previously added test Transforms/LoopLoadElim/symbolic-stride.ll
ensures that no speculation is performed for LLE.

The next patch will change the functionality and turn on symbolic stride
speculation in all of LAA's clients and remove the bool parameter.

llvm-svn: 272970
2016-06-16 22:57:55 +00:00
Adam Nemet 139ffba398 [LAA] Rename Strides to SymblicStrides in analyzeLoop. NFC
This is to facilitate to move of SymblicStrides from LV to LAA.

llvm-svn: 272879
2016-06-16 08:27:03 +00:00
Adam Nemet bdbc5227ce [LAA] Default getInfo to not speculate symbolic strides. NFC
Soon we won't be passing Strides to getInfo and then we'll have fewer
call sites to update.

llvm-svn: 272878
2016-06-16 08:26:56 +00:00
Xinliang David Li ecde1c7f3d Revert r272194 No need for it if loop Analysis Manager is used
llvm-svn: 272243
2016-06-09 03:22:39 +00:00
Xinliang David Li 572135f717 [PM] Refector LoopAccessInfo analysis code
This is the preparation patch to port the analysis to new PM

Differential Revision: http://reviews.llvm.org/D20560

llvm-svn: 272194
2016-06-08 20:15:37 +00:00
Andrey Turetskiy 9f02c58670 [LAA] Improve non-wrapping pointer detection by handling loop-invariant case.
This fixes PR26314. This patch adds new helper “isNoWrap” with detection of
loop-invariant pointer case.

Patch by Roman Shirokiy.

Ref: https://llvm.org/bugs/show_bug.cgi?id=26314

Differential Revision: http://reviews.llvm.org/D17268

llvm-svn: 272014
2016-06-07 14:55:27 +00:00
Matthew Simpson e3e3b994ae [LAA] Use load and store vectors (NFC)
Contributed-by: Aditya Kumar <hiraditya@msn.com>
Differential Revision: http://reviews.llvm.org/D20953

llvm-svn: 271895
2016-06-06 14:15:41 +00:00
Matthew Simpson 6feebe9847 [LAA] Check independence of strided accesses before forward case
This patch changes the order in which we attempt to prove the independence of
strided accesses. We previously did this after we knew the dependence distance
was positive. With this change, we check for independence before handling the
negative distance case. The patch prevents LAA from reporting forward
dependences for independent strided accesses.

This change was requested in the review of D19984.

llvm-svn: 270072
2016-05-19 15:37:19 +00:00
Matthew Simpson 37ec5f914e [LAA] Rename forwarding conflict detection option (NFC)
This patch renames the option enabling the store-to-load forwarding conflict
detection optimization. This change was requested in the review of D20241.

llvm-svn: 269668
2016-05-16 17:00:56 +00:00
Adam Nemet 884d313b7f [LAA] Comment couldPreventStoreLoadForward. NFC
Also s/Cycles/Iters/ in NumCyclesForStoreLoadThroughMemory to make it
clear that this is not about clock cycles but loop cycles/iterations.

llvm-svn: 269667
2016-05-16 16:57:47 +00:00
Adam Nemet 9b5852aeb2 [LAA] clang-format the function couldPreventStoreLoadForward. NFC
llvm-svn: 269666
2016-05-16 16:57:42 +00:00
Matthew Simpson a250dc9f11 [LAA] Add option to disable conflict detection (NFC)
llvm-svn: 269654
2016-05-16 14:14:49 +00:00
Adam Nemet c62e554e9a [LAA] Include MaxSafeDepDistBytes in the analysis print-out
llvm-svn: 269508
2016-05-13 22:49:13 +00:00
Adam Nemet 4ad38b63d5 [LAA] Prepare the code to print more things in the summary. NFC
llvm-svn: 269507
2016-05-13 22:49:09 +00:00
Adam Nemet 2c34ab51a4 [LAA] Use std::min. NFC
llvm-svn: 269356
2016-05-12 21:41:53 +00:00
Silviu Baranga adf4b739ea [LAA] Use re-written SCEV expressions when computing distances
This removes a redundant stride versioning step (we already
do it in getPtrStride, so it has no effect) and uses PSE to
get the SCEV expressions for the source and destination
(this might have changed when getPtrStride was called).

I discovered this through code inspection, and couldn't
produce a regression test for it.

llvm-svn: 269052
2016-05-10 12:28:49 +00:00
Denis Zobnin 15d1e64b2b [LAA] Rename "isStridedPtr" with "getPtrStride". NFC.
Changing misleading function name was approved in http://reviews.llvm.org/D17268.
Patch by Roman Shirokiy.

llvm-svn: 269021
2016-05-10 05:55:16 +00:00
Adam Nemet 0a77dfad95 [LV] Hint at the new loop distribution pragma in optimization remark
When we encounter unsafe memory dependencies, loop distribution could
help.

Even though, the diagnostics is in LAA, it's only currently emitted in
the vectorizer.

llvm-svn: 268987
2016-05-09 23:03:44 +00:00
Adam Nemet 724ab22378 [LAA] Fix confusing debug message
This message used to be correct, when all we cared about was whether the
dependence was safe (i.e. NoDep) or unsafe.  With the current more
precise characterization, this is a forward dep.

llvm-svn: 268695
2016-05-05 23:41:28 +00:00
David Majnemer b4b27230bf [ValueTracking, VectorUtils] Refactor getIntrinsicIDForCall
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic.  If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.

Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.

llvm-svn: 266801
2016-04-19 19:10:21 +00:00
Silviu Baranga b77365b595 [SCEV][LAA] Add tests for SCEV expression transformations performed during LAA
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.

Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.

The additional checking also acts as white-box testing for the getAsAddRec method.

Reviewers: anemet, sanjoy

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18792

llvm-svn: 266334
2016-04-14 16:08:45 +00:00
Silviu Baranga 6f444dfd55 Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerToIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.

Original commit message:

Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265786
2016-04-08 14:29:09 +00:00
Silviu Baranga a393baf1fd Revert r265535 until we know how we can fix the bots
llvm-svn: 265541
2016-04-06 14:06:32 +00:00
Silviu Baranga 72b4a4a330 [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265535
2016-04-06 13:18:26 +00:00
Adam Nemet 59a6550425 [LAA] Formatting fix in previous change
llvm-svn: 264244
2016-03-24 05:15:24 +00:00
Adam Nemet 279784ffc4 [LAA] Support memchecks involving loop-invariant addresses
We used to only allow SCEVAddRecExpr for pointer expressions in order to
be able to compute the bounds.  However this is also trivially possible
for loop-invariant addresses (scUnknown) since then the bounds are the
address itself.

Interestingly, we used allow this for the special case when the
loop-invariant address happens to also be an SCEVAddRecExpr (in an outer
loop).

There are a couple more loops that are vectorized in SPEC after this.
My guess is that the main reason we don't see more because for example a
loop-invariant load is vectorized into a splat vector with several
vector-inserts.  This is likely to make the vectorization unprofitable.
I.e. we don't notice that a later LICM will move all of this out of the
loop so the cost estimate should really be 0.

llvm-svn: 264243
2016-03-24 04:28:47 +00:00
Silviu Baranga d68ed85401 [SCEV] Change the SCEV Predicates interfaces for conversion to AddRecExpr to return SCEVAddRecExpr* instead of SCEV*
Summary:
This changes the conversion functions from SCEV * to SCEVAddRecExpr from
ScalarEvolution and PredicatedScalarEvolution to return a SCEVAddRecExpr*
instead of a SCEV* (which removes the need of most clients to do a
dyn_cast right after calling these functions).

We also don't add new predicates if the transformation was not successful.

This is not entirely a NFC (as it can theoretically remove some predicates
from LAA when we have an unknown dependece), but I couldn't find an obvious
regression test for it.

Reviewers: sanjoy

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18368

llvm-svn: 264161
2016-03-23 15:29:30 +00:00
Adam Nemet b8486e5a32 [LAA] Add missing debug output
llvm-svn: 262279
2016-03-01 00:50:08 +00:00
Richard Trieu 7a08381403 Remove uses of builtin comma operator.
Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.

llvm-svn: 261270
2016-02-18 22:09:30 +00:00
Silviu Baranga ea63a7f512 [SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory
sanitizer issue. The PredicatedScalarEvolution's copy constructor
wasn't copying the Generation value, and was leaving it un-initialized.

Original commit message:

[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection

Summary:
This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      (sext ({x,+,y}) -> {sext(x),+,sext(y)}
      (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Reviewers: mzolotukhin, sanjoy, anemet

Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

Differential Revision: http://reviews.llvm.org/D15412

llvm-svn: 260112
2016-02-08 17:02:45 +00:00
Silviu Baranga 41b4973329 Revert r260086 and r260085. They have broken the memory
sanitizer bots.

llvm-svn: 260087
2016-02-08 11:56:15 +00:00
Silviu Baranga a35fadc7c4 [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
Summary:
This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      (sext ({x,+,y}) -> {sext(x),+,sext(y)}
      (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Reviewers: mzolotukhin, sanjoy, anemet

Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

Differential Revision: http://reviews.llvm.org/D15412

llvm-svn: 260085
2016-02-08 10:45:50 +00:00
Haicheng Wu f1c00a22be [LIR] Add support for structs and hand unrolled loops
This is a recommit of r258620 which causes PR26293.

The original message:

Now LIR can turn following codes into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}

llvm-svn: 258777
2016-01-26 02:27:47 +00:00
Quentin Colombet a392810bea Speculatively revert r258620 as it is the likely culprid of PR26293.
llvm-svn: 258703
2016-01-25 19:12:49 +00:00
Haicheng Wu dd5e9d2159 [LIR] Add support for structs and hand unrolled loops
Now LIR can turn following codes into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}

llvm-svn: 258620
2016-01-23 06:52:41 +00:00
Adam Nemet d8968f0945 [LAA] Include function name in debug output
llvm-svn: 258088
2016-01-18 21:16:33 +00:00