Commit Graph

198 Commits

Author SHA1 Message Date
Adam Nemet a9f09c6245 [LAA] Enable symbolic stride speculation for all LAA clients
This is a functional change for LLE and LDist.  The other clients (LV,
LVerLICM) already had this explicitly enabled.

The temporary boolean parameter to LAA is removed that allowed turning
off speculation of symbolic strides.  This makes LAA's caching interface
LAA::getInfo only take the loop as the parameter.  This makes the
interface more friendly to the new Pass Manager.

The flag -enable-mem-access-versioning is moved from LV to a LAA which
now allows turning off speculation globally.

llvm-svn: 273064
2016-06-17 22:35:41 +00:00
Adam Nemet c953bb9953 [LV] Move management of symbolic strides to LAA. NFCI
This is still NFCI, so the list of clients that allow symbolic stride
speculation does not change (yes: LV and LoopVersioningLICM, no: LLE,
LDist).  However since the symbolic strides are now managed by LAA
rather than passed by client a new bool parameter is used to enable
symbolic stride speculation.

The existing test Transforms/LoopVectorize/version-mem-access.ll checks
that stride speculation is performed for LV.

The previously added test Transforms/LoopLoadElim/symbolic-stride.ll
ensures that no speculation is performed for LLE.

The next patch will change the functionality and turn on symbolic stride
speculation in all of LAA's clients and remove the bool parameter.

llvm-svn: 272970
2016-06-16 22:57:55 +00:00
Adam Nemet 139ffba398 [LAA] Rename Strides to SymblicStrides in analyzeLoop. NFC
This is to facilitate to move of SymblicStrides from LV to LAA.

llvm-svn: 272879
2016-06-16 08:27:03 +00:00
Adam Nemet bdbc5227ce [LAA] Default getInfo to not speculate symbolic strides. NFC
Soon we won't be passing Strides to getInfo and then we'll have fewer
call sites to update.

llvm-svn: 272878
2016-06-16 08:26:56 +00:00
Xinliang David Li ecde1c7f3d Revert r272194 No need for it if loop Analysis Manager is used
llvm-svn: 272243
2016-06-09 03:22:39 +00:00
Xinliang David Li 572135f717 [PM] Refector LoopAccessInfo analysis code
This is the preparation patch to port the analysis to new PM

Differential Revision: http://reviews.llvm.org/D20560

llvm-svn: 272194
2016-06-08 20:15:37 +00:00
Andrey Turetskiy 9f02c58670 [LAA] Improve non-wrapping pointer detection by handling loop-invariant case.
This fixes PR26314. This patch adds new helper “isNoWrap” with detection of
loop-invariant pointer case.

Patch by Roman Shirokiy.

Ref: https://llvm.org/bugs/show_bug.cgi?id=26314

Differential Revision: http://reviews.llvm.org/D17268

llvm-svn: 272014
2016-06-07 14:55:27 +00:00
Matthew Simpson e3e3b994ae [LAA] Use load and store vectors (NFC)
Contributed-by: Aditya Kumar <hiraditya@msn.com>
Differential Revision: http://reviews.llvm.org/D20953

llvm-svn: 271895
2016-06-06 14:15:41 +00:00
Matthew Simpson 6feebe9847 [LAA] Check independence of strided accesses before forward case
This patch changes the order in which we attempt to prove the independence of
strided accesses. We previously did this after we knew the dependence distance
was positive. With this change, we check for independence before handling the
negative distance case. The patch prevents LAA from reporting forward
dependences for independent strided accesses.

This change was requested in the review of D19984.

llvm-svn: 270072
2016-05-19 15:37:19 +00:00
Matthew Simpson 37ec5f914e [LAA] Rename forwarding conflict detection option (NFC)
This patch renames the option enabling the store-to-load forwarding conflict
detection optimization. This change was requested in the review of D20241.

llvm-svn: 269668
2016-05-16 17:00:56 +00:00
Adam Nemet 884d313b7f [LAA] Comment couldPreventStoreLoadForward. NFC
Also s/Cycles/Iters/ in NumCyclesForStoreLoadThroughMemory to make it
clear that this is not about clock cycles but loop cycles/iterations.

llvm-svn: 269667
2016-05-16 16:57:47 +00:00
Adam Nemet 9b5852aeb2 [LAA] clang-format the function couldPreventStoreLoadForward. NFC
llvm-svn: 269666
2016-05-16 16:57:42 +00:00
Matthew Simpson a250dc9f11 [LAA] Add option to disable conflict detection (NFC)
llvm-svn: 269654
2016-05-16 14:14:49 +00:00
Adam Nemet c62e554e9a [LAA] Include MaxSafeDepDistBytes in the analysis print-out
llvm-svn: 269508
2016-05-13 22:49:13 +00:00
Adam Nemet 4ad38b63d5 [LAA] Prepare the code to print more things in the summary. NFC
llvm-svn: 269507
2016-05-13 22:49:09 +00:00
Adam Nemet 2c34ab51a4 [LAA] Use std::min. NFC
llvm-svn: 269356
2016-05-12 21:41:53 +00:00
Silviu Baranga adf4b739ea [LAA] Use re-written SCEV expressions when computing distances
This removes a redundant stride versioning step (we already
do it in getPtrStride, so it has no effect) and uses PSE to
get the SCEV expressions for the source and destination
(this might have changed when getPtrStride was called).

I discovered this through code inspection, and couldn't
produce a regression test for it.

llvm-svn: 269052
2016-05-10 12:28:49 +00:00
Denis Zobnin 15d1e64b2b [LAA] Rename "isStridedPtr" with "getPtrStride". NFC.
Changing misleading function name was approved in http://reviews.llvm.org/D17268.
Patch by Roman Shirokiy.

llvm-svn: 269021
2016-05-10 05:55:16 +00:00
Adam Nemet 0a77dfad95 [LV] Hint at the new loop distribution pragma in optimization remark
When we encounter unsafe memory dependencies, loop distribution could
help.

Even though, the diagnostics is in LAA, it's only currently emitted in
the vectorizer.

llvm-svn: 268987
2016-05-09 23:03:44 +00:00
Adam Nemet 724ab22378 [LAA] Fix confusing debug message
This message used to be correct, when all we cared about was whether the
dependence was safe (i.e. NoDep) or unsafe.  With the current more
precise characterization, this is a forward dep.

llvm-svn: 268695
2016-05-05 23:41:28 +00:00
David Majnemer b4b27230bf [ValueTracking, VectorUtils] Refactor getIntrinsicIDForCall
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic.  If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.

Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.

llvm-svn: 266801
2016-04-19 19:10:21 +00:00
Silviu Baranga b77365b595 [SCEV][LAA] Add tests for SCEV expression transformations performed during LAA
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.

Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.

The additional checking also acts as white-box testing for the getAsAddRec method.

Reviewers: anemet, sanjoy

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18792

llvm-svn: 266334
2016-04-14 16:08:45 +00:00
Silviu Baranga 6f444dfd55 Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerToIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.

Original commit message:

Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265786
2016-04-08 14:29:09 +00:00
Silviu Baranga a393baf1fd Revert r265535 until we know how we can fix the bots
llvm-svn: 265541
2016-04-06 14:06:32 +00:00
Silviu Baranga 72b4a4a330 [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

llvm-svn: 265535
2016-04-06 13:18:26 +00:00
Adam Nemet 59a6550425 [LAA] Formatting fix in previous change
llvm-svn: 264244
2016-03-24 05:15:24 +00:00
Adam Nemet 279784ffc4 [LAA] Support memchecks involving loop-invariant addresses
We used to only allow SCEVAddRecExpr for pointer expressions in order to
be able to compute the bounds.  However this is also trivially possible
for loop-invariant addresses (scUnknown) since then the bounds are the
address itself.

Interestingly, we used allow this for the special case when the
loop-invariant address happens to also be an SCEVAddRecExpr (in an outer
loop).

There are a couple more loops that are vectorized in SPEC after this.
My guess is that the main reason we don't see more because for example a
loop-invariant load is vectorized into a splat vector with several
vector-inserts.  This is likely to make the vectorization unprofitable.
I.e. we don't notice that a later LICM will move all of this out of the
loop so the cost estimate should really be 0.

llvm-svn: 264243
2016-03-24 04:28:47 +00:00
Silviu Baranga d68ed85401 [SCEV] Change the SCEV Predicates interfaces for conversion to AddRecExpr to return SCEVAddRecExpr* instead of SCEV*
Summary:
This changes the conversion functions from SCEV * to SCEVAddRecExpr from
ScalarEvolution and PredicatedScalarEvolution to return a SCEVAddRecExpr*
instead of a SCEV* (which removes the need of most clients to do a
dyn_cast right after calling these functions).

We also don't add new predicates if the transformation was not successful.

This is not entirely a NFC (as it can theoretically remove some predicates
from LAA when we have an unknown dependece), but I couldn't find an obvious
regression test for it.

Reviewers: sanjoy

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18368

llvm-svn: 264161
2016-03-23 15:29:30 +00:00
Adam Nemet b8486e5a32 [LAA] Add missing debug output
llvm-svn: 262279
2016-03-01 00:50:08 +00:00
Richard Trieu 7a08381403 Remove uses of builtin comma operator.
Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.

llvm-svn: 261270
2016-02-18 22:09:30 +00:00
Silviu Baranga ea63a7f512 [SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory
sanitizer issue. The PredicatedScalarEvolution's copy constructor
wasn't copying the Generation value, and was leaving it un-initialized.

Original commit message:

[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection

Summary:
This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      (sext ({x,+,y}) -> {sext(x),+,sext(y)}
      (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Reviewers: mzolotukhin, sanjoy, anemet

Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

Differential Revision: http://reviews.llvm.org/D15412

llvm-svn: 260112
2016-02-08 17:02:45 +00:00
Silviu Baranga 41b4973329 Revert r260086 and r260085. They have broken the memory
sanitizer bots.

llvm-svn: 260087
2016-02-08 11:56:15 +00:00
Silviu Baranga a35fadc7c4 [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
Summary:
This change adds no wrap SCEV predicates with:
  - support for runtime checking
  - support for expression rewriting:
      (sext ({x,+,y}) -> {sext(x),+,sext(y)}
      (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Reviewers: mzolotukhin, sanjoy, anemet

Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel

Differential Revision: http://reviews.llvm.org/D15412

llvm-svn: 260085
2016-02-08 10:45:50 +00:00
Haicheng Wu f1c00a22be [LIR] Add support for structs and hand unrolled loops
This is a recommit of r258620 which causes PR26293.

The original message:

Now LIR can turn following codes into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}

llvm-svn: 258777
2016-01-26 02:27:47 +00:00
Quentin Colombet a392810bea Speculatively revert r258620 as it is the likely culprid of PR26293.
llvm-svn: 258703
2016-01-25 19:12:49 +00:00
Haicheng Wu dd5e9d2159 [LIR] Add support for structs and hand unrolled loops
Now LIR can turn following codes into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}

llvm-svn: 258620
2016-01-23 06:52:41 +00:00
Adam Nemet d8968f0945 [LAA] Include function name in debug output
llvm-svn: 258088
2016-01-18 21:16:33 +00:00
Kyle Butt a02ce98bd4 [Vectorization] Actually return from error case in isStridedPtr
The early return seems to be missed. This causes a radical and wrong loop
optimization on powerpc. It isn't reproducible on x86_64, because
"UseInterleaved" is false.

Patch by Tim Shen.

llvm-svn: 257134
2016-01-08 01:55:13 +00:00
Sanjoy Das 0de2feceb1 [SCEV] Add and use SCEVConstant::getAPInt; NFCI
llvm-svn: 255921
2015-12-17 20:28:46 +00:00
Silviu Baranga 9cd9a7e310 Re-commit r255115, with the PredicatedScalarEvolution class moved to
ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform
and Analysis modules:

[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions

Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

llvm-svn: 255122
2015-12-09 16:06:28 +00:00
Silviu Baranga ad1ccb357b Revert r255115 until we figure out how to fix the bot failures.
llvm-svn: 255117
2015-12-09 15:25:28 +00:00
Silviu Baranga 41eb682501 [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions
Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

llvm-svn: 255115
2015-12-09 15:03:52 +00:00
Sanjay Patel e4b9f507cf fix 'the the '; NFC
llvm-svn: 254928
2015-12-07 19:21:39 +00:00
Mehdi Amini afd135197b Fix LoopAccessAnalysis when potentially nullptr check are involved
Summary:
GetUnderlyingObjects() can return "null" among its list of objects,
we don't want to deduce that two pointers can point to the same
memory in this case, so filter it out.

Reviewers: anemet

Subscribers: dexonsmith, llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 252149
2015-11-05 05:49:43 +00:00
Adam Nemet 397f5829c7 [LAA] LLE 5/6: Add predicate functions Dependence::isForward/isBackward, NFC
Summary: Will be used by the LoopLoadElimination pass.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13258

llvm-svn: 252016
2015-11-03 23:50:03 +00:00
Adam Nemet a2df750fb3 [LAA] LLE 3/6: Rename InterestingDependence to Dependences, NFC
Summary:
We now collect all types of dependences including lexically forward
deps not just "interesting" ones.

Reviewers: hfinkel

Subscribers: rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13256

llvm-svn: 251985
2015-11-03 21:39:52 +00:00
Adam Nemet d7037c56d3 [LAA] LLE 2/6: Fix a NoDep case that should be a Forward dependence
Summary:
When the dependence distance in zero then we have a loop-independent
dependence from the earlier to the later access.

No current client of LAA uses forward dependences so other than
potentially hitting the MaxDependences threshold earlier, this change
shouldn't affect anything right now.

This and the previous patch were tested together for compile-time
regression.  None found in LNT/SPEC.

Reviewers: hfinkel

Subscribers: rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13255

llvm-svn: 251973
2015-11-03 20:13:43 +00:00
Adam Nemet b45516e875 [LAA] LLE 1/6: Expose Forward dependences
Summary:
Before this change, we didn't use to collect forward dependences since
none of the current clients (LV, LDist) required them.

The motivation to also collect forward dependences is a new pass
LoopLoadElimination (LLE) which discovers store-to-load forwarding
opportunities across the loop's backedge.  The pass uses both lexically
forward or backward loop-carried dependences to detect these
opportunities.

The new pass also analyzes loop-independent (forward) dependences since
they can conflict with the loop-carried dependences in terms of how the
data flows through memory.

The newly added test only covers loop-carried forward dependences
because loop-independent ones are currently categorized as NoDep.  The
next patch will fix this.

The two patches were tested together for compile-time regression.  None
found in LNT/SPEC.

Note that with this change LAA provides all dependences rather than just
"interesting" ones.  A subsequent NFC patch will remove the now trivial
isInterestingDependence and rename the APIs.

Reviewers: hfinkel

Subscribers: jmolloy, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13254

llvm-svn: 251972
2015-11-03 20:13:23 +00:00
Silviu Baranga e3c0534b11 [SCEV][LV] Add SCEV Predicates and use them to re-implement stride versioning
Summary:
SCEV Predicates represent conditions that typically cannot be derived from
static analysis, but can be used to reduce SCEV expressions to forms which are
usable for different optimizers.

ScalarEvolution now has the rewriteUsingPredicate method which can simplify a
SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using
SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions
need to be made a new SCEV Predicate would be created and added to the set.
Each time after calling getSCEV, the user will call the rewriteUsingPredicate
method.

We add two types of predicates
SCEVPredicateSet - implements a set of predicates
SCEVEqualPredicate - tests for equality between two SCEV expressions

We use the SCEVEqualPredicate to re-implement stride versioning. Every time we
version a stride, we will add a SCEVEqualPredicate to the context.
Instead of adding specific stride checks, LoopVectorize now adds a more
generic SCEV check.

We only need to add support for this in the LoopVectorizer since this is the
only pass that will do stride versioning.

Reviewers: mzolotukhin, anemet, hfinkel, sanjoy

Subscribers: sanjoy, hfinkel, rengolin, jmolloy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13595

llvm-svn: 251800
2015-11-02 14:41:02 +00:00
Benjamin Kramer 039b10423a Put global classes into the appropriate namespace.
Most of the cases belong into an anonymous namespace. No
functionality change intended.

llvm-svn: 251515
2015-10-28 13:54:36 +00:00
Duncan P. N. Exon Smith 5a82c916b0 Analysis: Remove implicit ilist iterator conversions
Remove implicit ilist iterator conversions from LLVMAnalysis.

I came across something really scary in `llvm::isKnownNotFullPoison()`
which relied on `Instruction::getNextNode()` being completely broken
(not surprising, but scary nevertheless).  This function is documented
(and coded to) return `nullptr` when it gets to the sentinel, but with
an `ilist_half_node` as a sentinel, the sentinel check looks into some
other memory and we don't recognize we've hit the end.

Rooting out these scary cases is the reason I'm removing the implicit
conversions before doing anything else with `ilist`; I'm not at all
surprised that clients rely on badness.

I found another scary case -- this time, not relying on badness, just
bad (but I guess getting lucky so far) -- in
`ObjectSizeOffsetEvaluator::compute_()`.  Here, we save out the
insertion point, do some things, and then restore it.  Previously, we
let the iterator auto-convert to `Instruction*`, and then set it back
using the `Instruction*` version:

    Instruction *PrevInsertPoint = Builder.GetInsertPoint();

    /* Logic that may change insert point */

    if (PrevInsertPoint)
      Builder.SetInsertPoint(PrevInsertPoint);

The check for `PrevInsertPoint` doesn't protect correctly against bad
accesses.  If the insertion point has been set to the end of a basic
block (i.e., `SetInsertPoint(SomeBB)`), then `GetInsertPoint()` returns
an iterator pointing at the list sentinel.  The version of
`SetInsertPoint()` that's getting called will then call
`PrevInsertPoint->getParent()`, which explodes horribly.  The only
reason this hasn't blown up is that it's fairly unlikely the builder is
adding to the end of the block; usually, we're adding instructions
somewhere before the terminator.

llvm-svn: 249925
2015-10-10 00:53:03 +00:00
Chandler Carruth 7b560d40bd [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

llvm-svn: 247167
2015-09-09 17:55:00 +00:00
Adam Nemet 4e533ef7a9 [LAA] Hold bounds via ValueHandles during SCEV expansion
SCEV expansion can invalidate previously expanded values.  For example
in SCEVExpander::ReuseOrCreateCast, if we already have the requested
cast value but it's not at the desired location, a new cast is inserted
and the old cast will be invalidated.

Therefore, when expanding the bounds for the pointers, a later entry can
invalidate the IR value for an earlier one.  The fix is to store a value
handle rather than the value itself.

The newly added test has a more detailed description of how the bug
triggers.

This bug can have a negative but potentially highly variable performance
impact in Loop Distribution.  Because one of the bound values was
invalidated and is an undef expression now, InstCombine is free to
transform the array overlap check:

   Start0 <= End1 && Start1 <= End0

into:

   Start0 <= End1

So depending on the runtime location of the arrays, we would detect a
conflict and fall back on the original loop of the versioned loop.

Also tested compile time with SPEC2006 LTO bc files.

llvm-svn: 245760
2015-08-21 23:19:57 +00:00
Adam Nemet cdb791cd33 [LAA] Comment how memchecks are codegened
llvm-svn: 245465
2015-08-19 17:24:36 +00:00
Chandler Carruth 2f1fd1658f [PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.

I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.

But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.

To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.

To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.

With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.

Differential Revision: http://reviews.llvm.org/D12063

llvm-svn: 245193
2015-08-17 02:08:17 +00:00
Adam Nemet 5b0a479541 [LAA] Change name from addRuntimeCheck to addRuntimeChecks, NFC
This was requested by Hal in D11205.

llvm-svn: 244540
2015-08-11 00:09:37 +00:00
Adam Nemet 651a5a2401 [LAA] Remove unused pointer partition argument from needsChecking(), NFC
This is no longer used in any of the callers.  Also remove the logic of
handling this argument.

llvm-svn: 244421
2015-08-09 20:06:08 +00:00
Adam Nemet 385308877c [LAA] Remove unused pointer partition argument from generateChecks, NFC
LoopDistribution does its own filtering now.

llvm-svn: 244420
2015-08-09 20:06:06 +00:00
Adam Nemet 155e8741f3 [LAA] Remove unused pointer partition argument from getNumberOfChecks, NFC
This is unused after filtering checks was moved to the clients.

As a result, we can just return the number of the checks in the
precomputed set.

llvm-svn: 244369
2015-08-07 22:44:21 +00:00
Adam Nemet 15840393f3 [LAA] Make the set of runtime checks part of the state of LAA, NFC
This is the full set of checks that clients can further filter. IOW,
it's client-agnostic.  This makes LAA complete in the sense that it now
provides the two main results of its analysis precomputed:

1. memory dependences via getDepChecker().getInsterestingDependences()
2. run-time checks via getRuntimePointerCheck().getChecks()

However, as a consequence we now compute this information pro-actively.
Thus if the client decides to skip the loop based on the dependences
we've computed the checks unnecessarily.  In order to see whether this
was a significant overhead I checked compile time on SPEC2k6 LTO bitcode
files.  The change was in the noise.

The checks are generated in canCheckPtrAtRT, at the same place where we
used to call groupChecks to merge checks.

llvm-svn: 244368
2015-08-07 22:44:15 +00:00
Adam Nemet 3a91e94734 [LAA] Remove unused pointer partition argument from print(), NFC
This is now handled in the client.  No need for LAA to provide this
variant.

llvm-svn: 244349
2015-08-07 19:44:48 +00:00
Adam Nemet 8701118792 [LAA] Remove unused pointer partition argument from addRuntimeCheck, NFC
This variant of addRuntimeCheck is only used now from the LoopVectorizer
which does not use this parameter.

llvm-svn: 243955
2015-08-04 05:16:20 +00:00
Adam Nemet 53e30aec46 [LAA] Remove unused needsAnyChecking(), NFC
llvm-svn: 243921
2015-08-03 23:33:03 +00:00
Craig Topper e3dcce9700 De-constify pointers to Type since they can't be modified. NFC
This was already done in most places a while ago. This just fixes the ones that crept in over time.

llvm-svn: 243842
2015-08-01 22:20:21 +00:00
Silviu Baranga 4825060059 [LAA] Add clarifying comments for the checking pointer grouping algorithm. NFC
llvm-svn: 243416
2015-07-28 13:44:08 +00:00
Adam Nemet 54f0b83ee2 [LAA] Split out a helper to print a collection of memchecks
This is effectively an NFC but we can no longer print the index of the
pointer group so instead I print its address.  This still lets us
cross-check the section that list the checks against the section that
list the groups (see how I modified the test).

E.g. before we printed this:

    Run-time memory checks:
    Check 0:
      Comparing group 0:
        %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
        %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
      Against group 1:
        %arrayidxA = getelementptr i16, i16* %a, i64 %ind
        %arrayidxA1 = getelementptr i16, i16* %a, i64 %add
    ...
    Grouped accesses:
      Group 0:
        (Low: %c High: (78 + %c))
          Member: {%c,+,4}<%for.body>
          Member: {(2 + %c),+,4}<%for.body>

Now we print this (changes are underlined):

    Run-time memory checks:
    Check 0:
      Comparing group (0x7f9c6040c320):
                       ~~~~~~~~~~~~~~
        %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
        %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
      Against group (0x7f9c6040c358):
                     ~~~~~~~~~~~~~~
        %arrayidxA1 = getelementptr i16, i16* %a, i64 %add
        %arrayidxA = getelementptr i16, i16* %a, i64 %ind
    ...
    Grouped accesses:
      Group 0x7f9c6040c320:
            ~~~~~~~~~~~~~~
        (Low: %c High: (78 + %c))
          Member: {(2 + %c),+,4}<%for.body>
          Member: {%c,+,4}<%for.body>

llvm-svn: 243354
2015-07-27 23:54:41 +00:00
Adam Nemet 7c52e0527d [LAA] Upper-case variable names, NFC
llvm-svn: 243313
2015-07-27 19:38:50 +00:00
Adam Nemet bbe1f1de16 [LAA] Split out a helper from addRuntimeCheck to generate the check, NFC
llvm-svn: 243312
2015-07-27 19:38:48 +00:00
NAKAMURA Takumi 94abbbd6ab LoopAccessAnalysis.cpp: Tweak r243239 to avoid side effects. It caused different emissions between gcc and clang.
llvm-svn: 243258
2015-07-27 01:35:30 +00:00
Adam Nemet 1da7df3700 [LAA] Begin moving the logic of generating checks out of addRuntimeCheck
Summary:
The goal is to start moving us closer to the model where
RuntimePointerChecking will compute and store the checks.  Then a client
can filter the check according to its requirements and then use the
filtered list of checks with addRuntimeCheck.

Before the patch, this is all done in addRuntimeCheck.  So the patch
starts to split up addRuntimeCheck while providing the old API under
what's more or less a wrapper now.

The new underlying addRuntimeCheck takes a collection of checks now,
expands the code for the bounds then generates the code for the checks.

I am not completely happy with making expandBounds static because now it
needs so many explicit arguments but I don't want to make the type
PointerBounds part of LAI.  This should get fixed when addRuntimeCheck
is moved to LoopVersioning where it really belongs, IMO.

Audited the assembly diff of the testsuite (including externals).  There
is a tiny bit of assembly churn that is due to the different order the
code for the bounds is expanded now
(MultiSource/Benchmarks/Prolangs-C/bison/conflicts.s and with LoopDist
on 456.hmmer/fast_algorithms.s).

Reviewers: hfinkel

Subscribers: klimek, llvm-commits

Differential Revision: http://reviews.llvm.org/D11205

llvm-svn: 243239
2015-07-26 05:32:14 +00:00
Silviu Baranga 0e5804a6af Fix memcheck interval ends for pointers with negative strides
Summary:
The checking pointer grouping algorithm assumes that the
starts/ends of the pointers are well formed (start <= end).

The runtime memory checking algorithm also assumes this by doing:

 start0 < end1 && start1 < end0

to detect conflicts. This check only works if start0 <= end0 and
start1 <= end1.

This change correctly orders the interval ends by either checking
the stride (if it is constant) or by using min/max SCEV expressions.

Reviewers: anemet, rengolin

Subscribers: rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11149

llvm-svn: 242400
2015-07-16 14:02:58 +00:00
Adam Nemet 041e6deb2c [LAA] Split out a helper to check the pointer partitions, NFC
This is made a static public member function to allow the transition of
this logic from LAA to LoopDistribution.  (Technically, it could be an
implementation-local static function but then it would not be accessible
from LoopDistribution.)

llvm-svn: 242376
2015-07-16 02:48:05 +00:00
Adam Nemet 9f7dedc376 [LAA] Introduce RuntimePointerChecking::PointerInfo, NFC
Turn this structure-of-arrays (i.e. the various pointer attributes) into
array-of-structures.

llvm-svn: 242219
2015-07-14 22:32:50 +00:00
Adam Nemet 7cdebac0c8 [LAA] Lift RuntimePointerCheck out of LoopAccessInfo, NFC
I am planning to add more nested classes inside RuntimePointerCheck so
all these triple-nesting would be hard to follow.

Also rename it to RuntimePointerChecking (i.e. append 'ing').

llvm-svn: 242218
2015-07-14 22:32:44 +00:00
Silviu Baranga a647c30f88 Cleanup after r241809 - remove uncessary call to std::sort
Summary:
The iteration order within a member of DepCands is deterministic
and therefore we don't have to sort the accesses within a member.
We also don't have to copy the indices of the pointers into a
vector, since we can iterate over the members of the class.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11145

llvm-svn: 242033
2015-07-13 14:48:24 +00:00
Adam Nemet 0f67c6c1d5 [LAA] Fix grammar in debug output
llvm-svn: 241867
2015-07-09 22:17:41 +00:00
Adam Nemet ee61474a61 [LAA] Hide NeedRTCheck logic completely inside canCheckPtrAtRT, NFC
Currently canCheckPtrAtRT returns two flags NeedRTCheck and CanDoRT.
NeedRTCheck says whether we need checks and CanDoRT whether we can
generate the checks.  The idea is to encode three states with these:

     Need/Can:
(1) false/dont-care: no checks are needed
(2) true/false: we need checks but can't generate them
(3) true/true: we need checks and we can generate them

This is pretty unnecessary since the caller (analyzeLoop) is only
interested in whether we can generate the checks if we actually need
them (i.e. 1 or 3).

So this change cleans up to return just that (CanDoRTIfNeeded) and pulls
all the underlying logic into canCheckPtrAtRT.

By doing all this, we simplify analyzeLoop which is the complex function
in LAA.

There is further room for improvement here by using RtCheck.Need
directly rather than a new local variable NeedRTCheck but that's for a
later patch.

llvm-svn: 241866
2015-07-09 22:17:38 +00:00
Silviu Baranga ce3877fc8c Don't rely on the DepCands iteration order when constructing checking pointer groups
Summary:
The checking pointer group construction algorithm relied on the iteration on DepCands.
We would need the same leaders across runs and the same iteration order over the underlying std::set for determinism.

This changes the algorithm to process the pointers in the order in which they were added to the runtime check, which is deterministic.
We need to update the tests, since the order in which pointers appear has changed.

No new tests were added, since it is impossible to test for non-determinism.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11064

llvm-svn: 241809
2015-07-09 15:18:25 +00:00
Adam Nemet b41d2d3fa3 [LAA] Fix line break in comment
llvm-svn: 241785
2015-07-09 06:47:21 +00:00
Adam Nemet 5dc3b2cf53 [LAA] Rename IsRTNeeded to IsRTCheckAnalysisNeeded
The original name was too close to NeedRTCheck which is what the actual
memcheck analysis returns.  This flag, as the new name suggests, is only
used to whether to initiate that analysis.

Also a comment is added to answer one question I had about this code for
a long time.  Namely, how does this flag differ from
isDependencyCheckNeeded since they are seemingly set at the same time.

llvm-svn: 241784
2015-07-09 06:47:18 +00:00
Adam Nemet 943befedf1 [LAA] Fix misleading use of word 'consecutive'
Fix some places where the word consecutive is used but the code really
means constant-stride (i.e. not just unit stride).

llvm-svn: 241763
2015-07-09 00:03:22 +00:00
Adam Nemet 424edc6c80 [LAA] Revert a small part of r239295
This commit ([LAA] Fix estimation of number of memchecks) regressed the
logic a bit.  We shouldn't quit the analysis if we encounter a pointer
without known bounds *unless* we actually need to emit a memcheck for
it.

The original code was using NumComparisons which is now computed
differently.  Instead I compute NeedRTCheck from NumReadPtrChecks and
NumWritePtrChecks.

As side note, I find the separation of NeedRTCheck and CanDoRT
confusing, so I will try to merge them in a follow-up patch.

llvm-svn: 241756
2015-07-08 22:58:48 +00:00
Adam Nemet 0131a5693a [LAA] Add missing debug output after r239285
r239285 ([LoopAccessAnalysis] Teach LAA to check the memory dependence
between strided accesses.) introduced a new case under
MemoryDepChecker::isDependent.  We normally have debug output for each
case.

llvm-svn: 241707
2015-07-08 18:47:38 +00:00
Silviu Baranga 1b6b50a921 [LAA] Merge memchecks for accesses separated by a constant offset
Summary:
Often filter-like loops will do memory accesses that are
separated by constant offsets. In these cases it is
common that we will exceed the threshold for the
allowable number of checks.

However, it should be possible to merge such checks,
sice a check of any interval againt two other intervals separated
by a constant offset (a,b), (a+c, b+c) will be equivalent with
a check againt (a, b+c), as long as (a,b) and (a+c, b+c) overlap.
Assuming the loop will be executed for a sufficient number of
iterations, this will be true. If not true, checking against
(a, b+c) is still safe (although not equivalent).

As long as there are no dependencies between two accesses,
we can merge their checks into a single one. We use this
technique to construct groups of accesses, and then check
the intervals associated with the groups instead of
checking the accesses directly.

Reviewers: anemet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10386

llvm-svn: 241673
2015-07-08 09:16:33 +00:00
David Blaikie b447ac6435 Move VectorUtils from Transforms to Analysis to correct layering violation
llvm-svn: 240804
2015-06-26 18:02:52 +00:00
Adam Nemet c4866d29dd [LAA] Try to prove non-wrapping of pointers if SCEV cannot
Summary:
Scalar evolution does not propagate the non-wrapping flags to values
that are derived from a non-wrapping induction variable because
the non-wrapping property could be flow-sensitive.

This change is a first attempt to establish the non-wrapping property in
some simple cases.  The main idea is to look through the operations
defining the pointer.  As long as we arrive to a non-wrapping AddRec via
a small chain of non-wrapping instruction, the pointer should not wrap
either.

I believe that this essentially is what Andy described in
http://article.gmane.org/gmane.comp.compilers.llvm.cvs/220731 as the way
forward.

Reviewers: aschwaighofer, nadav, sanjoy, atrick

Reviewed By: atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10472

llvm-svn: 240798
2015-06-26 17:25:43 +00:00
Chandler Carruth ecbd16829a [PM/AA] Remove the UnknownSize static member from AliasAnalysis.
This is now living in MemoryLocation, which is what it pertains to. It
is also an enum there rather than a static data member which is left
never defined.

llvm-svn: 239886
2015-06-17 07:21:38 +00:00
Chandler Carruth ac80dc7532 [PM/AA] Remove the Location typedef from the AliasAnalysis class now
that it is its own entity in the form of MemoryLocation, and update all
the callers.

This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.

llvm-svn: 239885
2015-06-17 07:18:54 +00:00
Silviu Baranga 98a137196a [LAA] Fix estimation of number of memchecks
Summary:
We need to add a runtime memcheck for pair of accesses (x,y) where at least one of x and y
are writes.
 
Assuming we have w writes and r reads, currently this number is  estimated as being
w* (w+r-1). This estimation will count (write,write) pairs twice and will overestimate
the number of checks required.

This change adds a getNumberOfChecks method to RuntimePointerCheck, which
will count the number of runtime checks needed (similar in implementation to
needsAnyChecking) and uses it to produce the correct number of runtime checks.

Test Plan:
llvm test suite
spec2k
spec2k6

Performance results: no changes observed (not surprising since the formula for 1 writer is basically the same, which would covers most cases - at least with the current check limit).

Reviewers: anemet

Reviewed By: anemet

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D10217

llvm-svn: 239295
2015-06-08 10:27:06 +00:00
Hao Liu 32c0539691 [LoopVectorize] Teach Loop Vectorizor about interleaved memory accesses.
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
       a = A[i];         // load of even element
       b = A[i+1];       // load of odd element
       ...               // operations on a, b, c, d
       A[i] = c;         // store of even element
       A[i+1] = d;       // store of odd element
     }

  The loads of even and odd elements are identified as an interleave load group, which will be transfered into vectorized IRs like:
     %wide.vec = load <8 x i32>, <8 x i32>* %ptr
     %vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
     %vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>

  The stores of even and odd elements are identified as an interleave store group, which will be transfered into vectorized IRs like:
     %interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7> 
     store <8 x i32> %interleaved.vec, <8 x i32>* %ptr

This optimization is currently disabled by defaut. To try it by adding '-enable-interleaved-mem-accesses=true'. 

llvm-svn: 239291
2015-06-08 06:39:56 +00:00
Hao Liu 751004a67d [LoopAccessAnalysis] Teach LAA to check the memory dependence between strided accesses.
Differential Revision: http://reviews.llvm.org/D9368

llvm-svn: 239285
2015-06-08 04:48:37 +00:00
Chandler Carruth 70c61c1a8a [PM/AA] Start refactoring AliasAnalysis to remove the analysis group and
port it to the new pass manager.

All this does is extract the inner "location" class used by AA into its
own full fledged type. This seems *much* cleaner as MemoryDependence and
soon MemorySSA also use this heavily, and it doesn't make much sense
being inside the AA infrastructure.

This will also make it much easier to break apart the AA infrastructure
into something that stands on its own rather than using the analysis
group design.

There are a few places where this makes APIs not make sense -- they were
taking an AliasAnalysis pointer just to build locations. I'll try to
clean those up in follow-up commits.

Differential Revision: http://reviews.llvm.org/D10228

llvm-svn: 239003
2015-06-04 02:03:15 +00:00
Adam Nemet df3dc5b9ca [LoopAccesses] If shouldRetryWithRuntimeCheck, reset InterestingDependences
When dependence analysis encounters a non-constant distance between
memory accesses it aborts the analysis and falls back to run-time checks
only.  In this case we weren't resetting the array of dependences.

llvm-svn: 237574
2015-05-18 15:37:03 +00:00
Adam Nemet c3384320f2 [LoopAccesses] Rearrange printed lines in -analyze
"Store to invariant address..." is moved as the last line.  This is not
the prime result of the analysis.  Plus it simplifies some of the tests.

llvm-svn: 237573
2015-05-18 15:36:57 +00:00
Adam Nemet f10ca27884 [LoopAccesses] Debug improvement
Report pointers with unknown bounds.

llvm-svn: 237572
2015-05-18 15:36:52 +00:00
Adam Nemet e2b885c4bc [getUnderlyingOjbects] Analyze loop PHIs further to remove false positives
Specifically, if a pointer accesses different underlying objects in each
iteration, don't look through the phi node defining the pointer.

The motivating case is the underlyling-objects-2.ll testcase.  Consider
the loop nest:

  int **A;
  for (i)
    for (j)
       A[i][j] = A[i-1][j] * B[j]

This loop is transformed by Load-PRE to stash away A[i] for the next
iteration of the outer loop:

  Curr = A[0];          // Prev_0
  for (i: 1..N) {
    Prev = Curr;        // Prev = PHI (Prev_0, Curr)
    Curr = A[i];
    for (j: 0..N)
       Curr[j] = Prev[j] * B[j]
  }

Since A[i] and A[i-1] are likely to be independent pointers,
getUnderlyingObjects should not assume that Curr and Prev share the same
underlying object in the inner loop.

If it did we would try to dependence-analyze Curr and Prev and the
analysis of the corresponding SCEVs would fail with non-constant
distance.

To fix this, the getUnderlyingObjects API is extended with an optional
LoopInfo parameter.  This is effectively what controls whether we want
the above behavior or the original.  Currently, I only changed to use
this approach for LoopAccessAnalysis.

The other testcase is to guard the opposite case where we do want to
look through the loop PHI.  If we step through an array by incrementing
a pointer, the underlying object is the incoming value of the phi as the
loop is entered.

Fixes rdar://problem/19566729

llvm-svn: 235634
2015-04-23 20:09:20 +00:00
Adam Nemet 8dcb3b6a59 [LoopAccesses] Improve debug output
llvm-svn: 235238
2015-04-17 22:43:10 +00:00
Adam Nemet 26da8e9800 [LoopAccesses] Properly print whether memchecks are needed
Fix oversight in -analyze output.  PtrRtCheck contains the pointers that
need to be checked against each other and not whether memchecks are
necessary.

For instance in the testcase PtrRtCheck has four elements but all
no-alias so no checking is necessary.

llvm-svn: 234833
2015-04-14 01:12:55 +00:00
Adam Nemet ce48250f11 [LoopAccesses] Allow analysis to complete in the presence of uniform stores
(Re-apply r234361 with a fix and a testcase for PR23157)

Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.

Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).

In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.

When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).

The changes also adds support to query this property of the loop and
modify the vectorizer to use this.

Patch by Ashutosh Nema!

llvm-svn: 234424
2015-04-08 17:48:40 +00:00
Adam Nemet e09a928c80 Revert "[LoopAccesses] Allow analysis to complete in the presence of uniform stores"
This reverts commit r234361.

It caused PR23157.

llvm-svn: 234387
2015-04-08 04:16:55 +00:00