Commit Graph

4929 Commits

Author SHA1 Message Date
Sebastian Pop 20daf3276d implement missing SCEVDivision case
without this case we would end on an infinite recursion: the remainder is zero,
so Numerator - Remainder is equal to Numerator and so we would recursively ask
for the division of Numerator by Denominator.

llvm-svn: 209838
2014-05-29 19:44:09 +00:00
Sebastian Pop 5352408169 fail to find dimensions when ElementSize is nullptr
when ScalarEvolution::getElementSize returns nullptr it is safe to early return
in ScalarEvolution::findArrayDimensions such that we avoid later problems when
we try to divide the terms by ElementSize.

llvm-svn: 209837
2014-05-29 19:44:05 +00:00
Sanjay Patel 26b6edcf44 test check-in: added missing parenthesis in comment
llvm-svn: 209763
2014-05-28 19:03:33 +00:00
Sebastian Pop f93ef12330 avoid type mismatch when building SCEVs
This is a corner case I have stumbled upon when dealing with ARM64 type
conversions. I was not able to extract a testcase for the community codebase to
fail on. The patch conservatively discards a division that would have ended up
in an ICE due to a type mismatch when building a multiply expression. I have
also added code to a place that builds add expressions and in which we should be
careful not to pass in operands of different types.

llvm-svn: 209694
2014-05-27 22:42:00 +00:00
Sebastian Pop e30bd351cc do not use the GCD to compute the delinearization strides
We do not need to compute the GCD anymore after we removed the constant
coefficients from the terms: the terms are now all parametric expressions and
there is no need to recognize constant terms that divide only a subset of the
terms. We only rely on the size of the terms, i.e., the number of operands in
the multiply expressions, to sort the terms and recognize the parametric
dimensions.

llvm-svn: 209693
2014-05-27 22:41:56 +00:00
Sebastian Pop 28e6b97b5d remove BasePointer before delinearizing
No functional change is intended: instead of relying on the delinearization to
come up with the base pointer as a remainder of the divisions in the
delinearization, we just compute it from the array access and use that value.
We substract the base pointer from the SCEV to be delinearized and that
simplifies the work of the delinearizer.

llvm-svn: 209692
2014-05-27 22:41:51 +00:00
Sebastian Pop a6e5860513 remove constant terms
The delinearization is needed only to remove the non linearity induced by
expressions involving multiplications of parameters and induction variables.
There is no problem in dealing with constant times parameters, or constant times
an induction variable.

For this reason, the current patch discards all constant terms and multipliers
before running the delinearization algorithm on the terms. The only thing
remaining in the term expressions are parameters and multiply expressions of
parameters: these simplified term expressions are passed to the array shape
recognizer that will not recognize constant dimensions anymore: these will be
recognized as different strides in parametric subscripts.

The only important special case of a constant dimension is the size of elements.
Instead of relying on the delinearization to infer the size of an element,
compute the element size from the base address type. This is a much more precise
way of computing the element size than before, as we would have mixed together
the size of an element with the strides of the innermost dimension.

llvm-svn: 209691
2014-05-27 22:41:45 +00:00
Michael Zolotukhin 265dfa411c Some cleanup for r209568.
llvm-svn: 209634
2014-05-26 14:49:46 +00:00
Michael Zolotukhin d4c724625a Implement sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar
Evolution.

That helps SLP-vectorizer to recognize consecutive loads/stores.

<rdar://problem/14860614>

llvm-svn: 209568
2014-05-24 08:09:57 +00:00
Andrew Trick 839e30b2c0 Fix and improve SCEV ComputeBackedgeTankCount.
This is a follow-up to r209358: PR19799: Indvars miscompile due to an
incorrect max backedge taken count from SCEV.

That fix was incomplete as pointed out by Arnold and Michael Z. The
code was also too confusing. It needed a careful rewrite with more
unit tests. This version will also happen to optimize more cases.

<rdar://17005101> PR19799: Indvars miscompile...

llvm-svn: 209545
2014-05-23 19:47:13 +00:00
Justin Bogner cbb8438bb3 ScalarEvolution: Fix handling of AddRecs in isKnownPredicate
ScalarEvolution::isKnownPredicate() can wrongly reduce a comparison
when both the LHS and RHS are SCEVAddRecExprs. This checks that both
LHS and RHS are guarded in the case when both are SCEVAddRecExprs.

The test case is against indvars because I could not find a way to
directly test SCEV.

Patch by Sanjay Patel!

llvm-svn: 209487
2014-05-23 00:06:56 +00:00
Andrew Trick e255359b57 Fix a bug in SCEV's backedge taken count computation from my prior fix in Jan.
This has to do with the trip count computation for loops with multiple
exits, which is quite subtle. Most passes just ask for a single trip
count number, so we must be conservative assuming any exit could be
taken.  Normally, we rely on the "exact" trip count, which was
correctly given as "unknown". However, SCEV also gives a "max"
back-edge taken count. The loops max BE taken count is conservatively
a maximum over the max of each exit's non-exiting iterations
count. Note that some exit tests can be skipped so the max loop
back-edge taken count can actually exceed the max non-exiting
iterations for some exits. However, when we know the loop *latch*
cannot be skipped, we can directly use its max taken count
disregarding other exits. I previously took the minimum here without
checking whether the other exit could be skipped. The correct, and
simpler thing to do here is just to directly use the loop latch's max
non-exiting iterations as the loops max back-edge count.

In the problematic test case, the first loop exit had a max of zero
non-exiting iterations, but could be skipped. The loop latch was known
not to be skipped but had max of one non-exiting iteration. We
incorrectly claimed the loop back-edge could be taken zero times, when
it is actually taken one time.

Fixes Loop %for.body.i: <multiple exits> Unpredictable backedge-taken count.
Loop %for.body.i: max backedge-taken count is 1.

llvm-svn: 209358
2014-05-22 00:37:03 +00:00
Eric Christopher 650c8f2a06 Clean up language and grammar.
Based on a patch by jfcaron3@gmail.com!
PR19806

llvm-svn: 209216
2014-05-20 17:11:11 +00:00
Nick Lewycky ec373545b8 Teach isKnownNonNull that a nonnull return is not null. Add a test for this case as well as the case of a nonnull attribute (already handled but not tested).
llvm-svn: 209193
2014-05-20 05:13:21 +00:00
Nick Lewycky d52b1528c0 Add 'nonnull', a new parameter and return attribute which indicates that the pointer is not null. Instcombine will elide comparisons between these and null. Patch by Luqman Aden!
llvm-svn: 209185
2014-05-20 01:23:40 +00:00
Peter Collingbourne 68a889757d Check the alwaysinline attribute on the call as well as on the caller.
Differential Revision: http://reviews.llvm.org/D3815

llvm-svn: 209150
2014-05-19 18:25:54 +00:00
David Majnemer 78910fc4da InstSimplify: Improve handling of ashr/lshr
Summary:
Analyze the range of values produced by ashr/lshr cst, %V when it is
being used in an icmp.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3774

llvm-svn: 209000
2014-05-16 17:14:03 +00:00
David Majnemer ea8d5dbf24 InstSimplify: Optimize using dividend in sdiv
Summary:
The dividend in an sdiv tells us the largest and smallest possible
results.  Use this fact to optimize comparisons against an sdiv with a
constant dividend.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3795

llvm-svn: 208999
2014-05-16 16:57:04 +00:00
Juergen Ributzka 34390c70a5 Add C API for thread yielding callback.
Sometimes a LLVM compilation may take more time then a client would like to
wait for. The problem is that it is not possible to safely suspend the LLVM
thread from the outside. When the timing is bad it might be possible that the
LLVM thread holds a global mutex and this would block any progress in any other
thread.

This commit adds a new yield callback function that can be registered with a
context. LLVM will try to yield by calling this callback function, but there is
no guaranteed frequency. LLVM will only do so if it can guarantee that
suspending the thread won't block any forward progress in other LLVM contexts
in the same process.

Once the client receives the call back it can suspend the thread safely and
resume it at another time.

Related to <rdar://problem/16728690>

llvm-svn: 208945
2014-05-16 02:33:15 +00:00
Jay Foad 5a29c367f7 Instead of littering asserts throughout the code after every call to
computeKnownBits, consolidate them into one assert at the end of
computeKnownBits itself.

llvm-svn: 208876
2014-05-15 12:12:55 +00:00
Chandler Carruth a0e5695ad9 Teach the constant folder to look through bitcast constant expressions
much more effectively when trying to constant fold a load of a constant.
Previously, we only handled bitcasts by trying to find a totally generic
byte representation of the constant and use that. Now, we look through
the bitcast to see what constant we might fold the load into, and then
try to form a constant expression cast of the found value that would be
equivalent to loading the value.

You might wonder why on earth this actually matters. Well, turns out
that the Itanium ABI causes us to create a single array for a vtable
where the first elements are virtual base offsets, followed by the
virtual function pointers. Because the array is homogenous the element
type is consistently i8* and we inttoptr the virtual base offsets into
the initial elements.

Then constructors bitcast these pointers to i64 pointers prior to
loading them. Boom, no more constant folding of virtual base offsets.
This is the first fix to LLVM to address the *insane* performance Eric
Niebler discovered with Clang on his range comprehensions[1]. There is
more to come though, this doesn't *really* fix the problem fully.

[1]: http://ericniebler.com/2014/04/27/range-comprehensions/

llvm-svn: 208856
2014-05-15 09:56:28 +00:00
Alp Toker beaca19c7c Fix typos
llvm-svn: 208839
2014-05-15 01:52:21 +00:00
Jay Foad a0653a3e6c Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
David Majnemer 2d6c023576 InstSimplify: Optimize signed icmp of -(zext V)
Summary:
We know that -(zext V) will always be <= zero, simplify signed icmps
that have these.

Uncovered using http://www.cs.utah.edu/~regehr/souper/

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3754

llvm-svn: 208809
2014-05-14 20:16:28 +00:00
Jay Foad e48d9e8efe Update the comments for ComputeMaskedBits, which lost its Mask parameter
in r154011.

llvm-svn: 208757
2014-05-14 08:00:07 +00:00
Sebastian Pop 05719e486f use nullptr instead of NULL
llvm-svn: 208622
2014-05-12 20:11:01 +00:00
Sebastian Pop b1a548f72d do not assert when delinearization fails
llvm-svn: 208615
2014-05-12 19:01:53 +00:00
Sebastian Pop 0e75c5cb64 use isZero()
llvm-svn: 208614
2014-05-12 19:01:49 +00:00
Benjamin Kramer 8cff45aa20 SCEV: Use range-based for loop and fold variable into assert.
llvm-svn: 208476
2014-05-10 17:47:18 +00:00
Sebastian Pop 47fe7de1b5 move findArrayDimensions to ScalarEvolution
we do not use the information from SCEVAddRecExpr to compute the shape of the array,
so a better place for this function is in ScalarEvolution.

llvm-svn: 208456
2014-05-09 22:45:07 +00:00
Sebastian Pop 444621abe1 fix typo in debug message
llvm-svn: 208455
2014-05-09 22:45:02 +00:00
Tobias Grosser 1e9db7e639 Correct formatting.
Sorry for the commit spam. My clang-format crashed on me and the vim
plugin did not print an error, but instead just left the formatting
untouched.

llvm-svn: 208358
2014-05-08 21:43:19 +00:00
Tobias Grosser b2101c3ca5 Use std::remove_if to remove elements from a vector
Suggested-by: Benjamin Kramer <benny.kra@gmail.com>
llvm-svn: 208357
2014-05-08 21:32:59 +00:00
Rafael Espindola 0c4eea74d7 Use a range loop.
llvm-svn: 208343
2014-05-08 17:57:50 +00:00
Tobias Grosser 3080cf16a5 Revert "SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time"
as committed in r208282. The original commit was incorrect.

llvm-svn: 208286
2014-05-08 07:55:34 +00:00
Tobias Grosser ecfe9d06eb SCEV: Use I = vector<>.erase(I) to iterate and delete at the same time
llvm-svn: 208282
2014-05-08 07:12:44 +00:00
Sebastian Pop b8d56f42b7 avoid segfaulting
*Quotient and *Remainder don't have to be initialized.

llvm-svn: 208238
2014-05-07 19:00:37 +00:00
Sebastian Pop a7d3d6ab9f do not collect undef terms
llvm-svn: 208237
2014-05-07 19:00:32 +00:00
Sebastian Pop 448712b1a6 split delinearization pass in 3 steps
To compute the dimensions of the array in a unique way, we split the
delinearization analysis in three steps:

- find parametric terms in all memory access functions
- compute the array dimensions from the set of terms
- compute the delinearized access functions for each dimension

The first step is executed on all the memory access functions such that we
gather all the patterns in which an array is accessed. The second step reduces
all this information in a unique description of the sizes of the array. The
third step is delinearizing each memory access function following the common
description of the shape of the array computed in step 2.

This rewrite of the delinearization pass also solves a problem we had with the
previous implementation: because the previous algorithm was by induction on the
structure of the SCEV, it would not correctly recognize the shape of the array
when the memory access was not following the nesting of the loops: for example,
see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll

; void foo(long n, long m, long o, double A[n][m][o]) {
;
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       for (long k = 0; k < o; k++)
;         A[i][k][j] = 1.0;

Starting with this patch we no longer delinearize access functions that do not
contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll

;;  for (long int i = 0; i < 100; i++)
;;    for (long int j = 0; j < 100; j++) {
;;      A[2*i - 4*j] = i;
;;      *B++ = A[6*i + 8*j];

these accesses will not be delinearized as the upper bound of the loops are
constants, and their access functions do not contain SCEVUnknown parameters.

llvm-svn: 208232
2014-05-07 18:01:20 +00:00
Tobias Grosser 924221cb37 [C++11] Add NArySCEV->Operands iterator range
llvm-svn: 208158
2014-05-07 06:07:47 +00:00
Duncan P. N. Exon Smith 87c40fdfdb blockfreq: Move include to .cpp
llvm-svn: 208035
2014-05-06 01:57:42 +00:00
Chandler Carruth 312dddfb81 [LCG] Add the last (and most complex) of the edge insertion mutation
operations on the call graph. This one forms a cycle, and while not as
complex as removing an internal edge from an SCC, it involves
a reasonable amount of work to find all of the nodes newly connected in
a cycle.

Also somewhat alarming is the worst case complexity here: it might have
to walk roughly the entire SCC inverse DAG to insert a single edge. This
is carefully documented in the API (I hope).

llvm-svn: 207935
2014-05-04 09:38:32 +00:00
Juergen Ributzka d35c114d15 [TBAA] Fix handling of mixed TBAA (path-aware and non-path-aware TBAA).
This fix simply ensures that both metadata nodes are path-aware before
performing path-aware alias analysis.

This issue isn't normally triggered in LLVM, because we perform an autoupgrade
of the TBAA metadata to the new format when reading in LL or BC files. This
issue only appears when a client creates the IR manually and mixes old and new
TBAA metadata format.

This fixes <rdar://problem/16760860>.

llvm-svn: 207923
2014-05-03 22:32:52 +00:00
Chandler Carruth 7cc4ed8202 [LCG] Add the other simple edge insertion API to the call graph. This
just connects an SCC to one of its descendants directly. Not much of an
impact. The last one is the hard one -- connecting an SCC to one of its
ancestors, and thereby forming a cycle such that we have to merge all
the SCCs participating in the cycle.

llvm-svn: 207751
2014-05-01 12:18:20 +00:00
Chandler Carruth 034d0d6805 [LCG] Don't lookup the child SCC twice. Spotted this by inspection, and
no functionality changed.

llvm-svn: 207750
2014-05-01 12:16:31 +00:00
Chandler Carruth 4b096741b4 [LCG] Add some basic methods for querying the parent/child relationships
of SCCs in the SCC DAG. Exercise them in the big graph test case. These
will be especially useful for establishing invariants in insertion
logic.

llvm-svn: 207749
2014-05-01 12:12:42 +00:00
Chandler Carruth 5217c94522 [LCG] Add the really, *really* boring edge insertion case: adding an
edge entirely within an existing SCC. Shockingly, making the connected
component more connected is ... a total snooze fest. =]

Anyways, its wired up, and I even added a test case to make sure it
pretty much sorta works. =D

llvm-svn: 207631
2014-04-30 10:48:36 +00:00
Chandler Carruth c5026b670e [LCG] Actually test the *basic* edge removal bits (IE, the non-SCC
bits), and discover that it's totally broken. Yay tests. Boo bug. Fix
the basic edge removal so that it works by nulling out the removed edges
rather than actually removing them. This leaves the indices valid in the
map from callee to index, and preserves some of the locality for
iterating over edges. The iterator is made bidirectional to reflect that
it now has to skip over null entries, and the skipping logic is layered
onto it.

As future work, I would like to track essentially the "load factor" of
the edge list, and when it falls below a threshold do a compaction.

An alternative I considered (and continue to consider) is storing the
callees in a doubly linked list where each element of the list is in
a set (which is essentially the classical linked-hash-table
datastructure). The problem with that approach is that either you need
to heap allocate the linked list nodes and use pointers to them, or use
a bucket hash table (with even *more* linked list pointer overhead!),
etc. It's pretty easy to get 5x overhead for values that are just
pointers. So far, I think punching holes in the vector, and periodic
compaction is likely to be much more efficient overall in the space/time
tradeoff.

llvm-svn: 207619
2014-04-30 07:45:27 +00:00
Benjamin Kramer d59664f4f7 raw_ostream: Forward declare OpenFlags and include FileSystem.h only where necessary.
llvm-svn: 207593
2014-04-29 23:26:49 +00:00
Duncan P. N. Exon Smith d22bea7dad blockfreq: Defer to BranchProbability::scale()
`BlockMass` can now defer to `BranchProbability::scale()`.

llvm-svn: 207547
2014-04-29 16:20:05 +00:00