Commit Graph

180 Commits

Author SHA1 Message Date
Nadav Rotem 13306816fc LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes.
llvm-svn: 180593
2013-04-26 05:08:59 +00:00
Nadav Rotem f43cbeee15 LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers.
llvm-svn: 180570
2013-04-25 19:55:03 +00:00
Arnold Schwaighofer 3fa801fbc2 LoopVectorizer: Change variable name Stride to ConsecutiveStride
This makes it easier to read the code.

No functionality change.

llvm-svn: 180197
2013-04-24 16:16:03 +00:00
Arnold Schwaighofer a6578f7056 LoopVectorize: Scalarize padded types
This patch disables memory-instruction vectorization for types that need padding
bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on
x86_64. Because the load/store vectorization is performed by the bit casting to
a packed vector, which has incompatible memory layout due to the lack of padding
bytes, the present vectorizer produces inconsistent result for memory
instructions of those types.
This patch checks an equality of the AllocSize of a scalar type and allocated
size for each vector element, to ensure that there is no padding bytes and the
array can be read/written using vector operations.

Patch by Daisuke Takahashi!

Fixes PR15758.

llvm-svn: 180196
2013-04-24 16:16:01 +00:00
Arnold Schwaighofer 23a0589bce LoopVectorizer: Bail out if we don't have datalayout we need it
llvm-svn: 180195
2013-04-24 16:15:58 +00:00
Nadav Rotem 71c9d6d333 LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order.
This fixes a miscompilation in FreeBSD's regex library.

llvm-svn: 180121
2013-04-23 17:12:42 +00:00
Pekka Jaaskelainen d3c90e132a Call the potentially costly isAnnotatedParallel() only once.
Made the uniform write test's checks a bit stricter.

llvm-svn: 180119
2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen 6f2f66b63f Refuse to (even try to) vectorize loops which have uniform writes,
even if erroneously annotated with the parallel loop metadata.

Fixes Bug 15794: 
"Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"

llvm-svn: 180081
2013-04-23 08:08:51 +00:00
Arnold Schwaighofer 5146940316 LoopVectorizer: Use matcher from PatternMatch.h for the min/max patterns
Also make some static function class functions to avoid having to mention the
class namespace for enums all the time.

No functionality change intended.

llvm-svn: 179886
2013-04-19 21:03:36 +00:00
Dmitri Gribenko d29ea04446 Fix a -Wdocumentation warning
llvm-svn: 179789
2013-04-18 20:13:04 +00:00
Arnold Schwaighofer 4cd6aa110c LoopVectorizer: Recognize min/max reductions
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y)
sequence in LLVM. If we see such a sequence we can treat it just as any other
commutative binary instruction and reduce it.

This appears to help bzip2 by about 1.5% on an imac12,2.

radar://12960601

llvm-svn: 179773
2013-04-18 17:22:34 +00:00
Benjamin Kramer 8df2cfb858 LoopVectorize: Use a set to avoid longer cycles in the reduction chain too.
Fixes PR15748.

llvm-svn: 179757
2013-04-18 14:29:13 +00:00
Arnold Schwaighofer f9cea17f75 LoopVectorizer: integer division is not a reduction operation
Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
For example : (1 / 2) * 4 != 4/2.

Example:

int a[] = { 2, 5, 2, 2}
int x = 80;

for()
  x /= a[i];

Scalar:
  x /= 2 // = 40
  x /= 5 // = 8
  x /= 2 // = 4
  x /= 2 // = 2

Vectorized:

 <80, 1> / <2,5> //= <40,0>
 <40, 0> / <2,2> //= <20,0>

 20*0 = 0

radar://13640654

llvm-svn: 179381
2013-04-12 15:15:19 +00:00
Arnold Schwaighofer df6f67ed87 LoopVectorizer: Pass OperandValueKind information to the cost model
Pass down the fact that an operand is going to be a vector of constants.

This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my pervious shift cost change
that made all shifts expensive on x86.

radar://13576547

llvm-svn: 178809
2013-04-04 23:26:27 +00:00
Arnold Schwaighofer c63cf3a0ae LoopVectorize: Invert case when we use a vector cmp value to query select cost
We generate a select with a vectorized condition argument when the condition is
NOT loop invariant. Not the other way around.

llvm-svn: 177098
2013-03-14 18:54:36 +00:00
Benjamin Kramer 6eda79f69a Remove a source of nondeterminism from the LoopVectorizer.
This made us emit runtime checks in a random order. Hopefully bootstrap
miscompares will go away now.

llvm-svn: 176775
2013-03-09 19:22:40 +00:00
Arnold Schwaighofer 8b3dc09400 LoopVectorizer: Ignore all dbg intrinisic
Ignore all DbgIntriniscInfo instructions instead of just DbgValueInst.

llvm-svn: 176769
2013-03-09 16:27:27 +00:00
Arnold Schwaighofer 4090b61ac3 LoopVectorizer: Ignore dbg.value instructions
We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic
and don't transfer them to the vectorized code.

radar://13378964

llvm-svn: 176768
2013-03-09 15:56:34 +00:00
Benjamin Kramer 37c2d65c5a Insert the reduction start value into the first bypass block to preserve domination.
Fixes PR15344.

llvm-svn: 176701
2013-03-08 16:58:37 +00:00
Nadav Rotem 739e37a0d2 PR14448 - prevent the loop vectorizer from vectorizing the same loop twice.
The LoopVectorizer often runs multiple times on the same function due to inlining.
When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches.
With this patch, the vectorizer during vectorization puts metadata on scalar loops and marks them as 'already vectorized' so that it knows to ignore them when it sees them a second time.

PR14448.

llvm-svn: 176399
2013-03-02 01:33:49 +00:00
Benjamin Kramer 12f98fae98 LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses.
Fixes PR15384.

llvm-svn: 176366
2013-03-01 19:07:31 +00:00
Benjamin Kramer dc145816fd LoopVectorize: Vectorize math builtin calls.
This properly asks TargetLibraryInfo if a call is available and if it is, it
can be translated into the corresponding LLVM builtin. We don't vectorize sqrt()
yet because I'm not sure about the semantics for negative numbers. The other
intrinsic should be exact equivalents to the libm functions.

Differential Revision: http://llvm-reviews.chandlerc.com/D465

llvm-svn: 176188
2013-02-27 15:24:19 +00:00
Renato Golin cf928cb53f Allow GlobalValues to vectorize with AliasAnalysis
Storing the load/store instructions with the values
and inspect them using Alias Analysis to make sure
they don't alias, since the GEP pointer operand doesn't
take the offset into account.

Trying hard to not add any extra cost to loads and stores
that don't overlap on global values, AA is *only* calculated
if all of the previous attempts failed.

Using biggest vector register size as the stride for the
vectorization access, as we're being conservative and
the cost model (which calculates the real vectorization
factor) is only run after the legalization phase.

We might re-think this relationship in the future, but
for now, I'd rather be safe than sorry.

llvm-svn: 175818
2013-02-21 22:39:03 +00:00
Benjamin Kramer 0aa2ad6104 LoopVectorize: Simplify code for clarity.
No functionality change.

llvm-svn: 175076
2013-02-13 21:12:29 +00:00
Pekka Jaaskelainen 0d23725a8d Metadata for annotating loops as parallel. The first consumer for this
metadata is the loop vectorizer.

See the documentation update for more info.

llvm-svn: 175060
2013-02-13 18:08:57 +00:00
Jakob Stoklund Olesen 479e5a9313 Typos.
llvm-svn: 174723
2013-02-08 17:43:32 +00:00
Arnold Schwaighofer 594fa2dc2b ARM cost model: Address computation in vector mem ops not free
Adds a function to target transform info to query for the cost of address
computation. The cost model analysis pass now also queries this interface.
The code in LoopVectorize adds the cost of address computation as part of the
memory instruction cost calculation. Only there, we know whether the instruction
will be scalarized or not.
Increase the penality for inserting in to D registers on swift. This becomes
necessary because we now always assume that address computation has a cost and
three is a closer value to the architecture.

radar://13097204

llvm-svn: 174713
2013-02-08 14:50:48 +00:00
Michael Kuperstein f63b77be7f Test Commit
llvm-svn: 174709
2013-02-08 12:58:29 +00:00
Nadav Rotem a9100f3609 fix 80-col violation and fix the docs.
llvm-svn: 174671
2013-02-07 22:34:07 +00:00
Arnold Schwaighofer 3476fc8c82 Loop Vectorizer: Refactor Memory Cost Computation
We don't want too many classes in a pass and the classes obscure the details. I
was going a little overboard with object modeling here. Replace classes by
generic code that handles both loads and stores.

No functionality change intended.

llvm-svn: 174646
2013-02-07 19:05:21 +00:00
Arnold Schwaighofer 3be40b56c5 Loop Vectorizer: Refactor code to compute vectorized memory instruction cost
Introduce a helper class that computes the cost of memory access instructions.
No functionality change intended.

llvm-svn: 174422
2013-02-05 18:46:41 +00:00
Arnold Schwaighofer 22174f5d5a Loop Vectorizer: Handle pointer stores/loads in getWidestType()
In the loop vectorizer cost model, we used to ignore stores/loads of a pointer
type when computing the widest type within a loop. This meant that if we had
only stores/loads of pointers in a loop we would return a widest type of 8bits
(instead of 32 or 64 bit) and therefore a vector factor that was too big.

Now, if we see a consecutive store/load of pointers we use the size of a pointer
(from data layout).

This problem occured in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced
test case is the first test in vector_ptr_load_store.ll).

radar://13139343

llvm-svn: 174377
2013-02-05 15:08:02 +00:00
Pekka Jaaskelainen f50ab84bb1 LoopVectorize: convert TinyTripCountVectorThreshold constant
to a command line switch.

llvm-svn: 173837
2013-01-29 21:42:08 +00:00
Benjamin Kramer cf406756ce LoopVectorize: Clean up ValueMap a bit and avoid double lookups.
No intended functionality change.

llvm-svn: 173809
2013-01-29 17:31:33 +00:00
Renato Golin 1258519674 Vectorization Factor clarification
llvm-svn: 173691
2013-01-28 16:02:45 +00:00
Nadav Rotem 69a040d3eb LoopVectorize: Refactor the code that vectorizes loads/stores to remove duplication.
llvm-svn: 173500
2013-01-25 21:47:42 +00:00
Benjamin Kramer 21e8da5990 LoopVectorize: Simplify code. No functionality change.
llvm-svn: 173475
2013-01-25 19:43:15 +00:00
Nadav Rotem 8e9ca2f8cb LoopVectorizer: Refactor more code to use the IRBuilder.
llvm-svn: 173471
2013-01-25 19:26:23 +00:00
Nadav Rotem c8adf3ff6e Refactor some code to use the IRBuilder.
llvm-svn: 173467
2013-01-25 18:34:09 +00:00
Nadav Rotem ab3e698ee9 Add support for reverse pointer induction variables. These are loops that contain pointers that count backwards.
For example, this is the hot loop in BZIP:

  do {
    m = *--p;
    *p = ( ... );
  } while (--n);

llvm-svn: 173219
2013-01-23 01:35:00 +00:00
Nadav Rotem b2e7e7a0b6 Fix a comment. Induction vars dont need to start at zero.
llvm-svn: 173061
2013-01-21 17:59:18 +00:00
Benjamin Kramer a6e2e2a0a7 LoopVectorize: Fix a C++11 incompatibility.
llvm-svn: 172990
2013-01-20 20:29:52 +00:00
Nadav Rotem da9f2adffd Fix a build error.
llvm-svn: 172971
2013-01-20 09:39:17 +00:00
Nadav Rotem c42f90b1f4 LoopVectorizer: Implement a new heuristics for selecting the unroll factor.
We ignore the cpu frontend and focus on pipeline utilization. We do this because we
don't have a good way to estimate the loop body size at the IR level.

llvm-svn: 172964
2013-01-20 05:24:29 +00:00
Benjamin Kramer d455ed85d1 LoopVectorizer: Emit memory checks into their own basic block.
This separates the check for "too few elements to run the vector loop" from the
"memory overlap" check, giving a lot nicer code and allowing to skip the memory
checks when we're not going to execute the vector code anyways. We still leave
the decision of whether to emit the memory checks as branches or setccs, but it
seems to be doing a good job. If ugly code pops up we may want to emit them as
separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso.

Most of this is legwork to allow multiple bypass blocks while updating PHIs,
dominators and loop info.

llvm-svn: 172902
2013-01-19 13:57:58 +00:00
Nadav Rotem d33ce6f100 LoopVectorizer cost model. Honor the user command line flag that selects the vectorization factor even if the target machine does not have any vector registers.
llvm-svn: 172544
2013-01-15 18:25:16 +00:00
Nadav Rotem 40e45eeae2 Fix PR14547. Handle induction variables of small sizes smaller than i32 (i8 and i16).
llvm-svn: 172348
2013-01-13 07:56:29 +00:00
Nadav Rotem 853fe0acb9 ARM Cost Model: We need to detect the max bitwidth of types in the loop in order to select the max vectorization factor.
We don't have a detailed analysis on which values are vectorized and which stay scalars in the vectorized loop so we use
another method. We look at reduction variables, loads and stores, which are the only ways to get information in and out
of loop iterations. If the data types are extended and truncated then the cost model will catch the cost of the vector
zext/sext/trunc operations.

llvm-svn: 172178
2013-01-11 07:11:59 +00:00
Nadav Rotem 6eae65cfac LoopVectorizer: Fix a bug in the vectorization of BinaryOperators. The BinaryOperator can be folded to an Undef, and we don't want to set NSW flags to undef vals.
PR14878

llvm-svn: 172079
2013-01-10 17:34:39 +00:00
Nadav Rotem b1791a75cd ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor.
llvm-svn: 172010
2013-01-09 22:29:00 +00:00