this patch brought to you by the tool clang-format.
I wanted to fix up the names of constructor parameters because they
followed a bit of an anti-pattern by naming initialisms with CamelCase:
'Tti', 'Se', etc. This appears to have been in an attempt to not overlap
with the names of member variables 'TTI', 'SE', etc. However,
constructor arguments can very safely alias members, and in fact that's
the conventional way to pass in members. I've fixed all of these I saw,
along with making some strang abbreviations such as 'Lp' be simpler 'L',
or 'Lgl' be the word 'Legal'.
However, the code I was touching had indentation and formatting somewhat
all over the map. So I ran clang-format and fixed them.
I also fixed a few other formatting or doxygen formatting issues such as
using ///< on trailing comments so they are associated with the correct
entry.
There is still a lot of room for improvement of the formating and
cleanliness of this code. ;] At least a few parts of the coding
standards or common practices in LLVM's code aren't followed, the enum
naming rules jumped out at me. I may mix some of these while I'm here,
but not all of them.
llvm-svn: 171719
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.
llvm-svn: 171469
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
llvm-svn: 171253
LCSSA PHIs may have undef values. The vectorizer updates values that are used by outside users such as PHIs.
The bug happened because undefs are not loop values. This patch handles these PHIs.
PR14725
llvm-svn: 171251
For the time being this includes only some dummy test cases. Once the
generic implementation of the intrinsics cost function does something other
than assuming scalarization in all cases, or some target specializes the
interface, some real test cases can be added.
Also, for consistency, I changed the type of IID from unsigned to Intrinsic::ID
in a few other places.
llvm-svn: 171079
memory bound checks. Before the fix we were able to vectorize this loop from
the Livermore Loops benchmark:
for ( k=1 ; k<n ; k++ )
x[k] = x[k-1] + y[k];
llvm-svn: 170811
Before if-conversion we could check if a value is loop invariant
if it was declared inside the basic block. Now that loops have
multiple blocks this check is incorrect.
This fixes External/SPEC/CINT95/099_go/099_go
llvm-svn: 170756
MapVector is a bit heavyweight, but I don't see a simpler way. Also the
InductionList is unlikely to be large. This should help 3-stage selfhost
compares (PR14647).
llvm-svn: 170528
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.
Reviewed by: Nadav
llvm-svn: 169711
Added the code that actually performs the if-conversion during vectorization.
We can now vectorize this code:
for (int i=0; i<n; ++i) {
unsigned k = 0;
if (a[i] > b[i]) <------ IF inside the loop.
k = k * 5 + 3;
a[i] = k; <---- K is a phi node that becomes vector-select.
}
llvm-svn: 169217
which is the legality of the if-conversion transformation. The next step is to
implement the cost-model for the if-converted code as well as the
vectorization itself.
llvm-svn: 169152
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
For now, this uses 8 on-stack elements. I'll need to do some profiling
to see if this is the best number.
Pointed out by Jakob in post-commit review.
llvm-svn: 167966
Iterating over the children of each node in the potential vectorization
plan must happen in a deterministic order (because it affects which children
are erased when two children conflict). There was no need for this data
structure to be a map in the first place, so replacing it with a vector
is a small change.
I believe that this was the last remaining instance if iterating over the
elements of a Dense* container where the iteration order could matter.
There are some remaining iterations over std::*map containers where the order
might matter, but so long as the Value* for instructions in a block increase
with the order of the instructions in the block (or decrease) monotonically,
then this will appear to be deterministic.
llvm-svn: 167942
Don't choose a vectorization plan containing only shuffles and
vector inserts/extracts. Due to inperfections in the cost model,
these can lead to infinite recusion.
llvm-svn: 167811
This fixes another infinite recursion case when using target costs.
We can only replace insert element input chains that are pure (end
with inserting into an undef).
llvm-svn: 167784
The old checking code, which assumed that input shuffles and insert-elements
could always be folded (and thus were free) is too simple.
This can only happen in special circumstances.
Using the simple check caused infinite recursion.
llvm-svn: 167750
The pass would previously assert when trying to compute the cost of
compare instructions with illegal vector types (like struct pointers).
llvm-svn: 167743
This fixes a bug where shuffles were being fused such that the
resulting input types were not legal on the target. This would
occur only when both inputs and dependencies were also foldable
operations (such as other shuffles) and there were other connected
pairs in the same block.
llvm-svn: 167731
When target cost information is available, compute explicit costs of inserting and
extracting values from vectors. At this point, all costs are estimated using the
target information, and the chain-depth heuristic is not needed. As a result, it is now, by
default, disabled when using target costs.
llvm-svn: 167256
When target costs are available, use them to account for the costs of
shuffles on internal edges of the DAG of candidate pairs.
Because the shuffle costs here are currently for only the internal edges,
the current target cost model is trivial, and the chain depth requirement
is still in place, I don't yet have an easy test
case. Nevertheless, by looking at the debug output, it does seem to do the right
think to the effective "size" of each DAG of candidate pairs.
llvm-svn: 167217
BBVectorize would, except for loads and stores, always fuse instructions
so that the first instruction (in the current source order) would always
represent the low part of the input vectors and the second instruction
would always represent the high part. This lead to too many shuffles
being produced because sometimes the opposite order produces fewer of them.
With this change, BBVectorize tracks the kind of pair connections that form
the DAG of candidate pairs, and uses that information to reorder the pairs to
avoid excess shuffles. Using this information, a future commit will be able
to add VTTI-based shuffle costs to the pair selection procedure. Importantly,
the number of remaining shuffles can now be estimated during pair selection.
There are some trivial instruction reorderings in the test cases, and one
simple additional test where we certainly want to do a reordering to
avoid an unnecessary shuffle.
llvm-svn: 167122
Instead of recomputing relative pointer information just prior to fusing,
cache this information (which also needs to be computed during the
candidate-pair selection process). This cuts down on the total number of
SE queries made, and also is a necessary intermediate step on the road toward
including shuffle costs in the pair selection procedure.
No functionality change is intended.
llvm-svn: 167049
Stop propagating the FlipMemInputs variable into the routines that
create the replacement instructions. Instead, just flip the arguments
of those routines. This allows for some associated cleanup (not all
of which is done here). No functionality change is intended.
llvm-svn: 167042
SE was being called during the instruction-fusion process (when the result
is unreliable, and thus ignored). No functionality change is intended.
llvm-svn: 167037
The monolithic interface for instruction costs has been split into
several functions. This is the corresponding change. No functionality
change is intended.
llvm-svn: 166865
Add getCostXXX calls for different families of opcodes, such as casts, arithmetic, cmp, etc.
Port the LoopVectorizer to the new API.
The LoopVectorizer now finds instructions which will remain uniform after vectorization. It uses this information when calculating the cost of these instructions.
llvm-svn: 166836
This is needed so that perl's SHA can be compiled (otherwise
BBVectorize takes far too long to find its fixed point).
I'll try to come up with a reduced test case.
llvm-svn: 166738
This is the first of several steps to incorporate information from the new
TargetTransformInfo infrastructure into BBVectorize. Two things are done here:
1. Target information is used to determine if it is profitable to fuse two
instructions. This means that the cost of the vector operation must not
be more expensive than the cost of the two original operations. Pairs that
are not profitable are no longer considered (because current cost information
is incomplete, for intrinsics for example, equal-cost pairs are still
considered).
2. The 'cost savings' computed for the profitability check are also used to
rank the DAGs that represent the potential vectorization plans. Specifically,
for nodes of non-trivial depth, the cost savings is used as the node
weight.
The next step will be to incorporate the shuffle costs into the DAG weighting;
this will give the edges of the DAG weights as well. Once that is done, when
target information is available, we should be able to dispense with the
depth heuristic.
llvm-svn: 166716
Unreachable blocks can have invalid instructions. For example,
jump threading can produce self-referential instructions in
unreachable blocks. Also, we should not be spending time
optimizing unreachable code. Fixes PR14133.
llvm-svn: 166423
We used a SCEV to detect that A[X] is consecutive. We assumed that X was
the induction variable. But X can be any expression that uses the induction
for example: X = i + 2;
llvm-svn: 166388
This is important for nested-loop reductions such as :
In the innermost loop, the induction variable does not start with zero:
for (i = 0 .. n)
for (j = 0 .. m)
sum += ...
llvm-svn: 166387
If the pointer is consecutive then it is safe to read and write. If the pointer is non-loop-consecutive then
it is unsafe to vectorize it because we may hit an ordering issue.
llvm-svn: 166371