Commit r206510 claimed to fix the load case, even though it only fixed the
store case. This commit adds the same fix for the load case, including the
missing test coverage.
llvm-svn: 206577
Even though we may want to generate a vector load, the address from which to
load is still a scalar. Make sure that, even if previous address computations
have been vectorized, the addresses are also available as scalars.
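A minimal sketch of the kind of loop affected (illustrative names, not the
original PR19469 reproducer):

  void f(float *A, float *B) {
    for (int i = 0; i < 1024; i++)
      /* The loads of A[i] may be combined into a vector load, but the
         address A + i must still be available as a scalar value. */
      B[i] = A[i] * 2.0f;
  }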
This fixes http://llvm.org/PR19469
Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 206510
The following example shows a non-parallel loop
void f(int A[]) {
  int i;
  for (i = 0; i < 10; ++i)
    A[i] = A[i+5];
}
which, in case we import a schedule that limits the iteration domain
to 0 <= i < 5, becomes parallel. Previously we crashed in such cases; now we
just recognize the loop as parallel.
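With such a hypothetical restricted domain { Stmt[i] : 0 <= i < 5 }, the loop
writes A[0..4] but only reads A[5..9]; the two index ranges are disjoint, so no
loop-carried dependence remains and the loop is parallel.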
This fixes http://llvm.org/PR19435
Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 206318
During code preparation trivial PHI nodes (mainly introduced by LCSSA) are
deleted to decrease the number of introduced allocas (==> dependences). However,
simply replacing them by their only incoming value would cause the independent
block pass to introduce new allocas. To prevent this we try to share stack slots
during code preparation, i.e., we reuse an already created alloca to 'demote'
the trivial PHI node. This works if we know that the value stored in this alloca
will be the incoming value of the trivial PHI at the end of the predecessor
block of this trivial PHI.
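A sketch of where such trivial PHIs come from (illustrative, not taken from the
test suite):

  float sum(float *A, long n) {
    float s = 0;
    for (long i = 0; i < n; i++)
      s += A[i];
    /* LCSSA inserts a trivial PHI 's.lcssa = phi [s]' in the loop exit
       block; instead of demoting it to a fresh alloca, we can reuse the
       stack slot already created for 's'. */
    return s;
  }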
Contributed-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
llvm-svn: 205320
We now explicitly specify all filenames instead of assuming some naming
convention used by clang and opt.
Contributed-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
llvm-svn: 204726
For complex examples it may happen that we do not compute dependences. In this
case we do not want to crash, but just not detect parallel loops.
llvm-svn: 204470
This patch enables vectorization of loops containing backward array
traversal (array stride is -1).
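A sketch of such a loop (illustrative names):

  void f(float *A, float *B) {
    for (int i = 1023; i >= 0; i--)  /* backward traversal, stride -1 */
      B[i] = A[i] + 1.0f;
  }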
Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>
llvm-svn: 204257
In case we are at the innermost band, we try to prepare for vectorization. This
means we look for the innermost parallel loop and strip-mine it to the innermost
level, using a strip-mine factor corresponding to the number of vector
iterations.
For whatever reason, the code that implemented this feature was broken. We now
added a comment, a test case, and obviously also the right code.
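For illustration, with an assumed strip-mine factor of 4 the transformation
yields loops of the following shape (a sketch, not Polly's actual output):

  void f(float *A, float *B, float *C) {
    for (int i = 0; i < 1024; i += 4)     /* strip loop */
      for (int ii = i; ii < i + 4; ii++)  /* innermost loop: 4 vector iterations */
        A[ii] = B[ii] + C[ii];
  }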
llvm-svn: 203544
This is necessary to avoid test failures in the CLooG test suite due to the
recent isl update.
We also need to update two Polly test cases which rely on a certain order in the
textual representation that isl chooses for its sets and maps. Such changes are
infrequent, but we should probably switch to a check that verifies these maps
are semantically equivalent rather than textually identical.
llvm-svn: 203476
For now we only mark innermost loops for the loop vectorizer. We could later
also mark non-innermost loops to enable the introduction of OpenMP parallelism.
llvm-svn: 202854
In 'obsequi' we have a scop in which the current dead code elimination works,
but the generated code is way too complex. To avoid this trouble (and to not
disable the DCE entirely) we add an additional approximative step before
the actual dead code elimination. This should fix one of the two current
nightly-test issues.
Polly could be improved to handle 'obsequi' by teaching it to introduce only a
single parameter for (%1 and zext %1), which would halve the number of
parameters and allow Polly to derive a simpler representation for the set of
live iterations. However, this needs some time to investigate.
I will commit a test case as soon as we have a reduced one.
llvm-svn: 202010
In case we do not have valid dependences, we do not run dead code elimination or
the schedule optimizer. This fixes an infinite loop in the dead code
elimination (PR12110).
llvm-svn: 201982
Instead of giving a choice between a precise (but possibly very complex)
analysis and an approximative analysis we now use a hybrid approach which uses N
precise steps followed by one approximating step. The precision of the analysis
can be changed by increasing N. With a default of N = 2, we get fully precise
results for our current test cases and should not run into performance problems
for more complex test cases. We can adjust this value once we have gained more
experience with this dead code elimination.
llvm-svn: 201888
We now skip the debug intrinsics, which is a lot better than crashing due to
uncopied metadata references. We should investigate, step by step, which debug
intrinsics we can copy without trouble.
We still keep the debug location metadata.
llvm-svn: 201860
This pass eliminates loop iterations that compute results which are never used
later on. This can help, e.g., in D, where the default zero-initialization of an
array is often unnecessary if new values are assigned to it right afterwards.
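A sketch of this pattern (illustrative, not taken from the D runtime):

  void f(float *A, float *B) {
    for (int i = 0; i < 1024; i++)
      A[i] = 0;            /* dead: every element is overwritten below */
    for (int i = 0; i < 1024; i++)
      A[i] = B[i] * B[i];  /* the values that are actually used later */
  }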
Contributed-by: Peter Conn <conn.peter@gmail.com>
llvm-svn: 201817
We do not have a use for this information at the moment. If we need it at some
point, the "instruction -> access" mapping needs to be enhanced, as a single
instruction could then possibly perform multiple accesses.
This patch allows us to build the polyhedral information for scops with scalar
dependences.
llvm-svn: 201815
In rare cases the modification of one scop can affect the validity of other
scops, as code generation of an earlier scop may make the scalar evolution
functions derived for later scops less precise. The example that triggered this
patch was a scop that contained an 'or' expression as follows:
%add13710 = or i32 %j.19, 1
  --> {(1 + (4 * %l)),+,2}<nsw><%for.body81>
SCEV could only analyze the 'or' because it knew %j.19 is a multiple of 2 (for
even %j.19 the low bit is zero, so the 'or' behaves like an add). This
information was no longer available after the first scop was code generated (or
independent-blocks was run on it), so SCEV could not derive a precise expression
any more. This means we could no longer code generate this SCoP.
My current understanding is that there is always the risk that an earlier code
generation change invalidates later scops. As the example we have seen here is
difficult to avoid, we use this occasion to guard against all such
invalidations.
This patch "solves" this issue by verifying right before we start working on
a detected scop, if this scop is in fact still valid. This adds a certain
overhead. However the verification we run is anyways very fast and secondly
it is only run on detected scops. So the overhead should not be very large. As
a later optimization we could detect scops only on demand, such that we need
to run scop-detections always only a single time.
This should fix the single last failure in the LLVM test-suite for the new
scev-based code generation.
llvm-svn: 201593
There does not seem to be a reason why we cannot support PHI nodes outside of
the scop that reference values within the SCoP. At least, the attached test case
seems to do the right thing. We remove the assert for now.
llvm-svn: 200427
In rare cases, a region R which is itself not valid has an indirect child region
that is valid. When R becomes part of a valid region through the expansion of
another region, all children of R have to be erased from the set of valid
regions.
This patch ensures that indirect children are erased in addition to direct
children.
Contributed-by: Armin Groesslinger <armin.groesslinger@uni-passau.de>
Tobias: I added a reduced test case and adjusted the logic of the patch to
only recurse until the first child is found.
llvm-svn: 200411
Array base addresses need to be invariant in the region considered. The base
address has to be computed outside the region, or, when it is computed inside,
the value must not change with the iterations of the loops. For example, when a
two-dimensional array is represented as a pointer to pointers the base address
A[i] in an access A[i][j] changes with i; therefore, such regions have to be
rejected.
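A sketch of the rejected pattern (illustrative):

  void f(float **A) {
    for (int i = 0; i < 100; i++)
      for (int j = 0; j < 100; j++)
        A[i][j] = 0.0f;  /* the base address A[i] varies with i */
  }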
Contributed-by: Armin Größlinger <armin.groesslinger@uni-passau.de>
llvm-svn: 200314
This is not only unnecessary, but when -O3 changes it can actually cause
arbitrarily failing test cases. For example, a recent change by Chandler caused
-O3 to unroll the loop body, which made the loop we wanted to detect disappear
and consequently made this test case fail.
llvm-svn: 200204
Count the number of computational steps that have been used to solve the
dependence problem and abort in case we reach the "compute-out". This ensures we
do not hang forever in case the dependence problem is too difficult to solve.
There is just a single case in the LLVM test-suite that runs into the
compute-out. Even in this case, we can probably coalesce some of the parameters
(i32 b, i32 b zext i64, ...) to simplify the problem enough to not hit the
compute-out. However, for now we put the compute-out in place to address the
general issue. The compute-out was chosen such that it stops on a recent laptop
after about 8 seconds.
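A sketch of how such a compute-out can be implemented on top of isl's
operations budget (the concrete bound below is an assumed placeholder, not the
value chosen in this commit):

  #include <isl/ctx.h>
  #include <isl/options.h>

  static void compute_dependences_bounded(isl_ctx *ctx) {
    /* Report quota errors instead of aborting the process. */
    isl_options_set_on_error(ctx, ISL_ON_ERROR_CONTINUE);
    isl_ctx_set_max_operations(ctx, 100000000);
    isl_ctx_reset_operations(ctx);

    /* ... run the dependence computation here ... */

    if (isl_ctx_last_error(ctx) == isl_error_quota) {
      /* Compute-out reached: treat the dependences as unavailable. */
    }
    isl_ctx_set_max_operations(ctx, 0); /* 0 = no bound */
  }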
llvm-svn: 200156
We now report the following:
$ polly-clang -O3 -mllvm -polly -mllvm -polly-report test.c -c \
-gline-tables-only
note: Polly detected an optimizable loop region (scop) in function 'foo'
test.c:2: Start of scop
test.c:3: End of scop
note: Polly detected an optimizable loop region (scop) in function 'bar'
test.c:9: Start of scop
test.c:13: End of scop
llvm-svn: 197558
When constructing a scop, sometimes the exact representation of a statement or
condition would be very complex, but there is a common case which is a lot
simpler, yet only valid under certain assumptions. The assumed context records
the assumptions made during the construction of this scop, which need to be code
generated as a run-time test.
At the moment we do not yet model any assumptions, but have only added the
AssumedContext as well as the isl-ast generation support. As a next step, this
needs to be hooked up with the isl code generation.
if (1) /* run-time condition */
{ /* optimized code */ }
else
{ /* original code */ }
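A hypothetical example of such an assumption: an access A[i * n + j] can be
modeled as a much simpler two-dimensional access A[i][j] if we assume
0 <= j < n; recording this assumption in the assumed context allows the simple
representation to be used, guarded by a run-time test of the above form.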
llvm-svn: 193652
SCoP-invariant parameters that differ only in their start value would prevent
parameter sharing. For example, when compiling the following C code:
void foo(float *input) {
  for (long j = 0; j < 8; j++) {
    // SCoP begin
    for (long i = 0; i < 8; i++) {
      float x = input[j * 64 + i + 1];
      input[j * 64 + i] = x * x;
    }
  }
}
Polly would create two parameters for these memory accesses:
p_0: {0,+,256}
p_1: {4,+,256}
[j * 64 + i + 1] => MemRef_input[o0] : 4o0 = p_1 + 4i0
[j * 64 + i] => MemRef_input[o0] : 4o0 = p_0 + 4i0
These parameters differ only in their start value. To enable parameter sharing,
we split the start value off the SCEVAddRecExpr, so both accesses share a single
parameter that always has a zero start value:
p_0: {0,+,256}<%for.cond1.preheader>
[j * 64 + i + 1] => MemRef_input[o0] : 4o0 = 4 + p_0 + 4i0
[j * 64 + i] => MemRef_input[o0] : 4o0 = p_0 + 4i0
This translation can make the polly-dependence analysis much faster.
Contributed-by: Star Tan <tanmx_star@yeah.net>
llvm-svn: 187728