Adding a new pass PolyhedralInfo. This pass will be the interface to Polly.
Initially, we will provide the following interface:
- #IsParallel(Loop *L) - return a bool depending on whether the loop is
parallel or not for the given program order.
Patch by Utpal Bora <cs14mtech11017@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D21486
llvm-svn: 276637
These flags are now always passed to all tests and need to be disabled if
not needed. Disabling these flags, rather than passing them to almost all
tests, significantly simplfies our RUN: lines.
llvm-svn: 249422
I just learned that target triples prevent test cases to be run on other
architectures. Polly test cases are until now sufficiently target independent
to not require any target triples. Hence, we drop them.
llvm-svn: 235384
isl recently introduced a new interface to create run-time checks from
constraint sets. Use this interface to simplify our run-time check generation.
llvm-svn: 230640
Scops that only read seem generally uninteresting and scops that only write are
most likely initializations where there is also little to optimize. To not
waste compile time we bail early.
Differential Revision: http://reviews.llvm.org/D7735
llvm-svn: 229820
Schedule dimensions that have the same constant value accross all statements do
not carry any information, but due to the increased dimensionality of the
schedule cost compile time. To not pay this cost, we remove constant dimensions
if possible.
llvm-svn: 225067
In case a GEP instruction references into a fixed size array e.g., an access
A[i][j] into an array A[100x100], LLVM-IR does not guarantee that the subscripts
always compute values that are within array bounds. We now derive the set of
parameter values for which all accesses are within bounds and add the assumption
that the scop is only every executed with this set of parameter values.
Example:
void foo(float A[][20], long n, long m {
for (long i = 0; i < n; i++)
for (long j = 0; j < m; j++)
A[i][j] = ...
This loop yields out-of-bound accesses if m is at least 20 and at the same time
at least one iteration of the outer loop is executed. Hence, we assume:
n <= 0 or m <= 20.
Doing so simplifies the dependence analysis problem, allows us to perform
more optimizations and generate better code.
TODO: The location where the GEP instruction is executed is not necessarily the
location where the memory is actually accessed. As a result scanning for GEP[s]
is imprecise. Even though this is not a correctness problem, this imprecision
may result in missed optimizations or non-optimal run-time checks.
In polybench where this mismatch between parametric loop bounds and fixed size
arrays is common, we see with this patch significant reductions in compile time
(up to 50%) and execution time (up to 70%). We see two significant compile time
regressions (fdtd-2d, jacobi-2d-imper), and one execution time regression
(trmm). Both regressions arise due to additional optimizations that have been
enabled by this patch. They can be addressed in subsequent commits.
http://reviews.llvm.org/D6369
llvm-svn: 222754
Instead of parallelizing every parallel outermost loop, we now use a very
minimalistic cost model. Specifically, we assume innermost loops are not
worth parallelising and all non-innermost loops are.
When parallelizing all loops in LNT we got several slowdowns/timeouts due to
us parallelizing innermost loops that are executed only a couple of times
(number of iterations not known statically). With this basic heuristic enabled
LNT does not show any more timeouts, while several interesting loops are still
parallelized.
There are many ways to obtain an improved heuristic. Constructing such an
improvide heuristic from a position of minimal slow-down and zero code size
increase seems to be the best, as it allows us to track progress on LNT.
llvm-svn: 222096
We introduces a new flag -polly-parallel and use it to annotate the for-nodes in
the isl ast that we want to execute thread parallel (e.g., using OpenMP). We
previously already emmitted openmp annotations, but we did this for various
kinds of parallel loops, including some which we can not run in parallel.
With this patch we now have three annotations:
1) #pragma known-parallel [reduction]
2) #pragma omp for
3) #pragma simd
meaning:
1) loop has no loop carried dependences
2) loop will be executed thread-parallel
3) loop can possibly be vectorized
This patch introduces 1) and reduces the use of 2) to only the cases where we
will actually generate thread parallel code.
It is in preparation of openmp code generation in our isl backend.
Legacy:
- We also have a command line option -enable-polly-openmp. This option controls
the OpenMP code generation in CLooG. It will become an alias of
-polly-parallel after the CLooG code generation has been dropped.
http://reviews.llvm.org/D6142
llvm-svn: 221479
There was a bug in the IslAst which caused that no more outermost
parallel loops were detected/checked after a parallel outermost loop
of depth 1.
+ Test case attached
llvm-svn: 217452
+ Introduced dependency type TYPE_TC_RED to represent the transitive closure
(& the reverse) of reduction dependences. These are used when we check for
reduction parallel loops.
+ Test cases including loop reversals and modulo schedules which compute
reductions in a alternated order.
llvm-svn: 213019
For complex examples it may happen that we do not compute dependences. In this
case we do not want to crash, but just not detect parallel loops.
llvm-svn: 204470