This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23260
llvm-svn: 281441
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D23991
llvm-svn: 281234
The -polly-flatten-schedule pass reduces the number of scattering
dimensions in its isl_union_map form to make them easier to understand.
It is not meant to be used in production, only for debugging and
regression tests.
To illustrate, how it can make sets simpler, here is a lifetime set
used computed by the porposed DeLICM pass without flattening:
{ Stmt_reduction_for[0, 4] -> [0, 2, o2, o3] : o2 < 0;
Stmt_reduction_for[0, 4] -> [0, 1, o2, o3] : o2 >= 5;
Stmt_reduction_for[0, 4] -> [0, 1, 4, o3] : o3 > 0;
Stmt_reduction_for[0, i1] -> [0, 1, i1, 1] : 0 <= i1 <= 3;
Stmt_reduction_for[0, 4] -> [0, 2, 0, o3] : o3 <= 0 }
And here the same lifetime for a semantically identical one-dimensional
schedule:
{ Stmt_reduction_for[0, i1] -> [2 + 3i1] : 0 <= i1 <= 4 }
Differential Revision: https://reviews.llvm.org/D24310
llvm-svn: 280948
LLVM's coding guideline suggests to not use @brief for one-sentence doxygen
comments to improve readability. Switch this once and for all to ensure people
do not copy @brief comments from other parts of Polly, when writing new code.
llvm-svn: 280468
Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23740
llvm-svn: 279395
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D22187
llvm-svn: 278666
This is the second patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus
two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update. In this change
we create the BLIS macro-kernel by applying a combination of tiling
and interchanging. In subsequent changes we will implement the packing
transformation.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D21491
llvm-svn: 276627
This ensures that the error status set with -polly-on-isl-error-abort is
maintained even after running DependenceInfo and ScheduleOptimizer. Both
passes temporarily set the error status to CONTINUE as the dependence
analysis uses a compute-out and the scheduler may not be able to derive
a schedule. In both cases we want to not abort, but to handle the error
gracefully. Before this commit, we always set the error reporting to ABORT
after these passes. After this commit, we use the error reporting mode that was
active earlier.
This comes without a test case as this would require us to introduce (memory)
errors which would trigger the isl errors.
llvm-svn: 274272
llvm commonly adds a comment to the closing brace of a namespace to indicate
which namespace is closed. clang-tidy provides with llvm-namespace-comment
a handy tool to check for this habit. We use it to ensure we consitently use
namespace comments in Polly.
There are slightly different styles in how namespaces are closed in LLVM. As
there is no large difference between the different comment styles we go for the
style clang-tidy suggests by default.
To reproduce this fix run:
for i in `ls tools/polly/lib/*/*.cpp`; \
clang-tidy -checks='-*,llvm-namespace-comment' -p build $i -fix \
-header-filter=".*"; \
done
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273621
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273437
Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null
pointers.
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273435
This is the first patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel,
plus two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update.
In this change we create the BLIS micro-kernel by applying
a combination of tiling and unrolling. In subsequent changes
we will add the extraction of the BLIS macro-kernel
and implement the packing transformation.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D21140
llvm-svn: 273397
multiplication
Fix small issues related to characters, operators and descriptions of tests.
Differential Revision: http://reviews.llvm.org/D20806
llvm-svn: 271264
Created a new pass ScopInfoRegionPass. As name suggests, it is a
region pass and it is there to preserve compatibility with our
existing Polly passes. ScopInfoRegionPass will return a SCoP object
for a valid region while the creation of the SCoP stays in the
ScopInfo class.
Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in>
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Johannes Doerfert <doerfert@cs.uni-saarland.de>
Differential Revision: http://reviews.llvm.org/D20770
llvm-svn: 271259
This header is required to make the ISO 646 alternative operator
spellings ("and", "or" instead of "&&", "||") work. Should these
operators be replaced by the standard ones as already suggested by
Johannes, also remove this #include again.
llvm-svn: 271206
Add determination of statements that contain, in particular,
matrix multiplications and can be optimized with [1] to try to
get close-to-peak performance. It can be enabled
via polly-pm-based-opts, which is false by default.
Refs:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D20575
llvm-svn: 271128
Add a command line switch to set the
isl_options_set_schedule_outer_coincidence option. ISL then tries to
build schedules where the outer member of a band satisfies the
coincidence constraints.
In practice this allows loop skewing for more parallelism in inner
loops.
llvm-svn: 268222
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.
If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).
The current full/partial tile separation increases compile-time more than
necessary. As a result, we see in compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investiagation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewers: jdoerfert, grosser
Subscribers: grosser, #polly
Differential Revision: http://reviews.llvm.org/D13779
llvm-svn: 250809
Polly can now be used as a analysis only tool as long as the code
generation is disabled. However, we do not have an alternative to the
independent blocks pass in place yet, though in the relevant cases
this does not seem to impact the performance much. Nevertheless, a
virtual alternative that allows the same transformations without
changing the input region will follow shortly.
llvm-svn: 250652
This will allow us to optimize C++ template code with Polly. This support is
mostly for debugging purpose and individual experiments. The ultimate goal is
still to run Polly later in the pass manager when inlining already happened.
llvm-svn: 250092
Helper functions in the BlockGenerators.h/cpp introduce dependences
from the frontend to the backend of Polly. As they are used in
ScopDetection, ScopInfo, etc. we move them to the ScopHelper file.
llvm-svn: 249919
The changes affect methods that are part of the Pass interface and
include:
- Comments that describe the methods purpose.
- A consistent use of the keywords override and virtual.
Additionally, the printScop method is now optional and removed from
SCoP passes that do not implement it.
llvm-svn: 248685
Other passes which perform different optimizations might be interested in
also applying data-locality transformations as part of their overall
transformation.
llvm-svn: 245824
Register tiling in Polly is for now just an additional level of tiling which
is fully unrolled. It is disabled by default. To make this useful for more than
experiments, we still need a cost function as well as possibly further
optimizations that teach LLVM to actually put some of the values we got into
scalar registers.
llvm-svn: 245564
By default we only use one level of tiling for loops, but in general tiling
for multiple levels is trivial for us. Hence, we add a set of options that
allow people to play with a second level of tiling. If this is profitable for
some cases we can work on heuristics that allow us to identify these cases
and use two-level tiling for them.
llvm-svn: 245563
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations which are than interchanged to the innermost level such that LLVM's
inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this
loop. The number of loop iterations to strip-mine is now configurable with the
option -polly-prevect-width=<number-of-prevec-dims>.
This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.
llvm-svn: 245424