IslAst could mark two nested outer loops as "OutermostParallel". It
caused that the code generator tried to OpenMP-parallelize both loops,
which it is not prepared loop.
It was because the recursive AST build algorithm managed a flag
"InParallelFor" to ensure that no nested loop is also marked as
"OutermostParallel". Unfortunatetly the same flag was used by nodes
marked as SIMD, and reset to false after the SIMD node. Since loops can
be marked as SIMD inside "OutermostParallel" loops, the recursive
algorithm again tried to mark loops as "OutermostParellel" although
still nested inside another "OutermostParallel" loop.
The fix exposed another bug: The function "astScheduleDimIsParallel" was
only called when a loop was potentially "OutermostParallel" or
"InnermostParallel", but as a side-effect also determines the minimum
dependence distance. Hence, changing when we need to know whether a loop
is "OutermostParallel" also changed which loop was annotated with
"#pragma minimal dependence distance".
Moreover, some complex condition linked with "InParallelFor" determined
whether a loop should be an "InnermostParallel" loop. It missed some
situations where it would not use mark as such although being inside an
SIMD mark node, and therefore not be annotated using "#pragma simd".
The changes in particular:
1. Split the "InParallelFor" flag into an "InParallelFor" and an
"InSIMD" flag.
2. Unconditionally call "astScheduleDimIsParallel" for its side-effects
and store the result in "InParallel" for later use.
3. Simplify the condition when a loop is "InnermostParallel".
Fixes llvm.org/PR33153 and llvm.org/PR38073.
llvm-svn: 343212
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30293
llvm-svn: 295958
Providing the context to the ast generator allows for additional simplifcations
and -- more importantly -- allows to generate loops with only partially bounded
domains, assuming the domains are bounded for all parameter configurations
that are valid as defined by the context.
This change fixes the crash reported in http://llvm.org/PR30956
The original reason why we did not include the context when generating an
AST was that CLooG and later isl used to sometimes transfer some of the
constraints that bound the size of parameters from the context into the
generated AST. This resulted in operations with very large constants, which
sometimes introduced problematic integer overflows. The latest versions of
the isl AST generator are careful to not introduce such constants.
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286442
This release includes sevaral improvments compared to the previous
version isl-0.16.1-145-g243bf7c (from the ISL 0.17 announcement):
- optionally combine SCCs incrementally in scheduler
- optionally maximize coincidence in scheduler
- optionally avoid loop coalescing in scheduler
- minor AST generator improvements
- improve support for expansions in schedule trees
llvm-svn: 268500
ISL 0.16 will change how sets are printed which breaks 117 unit tests
that text-compare printed sets. This patch re-formats most of these unit
tests using a script and small manual editing on top of that. When
actually updating ISL, most work is done by just re-running the script
to adapt to the changed output.
Some tests that compare IR and tests with single CHECK-lines that can be
easily updated manually are not included here.
The re-format script will also be committed afterwards. The per-test
formatter invocation command lines options will not be added in the near
future because it is ad hoc and would overwrite the manual edits.
Ideally it also shouldn't be required anymore because ISL's set printing
has become more stable in 0.16.
Differential Revision: http://reviews.llvm.org/D16095
llvm-svn: 257851
This update brings in improvements to isl's 'isolate' option that reduce the
number of code versions generated. This results in both code-size and compile
time reduction for outer loop vectorization.
Thanks to Roman Garev and Sven Verdoolaege for working on this improvement.
llvm-svn: 254706
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.
If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).
The current full/partial tile separation increases compile-time more than
necessary. As a result, we see in compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investiagation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewers: jdoerfert, grosser
Subscribers: grosser, #polly
Differential Revision: http://reviews.llvm.org/D13779
llvm-svn: 250809