forked from OSchip/llvm-project
ca7f5bb767
We isolate full tiles from partial tiles to be able to, for example, vectorize loops with parametric lower and/or upper bounds. If we use -polly-vectorizer=stripmine, we can see execution-time improvements: correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to 0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from 8m5565s to 7m4285s (-13.18 %). The current full/partial tile separation increases compile time more than necessary. As a result, we see compile-time regressions, for example, for 3mm from 0m6320s to 0m9881s (+56.34 %). Some of this compile-time increase is expected, as we generate more IR and consequently more time is spent in the LLVM backends. However, a first investigation has shown that a larger portion of the compile time is unnecessarily spent inside Polly's parallelism detection and could be eliminated by propagating existing knowledge about vector loop parallelism. Before enabling -polly-vectorizer=stripmine by default, it is necessary to address this compile-time issue. Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewers: jdoerfert, grosser Subscribers: grosser, #polly Differential Revision: http://reviews.llvm.org/D13779 llvm-svn: 250809 |
||
---|---|---|
2012-03-16-Empty-Domain.ll | ||
2012-04-16-Trivially-vectorizable-loops.ll | ||
2013-04-11-Empty-Domain-two.ll | ||
computeout.ll | ||
full_partial_tile_separation.ll | ||
line-tiling-2.ll | ||
line-tiling.ll | ||
one-dimensional-band.ll | ||
prevectorization-without-tiling.ll | ||
prevectorization.ll | ||
rectangular-tiling.ll |
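
The full/partial tile separation described in the commit message can be illustrated with a hand-written sketch in C. This is not Polly's generated code: the function names, the vector width `VF`, and the choice to start tiles at the lower bound `lo` are assumptions made for illustration. Polly derives the split from its schedule and may place partial tiles at both ends of the iteration range. The point is only that, once full tiles are isolated, the inner loop has a constant trip count and is straightforward to vectorize, even though `lo` and `hi` are parametric.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector width, chosen for illustration only. */
enum { VF = 4 };

/* Original loop: both bounds are parameters, so the trip count is unknown. */
void scale_reference(float *a, size_t lo, size_t hi, float s) {
    for (size_t i = lo; i < hi; ++i)
        a[i] *= s;
}

/* Same computation with full tiles separated from the partial tile. */
void scale_separated(float *a, size_t lo, size_t hi, float s) {
    if (hi <= lo)
        return; /* empty iteration domain */
    assert(a != NULL);

    size_t n = hi - lo;
    size_t full_end = lo + (n / VF) * VF; /* end of the last full tile */
    size_t i = lo;

    /* Full tiles: the inner loop always runs exactly VF iterations,
       so a vectorizer can turn it into one vector operation. */
    for (; i < full_end; i += VF)
        for (size_t v = 0; v < VF; ++v)
            a[i + v] *= s;

    /* Partial tile: fewer than VF iterations remain; kept scalar. */
    for (; i < hi; ++i)
        a[i] *= s;
}
```

Calling `scale_separated(a, 1, 11, 2.0f)` processes two full tiles (indices 1 to 8) and a two-element partial tile (indices 9 and 10), and yields the same array contents as `scale_reference`. The extra loop copies are also why this separation produces more IR, which fits the compile-time increase the commit message reports.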