We verify the optimized function now for a long time and it helped to track
down bugs early. This will now also happen for all parallel subfunctions we
generate.
llvm-svn: 265823
The way to get the elements size with getPrimitiveSizeInBits() is not
the same as used in other parts of Polly which should use
DataLayout::getTypeAllocSize(). Its use only queries the size of the
pointer and getPrimitiveSizeInBits returns 0 for types that require a
DataLayout object such as pointers.
Together with r265379, this should fix PR27195.
llvm-svn: 265795
If we build the domains for error blocks and later remove them we lose
the information that they are not executed. Thus, in the SCoP it looks
like the control will always reach the statement S:
for (i = 0 ... N)
if (*valid == 0)
doSth(&ptr);
S: A[i] = *ptr;
Consequently, we would have assumed "ptr" to be always accessed and
preloaded it unconditionally. However, only if "*valid != 0" we would
execute the optimized version of the SCoP. Nevertheless, we would have
hoisted and accessed "ptr"regardless of "*valid". This changes the
semantic of the program as the value of "*valid" can cause a change of
"ptr" and control if it is executed or not.
To fix this problem we adjust the execution context of hoisted loads
wrt. error domains. To this end we introduce an ErrorDomainCtxMap that
maps each basic block to the error context under which it might be
executed. Thus, to the context under which it is executed but an error
block would have been executed to. To fill this map one traversal of
the blocks in the SCoP suffices. During this traversal we do also
"remove" error statements and those that are only reachable via error
statements. This was previously done by the removeErrorBlockDomains
function which is therefor not needed anymore.
This fixes bug PR26683 and thereby several SPEC miscompiles.
Differential Revision: http://reviews.llvm.org/D18822
llvm-svn: 265778
If ScalarEvolution cannot look through some expression but we do, it
might happen that a multiplication will arrive at the
SCEVAffinator::visitMulExpr. While we could always try to improve the
extractConstantFactor function we might still miss something, thus we
reintroduce the code to generate multiplicative piecewise-affine
functions as a fall-back.
llvm-svn: 265777
The findValues() function did not look through div & srem instructions
that were part of the argument SCEV. However, in different other
places we already look through it. This mismatch caused us to preload
values in the wrong order.
llvm-svn: 265775
If all exiting blocks of a SCoP are error blocks and therefor not
represented we will not generate accesses and consequently no SAI
objects for exit PHIs. However, they are needed in the code generation
to generate the merge PHIs between the original and optimized region.
With this patch we enusre that the SAI objects for exit PHIs exist
even if all exiting blocks turn out to be eror blocks.
This fixes the crash reported in PR27207.
llvm-svn: 265393
We currently only consider the first GEP when delinearizing access functions,
which makes us loose information about additional index expression offsets,
which results in our SCoP model to be incorrect. With this patch we now
compare the base pointers used to ensure we do not miss any additional offsets.
This fixes llvm.org/PR27195.
We may consider supporting nested GEP in our delinearization heuristics in
the future.
llvm-svn: 265379
Even before we build the domain the branch condition can become very
complex, especially if we have to build the complement of a lot of
equality constraints. With this patch we bail if the branch condition
has a lot of basic sets and parameters.
After this patch we now successfully compile
External/SPEC/CINT2000/186_crafty/186_crafty
with "-polly-process-unprofitable -polly-position=before-vectorizer".
llvm-svn: 265286
As a CFG is often structured we can simplify the steps performed during
domain generation. When we push domain information we can utilize the
information from a block A to build the domain of a block B, if A dominates B
and there is no loop backede on a path from A to B. When we pull domain
information we can use information from a block A to build the domain of a
block B if B post-dominates A. This patch implements both ideas and thereby
simplifies domains that were not simplified by isl. For the FINAL basic block
in test/ScopInfo/complex-successor-structure-3.ll we used to build a universe
set with 81 basic sets. Now it actually is represented as universe set.
While the initial idea to utilize the graph structure depended on the
dominator and post-dominator tree we can use the available region
information as a coarse grained replacement. To this end we push the
region entry domain to the region exit and pull it from the region
entry for the region exit if applicable.
With this patch we now successfully compile
External/SPEC/CINT2006/400_perlbench/400_perlbench
and
SingleSource/Benchmarks/Adobe-C++/loop_unroll.
Differential Revision: http://reviews.llvm.org/D18450
llvm-svn: 265285
If a loop has no exiting blocks the region covering we use during
schedule genertion might not cover that loop properly. For now we bail
out as we would not optimize these loops anyway.
llvm-svn: 265280
If an exit PHI is written and also read in the SCoP we should not create two
SAI objects but only one. As the read is only modeled to ensure OpenMP code
generation knows about it we can simply use the EXIT_PHI MemoryKind for both
accesses.
llvm-svn: 265261
If a loop has no exiting blocks the region covering we use during
schedule genertion might not cover that loop properly. For now we bail
out as we would not optimize these loops anyway.
llvm-svn: 265260
If a non-affine region PHI is generated we should not move the insert
point prior to the synthezised value in the same block as we might
split that block at the insert point later on. Only if the incoming
value should be placed in a different block we should change the
insertion point.
llvm-svn: 265132
... instead of hardcoding something that has been free at some point. This fixes
a crash triggered by r265084, where the diagnostic IDs have been shifted in a
way that resulted our hardcode ID to not be assigned any implementation. Our ID
was likely already wrong earlier on, but this time we really crashed nicely.
llvm-svn: 265114
These caused LNT failures due to new assertions when running with
-polly-position=before-vectorizer -polly-process-unprofitable for:
FAIL: clamscan.compile_time
FAIL: cjpeg.compile_time
FAIL: consumer-jpeg.compile_time
FAIL: shapes.compile_time
FAIL: clamscan.execution_time
FAIL: cjpeg.execution_time
FAIL: consumer-jpeg.execution_time
FAIL: shapes.execution_time
The failures have been introduced by r264782, but r264789 had to be reverted
as it depended on the earlier patch.
llvm-svn: 264885
As a CFG is often structured we can simplify the steps performed
during domain generation. When we push domain information we can
utilize the information from a block A to build the domain of a
block B, if A dominates B. When we pull domain information we can
use information from a block A to build the domain of a block B
if B post-dominates A. This patch implements both ideas and thereby
simplifies domains that were not simplified by isl. For the FINAL
basic block in
test/ScopInfo/complex-successor-structure-3.ll .
we used to build a universe set with 81 basic sets. Now it actually is
represented as universe set.
While the initial idea to utilize the graph structure depended on the
dominator and post-dominator tree we can use the available region
information as a coarse grained replacement. To this end we push the
region entry domain to the region exit and pull it from the region
entry for the region exit.
Differential Revision: http://reviews.llvm.org/D18450
llvm-svn: 264789
Instead of waiting for the domain construction to finish we will now
bail as early as possible in case a complexity problem is encountered.
This might save compile time but more importantly it makes the "abort"
explicit. While we can always check if we invalidated the assumed
context we can simply propagate the result of the construction back.
This also removes the HasComplexCFG flag that was used for the very
same reason.
Differential Revision: http://reviews.llvm.org/D18504
llvm-svn: 264775
Ensure the length of the header underline matches the length of the header.
This prevents SPHINX from erroring on this file and consequently not updating
the documentation.
Also, make this its own point not belonging to the 'increased applicability'
section.
llvm-svn: 264592
This patch applies the restrictions on the number of domain conjuncts
also to the domain parts of piecewise affine expressions we generate.
To this end the wording is change slightly. It was needed to support
complex additions featuring zext-instructions but it also fixes PR27045.
lnt profitable runs reports only little changes that might be noise:
Compile Time:
Polybench/[...]/2mm +4.34%
SingleSource/[...]/stepanov_container -2.43%
Execution Time:
External/[...]/186_crafty -2.32%
External/[...]/188_ammp -1.89%
External/[...]/473_astar -1.87%
llvm-svn: 264514
This got accidentally dropped in r264283.
Also, drop the wwwfiles from the removal list. This is not needed any more as
we now explicitly list the directories that should be formatted.
llvm-svn: 264397
This pass is not enabled in the default tool chain and currently can run into an
infinite loop, due to other parts of LLVM generating incorrect IR
(http://llvm.org/PR27065) -- which is not executed and consequently does not
seem to disturb other passes. As this pass is not really needed, we can just
drop it to get our build clean.
This fixes the timeout issues in MultiSource/Benchmarks/MiBench/consumer-jpeg
and MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg for
-polly-position=before-vectorizer -polly-process-unprofitable.. Unfortunately,
we are still left with a miscompile in cjpeg.
llvm-svn: 264396