Before this patch we only expanded valid __and__ profitable region. Therefor
we did not allow the expansion to create a profitable region from a
non-profitable one. With this patch we will remember and expand all valid
regions and check for profitability only at the end.
This patch increases the number of valid SCoPs in the LLVM-TS and SPEC
2000/2006 by 28% (from 303 to 390), including the hot loop in hmmer.
llvm-svn: 269343
This patch cleans up the rejection log handling during the
ScopDetection. It consists of two interconnected parts:
- We keep all detection contexts for a function in order to provide
more information to the user, e.g., about the rejection of
extended/intermediate regions.
- We remove the mutable "RejectLogs" member as the information is
available through the detection contexts.
llvm-svn: 269323
Truncate operations are basically modulo operations, thus we can model
them that way. However, for large types we assume the operand to fit
in the new type size instead of introducing a modulo with a very large
constant.
llvm-svn: 269300
We utilize assumptions on the input to model IR in polyhedral world.
To verify these assumptions we version the code and guard it with a
runtime-check (RTC). However, since the RTCs are themselves generated
from the polyhedral representation we generate them under the same
assumptions that they should verify. In other words, the guarantees
that we try to provide with the RTCs do not hold for the RTCs
themselves. To this end it is necessary to employ a different check
for the RTCs that will verify the assumptions did hold for them too.
Differential Revision: http://reviews.llvm.org/D20165
llvm-svn: 269299
If a profitable run is performed we will check if the SCoP seems to be
profitable after creation but before e.g., dependence are computed. This is
needed as SCoP detection only approximates the actual SCoP representation.
In the end this should allow us to be less conservative during the SCoP
detection while keeping the compile time in check.
llvm-svn: 269074
Regions with one affine loop can be profitable if the loop is
distributable. To this end we will allow them to be treated as
profitable if they contain at least two non-trivial basic blocks.
llvm-svn: 269064
The assumption attached to an llvm.assume in the SCoP needs to be
combined with the domain of the surrounding statement but can
nevertheless be used to refine the context.
This fixes the problems mentioned in PR27067.
llvm-svn: 269060
This patches makes the propagation of complexity problems during
domain generation consistent. Additionally, it makes it less likely to
encounter ill-formed domains later, e.g., during schedule generation.
llvm-svn: 269055
Before this patch we generated error-restrictions only for
error-blocks, thus blocks (or regions) containing a not represented
function call. However, the same reasoning is needed if the invalid
domain of a statement subsumes its actual domain. To this end we move
the generation of error-restrictions after the propagation of the
invalid domains. Consequently, error-statements are now defined more
general as statements that are assumed to be not executed.
Additionally, we do not record an empty domain for such statements but
a nullptr instead. This allows to distinguish between error-statements
and dead-statements.
llvm-svn: 269053
We now use context information to simplify the domains and access
functions of the SCoP instead of just aligning them with the parameter
space.
llvm-svn: 269048
Previously we checked the number of pieces to decide whether or not a
invariant load was to complex to be generated. However, there are
cases when e.g., divisions cause the complexity to spike regardless of
the number of pieces. To this end we now check the number of totally
involved dimensions which will increase with the number of pieces but
also the number of divisions.
llvm-svn: 269045
This exposes the functionality to interpret a SCEV, or better the
piece-wise function created from the SCEV, as an unsigned value
instead of a signed one.
llvm-svn: 269044
Min/max expressions are easier to read and can in some cases also result in
more concise IR that is generated as the min/max --- when lowered to a
cmp+select pattern -- commonly has a simpler condition then the ternary
condition isl would normally generate.
llvm-svn: 268855
This release includes sevaral improvments compared to the previous
version isl-0.16.1-145-g243bf7c (from the ISL 0.17 announcement):
- optionally combine SCCs incrementally in scheduler
- optionally maximize coincidence in scheduler
- optionally avoid loop coalescing in scheduler
- minor AST generator improvements
- improve support for expansions in schedule trees
llvm-svn: 268500
The check for complexity compares the number of polyhedra in a set,
which are combined by disjunctions (union, "OR"),
not conjunctions (intersection, "AND").
llvm-svn: 268223
Add a command line switch to set the
isl_options_set_schedule_outer_coincidence option. ISL then tries to
build schedules where the outer member of a band satisfies the
coincidence constraints.
In practice this allows loop skewing for more parallelism in inner
loops.
llvm-svn: 268222
After zero-extend operations and unsigned comparisons we now allow
unsigned divisions. The handling is basically the same as for signed
division, except the interpretation of the operands. As the divisor
has to be constant in both cases we can simply interpret it as an
unsigned value without additional complexity in the representation.
For the dividend we could choose from the different representation
schemes introduced for zero-extend operations but for now we will
simply use an assumption.
llvm-svn: 268032
For debugging it is often convenient to not abort at the very first memory
management error. This option allows to control this behavior at run-time.
llvm-svn: 268030
It does not suffice to take a global assumptions for unsigned comparisons but
we also need to adjust the invalid domain of the statements guarded by such
an assumption. To this end we allow to specialize the getPwAff call now in
order to indicate unsigned interpretation.
llvm-svn: 268025
When we materialize parameter SCEVs we did so without considering the
side effects they might have, e.g., both division and modulo are
undefined if the right hand side is zero. This is a problem because we
potentially extended the domain under which we evaluate parameters,
thus we might have introduced such undefined behaviour. To prevent
that from happening we will now guard divisions and modulo operations
in the parameters with a compare and select.
llvm-svn: 268023
Assumptions and restrictions can both be simplified with the domain of a
statement but not the same way. After this patch we will correctly
distinguish them.
llvm-svn: 267885
If the base pointer of an invariant load is is loaded conditionally, that
condition needs to hold for the invariant load too. The structure of the
program will imply this for domain constraints but not for imprecisions in
the modeling. To this end we will propagate the execution context of base
pointers during code generation and thus ensure the derived pointer does
not access an invalid base pointer.
llvm-svn: 267707
With this patch we will optimistically assume that the result of an unsigned
comparison is the same as the result of the same comparison interpreted as
signed.
llvm-svn: 267559
Additive expressions can have constant factors too that we can extract
and thereby simplify the internal representation. For now we do
compute the gcd of all constant factors but only extract the same
(possibly negated) factor if there is one.
llvm-svn: 267445
Before, we checked all GEPs in a statement in order to derive
out-of-bound assumptions. However, this can not only introduce new
parameters but it is also not clear what we can learn from GEPs that
are not immediately used in a memory accesses inside the SCoP. As this
case is very rare, no actual change in the behaviour is expected.
llvm-svn: 267442
Before, assumptions derived from llvm.assume could reference new
parameters that were not known to the SCoP before. These were neither
beneficial to the representation nor to the user that reads the
emitted remark. Now we project them out and keep only user assumptions
on known parameters. Nevertheless, the new parameters are still part
of the SCoPs parameter space as the SCEVAffinator currently adds them
on demand.
llvm-svn: 267441
The new handling is consistent with the remaining code, e.g., we do
not create a new parameter id for each lookup call but copy an
existing one. Additionally, we now use the implicit order defined by
the Parameters set instead of an explicit one defined in a map.
llvm-svn: 267423
A zero-extended value can be interpreted as a piecewise defined signed
value. If the value was non-negative it stays the same, otherwise it
is the sum of the original value and 2^n where n is the bit-width of
the original (or operand) type. Examples:
zext i8 127 to i32 -> { [127] }
zext i8 -1 to i32 -> { [256 + (-1)] } = { [255] }
zext i8 %v to i32 -> [v] -> { [v] | v >= 0; [256 + v] | v < 0 }
However, LLVM/Scalar Evolution uses zero-extend (potentially lead by a
truncate) to represent some forms of modulo computation. The left-hand side
of the condition in the code below would result in the SCEV
"zext i1 <false, +, true>for.body" which is just another description
of the C expression "i & 1 != 0" or, equivalently, "i % 2 != 0".
for (i = 0; i < N; i++)
if (i & 1 != 0 /* == i % 2 */)
/* do something */
If we do not make the modulo explicit but only use the mechanism described
above we will get the very restrictive assumption "N < 3", because for all
values of N >= 3 the SCEVAddRecExpr operand of the zero-extend would wrap.
Alternatively, we can make the modulo in the operand explicit in the
resulting piecewise function and thereby avoid the assumption on N. For the
example this would result in the following piecewise affine function:
{ [i0] -> [(1)] : 2*floor((-1 + i0)/2) = -1 + i0;
[i0] -> [(0)] : 2*floor((i0)/2) = i0 }
To this end we can first determine if the (immediate) operand of the
zero-extend can wrap and, in case it might, we will use explicit modulo
semantic to compute the result instead of emitting non-wrapping assumptions.
Note that operands with large bit-widths are less likely to be negative
because it would result in a very large access offset or loop bound after the
zero-extend. To this end one can optimistically assume the operand to be
positive and avoid the piecewise definition if the bit-width is bigger than
some threshold (here MaxZextSmallBitWidth).
We choose to go with a hybrid solution of all modeling techniques described
above. For small bit-widths (up to MaxZextSmallBitWidth) we will model the
wrapping explicitly and use a piecewise defined function. However, if the
bit-width is bigger than MaxZextSmallBitWidth we will employ overflow
assumptions and assume the "former negative" piece will not exist.
llvm-svn: 267408
Memory accesses can have non-precisely modeled access functions that
would cause us to build incorrect execution context for hoisted loads.
This is the same issue that occurred during the domain construction for
statements and it is dealt with the same way.
llvm-svn: 267289
The SCEVAffinator will now produce not only the isl representaiton of
a SCEV but also the domain under which it is invalid. This is used to
record possible overflows that can happen in the statement domains in
the statements invalid domain. The result is that invalid loads have
an accurate execution contexts with regards to the validity of their
statements domain. While the SCEVAffinator currently is only taking
"no-wrapping" assumptions, we can add more withouth worrying about the
execution context of loads that are optimistically hoisted.
llvm-svn: 267288
The invalid context is not enough to describe the parameter constraints under
which a statement is not modeled precisely. The reason is that during the
domain construction the bounds on the induction variables are not known but
needed to check if e.g., an overflow can actually happen. To this end we
replace the invalid context of a statement with an invalid domain. It is
initialized during domain construction and intersected with the domain once
it was completely build. Later this invalid domain allows to eliminate
falsely assumed wrapping cases and other falsely assumed mismatches in the
modeling.
llvm-svn: 267286
We used checks to minimize the number of remarks we present to a user
but these checks can become expensive, especially since all wrapping
assumptions are emitted separately. Because there is not benefit for a
"headless" run we put these checks under a command line flag. Thus, if
the flag is not given we will emit "non-effective" remarks, e.g.,
duplicates and revert to the old behaviour if it is given. As this
also changes the internal representation of some sets we set the flag
by default for our unit tests.
llvm-svn: 266087
Utilizing the record option for assumptions we can simplify the wrapping
assumption generation a lot. Additionally, we can now report locations
together with wrapping assumptions, though they might not be accurate yet.
llvm-svn: 266069
There are three reasons why we want to record assumptions first before we
add them to the assumed/invalid context:
1) If the SCoP is not profitable or otherwise invalid without the
assumed/invalid context we do not have to compute it.
2) Information about the context are gathered rather late in the SCoP
construction (basically after we know all parameters), thus the user
might see overly complicated assumptions to be taken while they would
have been simplified later on.
3) Currently we cannot take assumptions at any point but have to wait,
e.g., for the domain generation to finish. This makes wrapping
assumptions much more complicated as they need to be and it will
have a similar effect on "signed-unsigned" assumptions later.
llvm-svn: 266068
Collect the error domain contexts (formerly in the ErrorDomainCtxMap)
for each statement in the new InvalidContext member variable. While
this commit is basically a [NFC] it is a first step to make hoisting
sound by allowing a more fine grained record of invalid contexts,
e.g., here on statement level.
llvm-svn: 266053
Allow overflow of indices into the next higher dimension if it has
constant size. E.g.
float A[32][2];
((float*)A)[5];
is effectively the same as
A[2][1];
This can happen since r265379 as a side effect if ScopDetection
recognizes an access as affine, but ScopInfo rejects the GetElementPtr.
Differential Revision: http://reviews.llvm.org/D18878
llvm-svn: 265942
MSVC warns with:
warning C4239: nonstandard extension used: 'initializing': conversion from 'llvm::DebugLoc' to 'llvm::DebugLoc &'
note: A non-const reference may only be bound to an lvalue
Change the reference to a const reference.
llvm-svn: 265937
In r247147 we disabled pointer expressions because the IslExprBuilder did not
fully support them. This patch reintroduces them by simply treating them as
integers. The only special handling for pointers that is left detects the
comparison of two address_of operands and uses an unsigned compare.
llvm-svn: 265894
This reverts commit 2879c53e80e05497f408f21ce470d122e9f90f94.
Additionally, it adds SDiv and SRem instructions to the set of values
discovered by the findValues function even if we add the operands to
be able to recompute the SCEVs. In subfunctions we do not want to
recompute SDiv and SRem instructions but pass them instead as they
might have been created through the IslExprBuilder and are more
complicated than simple SDiv/SRem instructions in the code.
llvm-svn: 265873
We verify the optimized function now for a long time and it helped to track
down bugs early. This will now also happen for all parallel subfunctions we
generate.
llvm-svn: 265823
The way to get the elements size with getPrimitiveSizeInBits() is not
the same as used in other parts of Polly which should use
DataLayout::getTypeAllocSize(). Its use only queries the size of the
pointer and getPrimitiveSizeInBits returns 0 for types that require a
DataLayout object such as pointers.
Together with r265379, this should fix PR27195.
llvm-svn: 265795
If we build the domains for error blocks and later remove them we lose
the information that they are not executed. Thus, in the SCoP it looks
like the control will always reach the statement S:
for (i = 0 ... N)
if (*valid == 0)
doSth(&ptr);
S: A[i] = *ptr;
Consequently, we would have assumed "ptr" to be always accessed and
preloaded it unconditionally. However, only if "*valid != 0" we would
execute the optimized version of the SCoP. Nevertheless, we would have
hoisted and accessed "ptr"regardless of "*valid". This changes the
semantic of the program as the value of "*valid" can cause a change of
"ptr" and control if it is executed or not.
To fix this problem we adjust the execution context of hoisted loads
wrt. error domains. To this end we introduce an ErrorDomainCtxMap that
maps each basic block to the error context under which it might be
executed. Thus, to the context under which it is executed but an error
block would have been executed to. To fill this map one traversal of
the blocks in the SCoP suffices. During this traversal we do also
"remove" error statements and those that are only reachable via error
statements. This was previously done by the removeErrorBlockDomains
function which is therefor not needed anymore.
This fixes bug PR26683 and thereby several SPEC miscompiles.
Differential Revision: http://reviews.llvm.org/D18822
llvm-svn: 265778
If ScalarEvolution cannot look through some expression but we do, it
might happen that a multiplication will arrive at the
SCEVAffinator::visitMulExpr. While we could always try to improve the
extractConstantFactor function we might still miss something, thus we
reintroduce the code to generate multiplicative piecewise-affine
functions as a fall-back.
llvm-svn: 265777
The findValues() function did not look through div & srem instructions
that were part of the argument SCEV. However, in different other
places we already look through it. This mismatch caused us to preload
values in the wrong order.
llvm-svn: 265775
If all exiting blocks of a SCoP are error blocks and therefor not
represented we will not generate accesses and consequently no SAI
objects for exit PHIs. However, they are needed in the code generation
to generate the merge PHIs between the original and optimized region.
With this patch we enusre that the SAI objects for exit PHIs exist
even if all exiting blocks turn out to be eror blocks.
This fixes the crash reported in PR27207.
llvm-svn: 265393
We currently only consider the first GEP when delinearizing access functions,
which makes us loose information about additional index expression offsets,
which results in our SCoP model to be incorrect. With this patch we now
compare the base pointers used to ensure we do not miss any additional offsets.
This fixes llvm.org/PR27195.
We may consider supporting nested GEP in our delinearization heuristics in
the future.
llvm-svn: 265379
Even before we build the domain the branch condition can become very
complex, especially if we have to build the complement of a lot of
equality constraints. With this patch we bail if the branch condition
has a lot of basic sets and parameters.
After this patch we now successfully compile
External/SPEC/CINT2000/186_crafty/186_crafty
with "-polly-process-unprofitable -polly-position=before-vectorizer".
llvm-svn: 265286
As a CFG is often structured we can simplify the steps performed during
domain generation. When we push domain information we can utilize the
information from a block A to build the domain of a block B, if A dominates B
and there is no loop backede on a path from A to B. When we pull domain
information we can use information from a block A to build the domain of a
block B if B post-dominates A. This patch implements both ideas and thereby
simplifies domains that were not simplified by isl. For the FINAL basic block
in test/ScopInfo/complex-successor-structure-3.ll we used to build a universe
set with 81 basic sets. Now it actually is represented as universe set.
While the initial idea to utilize the graph structure depended on the
dominator and post-dominator tree we can use the available region
information as a coarse grained replacement. To this end we push the
region entry domain to the region exit and pull it from the region
entry for the region exit if applicable.
With this patch we now successfully compile
External/SPEC/CINT2006/400_perlbench/400_perlbench
and
SingleSource/Benchmarks/Adobe-C++/loop_unroll.
Differential Revision: http://reviews.llvm.org/D18450
llvm-svn: 265285
If a loop has no exiting blocks the region covering we use during
schedule genertion might not cover that loop properly. For now we bail
out as we would not optimize these loops anyway.
llvm-svn: 265280
If an exit PHI is written and also read in the SCoP we should not create two
SAI objects but only one. As the read is only modeled to ensure OpenMP code
generation knows about it we can simply use the EXIT_PHI MemoryKind for both
accesses.
llvm-svn: 265261
If a loop has no exiting blocks the region covering we use during
schedule genertion might not cover that loop properly. For now we bail
out as we would not optimize these loops anyway.
llvm-svn: 265260
If a non-affine region PHI is generated we should not move the insert
point prior to the synthezised value in the same block as we might
split that block at the insert point later on. Only if the incoming
value should be placed in a different block we should change the
insertion point.
llvm-svn: 265132
... instead of hardcoding something that has been free at some point. This fixes
a crash triggered by r265084, where the diagnostic IDs have been shifted in a
way that resulted our hardcode ID to not be assigned any implementation. Our ID
was likely already wrong earlier on, but this time we really crashed nicely.
llvm-svn: 265114
These caused LNT failures due to new assertions when running with
-polly-position=before-vectorizer -polly-process-unprofitable for:
FAIL: clamscan.compile_time
FAIL: cjpeg.compile_time
FAIL: consumer-jpeg.compile_time
FAIL: shapes.compile_time
FAIL: clamscan.execution_time
FAIL: cjpeg.execution_time
FAIL: consumer-jpeg.execution_time
FAIL: shapes.execution_time
The failures have been introduced by r264782, but r264789 had to be reverted
as it depended on the earlier patch.
llvm-svn: 264885
As a CFG is often structured we can simplify the steps performed
during domain generation. When we push domain information we can
utilize the information from a block A to build the domain of a
block B, if A dominates B. When we pull domain information we can
use information from a block A to build the domain of a block B
if B post-dominates A. This patch implements both ideas and thereby
simplifies domains that were not simplified by isl. For the FINAL
basic block in
test/ScopInfo/complex-successor-structure-3.ll .
we used to build a universe set with 81 basic sets. Now it actually is
represented as universe set.
While the initial idea to utilize the graph structure depended on the
dominator and post-dominator tree we can use the available region
information as a coarse grained replacement. To this end we push the
region entry domain to the region exit and pull it from the region
entry for the region exit.
Differential Revision: http://reviews.llvm.org/D18450
llvm-svn: 264789
Instead of waiting for the domain construction to finish we will now
bail as early as possible in case a complexity problem is encountered.
This might save compile time but more importantly it makes the "abort"
explicit. While we can always check if we invalidated the assumed
context we can simply propagate the result of the construction back.
This also removes the HasComplexCFG flag that was used for the very
same reason.
Differential Revision: http://reviews.llvm.org/D18504
llvm-svn: 264775
This patch applies the restrictions on the number of domain conjuncts
also to the domain parts of piecewise affine expressions we generate.
To this end the wording is change slightly. It was needed to support
complex additions featuring zext-instructions but it also fixes PR27045.
lnt profitable runs reports only little changes that might be noise:
Compile Time:
Polybench/[...]/2mm +4.34%
SingleSource/[...]/stepanov_container -2.43%
Execution Time:
External/[...]/186_crafty -2.32%
External/[...]/188_ammp -1.89%
External/[...]/473_astar -1.87%
llvm-svn: 264514
This pass is not enabled in the default tool chain and currently can run into an
infinite loop, due to other parts of LLVM generating incorrect IR
(http://llvm.org/PR27065) -- which is not executed and consequently does not
seem to disturb other passes. As this pass is not really needed, we can just
drop it to get our build clean.
This fixes the timeout issues in MultiSource/Benchmarks/MiBench/consumer-jpeg
and MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg for
-polly-position=before-vectorizer -polly-process-unprofitable.. Unfortunately,
we are still left with a miscompile in cjpeg.
llvm-svn: 264396
This fixes PR27035. While we now exclude MemIntrinsics from the
polyhedral model if they would access "null" we could exploit this
even more, e.g., remove all parameter combinations that would lead to
the execution of this statement from the context.
llvm-svn: 264284
Similar to r262612 we need to check not only the pointer SCEV and the
type of an alias group but also the actual access instruction. The
reason is again the same: The pointer SCEV is not flow sensitive but the
access function is. In r262612 we avoided consolidating alias groups
even though the pointer SCEV and the type were the same but the access
function was not. Here it is simpler as we can simply check all members
of an alias group against the given access instruction.
llvm-svn: 264274
When codegenerating invariant loads in some rare cases we cannot generate code
and bail out. This change ensures that we maintain a valid dominator tree
in these situations. This fixes llvm.org/PR26736
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 264142
This might be useful to evaluate the benefit of us handling modref funciton
calls. Also, a new bug that was triggered by modref function calls was
recently reported http://llvm.org/PR27035. To ensure the same issue does not
cause troubles for other people, we temporarily disable this until the bug
is resolved.
llvm-svn: 264140
ISL can conclude additional conditions on parameters from restrictions
on loop variables. Such conditions persist when leaving the loop and the
loop variable is projected out. This results in a narrower domain for
exiting the loop than entering it and is logically impossible for
non-infinite loops.
We fix this by not adding a lower bound i>=0 when constructing BB
domains, but defer it to when also the upper bound it computed, which
was done redundantly even before this patch.
This reduces the number of LNT fails with -polly-process-unprofitable
-polly-position=before-vectorizer from 8 to 6.
llvm-svn: 264118
We bail out if current scop has a complex control flow as this could lead to
building of large domain conditions. This is to reduce compile time. This
addresses r26382.
Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D18362
llvm-svn: 264105
Affine branches are fully modeled and regenerated from the polyhedral domain and
consequently do not require any input conditions to be propagated.
llvm-svn: 263678
Index calculations can use the last value that come out of a loop.
Ideally, ScalarEvolution can compute that exit value directly without
depending on the loop induction variable, but not in all cases.
This changes isAffine to not consider such loop exit values as affine to
avoid that SCEVExpander adds uses of the original loop induction
variable.
This fix is analogous to r262404 that applies to general uses of loop
exit values instead of index expressions and loop bouds as in this
patch.
This reduces the number of LNT test-suite fails with
-polly-position=before-vectorizer -polly-unprofitable
from 10 to 8.
llvm-svn: 262665
The scope will be required in the following fix. This commit separates
the large changes that do not change behaviour from the small, but
functional change.
llvm-svn: 262664
Value merging is only necessary for scalars when they are used outside
of the scop. While an array's base pointer can be used after the scop,
it gets an extra ScopArrayInfo of type MK_Value. We used to generate
phi's for both of them, where one was assuming the reault of the other
phi would be the original value, because it has already been replaced by
the previous phi. This resulted in IR that the current IR verifier
allows, but is probably illegal.
This reduces the number of LNT test-suite fails with
-polly-position=before-vectorizer -polly-process-unprofitable
from 16 to 10.
Also see llvm.org/PR26718.
llvm-svn: 262629
This should fix PR19422.
Thanks to Jeremy Huddleston Sequoia for reporting this.
Thanks to Roman Gareev for his investigation and the reduced test case.
llvm-svn: 262612
The LNT test suite with -polly-process-unprofitable
-polly-position=before-vectorizer currenty fails 59 tests. With this
barrier added, only 16 keep failing. This is probably because Polly's
code generation currently does not correctly preserve all analyses it
promised to preserve. Temporarily add this barrier until further
investigation.
llvm-svn: 262488
Polly recognizes affine loops that ScalarEvolution does not, in
particular those with loop conditions that depend on hoisted invariant
loads. Check for SCEVAddRec dependencies on such loops and do not
consider their exit values as synthesizable because SCEVExpander would
generate them as expressions that depend on the original induction
variables. These are not available in generated code.
llvm-svn: 262404
In order to speed up compile time and to avoid random timeouts we now
separately track assumptions and restrictions. In this context
assumptions describe parameter valuations we need and restrictions
describe parameter valuations we do not allow. During AST generation
we create a runtime check for both, whereas the one for the
restrictions is negated before a conjunction is build.
Except the In-Bounds assumptions we currently only track restrictions.
Differential Revision: http://reviews.llvm.org/D17247
llvm-svn: 262328
removeCachedResults deletes the DetectionContext from
DetectionContextMap such that any it cannot be used anymore.
Unfortunately invalid<ReportUnprofitable> and RejectLogs.insert still do
use it. Because the memory is part of a map and not returned to to the
OS immediatly, such that the observable effect was only a memory leak
due to reference counters not decreased when the second call to
removeCachedResults does not remove the DetectionContext because because
it already has been removed.
Fix by not removing the DetectionContext prematurely. The second call to
removeCachedResults will handle it anyway.
llvm-svn: 262235
We move verifyInvariantLoads out of this function to allow for an early return
without the need for code duplication. A similar transformation was suggested
by Johannes Doerfert in post commit review of r262033.
llvm-svn: 262203
This debug output distracts from the -debug-only=polly-scops output. As it is
rather verbose and only really needed for debugging the domain construction
I drop this output. The domain construction is meanwhile stable enough to
not require regular debugging.
llvm-svn: 262117
The functions buildAccessMultiDimFixed and buildAccessMultiDimParam were
refactored from buildMemoryAccess. In their own functions, the control
flow can be shortcut and simplified using returns.
Suggested-by: etherzhhb
llvm-svn: 262029
This allows to construct run-time checks for a scop without having to generate
a full AST. This is currently not taken advantage of in Polly itself, but
external users may benefit from this feature.
llvm-svn: 262009
This commit updates to the latest isl development version. There is no specific
feature we need on the Polly side, but we want to ensure test coverage for the
latest isl changes.
llvm-svn: 262001
Check the ModRefBehaviour of functions in order to decide whether or
not a call instruction might be acceptable.
Differential Revision: http://reviews.llvm.org/D5227
llvm-svn: 261866
The generated dedicated subregion exit block was assumed to have the same
dominance relation as the original exit block. This is incorrect if the exit
block receives other edges than only from the subregion, which results in that
e.g. the subregion's entry block does not dominate the exit block.
llvm-svn: 261865
From now on we bail only if a non-trivial alias group contains a non-affine
access, not when we discover aliasing and non-affine accesses are allowed.
llvm-svn: 261863
Replace Scop::getStmtForBasicBlock and Scop::getStmtForRegionNode, and
add overloads for llvm::Instruction and llvm::RegionNode.
getStmtFor and overloads become the common interface to get the Stmt
that contains something. Named after LoopInfo::getLoopFor and
RegionInfo::getRegionFor.
llvm-svn: 261791
This is also be caught by the function verifier, but disconnected from
the place that produced it. Catch it already at creation to be able to
reason more directly about the cause.
llvm-svn: 261790
This allows other passes and transformations to use some of the existing AST
building infrastructure. This is not yet used in Polly itself.
llvm-svn: 261496
This patch adds support for memcpy, memset and memmove intrinsics. They are
represented as one (memset) or two (memcpy, memmove) memory accesses in the
polyhedral model. These accesses have an access range that describes the
summarized effect of the intrinsic, i.e.,
memset(&A[i], '$', N);
is represented as a write access from A[i] to A[i+N].
Differential Revision: http://reviews.llvm.org/D5226
llvm-svn: 261489
We now always print the reason why the code did not pass the LLVM verifier and
we also allow to disable verfication with -polly-codegen-verify=false. Before
this change the first assertion had generally no information why or what might
have gone wrong and it was also impossible to -view-cfg without recompile. This
change makes debugging bugs that result in incorrect IR a lot easier.
llvm-svn: 261320
To support non-aligned accesses we introduce a virtual element size
for arrays that divides each access function used for this array. The
adjustment of the access function based on the element size of the
array was therefore moved after this virtual element size was
determined, thus after all accesses have been created.
Differential Revision: http://reviews.llvm.org/D17246
llvm-svn: 261226
After we moved isl_ctx into Scop, we need to free the isl_ctx after
freeing all isl objects, which requires the ScopInfo pass to be freed
at last. But this is not guaranteed by the PassManager, and we need
extra code to free the isl_ctx at the right time.
We introduced a shared pointer to manage the isl_ctx, and distribute
it to all analyses that create isl objects. As such, whenever we free
an analyses with the shared_ptr (and also free the isl objects which
are created by the analyses), we decrease the (shared) reference
counter of the shared_ptr by 1. Whenever the reference counter reach
0 in the releaseMemory function of an analysis, that analysis will
be the last one that hold any isl objects, and we can safely free the
isl_ctx with that analysis.
Differential Revision: http://reviews.llvm.org/D17241
llvm-svn: 261100
First support for this feature was committed in r259784. Support for
loop invariant load hoisting with different types was added by
Johannes Doerfert in r260045 and r260886.
llvm-svn: 260965
A load can only be invariant if its base pointer is invariant too. To
this end, we check if the base pointer is defined inside the region or
outside. In the former case we recursively check if we can (and
therefore will) hoist the base pointer too. Only if that happends we
can hoist the load.
llvm-svn: 260886
This reverts commit 98efa006c96ac981c00d2e386ec1102bce9f549a.
The fix was broken since we do not use AA in the ScopDetection anymore to
check for invariant accesses.
llvm-svn: 260884
Eliminate the global variable "InsnToMemAcc" to make Scop/ScopInfo become
more protable, such that we can safely use them in a CallGraphSCC pass.
Differential Revision: http://reviews.llvm.org/D17238
llvm-svn: 260863
Before this patch it could happen that we did not hoist a load that
was a base pointer of another load even though AA already declared the
first one as invariant (during ScopDetection). If this case arises we
will now skipt the "can be overwriten" check because in this case the
over-approximating nature causes us to generate broken code.
llvm-svn: 260862
The former ScopArrayInfo::updateSizes was implicitly divided into an
updateElementType and an updateSizes. Now this partitioning is
explicit.
llvm-svn: 260860
So far we separated constant factors from multiplications, however,
only when they are at the outermost level of a parameter SCEV. Now,
we also separate constant factors from the parameter SCEV if the
outermost expression is a SCEVAddRecExpr. With the changes to the
SCEVAffinator we can now improve the extractConstantFactor(...)
function at will without worrying about any other code part. Thus,
if needed we can implement a more comprehensive
extractConstantFactor(...) function that will traverse the SCEV
instead of looking only at the outermost level.
Four test cases were affected. One did not change much and the other
three were simplified.
llvm-svn: 260859
This reverts commit https://llvm.org/svn/llvm-project/polly/trunk@260853
We unfortunately still have two bugs left which show only up with
-polly-process-unprofitable and which I forgot to test before committing.
llvm-svn: 260854
First support for this feature was committed in r259784. Support for
loop invariant load hoisting with different types was added by Johannes
Doerfert in r260045. This fixed the last known bug.
llvm-svn: 260853
Since the origin AccFuncMap in ScopInfo is used by the underlying Scop
only, and it must stay alive until we delete the Scop. It will be better
if we simply move the origin AccFuncMap in ScopInfo into the Scop class.
llvm-svn: 260820
Make Scop become more portable such that we can use it in a CallGraphSCC pass.
The first step is to drop the analyses that are only used during Scop construction.
This patch drop LoopInfo from Scop.
llvm-svn: 260819
Make Scop become more portable such that we can use it in a CallGraphSCC pass.
The first step is to drop the analyses that are only used during Scop construction.
This patch drop DominatorTree from Scop.
llvm-svn: 260818
Make Scop become more portable such that we can use it in a CallGraphSCC pass.
The first step is to drop the analyses that are only used during Scop construction.
This patch drop ScopDecection from Scop.
llvm-svn: 260817
We now distinguish invariant loads to the same memory location if they
have different types. This will cause us to pre-load an invariant
location once for each type that is used to access it. However, we can
thereby avoid invalid casting, especially if an array is accessed
though different typed/sized invariant loads.
This basically reverts the changes in r260023 but keeps the test
cases.
llvm-svn: 260045
We also disable this feature by default, as there are still some issues in
combination with invariant load hoisting that slipped through my initial
testing.
llvm-svn: 260025
Invariant load hoisting of memory accesses with non-canonical element
types lacks support for equivalence classes that contain elements of
different width/size. This support should be added, but to get our buildbots
back to green, we disable load hoisting for memory accesses with non-canonical
element size for now.
llvm-svn: 260023
Always use access-instruction pointer type to load the invariant values.
Otherwise mismatches between ScopArrayInfo element type and memory access
element type will result in invalid casts. These type mismatches are after
r259784 a lot more common and also arise with types of different size, which
have not been handled before.
Interestingly, this change actually simplifies the code, as we now have only
one code path that is always taken, rather then a standard code path for the
common case and a "fixup" code path that replaces the standard code path in
case of mismatching types.
llvm-svn: 260009
The previously implemented approach is to follow value definitions and
create write accesses ("push defs") while searching for uses. This
requires the same relatively validity- and requirement conditions to be
replicated at multiple locations (PHI instructions, other instructions,
uses by PHIs).
We replace this by iterating over the uses in a SCoP ("pull in
requirements"), and add writes only when at least one read has been
added. It turns out to be simpler code because each use is only iterated
over once and writes are added for the first access that reads it. We
need another iteration to identify escaping values (uses not in the
SCoP), which also makes the difference between such accesses more
obvious. As a side-effect, the order of scalar MemoryAccess can change.
Differential Revision: http://reviews.llvm.org/D15706
llvm-svn: 259987
This allows code such as:
void multiple_types(char *Short, char *Float, char *Double) {
for (long i = 0; i < 100; i++) {
Short[i] = *(short *)&Short[2 * i];
Float[i] = *(float *)&Float[4 * i];
Double[i] = *(double *)&Double[8 * i];
}
}
To model such code we use as canonical element type of the modeled array the
smallest element type of all original array accesses, if type allocation sizes
are multiples of each other. Otherwise, we use a newly created iN type, where N
is the gcd of the allocation size of the types used in the accesses to this
array. Accesses with types larger as the canonical element type are modeled as
multiple accesses with the smaller type.
For example the second load access is modeled as:
{ Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }
To support code-generating these memory accesses, we introduce a new method
getAccessAddressFunction that assigns each statement instance a single memory
location, the address we load from/store to. Currently we obtain this address by
taking the lexmin of the access function. We may consider keeping track of the
memory location more explicitly in the future.
We currently do _not_ handle multi-dimensional arrays and also keep the
restriction of not supporting accesses where the offset expression is not a
multiple of the access element type size. This patch adds tests that ensure
we correctly invalidate a scop in case these accesses are found. Both types of
accesses can be handled using the very same model, but are left to be added in
the future.
We also move the initialization of the scop-context into the constructor to
ensure it is already available when invalidating the scop.
Finally, we add this as a new item to the 2.9 release notes
Reviewers: jdoerfert, Meinersbur
Differential Revision: http://reviews.llvm.org/D16878
llvm-svn: 259784
This includes some (optional) improvements to the isl scheduler, which we do not
use yet, as well as a fix for a bug previously also affecting Polly:
commit 662ee9b7d45ebeb7629b239d3ed43442e25bf87c
Author: Sven Verdoolaege <skimo@kotnet.org>
Date: Mon Jan 25 16:59:32 2016 +0100
isl_basic_map_realign: perform Gaussian elimination on result
Many parts of isl assume that Gaussian elimination has been
applied to the equality constraints. In particular singleton_extract_point
makes this assumption. The input to singleton_extract_point
may have undergone parameter alignment. This parameter alignment
(ultimately performed by isl_basic_map_realign) therefore
needs to make sure the result preserves this property
llvm-svn: 259757
We support now code such as:
void multiple_types(char *Short, char *Float, char *Double) {
for (long i = 0; i < 100; i++) {
Short[i] = *(short *)&Short[2 * i];
Float[i] = *(float *)&Float[4 * i];
Double[i] = *(double *)&Double[8 * i];
}
}
To support such code we use as element type of the modeled array the smallest
element type of all original array accesses. Accesses with larger types are
modeled as multiple accesses with the smaller type.
For example the second load access is modeled as:
{ Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }
To support jscop-rewritable memory accesses we need each statement instance to
only be assigned a single memory location, which will be the address at which
we load the value. Currently we obtain this address by taking the lexmin of
the access function. We may consider keeping track of the memory location more
explicitly in the future.
llvm-svn: 259587
We create separate functions for fixed-size multi-dimensional, parameteric-sized
multi-dimensional, as well as single-dimensional memory accesses to reduce the
complexity of a large monolithic function.
Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 259522
There is no need to pass the size of the elements as the last size dimension
to ScopArrayInfo. This information is already available through the ElementType.
Tracking it twice is not only redundant but may result in inconsistencies.
llvm-svn: 259521
For schedule generation we assumed that the reverse post order traversal used by
the domain generation is sufficient, however it is not. Once a loop is
discovered, we have to completely traverse it, before we can generate the
schedule for any block/region that is only reachable through a loop exiting
block.
To this end, we add a "loop stack" that will keep track of loops we
discovered during the traversal but have not yet traversed completely.
We will never visit a basic block (or region) outside the most recent
(thus smallest) loop in the loop stack but instead queue such blocks
(or regions) in a waiting list. If the waiting list is not empty and
(might) contain blocks from the most recent loop in the loop stack the
next block/region to visit is drawn from there, otherwise from the
reverse post order iterator.
We exploit the new property of loops being always completed before additional
loops are processed, by removing the LoopSchedules map and instead keep all
information in LoopStack. This clarifies that we indeed always only keep a
stack of in-process loops, but will never keep incomplete schedules for an
arbitrary set of loops. As a result, we can simplify some of the existing code.
This patch also adds some more documentation about how our schedule construction
works.
This fixes http://llvm.org/PR25879
This patch is an modified version of Johannes Doerfert's initial fix.
Differential Revision: http://reviews.llvm.org/D15679
llvm-svn: 259354
In https://llvm.org/svn/llvm-project/polly/trunk@251870 code was committed to
avoid a failure in the presence of infinite loops, but the test case committed
along with this change passes without the actual change. I looked back into the
code and also checked with the original committer (Johannes), but could not find
the reason why the code is needed. The introduction of LoopStacks for
buildSchedule in one of the next commits will make it even more clear that this
code is not needed, but I remove this ahead of time to facilitate bisecting in
case I missed something.
llvm-svn: 259347
darwin requires the additional linkages of...
LLVMBitReader
LLVMMCParser
LLVMObject
LLVMProfileData
LLVMTarget
LLVMVectorize
as the darwin requires all of the weak undefined symbols in a library to be
resolved when linking it against an executable (unless
-Wl,-undefined,dynamic_lookup is used to override the default behavior of
-Wl,-undefined,error).
Contributed-by: Jack Howarth
llvm-svn: 259332
The autotools build system is based on and requires LLVM's autotools
build system to work, which has been depricated and finally removed in
r258861. Consequently we also remove the autotools build system from
Polly.
Differential Revision: http://reviews.llvm.org/D16655
llvm-svn: 259041
Before adding a MK_Value READ MemoryAccess, check whether the read is
necessary or synthesizable. Synthesizable values are later generated by
the SCEVExpander and therefore do not need to be transferred
explicitly. This can happen because the check for synthesizability has
presumbly been forgotten in the case where a phi's incoming value has
been defined in a different statement.
Differential Revision: http://reviews.llvm.org/D15687
llvm-svn: 258998
MemAccInst wraps the common members of LoadInst and StoreInst. Also use
of this class in:
- ScopInfo::buildMemoryAccess
- BlockGenerator::generateLocationAccessed
- ScopInfo::addArrayAccess
- Scop::buildAliasGroups
- Replace every use of polly::getPointerOperand
Reviewers: jdoerfert, grosser
Differential Revision: http://reviews.llvm.org/D16530
llvm-svn: 258947
Ensure that there is at most one phi write access per PHINode and
ScopStmt. In particular, this would be possible for non-affine
subregions with multiple exiting blocks. We replace multiple MAY_WRITE
accesses by one MUST_WRITE access. The written value is constructed
using a PHINode of all exiting blocks. The interpretation of the PHI
WRITE's "accessed value" changed from the incoming value to the PHI like
for PHI READs since there is no unique incoming value.
Because region simplification shuffles around PHI nodes -- particularly
with exit node PHIs -- the PHINodes at analysis time does not always
exist anymore in the code generation pass. We instead remember the
incoming block/value pair in the MemoryAccess.
Differential Revision: http://reviews.llvm.org/D15681
llvm-svn: 258809
Keep at most one value read MemoryAccess per value and statement;
multiple generated loads do not have any additional effect. As one such
MemoryAccess can cater multiple uses within the statement, the
AccessInstruction property is not unique any more and set to nullptr.
Differential Revision: http://reviews.llvm.org/D15510
llvm-svn: 258808
Ensure there is at most one write access per definition of an
llvm::Value. Keep track of already created value write access by using
a (dense) map.
Replace addValueWriteAccess by ensureValueStore which can be uses more
liberally without worrying to add redundant accesses. It will be used,
e.g. in a logical correspondant for value reads -- ensureValueReload --
to ensure that the expected definition has been written when loading it.
Differential Revision: http://reviews.llvm.org/D15483
llvm-svn: 258807
Both functions implement the same functionality, with the difference that
getNewScalarValue assumes that globals and out-of-scop scalars can be directly
reused without loading them from their corresponding stack slot. This is correct
for sequential code generation, but causes issues with outlining code e.g. for
OpenMP code generation. getNewValue handles such cases correctly.
Hence, we can replace getNewScalarValue with getNewValue. This is not only more
future proof, but also eliminates a bunch of code.
The only functionality that was available in getNewScalarValue that is lost
is the on-demand creation of scalar values. However, this is not necessary any
more as scalars are always loaded at the beginning of each basic block and will
consequently always be available when scalar stores are generated. As this was
not the case in older versions of Polly, it seems the on-demand loading is just
some older code that has not yet been removed.
Finally, generateScalarLoads also generated loads for values that are loop
invariant, available in GlobalMap and which are preferred over the ones loaded
in generateScalarLoads. Hence, we can just skip the code generation of such
scalar values, avoiding the generation of dead code.
Differential Revision: http://reviews.llvm.org/D16522
llvm-svn: 258799
Polly currently does not support irreducible control and it is probably not
worth supporting. This patch adds code that checks for irreducible control
and refuses regions containing irreducible control.
Polly traditionally had rather restrictive checks on the control flow structure
which would have refused irregular control, but within the last couple of months
most of the control flow restrictions have been removed. As part of this
generalization we accidentally allowed irregular control flow.
Contributed-by: Karthik Senthil and Ajith Pandel
llvm-svn: 258497
In Polly, after hoisting loop invariant loads outside loop, the alignment
information for hoisted loads are missing, this patch restore them.
Contributed-by: Lawrence Hu <lawrence@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16160
llvm-svn: 258105
When importing a schedule, do not verify the load/store alignment of
scalar accesses. Scalar loads/store are always created newly in code
generation with no alignment restrictions. Previously, scalar alignment
was checked if the access instruction happened to be a LoadInst or
StoreInst, but only its array (MK_Array) access is relevant.
This will be implicitly unit-tested when the access instruction of a
value read can be nullptr.
Differential Revision: http://reviews.llvm.org/D15680
llvm-svn: 257904
should perform loop interchanges itself.
This also fixes a bug we see due to the "loop-interchange" pass producing
incorrect IR when compiling linpack-pc.c from the LLVM test-suite with
"-polly-position=before-vectorizer".
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 257495
Call assumeNoOutOfBound only in updateDimensionality to process situations
when new dimensions are added and new bounds checks are required.
Contributed-by: Tobias Grosser, Gareev Roman
llvm-svn: 257170
This change clarifies that for Not-NonAffine-SubRegions we actually iterate over
the subnodes and for both NonAffine-SubRegions and BasicBlocks, we perform the
schedule construction. As a result, the tree traversal becomes trivial, the
special case for a scop consisting just of a single non-affine region
disappears and the indentation of the code is reduced.
No functional change intended.
llvm-svn: 256940
At code generation, scalar reads are generated before the other
statement's instructions, respectively scalar writes after them, in
contrast to array accesses which are "executed" with the instructions
they are linked to. Therefore it makes sense to not map the scalar
accesses to a place of execution. Follow-up patches will also remove
some of the directs links from a scalar access to a single instruction,
such that only having array accesses in InstructionToAccess ensures
consistency.
Differential Revision: http://reviews.llvm.org/D13676
llvm-svn: 256298
We clarify that certain code is only executed if LSchedule is != nullptr.
Previously some of these functions have been executed, but they only passed
a nullptr through. This caused some confusion when reading the code.
llvm-svn: 256209
Besides improving the documentation and the code we now assert in case the input
is invalid (N < 0) and also do not any more return a nullptr in case USet is
empty. This should make the code more readable.
llvm-svn: 256208
If a loop has a sufficiently large amount of compute instruction in its loop
body, it is unlikely that our rewrite of the loop iterators introduces large
performance changes. As Polly can also apply beneficical optimizations (such
as parallelization) to such loop nests, we mark them as profitable.
This option is currently "disabled" by default, but can be used to run
experiments. If enabled by setting it e.g. to 40 instructions, we currently
see some compile-time increases on LNT without any significant run-time
changes.
llvm-svn: 256199
.. and add some documentation. We also simplify the code by dropping an early
check that is also covered by the the later checks. This might have a small
compile time impact, but as the scops that are skipped are small we should
probably only add this back in the unlikely case that this has a notable
compile-time cost.
No functional change intended.
llvm-svn: 256149
As we already log an error when calling invalid, scops unprofitable scops are in
any case marked invalid, but returning immediately safes (a tiny bit of) compile
time and is consistent with our use of 'invalid' in the remainder of the file.
Found by inspection.
llvm-svn: 256140
Without this return we still log the incorrect array size (and do not detect
this scop), but we would unnecessarily continue to verify that access functions
are affine. As we do not need to do this, we can return right ahead and
consequently safe compile time.
This issue was found by inspection.
llvm-svn: 256139
Instead of counting all array memory accesses associated with a load
instruction, we now explicitly check that the single array access that could
(potentially) be associated with a load instruction does not exist. This helps
to document the current behavior of Polly where load instructions can indeed
have at most one associated array access. In the unlikely case this changes
in the future, we add an assert for the case where two load accesses would
prevent us to return a single memory access, but we still should communicate
that not all array memory accesses have been removed.
This addresses post-commit comments from Johannes Doerfert for commit 255776.
llvm-svn: 256136
Scops that contain many complex branches are likely to result in complex domain
conditions that consist of a large (> 100) number of conjucts. Transforming
such domains is expensive and unlikely to result in efficient code. To avoid
long compile times we detect this case and skip such scops. In the future we may
improve this by either using non-affine subregions to hide such complex
condition structures or by exploiting in certain cases properties (e.g.,
dominance) that allow us to construct the domains of a scop in a way that
results in a smaller number improving conjuncts.
Example of a code that results in complex iteration spaces:
loop.header
/ | \ \
A0 A2 A4 \
\ / \ / \
A1 A3 \
/ \ / \ |
B0 B2 B4 |
\ / \ / |
B1 B3 ^
/ \ / \ |
C0 C2 C4 |
\ / \ / /
C1 C3 /
\ / /
loop backedge
llvm-svn: 256123
The patch fixes Bug 25759 produced by inappropriate handling of unsigned
maximum SCEV expressions by SCEVRemoveMax. Without a fix, we get an infinite
loop and a segmentation fault, if we try to process, for example,
'((-1 + (-1 * %b1)) umax {(-1 + (-1 * %yStart)),+,-1}<%.preheader>)'.
It also fixes a potential issue related to signed maximum SCEV expressions.
Tested-by: Roman Gareev <gareevroman@gmail.com>
Fixed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D15563
llvm-svn: 255922
When running 'clang -O3 -mllvm -polly -mllvm -polly-show' we now only show the
CFGs of functions with at least one detected scop. For larger files/projects
this reduces the number of graphs printed significantly and is likely what
developers want to see. The new option -polly-view-all enforces all graphs to be
printed and the exiting option -poll-view-only limites the graph printing to
functions that match a certain pattern.
This patch requires https://llvm.org/svn/llvm-project/llvm/trunk@255889 (and
vice versa) to compile correctly.
llvm-svn: 255891
Load instructions may possibly be related to multiple memory accesses, but we
are only interested in the array read access that describes the memory location
the load instructions loads from. By using getArrayAccessfor we ensure to always
obtain the right memory access.
This issue was found by inspection without having a failing test case.
llvm-svn: 255716
getAccessFor does not guarantee a certain access to be returned in case an
instruction is related to multiple accesses. However, in the vector code
generation we want to know the stride of the array access of a store
instruction. By using getArrayAccessFor we ensure we always get the correct
memory access.
This patch fixes a potential bug, but I was unable to produce a failing test
case. Several existing test cases cover this code, but all of them already
passed out of luck (or the specific but not-guaranteed order in which we build
memory accesses).
llvm-svn: 255715
When generating scalar loads/stores separately the vector code has not been
updated. This commit adds code to generate scalar loads for vector code as well
as code to assert in case scalar stores are encountered within a vector loop.
llvm-svn: 255714
When rewriting the access functions of load/store statements, we are only
interested in the actual array memory location. The current code just took
the very first memory access, which could be a scalar or an array access. As
a result, we failed to update access functions even though this was requested
via .jscop.
llvm-svn: 255713
This change should not change the behavior of Polly today, but it allows
external constants to be remapped e.g. when targetting multiple LLVM modules.
llvm-svn: 255506
This reverts commit r255471.
Johannes raised in the post-commit review of r255471 the concern that PHI
writes in non-affine regions with two exiting blocks are not really MUST_WRITE,
but we just know that at least one out of the set of all possible PHI writes
will be executed. Modeling all PHI nodes as MUST_WRITEs is probably save, but
adding the needed documentation for such a special case is probably not worth
the effort. Michael will be proposing a new patch that ensures only a single
PHI_WRITE is created for non-affine regions, which - besides other benefits -
should also allow us to use a single well-defined MUST_WRITE for such PHI
writes.
(This is not a full revert, but the condition and documentation have been
slightly extended)
llvm-svn: 255503
Before this commit, only the region's entry block was assumed to always
execute in a non-affine subregion. We replace this by a test whether it
dominates the exit block (this necessarily includes the entry block)
which should be more accurate.
llvm-svn: 255473
LLVM's IR guarantees that a value definition occurs before any use, and
also the value of a PHI must be one of the incoming values, "written"
in one of the incoming blocks. Hence, such writes are never conditional
in the context of a non-affine subregion.
llvm-svn: 255471
Over time different vocabulary has been introduced to describe the different
memory objects in Polly, resulting in different - often inconsistent - naming
schemes in different parts of Polly. We now standartize this to the following
scheme:
KindArray, KindValue, KindPHI, KindExitPHI
| ------- isScalar -----------|
In most cases this naming scheme has already been used previously (this
minimizes changes and ensures we remain consistent with previous publications).
The main change is that we remove KindScalar to clearify the difference between
a scalar as a memory object of kind Value, PHI or ExitPHI and a value (former
KindScalar) which is a memory object modeling a llvm::Value.
We also move all documentation to the Kind* enum in the ScopArrayInfo class,
remove the second enum in the MemoryAccess class and update documentation to be
formulated from the perspective of the memory object, rather than the memory
access. The terms "Implicit"/"Explicit", formerly used to describe memory
accesses, have been dropped. From the perspective of memory accesses they
described the different memory kinds well - especially from the perspective of
code generation - but just from the perspective of a memory object it seems more
straightforward to talk about scalars and arrays, rather than explicit and
implicit arrays. The last comment is clearly subjective, though. A less
subjective reason to go for these terms is the historic use both in mailing list
discussions and publications.
llvm-svn: 255467
Use it to print "null" if a MemoryAccess's access relation is not
available instead of printing nothing.
Suggested-by: Johannes Doerfert
llvm-svn: 255466
Introduce a function getStmtForRegionNode() to the corresponding
ScopStmt of a RegionNode. We can use it to call the existing
ScopStmt::isEmpty() function instead of searching for accesses.
llvm-svn: 255465
When introducing separate control flow for the original and optimized code we
introduce now a special 'ExitingBlock':
\ /
EnteringBB
|
SplitBlock---------\
_____|_____ |
/ EntryBB \ StartBlock
| (region) | |
\_ExitingBB_/ ExitingBlock
| |
MergeBlock---------/
|
ExitBB
/ \
This 'ExitingBlock' contains code such as the final_reloads for scalars, which
previously were just added to whichever statement/loop_exit/branch-merge block
had been generated last. Having an explicit basic block makes it easier to
find these constructs when looking at the CFG.
llvm-svn: 255107
This update brings in improvements to isl's 'isolate' option that reduce the
number of code versions generated. This results in both code-size and compile
time reduction for outer loop vectorization.
Thanks to Roman Garev and Sven Verdoolaege for working on this improvement.
llvm-svn: 254706
The motivation is to fix a compilation error with Visual Studio 2013.
See http://reviews.llvm.org/D14886.
Thanks to Sumanth Gundapaneni for finding the issue and suggesting a
patch.
llvm-svn: 254498
The script will checkout the most recent master from
http://repo.or.cz/isl.git into /tmp, create a distribution tarball, and
extract it as replacement of lib/External/isl. After that it can be
committed to the Polly repository.
llvm-svn: 254497
Acc==MA implies Acc->getAccessInstruction() == MA->getAccessInstruction().
Suggested as post-commit review for 254305 by Michael Kruse.
llvm-svn: 254327
The use of C++'s high-level iterator functionality instead of two while loops
and explicit iterator handling improves readability of this code.
Proposed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: http://reviews.llvm.org/D15068
llvm-svn: 254305
Re-run canonicalization passes after Polly's code generation.
The set of passes currently added here are nearly all the passes between
--polly-position=early and --polly-position=before-vectorizer, i.e. all
passes that would usually run after Polly.
In order to run these only if Polly actually modified the code, we add a
function attribute "polly-optimzed" to a function that contains
generated code. The cleanup pass is skipped if the function does not
have this attribute.
There is no support by the (legacy) PassManager to run passes only under
some conditions. One could have wrapped all transformation passes to run
only when CodeGeneration changed the code, but the analyses would run
anyway. This patch creates an independent pass manager. The
disadvantages are that all analyses have to re-run even if preserved and
it does not honor compiler switches like the PassManagerBuilder does.
Differential Revision: http://reviews.llvm.org/D14333
llvm-svn: 254150
Previously, accesses that originate from PHI nodes in the exit block
were registered as SCALAR. In some context they are treated as scalars,
but it makes a difference in others. We used to check whether the
AccessInstruction is a terminator to differentiate the cases.
This patch introduces an MemoryAccess origin EXIT_PHI and a
ScopArrayInfo kind KIND_EXIT_PHI to make this case more explicit. No
behavioural change intended.
Differential Revision: http://reviews.llvm.org/D14688
llvm-svn: 254149
gfortran (and fortran in general?) does not compute the address of an array
element directly from the array sizes (e.g., %s0, %s1), but takes first the
maximum of the sizes and 0 (e.g., max(0, %s0)) before multiplying the resulting
value with the per-dimension array subscript expressions. To successfully
delinearize index expressions as we see them in fortran, we first filter 'smax'
expressions out of the SCEV expression, use them to guess array size parameters
and only then continue with the existing delinearization.
llvm-svn: 253995
Trying to build up access functions for any of these blocks is likely to fail,
as error blocks may contain invalid/non-representable instructions, and blocks
dominated by error blocks may reference such instructions, which wil also cause
failures. As all of these blocks are anyhow assumed to not be executed, we can
just remove them early on.
This fixes http://llvm.org/PR25596
llvm-svn: 253818
The most interesting change for Polly in this isl update is 4d5654af which
in certain cases can speed up the construction of run-time checks from an isl
set consisting of several disjuncts significantly.
llvm-svn: 253794
At some point we enforced lcssa for the loop surrounding the entry block.
This is not only questionable as it does not check any other loop but also
not needed any more.
llvm-svn: 253789
In case the original parameter instruction does not have a name, but it comes
from a load instruction where the base pointer has a name we used the name of
the load instruction to give some more intuition of where the parameter came
from. To ensure this works also through GEPs which may have complex offsets,
we originally just dropped the offsets and _only_ used the base pointer name.
As this can result in multiple parameters to get the same name, we now prefix
the parameter ID to ensure parameter names are unique. This will make it easier
to understand debug output.
This change does not affect correctness, as parameter IDs (even of the same
name) can always be distinguished through the SCEV pointer stored inside them.
llvm-svn: 253330
Without this change we may start to refuse scops in larger compilation units
just because a lot of code has already been compiled earlier.
Found by inspection. I do not yet have a good test case for this.
llvm-svn: 253050
Only when we check for wrapping we want to use the store size, for all
other cases we use the alloc size now.
Suggested by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 252941