Such PHI nodes can not only appear in the ExitBlock of the Scop, but indeed
any scalar PHI node above the scop and used in the scop is modeled as scalar
read access.
llvm-svn: 251198
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.
If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).
The current full/partial tile separation increases compile-time more than
necessary. As a result, we see in compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investiagation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewers: jdoerfert, grosser
Subscribers: grosser, #polly
Differential Revision: http://reviews.llvm.org/D13779
llvm-svn: 250809
New values were always synthesized in the block of the instruction
that needed them. This is incorrect for PHI node whose' value must be
defined in the respective incoming block. This patch temporarily moves
the builder's insert point to the incoming block while synthesizing phi
node arguments.
This fixes PR25241 (http://llvm.org/bugs/show_bug.cgi?id=25241)
llvm-svn: 250693
There are several different kinds of constants that could occur in a
branch condition, however we can only handle the most interesting one
namely constant integers. To this end we have to treat others as
non-affine.
This fixes bug 25244.
llvm-svn: 250669
We build the schedule based on a traversal of the region and accumulate
information for each loop in it. The total schedule is associated with the
loop surrounding the SCoP, though it can happen that there are blocks in the
SCoP which are part of loops that are only partially in the SCoP. Instead of
associating information with them (they are not part of the SCoP and
consequently are not modeled) we have to associate the schedule information
with the surrounding loop if any.
This fixes bug 25240.
llvm-svn: 250668
Accesses that have a relative offset (in bytes) that is not divisible
by the type size (in bytes) will be represented as empty in the SCoP
description. This is on its own not good but it also crashed the
invariant load hoisting. This patch will fix the latter problem while
the former should be addressed too.
This fixes bug 25236.
llvm-svn: 250664
If the base pointer of a load is invariant and defined in the SCoP but
not loaded we cannot hoist the load as we would not hoist the base
pointer definition.
This fixes bug 25237.
llvm-svn: 250663
Sorting is replaced by a demand driven code generation that will pre-load a
value when it is needed or, if it was not needed before, at some point
determined by the order of invariant accesses in the program. Only in very
little cases this demand driven pre-loading will kick in, though it will
prevent us from generating faulty code. An example where it is needed is
shown in:
test/ScopInfo/invariant_loads_complicated_dependences.ll
Invariant loads that appear in parameters but are not on the top-level (e.g.,
the parameter is not a SCEVUnknown) will now be treated correctly.
Differential Revision: http://reviews.llvm.org/D13831
llvm-svn: 250655
Polly can now be used as a analysis only tool as long as the code
generation is disabled. However, we do not have an alternative to the
independent blocks pass in place yet, though in the relevant cases
this does not seem to impact the performance much. Nevertheless, a
virtual alternative that allows the same transformations without
changing the input region will follow shortly.
llvm-svn: 250652
Expressing this in terms of BlockGenerator::getOrCreateAlloca(const
ScopArrayInfo *Array) does not work as the MemoryAccess BasePtr is in case of
invariant load hoisting different to the ScopArrayInfo BasePtr. Until this is
investigated and fixed, we move back to code that just uses the baseptr of
MemoryAccess.
llvm-svn: 250637
Instead of generating implicit loads within basic blocks, put them
before the instructions of the statment itself, including non-affine
subregions. The region's entry node is dominating all blocks in the
region and therefore the loaded value will be available there.
Implicit writes in block-stmts were already stored back at the end of
the block. Now, also generate the stores of non-affine subregions when
leaving the statement, i.e. in the exiting block.
This change is required for array-mapped implicits ("De-LICM") to
ensure that there are no dependencies of demoted scalars within
statments. Statement load all required values, operator on copied in
registers, and then write back the changed value to the demoted memory.
Lifetimes analysis within statements becomes unecessary.
Differential Revision: http://reviews.llvm.org/D13487
llvm-svn: 250625
Accesses for exit node phis will be handled separately by
buildPHIAccesses if there is more than one exiting edge,
buildScalarDependences does not need to create additional SCALAR
accesses.
This is a corrected version of r250517, which was reverted in r250607.
Differential Revision: http://reviews.llvm.org/D13848
llvm-svn: 250622
In r250408 'CHECK-NEXT: br' lines were removed as they also matched a
'%polly.subregion.iv.inc' instruction and did consequently not check what they
were supposed to check. However, without these lines we can not test that the
.s2a instructions that are not any more generated since r250411 really are not
emitted. Hence, we add back the CHECK-NEXT lines to ensure there are really no
instructions generated between the store that we check for and the branch at the
end of the basic block. To ensure we do not match too early, we now check for
'br i1' or 'br label'.
llvm-svn: 250435
When pulling a llvm::Value to be written as a PHI write, the former
code did only check whether it is within the same basic block, but it
could also be the same non-affine subregion. In that case some
unecessary pair of MemoryAccesses would have been created.
Two unit test were explicitely checking for the unecessary writes,
including the comments that the writes are unecessary.
llvm-svn: 250411
They happen to match
%polly.subregion.iv.inc = add i32 %polly.subregion.iv, 1
^^ ^^
that is, are misleading in what they actually check.
llvm-svn: 250408
When sharing the same map from old to new value, CodeGeneration would
reuse the same new value for each basic block. However, the SCEV
expander might emit code in a basic block that does not dominate a use
of the SCEV in another basic block. This test checks whether both such
blocks have their own expanded new values.
llvm-svn: 250389
We harden one test case by ensuring no additional stores may possibly be
introduced between the stores we check for and the basic block terminator
statements.
We also add a test case for the situation where a value that is passed from
a non-affine region to a PHI node does not dominate the exit of the non-affine
region. This case has come up in patch reviews, so we make sure it is properly
handled today and in the future.
llvm-svn: 250217
We also allow such products for cases where 'Parameter' is loaded within the
scop, but where we can dynamically verify that the value of 'Parameter' remains
unchanged during the execution of the scop.
This change relies on Polly's new RequiredILS tracking infrastructure recently
contributed by Johannes.
llvm-svn: 250019
The domain generation can handle lazy && and || by default but eager
evaluated expressions were dismissed as non-affine. With this patch we
will allow arbitrary combinations of and/or bit-operations in the
conditions of branches.
Differential Revision: http://reviews.llvm.org/D13624
llvm-svn: 249971
If a (assumed) invariant location is loaded multiple times we
generated a parameter for each location. However, this caused compile
time problems for several benchmarks (e.g., 445_gobmk in SPEC2006 and
BT in the NAS benchmarks). Additionally, the code we generate is
suboptimal as we preload the same location multiple times and perform
the same checks on all the parameters that refere to the same value.
With this patch we consolidate the invariant loads in three steps:
1) During SCoP initialization required invariant loads are put in
equivalence classes based on their pointer operand. One
representing load is used to generate a parameter for the whole
class, thus we never generate multiple parameters for the same
location.
2) During the SCoP simplification we remove invariant memory
accesses that are in the same equivalence class. While doing so
we build the union of all execution domains as it is only
important that the location is at least accessed once.
3) During code generation we only preload one element of each
equivalence class with the unified execution domain. All others
are mapped to that preloaded value.
Differential Revision: http://reviews.llvm.org/D13338
llvm-svn: 249853
Drop an unused flag polly-allow-non-scev-backedge-taken-count and also
its occurrences from the tests.
Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D13400
llvm-svn: 249675
This replaces the support for user defined error functions by a
heuristic that tries to determine if a call to a non-pure function
should be considered "an error". If so the block is assumed not to be
executed at runtime. While treating all non-pure function calls as
errors will allow a lot more regions to be analyzed, it will also
cause us to dismiss a lot again due to an infeasible runtime context.
This patch tries to limit that effect. A non-pure function call is
considered an error if it is executed only in conditionally with
regards to a cheap but simple heuristic.
llvm-svn: 249611
This patch allows invariant loads to be used in the SCoP description,
e.g., as loop bounds, conditions or in memory access functions.
First we collect "required invariant loads" during SCoP detection that
would otherwise make an expression we care about non-affine. To this
end a new level of abstraction was introduced before
SCEVValidator::isAffineExpr() namely ScopDetection::isAffine() and
ScopDetection::onlyValidRequiredInvariantLoads(). Here we can decide
if we want a load inside the region to be optimistically assumed
invariant or not. If we do, it will be marked as required and in the
SCoP generation we bail if it is actually not invariant. If we don't
it will be a non-affine expression as before. At the moment we
optimistically assume all "hoistable" (namely non-loop-carried) loads
to be invariant. This causes us to expand some SCoPs and dismiss them
later but it also allows us to detect a lot we would dismiss directly
if we would ask e.g., AliasAnalysis::canBasicBlockModify(). We also
allow potential aliases between optimistically assumed invariant loads
and other pointers as our runtime alias checks are sound in case the
loads are actually invariant. Together with the invariant checks this
combination allows to handle a lot more than LICM can.
The code generation of the invariant loads had to be extended as we
can now have dependences between parameters and invariant (hoisted)
loads as well as the other way around, e.g.,
test/Isl/CodeGen/invariant_load_parameters_cyclic_dependence.ll
First, it is important to note that we cannot have real cycles but
only dependences from a hoisted load to a parameter and from another
parameter to that hoisted load (and so on). To handle such cases we
materialize llvm::Values for parameters that are referred by a hoisted
load on demand and then materialize the remaining parameters. Second,
there are new kinds of dependences between hoisted loads caused by the
constraints on their execution. If a hoisted load is conditionally
executed it might depend on the value of another hoisted load. To deal
with such situations we sort them already in the ScopInfo such that
they can be generated in the order they are listed in the
Scop::InvariantAccesses list (see compareInvariantAccesses). The
dependences between hoisted loads caused by indirect accesses are
handled the same way as before.
llvm-svn: 249607
This single option replaces -polly-detect-unprofitable and -polly-no-early-exit
and is supposed to be the only option that disables compile-time heuristics that
aim to bail out early on scops that are believed to not benefit from Polly
optimizations.
Suggested-by: Johannes Doerfert
llvm-svn: 249426
These flags are now always passed to all tests and need to be disabled if
not needed. Disabling these flags, rather than passing them to almost all
tests, significantly simplfies our RUN: lines.
llvm-svn: 249422
Polly's profitability heuristic saves compile time by skipping trivial scops or
scops were we know no good optimization can be applied. For almost all our tests
this heuristic makes little sense as we aim for minimal test cases when testing
functionality. Hence, in almost all cases this heuristic is better be disabled.
In preparation of disabling Polly's compile time heuristic by default in the
test suite we first explicitly enable it in the couple of test cases that really
use it (or run with/without heuristic side-by-side).
llvm-svn: 249418
This test case was XFAILed under the assumption Polly is unable to detect the
scop. However, disabling Polly's profitability heuristics is sufficient to
detect this scop.
llvm-svn: 249414
A statement with an empty domain complicates the invariant load
hoisting and does not help any subsequent analysis or transformation.
In fact it might introduce parameter dimensions or increase the
schedule dimensionality. To this end, we remove statements with an
empty domain early in the SCoP simplification.
llvm-svn: 249276
Before isValidCFG() could hide the fact that a loop is non-affine by
over-approximation. This is problematic if a subregion of the loop contains
an exit/latch block and is over-approximated. Now we do not over-approximate
in the isValidCFG function if we check loop control. If such control is
non-affine the whole loop is over-approximated, not only a subregion.
llvm-svn: 249273
We have to skip accesses in non-affine subregions during hoisting as
they might not be executed under the same condition as the entry of
the non-affine subregion.
llvm-svn: 249139
This moves the construction of ScopStmt to the beginning of the
ScopInfo pass. The late creation was a result of the earlier separation
of ScopInfo and TempScopInfo. This will avoid introducing more
ScopStmt-like maps in future commits. The AccFuncMap will also be
removed in some future commit. DomainMap might also be included into
ScopStmt.
The order in which ScopStmt are created changes and initially creates
empty statements that are removed in a simplification.
Differential Revision: http://reviews.llvm.org/D13341
llvm-svn: 249132
If a value is globally mapped (IslNodeBuilder::ValueMap) and
referenced in the code that will be put into a subfunction, we hand
down the new value to the subfunction.
This patch also removes code that handed down all invariant loads to
the subfunction. Instead, only needed invariant loads are given to the
subfunction. There are two possible reasons for an invariant load to
be handed down:
1) The invariant load is used in a block that is placed in the
subfunction but which is not the parent of the load. In this
case, the scalar access that will read the loaded value, will
cause its base pointer (the preloaded value) to be handed down to
the subfunction.
2) The invariant load is defined and used in a block that is placed
in the subfunction. With this patch we will hand down the
preloaded value to the subfunction as the invariant load is
globally mapped to that value.
llvm-svn: 249126
When error blocks are not terminated by an unreachable they have successors
that might only be reachable via error blocks. Additionally, branches in
error blocks are not checked during SCoP detection, thus we might not be able
to handle them. With this patch we do not try to model error block exit
conditions. Anything that is only reachable via error blocks is ignored too,
as it will not be executed in the optimized version of the SCoP anyway.
llvm-svn: 249099
The user can provide function names with
-polly-error-functions=name1,name2,name3
that will be treated as error functions. Any call to them is assumed
not to be executed.
This feature is mainly for developers to play around with the new
"error block" feature.
llvm-svn: 249098
Instructions which we can synthesis from a SCEV expression are not
generated directly, but only when they are used as an operand of
another instruction. This avoids generating unnecessary instructions
and works more reliably than first inserting them and then deleting
them later on.
This commit was reverted in r248860 due to a remaining miscompile, where
we forgot to synthesis the operand values that were referenced from scalar
writes. test/Isl/CodeGen/scalar-store-from-same-bb.ll tests that we do this
now correctly.
llvm-svn: 248900
Before we unconditinoally forced all users outside the SCoP to use
the preloaded value. However, if the SCoP is not executed due to the
runtime checks, we need to use the original value because it might not
be invariant in the first place.
llvm-svn: 248881
As a first step in the direction of assumed invariant loads (loads
that are not written in some context) we now detect and hoist
definitively invariant loads. These invariant loads will be preloaded
in the code generation and used in the optimized version of the SCoP.
If the load is only conditionally executed the preloaded version will
also only be executed under the same condition, hence we will never
access memory that wouldn't have been accessed otherwise. This is also
the most distinguishing feature to licm.
As hoisting can make statements empty we will simplify the SCoP and
remove empty statements that would otherwise cause artifacts in the
code generation.
Differential Revision: http://reviews.llvm.org/D13194
llvm-svn: 248861
This reverts commit 07830c18d789ee72812d5b5b9b4f8ce72ebd4207.
The commit broke at least one test in lnt,
MultiSource/Benchmarks/Ptrdist/bc/number.c
was miss compiled and the test produced a wrong result.
One Polly test case that was added later was adjusted too.
llvm-svn: 248860
Every once in a while we see code that accesses memory with different types,
e.g. to perform operations on a piece of memory using type 'float', but to copy
data to this memory using type 'int'. Modeled in C, such codes look like:
void foo(float A[], float B[]) {
for (long i = 0; i < 100; i++)
*(int *)(&A[i]) = *(int *)(&B[i]);
for (long i = 0; i < 100; i++)
A[i] += 10;
}
We already used the correct types during normal operations, but fall back to our
detected type as soon as we import changed memory access functions. For these
memory accesses we may generate invalid IR due to a mismatch between the element
type of the array we detect and the actual type used in the memory access. To
address this issue, we always cast the newly created address of a memory access
back to the type of the memory access where the address will be used.
llvm-svn: 248781
Instructions which we can synthesis from a SCEV expression are not generated
directly, but only when they are used as an operand of another instruction. This
avoids generating unnecessary instruction and works more reliably than first
inserting them and then deleting them later on.
Suggested-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
Differential Revision: http://reviews.llvm.org/D13208
llvm-svn: 248712
This patch allows switch instructions with affine conditions in the
SCoP. Also switch instructions in non-affine subregions are allowed.
Both did not require much changes to the code, though there was some
refactoring needed to integrate them without code duplication.
In the llvm-test suite the number of profitable SCoPs increased from
135 to 139 but more importantly we can handle more benchmarks and user
inputs without preprocessing.
Differential Revision: http://reviews.llvm.org/D13200
llvm-svn: 248701
This check was needed at some point but seems not useful anymore. Only
one adjustment in the domain generation was needed to cope with the
cases this check prevented from happening before.
llvm-svn: 248695
We now only delete trivially dead instructions in the BB we copy (copyBB), but
not in any other BB. Only for copyBB we know that there will _never_ be any
future uses of instructions that have no use after copyBB has been generated.
Other instructions in the AST that have been generated by IslNodeBuilder may
look dead at the moment, but may possibly still be referenced by GlobalMaps. If
we delete them now, later uses would break surprisingly.
We do not have a test case that breaks due to us deleting too many instructions.
This issue was found by inspection.
llvm-svn: 248688
After having generated a new user statement a couple of inefficient or
trivially dead instructions may remain. This commit runs instruction
simplification over the newly generated blocks to ensure unneeded
instructions are removed right away.
This commit does adds simplification for non-affine subregions which was not
yet part of 248681.
llvm-svn: 248683
Otherwise, part of the computation will be just simplified away when we add
instruction simplification support to the RegionGenerator.
llvm-svn: 248682
After having generated a new user statement a couple of inefficient or trivially
dead instructions may remain. This commit runs instruction simplification over
the newly generated blocks to ensure unneeded instructions are removed right
away.
This commit does not yet add simplification for non-affine subregions.
llvm-svn: 248681
This commit basically reverts r246427 but still solves the issue
tackled by that commit. Instead of emitting initialization code in the
beginning of the start block we now generate parallel code in its own
block and thereby guarantee separation. This is necessary as we cannot
generate code for hoisted loads prior to the start block but it still
needs to be placed prior to everything else.
llvm-svn: 248674
The new domain construction algorithm now correctly models this test case (and
derives an empty run-time condition). Add this test case to ensure we do not
regress.
llvm-svn: 248669
When the whole SCoP is a non-affine region we need to use the
surrounding loop in the construction of the schedule as that is
the one that will be looked up after the schedule generation.
This fixes bug 24947
llvm-svn: 248667
When recovering multi-dimensional memory accesses, it may happen that different
accesses to the same base array are recovered with different dimensionality.
This patch ensures that the dimensionalities are unified by adding zero valued
dimensions to acesses with lower dimensionality. When starting to model
fixed-size arrays as multi-dimensional in 247906, this has not been taken
care of.
llvm-svn: 248662
This change addresses three issues:
- Read only scalars that enter a PHI node through an edge that comes from
outside the scop are not modeled any more, as such PHI nodes will always
be initialized to this initial value right before the SCoP is entered.
- For PHI nodes that depend on a scalar value that is defined outside the
scop, but where the scalar values is passed through an edge that itself
comes from a BB that is part of the region, we introduce in this basic
block a read of the out-of-scop value to ensure it's value is available
to write it into the PHI alloc location.
- Read only uses of scalars by PHI nodes are ignored in the general read only
handling code, as they are taken care of by the general PHI node modeling
code.
llvm-svn: 248535
After the merge of TempScopInfo into ScopInfo the analysis output
remained because of the existing unit tests. These remains are removed
and the units tests converted to match the equivalent output of
ScopInfo's analysis output. The unit tests are also moved into the
directory of ScopInfo tests.
Differential Revision: http://reviews.llvm.org/D13116
llvm-svn: 248485
A missing return statement that previously did not have a visibly negative
effect caused after some data-structure changes in r248024 multi-dimensional
accesses to be modeled both multi-dimensional as well as linearized. This
commit adds the missing return to avoid the incorrect double modeling as
well as the compile time increases it caused.
llvm-svn: 248171
If we encounter a <nsw> tagged AddRec for a loop we know the trip count of
that loop has to be bounded or the semantics is undefined anyway. Hence, we
only need to add unbounded assumptions if no such AddRec is known.
llvm-svn: 248128
So far we ignored the unbounded parts of the iteration domain, however
we need to assume they do not occure at all to remain sound if they do.
llvm-svn: 248126
We now add loop carried information during the second traversal of the
region instead of in a intermediate step in-between. This makes the
generation simpler, removes code and should even be faster.
llvm-svn: 248125
In order to allow multiple back edges we:
- compute the conditions under which each back edge is taken
- build the union over all these conditions, thus the condition that
any back edge is taken
- apply the same logic to the union we applied to a single back edge
llvm-svn: 248120
As we currently do not perform any optimizations that targets (or is
even aware) small trip counts we will skip them when we count the
loops in a region.
llvm-svn: 248119
All MemoryAccess objects will be owned by ScopInfo::AccFuncMap which
previously stored the IRAccess objects. Instead of creating new
MemoryAccess objects, the already created ones are reused, but their
order might be different now. Some fields of IRAccess and MemoryAccess
had the same meaning and are merged.
This is the last step of fusioning TempScopInfo.{h|cpp} and
ScopInfo.{h.cpp}. Some refactoring might still make sense.
Differential Revision: http://reviews.llvm.org/D12843
llvm-svn: 248024
If the GEP instructions give us enough insights, model scalar accesses as
multi-dimensional (and generate the relevant run-time checks to ensure
correctness). This will allow us to simplify the dependence computation in
a subsequent commit.
llvm-svn: 247906
This will allow to generate non-wrap assumptions for integer expressions
that are part of the SCoP. We compare the common isl representation of
the expression with one computed with modulo semantic. For all parameter
combinations they are not equal we can have integer overflows.
The nsw flags are respected when the modulo representation is computed,
nuw and nw flags are ignored for now.
In order to not increase compile time to much, the non-wrap assumptions
are collected in a separate boundary context instead of the assumed
context. This helps compile time as the boundary context can become
complex and it is therefor not advised to use it in other operations
except runtime check generation. However, the assumed context is e.g.,
used to tighten dependences. While the boundary context might help to
tighten the assumed context it is doubtful that it will help in practice
(it does not effect lnt much) as the boundary (or no-wrap assumptions)
only restrict the very end of the possible value range of parameters.
PET uses a different approach to compute the no-wrap context, though lnt runs
have shown that this version performs slightly better for us.
llvm-svn: 247732
Add polly-check-format as dependency of check-polly if clang-format is
available in the same build.
Differential Revision: http://reviews.llvm.org/D12850
llvm-svn: 247600
At some point we build loop trip counts using this method. It was replaced by
a simpler trick that works only for affine (e.g., not modulo) constraints and
relies on the removal of unbounded parts. In order to allow modulo constrains
again we go back to the former, more accurate method.
llvm-svn: 247540
The function use_after_scop would iterate from 0 to 1024 and accessing element A[1024] where A has only valid indexes from 0 to 1023. Polly detects the situation of unconditionally undefined behavior and bail out in ScopInfo as non-feasible for optimization.
Other tests add impossible context assumptions as well, hance might show the same problem.
llvm-svn: 247412
Hoist runtime checks in the loop nest if they guard an "error" like event.
Such events are recognized as blocks with an unreachable terminator or a call
to the ubsan function that deals with out of bound accesses. Other "error"
events can be added easily.
We will ignore these blocks when we detect/model/optmize and code generate SCoPs
but we will make sure that they would not have been executed using the assumption
framework.
llvm-svn: 247310
As we do not rely on ScalarEvolution any more we do not need to get
the backedge taken count. Additionally, our domain generation handles
everything that is affine and has one latch and our ScopDetection will
over-approximate everything else.
This change will therefor allow loops with:
- one latch
- exiting conditions that are affine
Additionally, it will not check for structured control flow anymore.
Hence, loops and conditionals are not necessarily single entry single
exit regions any more.
Differential Version: http://reviews.llvm.org/D12758
llvm-svn: 247289
The TempScopInfo (-polly-analyze-ir) pass is removed and its work taken
over by ScopInfo (-polly-scops). Several tests depend on
-polly-analyze-ir and use -polly-scops instead which for the moment
prints the output of both passes. This again is not expected by some
other tests, especially those with negative searches, which have been
adapted.
Differential Version: http://reviews.llvm.org/D12694
llvm-svn: 247288
This patch replaces the last legacy part of the domain generation, namely the
ScalarEvolution part that was used to obtain loop bounds. We now iterate over
the loops in the region and propagate the back edge condition to the header
blocks. Afterwards we propagate the new information once through the whole
region. In this process we simply ignore unbounded parts of the domain and
thereby assume the absence of infinite loops.
+ This patch already identified a couple of broken unit tests we had for
years.
+ We allow more loops already and the step to multiple exit and multiple back
edges is minimal.
+ It allows to model the overflow checks properly as we actually visit
every block in the SCoP and know where which condition is evaluated.
- It is currently not compatible with modulo constraints in the
domain.
Differential Revision: http://reviews.llvm.org/D12499
llvm-svn: 247279
The support for modulo expressions is not comlete and makes the new
domain generation harder. As the currently broken domain generation
needs to be replaced, we will first swap in the new, fixed domain
generation and make it compatible with the modulo expressions later.
llvm-svn: 247278
The support for pointer expressions is broken as it can only handle
some patterns in the IslExprBuilder. We should to treat pointers in
expressions the same as integers at some point and revert this patch.
llvm-svn: 247147
While we do not need to model PHI nodes in the region exit (as it is not part
of the SCoP), we need to prepare for the case that the exit block is split in
code generation to create a single exiting block. If this will happen, hence
if the region did not have a single exiting block before, we will model the
operands of the PHI nodes as escaping scalars in the SCoP.
Differential Revision: http://reviews.llvm.org/D12051
llvm-svn: 247078
Instead of having two separate options
-polly-detect-scops-in-functions-without-loops and
-polly-detect-scops-in-regions-without-loops we now just use
-polly-detect-unprofitable to force the detection of scops ignoring any compile
time saving bailout heuristics.
llvm-svn: 247057
Certain backends, e.g. NVPTX, do not support '.' in function names. Hence,
we ensure all '.' are replaced by '_' when generating function names for
subfunctions. For the current OpenMP code generation, this is not strictly
necessary, but future uses cases (e.g. GPU offloading) need this issue to be
fixed.
llvm-svn: 246980
Our alias metadata is currently not emitted in a deterministic order. As it
is not needed in this test, we just drop it for now (but keep in mind to fix
this).
llvm-svn: 246942
When this option is enabled, Polly will emit printf calls for each scalar
load/and store which dump the scalar value loaded/stored at run time.
This patch also refactors the RuntimeDebugBuilder to use variadic templates
when generating CPU printfs. As result, it now becomes easier to print
strings that consist of a set of arguments. Also, as a single printf
call is emitted, it is more likely for such strings to be emitted atomically
if executed multi-threaded.
llvm-svn: 246941
When computing the index expressions for new, multi-dimensional memory accesses
these new index expressions may reference original llvm::Values that are not
transfered into the OpenMP subfunction. Using GlobalMap we now replace
references to such values with the rewritten values that have e.g. been passed
to the OpenMP subfunction.
llvm-svn: 246923
Originally, we disallowed the import of multi-dimensional access functions due
to our code generation not supporting the generation of new address expressions
for multi-dimensional memory accesses. When building our run-time alias check
infrastructure we added code generation support for multi-dimensional address
calculations. Hence, we can now savely allow the import of new
multi-dimensional access functions.
llvm-svn: 246917
This case probably does not happen for LLVM generated code that is in loop
simplify form, but Polly does support such kind of loops. This commit ensures we
have test coverage as well.
llvm-svn: 246543
Code generation currently does not expect unbounded loops. When
using ISL to compute the loop trip count, if we find that the
iteration domain remains unbounded, we invalidate the Scop by
creating an infeasible context.
Contributed-by: Matthew Simpson <mssimpso@codeaurora.org>
This fixes PR24634.
Differential Revision: http://reviews.llvm.org/D12493
llvm-svn: 246477
Before this commit we did this only for Arguments or Constants, but indeed
an instruction may define a value a lot higher up in the dominance tree, but
the actual write generally needs to happen right before branching to the
PHI node. Otherwise, the writes of different branches into PHI nodes may get
intermixed if they lay higher up in the dominance tree.
llvm-svn: 246441
There is no reason the loops in a region need to touch either entry or exit
block. Hence, we need to look through all loops that may touch the region as
well as their children to understand if our region has at least two loops.
llvm-svn: 246433
While ignoring read-only scalar dependences it was not necessary to consider
store instructins, but as store instructions can be the target of a scalar
read-only dependency we need to consider them for the construction of scalar
read-only dependences.
llvm-svn: 246429
Our OpenMP code generation generated part of its launching code directly into
the start basic block and without this change the scalar initialization was
run _after_ the OpenMP threads have been launched. This resulted in
uninitialized scalar values to be used.
llvm-svn: 246427
In order to compute domain conditions for conditionals we will now
traverse the region in the ScopInfo once and build the domains for
each block in the region. The SCoP statements can then use these
constraints when they build their domain.
The reason behind this change is twofold:
1) This removes a big chunk of preprocessing logic from the
TempScopInfo, namely the Conditionals we used to build there.
Additionally to moving this logic it is also simplified. Instead
of walking the dominance tree up for each basic block in the
region (as we did before), we now traverse the region only
once in order to collect the domain conditions.
2) This is the first step towards the isl based domain creation.
The second step will traverse the region similar to this step,
however it will propagate back edge conditions. Once both are in
place this conditional handling will allow multiple exit loops
additional logic.
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12428
llvm-svn: 246398
We already modeled read-only dependences to scalar values defined outside the
scop as memory reads and also generated read accesses from the corresponding
alloca instructions that have been used to pass these scalar values around
during code generation. However, besides for PHI nodes that have already been
handled, we failed to store the orignal read-only scalar values into these
alloc. This commit extends the initialization of scalar values to all read-only
scalar values used within the scop.
llvm-svn: 246394
Our code generation currently does not support scalar references to metadata
values. Hence, it would crash if we try to model scalar dependences to metadata
values. Fortunately, for one of the common uses, debug information, we can
for now just ignore the relevant intrinsics and consequently the issue of how
to model scalar dependences to metadata.
llvm-svn: 246388
I ran the script from r246327 and it touched all the right files;
committing now to hopefully right the bots, but if my check-polly
doesn't come back clean I'll keep looking.
http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/33648
llvm-svn: 246341
If a region does not have more than one loop, we do not identify it as
a Scop in ScopDetection. The main optimizations Polly is currently performing
(tiling, preparation for outer-loop vectorization and loop fusion) are unlikely
to have a positive impact on individual loops. In some cases, Polly's run-time
alias checks or conditional hoisting may still have a positive impact, but those
are mostly enabling transformations which LLVM already performs for individual
loops. As we do not focus on individual loops, we leave them untouched to not
introduce compile time regressions and execution time noise. This results in
good compile time reduction (oourafft: -73.99%, smg2000: -56.25%).
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12268
llvm-svn: 246161
Use ISL to compute the loop trip count when scalar evolution is unable to do
so.
Contributed-by: Matthew Simpson <mssimpso@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D9444
llvm-svn: 246142
If nothing is executed we can bail out early. Otherwise we can use the
constraints that ensure at least one statement is executed for
simplification.
llvm-svn: 245585
We will record if a SAI is the base of another SAI or derived from it.
This will allow to reason about indirect base pointers later on and
allows a clearer picture of indirection also in the SCoP dump.
llvm-svn: 245584
Register tiling in Polly is for now just an additional level of tiling which
is fully unrolled. It is disabled by default. To make this useful for more than
experiments, we still need a cost function as well as possibly further
optimizations that teach LLVM to actually put some of the values we got into
scalar registers.
llvm-svn: 245564
By default we only use one level of tiling for loops, but in general tiling
for multiple levels is trivial for us. Hence, we add a set of options that
allow people to play with a second level of tiling. If this is profitable for
some cases we can work on heuristics that allow us to identify these cases
and use two-level tiling for them.
llvm-svn: 245563
Instead of generating code for an empty assumed context we bail out
early. As the number of assumptions we generate increases this becomes
more and more important. Additionally, this change will allow us to
hide internal contexts that are only used in runtime checks e.g., a
boundary context with constraints not suited for simplifications.
llvm-svn: 245540
To make alias scope metadata generation work in OpenMP mode we now provide
the ScopAnnotator with information about the base pointer rewrite that happens
when passing arrays into the OpenMP subfunction.
llvm-svn: 245451
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations which are than interchanged to the innermost level such that LLVM's
inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this
loop. The number of loop iterations to strip-mine is now configurable with the
option -polly-prevect-width=<number-of-prevec-dims>.
This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.
llvm-svn: 245424
This test was written to check the workings of IndependentBlocks on
arrays which doesn't do such transformations anymore. The test itself
is still useful to check that the region is rejected as SCoP.
llvm-svn: 245353
executeScopConditionally would destroy a predecessor region if it the
scop's entry was the region's exit block by forking it to polly.start
and thus creating a secnd exit out of the region. This patch "shrinks"
the predecessor region s.t. polly.split_new_and_old is not the
region's exit anymore.
llvm-svn: 245294
The SCEVExpander cannot deal with all SCEVs Polly allows in all kinds
of expressions. To this end we introduce a ScopExpander that handles
the additional expressions separatly and falls back to the
SCEVExpander for everything else.
Reviewers: grosser, Meinersbur
Subscribers: #polly
Differential Revision: http://reviews.llvm.org/D12066
llvm-svn: 245288
This allows the code generation to continue working even if a needed
value (that is reloaded anyway) was not yet demoted. Instead of
failing it will now create the location for future demotion to memory
and load from that location. The stores will use the same location and
by construction execute before the load even if the textual order in
the generated AST is otherwise.
Reviewers: grosser, Meinersbur
Subscribers: #polly
Differential Revision: http://reviews.llvm.org/D12072
llvm-svn: 245203
This test case crashes the scalar code generation as we are not
consistent with the usage of the assumed context. To be precise, we
use the assumed context for the dependence analysis but not to
restrict the domains of the statements.
A step by step explanation of the problem is given in the test case.
llvm-svn: 245176
This option allows the user to provide additional information about parameter
values as an isl_set. To specify that N has the value 1024, we can provide
the context -polly-context='[N] -> {: N = 1024}'.
llvm-svn: 245175
This change extends the BlockGenerator to not only allow Instructions as
base elements of scalar dependences, but any llvm::Value. This allows
us to code-generate scalar dependences which reference function arguments, as
they arise when moddeling read-only scalar dependences.
llvm-svn: 244874
Before we only modeled PHI nodes if at least one incoming basic block was itself
part of the region, now we always model them except if all of their operands are
part of a single non-affine subregion which we model as a black-box.
This change only affects PHI nodes in the entry block, that have exactly one
incoming edge. Before this change, we did not model them and as a result code
generation would not know how to code generate them. With this change, code
generation can code generate them like any other PHI node.
This issue was exposed by r244606. Before this change simplifyRegion would have
moved these PHI nodes out of the SCoP, so we would never have tried to code
generate them. We could implement this behavior again, but changing the IR
after the scop has been modeled and transformed always adds a risk of us
invalidating earlier analysis results. It seems more save and overall also more
consistent to just model and handle this one-entry-edge PHI nodes like any
other PHI node in the scop.
Solution proposed by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 244721
This one was extracted from the test-suite's pifft and caused a
miscompilation because a scalar was not written to its alloca address.
llvm-svn: 244720
The previous code had several problems:
For newly created BasicBlocks it did not (always) call RegionInfo::setRegionFor in order to update its analysis. At the moment RegionInfo does not verify its BBMap, but will in the future. This is fixed by determining the region new BBs belong to and set it accordingly. The new executeScopConditionally() requires accurate getRegionFor information.
Which block is created by SplitEdge depends on the incoming and outgoing edges of the blocks it connects, which makes handling its output more difficult than it needs to be. Especially for finding which block has been created an to assign a region to it for the setRegionFor problem above. This patch uses an implementation for splitEdge that always creates a block between the predecessor and successor. simplifyRegion has also been simplified by using SplitBlockPredecessors instead of SplitEdge. Isolating the entries and exits have been refectored into individual functions.
Previously simplifyRegion did more than just ensuring that there is only one entering and one exiting edge. It ensured that the entering block had no other outgoing edge which was necessary for executeScopConditionally(). Now the latter uses the alternative splitEdge implementation which can handle this situation so simplifyRegion really only needs to simplify the region.
Also, executeScopConditionally assumed that there can be no PHI nodes in blocks with one incoming edge. This is wrong and LCSSA deliberately produces such edges. However, previous passes ensured that there can be no such PHIs in exit nodes, but which will no longer hold in the future.
The new code that the property that it preserves the identity of region block (the property that the memory address of the BasicBlock containing the instructions remains the same; new blocks only contain PHI nodes and a terminator), especially the entry block. As a result, there is no need to update the reference to the BasicBlock of ScopStmt that contain its instructions because they have been moved to other basic blocks.
Reviewers: grosser
Part of Differential Revision: http://reviews.llvm.org/D11867
llvm-svn: 244606
Besides other changes this version of isl contains a fundamental fix to memory
corruption issues we have seen with imath-32 backed isl_ints.
This update also contains a fix that ensures that the schedule-tree based
version of isl's dependence analysis takes the domain of the schedule into
account.
llvm-svn: 244585
Even though read-only accesses to scalars outside of a scop do not need to be
modeled to derive valid transformations or to generate valid sequential code,
but information about them is useful when we considering memory footprint
analysis and/or kernel offloading.
llvm-svn: 243981
We use the branch instruction as the location at which a PHI-node write takes
place, instead of the PHI-node itself. This allows us to identify the
basic-block in a region statement which is on the incoming edge of the PHI-node
and for which the write access was originally introduced. As a result we can,
during code generation, avoid generating PHI-node write accesses for basic
blocks that do not preceed the PHI node without having to look at the IR
again.
This change fixes a bug which was introduced in r243420, when we started to
explicitly model PHI-node reads and writes, but dropped some additional checks
that where still necessary during code generation to not emit PHI-node writes
for basic-blocks that are not on incoming edges of the original PHI node.
Compared to the code before r243420 the new code does not need to inspect the IR
any more and we also do not generate multiple redundant writes.
llvm-svn: 243852
The schedule map we derive from a schedule tree map may map statements into
schedule spaces of different dimensionality. This change adds zero padding
to ensure just a single schedule space is used and the translation from
a union_map to an isl_multi_union_pw_aff does not fail.
llvm-svn: 243849
SCEVExpander, which we are using during code generation, only allows
instructions as insert locations, but breaks in case BasicBlock->end() iterators
are passed to it due to it trying to obtain the basic block in which code should
be generated by calling Instruction->getParent(), which is not defined for
->end() iterators.
This change adds an assert to Polly that ensures we only pass valid instructions
to SCEVExpander and it fixes one case, where we used IRBuilder->SetInsertBlock()
to set an ->end() insert location which was later passed to SCEVExpander.
In general, Polly is always trying to build up the CFG first, before we actually
insert instructions into the CFG sceleton. As a result, each basic block should
already have at least one branch instruction before we start adding code. Hence,
always requiring the IRBuilder insert location to be set to a real instruction
should always be possible.
Thanks Utpal Bora <cs14mtech11017@iith.ac.in> for his help with test case
reduction.
llvm-svn: 243830
Such codes are not interesting to optimize and most likely never appear in the
normal compilation flow. However, they show up during test case reduction with
bugpoint and trigger -- without this change -- an assert in
polly::MemoryAccess::foldAccess(). It is better to detect them in
ScopDetection itself and just bail out.
Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in>
Reviewers: grosser
Subscribers: pollydev, llvm-commits
Differential Revision: http://reviews.llvm.org/D11425
llvm-svn: 243515
Schedule trees are a lot easier to work with, for both humans and machines. For
humans the more structured schedule representation is easier to reason about.
Together with the more abstract isl programming interface this can result in a
lot cleaner code (see this changeset). For machines, the structured schedule and
the fact that we now use explicit piecewise affine expressions instead of
integer maps makes it easier to generate code from this schedule tree. As a
result, we can already see a slight compile-time improvement -- for 3mm from
0m0.593s to 0m0.551s seconds (-7 %). More importantly, future optimizations such
as full-partial tile separation will most likely result in more streamlined code
to be generated.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243458
Summary:
When translating PHI nodes into memory dependences during code generation we
require two kinds of memory. 'Normal memory' as for all scalar dependences and
'PHI node memory' to store the incoming values of the PHI node. With this
patch we now mark and track these two kinds of memories, which we previously
incorrectly marked as a single memory object.
Being aware of PHI node storage makes code generation easier, as we do not need
to guess what kind of storage a scalar reference requires. This simplifies the
code nicely.
Reviewers: jdoerfert
Subscribers: pollydev, llvm-commits
Differential Revision: http://reviews.llvm.org/D11554
llvm-svn: 243420
These test cases check whether Polly still gives the same results if
LICM runs before. Currently, it does not and therefore these cases are
expected fails.
llvm-svn: 243037
As specified in PR23888, run-time alias check generation is expensive
in terms of compile-time. This reduces the compile time by computing
minimal/maximal access only once for each base pointer
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
llvm-svn: 243024
Put all Polly targets into a single "Polly" category (i.e.
solution folder). Previously there was no recognizable scheme and most
categories contained just one or two targets or targets didn't belong
to any category.
Reviewers: grosser
llvm-svn: 242779
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be choosen and possibly replaced.
This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with). We also can not yet perform the
reduction analysis on schedule trees.
For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238
llvm-svn: 242130
Named isl sets can generally have any name if they remain within Polly, but only
certain strings can be parsed by isl. The new names we create ensure that we
can always copy-past isl strings from Polly to other isl tools, e.g. for
debugging.
llvm-svn: 241787
This is very preliminary support, but it seems to work for the most common case.
When observing more/different test cases, we can work on generalizing this.
llvm-svn: 240955
This removes old code that has been disabled since several weeks and was hidden
behind the flags -disable-polly-intra-scop-scalar-to-array=false and
-polly-model-phi-nodes=false. Earlier, Polly used to translate scalars and
PHI nodes to single element arrays, as this avoided the need for their special
handling in Polly. With Johannes' patches adding native support for such scalar
references to Polly, this code is not needed any more. After this commit both
-polly-prepare and -polly-independent are now mostly no-ops. Only a couple of
simple transformations still remain, but they are scheduled for removal too.
Thanks again to Johannes Doerfert for his nice work in making all this code
obsolete.
llvm-svn: 240766
Remainder operations with constant divisor can be modeled as quasi-affine
expression. This patch adds support for detecting and modeling them. We also
add a test that ensures they are correctly code generated.
This patch was extracted from a larger patch contributed by Johannes Doerfert
in http://reviews.llvm.org/D5293
llvm-svn: 240518
This makes the test cases nonaffine even if Polly some days gains support for
the srem instruction, an instruction which is currently not modeled but which
can clearly be modeled statically. A call to a function without definition
will always remain non-affine, as there is just insufficient static information
for it to be modeled more precisely.
llvm-svn: 240458
LLVM's instcombine already translates power-of-two sdivs that are known to be
exact to fast ashr instructions. Hence, there is no need to add this logic
ourselves.
Pointed-out-by: Johannes Doerfert
llvm-svn: 239025
We now verify that memory access functions imported via JSON are indeed defined
for the full iteration domain. Before this change we accidentally imported
memory mappings such as i -> i / 127, which only defined a mapped for values of
i that are evenly divisible by 127, but which did not define any mapping for the
remaining values, with the result that isl just generated an access expression
that had undefined behavior for all the unmapped values.
In the incorrect test cases, we now either use floor(i/127) or we use p/127 and
provide the information that p is indeed a multiple of 127.
llvm-svn: 239024
isl marks known non-negative numerators in modulo (and soon also division)
operations. We now exploit this by generating unsigned operations. This is
beneficial as unsigned operations with power-of-two denominators will be
translated by isl to fast bitshift or bitwise and operations.
llvm-svn: 238577
While looking through the test cases I realized we did not have a CHECK line
for a duplicate memory access which we may want to eliminate later. To ensure
we do not have (or later introduce) unnecessary memory accesses, we now tighten
the test cases to look for such a pattern (and add the CHECK: line that shows
the redundant memory access).
llvm-svn: 238227
This ensures we pass all tests independently of how we set the options
-disable-polly-intra-scop-scalar-to-array and -polly-model-phi-nodes.
(At least if we enable both or disable both. Enabling them individually makes
little sense, as they will hopefully disappear soon anyhow).
llvm-svn: 238087
To reduce compile time and to allow more and better quality SCoPs in
the long run we introduced scalar dependences and PHI-modeling. This
patch will now allow us to generate code if one or both of those
options are set. While the principle of demoting scalars as well as
PHIs to memory in order to communicate their value stays the same,
this allows to delay the demotion till the very end (the actual code
generation). Consequently:
- We __almost__ do not modify the code if we do not generate code
for an optimized SCoP in the end. Thus, the early exit as well as
the unprofitable option will now actually preven us from
introducing regressions in case we will probably not get better
code.
- Polly can be used as a "pure" analyzer tool as long as the code
generator is set to none.
- The original SCoP is almost not touched when the optimized version
is placed next to it. Runtime regressions if the runtime checks
chooses the original are not to be expected and later
optimizations do not need to revert the demotion for that part.
- We will generate direct accesses to the demoted values, thus there
are no "trivial GEPs" that select the first element of a scalar we
demoted and treated as an array.
Differential Revision: http://reviews.llvm.org/D7513
llvm-svn: 238070
Being here, we extend the interface to return the element type and not a pointer
to the element type. We also provide a function to get the size (in bytes) of
the elements stored in this array.
We currently still store the element size as an innermost dimension in
ScopArrayInfo, which is somehow inconsistent and should be addressed in future
patches.
llvm-svn: 237779