This removes old code that has been disabled since several weeks and was hidden
behind the flags -disable-polly-intra-scop-scalar-to-array=false and
-polly-model-phi-nodes=false. Earlier, Polly used to translate scalars and
PHI nodes to single element arrays, as this avoided the need for their special
handling in Polly. With Johannes' patches adding native support for such scalar
references to Polly, this code is not needed any more. After this commit both
-polly-prepare and -polly-independent are now mostly no-ops. Only a couple of
simple transformations still remain, but they are scheduled for removal too.
Thanks again to Johannes Doerfert for his nice work in making all this code
obsolete.
llvm-svn: 240766
Remainder operations with constant divisor can be modeled as quasi-affine
expression. This patch adds support for detecting and modeling them. We also
add a test that ensures they are correctly code generated.
This patch was extracted from a larger patch contributed by Johannes Doerfert
in http://reviews.llvm.org/D5293
llvm-svn: 240518
David Blaikie:
"find returns an iterator by value, so it's just added complexity/strangeness to
then use reference lifetime extension to give it the same semantics as if you'd
used a value type instead of a reference type."
llvm-svn: 238294
David Blaike suggested this as an alternative to the use of owningptr(s) for our
memory management, as value semantics allow to avoid the additional interface
complexity caused by owningptr while still providing similar memory consistency
guarantees. We could also have used a std::vector, but the use of std::vector
would yield possibly changing pointers which currently causes problems as for
example the memory accesses carry pointers to their parent statements. Such
pointers should not change.
Reviewer: jblaikie, jdoerfert
Differential Revision: http://reviews.llvm.org/D10041
llvm-svn: 238290
The feature itself has been committed by Johannes in r238070. As this is the
way forward, we now enable it to ensure we get test coverage.
Thank you Johannes for this nice work!
llvm-svn: 238088
To reduce compile time and to allow more and better quality SCoPs in
the long run we introduced scalar dependences and PHI-modeling. This
patch will now allow us to generate code if one or both of those
options are set. While the principle of demoting scalars as well as
PHIs to memory in order to communicate their value stays the same,
this allows to delay the demotion till the very end (the actual code
generation). Consequently:
- We __almost__ do not modify the code if we do not generate code
for an optimized SCoP in the end. Thus, the early exit as well as
the unprofitable option will now actually preven us from
introducing regressions in case we will probably not get better
code.
- Polly can be used as a "pure" analyzer tool as long as the code
generator is set to none.
- The original SCoP is almost not touched when the optimized version
is placed next to it. Runtime regressions if the runtime checks
chooses the original are not to be expected and later
optimizations do not need to revert the demotion for that part.
- We will generate direct accesses to the demoted values, thus there
are no "trivial GEPs" that select the first element of a scalar we
demoted and treated as an array.
Differential Revision: http://reviews.llvm.org/D7513
llvm-svn: 238070
Instead of explicitly building constraints and adding them to our maps we
now use functions like map_order_le to add the relevant information to the
maps.
llvm-svn: 237934
Being here, we extend the interface to return the element type and not a pointer
to the element type. We also provide a function to get the size (in bytes) of
the elements stored in this array.
We currently still store the element size as an innermost dimension in
ScopArrayInfo, which is somehow inconsistent and should be addressed in future
patches.
llvm-svn: 237779
This reference ID is handy for use cases where we need to identify individual
memory accesses (e.g. to modify their access functions).
This is a reworked version of a patch originally developed by Yabin Hu as part
of his summer of code project.
llvm-svn: 237431
Upcoming revisions of isl require us to include header files explicitly, which
have previously been already transitively included. Before we add them, we sort
the existing includes.
Thanks to Chandler for sort_includes.py. A simple, but very convenient script.
llvm-svn: 236930
This patch also changes the implementation of the ArrayInfoMap to a MapVector
which will ensure that iterating over the list of ArrayInfo objects gives
predictable results. The single loop that currently enumerates the ArrayInfo
objects only frees the individual objectes, hence a possibly changing
iteration order does not affect the outcome. The added robustness is for
future users of this interface.
llvm-svn: 236583
In the lnt benchmark MultiSource/Benchmarks/MallocBench/gs/gs with
scalar and PHI modeling we detected the multidimensional accesses
with sizes variant in the SCoP. This will check the sizes for validity.
llvm-svn: 236395
This change adds location information for the detected regions in Polly when the
required debug information is available.
The JSCOP output format is extended with a "location" field which contains the
information in the format "source.c:start-end"
The dot output is extended to contain the location information for each nested
region in the analyzed function.
As part of this change, the existing getDebugLocation function has been moved
into lib/Support/ScopLocation.cpp to avoid having to include
polly/ScopDetectionDiagnostics.h.
Differential Revision: http://reviews.llvm.org/D9431
Contributed-by: Roal Jordans <r.jordans@tue.nl>
llvm-svn: 236393
In Polly we used both the term 'scattering' and the term 'schedule' to describe
the execution order of a statement without actually distinguishing between them.
We now uniformly use the term 'schedule' for the execution order. This
corresponds to the terminology of isl.
History: CLooG introduced the term scattering as the generated code can be used
as a sequential execution order (schedule) or as a parallel dimension
enumerating different threads of execution (placement). In Polly and/or isl the
term placement was never used, but we uniformly refer to an execution order as a
schedule and only later introduce parallelism. When doing so we do not talk
about about specific placement dimensions.
llvm-svn: 235380
This change is a step towards using a single isl_schedule object throughout
Polly. At the moment the schedule is just constructed from the flat
isl_union_map that defines the schedule. Later we will obtain it directly
from the scop and potentially obtain a schedule with a non-trivial internal
structure that will allow faster dependence analysis.
llvm-svn: 235378
isl_union_map_compute_flow() has been replaced by
isl_union_access_info_compute_flow(). This change does not intend to
change funcitonality, yet. However, it will allow us to pass in subsequent
changes schedule trees to the dependence analysis instead of flat schedules.
This should speed up dependence analysis for important cases significantly.
llvm-svn: 235373
Otherwise, instructions in different functions that share the same pointer (due
to earlier modifications), might get assigned incorrect memory access
information (belonging to instructions in previous functions), which can result
in arbitrary memory corruption and assertion failures.
This fixes llvm.org/PR23160 and possibly also llvm.org/PR23167.
Note: InsnToMemAcc is a global variable that should never have existed in the
first place. We will clean this up in a subsequent patch.
Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Debugged-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
llvm-svn: 235254
This will allow the ScopInfo to build the polyhedral representation for
non-affine regions that contain loops. Such loops are basically not visible
in the SCoP representation. Accesses that are variant in such loops are
therefor represented as non-affine accesses.
Differential Revision: http://reviews.llvm.org/D8153
llvm-svn: 234713
This will allow the ScopDetection to detect non-affine regions that
contain loops. All loops contained will be collected and are
accessible to later passes in order to adjust the access functions.
As the loops are non-affine and will not be part of the polyhedral
representation later, all accesses that are variant in these loops
have to be over approximated as non-affine accesses. They are
therefore handled the same way as other non-affine accesses.
Additionally, we do not count non-affine loops for the profitability
heuristic, thus a region with only a non-affine loop will only be
detected if the general detection of loop free regions is enabled.
Differential Revision: http://reviews.llvm.org/D8152
llvm-svn: 234711
This allows us to delinerize code such as:
A[][n]
for (i
for (j
A[i][n-j-1] = ...
which would previously have been delinearize to an access A[i+1][-j-1].
To recover the correct access we apply the piecewise expression:
{ A[i][j] -> A[i-1][i+N]: i < 0; A[i][j] -> A[i][i]: i >= 0}
This approach generalizes to higher dimensions.
llvm-svn: 233566
This will strip the constant factor of a parameter befor we add it to
the SCoP. As a result the access functions are simplified, e.g., for
the attached test case.
llvm-svn: 233501
The performance test case just committed was the last open issue I was aware of.
We enable this by default to increase test coverage and to possibly trigger
reports of issues yet unknown.
llvm-svn: 231590
The new Dependences struct in the DependenceInfo holds all information
that was formerly part of the DependenceInfo. It also provides the
same interface for the user to access this information.
This is another step to a more general ScopPass interface that does
allow multiple SCoPs to be "in flight".
llvm-svn: 231327
We rename the Dependences pass to DependenceInfo as a first step to a
caching pass policy. The new DependenceInfo pass will later provide
"Dependences" for a SCoP.
To keep consistency the test folder is renamed too.
llvm-svn: 231308
If a scalar was defined and used only in a non-affine subregion we do
not need to model the accesses. However, if the scalar was defined
inside the region and escapes the region we have to model the access.
The same is true if the scalar was defined outside and used inside the
region.
llvm-svn: 230960
With the patches r230325, r230329 and r230340 we can handle non-affine
control flow in (loop-free) subregions. As all LLVM test-suite tests pass and
we get ~20% more non-trivial SCoPs, we activate it now by default.
llvm-svn: 230624
This allows us to model non-affine regions in the SCoP representation.
SCoP statements can now describe either basic blocks or non-affine
regions. In the latter case all accesses in the region are accumulated
for the statement and write accesses, except in the entry, have to be
marked as may-write.
Differential Revision: http://reviews.llvm.org/D7846
llvm-svn: 230329
With this patch we allow the SCoP detection to detect regions as SCoPs
which have non-affine control flow inside. All non-affine regions are
tracked and later accessible to the ScopInfo.
As there is no real difference, non-affine branches as well as
floating point branches are covered (and both called non-affine
control flow). However, the detection is restricted to
overapproximate only loop free regions.
llvm-svn: 230325
Scops that only read seem generally uninteresting and scops that only write are
most likely initializations where there is also little to optimize. To not
waste compile time we bail early.
Differential Revision: http://reviews.llvm.org/D7735
llvm-svn: 229820
Alias checks might become costly if there are divisions that complicate the
description of the accessed locations. By overaproximating them we get fairly
accurate results without the huge compile time cost.
llvm-svn: 229252
This allows us to skip ast and code generation if we did not optimize
a SCoP and will not generate parallel or alias annotations. The
initial heuristic to exit is simple but allows improvements later on.
All failing test cases have been modified to disable early exit, thus
to keep their coverage.
Differential Revision: http://reviews.llvm.org/D7254
llvm-svn: 228851
These write are important as they will force the scheduling and code
generation of an otherwise trivial statement and also impose an order of
execution needed to guarantee the correct final value for a scalar in a loop.
Added test case modeled after ClamAV/clamscan.
llvm-svn: 228847
This change has two main purposes:
1) We do not use a static interface to hide an object we create and
destroy for every basic block we copy.
2) We allow the BlockGenerator to store information between calls to
the copyBB method. This will ease scalar/phi code generation
later on.
While a lot of method signatures were changed this should not cause
any real behaviour change.
Differential Revision: http://reviews.llvm.org/D7467
llvm-svn: 228443
This allows us to model PHI nodes in the polyhedral description
without demoting them. The modeling however will result in the
same accesses as the demotion would have introduced.
Differential Revision: http://reviews.llvm.org/D7415
llvm-svn: 228433
The support is currently limited as we only allow them in the input but do
not emit them in the transformed SCoP due to the possible semantic changes.
Differential Revision: http://reviews.llvm.org/D5225
llvm-svn: 227054
The max loop depth was incorrectly computed for scops that contain a
block from a loop but do not contain the entire loop. We need to
check that the full loop is contained in the region when computing
the max loop depth.
These scops occur when a region containing an inner loop is expanded
to include some blocks from the outer loop, but it cannot be fully
expanded to contain the outer loop because the region containing the
outer loop is invalid.
Differential Revision: http://reviews.llvm.org/D6913
llvm-svn: 225812
This support is still incomplete and consequently hidden behind a switch that
needs to be enabled. One problem is ATM that we incorrectly interpret very large
unsigned values as negative values even if used in an unsigned comparision.
llvm-svn: 225480
AF = dyn_cast<SCEVAddRecExpr>(Pair.second) may be NULL for some SCEVs that we do
not support. When reporting the error we still want to pass a pointer that is
known to always be non-NULL.
I do not yet have a test case for this, unfortunately.
llvm-svn: 225461
Schedule dimensions that have the same constant value accross all statements do
not carry any information, but due to the increased dimensionality of the
schedule cost compile time. To not pay this cost, we remove constant dimensions
if possible.
llvm-svn: 225067
Without updating dependences we may lose implicit transitive dependences for
which all explicit dependences have gone through the statement iterations we
have just eliminated.
No test case. We should probably implement a -verify-dependences option.
This fixes llvm.org/PR21227
llvm-svn: 224459
This simplifies the construction of the input for the reduction dependence
computation and at the same time removes an assumption that expects the schedule
to be of 2D + 1 form (the odd dimensions giving textual order, the even
dimensions the loop iterations).
llvm-svn: 223621
This commit drops the Cloog support for Polly. The scripts and
documentation are changed to only use isl as prerequisity. In the code
all Cloog specific parts have been removed and all relevant tests have
been ported to the isl backend when it was created.
llvm-svn: 223141
SCEV based code generation has been the default for two weeks after having
been tested for a long time. We now drop the support the non-scev-based code
generation.
llvm-svn: 222978
In TempScopInfo::buildCondition we extract the conditions to guard the
BB *in addition of* loop bounds. This means we should only consider the
conditions in the paths (in CFG) that do not contain cycles (loops).
At the same time, we set the invert flag if the FalseBB of the current
branch dominates our target BB to indicate that we reach the target BB
with an inverted condition from the current branch.
In this case, the path from the FalseBB contains a cycle if the FalseBB
is the target of a backedge. The conditions implied by such a path should
not be consider. We can identify such a case by checking if the TrueBB
also dominates our target BB, which means we can also reach our target
BB from the TrueBB, without going through the backedge.
llvm-svn: 222907
In case a GEP instruction references into a fixed size array e.g., an access
A[i][j] into an array A[100x100], LLVM-IR does not guarantee that the subscripts
always compute values that are within array bounds. We now derive the set of
parameter values for which all accesses are within bounds and add the assumption
that the scop is only every executed with this set of parameter values.
Example:
void foo(float A[][20], long n, long m {
for (long i = 0; i < n; i++)
for (long j = 0; j < m; j++)
A[i][j] = ...
This loop yields out-of-bound accesses if m is at least 20 and at the same time
at least one iteration of the outer loop is executed. Hence, we assume:
n <= 0 or m <= 20.
Doing so simplifies the dependence analysis problem, allows us to perform
more optimizations and generate better code.
TODO: The location where the GEP instruction is executed is not necessarily the
location where the memory is actually accessed. As a result scanning for GEP[s]
is imprecise. Even though this is not a correctness problem, this imprecision
may result in missed optimizations or non-optimal run-time checks.
In polybench where this mismatch between parametric loop bounds and fixed size
arrays is common, we see with this patch significant reductions in compile time
(up to 50%) and execution time (up to 70%). We see two significant compile time
regressions (fdtd-2d, jacobi-2d-imper), and one execution time regression
(trmm). Both regressions arise due to additional optimizations that have been
enabled by this patch. They can be addressed in subsequent commits.
http://reviews.llvm.org/D6369
llvm-svn: 222754
We will use ScalarEvolution in the ScopInfo.cpp to get the loop trip
count, not cache it in the TempScop object.
Differential Revision: http://reviews.llvm.org/D6070
llvm-svn: 221035
Now MaxLoopDepth only lives in Scops not in TempScops anymore.
This is the first part of a series of changes to make TempScops
obsolete.
Differential Revision: http://reviews.llvm.org/D6069
llvm-svn: 221026
By adding braces into the DEBUG statement we can make clang-format format code
such as:
DEBUG(stmt1(); stmt2())
as multi-line code:
DEBUG({
stmt1();
stmt2();
});
This makes control-flow in debug statements easier to read.
llvm-svn: 220441
This patch changes the RegionSet type used in ScopDetection from a
std::set to a llvm::SetVector. The reason for the change is to
ensure deterministic output when printing the result of the
analysis. We had a windows buildbot failure for the modified test
because the output was coming in a different order.
Only one test case needed to be modified for this change. We could
use CHECK-DAG directives instead of CHECK in the analysis test cases
because the actual order of scops does not matter, but I think that
change should be done in a separate patch that modifies all the
appliciable tests. I simply modified the test to reflect the
expected deterministic output.
Differential Revision: http://reviews.llvm.org/D5897
llvm-svn: 220423
This patch does not change the semantic on it's own. However, the
dependence analysis as well as dce will now use the newest available
access relation for each memory access, thus if at some point the json
importer or any other pass will run before those two and set a new
access relation the behaviour will be different. In general it is
unclear if the dependence analysis and dce should be run on the old or
new access functions anyway. If we need to access the original access
function from the outside later, we can expose the getter again.
Differential Revision: http://reviews.llvm.org/D5707
llvm-svn: 219612
In case the pieceweise affine function used to create an isl_ast_expr
had empty cases (e.g., with contradicting constraints on the
parameters), it was possible that the condition of the isl_ast_expr
select was not a comparison but a constant (thus of type i64).
This patch does two thing:
1) Handle the case the condition of a select is not a i1 type like C.
2) Try to simplify the pieceweise affine functions for the min/max
access when we generate runtime alias checks. That step can often
remove empty or redundant cases as well as redundant constrains.
This fixes bug: http://llvm.org/PR21167
Differential Revision: http://reviews.llvm.org/D5627
llvm-svn: 219208
This resolved the issues with delinearized accesses that might alias,
thus delinearization doesn't deactivate runtime alias checks anymore.
Differential Revision: http://reviews.llvm.org/D5614
llvm-svn: 219078
This class allows to store information about the arrays in the SCoP.
For each base pointer in the SCoP one object is created storing the
type and dimension sizes of the array. The objects can be obtained via
the SCoP, a MemoryAccess or the isl_id associated with the output
dimension of a MemoryAccess (the description of what is accessed).
So far we use the information in the IslExprBuilder to create the
right base type before indexing into the base array. This fixes the
bug http://llvm.org/bugs/show_bug.cgi?id=21113 (both test cases are
included). On top of that we can now build runtime alias checks for
delinearized arrays as the dimension sizes are also part of the
ScopArrayInfo objects.
Differential Revision: http://reviews.llvm.org/D5613
llvm-svn: 219077
We use a parametric abstraction of the domain to split alias groups
if accesses cannot be executed under the same parameter evaluation.
The two test cases check that we can remove alias groups if the
pointers which might alias are never accessed under the same parameter
evaluation and that the minimal/maximal accesses are not global but
with regards to the parameter evaluation.
Differential Revision: http://reviews.llvm.org/D5436
llvm-svn: 218758
If there are multiple read only base addresses in an alias group
we can split it into multiple alias groups each with only one
read only access. This way we might reduce the number of
comparisons significantly as it grows linear in the number of
alias groups but exponential in their size.
Differential Revision: http://reviews.llvm.org/D5435
llvm-svn: 218757
This is just a optimization to save the compile time and execution time
for runtime alias checks if the user guarantees no aliasing all together.
llvm-svn: 218613
If too many parameters are involved in accesses used to create RTCs
we might end up with enormous compile times and RTC expressions.
The reason is that the lexmin/lexmax is dependent on all these
parameters and isl might need to create a case for every "ordering"
of them (e.g., p0 <= p1 <= p2, p1 <= p0 <= p2, ...).
The exact number of parameters allowed in accesses is defined by the
command line option -polly-rtc-max-parameters=XXX and set by default
to 8.
Differential Revision: http://reviews.llvm.org/D5500
llvm-svn: 218566
The run-time alias check places code that involves the base pointer at the
beginning of the SCoP. This breaks if the base pointer is defined inside the
SCoP. Hence, we can only create a run-time alias check if we are sure the base
pointer is not an instruction defined inside the scop. If it is we refuse to
handle the SCoP.
This commit should unbreak most of our current LNT failures.
Differential Revision: http://reviews.llvm.org/D5483
llvm-svn: 218412
This commit drops a call to std::sort, which sorted the base pointers that
possibly alias according to the address at which their corresponding llvm::Value
was allocated. There does not seem to be any good reason, why those pointers
should be (re)sorted and this only makes the output indeterministic.
llvm-svn: 218052
This change will build all alias groups (minimal/maximal accesses
to possible aliasing base pointers) we have to check before
we can assume an alias free environment. It will also use these
to create Runtime Alias Checks (RTC) in the ISL code generation
backend, thus allow us to optimize SCoPs despite possibly aliasing
pointers when this backend is used.
This feature will be enabled for the isl code generator, e.g.,
--polly-code-generator=isl, but disabled for:
- The cloog code generator (still the default).
- The case delinearization is enabled.
- The case non-affine accesses are allowed.
llvm-svn: 218046
During the IslAst parallelism check also compute the minimal dependency
distance and store it in the IstAst for node.
Reviewer: sebpop
Differential Revision: http://reviews.llvm.org/D4987
llvm-svn: 217729
Even though we previously correctly detected the multi-dimensional access
pattern for accesses with a certain base address, we only delinearized
non-affine accesses to this address. Affine accesses have not been touched and
remained as single dimensional accesses. The result was an inconsistent
description of accesses to the same array, with some being one dimensional and
some being multi-dimensional.
This patch ensures that all accesses are delinearized with the same
dimensionality as soon as a single one of them has been detected as non-affine.
While writing this patch, it became evident that the options
-polly-allow-nonaffine and -polly-detect-keep-going have not been properly
supported in case delinearization has been turned on. This patch adds relevant
test coverage and addresses these issues as well. We also added some more
documentation to the functions that are modified in this patch.
This fixes llvm.org/PR20123
Differential Revision: http://reviews.llvm.org/D5329
llvm-svn: 217728
At the moment we assume that only elements of identical size are stored/loaded
to a certain base pointer. This patch adds logic to the scop detection to verify
this.
Differential Revision: http://reviews.llvm.org/D5329
llvm-svn: 217727
It seems we added guards to check for non-existing std::map elements to make
sure they are default constructed before first accessed. Besides, the code
being wrong because of checking Context.NonAffineAccesses[BasePointer].size()
instead of Context.cound(BasePointer), such a check is also not necessary
as std::map takes care of this already.
From the std::map documentation:
"If k does not match the key of any element in the container, the function
inserts a new element with that key and returns a reference to its mapped value.
Notice that this always increases the container size by one, even if no mapped
value is assigned to the element (the element is constructed using its default
constructor)."
llvm-svn: 217506
Arcanist (arc) will now always run linters before uploading any new
commit to Phabricator. All errors/warnings (or their absence) will be
shown in the web interface together with a explanation by the commiter
(arcanist will ask the commiter if the build was not clean).
The linters include:
- clang-format
- spelling check
- permissions check (aka. chmod)
- filename check
- merge conflict marker check
Note, that their scope is sometimes limited (see .arclint for
details).
This commit also fixes all errors and warnings these linters reported,
namely:
- spelling mistakes and typos
- executable permissions for various text files
Differential Revision: http://reviews.llvm.org/D4916
llvm-svn: 215871
This will spill out information about LLVM-internals. However, in cases
where the name of the Value matches the name of the array in the source,
we provide more useful information. In cases where we spill internals,
the information still might help the user to pin down the correct
arrays.
The problem we face here is: The error is pinned to the debug location
of one of the offending values out of the alias set instead of all of them.
The more information we give the user about the set of aliasing
pointers the better.
llvm-svn: 215830
This reverts commit 215684. The intention of the commit is great, but
unfortunately it seems to be the cause of 14 LNT test suite failures:
http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly/builds/116
To make our buildbots and performance testers green until this issue is solved,
we temporarily revert this commit.
llvm-svn: 215816
The support is limited to signed modulo access and condition
expressions with a constant right hand side, e.g., A[i % 2] or
A[i % 9]. Test cases are modified according to this new feature and
new test cases are added.
Differential Revision: http://reviews.llvm.org/D4843
llvm-svn: 215684
Store the llvm::Value pointers of the AliasSet instead of the AliasSet
itself.
We have to be careful about changed IR when the message is generated,
because the Value pointers might not exist anymore. This would render
the Diagnostic invalid. For now we just assert there.
Simply do not retreive a diagnostic message after the IR has changed
it's not valid information anyway.
llvm-svn: 215625
There is no needed for neither 1-dimensional nor higher dimensional arrays to
require positive offsets in the outermost array dimension.
We originally introduced this assumption with the support for delinearizing
multi-dimensional arrays.
llvm-svn: 214665
+ Split all reduction dependences and map them to the causing memory accesses.
+ Print the types & base addresses of broken reductions for each "reduction
parallel" marked loop (OpenMP style).
+ 3 test cases to show how reductions are now represented in the isl ast.
The mapping "(ast) loops -> broken reductions" is also needed to find the
memory accesses we need to privatize in a loop.
llvm-svn: 214489
+ Introduced dependency type TYPE_TC_RED to represent the transitive closure
(& the reverse) of reduction dependences. These are used when we check for
reduction parallel loops.
+ Test cases including loop reversals and modulo schedules which compute
reductions in a alternated order.
llvm-svn: 213019
As our delinearization works optimistically, we need in some cases run-time
checks that verify our optimistic assumptions. A simple example is the
following code:
void foo(long n, long m, long o, double A[n][m][o]) {
for (long i = 0; i < 100; i++)
for (long j = 0; j < 150; j++)
for (long k = 0; k < 200; k++)
A[i][j][k] = 1.0;
}
After clang linearized the access to A and we delinearized it again to
A[i][j][k] we need to ensure that we do not access the delinearized array
out of bounds (this information is not available in LLVM-IR). Hence, we
need to verify the following constraints at run-time:
CHECK: Assumed Context:
CHECK: [o, m] -> { : m >= 150 and o >= 200 }
llvm-svn: 212198
This change is particularly useful in the code generation as we need
to know which binary operator/identity element we need to combine/initialize
the privatization locations.
+ Print the reduction type for each memory access
+ Adjusted the test cases to comply with the new output format and
to test for the right reduction type
llvm-svn: 212126
Iterate over all store memory accesses and check for valid binary reduction
candidate loads by following the operands of the stored value. For each
candidate pair we check if they have the same base address and there are no
other accesses which may overlap with them. This ensures that no intermediate
value can escape into other memory locations or is overwritten at some point.
+ 17 test cases for reduction detection and reduction dependency modeling
llvm-svn: 211957
Enabling -keep-going in ScopDetection causes expansion to an invalid
Scop candidate.
Region A <- Valid candidate
|
Region B <- Invalid candidate
If -keep-going is enabled, ScopDetection would expand A to A+B because
the RejectLog is never checked for errors during expansion.
With this patch only A becomes a valid Scop.
llvm-svn: 211875
This change will ease the transision to multiple reductions per statement as
we can now distinguish the effects of multiple reductions in the same
statement.
+ Wrapped reduction dependences are used to compute privatization dependences
+ Modified test cases to account for the change
llvm-svn: 211795
This dependency analysis will keep track of memory accesses if they might be
part of a reduction. If not, the dependences are tracked on a statement level.
The main reason to do this is to reduce the compile time while beeing able to
distinguish the effects of reduction and non-reduction accesses.
+ Adjusted two test cases
llvm-svn: 211794
Use a container class to store the reject logs. Delegating most calls to
the internal std::map and add a few convenient shortcuts (e.g.,
hasErrors()).
llvm-svn: 211780
Add support for generating optimization remarks after completing the
detection of Scops.
The goal is to provide end-users with useful hints about opportunities that
help to increase the size of the detected Scops in their code.
By default the remark is unspecified and the debug location is empty. Future
patches have to expand on the messages generated.
This patch brings a simple test case for ReportFuncCall to demonstrate the
feature.
Reports all missed opportunities to increase the size/number of valid
Scops:
clang <...> -Rpass-missed="polly-detect" <...>
opt <...> -pass-remarks-missed="polly-detect" <...>
Reports beginning and end of all valid Scops:
clang <...> -Rpass="polly-detect" <...>
opt <...> -pass-remarks="polly-detect" <...>
Differential Revision: http://reviews.llvm.org/D4171
llvm-svn: 211769
+ Collect reduction dependences
+ Introduced TYPE_RED in Dependences.h which can be used to obtain the
reduction dependences
+ Used TYPE_RED to prevent parallelization while we do not have a privatizing
code generation
+ Relax the dependences for non-parallel code generation
+ Add privatization dependences to ensure correctness
+ 12 Test cases to check for reduction and privatization dependences
llvm-svn: 211369
+ Flag to indicate reduction like statements
+ Command line option to (dis)allow multiplicative reduction opcodes
+ Two simple positive test cases, one fp test case w and w/o fast math
+ One "negative" test case (only reduction like but no reduction)
llvm-svn: 211114
+ Added const iterator version
+ Changed name to begin/end to allow range loops
+ Changed call sites to range loops
+ Changed typename to (const_)iterator
llvm-svn: 210927
Fixes#19976.
The error log does not contain an error, in case we reject a candidate
without generating a diagnostic message by using invalid<>(...). This is
the case for the top-level region of a function.
The patch comes without a test-case because adding a useful one requires
additional code just for triggering it. Before the patch it would only trigger,
if we try to print the CFG with Scop error annotations.
llvm-svn: 210753
Without this patch, the testcase would fail on the delinearization of the second
array:
; void foo(long n, long m, long o, double A[n][m][o]) {
; for (long i = 0; i < n; i++)
; for (long j = 0; j < m; j++)
; for (long k = 0; k < o; k++) {
; A[i+3][j-4][k+7] = 1.0;
; A[i][0][k] = 2.0;
; }
; }
; CHECK: [n, m, o] -> { Stmt_for_body6[i0, i1, i2] -> MemRef_A[3 + i0, -4 + i1, 7 + i2] };
; CHECK: [n, m, o] -> { Stmt_for_body6[i0, i1, i2] -> MemRef_A[i0, 0, i2] };
Here is the output of FileCheck on the testcase without this patch:
; CHECK: [n, m, o] -> { Stmt_for_body6[i0, i1, i2] -> MemRef_A[i0, 0, i2] };
^
<stdin>:26:2: note: possible intended match here
[n, m, o] -> { Stmt_for_body6[i0, i1, i2] -> MemRef_A[o0] };
^
It is possible to find a good delinearization for A[i][0][k] only in the context
of the delinearization of both array accesses.
There are two ways to delinearize together all array subscripts touching the
same base address: either duplicate the code from scop detection to first gather
all array references and then run the delinearization; or as implemented in this
patch, use the same delinearization info that we computed during scop detection.
llvm-svn: 210117
Instead of relying on the delinearization to infer the size of an element,
compute the element size from the base address type. This is a much more precise
way of computing the element size than before, as we would have mixed together
the size of an element with the strides of the innermost dimension.
llvm-svn: 209695
Support a 'keep-going' mode for the ScopDetection. In this mode, we just keep
on detecting, even if we encounter an error.
This is useful for diagnosing SCoP candidates. Sometimes you want all the
errors. Invalid SCoPs will still be refused in the end, we just refuse to
abort on the first error.
llvm-svn: 209574
This stores all RejectReasons created for one region
in a RejectLog inside the DetectionContext. For now
this only keeps track of the last error.
A separate patch will enable the tracking of all errors.
This patch itself does no harm (yet).
llvm-svn: 209572
definition below all of the header #include lines, Polly edition.
If you want to know more details about this, you can see the recent
commits to Debug.h in LLVM. This is just the Polly segment of a cleanup
I'm doing globally for this macro.
llvm-svn: 206852
The following example shows a non-parallel loop
void f(int a[]) {
int i;
for (i = 0; i < 10; ++i)
A[i] = A[i+5];
}
which, in case we import a schedule that limits the iteration domain
to 0 <= i < 5, becomes parallel. Previously we crashed in such cases, now we
just recognize it as parallel.
This fixes http://llvm.org/PR19435
Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 206318
We update to a newer version of isl, which includes changes to the compute
out facility that make it a lot more predicable. With our new value, we can
reliably bail out for all reported bugs, while still being able to compute
dependences for all but two test cases in the LLVM test suite. For the remaining
two test cases, the dependence problem we construct is unnecessarily complex,
so there is hope we can improve on this. However, to avoid any future issues,
having a reliable compute out facility in place is important.
llvm-svn: 206106
This replaces the ancient INVALID/INVALID_NOVERIFY macros with a real
function.
The new invalid(..) function uses small diagnostic objects that are
generated on demand. We can store arbitrary additional information per
error type and generate useful debug/error messages on the fly.
Use it as follows:
if (/* Some error condition (ReportFoo) */)
invalid<ReportFoo>(Context, /*Assert=*/true/false,
(/* List of helpful diagnostic objects */));
Where ReportFoo is a subclass of RejectReason that is able to take the
list of helpful diagnostic objects in its constructor.
The implementation of invalid will create the report and fire
an assertion, if necessary.
llvm-svn: 205414
For complex examples it may happen that we do not compute dependences. In this
case we do not want to crash, but just not detect parallel loops.
llvm-svn: 204470
llvm.org/PR19081 reports that the polly dependence analysis causes some h264
compilation to hang. We adjust the compute out, to ensure we do not block on
expensive dependence calculations.
llvm-svn: 204168
Value::user_iterator changes in LLVM r203364. Converts several of these
loops to nice range based loops in the process.
Built and tested cleanly for me, yay for being able to fully build and
test Polly changes!
llvm-svn: 203381
The module LLVMPolly.so links to that. There is really no reason to build a
large number of mini-libraries here, especially as we do have dependences
between the libraries that are not properly handled and that make linking fail
on darwin.
Submitted-by: David Fang <fang@csl.cornell.edu>
llvm-svn: 202743
We mostly iterate over read-only values. Following a suggestion by Duncan P.N
Exons Smith, we use the construct 'const auto &' for this.
llvm-svn: 202651
clang-format requires a space before the ":" in the foreach loop. Even though
this is surprising to me, we follow this style to make our formatting
consistent with clang-format. I found that this clang-format style is used in a
couple of C++11 examples, hence I believe the fact that clang-format adds a
colon is not a bug but just something I was not used to yet.
llvm-svn: 202648
In case we do not have valid dependences, we do not run dead code elimination or
the schedule optimizer. This fixes an infinite loop in the dead code
elimination (PR12110).
llvm-svn: 201982
In case the domain of a statement is empty, the schedule optimizer set by
accident the schedule to a NULL pointer. This is incorrect. Instead, we set
it to an empty isl_map with zero schedule dimensions. We already checked for
this in our test cases, but unfortunately the test cases did not fail as
expected. The assert we add in this commit now ensures that the test cases
fail properly in case we regress on this again.
llvm-svn: 201886
This pass eliminates loop iterations that compute results that are not used
later on. This can help e.g. in D, where the default zero-initialization is
often unnecessary if right after new values are assigned to an array.
Contributed-by: Peter Conn <conn.peter@gmail.com>
llvm-svn: 201817
We do not have a use for this information at the moment. If we need this at some
point, the "instruction -> access" mapping needs to be enhanced as a single
instruction could then possibly perform multiple accesses.
This patch allows us to build the polyhedral information for scops with scalar
dependences.
llvm-svn: 201815
In rare cases the modification of one scop can effect the validity of other
scops, as code generation of an earlier scop may make the scalar evolution
functions derived for later scops less precise. The example that triggered this
patch was a scop that contained an 'or' expression as follows:
%add13710 = or i32 %j.19, 1
--> {(1 + (4 * %l)),+,2}<nsw><%for.body81>
Scev could only analyze the 'or' as it knew %j.19 is a multiple of 2. This
information was not available after the first scop was code generated (or
independent-blocks was run on it) and SCEV could not derive a precise SCEV
expression any more. This means we could not any more code generate this SCoP.
My current understanding is that there is always the risk that an earlier code
generation change invalidates later scops. As the example we have seen here is
difficult to avoid, we use this occasion to guard us against all such
invalidations.
This patch "solves" this issue by verifying right before we start working on
a detected scop, if this scop is in fact still valid. This adds a certain
overhead. However the verification we run is anyways very fast and secondly
it is only run on detected scops. So the overhead should not be very large. As
a later optimization we could detect scops only on demand, such that we need
to run scop-detections always only a single time.
This should fix the single last failure in the LLVM test-suite for the new
scev-based code generation.
llvm-svn: 201593
The MayAliasSet class is currently not used and just confuses people. We can
reintroduce it in case need a more precise tracking of alias sets.
llvm-svn: 201191
In rare cases, a region R which is itself not valid has an indirect child region
that is valid. When R becomes part of a valid region by expansion of another
region, then all children of R have to be erased from the set of valid regions.
This patch ensures that indirect children are erased in addition to direct
children.
Contributed-by: Armin Groesslinger <armin.groesslinger@uni-passau.de>
Tobias: I added a reduced test case and adjusted the logic of the patch to
only recurse until the first child is found.
llvm-svn: 200411
Verification of base addresses is difficult as the independent blocks pass may
introduce aliasing that was not there during scop detection. As a midterm
solution -polly-codegen-scev will remove the need for the independent blocks
pass. For now, we do not verify at compile time that the independent blocks pass
does not make the base addresses loop invariant. Disabling this just removes
one of the multiple safety layers we have. We still can check for correctness
in our regression tests.
llvm-svn: 200315
Array base addresses need to be invariant in the region considered. The base
address has to be computed outside the region, or, when it is computed inside,
the value must not change with the iterations of the loops. For example, when a
two-dimensional array is represented as a pointer to pointers the base address
A[i] in an access A[i][j] changes with i; therefore, such regions have to be
rejected.
Contributed by: Armin Größlinger <armin.groesslinger@uni-passau.de>
llvm-svn: 200314
Count the number of computational steps that have been used to solve the
dependence problem and abort in case we reach the "compute-out". This ensures we
do not hang forever in cases the dependence problem is too difficult to solve.
There is just a single case in the LLVM test-suite that runs into the
compute-out. Even in this case, we can probably coalesce some of the parameters
(i32 b, i32 b zext i64, ...) to simplify the problem enough to not hit the
compute out. However, for now we set the compute out in place to address the
general issue. The compute out was choosen such that it stops on a recent laptop
after about 8 seconds.
llvm-svn: 200156
This fixes a crash that appeared when generating dotty graphs for functions
without loops (for which we do not calculate polyhedral information).
llvm-svn: 198364
We now report the following:
$ polly-clang -O3 -mllvm -polly -mllvm -polly-report test.c -c \
-gline-tables-only
note: Polly detected an optimizable loop region (scop) in function 'foo'
test.c:2: Start of scop
test.c:3: End of scop
note: Polly detected an optimizable loop region (scop) in function 'bar'
test.c:9: Start of scop
test.c:13: End of scop
llvm-svn: 197558
When constructing a scop sometimes the exact representation of a statement or
condition would be very complex, but there is a common case which is a lot
simpler, but which is only valid under certain assumptions. The assumed context
records the assumptions taken during the construction of this scop and that need
to be code generated as a run-time test.
At the moment, we do not yet model any assumptions, but only added the
AssumedContext as well as the isl-ast generation support. As a next step,
this needs to be hooked up with the isl code generation.
if (1) /* run-time condition */
{ /* optimized code */ }
else
{ /* original code */ }
llvm-svn: 193652
Instead of defining the relevant functions inline, we now just keep the
declarations in the class itself. This makes the class declaration a lot
easier to read as all functions can be seen at once. We also use this
opportunity to privatize all functions not used in the public interface of the
class.
llvm-svn: 190841
Use 0 >= 1 instead of 0 != 0 to represent 'false'. This might be slightly more
efficient as isl may create a union of sets for 0 != 0, whereas this is never
needed for the expression 0 >= 1.
Contributed-by: Alexandre Isoard <alexandre.isoard@gmail.com>
llvm-svn: 190384
SCoP invariant parameters with the different start value would deter parameter
sharing. For example, when compiling the following C code:
void foo(float *input) {
for (long j = 0; j < 8; j++) {
// SCoP begin
for (long i = 0; i < 8; i++) {
float x = input[j * 64 + i + 1];
input[j * 64 + i] = x * x;
}
}
}
Polly would creat two parameters for these memory accesses:
p_0: {0,+,256}
p_2: {4,+,256}
[j * 64 + i + 1] => MemRef_input[o0] : 4o0 = p_1 + 4i0
[j * 64 + i] => MemRef_input[o0] : 4o0 = p_0 + 4i0
These parameters only differ from start value. To enable parameter sharing,
we split the start value from SCEVAddRecExpr, so they would share a single
parameter that always has zero start value:
p0: {0,+,256}<%for.cond1.preheader>
[j * 64 + i + 1] => MemRef_input[o0] : 4o0 = 4 + p_1 + 4i0
[j * 64 + i] => MemRef_input[o0] : 4o0 = p_0 + 4i0
Such translation can make the polly-dependence much faster.
Contributed-by: Star Tan <tanmx_star@yeah.net>
llvm-svn: 187728
String operations resulted by raw_string_ostream in the INVALID macro can lead
to significant compile-time overhead when compiling large size source code.
This is because raw_string_ostream relies on TypeFinder class, whose
compile-time cost increases as the size of the module increases. This patch
targets to ensure that it only track detection failures if actually needed.
In this way, we can avoid expensive string operations in normal execution.
With this patch file, the relative compile-time cost of Polly-detect pass does
not increase even when compiling very large size source code.
Contributed-by: Star Tan <tanmx_star@yeah.net>
llvm-svn: 187102
Ensure that the scalar write access corresponds to the result of a load
instruction appears after the generic read access corresponds to the load
instruction.
llvm-svn: 186419
isl recently introduced isl_val as an abstract interface to represent arbitrary
precision numbers. This interface superseeds the old isl_int interface. In
contrast to the old interface which implemented arbitrary precision arithmetic
using macros that forward to the gmp library, the new library hides the math
library implementation in isl. This allows us to switch the math library used by
isl without affecting users such as Polly.
llvm-svn: 184529
When a region header is part of a loop, then all entering edges of this region
should not come from the loop but outside the region. Otherwise, the loop may be
only partially part of the region, which would cause troubles in handling
induction variables.
Currently, we can only model induction variables that are either fully part of
the scop (loop induction variable) or induction variables that are scop-
invariant (parameter). A loop that is only partially part of the
scop causes troubles, as there is no good way to handle the induction
variable in the independent blocks pass.
Contributed-by: Star Tan <tanmx_star@yeah.net>
llvm-svn: 183800
Use the new cl::OptionCategory support to move the Polly options into a separate
option category. The aim is to hide most options and show by default only the
options a user needs to influence '-O3 -polly'. The available options probably
need some care, but here is the current status:
Polly Options:
Configure the polly loop optimizer
-enable-polly-openmp - Generate OpenMP parallel code
-polly - Enable the polly optimizer (only at -O3)
-polly-no-tiling - Disable tiling in the scheduler
-polly-only-func=<function-name> - Only run on a single function
-polly-report - Print information about the activities
of Polly
-polly-vectorizer - Select the vectorization strategy
=none - No Vectorization
=polly - Polly internal vectorizer
=unroll-only - Only grouped unroll the vectorize
candidate loops
=bb - The Basic Block vectorizer driven by
Polly
llvm-svn: 181295
Regions that have multiple entry edges are very common. A simple if condition
yields e.g. such a region:
if
/ \
then else
\ /
for_region
This for_region contains two entry edges 'then' -> 'for_region' and 'else' -> 'for_region'.
Previously we scheduled the RegionSimplify pass to translate such regions into
simple regions. With this patch, we now support them natively when the region is
in -loop-simplify form, which means the entry block should not be a loop header.
Contributed by: Star Tan <tanmx_star@yeah.net>
llvm-svn: 179586
Regions that have multiple exit edges are very common. A simple if condition
yields e.g. such a region:
if
/ \
then else
\ /
after
Region: if -> after
This regions contains the bbs 'if', 'then', 'else', but not 'after'. It has
two exit edges 'then' -> 'after' and 'else' -> 'after'.
Previously we scheduled the RegionSimplify pass to translate such regions into
simple regions. With this patch, we now support them natively.
Contributed-by: Star Tan <tanmx_star@yeah.net>
llvm-svn: 179159
Fix inspired from c2d4a0627e95c34a819b9d4ffb4db62daa78dade.
Given the following code
for (i = 0; i < 10; i++) {
;
}
S: A[i] = 0
When translate the data reference A[i] in statement S using scev, we need to
retrieve the scev of 'i' at the location of 'S'. If we do not do this the
scev that we obtain will be expressed as {0,+,1}_for and will reference loop
iterators that do not surround 'S'. What we really want is the scev to be
instantiated to the value of 'i' after the loop. This value is {10}.
This used to crash in:
int loopDimension = getLoopDepth(Expr->getLoop());
isl_aff *LAff = isl_aff_set_coefficient_si(
isl_aff_zero_on_domain(LocalSpace), isl_dim_in, loopDimension, 1);
(gdb) p Expr->dump()
{8,+,8}<nw><%do.body>
(gdb) p getLoopDepth(Expr->getLoop())
$5 = 0
isl_space *Space = isl_space_set_alloc(Ctx, 0, NbLoopSpaces);
isl_local_space *LocalSpace = isl_local_space_from_space(Space);
As we are trying to create a memory access in a stmt that is outside all loops,
LocalSpace has 0 dimensions:
(gdb) p NbLoopSpaces
$12 = 0
(gdb) p Statement.BB->dump()
if.then: ; preds = %do.end
%0 = load float* %add.ptr, align 4
store float %0, float* %q.1.reg2mem, align 4
br label %if.end.single_exit
and so the scev for %add.ptr should be taken at the place where it is used,
i.e., it should be the value on the last iteration of the do.body loop, and not
"{8,+,8}<nw><%do.body>".
llvm-svn: 179148
After this commit, polly is clang-format clean. This can be tested with
'ninja polly-check-format'. Updates to clang-format may change this, but the
differences will hopefully be both small and general improvements to the
formatting.
We currently have some not very nice formatting for a couple of items, DEBUG()
stmts for example. I believe the benefit of being clang-format clean outweights
the not perfect layout of this code.
llvm-svn: 177796
We now detect scops without a canonical induction variable and can generate a
polyhedral representation for them. There was no modification necessary to
code generate these scops.
llvm-svn: 177643
When using the scev based code generation, we now do not rely on the presence
of a canonical induction variable any more. This commit prepares the path to
(conditionally) disable the induction variable canonicalization pass.
llvm-svn: 177548
We now show the all members of the alias set that may couse possible aliasing.
In case a alias set member is not a named instruction (unnamed instructions or
constant expressions), we show the expression itself.
This improves our error message
from:
Possible aliasing for value: .reg2mem
to:
Possible aliasing: ".reg2mem",
"[0 x double]* inttoptr (i64 47255179264 to [0 x double]*)
llvm-svn: 174329
We fix the following formatting problems found by clang-format:
- 80 cols violations
- Obvious problems with missing or too many spaces
- multiple new lines in a row
clang-format suggests many more changes, most of them falling in the following
two categories:
1) clang-format does not at all format a piece of code nicely
2) The style that clang-format suggests does not match the style used in
Polly/LLVM
I consider differences caused by reason 1) bugs, which should be fixed by
improving clang-format. Differences due to 2) need to be investigated closer
to understand the cause of the difference and the solution that should be taken.
llvm-svn: 171241
Instead of calculating exact value (flow) dependences, it is also possible to
calculate memory based dependences. Sometimes memory based dependences are a lot
easier to calculate. To evaluate the benefits, we add an option to calculate
memory based dependences (use -polly-value-dependences=false).
llvm-svn: 167251
If the flags '-polly-report -g' are given, we print file name and line numbers
for the beginning and end of all detected scops.
linear-algebra/kernels/gemm/gemm.c:23: Scop start
linear-algebra/kernels/gemm/gemm.c:42: Scop end
linear-algebra/kernels/gemm/gemm.c:77: Scop start
linear-algebra/kernels/gemm/gemm.c:82: Scop end
llvm-svn: 167235
This ensures that the isl sets/maps we operate on have the same parameter
dimensions. Operations on objects with different parameter dimensions are not
allow and trigger assertions.
llvm-svn: 163618
This includes:
- The isl_id of the domain of the scattering must be copied from the original
domain
- Remove outdated references to a 'FinalRead' statement
- Print of the Pocc output, if -debug is provided.
- Add line breaks to some error messages.
Reported and Debugged by: Dustin Feld <d3.feld@gmail.com>
llvm-svn: 162901
Cast instruction do not have side effects and can consequently be part of a
scop. We special cased them earlier, as they may be problematic within array
subscripts or loop bounds. However, the scalar evolution validator already
checks for them such that there is no need to also check the instructions within
the basic blocks. Checking them is actually overly conservative as the precence
of casts may invalidate a scop, even though scalar evolution is not influenced
by it.
llvm-svn: 160261
Store a pointer to each ScopStmt in the isl_id associated with the space of its
domain. This will later allow us to recover the statement during code
generation with isl.
llvm-svn: 157607
Derive the maximal and minimal values of a parameter from the type it has. Add
this information to the scop context. This information is needed, to derive
optimal types during code generation.
llvm-svn: 157245