The interpretation of multiple known ValInsts for the same element and
timepoint is that these are alterntivate names for the same values,
for instance a PHINode and the incoming value when knowning it was
the last executed block. That means that known values do not conflict
if there at least (but necessarily all) one common ValInst.
Add a case to test this principle.
llvm-svn: 301480
Do not conflict if a write writes the same value as already known.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32026
llvm-svn: 301460
Do not conflict if the value of Existing and Proposed are the same.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32025
llvm-svn: 301301
Added a small change to the way pointer arguments are set in the kernel
code generation. The way the pointer is retrieved now, specifically requests
global address space to be annotated. This is necessary, if the IR should be
run through NVPTX to generate OpenCL compatible PTX.
The changes do not affect the PTX Strings generated for the CUDA target
(nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl).
Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends.
Contributed-by: Philipp Schaad
Reviewers: Meinersbur, grosser, bollu
Reviewed By: grosser, bollu
Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32215
llvm-svn: 301299
When both, OccupiedAndKnown and Unused are given, use the former only
for the Known values. The relation Unused \union Occupied must always
hold.
This allows us to specify Known independently of Occupied. It is needed
for an artificial test case in https://reviews.llvm.org/D32025.
llvm-svn: 301284
Earlier, the call to buildFlow was:
WAR = buildFlow(Write, Read, MustWrite, Schedule).
This meant that Read could block another Read, since must-sources can
block each other.
Fixed the call to buildFlow to correctly compute Read. The resulting
code needs to do some ISL juggling to get the output we want.
Bug report: https://bugs.llvm.org/show_bug.cgi?id=32623
Reviewers: Meinersbur
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32011
llvm-svn: 301266
The isl unittest modified its PATH variable to point to the LLVM bin dir.
When building out-of-LLVM-tree, it does not contain the
polly-isl-test executable, hence the test fails.
Ensure that the polly-isl-test is written to a bin directory in the
build root, just like it would happen in an inside-LLVM build.
Then, change PATH to include that dir such that the executable in it
is prioritized before any other location.
llvm-svn: 301096
Unittests are linked against a subset of LLVM libraries and its
transitive dependencies resolved by CMake. The information about indirect
library dependency is not available when building separately from
LLVM, which result in missing symbol errors while linking.
Resolve this issue by querying llvm-config about the available
LLVM libraries and link against all of them, since dependence
information is still not available.
llvm-svn: 301095
We can only link against libLLVM.so or the individual libLLVM*.so
components, but not both of them. Doing so results in these components
exist twice in the programs address space, since it is already contained
in libLLVM.so. The observable effect of this is that command line
switches are registered multiple times (once for each instance),
which is an error.
This fixes llvm.org/PR32735.
Reported-by: Singapuram Sanjay Srivallabh <singapuram.sanjay@gmail.com>
llvm-svn: 301020
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32027
llvm-svn: 300874
After the isl C++ binding generator is now close to being upstreamed to isl, we
synchronize the latest changes to Polly. These are mostly formatting changes
plus a small interface change for the foreach callback function and some naming
changes in isl::boolean.
llvm-svn: 300398
This commit switches Polly over to the isl::obj::foreach_* implementation, which
is part of the new isl bindings and follows the foreach pattern established in
Polly by Michael Kruse.
The original isl C function:
isl_stat isl_union_set_foreach_set(__isl_keep isl_union_set *uset,
isl_stat (*fn)(__isl_take isl_set *set, void *user), void *user);
which required the user to define a static callback function to which all
interesting parameters are passed via a 'void *' user-pointer, is on the
C++ side available as a function that takes a std::function<>, which can
carry any additional arguments without the need for a user pointer:
stat UnionSet::foreach_set(const std::function<stat(set)> &fn) const;
The following code illustrates the use of the new C++ interface:
auto Lambda = [=, &Result](isl::set Set) -> isl::stat {
auto Shifted = shiftDimension(Set, Pos, Amount);
Result = Result.add(Shifted);
return isl::stat::ok;
}
UnionSet.foreach_set(Lambda);
Polly had some specialized foreach functions which did not require the lambdas
to return a status flag. We remove these functions in this commit to move Polly
completely over to the new isl interface. We may in the future discuss if
functors without return values can be supported easily.
Another extension proposed by Michael Kruse is the use of C++ iterators to allow
the use of normal for loops to iterate over these sets. Such an extension would
allow us to further simplify the code.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D30620
llvm-svn: 300323
Dimensions of band nodes can be implicitly permuted by the algorithm applied
during the schedule generation.
For example, in case of the following matrix-matrix multiplication,
for (i = 0; i < 1024; i++)
for (k = 0; k < 1024; k++)
for (j = 0; j < 1024; j++)
C[i][j] += A[i][k] * B[k][j];
it can produce the following schedule tree
domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and
0 <= i2 <= 1023 }"
child:
schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] },
{ Stmt_for_body6[i0, i1, i2] -> [(i1)] },
{ Stmt_for_body6[i0, i1, i2] -> [(i2)] }]"
permutable: 1
coincident: [ 1, 1, 0 ]
The current implementation of the pattern matching optimizations relies on the
initial ordering of dimensions. Otherwise, it can produce the miscompilation
(e.g., [1]).
This patch helps to restore the initial ordering of dimensions by recreating
the band node when the corresponding conditions are satisfied.
Refs.:
[1] - https://bugs.llvm.org/show_bug.cgi?id=32500
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D31741
llvm-svn: 299662
Because Polly exposes parameters that directly influence tile size
calculations, one can setup situations like divide-by-zero.
Check against a possible divide-by-zero in getMacroKernelParams
and return early.
Also assert at the end of getMacroKernelParams that the block sizes
computed for matrices are positive (>= 1).
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31708
llvm-svn: 299633
The current StackColoring algorithm does not correctly handle the
situation when some, but not all paths from a BB to the entry node
cross a llvm.lifetime.start. According to an interpretation of the
language reference at
http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic
this might be correct, but it would cost too much effort to handle
in StackColoring.
To be on the safe side, remove all lifetime markers even in the original
code version (they have never been copied to the optimized version)
to ensure that no path to the entry block will cross a
llvm.lifetime.start.
The same principle applies to paths the a function return and the
llvm.lifetime.end marker, so we remove them as well.
This fixes llvm.org/PR32251.
Also see the discussion at
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111551.html
llvm-svn: 299585
= Change of WAR, WAW generation: =
- `buildFlow(Sink, MustSource, MaySource, Sink)` treates any flow of the form
`sink <- may source <- must source` as a *may* dependence.
- we used to call:
```lang=cpp, name=old-flow-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```
- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.
- Now, we call WAR and WAW correctly.
== Correct WAW: ==
```lang=cpp, name=new-waw-call.cpp
Flow = buildFlow(Write, MustWrite, MayWrite, Schedule);
WAW = isl_union_flow_get_may_dependence(Flow);
isl_union_flow_free(Flow);
```
== Correct WAR: ==
```lang=cpp, name=new-war-call.cpp
Flow = buildFlow(Write, Read, MustaWrite, Schedule);
WAR = isl_union_flow_get_must_dependence(Flow);
isl_union_flow_free(Flow);
```
- We want the "shortest" WAR possible (exact dependences).
- We mark all the *must-writes* as may-source, reads as must-souce.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a *must-write*
before reaching a sink.
- Note that we only block ealier writes with *must-writes*. This is
intuitively correct, as we do not want may-writes to block
must-writes.
- Leaves us with direct (R -> W).
- This affects reduction generation since RED is built using WAW and WAR.
= New StrictWAW for Reductions: =
- We used to call:
```lang=cpp,name=old-waw-war-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```
- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any interfering reductions.
= Explanation: Why the new WAR dependences in tests are correct: =
- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.
== Code: ==
```lang=llvm, name=new-war-dependence.ll
; void manyreductions(long *A) {
; for (long i = 0; i < 1024; i++)
; for (long j = 0; j < 1024; j++)
; S0: *A += 42;
;
; for (long i = 0; i < 1024; i++)
; for (long j = 0; j < 1024; j++)
; S1: *A += 42;
;
```
=== WAR dependence: ===
{ S0[1023, 1023] -> S1[0, 0] }
- Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences:
```lang=cpp, name=dependence-incorrect, counterexample
S0[1023, 1023]:
*-- tmp = *A (load0)--*
WAR 2 add = tmp + 42 |
*-> *A = add (store0) |
WAR 1
S1[0, 0]: |
tmp = *A (load1) |
add = tmp + 42 |
A = add (store1)<-*
```
- One may assume that WAR2 *hides* WAR1 (since store0 happens before
store1). However, within a statement, Polly has no idea about the
ordering of loads and stores.
- Hence, according to Polly, the code may have looked like this:
```lang=cpp, name=dependence-correct
S0[1023, 1023]:
A = add (store0)
tmp = A (load0) ---*
add = A + 42 |
WAR 1
S1[0, 0]: |
tmp = A (load1) |
add = A + 42 |
A = add (store1) <-*
```
- So, Polly generates (correct) WAR dependences. It does not make sense
to remove these dependences, since they are correct with respect to
Polly's model.
Reviewers: grosser, Meinersbur
tags: #polly
Differential revision: https://reviews.llvm.org/D31386
llvm-svn: 299429
Summary:
A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly.
This shouldn't introduce any function changes, although the API technically allowed to obtain two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however.
Reviewers: grosser, Meinersbur
Reviewed By: grosser
Subscribers: nemanjai, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31653
llvm-svn: 299423
Instead of creating the declaration ourselves, we obtain it directly from the
LLVM intrinsic definitions. This addresses a post-review comment for r299359.
Suggested-by: Hongzing Zheng <etherzhhb@gmail.com>
llvm-svn: 299360
Add support for -polly-codegen-perf-monitoring. When performance monitoring
is enabled, we emit performance monitoring code during code generation that
prints after program exit statistics about the total number of cycles executed
as well as the number of cycles spent in scops. This gives an estimate on how
useful polyhedral optimizations might be for a given program.
Example output:
Polly runtime information
-------------------------
Total: 783110081637
Scops: 663718949365
In the future, we might also add functionality to measure how much time is spent
in optimized scops and how many cycles are spent in the fallback code.
Reviewers: bollu,sebpop
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31599
llvm-svn: 299359
Trivial fix for two testcases. When Polly isn't linked into opt,
independent of whether it's built in-tree or not, these testcases forget
to load the appropriate library.
Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>
Differential Revision: https://reviews.llvm.org/D31596
llvm-svn: 299357
No-alias metadata grows quadratic in the size of arrays involved, which can
become very costly for large programs. This commit bounds the number of arrays
for which we construct no-alias information to ten. This is conservatively
correct, as we just provide less information to LLVM and speeds up the compile
time of one of my internal test cases from 'does-not-terminate' to
'finishes-in-less-than-a-minute'. In the future we might try to be more clever
here, but this change should provide a good baseline.
llvm-svn: 299352
Provide an common way for testing if a statement contains something
for region and block statements. First user is
RegionGenerator::addOperandToPHI.
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 298617
Add shiftDim and convertZoneToTimepoints overloads for isl maps.
Add distributeDomain, liftDomains and applyDomainRange functions.
These are going to be used in https://reviews.llvm.org/D31247
(Add known array contents to Knowledge)
llvm-svn: 298543
The isl C++ bindings now has implicit conversions from isl::set to
isl::union_set. Therefore the additional overload accepting isl::set
is not required anymore.
llvm-svn: 298529
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30606
llvm-svn: 298510
Map the new load to the base pointer of the invariant load hoisted load
to be able to find the alias information for it.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30605
llvm-svn: 298507
"Write" is an overloaded term. In collectInfo() till buildFlow(), it is
used to mean "must writes". However, within the memory based analysis,
it is used to mean "both may and must writes". Renaming the Write
variable helps clarify this difference.
Reviewers: grosser
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31181
llvm-svn: 298361
Note that the isl::union_set(isl_ctx,std::string) constructor will
auto-convert the char* to an std::string. Converting a nullptr to
std::string is undefined in C++11 (sect. 21.4.2.9).
llvm-svn: 298259
Otherwise the isl_id NewId which ensures uniqueness of the
created space is unused. None of the tests currently uses an
nameless tuple, so there is not change in what is tested.
llvm-svn: 298258
When not adding constraints on parameters using -polly-ignore-parameter-bounds,
the context may not necessarily list all parameter dimensions. To support code
generation in this situation, we now always iterate over the actual parameter
list, rather than relying on the context to list all parameter dimensions.
llvm-svn: 298197
After this change, enabling -polly-codegen-add-debug-printing in combination
with -polly-codegen-generate-expressions allows us to instrument the compiled
binaries to not only print the values stored and loaded to a given memory
access, but also to print the accessed location with array name and
per-dimension offset:
MemRef_A[3][2]
Store to 6299784: 5.000000
MemRef_A[3][3]
Load from 6299788: 0.000000
MemRef_A[3][3]
Store to 6299788: 6.000000
This can be very helpful for debugging.
llvm-svn: 298194
In commit r219005 lifetime markers have been introduced to mark the lifetime of
the OpenMP context data structure. However, their use seems incorrect and
recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was
not at all related to r298053. r298053 only caused a change in the loop order,
as this change resulted in a different isl internal representation which caused
the scheduler to derive a different schedule. This change then caused the IR to
change, which apparently created a pattern in which LLVM exploites the lifetime
markers. It seems we are using the OpenMP context outside of the lifetime
markers. Even though CrystalMk could probably be fixed by expanding the scope of
the lifetime markers, it is not clear what happens in case the OpenMP function
call is in a loop which will cause a sequence of starting and ending lifetimes.
As it is unlikely that the lifetime markers give any performance benefit, we
just drop them to remove complexity.
llvm-svn: 298192
The AssumptionCache removal of r289756 has been reverted in
r290086/r290087. A different solution has been implemented in r291671
which keeps the AssumptionCache. We can therefore use it again in Polly.
This reverts r289791.
llvm-svn: 298089
In the previous default ScopInfo applied the profitability heuristic for
scalar accesses (-polly-unprofitable-scalar-accs=true) and the
-polly-prune-unprofitable was disabled by default
(-polly-enable-prune-unprofitable=false) as that pruning was already done.
This changes switches the defaults to -polly-unprofitable-scalar-accs=true
-polly-enable-prune-unprofitable=false such that the scalar access
heuristic check is done by the pass. This allows passes between ScopInfo
and PruneUnprofitable to optimize away scalar accesses.
Without enabling such intermediate passes, there is no change in
behaviour of profitability checks in a PassManagerBuilder built
pass chain, but it allows us to cover this configuration with the
buildbots.
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 298081
ScopInfo's normal profitability heuristic considers SCoPs where all
statements have scalar writes as not profitably optimizable and
invalidate the SCoP in that case. However, -polly-delicm and
-polly-simplify may be able to remove some of the scalar writes such
that the flag -polly-unprofitable-scalar-accs=false allows disabling
that part of the heuristic.
In cases where DeLICM (or other passes after ScopInfo) are not
successful in removing scalar writes, the SCoP is still not profitably
optimizable. The schedule optimizer would again try computing another
schedule, resulting in slower compilation.
The -polly-prune-unprofitable pass applies the profitability heuristic
again before the schedule optimizer Polly can still bail out even with
-polly-unprofitable-scalar-accs=false.
Differential Revision: https://reviews.llvm.org/D31033
llvm-svn: 298080
For experiments it is sometimes helpful to provide parameter bound information
to polly and to not use these parameter bounds for simplification.
Add a new option "-polly-ignore-parameter-bounds" which does precisely this.
llvm-svn: 298077
Dependences::calculateDependences.
This ensures that we handle may-writes correctly when building
dependence information. Also add a test case checking correctness of
may-write information. Not handling it before was an oversight.
Differential Revision: https://reviews.llvm.org/D31075
llvm-svn: 298074
For experiments it is sometimes helpful to not take any inbounds assumptions.
Add a new option "-polly-ignore-inbounds" which does precisely this.
llvm-svn: 298073
In subsequent changes we will make Polly a little bit more lazy in adding
parameter dimensions to different sets. As a result, not all parameters will
always be part of the parameter space. This change ensures that we do not use
the '-1' returned when a parameter dimension cannot be found, but instead
just do not try to eliminate the anyhow non-existing dimension.
llvm-svn: 298054
Since several years, isl can perform most operations on sets with differing
parameter spaces, by expanding the parameter space on demand relying using
named isl ids to distinguish different parameter dimensions.
By not always expanding to full dimensionality the set remain smaller and can
likely be operated on faster. This change by itself did not yet result in
measurable performance benefits, but it is a step into the right direction
needed to ensure that subsequent changes indeed can work with lower-dimensional
sets and these sets do not get blown up by accident when later intersected with
the domain context.
llvm-svn: 298053
Introduce ScopStmt::getSurroundingLoop() to replace getFirstNonBoxedLoopFor.
getSurroundingLoop() returns the precomputed surrounding/first non-boxed
loop. Except in ScopDetection, the list of boxed loops is only used to
get the surrounding loop. getFirstNonBoxedLoopFor also requires LoopInfo
at every use which is not necessarily available everywhere where we may
want to use it.
Differential Revision: https://reviews.llvm.org/D30985
llvm-svn: 297899
The bindings currently need to be generated manually, as they are not yet
part of the official isl distribution. Hence, we keep them across updates
assuming they only need to be updated when new functions or functionality
should be exposed.
llvm-svn: 297710
In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.
The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".
Reviewers: grosser
Subscribers: gareevroman, pollydev
Tags: #polly
Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>
Differential Revision: https://reviews.llvm.org/D30815
llvm-svn: 297587
If a SCoP is most probably sequential, then it's better to run it on a CPU.
Hence, there's no point in running it on a GPU.
Reviewers: grosser
Subscribers: nemanjai
Tags: #polly
Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com>
Differential Revision: https://reviews.llvm.org/D30864
llvm-svn: 297578
Currently the isl::val constructor only takes a signed long as parameter, which
on Windows is only 32 bit large and can consequently not be used to obtain the
same result when loading from the expression '(1ull << 32) - 1)' that we get
when loading this value via isl_val_int_from_ui or when loading the value
on Linux systems with 64 bit long data types. We avoid this issue by performing
the shift and subtractiong within the isl::val.
It would be nice to teach the isl++ bindings to construct isl::val from other
integer types, but the current interface is not sufficient to do so. If
constructors from both signed long and unsigned long are exposed, then integer
literals that are of type 'int' and which must be converted to 'long' to match
the constructor have two ambigious constructors to choose from, which result
in a compiler error. The right solution is likely to additionally expose
constructors from signed and unsigned int, but these are not yet available in
the isl C interface and adding those adhoc to our bindings is something I would
like to avoid for now. We should address this issue with a proper discussion
on the isl side.
llvm-svn: 297522
As most discussions about these bindings have concluded and only the final
patch review on the isl mailing list is missing, we drop the experimental
warning tag to match the patchset we will submit to isl, which is expected to
not change notably any more.
llvm-svn: 297519
Instead of declaring a function as:
inline val plain_get_val_if_fixed(enum dim type, unsigned int pos) const;
we use:
inline isl::val plain_get_val_if_fixed(isl::dim type, unsigned int pos) const;
The first argument caused the following compile time error on windows:
"error C3431: 'dim': a scoped enumeration cannot be redeclared as an
unscoped enumeration"
In some cases it is sufficient to just drop the 'enum' prefix, but for example
for isl::set the 'enum class dim' type collides with the function name
isl::set::dim and can consequently not be referenced. To avoid such kind of
ambiguities in the future we add the isl:: prefix consistently to all types
used.
Reported-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 297478
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.
It removes write accesses that write a loaded value back to the location
it was loaded from. It is a typical artifact from DeLICM. Removing it
will get rid of bogus dependencies later in dependency analysis.
It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.
Differential Revision: https://reviews.llvm.org/D30820
llvm-svn: 297473
This pass is a small and self-contained example of a piece of code that was
written with the isl C interface. The diff of this change nicely shows how the
C++ bindings can improve the readability of the code by avoiding the long C
function names and by avoiding any need for memory management.
As you will see, no calls to isl_*_copy or isl_*_free are needed anymore.
Instead the C++ interface takes care of automatically managing the objects.
This may introduce internally additional copies, but due to the isl reference
counting, such copies are expected to be cheap. For performance critical
operations, we will later exploit move semantics to eliminate unnecessary
copies that have shown to be costly.
Below we give a set of examples that shows the benefit of the C++ interface vs.
the pure C interface.
Check properties
----------------
Before:
if (isl_aff_is_zero(aff) || isl_aff_is_one(aff))
return true;
After:
if (Aff.is_zero() || Aff.is_one())
return true;
Type conversion
---------------
Before:
isl_union_pw_multi_aff *UPMA = isl_union_pw_multi_aff_from_union_map(umap);
After:
isl::union_pw_multi_aff UPMA = UMap;
Type construction
-----------------
Before:
auto *Empty = isl_union_map_empty(space);
After:
auto Empty = isl::union_map::empty(Space);
Operations
----------
Before:
set = isl_union_set_intersect(set, set2);
After:
Set = Set.intersect(Set2);
The use of isl::boolean in return types also adds an increases the robustness
of Polly, as on conversion to true or false, we verify that no isl_bool_error
has been returned and assert in case an error was returned. Before this change
we would have just ignored the error and proceeded with (some) exection path.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30619
llvm-svn: 297466
For this translation we introduce two functions, valFromAPInt and APIntFromVal,
to convert between isl::val and APInt. For now these are just proxies, but in
the future they will replace the current isl_val* based conversion functions.
The isl unit test cases benefit most from the new isl::boolean (from Michael
Kruse), which can be explicitly casted to bool and which -- as part of this cast
-- emits a check that no error condition has been triggered so far. This allows
us to simplify
EXPECT_EQ(isl_bool_true, isl_val_is_zero(IslZero));
to
EXPECT_TRUE(IslZero.is_zero());
This simplification also becomes very clear in operator==, which changes from
auto IsEqual = isl_set_is_equal(LHS.keep(), RHS.keep());
EXPECT_NE(isl_bool_error, IsEqual);
return IsEqual;
to just
return bool(LHS.is_equal(RHS));
Some background for non-isl users. The isl C interface has an isl_bool type,
which can be either true, false, or error. Hence, whenever a function returns
a value of type isl_bool, an explicit error check should be considered. By
using isl::boolean, we can just cast the isl::boolean to 'bool' or simply use
the isl::boolean in a context where it will automatically be casted to bool
(e.g., in an if-condition). When doing so, the C++ bindings automatically add
code that verifies that the return value is not an error code. If it is, the
program will warn about this and abort. For cases where errors are expected,
isl::boolean provides checks such as boolean::is_true_or_error() or
boolean::is_true_no_error() to explicitly control program behavior in case of
error conditions.
Thanks to the new automatic memory management, we also can avoid many calls to
isl_*_free. For code that had previously been managed by IslPtr<>, many calls
to give/take/copy are eliminated.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30618
llvm-svn: 297464
Translate the full algorithm to use the new isl C++ bindings
This is a large piece of code that has been written with the Polly IslPtr<>
memory management tool, which only performed memory management, but did not
provide a method interface. As such the code was littered with calls to
give(), copy(), keep(), and take(). The diff of this change should give a
good example how the new method interface simplifies the code by removing the
need for switching between managed types and C functions all the time
and consequently also the need to use the long C function names.
These are a couple of examples comparing the old IslPtr memory management
interface with the complete method interface.
Check properties
----------------
Before:
if (isl_aff_is_zero(Aff.get()) || isl_aff_is_one(Aff.get()))
return true;
After:
if (Aff.is_zero() || Aff.is_one())
return true;
Type conversion
---------------
Before:
isl_union_pw_multi_aff *UPMA =
give(isl_union_pw_multi_aff_from_union_map(UMap.copy());
After:
isl::union_pw_multi_aff UPMA = UMap;
Type construction
-----------------
Before:
auto Empty = give(isl_union_map_empty(Space.copy());
After:
auto Empty = isl::union_map::empty(Space);
Operations
----------
Before:
Set = give(isl_union_set_intersect(Set.copy(), Set2.copy());
After:
Set = Set.intersect(Set2);
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30617
llvm-svn: 297463
The isl C++ binding method interface introduces a thin C++ layer that allows
to call isl methods directly on the memory managed C++ objects. This makes the
relevant methods directly available via code-completion interfaces, allows for
the use of overloading, conversion constructors, and many other nice C++
features that make using isl a lot easier.
The individual features will be highlighted in the subsequent commits.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30616
llvm-svn: 297462
Over the last couple of months several authors of independent isl C++ bindings
worked together to jointly design an official set of isl C++ bindings which
combines their experience in developing isl C++ bindings. The new bindings have
been designed around a value pointer style interface and remove the need for
explicit pointer managenent and instead use C++ language features to manage isl
objects.
This commit introduces the smart-pointer part of the isl C++ bindings and
replaces the current IslPtr<T> classes, which served the very same purpose, but
had to be manually maintained. Instead, we now rely on automatically generated
classes for each isl object, which provide value_ptr semantics.
An isl object has the following smart pointer interface:
inline set manage(__isl_take isl_set *ptr);
class set {
friend inline set manage(__isl_take isl_set *ptr);
isl_set *ptr = nullptr;
inline explicit set(__isl_take isl_set *ptr);
public:
inline set();
inline set(const set &obj);
inline set &operator=(set obj);
inline ~set();
inline __isl_give isl_set *copy() const &;
inline __isl_give isl_set *copy() && = delete;
inline __isl_keep isl_set *get() const;
inline __isl_give isl_set *release();
inline bool is_null() const;
}
The interface and behavior of the new value pointer style classes is inspired
by http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf, which
proposes a std::value_ptr, a smart pointer that applies value semantics to its
pointee.
We currently only provide a limited set of public constructors and instead
require provide a global overloaded type constructor method "isl::obj
isl::manage(isl_obj *)", which allows to convert an isl_set* to an isl::set by
calling 'S = isl::manage(s)'. This pattern models the make_unique() constructor
for unique pointers.
The next two functions isl::obj::get() and isl::obj::release() are taken
directly from the std::value_ptr proposal:
S.get() extracts the raw pointer of the object managed by S.
S.release() extracts the raw pointer of the object managed by S and sets the
object in S to null.
We additionally add std::obj::copy(). S.copy() returns a raw pointer refering
to a copy of S, which is a shortcut for "isl::obj(oldobj).release()", a
functionality commonly needed when interacting directly with the isl C
interface where all methods marked with __isl_take require consumable raw
pointers.
S.is_null() checks if S manages a pointer or if the managed object is currently
null. We add this function to provide a more explicit way to check if the
pointer is empty compared to a direct conversion to bool.
This commit also introduces a couple of polly-specific extensions that cover
features currently not handled by the official isl C++ bindings draft, but
which have been provided by IslPtr<T> and are consequently added to avoid code
churn. These extensions include:
- operator bool() : Conversion from objects to bool
- construction from nullptr_t
- get_ctx() method
- take/keep/give methods, which match the currently used naming
convention of IslPtr<T> in Polly. They just forward to
(release/get/manage).
- raw_ostream printers
We expect that these extensions are over time either removed or upstreamed to
the official isl bindings.
We also export a couple of classes that have not yet been exported in isl (e.g.,
isl::space)
As part of the code review, the following two questions were asked:
- Why do we not use a standard smart pointer?
std::value_ptr was a proposal that has not been accepted. It is consequently
not available in the standard library. Even if it would be available, we want
to expand this interface with a complete method interface that is conveniently
available from each managed pointer. The most direct way to achieve this is to
generate a specialiced value style pointer class for each isl object type and
add any additional methods to this class. The relevant changes follow in
subsequent commits.
- Why do we not use templates or macros to avoid code duplication?
It is certainly possible to use templates or macros, but as this code is
auto-generated there is no need to make writing this code more efficient. Also,
most of these classes will be specialized with individual member functions in
subsequent commits, such that there will be little code reuse to exploit. Hence,
we decided to do so at the moment.
These bindings are not yet officially part of isl, but the draft is already very
stable. The smart pointer interface itself did not change since serveral months.
Adding this code to Polly is against our normal policy of only importing
official isl code. In this case however, we make an exception to showcase a
non-trivial use case of these bindings which should increase confidence in these
bindings and will help upstreaming them to isl.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30325
llvm-svn: 297452
This pass allows writing the LLVM-IR just before and after the Polly
passes to a file.
Dumping the IR before Polly helps reproducing bugs that occur in code
generated by clang. It is the only reliable way to get the IR that
triggers a bug. The alternative is to emit the IR with
clang -c -emit-llvm -S -o dump.ll
then pass it through all optimization passes
opt dump.ll -basicaa -sroa ... -S -o optdump.ll
to then reproduce the error with
opt optdump.ll -polly-opt-isl -polly-codegen -analyze
However, the IR is not the same. -O3 uses a PassBuilder than creates passes
with different parameters than the default.
Dumping the IR after Polly is useful to compare a miscompilation with
a known-good configuration.
Differential Revision: https://reviews.llvm.org/D30788
llvm-svn: 297415
Generate a PollyConfig.cmake for use with Cmake's find_package in
out-of-tree projects.
Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>
Differential Revision: https://reviews.llvm.org/D30495
llvm-svn: 297395
In case LLVM pointers are annotated with !dereferencable attributes/metadata
or LLVM can look at the allocation from which a pointer is derived, we can know
that dereferencing pointers is safe and can be done unconditionally. We use this
information to proof certain pointers as save to hoist and then hoist them
unconditionally.
llvm-svn: 297375
One of the current limitations of DeLICM is that it only creates
PHI WRITEs that it knows are read by some PHI. Such writes may not span
all instances of a statement. Polly's code generator currently does not
support MemoryAccesses that are not executed in all instances
('partial accesses') and so has to give up on a possible mapping.
This workaround has once been suggested by Tobias Grosser: Try to
interpolate an arbitrary expansion to all instances. It will be checked
for possible conflicts with the existing Knowledge and can be applied if
the conflict checking result is that no semantics are changed.
Expansion is done by simplifying the mapping by coalescing with the hope
that coalescing will find a polyhedral 'rule' of the relevant map. It is
then 'gist'-ed using the domain of the relevant instances such that the
rule is expanded to the universe and finally intersected with the domain
of all statement instances.
The expansion makes conflicts become more likely, the found rule may
still not encompass all statement instances and the found rule exposes
internals of isl's implementation of coalesce and gist. The latter means
that the result depends on how much effort the implementation invests
into finding a rule which may change between versions of isl. Trivial
implementations of gist and coalesce just return the input arguments.
A patch that makes codegen support partial accesses is in preparation
as well.
Differential Revision: https://reviews.llvm.org/D30763
llvm-svn: 297373
Simplify ScopDetection::isInvariant(). Essentially deny everything that
is defined within the SCoP and is not load-hoisted.
The previous understanding of "invariant" has a few holes:
- Expressions without side-effects with only invariant arguments, but
are defined withing the SCoP's region with the exception of selects
and PHIs. These should be part of the index expression derived by
ScalarEvolution and not of the base pointer.
- Function calls with that are !mayHaveSideEffects() (typically
functions with "readnone nounwind" attributes). An example is given
below.
@C = external global i32
declare float* @getNextBasePtr(float*) readnone nounwind
...
%ptr = call float* @getNextBasePtr(float* %A, float %B)
The call might return:
* %A, so %ptr aliases with it in the SCoP
* %B, so %ptr aliases with it in the SCoP
* @C, so %ptr aliases with it in the SCoP
* a new pointer everytime it is called, such as malloc()
* a pointer into the allocated block of one of the aforementioned
* any of the above, at random at each call
Hence and contrast to a comment in the base_pointer.ll regression
test, %ptr is not necessarily the same all the time. It might also
alias with anything and no AliasAnalysis can tell otherwise if the
definition is external. It is hence not suitable in the role of a
base pointer.
The practical problem with base pointers defined in SCoP statements is
that it is not available globally in the SCoP. The statement instance
must be executed first before the base pointer can be used. This is no
problem if the base pointer is transferred as a scalar value between
statements. Uses of MemoryAccess::setNewAccessRelation may add a use of
the base pointer anywhere in the array. setNewAccessRelation is used by
JSONImporter, DeLICM and D28518. Indeed, BlockGenerator currently
assumes that base pointers are available globally and generates invalid
code for new access relation (referring to the base pointer of the
original code) if not, even if the base pointer would be available in
the statement.
This could be fixed with some added complexity and restrictions. The
ExprBuilder must lookup the local BBMap and code that call
setNewAccessRelation must check whether the base pointer is available
first.
The code would still be incorrect in the presence of aliasing. There
is the switch -polly-ignore-aliasing to explicitly allow this, but
it is hardly a justification for the additional complexity. It would
still be mostly useless because in most cases either getNextBasePtr()
has external linkage in which case the readnone nounwind attributes
cannot be derived in the translation unit itself, or is defined in the
same translation unit and gets inlined.
Reviewed By: grosser
Differential Revision: https://reviews.llvm.org/D30695
llvm-svn: 297281
Only when load-hoisted we can be sure the base pointer is invariant
during the SCoP's execution. Most of the time it would be added to
the required hoists for the alias checks anyway, except with
-polly-ignore-aliasing, -polly-use-runtime-alias-checks=0 or if
AliasAnalysis is already sure it doesn't alias with anything
(for instance if there is no other pointer to alias with).
Two more parts in Polly assume that this load-hoisting took place:
- setNewAccessRelation() which contains an assert which tests this.
- BlockGenerator which would use to the base ptr from the original
code if not load-hoisted (if the access expression is regenerated)
Differential Revision: https://reviews.llvm.org/D30694
llvm-svn: 297195
There is no point in optimizing unreachable code, hence our test cases should
always return.
This commit is part of a series that makes Polly more robust on the presence of
unreachables.
llvm-svn: 297158
These test cases should work in combination with
https://reviews.llvm.org/D12676, but became outdated over time. Update them
in preparation of discussions with Daniel Berlin on how to represent unreachable
in the post-dominator tree.
llvm-svn: 297157
Our current scop modeling enters an infinite loop when trying to model code
that has unreachable instructions (e.g.,
test/ScopInfo/BoundChecks/single-loop.ll), as the number of basic blocks
returned by the LLVM Loop* does not include unreachable basic blocks that
branch off from the core loop body. This arises for example in the following
piece of code:
for (i = 0; i < N; i++) {
if (i > 1024)
abort(); <- this abort might be translated to an
unreachable
A[i] = ...
}
This patch adds these unreachable basic blocks in our per loop basic block
count to ensure that the schedule construction does not assume a loop has been
processed completely, despite certain unreachable basic blocks still remaining.
The infinite loop is only observable in combination with
https://reviews.llvm.org/D12676 or a similar patch.
llvm-svn: 297156
Scops that exit with an unreachable are today still permitted, but make little
sense to optimize. We therefore can already skip them during scop detection.
This speeds up scop detection in certain cases and also ensures that bugpoint
does not introduce unreachables when reducing test cases.
In practice this change should have little impact, as the performance of
unreachable code is unlikely to matter.
This commit is part of a series that makes Polly more robust in the presence
of unreachables.
llvm-svn: 297151
There is no point in optimizing unreachable code, hence our test cases should
always return.
This commit is part of a series that makes Polly more robust on the presence of
unreachables.
llvm-svn: 297150
There is no point in optimizing unreachable code, hence our test cases should
always return.
This commit is part of a series that makes Polly more robust on the presence of
unreachables.
llvm-svn: 297147
r296992 made ScalarEvolution's CompareValueComplexity less aggressive,
and that broke the polly test being fixed in this change. This change
explicitly bumps CompareValueComplexity in said test case to make it
pass.
Can someone from the polly team please can give me an idea on if this
case is important enough to have
scalar-evolution-max-value-compare-depth be 3 by default?
llvm-svn: 296994
Some Polly ACC test cases fail without a working NVPTX backend. We explicitly
specify this dependence in REQUIRES. Alternatively, we could have only marked
polly-acc as supported in case the NVPTX backend is available, but as we might
use other backends in the future, this does not seem to be the best choice.
For this to work, we also need to make the 'targets_to_build' information
available.
Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 296853
These loads cannot be savely hoisted as the condition guarding the
non-affine region cannot be duplicated to also protect the hoisted load
later on. Today they are dropped in ScopInfo. By checking for this early, we
do not even try to model them and possibly can still optimize smaller regions
not containing this specific required-invariant load.
llvm-svn: 296744
Multi-disjunct access maps can easily result in inbound assumptions which
explode in case of many memory accesses and many parameters. This change reduces
compilation time of some larger kernel from over 15 minutes to less than 16
seconds.
Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll
which has a memory access
[n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] }
which requires folding, but where only a single disjunct remains. We can still
model this test case even when only using limited memory folding.
For people only reading commit messages, here the comment that explains what
memory folding is:
To recover memory accesses with array size parameters in the subscript
expression we post-process the delinearization results.
We would normally recover from an access A[exp0(i) * N + exp1(i)] into an
array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid
delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the
range of exp1(i) - may be preferrable. Specifically, for cases where we
know exp1(i) is negative, we want to choose the latter expression.
As we commonly do not have any information about the range of exp1(i),
we do not choose one of the two options, but instead create a piecewise
access function that adds the (-1, N) offsets as soon as exp1(i) becomes
negative. For a 2D array such an access function is created by applying
the piecewise map:
[i,j] -> [i, j] : j >= 0
[i,j] -> [i-1, j+N] : j < 0
After this patch we generate only the first case, except for situations where
we can proove the first case to be invalid and can consequently select the
second without introducing disjuncts.
llvm-svn: 296679
Without this simplification for a loop nest:
void foo(long n1_a, long n1_b, long n1_c, long n1_d,
long p1_b, long p1_c, long p1_d,
float A_1[][p1_b][p1_c][p1_d]) {
for (long i = 0; i < n1_a; i++)
for (long j = 0; j < n1_b; j++)
for (long k = 0; k < n1_c; k++)
for (long l = 0; l < n1_d; l++)
A_1[i][j][k][l] += i + j + k + l;
}
the assumption:
n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and
p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d)
is taken rather than the simpler assumption:
p9_b >= n9_b and p9_c >= n9_c and p9_d >= n9_d.
The former is less strict, as it allows arbitrary values of p1_* in case, the
loop is not executed at all. However, in practice these precise constraints
explode when combined across different accesses and loops. For now it seems
to make more sense to take less precise, but more scalable constraints by
default. In case we find a practical example where more precise constraints
are needed, we can think about allowing such precise constraints in specific
situations where they help.
This change speeds up the new test case from taking very long (waited at least
a minute, but it probably takes a lot more) to below a second.
llvm-svn: 296456
This patch adds an option to build against a version of libisl already
installed on the system. The installation is autodetected using the
pkg-config file shipped with isl.
The detection of the library is in the FindISL.cmake module that creates
an imported target.
Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>
Differential Revision: https://reviews.llvm.org/D30043
llvm-svn: 296361
These verify that some scalars are not mapped because it would be
incorrect to do so.
For these check we verify that no transformation has been executed from
output of the pass's '-analyze'. Adding optimization remarks is not useful
as it would result in too many messages, even repeated ones. I avoided
checking the '-debug-only=polly-delicm' output which is an antipattern.
llvm-svn: 296348
We can not perform the dependence analysis and, consequently, the parallel
code generation in case the schedule tree contains extension nodes.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30394
llvm-svn: 296325
Control flow would flow-through after the check whether the operations
quota exceeded, with the intention that it would later be caught by
Knowledge::isUsable(). However, the Knowledge constructor has its own
assertions to check consistency which would fail if its fields have only
been initialized partially because some sets have been computed correctly
before the operations quota takes effect.
Fix by erroring-out early instead of falling-throught into the code that
might expect that everything has been computed correctly. For robustness,
also bail-out if any of the fields contain nullptr values instead of
relying on isl always setting exactly this error code if something went
wrong.
This should fix the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable
(-polly-process-unprofitable -polly-position=before-vectorizer
-polly-enable-delicm) buildbot.
llvm-svn: 296022
NonowningIslPtr<isl_X> was used as types of function parameters when the
function does not consume the isl object, i.e. an __isl_keep parameter.
The alternatives are:
1. IslPtr<isl_X>
This has additional calls to isl_X_copy and isl_X_free to
increase/decrease the reference counter even though not needed. The
caller already owns a reference to the isl object.
2. const IslPtr<isl_X>&
This does not change the reference counter, but requires an
additional load to get the pointer to the isl object (instead of just
passing the pointer itself).
Moreover, the compiler cannot rely on the constness of the pointer
and has to reload the pointer every time it writes to memory (unless
alias analysis such as TBAA says it is not possible).
The isl C++ bindings currently in development do not have an equivalent
to NonowningIslPtr and adding one would make the binding more
complicated and its advantage in performance is small. In order to
simplify the transition to these C++ bindings, remove NonowningIslPtr.
Change every former use of it to alternative 2 mentioned aboce
(const IslPtr<isl_X>&).
llvm-svn: 295998
Once a StmtSchedule is created, only its domain is used anywhere within
DependenceInfo::calculateDependences. So, we choose to return the
wrapped domain of the union_map rather than the entire union_map.
However, we still build the union_map first within collectInfo(). It is
cleaner to first build the entire union_map and then pull the domain out in
one shot, rather than repeatedly extracting the domain in bits and pieces
from accdom.
Contributed-by: Siddharth Bhat <siddu.druid@gmail.com>
Differential Revision: https://reviews.llvm.org/D30208
llvm-svn: 295984
Marking a pass as preserved is necessary if any Polly pass uses it, even
if it is not preserved within the generated code. Not marking it would
cause the the Polly pass chain to be interrupted. It is not used by any
Polly pass anymore, hence we can remove all references to it.
llvm-svn: 295983
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30293
llvm-svn: 295958
These tests were not included in the main DeLICM commit. These check the
cases where zone analysis cannot be successful because of assumption
violations.
We use the LLVM optimization remark infrastructure as it seems to be the
best fit for this kind of messages. I tried to make use if the
OptimizationRemarkEmitter. However, it would insert additional function
passes into the pass manager to get the hotness information. The pass
manager would insert them between the flatten pass and delicm, causing
the ScopInfo with the flattened schedule being thrown away.
Differential Revision: https://reviews.llvm.org/D30253
llvm-svn: 295846
There is no template specialization for cl::parser<unsigned long> such
that parsing an cl::opt<unsigned long> command line argument will fail.
Use opt<int> instead which has an associated parser.
llvm-svn: 295832
We only ever use the wrapped domain of AccessSchedule, so stop
creating an entire union_map and then pulling the domain out.
Reviewers: grosser
Tags: #polly
Contributed-by: Siddharth Bhat <siddu.druid@gmail.com>
Differential Revision: https://reviews.llvm.org/D30179
llvm-svn: 295726
Implement the -polly-delicm pass. The pass intends to undo the
effects of LoopInvariantCodeMotion (LICM) which adds additional scalar
dependencies into SCoPs. DeLICM will try to map those scalars back to
the array elements they were promoted from, as long as the array
element is unused.
The is the main patch from the DeLICM/DePRE patch series. It does not
yet undo GVN PRE for which additional information about known values
is needed and does not handle PHI write accesses that have have no
target. As such its usefulness is limited. Patches for these issues
including regression tests for error situatons will follow.
Reviewers: grosser
Differential Revision: https://reviews.llvm.org/D24716
llvm-svn: 295713
This is currently the minimum required version by LLVM. Since LLVM is
needed to build Polly, we also require at least that version.
Suggested-by: Philip Pfaffe <philip.pfaffe@gmail.com>
llvm-svn: 295672
isl headers are currently missing in a Polly installation. Because the
Polly headers depend on those, code can't be compiled against an
installed Polly.
This patch installs the isl headers. I left a TODO, as optionally it
should be possible to use a system version of isl instead of the one
shipped with Polly.
When compiling, clients of the installation need to add
-I${PREFIX}/include/polly/ to there include path right now, because
there currently is no way to export this path automatically.
Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>
Differential Revision: https://reviews.llvm.org/D29931
llvm-svn: 295671
Instead of counting the number of read-only accesses, we now count the number of
distinct read-only array references when checking if a run-time alias check
may be too complex. The run-time alias check is quadratic in the number of
base pointers, not the number of accesses.
Before this change we accidentally skipped SPEC's lbm test case.
llvm-svn: 295567
This change gets rid of the need for zero padding, makes the reduction
computation code more similar to the normal dependence computation, and also
better documents what we do at the moment.
Making the dependence computation for reductions a little bit easier to
understand will hopefully help us to further reduce code duplication.
This reduces the time spent only in the reduction dependence pass from 260ms to
150ms for test/DependenceInfo/reduction_sequence.ll. This is a reduction of over
40% in dependence computation time.
This change was inspired by discussions with Michael Kruse, Utpal Bora,
Siddharth Bhat, and Johannes Doerfert. It can hopefully lay the base for further
cleanups of the reduction code.
llvm-svn: 295550
This test case is a mini performance test case that shows the time needed for a
couple of simple reductions. It takes today about 325ms on my machine to run
this test case through 'opt' with scop construction and reduction detection. It
can be used as mini-proxy for further tuning of the reduction code.
Generally we do not commit performance test cases, but as this is very
small and also very fast it seems OK to keep it in the lit test suite.
This test case will also help to verify that future changes to the reduction
code will not affect the ordering of the reduction sets and will consequently
not cause spurious performance changes that only result from reordering of
dependences in the reduction set.
llvm-svn: 295549
Trying to fold such kind of dimensions will result in a division by zero,
which crashes the compiler. As such arrays are likely to invalidate the
scop anyhow (but are not illegal in LLVM-IR), there is no point in trying
to optimize the array layout. Hence, we just avoid the folding of
constant dimensions of size zero.
llvm-svn: 295415
Before this change wrapping range metadata resulted in exponential growth of
the context, which made context construction of large scops very slow. Instead,
we now just do not model the range information precisely, in case the number
of disjuncts in the context has already reached a certain limit.
llvm-svn: 295360
Commit r230230 introduced the use of range metadata to derive bounds for
parameters, instead of just looking at the type of the parameter. As part of
this commit support for wrapping ranges was added, where the lower bound of a
parameter is larger than the upper bound:
{ 255 < p || p < 0 }
However, at the same time, for wrapping ranges support for adding bounds given
by the size of the containing type has acidentally been dropped. As a result,
the range of the parameters was not guaranteed to be bounded any more. This
change makes sure we always add the bounds given by the size of the type and
then additionally add bounds based on signed wrapping, if available. For a
parameter p with a type size of 32 bit, the valid range is then:
{ -2147483648 <= p <= 2147483647 and (255 < p or p < 0) }
llvm-svn: 295349
The Knowledge class remembers the state of data at any timepoint of a SCoP's
execution. Currently, it tracks whether an array element is unused or is
occupied by some value, and the writes to it. A future addition will be to also
remember which value it contains.
Objects are used to determine whether two Knowledge contain conflicting
information, i.e. two states cannot be true a the same time.
This commit was extracted from the DeLICM algorithm at
https://reviews.llvm.org/D24716.
llvm-svn: 295197
Formatting unnamed array names is expensive in LLVM as the this requires
deriving the numbered virtual instruction name (e.g., %12) for an llvm::Value,
which is currently not implemented efficiently. As instruction numberes anyhow
do not really carry a lot of information for the user, we just print 'unknown'
instead.
This change reduces the scop detection time from 24 to 19 seconds, for one of
our large-scale inputs. This is a reduction by 21%.
llvm-svn: 294894
When deriving the range of valid values of a scalar evolution expression might
be a range [12, 8), where the upper bound is smaller than the lower bound and
where the range is expected to possibly wrap around. We theoretically could
model such a range as a union of two non-wrapping ranges, but do not do this
as of yet. Instead, we just do not derive any bounds. Before this change,
we could have obtained bounds where the maximal possible value is strictly
smaller than the minimal possible value, which is incorrect and also caused
assertions during scop modeling.
llvm-svn: 294891
To determine parameters of the matrix multiplication, we check RAW dependencies
that can be expressed using only reduction dependencies. Consequently, we
should check the reduction dependencies, if this is the case.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Sven Verdoolaege <skimo-polly@kotnet.org>
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29814
llvm-svn: 294836
The size of the operands type is the one of the parameters required
to determine the BLIS micro-kernel. We get the size of the widest type
of the matrix multiplication operands in case there are several
different types.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29269
llvm-svn: 294828
http://polly.llvm.org/example_manual_matmul.html which illustrates individual
passes of Polly, has been ported to reStructuredText and necessary changes have
been made to the configuration files used by SPHINX to include the new source as
a part of the documentation.
Contributed-by: Singapuram Sanjay Srivallabh <singapuram.sanjay@gmail.com>
Differential Revision: https://reviews.llvm.org/D25163
llvm-svn: 294735
This change clarfies that we want to indeed use the original base address
when creating the ScopArrayInfo that corresponds to a given memory access.
This change prepares for https://reviews.llvm.org/D28518.
llvm-svn: 294734
This replaces the use of getOriginalAddrPtr, a value that is stored in
ScopArrayInfo and might at some point not be unique any more. However, the
access value is defined to be unique.
This change is an update on r294576, which only clarified that we need the
original memory access, but where we still remained dependent to have one base
pointer per scop.
This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294733
When generating code in the BlockGenerator we copy all (interesting)
instructions and keep track of the new values in a basic block map. To obtain
the original llvm::Value that belongs to a load memory access, we use
getAccessValue() instead of getOriginalBaseAddr(). The former always references
the instruction we use to load values from. The latter, on the other hand,
is obtaine from the corresponding ScopArrayInfo and would not be unique in
case ScopArrayInfo objects at some point allow memory accesses with different
base addresses.
This change is an update on r294566, which only clarified that we need the
original memory access, but where we still remained dependent to have one
base pointer per scop.
This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294669
By using the public interface MemoryAccess::getScopArrayInfo() we avoid the
direct access to the ScopArrayInfoMap and as a result also do not need to
use the BasePtr as key. This change makes the code cleaner.
The const-cast we introduce is a little ugly. We may consider to drop const
correctness for getScopArrayInfo() at some point.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294655
LLVM's coding conventions suggest to use auto only in obvious cases. Hence,
we move this code to actually declare the types used. We also replace the
variable name 'SAI', with the name 'Array', as this improves readability.
llvm-svn: 294654
When building alias groups, we sort different ScopArrays into unrelated groups.
Historically we identified arrays through their base pointer, as no
ScopArrayInfo class was yet available. This change changes the alias group
construction to reference arrays through their ScopArrayInfo object.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294649
During SCoP construction we sometimes inspect the underlying IR by looking at
the base address of a MemoryAccess. In such cases, we always want the original
base address. Make this clear by calling getOriginalBaseAddr().
This is a non-functional change as getBaseAddr maps to getOriginalBaseAddr
at the moment.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294576
The base address of a memory access is already an llvm::Value. Hence, there is
no need to go through SCEV, but we can directly work with the llvm::Value.
Also use 'Value *' instead of 'auto' for cases where the type is not obvious.
llvm-svn: 294575
Instead of iterating over statements and their memory accesses to extract the
set of available base pointers, just directly iterate over all ScopArray
objects. This reflects more the actual intend of the code: collect all arrays
(and their base pointers) to emit alias information that specifies that accesses
to different arrays cannot alias.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294574
There are problems with using the machine information to derive the precise
vector size on polly-amd64-linux and polly-arm-linux. We temporarily disable
the problematic run lines.
llvm-svn: 294571
Before this change we used the name of the base pointer to mark reductions. This
is imprecise as the canonical reference is the ScopArray itself and not the
basepointer of a reduction. Using the base pointer of reductions is problematic
in cases where a single ScopArray is referenced through two different base
pointers.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294568
When computing reduction dependences we first identify all ScopArrays which are
part of reductions and then only compute for these ScopArrays the more detailed
data dependences that allow us to identify reductions and optimize across them.
Instead of using the base pointer as identifier of a ScopArray, it is clearer
and more understandable to directly use the ScopArray as identifier. This change
implements such a switch.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294567
When regenerating code in the BlockGenerator we copy instructions that may
references scalar values, for which the new value of a given scalar is looked up
in BBMap using the original scalar llvm::Value as index. It is consequently
necessary that (re)loaded scalar values are made available in BBMap using the
original llvm::Value as key independently if the llvm::Value was (re)loaded from
the original scalar or a new access function has been specified that caused the
value to be reloaded from an array with a differnet base address. We make this
clear by using MemoryAccess::getOriginalBaseAddr() instead of
MemoryAccess::getBaseAddr() as index to BBMap.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294566
optimization
Isolate a set of partial tile prefixes to allow hoisting and sinking out of
the unrolled innermost loops produced by the optimization of the matrix
multiplication.
In case it cannot be proved that the number of loop iterations can be evenly
divided by tile sizes and we tile and unroll the point loop, the isl generates
conditional expressions. Subsequently, the conditional expressions can prevent
stores and loads of the unrolled loops from being sunk and hoisted.
The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr
iterations of the two innermost loops, the result of the loop tiling performed
by the matrix multiplication optimization, where Mr and Mr are parameters of
the micro-kernel. This helps to get rid of the conditional expressions of
the unrolled innermost loops. Probably this approach can be replaced with
padding in future.
In case of, for example, the gemm from Polybench/C 3.2 and parametric loop
bounds, it helps to increase the performance from 7.98 GFlops (27.71% of
theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we
get the same performance as in case of scalar loops bounds.
It also cause compile time regression. The compile-time is increased from
0.795 seconds to 0.837 seconds in case of scalar loops bounds and from 1.222
seconds to 1.490 seconds in case of parametric loops bounds.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29244
llvm-svn: 294564
with optimizeMatMulPattern
This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node
optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate
option, because standardBandOpts could try to tile a band node with anchored
subtree and get the error, since the use of the isolate option causes any tree
containing the node to be considered anchored. Furthermore, it is not intended
to apply standard optimizations, when the matrix multiplication has been
detected.
llvm-svn: 294444
This function has been extracted from the upcoming DeLICM patch
(https://reviews.llvm.org/D24716).
In contrast to computeReachingWrite and computeArrayUnused,
convertZoneToTimepoints implies a format for zones (ranges between timepoints).
Zones at the moment are unique to DeLICM, but convertZoneToTimepoints makes most
sense in conjunction with the previous two functions.
llvm-svn: 294094
multiplication
The current identification of a SCoP statement that implement a matrix
multiplication does not help to identify different permutations of loops that
contain it and check for dependencies, which can prevent it from being
optimized. It also requires external determination of the operands of
the matrix multiplication. This patch contains the implementation of a new
algorithm that helps to avoid these issues. It also modifies the test cases
that generate matrix multiplications with linearized accesses, because
the new algorithm does not support them.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28357
llvm-svn: 293890
Add a simple example to update the documentation on how the packing
transformation is implemented.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D28021
llvm-svn: 293429
Instead of keeping two separate maps from Value to Allocas, one for
MemoryType::Value and the other for MemoryType::PHI, we introduce a single map
from ScopArrayInfo to the corresponding Alloca. This change is intended, both as
a general simplification and cleanup, but also to reduce our use of
MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure
we have only a single place where the array (and its base pointer) for which we
generate code for is specified, which means we can more easily introduce new
access functions that use a different ScopArrayInfo as base. We already today
experiment with modifiable access functions, so this change does not address
a specific bug, but it just reduces the scope one needs to reason about.
Another motivation for this patch is https://reviews.llvm.org/D28518, where
memory accesses with different base pointers could possibly be mapped to a
single ScopArrayInfo object. Such a mapping is currently not possible, as we
currently generate alloca instructions according to the base addresses of the
memory accesses, not according to the ScopArrayInfo object they belong to. By
making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo
object will automatically mean that the same stack slot is used for these
arrays. For D28518 this is not a problem, as only MemoryType::Array objects are
mapping, but resolving this inconsistency will hopefully avoid confusion.
llvm-svn: 293374
Add some generally useful isl tools into a their own new ISLTools.cpp.
These are the helpers were extracted from and will be use by the DeLICM
algorithm (https://reviews.llvm.org/D24716).
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 293340
Before this change the user only saw "Unspecified Error", when a region
contained the entry block. Now we report:
"Scop contains function entry (not yet supported)."
llvm-svn: 293169
Before this change we created an additional reload in the copy of the incoming
block of a PHI node to reload the incoming value, even though the necessary
value has already been made available by the normally generated scalar loads.
In this change, we drop the code that generates this redundant reload and
instead just reuse the scalar value already available.
Besides making the generated code slightly cleaner, this change also makes sure
that scalar loads go through the normal logic, which means they can be remapped
(e.g. to array slots) and corresponding code is generated to load from the
remapped location. Without this change, the original scalar load at the
beginning of the non-affine region would have been remapped, but the redundant
scalar load would continue to load from the old PHI slot location.
It might be possible to further simplify the code in addOperandToPHI,
but this would not only mean to pull out getNewValue, but to also change the
insertion point update logic. As this did not work when trying it the first
time, this change is likely not trivial. To not introduce bugs last minute, we
postpone further simplications to a subsequent commit.
We also document the current behavior a little bit better.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D28892
llvm-svn: 292486
Making certain values 'const' to just cast it away a little later mainly
obfuscates the code. Hence, we just drop the 'const' parts.
Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 292480