Enable the use of partial writes for PHI write accesses with a switch.
This simply skips the test for whether a PHI write would be partial.
The analogous test for partial value writes also protects against partial reads,
which we do not support (yet). It would be possible to test for partial reads
separately so that we could skip the partial write check as well. Should
this turn out to be useful, I can implement it as well.
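A minimal sketch of what such a switch looks like (the option and helper
names below are illustrative, not the actual Polly identifiers):

  // Illustrative only; assumes the usual "llvm/Support/CommandLine.h" setup.
  static llvm::cl::opt<bool> AllowPartialPHIWrites(
      "polly-allow-partial-phi-writes",
      llvm::cl::desc("Allow partial writes for PHI write accesses"),
      llvm::cl::init(false));

  bool mayMapPHIWrite(bool WouldBePartialWrite) {
    // With the switch enabled, the partial-write test is simply skipped.
    if (WouldBePartialWrite && !AllowPartialPHIWrites)
      return false;
    return true;
  }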
Differential Revision: https://reviews.llvm.org/D33487
llvm-svn: 303762
This reduces the diff to the official isl C++ bindings and solves a correctness
issue with isl::boolean, where isl_bool_error results were accidentally
converted to isl::boolean::true.
llvm-svn: 303505
Instead of relying on these functions to be part of the isl C++ bindings, we
just define this functionality independently. This allows us to use isl C++
bindings that do not contain LLVM specific functionality.
llvm-svn: 303503
- The combination of auto + decltype + templates could not be inferred in
`Transform/Simplify.cpp accessesInOrder`.
- Changed the code to explicitly construct the required vector instead of using
higher-order iterator helpers.
- Failing compiler spec:
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
llvm-svn: 303039
Removal of overwritten writes currently encompasses all cases
of identical write removal.
There is an observable behavioral change in that the last, instead
of the first, MemoryAccess is kept. This should not affect the
generated code, however.
Differential Revision: https://reviews.llvm.org/D33143
llvm-svn: 302987
Remove memory writes that are overwritten by later writes. This works
for StoreInsts:

  store double 21.0, double* %A
  store double 42.0, double* %A

for scalar writes at the end of a statement, and for mixes of these.
Multiple writes can be the result of DeLICM, which might map multiple
writes to the same location when it knows that these do not conflict
(for instance because they write the same value). Such writes
interfere with pattern-matched optimization such as gemm and may not
get removed by other LLVM passes after code generation.
Differential Revision: https://reviews.llvm.org/D33142
llvm-svn: 302986
As with the scalar operand of the initial StoreInst, also use input
accesses when searching for new opportunities after mapping a
PHI write.
The same rationale applies here: After LICM has been applied, the
promoted value will either be an instruction in the same statement
(in which case we fall back to try every scalar access of the
statement), or in another statement such that there will be such
an input access. In the latter case other scalars cannot have
originated from the same register promotion, at least not by LICM.
This mostly helps to decrease compilation time and makes debugging
easier by not pursuing unpromising routes. In some circumstances,
it may change the compiler's output.
llvm-svn: 302839
Previous to this patch, we used VirtualUse to determine the input
access of an llvm::Value in a statement. The input access is the
READ MemoryAccess that makes a value available in that statement,
which can either be a READ of a MemoryKind::Value or the
MemoryKind::PHI for a PHINode in the statement. DeLICM uses the input
access to heuristically find a candidate to map without searching all
possible values.
This might modify the behaviour in that PHI accesses were previously
not considered input accesses. This was unintentionally lost when
"VirtualUse" was extracted from the "Known Knowledge" patch.
llvm-svn: 302838
After DeLICM, it is possible to have two writes of the same value to
the same location in the same statement when it determined that those
writes do not conflict (write the same value).
Teach -polly-simplify to remove one of the writes. The duplicate write
interferes with the pattern matching of matrix-multiplication kernels
and also does not seem to be optimized away by LLVM.
The algorithm is simple, has O(n^2) behaviour (n = max number of
MemoryAccesses in a statement) and only matches the most obvious cases,
but seems to be enough to pattern-match Boost ublas gemm.
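To illustrate the shape of this scan, here is a minimal sketch (Stmt,
MemoryAccess and the helper names are placeholders, not the actual Polly
interfaces):

  void removeRedundantWrites(Stmt &Statement) {
    llvm::SmallPtrSet<MemoryAccess *, 8> Redundant;
    auto Writes = Statement.getWriteAccesses();
    for (size_t I = 0, E = Writes.size(); I < E; ++I)
      for (size_t J = I + 1; J < E; ++J)
        // Two writes of the same value to the same element: keep only one.
        if (writeSameValueToSameElement(Writes[I], Writes[J]))
          Redundant.insert(Writes[J]);
    for (MemoryAccess *MA : Redundant)
      Statement.removeAccess(MA);
  }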
Not handled cases include:
- StoreInst instructions (a.k.a. explicit writes), since the value might
be loaded or overwritten between the two stores.
- PHINodes, especially LCSSA, when the PHI value matches another's.
- Partial writes (in preparation)
llvm-svn: 302805
Some isl functions can simplify their __isl_keep arguments. The
argument object after the call uses different constraints to represent
the same set. Different constraints can result in different outputs
when printed to a string.
In assert builds, additional isl functions are called (e.g. inside
assert()); as mentioned, these can change the internal representation
of their read-only arguments such that printed strings differ between
debug and non-debug builds.
What happened here is that a call to isl_set_is_equal inside an assert
in getScatterFor normalizes one of its arguments such that one redundant
constraint is removed. The redundant constraint therefore does not appear
in the string representing the domain, which FileCheck notices as a
regression test failure compared to a build with assertions disabled.
This fix removes the redundant constraints from the domain from the start such
that the redundant constraint is removed in both assert and non-assert builds.
Isl adds a flag to such sets such that the removal of redundancies is
not done multiple times (here: by isl_set_is_equal).
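A hedged illustration of this kind of up-front normalization (Domain is a
placeholder; isl_set_remove_redundancies is the isl C function that drops
redundant constraints and marks the set accordingly):

  // Drop redundant constraints once, so assert and non-assert builds
  // print the same representation of the domain.
  Domain = isl_set_remove_redundancies(Domain);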
Thanks to Tobias Grosser for reporting and hinting to the cause.
llvm-svn: 302711
Summary:
In case two arrays share base pointers in the same invariant load equivalence
class, we canonicalize all memory accesses to the first of these arrays
(according to their order in the equivalence class).
This enables us to optimize kernels such as boost::ublas by ensuring that
different references to the C array are interpreted as accesses to the same
array. Before this change the runtime alias check for ublas would fail, as it
would assume models of the C array with differing (but identically valued) base
pointers would reference distinct regions of memory whereas the referenced
memory regions were indeed identical.
As part of this change we remove most of the MemoryAccess::get*BaseAddr
interface. We removed already all references to get*BaseAddr in previous
commits to ensure that no code relies on matching base pointers between
memory accesses and scop arrays -- except for three remaining uses where we
need the original base pointer. We document for these situations that
MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct
from the base pointer of the scop array referenced by this memory access.
Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert
Reviewed By: Meinersbur
Subscribers: etherzhhb
Tags: #polly
Differential Revision: https://reviews.llvm.org/D28518
llvm-svn: 302636
Extend the Knowledge class to store information about the contents
of array elements and which values are written. Two knowledges do
not conflict if the known content is the same. The content information
is computed from writes to and loads from the array elements, and
represented by "ValInst": isl spaces that compare equal if the value
represented is the same.
Differential Revision: https://reviews.llvm.org/D31247
llvm-svn: 302339
Do not conflict if a write writes the same value as already known.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32026
llvm-svn: 301460
Do not conflict if the values of Existing and Proposed are the same.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32025
llvm-svn: 301301
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32027
llvm-svn: 300874
Now that the isl C++ binding generator is close to being upstreamed to isl, we
synchronize the latest changes to Polly. These are mostly formatting changes
plus a small interface change for the foreach callback function and some naming
changes in isl::boolean.
llvm-svn: 300398
This commit switches Polly over to the isl::obj::foreach_* implementation, which
is part of the new isl bindings and follows the foreach pattern established in
Polly by Michael Kruse.
The original isl C function:
isl_stat isl_union_set_foreach_set(__isl_keep isl_union_set *uset,
isl_stat (*fn)(__isl_take isl_set *set, void *user), void *user);
which required the user to define a static callback function to which all
interesting parameters are passed via a 'void *' user-pointer, is on the
C++ side available as a function that takes a std::function<>, which can
carry any additional arguments without the need for a user pointer:
stat UnionSet::foreach_set(const std::function<stat(set)> &fn) const;
The following code illustrates the use of the new C++ interface:
  auto Lambda = [=, &Result](isl::set Set) -> isl::stat {
    auto Shifted = shiftDimension(Set, Pos, Amount);
    Result = Result.add(Shifted);
    return isl::stat::ok;
  };
  UnionSet.foreach_set(Lambda);
Polly had some specialized foreach functions which did not require the lambdas
to return a status flag. We remove these functions in this commit to move Polly
completely over to the new isl interface. We may in the future discuss if
functors without return values can be supported easily.
Another extension proposed by Michael Kruse is the use of C++ iterators to allow
the use of normal for loops to iterate over these sets. Such an extension would
allow us to further simplify the code.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D30620
llvm-svn: 300323
Dimensions of band nodes can be implicitly permuted by the algorithm applied
during the schedule generation.
For example, in case of the following matrix-matrix multiplication,
  for (i = 0; i < 1024; i++)
    for (k = 0; k < 1024; k++)
      for (j = 0; j < 1024; j++)
        C[i][j] += A[i][k] * B[k][j];
it can produce the following schedule tree
domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and
0 <= i2 <= 1023 }"
child:
schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] },
{ Stmt_for_body6[i0, i1, i2] -> [(i1)] },
{ Stmt_for_body6[i0, i1, i2] -> [(i2)] }]"
permutable: 1
coincident: [ 1, 1, 0 ]
The current implementation of the pattern matching optimizations relies on the
initial ordering of dimensions. Otherwise, it can produce miscompilations
(e.g., [1]).
This patch helps to restore the initial ordering of dimensions by recreating
the band node when the corresponding conditions are satisfied.
Refs.:
[1] - https://bugs.llvm.org/show_bug.cgi?id=32500
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D31741
llvm-svn: 299662
Because Polly exposes parameters that directly influence tile size
calculations, one can set up situations like divide-by-zero.
Check against a possible divide-by-zero in getMacroKernelParams
and return early.
Also assert at the end of getMacroKernelParams that the block sizes
computed for matrices are positive (>= 1).
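A hedged sketch of this guard (the struct and parameter names are
illustrative, not the actual Polly code):

  struct MacroKernelParamsTy { int Mc, Nc, Kc; };

  MacroKernelParamsTy getMacroKernelParams(long CacheLevelSize,
                                           long CacheLevelAssociativity,
                                           long ElementSize) {
    // Return early instead of dividing by zero further down.
    if (CacheLevelSize <= 0 || CacheLevelAssociativity <= 0 || ElementSize <= 0)
      return {1, 1, 1};
    // ... block size computation elided ...
    MacroKernelParamsTy Params = {1, 1, 1};
    assert(Params.Mc >= 1 && Params.Nc >= 1 && Params.Kc >= 1 &&
           "Computed block sizes must be positive");
    return Params;
  }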
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31708
llvm-svn: 299633
= Change of WAR, WAW generation: =
- `buildFlow(Sink, MustSource, MaySource, Schedule)` treats any flow of the form
`sink <- may source <- must source` as a *may* dependence.
- we used to call:
```lang=cpp, name=old-flow-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```
- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.
- Now, we call WAR and WAW correctly.
== Correct WAW: ==
```lang=cpp, name=new-waw-call.cpp
Flow = buildFlow(Write, MustWrite, MayWrite, Schedule);
WAW = isl_union_flow_get_may_dependence(Flow);
isl_union_flow_free(Flow);
```
== Correct WAR: ==
```lang=cpp, name=new-war-call.cpp
Flow = buildFlow(Write, Read, MustWrite, Schedule);
WAR = isl_union_flow_get_must_dependence(Flow);
isl_union_flow_free(Flow);
```
- We want the "shortest" WAR possible (exact dependences).
- We mark all the *must-writes* as may-source, reads as must-source.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a *must-write*
before reaching a sink.
- Note that we only block earlier writes with *must-writes*. This is
intuitively correct, as we do not want may-writes to block
must-writes.
- Leaves us with direct (R -> W).
- This affects reduction generation since RED is built using WAW and WAR.
= New StrictWAW for Reductions: =
- We used to call:
```lang=cpp,name=old-waw-war-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```
- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any interfering reductions.
= Explanation: Why the new WAR dependences in tests are correct: =
- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.
== Code: ==
```lang=llvm, name=new-war-dependence.ll
; void manyreductions(long *A) {
; for (long i = 0; i < 1024; i++)
; for (long j = 0; j < 1024; j++)
; S0: *A += 42;
;
; for (long i = 0; i < 1024; i++)
; for (long j = 0; j < 1024; j++)
; S1: *A += 42;
;
```
=== WAR dependence: ===
{ S0[1023, 1023] -> S1[0, 0] }
- Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences:
```lang=cpp, name=dependence-incorrect, counterexample
S0[1023, 1023]:
      *-- tmp = *A (load0) --*
WAR 2     add = tmp + 42     |
      *-> *A = add (store0)  |
                         WAR 1
S1[0, 0]:                    |
          tmp = *A (load1)   |
          add = tmp + 42     |
          A = add (store1) <-*
```
- One may assume that WAR2 *hides* WAR1 (since store0 happens before
store1). However, within a statement, Polly has no idea about the
ordering of loads and stores.
- Hence, according to Polly, the code may have looked like this:
```lang=cpp, name=dependence-correct
S0[1023, 1023]:
      A = add (store0)
      tmp = A (load0) ---*
      add = A + 42       |
                     WAR 1
S1[0, 0]:                |
      tmp = A (load1)    |
      add = A + 42       |
      A = add (store1) <-*
```
- So, Polly generates (correct) WAR dependences. It does not make sense
to remove these dependences, since they are correct with respect to
Polly's model.
Reviewers: grosser, Meinersbur
tags: #polly
Differential revision: https://reviews.llvm.org/D31386
llvm-svn: 299429
The isl C++ bindings now have implicit conversions from isl::set to
isl::union_set. Therefore the additional overload accepting isl::set
is not required anymore.
llvm-svn: 298529
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30606
llvm-svn: 298510
In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren are isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.
The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".
Reviewers: grosser
Subscribers: gareevroman, pollydev
Tags: #polly
Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>
Differential Revision: https://reviews.llvm.org/D30815
llvm-svn: 297587
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.
It removes write accesses that write a loaded value back to the location
it was loaded from. It is a typical artifact from DeLICM. Removing it
will get rid of bogus dependencies later in dependency analysis.
It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.
Differential Revision: https://reviews.llvm.org/D30820
llvm-svn: 297473
This pass is a small and self-contained example of a piece of code that was
written with the isl C interface. The diff of this change nicely shows how the
C++ bindings can improve the readability of the code by avoiding the long C
function names and by avoiding any need for memory management.
As you will see, no calls to isl_*_copy or isl_*_free are needed anymore.
Instead the C++ interface takes care of automatically managing the objects.
This may introduce internally additional copies, but due to the isl reference
counting, such copies are expected to be cheap. For performance critical
operations, we will later exploit move semantics to eliminate unnecessary
copies that have shown to be costly.
Below we give a set of examples that shows the benefit of the C++ interface vs.
the pure C interface.
Check properties
----------------
Before:
if (isl_aff_is_zero(aff) || isl_aff_is_one(aff))
return true;
After:
if (Aff.is_zero() || Aff.is_one())
return true;
Type conversion
---------------
Before:
isl_union_pw_multi_aff *UPMA = isl_union_pw_multi_aff_from_union_map(umap);
After:
isl::union_pw_multi_aff UPMA = UMap;
Type construction
-----------------
Before:
auto *Empty = isl_union_map_empty(space);
After:
auto Empty = isl::union_map::empty(Space);
Operations
----------
Before:
set = isl_union_set_intersect(set, set2);
After:
Set = Set.intersect(Set2);
The use of isl::boolean in return types also increases the robustness
of Polly, as on conversion to true or false we verify that no isl_bool_error
has been returned and assert in case an error was returned. Before this change
we would have just ignored the error and proceeded with (some) execution path.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30619
llvm-svn: 297466
Translate the full algorithm to use the new isl C++ bindings
This is a large piece of code that has been written with the Polly IslPtr<>
memory management tool, which only performed memory management, but did not
provide a method interface. As such the code was littered with calls to
give(), copy(), keep(), and take(). The diff of this change should give a
good example of how the new method interface simplifies the code by removing the
need for switching between managed types and C functions all the time
and consequently also the need to use the long C function names.
These are a couple of examples comparing the old IslPtr memory management
interface with the complete method interface.
Check properties
----------------
Before:
if (isl_aff_is_zero(Aff.get()) || isl_aff_is_one(Aff.get()))
return true;
After:
if (Aff.is_zero() || Aff.is_one())
return true;
Type conversion
---------------
Before:
auto UPMA = give(isl_union_pw_multi_aff_from_union_map(UMap.copy()));
After:
isl::union_pw_multi_aff UPMA = UMap;
Type construction
-----------------
Before:
auto Empty = give(isl_union_map_empty(Space.copy()));
After:
auto Empty = isl::union_map::empty(Space);
Operations
----------
Before:
Set = give(isl_union_set_intersect(Set.copy(), Set2.copy()));
After:
Set = Set.intersect(Set2);
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30617
llvm-svn: 297463
Over the last couple of months several authors of independent isl C++ bindings
worked together to jointly design an official set of isl C++ bindings which
combines their experience in developing isl C++ bindings. The new bindings have
been designed around a value pointer style interface and remove the need for
explicit pointer managenent and instead use C++ language features to manage isl
objects.
This commit introduces the smart-pointer part of the isl C++ bindings and
replaces the current IslPtr<T> classes, which served the very same purpose, but
had to be manually maintained. Instead, we now rely on automatically generated
classes for each isl object, which provide value_ptr semantics.
An isl object has the following smart pointer interface:
  inline set manage(__isl_take isl_set *ptr);

  class set {
    friend inline set manage(__isl_take isl_set *ptr);

    isl_set *ptr = nullptr;

    inline explicit set(__isl_take isl_set *ptr);

  public:
    inline set();
    inline set(const set &obj);
    inline set &operator=(set obj);
    inline ~set();

    inline __isl_give isl_set *copy() const &;
    inline __isl_give isl_set *copy() && = delete;
    inline __isl_keep isl_set *get() const;
    inline __isl_give isl_set *release();
    inline bool is_null() const;
  };
The interface and behavior of the new value pointer style classes is inspired
by http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf, which
proposes a std::value_ptr, a smart pointer that applies value semantics to its
pointee.
We currently only provide a limited set of public constructors and instead
provide a global overloaded type constructor method "isl::obj
isl::manage(isl_obj *)", which allows converting an isl_set* to an isl::set by
calling 'S = isl::manage(s)'. This pattern models the make_unique() constructor
for unique pointers.
The next two functions isl::obj::get() and isl::obj::release() are taken
directly from the std::value_ptr proposal:
S.get() extracts the raw pointer of the object managed by S.
S.release() extracts the raw pointer of the object managed by S and sets the
object in S to null.
We additionally add isl::obj::copy(). S.copy() returns a raw pointer referring
to a copy of S, which is a shortcut for "isl::obj(oldobj).release()", a
functionality commonly needed when interacting directly with the isl C
interface where all methods marked with __isl_take require consumable raw
pointers.
S.is_null() checks if S manages a pointer or if the managed object is currently
null. We add this function to provide a more explicit way to check if the
pointer is empty compared to a direct conversion to bool.
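For illustration, a small usage sketch of this interface (not part of the
commit; it assumes an existing isl_ctx *Ctx):

  isl::set S = isl::manage(isl_set_read_from_str(Ctx, "{ [i] : 0 <= i < 10 }"));
  assert(!S.is_null());
  isl_set_dump(S.get());    // borrow the raw pointer for an __isl_keep call
  isl_set *Ref = S.copy();  // fresh reference for an __isl_take consumer
  isl_set_free(Ref);
  Ref = S.release();        // S is null afterwards; we own Ref now
  isl_set_free(Ref);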
This commit also introduces a couple of polly-specific extensions that cover
features currently not handled by the official isl C++ bindings draft, but
which have been provided by IslPtr<T> and are consequently added to avoid code
churn. These extensions include:
- operator bool() : Conversion from objects to bool
- construction from nullptr_t
- get_ctx() method
- take/keep/give methods, which match the currently used naming
convention of IslPtr<T> in Polly. They just forward to
(release/get/manage).
- raw_ostream printers
We expect that these extensions are over time either removed or upstreamed to
the official isl bindings.
We also export a couple of classes that have not yet been exported in isl (e.g.,
isl::space)
As part of the code review, the following two questions were asked:
- Why do we not use a standard smart pointer?
std::value_ptr was a proposal that has not been accepted. It is consequently
not available in the standard library. Even if it would be available, we want
to expand this interface with a complete method interface that is conveniently
available from each managed pointer. The most direct way to achieve this is to
generate a specialized value style pointer class for each isl object type and
add any additional methods to this class. The relevant changes follow in
subsequent commits.
- Why do we not use templates or macros to avoid code duplication?
It is certainly possible to use templates or macros, but as this code is
auto-generated there is no need to make writing this code more efficient. Also,
most of these classes will be specialized with individual member functions in
subsequent commits, such that there will be little code reuse to exploit. Hence,
we decided not to do so at the moment.
These bindings are not yet officially part of isl, but the draft is already very
stable. The smart pointer interface itself has not changed for several months.
Adding this code to Polly is against our normal policy of only importing
official isl code. In this case however, we make an exception to showcase a
non-trivial use case of these bindings which should increase confidence in these
bindings and will help upstreaming them to isl.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30325
llvm-svn: 297452
One of the current limitations of DeLICM is that it only creates
PHI WRITEs that it knows are read by some PHI. Such writes may not span
all instances of a statement. Polly's code generator currently does not
support MemoryAccesses that are not executed in all instances
('partial accesses') and so has to give up on a possible mapping.
This workaround was once suggested by Tobias Grosser: Try to
interpolate an arbitrary expansion to all instances. It will be checked
for possible conflicts with the existing Knowledge and can be applied if
the conflict checking result is that no semantics are changed.
Expansion is done by simplifying the mapping by coalescing with the hope
that coalescing will find a polyhedral 'rule' of the relevant map. It is
then 'gist'-ed using the domain of the relevant instances such that the
rule is expanded to the universe and finally intersected with the domain
of all statement instances.
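A rough sketch of these steps using the isl C++ bindings (the function and
variable names are illustrative, not the actual DeLICM code):

  isl::map expandToAllInstances(isl::map Mapping, isl::set RelevantInstances,
                                isl::set AllInstances) {
    // Coalescing may expose a single polyhedral 'rule' behind the mapping.
    isl::map Rule = Mapping.coalesce();
    // Gist with the relevant instances so the rule extends towards the universe.
    Rule = Rule.gist_domain(RelevantInstances);
    // Finally, restrict the expanded rule to all instances of the statement.
    return Rule.intersect_domain(AllInstances);
  }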
The expansion makes conflicts more likely, the found rule may
still not encompass all statement instances, and the found rule exposes
internals of isl's implementation of coalesce and gist. The latter means
that the result depends on how much effort the implementation invests
into finding a rule which may change between versions of isl. Trivial
implementations of gist and coalesce just return the input arguments.
A patch that makes codegen support partial accesses is in preparation
as well.
Differential Revision: https://reviews.llvm.org/D30763
llvm-svn: 297373
Control flow would flow-through after the check whether the operations
quota exceeded, with the intention that it would later be caught by
Knowledge::isUsable(). However, the Knowledge constructor has its own
assertions to check consistency which would fail if its fields have only
been initialized partially because some sets have been computed correctly
before the operations quota takes effect.
Fix by erroring out early instead of falling through into the code that
might expect that everything has been computed correctly. For robustness,
also bail-out if any of the fields contain nullptr values instead of
relying on isl always setting exactly this error code if something went
wrong.
This should fix the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable
(-polly-process-unprofitable -polly-position=before-vectorizer
-polly-enable-delicm) buildbot.
llvm-svn: 296022
NonowningIslPtr<isl_X> was used as types of function parameters when the
function does not consume the isl object, i.e. an __isl_keep parameter.
The alternatives are:
1. IslPtr<isl_X>
This has additional calls to isl_X_copy and isl_X_free to
increase/decrease the reference counter even though not needed. The
caller already owns a reference to the isl object.
2. const IslPtr<isl_X>&
This does not change the reference counter, but requires an
additional load to get the pointer to the isl object (instead of just
passing the pointer itself).
Moreover, the compiler cannot rely on the constness of the pointer
and has to reload the pointer every time it writes to memory (unless
alias analysis such as TBAA says it is not possible).
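As a small hedged example of alternative 2 (the function itself is made up
for illustration; keep() returns the raw pointer without transferring
ownership):

  static int countBasicSets(const IslPtr<isl_set> &Set) {
    // An __isl_keep call: no isl_set_copy()/isl_set_free() pair is needed.
    return isl_set_n_basic_set(Set.keep());
  }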
The isl C++ bindings currently in development do not have an equivalent
to NonowningIslPtr and adding one would make the binding more
complicated and its advantage in performance is small. In order to
simplify the transition to these C++ bindings, remove NonowningIslPtr.
Change every former use of it to alternative 2 mentioned above
(const IslPtr<isl_X>&).
llvm-svn: 295998
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30293
llvm-svn: 295958
These tests were not included in the main DeLICM commit. These check the
cases where zone analysis cannot be successful because of assumption
violations.
We use the LLVM optimization remark infrastructure as it seems to be the
best fit for this kind of message. I tried to make use of the
OptimizationRemarkEmitter. However, it would insert additional function
passes into the pass manager to get the hotness information. The pass
manager would insert them between the flatten pass and delicm, causing
the ScopInfo with the flattened schedule to be thrown away.
Differential Revision: https://reviews.llvm.org/D30253
llvm-svn: 295846
There is no template specialization for cl::parser<unsigned long> such
that parsing a cl::opt<unsigned long> command line argument will fail.
Use opt<int> instead which has an associated parser.
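For instance, an option declared as follows (the name and default are
illustrative) parses fine because a cl::parser<int> specialization exists:

  static llvm::cl::opt<int> DelicmMaxOps(
      "polly-delicm-max-ops",
      llvm::cl::desc("Maximum number of isl operations to invest for DeLICM"),
      llvm::cl::init(1000000));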
llvm-svn: 295832
Implement the -polly-delicm pass. The pass intends to undo the
effects of LoopInvariantCodeMotion (LICM) which adds additional scalar
dependencies into SCoPs. DeLICM will try to map those scalars back to
the array elements they were promoted from, as long as the array
element is unused.
This is the main patch from the DeLICM/DePRE patch series. It does not
yet undo GVN PRE, for which additional information about known values
is needed, and does not handle PHI write accesses that have no
target. As such its usefulness is limited. Patches for these issues,
including regression tests for error situations, will follow.
Reviewers: grosser
Differential Revision: https://reviews.llvm.org/D24716
llvm-svn: 295713
The Knowledge class remembers the state of data at any timepoint of a SCoP's
execution. Currently, it tracks whether an array element is unused or is
occupied by some value, and the writes to it. A future addition will be to also
remember which value it contains.
Objects are used to determine whether two Knowledge objects contain conflicting
information, i.e. the two states cannot be true at the same time.
This commit was extracted from the DeLICM algorithm at
https://reviews.llvm.org/D24716.
llvm-svn: 295197
To determine the parameters of the matrix multiplication, we check RAW dependencies
that can be expressed using only reduction dependencies. Consequently, we
should check the reduction dependencies, if this is the case.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Sven Verdoolaege <skimo-polly@kotnet.org>
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29814
llvm-svn: 294836
The size of the operand type is one of the parameters required
to determine the BLIS micro-kernel. We get the size of the widest type
of the matrix multiplication operands in case there are several
different types.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29269
llvm-svn: 294828
optimization
Isolate a set of partial tile prefixes to allow hoisting and sinking out of
the unrolled innermost loops produced by the optimization of the matrix
multiplication.
In case it cannot be proved that the number of loop iterations can be evenly
divided by the tile sizes and we tile and unroll the point loop, isl generates
conditional expressions. Subsequently, the conditional expressions can prevent
stores and loads of the unrolled loops from being sunk and hoisted.
The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr
iterations of the two innermost loops, the result of the loop tiling performed
by the matrix multiplication optimization, where Mr and Nr are parameters of
the micro-kernel. This helps to get rid of the conditional expressions of
the unrolled innermost loops. Probably this approach can be replaced with
padding in the future.
In case of, for example, the gemm from Polybench/C 3.2 and parametric loop
bounds, it helps to increase the performance from 7.98 GFlops (27.71% of
theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we
get the same performance as in the case of scalar loop bounds.
It also causes a compile-time regression. The compile time is increased from
0.795 seconds to 0.837 seconds in case of scalar loop bounds and from 1.222
seconds to 1.490 seconds in case of parametric loop bounds.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29244
llvm-svn: 294564
with optimizeMatMulPattern
This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node
optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate
option, because standardBandOpts could try to tile a band node with an anchored
subtree and trigger an error, since the use of the isolate option causes any tree
containing the node to be considered anchored. Furthermore, it is not intended
to apply standard optimizations, when the matrix multiplication has been
detected.
llvm-svn: 294444
multiplication
The current identification of a SCoP statement that implements a matrix
multiplication does not help to identify different permutations of loops that
contain it and check for dependencies, which can prevent it from being
optimized. It also requires external determination of the operands of
the matrix multiplication. This patch contains the implementation of a new
algorithm that helps to avoid these issues. It also modifies the test cases
that generate matrix multiplications with linearized accesses, because
the new algorithm does not support them.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28357
llvm-svn: 293890
Add a simple example to update the documentation on how the packing
transformation is implemented.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D28021
llvm-svn: 293429
If the parameters of the target cache (i.e., cache level sizes, cache level
associativities) are not specified or have wrong values, we use ones for
parameters of the macro-kernel and do not perform data-layout optimizations of
the matrix multiplication. In this patch we specify the default values of the
cache parameters to be able to apply the pattern matching optimizations even in
this case. Since there are no typical values for these parameters, we use the
parameters of Intel Core i7-3820 SandyBridge, which also help to attain
high performance on IBM POWER System S822 and IBM Power 730 Express servers.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28090
llvm-svn: 290518
Typically processor architectures do not include an L3 cache, which means that
Nc, the parameter of the micro-kernel, is, for all practical purposes,
redundant ([1]). However, its small values can cause the redundant packing of
the same elements of the matrix A, the first operand of the matrix
multiplication. At the same time, big values of the parameter Nc can cause
segmentation faults in case the available stack is exceeded.
This patch adds an option to specify the parameter Nc as a multiple of
the parameter of the micro-kernel Nr.
In case of Intel Core i7-3820 SandyBridge and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8
it helps to improve the performance from 11.303 GFlops/sec (39.247% of
theoretical peak) to 17.896 GFlops/sec (62.14% of theoretical peak).
Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28019
llvm-svn: 290256
multiplication
Previously we had two-dimensional accesses to store packed operands of
the matrix multiplication for the sake of simplicity of the packed arrays.
However, addition of the third dimension helps to simplify the corresponding
memory access, reduce the execution time of isl operations applied to it, and
consequently reduce the compile-time of Polly. For example, in case of
Intel Core i7-3820 SandyBridge and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=7
it helps to reduce the compile-time from about 361.456 seconds to about 0.816
seconds.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D27878
llvm-svn: 290251
To prevent copy statements from accessing arrays out of bounds, ranges of their
extension maps are restricted, according to the constraints of domains.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D25655
llvm-svn: 289815
gemm ([1]). In particular, elements of the matrix B, the second operand of
matrix multiplication, are reused between iterations of the innermost loop.
To keep the reused data in cache, only elements of matrix A, the first operand
of matrix multiplication, should be evicted during an iteration of the
innermost loop. To provide such a cache replacement policy, elements of the
matrix A can, in particular, be loaded first and, consequently, be
least-recently-used.
In our case matrices are stored in row-major order instead of column-major
order used in the BLIS implementation ([1]). One of the ways to address it is
to change the order of the loops of the loop nest accordingly. However, this
makes elements of the matrix A be reused in the innermost loop and,
consequently, requires loading elements of the matrix B first. Since the LLVM
vectorizer always generates loads from the matrix A before loads from the
matrix B, we cannot provide such a policy. Consequently, we only change the BLIS micro
kernel and the computation of its parameters instead. In particular, reused
elements of the matrix B are successively multiplied by specific elements of
the matrix A.
Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D25653
llvm-svn: 289806
It could happen that after the inliner finished we ended up with promotable
allocas in a function. We now run mem2reg to make sure everything is
promoted if possible.
llvm-svn: 288514
Add an empty DeLICM pass, without any functional parts.
Extracting the boilerplate from the functional part reduces the size of the
code to review (https://reviews.llvm.org/D24716)
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 288160
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23260
llvm-svn: 281441
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D23991
llvm-svn: 281234
The -polly-flatten-schedule pass reduces the number of scattering
dimensions in its isl_union_map form to make them easier to understand.
It is not meant to be used in production, only for debugging and
regression tests.
To illustrate how it can make sets simpler, here is a lifetime set
computed by the proposed DeLICM pass without flattening:
{ Stmt_reduction_for[0, 4] -> [0, 2, o2, o3] : o2 < 0;
Stmt_reduction_for[0, 4] -> [0, 1, o2, o3] : o2 >= 5;
Stmt_reduction_for[0, 4] -> [0, 1, 4, o3] : o3 > 0;
Stmt_reduction_for[0, i1] -> [0, 1, i1, 1] : 0 <= i1 <= 3;
Stmt_reduction_for[0, 4] -> [0, 2, 0, o3] : o3 <= 0 }
And here the same lifetime for a semantically identical one-dimensional
schedule:
{ Stmt_reduction_for[0, i1] -> [2 + 3i1] : 0 <= i1 <= 4 }
Differential Revision: https://reviews.llvm.org/D24310
llvm-svn: 280948
LLVM's coding guideline suggests to not use @brief for one-sentence doxygen
comments to improve readability. Switch this once and for all to ensure people
do not copy @brief comments from other parts of Polly, when writing new code.
llvm-svn: 280468
Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23740
llvm-svn: 279395
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D22187
llvm-svn: 278666
This is the second patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus
two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update. In this change
we create the BLIS macro-kernel by applying a combination of tiling
and interchanging. In subsequent changes we will implement the packing
transformation.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D21491
llvm-svn: 276627
This ensures that the error status set with -polly-on-isl-error-abort is
maintained even after running DependenceInfo and ScheduleOptimizer. Both
passes temporarily set the error status to CONTINUE as the dependence
analysis uses a compute-out and the scheduler may not be able to derive
a schedule. In both cases we want to not abort, but to handle the error
gracefully. Before this commit, we always set the error reporting to ABORT
after these passes. After this commit, we use the error reporting mode that was
active earlier.
This comes without a test case as this would require us to introduce (memory)
errors which would trigger the isl errors.
llvm-svn: 274272
llvm commonly adds a comment to the closing brace of a namespace to indicate
which namespace is closed. clang-tidy provides with llvm-namespace-comment
a handy tool to check for this habit. We use it to ensure we consitently use
namespace comments in Polly.
There are slightly different styles in how namespaces are closed in LLVM. As
there is no large difference between the different comment styles we go for the
style clang-tidy suggests by default.
To reproduce this fix run:
  for i in `ls tools/polly/lib/*/*.cpp`; do \
    clang-tidy -checks='-*,llvm-namespace-comment' -p build $i -fix \
      -header-filter=".*"; \
  done
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273621
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273437
Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null
pointers.
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.
llvm-svn: 273435
This is the first patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel,
plus two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update.
In this change we create the BLIS micro-kernel by applying
a combination of tiling and unrolling. In subsequent changes
we will add the extraction of the BLIS macro-kernel
and implement the packing transformation.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D21140
llvm-svn: 273397
multiplication
Fix small issues related to characters, operators and descriptions of tests.
Differential Revision: http://reviews.llvm.org/D20806
llvm-svn: 271264
Created a new pass ScopInfoRegionPass. As the name suggests, it is a
region pass and it is there to preserve compatibility with our
existing Polly passes. ScopInfoRegionPass will return a SCoP object
for a valid region while the creation of the SCoP stays in the
ScopInfo class.
Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in>
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Johannes Doerfert <doerfert@cs.uni-saarland.de>
Differential Revision: http://reviews.llvm.org/D20770
llvm-svn: 271259
This header is required to make the ISO 646 alternative operator
spellings ("and", "or" instead of "&&", "||") work. Should these
operators be replaced by the standard ones as already suggested by
Johannes, also remove this #include again.
llvm-svn: 271206
Add determination of statements that contain, in particular,
matrix multiplications and can be optimized with [1] to try to
get close-to-peak performance. It can be enabled
via polly-pm-based-opts, which is false by default.
Refs:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D20575
llvm-svn: 271128
Add a command line switch to set the
isl_options_set_schedule_outer_coincidence option. ISL then tries to
build schedules where the outer member of a band satisfies the
coincidence constraints.
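Internally this boils down to a call like the following on the shared isl_ctx
(a sketch; the variable names are illustrative and the surrounding Polly
plumbing is omitted):

  isl_options_set_schedule_outer_coincidence(IslCtx, OuterCoincidence ? 1 : 0);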
In practice this allows loop skewing for more parallelism in inner
loops.
llvm-svn: 268222
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.
If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).
The current full/partial tile separation increases compile-time more than
necessary. As a result, we see compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investigation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewers: jdoerfert, grosser
Subscribers: grosser, #polly
Differential Revision: http://reviews.llvm.org/D13779
llvm-svn: 250809
Polly can now be used as an analysis-only tool as long as the code
generation is disabled. However, we do not have an alternative to the
independent blocks pass in place yet, though in the relevant cases
this does not seem to impact the performance much. Nevertheless, a
virtual alternative that allows the same transformations without
changing the input region will follow shortly.
llvm-svn: 250652
This will allow us to optimize C++ template code with Polly. This support is
mostly for debugging purpose and individual experiments. The ultimate goal is
still to run Polly later in the pass manager when inlining already happened.
llvm-svn: 250092
Helper functions in the BlockGenerators.h/cpp introduce dependences
from the frontend to the backend of Polly. As they are used in
ScopDetection, ScopInfo, etc. we move them to the ScopHelper file.
llvm-svn: 249919
The changes affect methods that are part of the Pass interface and
include:
- Comments that describe the methods' purpose.
- A consistent use of the keywords override and virtual.
Additionally, the printScop method is now optional and removed from
SCoP passes that do not implement it.
llvm-svn: 248685
Other passes which perform different optimizations might be interested in
also applying data-locality transformations as part of their overall
transformation.
llvm-svn: 245824
Register tiling in Polly is for now just an additional level of tiling which
is fully unrolled. It is disabled by default. To make this useful for more than
experiments, we still need a cost function as well as possibly further
optimizations that teach LLVM to actually put some of the values we got into
scalar registers.
llvm-svn: 245564
By default we only use one level of tiling for loops, but in general tiling
for multiple levels is trivial for us. Hence, we add a set of options that
allow people to play with a second level of tiling. If this is profitable for
some cases we can work on heuristics that allow us to identify these cases
and use two-level tiling for them.
llvm-svn: 245563
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations which are then interchanged to the innermost level such that LLVM's
inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this
loop. The number of loop iterations to strip-mine is now configurable with the
option -polly-prevect-width=<number-of-prevec-dims>.
This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.
llvm-svn: 245424
The July issue of TOPLAS contains a 50 page discussion of the AST generation
techniques used in Polly. This discussion gives not only an in-depth
description of how we (re)generate an imperative AST from our polyhedral based
mathematical program description, but also gives interesting insights about:
- Schedule trees: A tree-based mathematical program description that enables us
to perform loop transformations on an abstract level, while issues like the
generation of the correct loop structure and loop bounds will be taken care of
by our AST generator.
- Polyhedral unrolling: We discuss techniques that allow the unrolling of
non-trivial loops in the context of parametric loop bounds, complex tile
shapes and conditionally executed statements. Such unrolling support enables
the generation of predicated code e.g. in the context of GPGPU computing.
- Isolation for full/partial tile separation: We discuss native support for
handling full/partial tile separation and -- in general -- native support for
isolation of boundary cases to enable smooth code generation for core
computations.
- AST generation with modulo constraints: We discuss how modulo mappings are
lowered to efficient C/LLVM code.
- User-defined constraint sets for run-time checks: We discuss how arbitrary
sets of constraints can be used to automatically create run-time checks that
ensure a set of constraints actually holds. This feature is very useful to
verify at run-time various assumptions that have been taken during program
optimization.
Polyhedral AST generation is more than scanning polyhedra
Tobias Grosser, Sven Verdoolaege, Albert Cohen
ACM Transations on Programming Languages and Systems (TOPLAS), 37(4), July 2015
llvm-svn: 245157
Summary: The splitExitBlock function is never called. Going to replace its functionality in successive patches that do not modify the IR.
Reviewers: grosser
Subscribers: pollydev
Projects: #polly
Differential Revision: http://reviews.llvm.org/D11865
llvm-svn: 244404
Schedule trees are a lot easier to work with, for both humans and machines. For
humans the more structured schedule representation is easier to reason about.
Together with the more abstract isl programming interface this can result in a
lot cleaner code (see this changeset). For machines, the structured schedule and
the fact that we now use explicit piecewise affine expressions instead of
integer maps makes it easier to generate code from this schedule tree. As a
result, we can already see a slight compile-time improvement -- for 3mm from
0m0.593s to 0m0.551s (-7 %). More importantly, future optimizations such
as full-partial tile separation will most likely result in more streamlined code
to be generated.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243458
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be chosen and possibly replaced.
This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with). We also can not yet perform the
reduction analysis on schedule trees.
For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238
llvm-svn: 242130
This removes old code that has been disabled for several weeks and was hidden
behind the flags -disable-polly-intra-scop-scalar-to-array=false and
-polly-model-phi-nodes=false. Earlier, Polly used to translate scalars and
PHI nodes to single element arrays, as this avoided the need for their special
handling in Polly. With Johannes' patches adding native support for such scalar
references to Polly, this code is not needed any more. After this commit both
-polly-prepare and -polly-independent are now mostly no-ops. Only a couple of
simple transformations still remain, but they are scheduled for removal too.
Thanks again to Johannes Doerfert for his nice work in making all this code
obsolete.
llvm-svn: 240766
This version adds small integer optimization, but is not active by
default. It will be enabled in a later commit.
The schedule-fuse=min/max option has been replaced by the
serialize-sccs option. Adapting Polly was necessary, while retaining the
name polly-opt-fusion=min/max.
Differential Revision: http://reviews.llvm.org/D10505
Reviewers: grosser
llvm-svn: 240027
Running indvar before Polly is useful as this eliminates zexts as they commonly
appear when a 32 bit induction variable (type int) was used on a 64 bit system.
These zexts confuse our delinearization and prevent for example the successful
delinearization of the nussinov kernel in polybench-c-4.1.
This fixes http://llvm.org/PR23426
Suggested-by: Xing Su <xsu.llvm@outlook.com>
llvm-svn: 238643
David Blaikie suggested this as an alternative to the use of owningptr(s) for our
memory management, as value semantics allow us to avoid the additional interface
complexity caused by owningptr while still providing similar memory consistency
guarantees. We could also have used a std::vector, but the use of std::vector
would yield possibly changing pointers which currently causes problems as for
example the memory accesses carry pointers to their parent statements. Such
pointers should not change.
Reviewer: jblaikie, jdoerfert
Differential Revision: http://reviews.llvm.org/D10041
llvm-svn: 238290
The feature itself has been committed by Johannes in r238070. As this is the
way forward, we now enable it to ensure we get test coverage.
Thank you Johannes for this nice work!
llvm-svn: 238088