This is an addition to the -polly-optree pass that reuses the array
content analysis from DeLICM to find array elements that contain the
same value as the value loaded when the target statement instance
is executed.
The analysis is now enabled by default.
The known content analysis could also be used to rematerialize any
llvm::Value that was written to some array element, but currently
only loads are forwarded.
Differential Revision: https://reviews.llvm.org/D36380
llvm-svn: 310279
In https://reviews.llvm.org/D36278 it was pointed out that the behavior of
getPartialTilePrefixes is not very well understood. To allow for a better
understanding, we first provide some basic unittests.
llvm-svn: 310175
The method forwardSpeculatable forwards speculatively executable
instructions and is currently the only way to forward an
instruction.
In the future we intend to add more methods.
llvm-svn: 310056
Summary:
This patch is a first attempt at registering Polly passes with the LLVM tools. Tool plugins are still unsupported, but this registration is usable from the tools if Polly is linked into them (albeit requiring minimal patches to those tools). Registration requires a small amount of machinery (the owning analysis proxies), necessary for injecting ScopAnalysisManager objects into the calling tools.
This patch is marked WIP because the registration is incomplete. Parsing manual pipelines is fully supported, but default pass injection into the O3 pipeline is lacking, mostly because there is opportunity for some redesign here, I believe. The first point of order would be insertion points. I think it makes sense to run before the vectorizers. Running Polly Early, however, is weird. Mostly because it actually is the default (which to me is unexpected), and because Polly runs it's own O1 pipeline. Why not instead insert it at an appropriate place somewhere after simplification happend? Running after the loop optimizers seems intuitive, but it also seems wasteful, since multiple consecutive loops might well be a single scop, and we don't need to run for all of them.
My second request for comments would be regarding all those smallish helper passes we have, like PollyViewer, PollyPrinter, PollyImportJScop. Right now these are controlled by command line options, deciding whether they should be part of the Polly pipeline. What is your opinion on treating them like real passes, and have the user write an appropriate pipeline if they want to use any of them?
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35458
llvm-svn: 309826
Commit r309730 moved the call to canForwardTree into an assert(), even
though this function has side-effects if its DoIt parameter is true. To
avoid a warning in release builds, do an (void)Execution of its result
instead.
To avoid such confusion in the future, rename
canForwardTree() to forwardTree().
llvm-svn: 309753
The previous algorithm was to search a writes and the sours of its value
operand, and see whether the write just stores the same read value back,
which includes a search whether there is another write access between
them. This is O(n^2) in the max number of accesses in a statement
(+ the complexity of isl comparing the access functions).
The new algorithm is more similar to the one used for searching for
overwrites and coalescable writes. It scans over all accesses in order
of execution while tracking which array elements still have the same
value since it was read. This is O(n), not counting the complexity
within isl. It should be more reliable than trying to catch all
non-conforming cases in the previous approach. It is also less code.
We now also support if the write is a partial write of the read's
domain, and to some extent non-affine subregions.
Differential Revision: https://reviews.llvm.org/D36137
llvm-svn: 309734
With a lot of reads and writes to the same array in a statement,
some isl sets that capture the state between access can become
complex such that isl takes more considerable time and memory
for operations on them.
The problems identified were:
- is_subset() takes considerable time with many disjoints in the
arguments. We limit the number of disjoints to 4, any additional
information is thrown away.
- subtract() can lead to many disjoints. We instead assume that any
array element is possibly accessed, which removes all disjoints.
- subtract_domain() may lead to considerable processing, even if all
elements are are to be removed. Instead, we remove determine and
remove the affected spaces manually. No behaviour is changed.
llvm-svn: 309728
This allows -polly-optree to move instructions that depend on
synthesizable values.
The difficulty for synthesizable values is that their value depends on
the location. When it is moved over a loop header, and the SCEV
expression depends on the loop induction variable (SCEVAddRecExpr), it
would use the current induction variable instead of the last one.
At the moment we cannot forward PHI nodes such that crossing the header
of loops referenced by SCEVAddRecExpr is not possible (assuming the loop
header has at least two incoming blocks: for entering the loop and the
backedge, such any instruction to be forwarded must have a phi between
use and definition).
A remaining issue is when the forwarded value is used after the loop,
but is only synthesizable inside the loop. This happens e.g. if
ScalarEvolution is unable to determine the number of loop iterations or
the initial loop value. We do not forward in this situation.
Differential Revision: https://reviews.llvm.org/D36102
llvm-svn: 309609
In addition to array and PHI writes, also allow scalar value writes.
The only kind of write not allowed are writes by functions
(including memcpy/memmove/memset).
llvm-svn: 309582
Write coalescing combines write accesses that
- Write the same llvm::Value.
- Write to the same array.
- Unless they do not write anything in a statement instance (partial
writes), write to the same element.
- There is no other access between them that accesses the same element.
This is particularly useful after DeLICM, which leaves partial writes to
disjoint domains.
Differential Revision: https://reviews.llvm.org/D36010
llvm-svn: 309489
Translate the ScheduleOptimizer to use the new isl C++ bindings.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D35845
llvm-svn: 309119
Read-only values (values defined before the SCoP) require special
handing with -polly-analyze-read-only-scalars=true (which is the
default). If active, each use of a value requires a read access.
When a copied value uses a read-only value, we must also ensure that
such a MemoryAccess is available or is created.
Differential Revision: https://reviews.llvm.org/D35764
llvm-svn: 308876
If the access relation's domain is empty, the access will never be
executed. We can just remove it.
We only remove write accesses. Partial read accesses are not yet
supported and instructions in the statement might require the
llvm::Value holding the read's result to be defined.
llvm-svn: 308830
Hoisted loads can be trivially supported because there are no
MemoryAccess to be modified, the loaded value is just available
at code generation.
llvm-svn: 308826
This pass 'forwards' operand trees into statements that use them in
order to avoid scalar dependencies.
This minimal implementation handles only the case of speculatable
instructions. We will successively add support for:
- Hoisted loads
- Read-only values
- Synthesizable values
- Loads
- PHIs
- Forwarding only parts of the tree
Differential Revision: https://reviews.llvm.org/D35754
llvm-svn: 308825
Print a statement's instruction on dump() regardless of
-polly-print-instructions. dump() is supposed to be used in the debugger
only and never in regression tests. While debugging, get all the
information we have and we are not bound to break anything. For non-dump
purposes of print, forward the setting of -polly-print-instructions as
parameters.
Some calls to print() had to be changed because the
PollyPrintInstructions setting is only available in ScopInfo.cpp.
In ScheduleOptimizer.cpp, dump() was used in regression tests.
That's not what dump() is for.
The print parameter "PrintInstructions" will also be useful for an
explicit print SCoP pass in a future patch.
llvm-svn: 308746
Use a mark-and-sweep algorithm to find and remove unused instructions
and MemoryAccesses. This is useful in particular to remove scalar
writes that are never used anywhere. A scalar write in a loop induces
a write-after-write dependency that stops the loop iterations to be
rescheduled. Such writes can be a result of previous transformations
such as DeLICM and operand tree forwarding.
It adds a new class VirtualInstruction that represents an instruction in
a particular statement. At the moment an instruction can only belong to
the statement that represents a BasicBlock. In the future, instructions
can be in one of multiple statements representing a BasicBlock
(Nandini's work), in different statements than its BasicBlock would
indicate, and even multiple statements at once (by forwarding operand
trees). It also integrates nicely with the VirtualUse class.
ScopStmt::contains(Instruction*) currently uses the instruction's parent
BasicBlock to check whether it contains the instruction. It will need to
check the actual statement list when one of the aforementioned features
become possible.
Differential Revision: https://reviews.llvm.org/D35656
llvm-svn: 308626
Before this patch, ScalarDefUseChain was a tool used by DeLICM to find
all reads and writes of scalar accesses. It iterated once over all
accesses and stores the accesses into maps.
By integrating it into the Scop class, we can keep the maps up-to-date
without the need for recomputing them. It will be needed for more than
DeLICM in the future, such as SCoP simplification, code movement between
virtual statements, and array expansion (GSoC project).
Compared to ScalarUseDefChain, we save two maps by finding the ScopStmt
a Def/PHIRead must reside in, and use its already existing lookup
function to find the MemoryAccess.
Differential Revision: https://reviews.llvm.org/D35631
llvm-svn: 308495
Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change
their MemoryKind). Consequently, the pattern matching should take it into
the account.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D33138
llvm-svn: 308494
Summary:
This makes code more readable and allows to reuse this functionality in
the future at other places.
Suggested-by Michael Kruse in post-commit review of r307660.
Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
Reviewed By: Meinersbur
Subscribers: pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35585
llvm-svn: 308435
Summary:
Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU.
When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.
In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: kbarton, nemanjai, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D34054
llvm-svn: 306863
Iterate through memory accesses in execution order (first all implicit reads,
then explicit accesses, then implicit writes).
In the test case this caused an implicit load to be handled as if it was loaded
after the write. That is, the value being written before it is available.
This fixes llvm.org/PR33323
llvm-svn: 304810
Enable the use for partial writes for PHI write accesses with a switch.
This simply skips the test for whether a PHI write would be partial.
The analog test for partial value writes also protects for partial reads
which we do not support (yet). It is possible to test for partial reads
separately such that we could skip the partial write check as well. In
case this shows up to be useful, I can implement it as well.
Differential Revision: https://reviews.llvm.org/D33487
llvm-svn: 303762
This reduces the diff to the official isl C++ bindings and solves a correctness
issue with isl::booleans, where isl_bool_error results were accidentally
converted to isl::boolean::true.
llvm-svn: 303505
Instead of relying on these functions to be part of the isl C++ bindings, we
just define this functionality independently. This allows us to use isl C++
bindings that do not contain LLVM specific functionality.
llvm-svn: 303503
- auto + decltype + template use was not inferrable in
`Transform/Simplify.cpp accessesInOrder`.
- changed code to explicitly construct required vector instead of using
higher order iterator helpers.
- Failing compiler spec:
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
llvm-svn: 303039
Removal of overwritten writes currently encompasses all the cases
of the identical write removal.
There is an observable behavioral change in that the last, instead
of the first, MemoryAccess is kept. This should not affect the
generated code, however.
Differential Revision: https://reviews.llvm.org/D33143
llvm-svn: 302987
Remove memory writes that are overwritten by later writes. This works
for StoreInsts:
store double 21.0, double* %A
store double 42.0, double* %A
scalar writes at the end of a statement and mixes of these.
Multiple writes can be the result of DeLICM, which might map multiple
writes to the same location when it knows that these do no conflict
(for instance because they write the same value). Such writes
interfere with pattern-matched optimization such as gemm and may not
get removed by other LLVM passes after code generation.
Differential Revision: https://reviews.llvm.org/D33142
llvm-svn: 302986
As with the scalar operand of the initial StoreInst, also use input
accesses when searching for new opportunities after mapping a
PHI write.
The same rational applies here: After LICM has been applied, the
promoted value will either be an instruction in the same statement
(in which case we fall back to try every scalar access of the
statement), or in another statement such that there will be such
an input access. In the latter case other scalars cannot have
originated from the same register promotion, at least not by LICM.
This mostly helps to decrease compilation time and makes debugging
easier by not pursuing unpromising routes. In some circumstances,
it may change the compiler's output.
llvm-svn: 302839
Previous to this patch, we used VirtualUse to determine the input
access of an llvm::Value in a statement. The input access is the
READ MemoryAccess that makes a value available in that statement,
which can either be a READ of a MemoryKind::Value or the
MemoryKind::PHI for a PHINode in the statement. DeLICM uses the input
access to heuristically find a candidate to map without searching all
possible values.
This might modify the behaviour in that previously PHI accesses were
not considered input accesses before. This was unintentially lost when
"VirtualUse" was extracted from the "Known Knowledge" patch.
llvm-svn: 302838
After DeLICM, it is possible to have two writes of the same value to
the same location in the same statement when it determined that those
writes do not conflict (write the same value).
Teach -polly-simplify to remove one of the writes. It interferes with
the pattern matching of matrix-multiplication kernels and also seem
to not be optimized away by LLVM.
The algorthm is simple, has O(n^2) behaviour (n = max number of
MemoryAccesses in a statement) and only matches the most obvious cases,
but seem to be enough to pattern-match Boost ublas gemm.
Not handled cases include:
- StoreInst instructions (a.k.a. explicit writes), since the value might
be loaded or overwritten between the two stores.
- PHINode, especially LCSSA, when the PHI value matches with on other's.
- Partial writes (in preparation)
llvm-svn: 302805
Some isl functions can simplify their __isl_keep arguments. The
argument object after the call uses different contraints to represent
the same set. Different contraints can result in different outputs
when printed to a string.
In assert builds additional isl functions are called (in assert() or
mentioned, these can change the internal representation of its read-only
arguments such that printed strings are different in debug and non-debug
builds.
What happened here is that a call to isl_set_is_equal inside an assert
in getScatterFor normalizes one of its arguments such that one redundant
constraint is removed. The redundant constraint therefore does not appear
in the string representing the domain, which FileCheck notices as a
regression test failure compared to a build with assertions disabled.
This fix removes the redundant contraints the domain from the start such
that the redundant contraint is removed in assert and non-assert builds.
Isl adds a flag to such sets such that the removal of redundancies is
not done multiple times (here: by isl_set_is_equal).
Thanks to Tobias Grosser for reporting and hinting to the cause.
llvm-svn: 302711
Summary:
In case two arrays share base pointers in the same invariant load equivalence
class, we canonicalize all memory accesses to the first of these arrays
(according to their order in the equivalence class).
This enables us to optimize kernels such as boost::ublas by ensuring that
different references to the C array are interpreted as accesses to the same
array. Before this change the runtime alias check for ublas would fail, as it
would assume models of the C array with differing (but identically valued) base
pointers would reference distinct regions of memory whereas the referenced
memory regions were indeed identical.
As part of this change we remove most of the MemoryAccess::get*BaseAddr
interface. We removed already all references to get*BaseAddr in previous
commits to ensure that no code relies on matching base pointers between
memory accesses and scop arrays -- except for three remaining uses where we
need the original base pointer. We document for these situations that
MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct
to the base pointer of the scop array referenced by this memory access.
Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert
Reviewed By: Meinersbur
Subscribers: etherzhhb
Tags: #polly
Differential Revision: https://reviews.llvm.org/D28518
llvm-svn: 302636
Extend the Knowledge class to store information about the contents
of array elements and which values are written. Two knowledges do
not conflict the known content is the same. The content information
if computed from writes to and loads from the array elements, and
represented by "ValInst": isl spaces that compare equal if the value
represented is the same.
Differential Revision: https://reviews.llvm.org/D31247
llvm-svn: 302339
Do not conflict if a write writes the same value as already known.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32026
llvm-svn: 301460
Do not conflict if the value of Existing and Proposed are the same.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32025
llvm-svn: 301301
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32027
llvm-svn: 300874
After the isl C++ binding generator is now close to being upstreamed to isl, we
synchronize the latest changes to Polly. These are mostly formatting changes
plus a small interface change for the foreach callback function and some naming
changes in isl::boolean.
llvm-svn: 300398
This commit switches Polly over to the isl::obj::foreach_* implementation, which
is part of the new isl bindings and follows the foreach pattern established in
Polly by Michael Kruse.
The original isl C function:
isl_stat isl_union_set_foreach_set(__isl_keep isl_union_set *uset,
isl_stat (*fn)(__isl_take isl_set *set, void *user), void *user);
which required the user to define a static callback function to which all
interesting parameters are passed via a 'void *' user-pointer, is on the
C++ side available as a function that takes a std::function<>, which can
carry any additional arguments without the need for a user pointer:
stat UnionSet::foreach_set(const std::function<stat(set)> &fn) const;
The following code illustrates the use of the new C++ interface:
auto Lambda = [=, &Result](isl::set Set) -> isl::stat {
auto Shifted = shiftDimension(Set, Pos, Amount);
Result = Result.add(Shifted);
return isl::stat::ok;
}
UnionSet.foreach_set(Lambda);
Polly had some specialized foreach functions which did not require the lambdas
to return a status flag. We remove these functions in this commit to move Polly
completely over to the new isl interface. We may in the future discuss if
functors without return values can be supported easily.
Another extension proposed by Michael Kruse is the use of C++ iterators to allow
the use of normal for loops to iterate over these sets. Such an extension would
allow us to further simplify the code.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D30620
llvm-svn: 300323
Dimensions of band nodes can be implicitly permuted by the algorithm applied
during the schedule generation.
For example, in case of the following matrix-matrix multiplication,
for (i = 0; i < 1024; i++)
for (k = 0; k < 1024; k++)
for (j = 0; j < 1024; j++)
C[i][j] += A[i][k] * B[k][j];
it can produce the following schedule tree
domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and
0 <= i2 <= 1023 }"
child:
schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] },
{ Stmt_for_body6[i0, i1, i2] -> [(i1)] },
{ Stmt_for_body6[i0, i1, i2] -> [(i2)] }]"
permutable: 1
coincident: [ 1, 1, 0 ]
The current implementation of the pattern matching optimizations relies on the
initial ordering of dimensions. Otherwise, it can produce the miscompilation
(e.g., [1]).
This patch helps to restore the initial ordering of dimensions by recreating
the band node when the corresponding conditions are satisfied.
Refs.:
[1] - https://bugs.llvm.org/show_bug.cgi?id=32500
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D31741
llvm-svn: 299662
Because Polly exposes parameters that directly influence tile size
calculations, one can setup situations like divide-by-zero.
Check against a possible divide-by-zero in getMacroKernelParams
and return early.
Also assert at the end of getMacroKernelParams that the block sizes
computed for matrices are positive (>= 1).
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31708
llvm-svn: 299633
= Change of WAR, WAW generation: =
- `buildFlow(Sink, MustSource, MaySource, Sink)` treates any flow of the form
`sink <- may source <- must source` as a *may* dependence.
- we used to call:
```lang=cpp, name=old-flow-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```
- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.
- Now, we call WAR and WAW correctly.
== Correct WAW: ==
```lang=cpp, name=new-waw-call.cpp
Flow = buildFlow(Write, MustWrite, MayWrite, Schedule);
WAW = isl_union_flow_get_may_dependence(Flow);
isl_union_flow_free(Flow);
```
== Correct WAR: ==
```lang=cpp, name=new-war-call.cpp
Flow = buildFlow(Write, Read, MustaWrite, Schedule);
WAR = isl_union_flow_get_must_dependence(Flow);
isl_union_flow_free(Flow);
```
- We want the "shortest" WAR possible (exact dependences).
- We mark all the *must-writes* as may-source, reads as must-souce.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a *must-write*
before reaching a sink.
- Note that we only block ealier writes with *must-writes*. This is
intuitively correct, as we do not want may-writes to block
must-writes.
- Leaves us with direct (R -> W).
- This affects reduction generation since RED is built using WAW and WAR.
= New StrictWAW for Reductions: =
- We used to call:
```lang=cpp,name=old-waw-war-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```
- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any interfering reductions.
= Explanation: Why the new WAR dependences in tests are correct: =
- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.
== Code: ==
```lang=llvm, name=new-war-dependence.ll
; void manyreductions(long *A) {
; for (long i = 0; i < 1024; i++)
; for (long j = 0; j < 1024; j++)
; S0: *A += 42;
;
; for (long i = 0; i < 1024; i++)
; for (long j = 0; j < 1024; j++)
; S1: *A += 42;
;
```
=== WAR dependence: ===
{ S0[1023, 1023] -> S1[0, 0] }
- Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences:
```lang=cpp, name=dependence-incorrect, counterexample
S0[1023, 1023]:
*-- tmp = *A (load0)--*
WAR 2 add = tmp + 42 |
*-> *A = add (store0) |
WAR 1
S1[0, 0]: |
tmp = *A (load1) |
add = tmp + 42 |
A = add (store1)<-*
```
- One may assume that WAR2 *hides* WAR1 (since store0 happens before
store1). However, within a statement, Polly has no idea about the
ordering of loads and stores.
- Hence, according to Polly, the code may have looked like this:
```lang=cpp, name=dependence-correct
S0[1023, 1023]:
A = add (store0)
tmp = A (load0) ---*
add = A + 42 |
WAR 1
S1[0, 0]: |
tmp = A (load1) |
add = A + 42 |
A = add (store1) <-*
```
- So, Polly generates (correct) WAR dependences. It does not make sense
to remove these dependences, since they are correct with respect to
Polly's model.
Reviewers: grosser, Meinersbur
tags: #polly
Differential revision: https://reviews.llvm.org/D31386
llvm-svn: 299429
The isl C++ bindings now has implicit conversions from isl::set to
isl::union_set. Therefore the additional overload accepting isl::set
is not required anymore.
llvm-svn: 298529
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30606
llvm-svn: 298510
In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.
The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".
Reviewers: grosser
Subscribers: gareevroman, pollydev
Tags: #polly
Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>
Differential Revision: https://reviews.llvm.org/D30815
llvm-svn: 297587
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.
It removes write accesses that write a loaded value back to the location
it was loaded from. It is a typical artifact from DeLICM. Removing it
will get rid of bogus dependencies later in dependency analysis.
It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.
Differential Revision: https://reviews.llvm.org/D30820
llvm-svn: 297473
This pass is a small and self-contained example of a piece of code that was
written with the isl C interface. The diff of this change nicely shows how the
C++ bindings can improve the readability of the code by avoiding the long C
function names and by avoiding any need for memory management.
As you will see, no calls to isl_*_copy or isl_*_free are needed anymore.
Instead the C++ interface takes care of automatically managing the objects.
This may introduce internally additional copies, but due to the isl reference
counting, such copies are expected to be cheap. For performance critical
operations, we will later exploit move semantics to eliminate unnecessary
copies that have shown to be costly.
Below we give a set of examples that shows the benefit of the C++ interface vs.
the pure C interface.
Check properties
----------------
Before:
if (isl_aff_is_zero(aff) || isl_aff_is_one(aff))
return true;
After:
if (Aff.is_zero() || Aff.is_one())
return true;
Type conversion
---------------
Before:
isl_union_pw_multi_aff *UPMA = isl_union_pw_multi_aff_from_union_map(umap);
After:
isl::union_pw_multi_aff UPMA = UMap;
Type construction
-----------------
Before:
auto *Empty = isl_union_map_empty(space);
After:
auto Empty = isl::union_map::empty(Space);
Operations
----------
Before:
set = isl_union_set_intersect(set, set2);
After:
Set = Set.intersect(Set2);
The use of isl::boolean in return types also adds an increases the robustness
of Polly, as on conversion to true or false, we verify that no isl_bool_error
has been returned and assert in case an error was returned. Before this change
we would have just ignored the error and proceeded with (some) exection path.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30619
llvm-svn: 297466
Translate the full algorithm to use the new isl C++ bindings
This is a large piece of code that has been written with the Polly IslPtr<>
memory management tool, which only performed memory management, but did not
provide a method interface. As such the code was littered with calls to
give(), copy(), keep(), and take(). The diff of this change should give a
good example how the new method interface simplifies the code by removing the
need for switching between managed types and C functions all the time
and consequently also the need to use the long C function names.
These are a couple of examples comparing the old IslPtr memory management
interface with the complete method interface.
Check properties
----------------
Before:
if (isl_aff_is_zero(Aff.get()) || isl_aff_is_one(Aff.get()))
return true;
After:
if (Aff.is_zero() || Aff.is_one())
return true;
Type conversion
---------------
Before:
isl_union_pw_multi_aff *UPMA =
give(isl_union_pw_multi_aff_from_union_map(UMap.copy());
After:
isl::union_pw_multi_aff UPMA = UMap;
Type construction
-----------------
Before:
auto Empty = give(isl_union_map_empty(Space.copy());
After:
auto Empty = isl::union_map::empty(Space);
Operations
----------
Before:
Set = give(isl_union_set_intersect(Set.copy(), Set2.copy());
After:
Set = Set.intersect(Set2);
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30617
llvm-svn: 297463
Over the last couple of months several authors of independent isl C++ bindings
worked together to jointly design an official set of isl C++ bindings which
combines their experience in developing isl C++ bindings. The new bindings have
been designed around a value pointer style interface and remove the need for
explicit pointer managenent and instead use C++ language features to manage isl
objects.
This commit introduces the smart-pointer part of the isl C++ bindings and
replaces the current IslPtr<T> classes, which served the very same purpose, but
had to be manually maintained. Instead, we now rely on automatically generated
classes for each isl object, which provide value_ptr semantics.
An isl object has the following smart pointer interface:
inline set manage(__isl_take isl_set *ptr);
class set {
friend inline set manage(__isl_take isl_set *ptr);
isl_set *ptr = nullptr;
inline explicit set(__isl_take isl_set *ptr);
public:
inline set();
inline set(const set &obj);
inline set &operator=(set obj);
inline ~set();
inline __isl_give isl_set *copy() const &;
inline __isl_give isl_set *copy() && = delete;
inline __isl_keep isl_set *get() const;
inline __isl_give isl_set *release();
inline bool is_null() const;
}
The interface and behavior of the new value pointer style classes is inspired
by http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf, which
proposes a std::value_ptr, a smart pointer that applies value semantics to its
pointee.
We currently only provide a limited set of public constructors and instead
require provide a global overloaded type constructor method "isl::obj
isl::manage(isl_obj *)", which allows to convert an isl_set* to an isl::set by
calling 'S = isl::manage(s)'. This pattern models the make_unique() constructor
for unique pointers.
The next two functions isl::obj::get() and isl::obj::release() are taken
directly from the std::value_ptr proposal:
S.get() extracts the raw pointer of the object managed by S.
S.release() extracts the raw pointer of the object managed by S and sets the
object in S to null.
We additionally add std::obj::copy(). S.copy() returns a raw pointer refering
to a copy of S, which is a shortcut for "isl::obj(oldobj).release()", a
functionality commonly needed when interacting directly with the isl C
interface where all methods marked with __isl_take require consumable raw
pointers.
S.is_null() checks if S manages a pointer or if the managed object is currently
null. We add this function to provide a more explicit way to check if the
pointer is empty compared to a direct conversion to bool.
This commit also introduces a couple of polly-specific extensions that cover
features currently not handled by the official isl C++ bindings draft, but
which have been provided by IslPtr<T> and are consequently added to avoid code
churn. These extensions include:
- operator bool() : Conversion from objects to bool
- construction from nullptr_t
- get_ctx() method
- take/keep/give methods, which match the currently used naming
convention of IslPtr<T> in Polly. They just forward to
(release/get/manage).
- raw_ostream printers
We expect that these extensions are over time either removed or upstreamed to
the official isl bindings.
We also export a couple of classes that have not yet been exported in isl (e.g.,
isl::space)
As part of the code review, the following two questions were asked:
- Why do we not use a standard smart pointer?
std::value_ptr was a proposal that has not been accepted. It is consequently
not available in the standard library. Even if it would be available, we want
to expand this interface with a complete method interface that is conveniently
available from each managed pointer. The most direct way to achieve this is to
generate a specialiced value style pointer class for each isl object type and
add any additional methods to this class. The relevant changes follow in
subsequent commits.
- Why do we not use templates or macros to avoid code duplication?
It is certainly possible to use templates or macros, but as this code is
auto-generated there is no need to make writing this code more efficient. Also,
most of these classes will be specialized with individual member functions in
subsequent commits, such that there will be little code reuse to exploit. Hence,
we decided to do so at the moment.
These bindings are not yet officially part of isl, but the draft is already very
stable. The smart pointer interface itself did not change since serveral months.
Adding this code to Polly is against our normal policy of only importing
official isl code. In this case however, we make an exception to showcase a
non-trivial use case of these bindings which should increase confidence in these
bindings and will help upstreaming them to isl.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30325
llvm-svn: 297452
One of the current limitations of DeLICM is that it only creates
PHI WRITEs that it knows are read by some PHI. Such writes may not span
all instances of a statement. Polly's code generator currently does not
support MemoryAccesses that are not executed in all instances
('partial accesses') and so has to give up on a possible mapping.
This workaround has once been suggested by Tobias Grosser: Try to
interpolate an arbitrary expansion to all instances. It will be checked
for possible conflicts with the existing Knowledge and can be applied if
the conflict checking result is that no semantics are changed.
Expansion is done by simplifying the mapping by coalescing with the hope
that coalescing will find a polyhedral 'rule' of the relevant map. It is
then 'gist'-ed using the domain of the relevant instances such that the
rule is expanded to the universe and finally intersected with the domain
of all statement instances.
The expansion makes conflicts become more likely, the found rule may
still not encompass all statement instances and the found rule exposes
internals of isl's implementation of coalesce and gist. The latter means
that the result depends on how much effort the implementation invests
into finding a rule which may change between versions of isl. Trivial
implementations of gist and coalesce just return the input arguments.
A patch that makes codegen support partial accesses is in preparation
as well.
Differential Revision: https://reviews.llvm.org/D30763
llvm-svn: 297373
Control flow would flow-through after the check whether the operations
quota exceeded, with the intention that it would later be caught by
Knowledge::isUsable(). However, the Knowledge constructor has its own
assertions to check consistency which would fail if its fields have only
been initialized partially because some sets have been computed correctly
before the operations quota takes effect.
Fix by erroring-out early instead of falling-throught into the code that
might expect that everything has been computed correctly. For robustness,
also bail-out if any of the fields contain nullptr values instead of
relying on isl always setting exactly this error code if something went
wrong.
This should fix the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable
(-polly-process-unprofitable -polly-position=before-vectorizer
-polly-enable-delicm) buildbot.
llvm-svn: 296022
NonowningIslPtr<isl_X> was used as types of function parameters when the
function does not consume the isl object, i.e. an __isl_keep parameter.
The alternatives are:
1. IslPtr<isl_X>
This has additional calls to isl_X_copy and isl_X_free to
increase/decrease the reference counter even though not needed. The
caller already owns a reference to the isl object.
2. const IslPtr<isl_X>&
This does not change the reference counter, but requires an
additional load to get the pointer to the isl object (instead of just
passing the pointer itself).
Moreover, the compiler cannot rely on the constness of the pointer
and has to reload the pointer every time it writes to memory (unless
alias analysis such as TBAA says it is not possible).
The isl C++ bindings currently in development do not have an equivalent
to NonowningIslPtr and adding one would make the binding more
complicated and its advantage in performance is small. In order to
simplify the transition to these C++ bindings, remove NonowningIslPtr.
Change every former use of it to alternative 2 mentioned aboce
(const IslPtr<isl_X>&).
llvm-svn: 295998
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30293
llvm-svn: 295958