Remove an assertion that tests the injectivity of the
PHIRead -> PHIWrite relation. That is, allow a single PHI write to be
used by multiple PHI reads. This may happen due to some statements
containing the PHI write not having the statement instances that would
overwrite the previous incoming value due to (assumed/invalid) contexts.
This result in that PHI write is mapped to multiple targets which is not
supported. Codegen will select one one of the targets using
getAddressFunction(). However, the runtime check should protect us from
this case ever being executed.
We therefore allow injective PHI relations. Additional calculations to
detect/santitize this case would probably not be worth the compuational
effort.
This fixes llvm.org/PR34485
llvm-svn: 313902
Since -polly-codegen reports itself to preserve DependenceInfo and IslAstInfo,
we might get those analysis that were computed by a different ScopInfo for a
different Scop structure. This would be unfortunate because DependenceInfo and
IslAstInfo hold references to resources allocated by
ScopInfo/ScopBuilder/Scop (e.g. isl_id). If -polly-codegen and
DependenceInfo/IslAstInfo do not agree on which Scop to use, unpredictable
things can happen.
When the ScopInfo/Scop object is freed, there is a high probability that the
new ScopInfo/Scop object will be created at the same heap position with the
same address. Comparing whether the Scop or ScopInfo address is the expected
therefore is unreliable.
Instead, we compare the address of the isl_ctx object. Both, DependenceInfo
and IslAstInfo must hold a reference to the isl_ctx object to ensure it is
not freed before the destruction of those analyses which might happen after
the destruction of the Scop/ScopInfo they refer to. Hence, the isl_ctx
will not be freed and its address not reused as long there is a
DependenceInfo or IslAstInfo around.
This fixes llvm.org/PR34441
llvm-svn: 313842
Fix walking over the schedule tree to collect its properties
(Number of permutable bands etc.).
Also add regression tests for these statistics.
llvm-svn: 313750
Computing the reaching definition in forwardTree() can take a long time
if the coefficients are large. When the forwarding is
carried-out (doIt==true), forwardTree() must execute entirely or not at
all to get a consistent output, which means we cannot just allow
out-of-quota errors to happen in the middle of the processing.
We introduce the class IslQuotaScope which allows to opt-in code that is
conformant and has been tested with out-of-quota events. In case of
ForwardOpTree, out-of-quota is allowed during the operand tree
examination, but not during the transformation. The same forwardTree()
recursion is used for examination and execution, meaning that the
reaching definition has already been computed in the examination tree
walk and cached for reuse in the transformation tree walk.
This should fix the time-out of grtestutils.ll of the asop buildbot. If
the compilation still takes too long, we can reduce the max-operations
allows for -polly-optree.
Differential Revision: https://reviews.llvm.org/D37984
llvm-svn: 313690
cl::opt<unsigned long> is not specialized and hence the option
-polly-optree-max-ops impossible to use.
Replace by supported option cl::opt<unsigned>.
Also check for an error state when computing the written value, which
happens when the quota runs out.
llvm-svn: 313546
The remaining parts produced by the full partial tile isolation can contain
hot spots that are worth to be optimized. Currently, we rely on the simple
loop unrolling pass, LiCM and the SLP vectorizer to optimize such parts.
However, the approach can suffer from the lack of the information about
aliasing that Polly provides using additional alias metadata or/and the lack
of the information required by simple loop unrolling pass.
This patch is the first step to optimize the remaining parts. To do it, we
unroll and separate them. In case of, for instance, Intel Kaby Lake, it helps
to increase the performance of the generated code from 39.87 GFlop/s to
49.23 GFlop/s.
The next possible step is to avoid unrolling performed by Polly in case of
isolated and remaining parts and rely only on simple loop unrolling pass and
the Loop vectorizer.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D37692
llvm-svn: 312929
Up to now ZoneAlgo considered array elements access by something else
than a LoadInst or StoreInst as not analyzable. This patch removes that
restriction by using the unknown ValInst to describe the written
content, repectively the element type's null value in case of memset.
Differential Revision: https://reviews.llvm.org/D37362
llvm-svn: 312630
Since r312249 instructions of a entry block of region statements are
not marked as root anymore and hence can theoretically be removed
if unused. Theoretically, because the instruction list was not changed.
Still, MemoryAccesses for unused instructions were removed. This lead
to a failed assertion in the code generator when the MemoryAccess for
the still listed instruction was not found.
This hould fix the
Assertion failed: ArrayAccess && "No array access found for instruction!",
file ScopInfo.h, line 1494
compiler crashes.
llvm-svn: 312566
Before this patch, OpTree did not consider forwarding an operand tree consisting
of only single LoadInst as useful. The motivation was that, like an access to a
read-only variable, it would just replace one MemoryAccess by another. However,
in contrast to read-only accesses, this would replace a scalar access by an
array access, which is something worth doing.
In addition, leaving scalar MemoryAccess is problematic in that VirtualUse
prioritizes inter-Stmt use over intra-Stmt. It was possible that the same LLVM
value has a MemoryAccess for accessing the remote Stmt's LoadInst as well as
having the same LoadInst in its own instruction list (due to being forwarded
from another operand tree).
With this patch we ensure that if a LoadInst is forwarded is any operand tree,
also the operand tree containing just the LoadInst is forwarded as well, which
effectively removes the scalar MemoryAccess such that only the array access
remains, not both.
Thanks Michael for the detailed explanation.
Reviewers: Meinersbur, bellu, singam-sanjay, gareevroman
Subscribers: hfinkel, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37424
llvm-svn: 312456
Summary:
After region statements now also have instruction lists, this is a
straightforward extension.
Reviewers: Meinersbur, bollu, singam-sanjay, gareevroman
Reviewed By: Meinersbur
Subscribers: hfinkel, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37298
llvm-svn: 312249
ZoneAlgo used to bail out for the complete SCoP if it encountered
something violating its assumption. This meant the neither OpTree can
forward any load nor DeLICM do anything in such cases, even if their
transformations are unrelated to the violations.
This patch adds a list of compatible elements (currently with the
granularity of entire arrays) that can be used for analysis. OpTree
and DeLICM can then check whether their transformations only concern
compatible elements, and skip non-compatible ones.
This will be useful for e.g. Polybench's benchmarks covariance,
correlation, bicg, doitgen, durbin, gramschmidt, adi that have
assumption violation, but which are not necessarily relevant
for all transformations.
Differential Revision: https://reviews.llvm.org/D37219
llvm-svn: 311929
Properly require and preserve the OptimizationRemarkEmitter for use in
ScopPass. Previously one had to get the ORE from ScopDetection because
CodeGeneration did not mark it as preserved. It would need to be
recomputed which results in the legacy PM to throw away all previous
SCoP analysis.
This also changes the implementation of ScopPass::getAnalysisUsage to
not unconditionally preserve all passes, but only those needed to be
preserved by any SCoP pass (at least when using the legacy PM). This
allows invalidating DependenceInfo (and IslAstInfo) in case the pass
would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion)
JSONImporter should also invalidate the DependenceInfo. In this patch
it marks DependenceInfo as preserved anyway because some regression
tests depend on it.
Differential Revision: https://reviews.llvm.org/D37010
llvm-svn: 311888
Summary:
This patch comes directly after https://reviews.llvm.org/D34982 which allows fully indexed expansion of MemoryKind::Array. This patch allows expansion for MemoryKind::Value and MemoryKind::PHI.
MemoryKind::Value seems to be working with no majors modifications of D34982. A test case has been added. Unfortunatly, no "run time" checks can be done for now because as @Meinersbur explains in a comment on D34982, DependenceInfo need to be cleared and reset to take expansion into account in the remaining part of the Polly pipeline. There is no way to do that in Polly for now.
MemoryKind::PHI is not working. Test case is in place, but not working. To expand MemoryKind::Array, we expand first the write and then after the reads. For MemoryKind::PHI, the idea of the current implementation is to exchange the "roles" of the read and write and expand first the read according to its domain and after the writes.
But with this strategy, I still encounter the problem of union_map in new access map.
For example with the following source code (source code of the test case) :
```
void mse(double A[Ni], double B[Nj]) {
int i,j;
double tmp = 6;
for (i = 0; i < Ni; i++) {
for (int j = 0; j<Nj; j++) {
tmp = tmp + 2;
}
B[i] = tmp;
}
}
```
Polly gives us the following statements and memory accesses :
```
Statements {
Stmt_for_body
Domain :=
{ Stmt_for_body[i0] : 0 <= i0 <= 9999 };
Schedule :=
{ Stmt_for_body[i0] -> [i0, 0, 0] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
Instructions {
%tmp.04 = phi double [ 6.000000e+00, %entry.split ], [ %add.lcssa, %for.end ]
}
Stmt_for_inc
Domain :=
{ Stmt_for_inc[i0, i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9999 };
Schedule :=
{ Stmt_for_inc[i0, i1] -> [i0, 1, i1] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
Instructions {
%tmp.11 = phi double [ %tmp.04, %for.body ], [ %add, %for.inc ]
%add = fadd double %tmp.11, 2.000000e+00
%exitcond = icmp ne i32 %inc, 10000
}
Stmt_for_end
Domain :=
{ Stmt_for_end[i0] : 0 <= i0 <= 9999 };
Schedule :=
{ Stmt_for_end[i0] -> [i0, 2, 0] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_end[i0] -> MemRef_B[i0] };
Instructions {
%add.lcssa = phi double [ %add, %for.inc ]
store double %add.lcssa, double* %arrayidx, align 8
%exitcond5 = icmp ne i64 %indvars.iv.next, 10000
}
}
```
and the following dependences :
```
{ Stmt_for_inc[i0, 9999] -> Stmt_for_end[i0] : 0 <= i0 <= 9999;
Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998;
Stmt_for_body[i0] -> Stmt_for_inc[i0, 0] : 0 <= i0 <= 9999;
Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : 0 <= i0 <= 9998 }
```
When trying to expand this memory access :
```
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
```
The new access map would look like this :
```
{ Stmt_for_inc[i0, 9999] -> MemRef_tmp_11__phi_exp[i0] : 0 <= i0 <= 9999; Stmt_for_inc[i0, i1] ->MemRef_tmp_11__phi_exp[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998 }
```
The idea to implement the expansion for PHI access is an idea from @Meinersbur and I don't understand why my implementation does not work. I should have miss something in the understanding of the idea.
Contributed by: Nicolas Bonfante <nicolas.bonfante@gmail.com>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev, Meinersbur
Differential Revision: https://reviews.llvm.org/D36647
llvm-svn: 311619
Add statistics about
- Which optimizations are applied
- Number of loops in Scops at various stages
- Number of scalar/singleton writes at various stages representative
for scalar false dependencies
- Number of parallel loops
These will be useful to find regressions due to moving Polly further
down of LLVM's pass pipeline.
Differential Revision: https://reviews.llvm.org/D37049
llvm-svn: 311553
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36928
llvm-svn: 311473
The pattern recognition for MatMul is restrictive.
The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).
This was changed and made more flexible.
Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D36460
llvm-svn: 311302
Summary:
When trying to expand memory accesses, the current version of Polly uses statement Level dependences. The actual implementation is not working in case of multiple dependences per statement. For example in the following source code :
```
void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
int i,j;
for (j = 0; j < Ni; j++) {
for (int i = 0; i<Nj; i++)
S: B[i] = i;
for (int i = 0; i<Nj; i++)
T: D[i] = i;
U: A[j] = B[j];
C[j] = D[j];
}
}
```
The statement U has two dependences with S and T. The current version of polly fails during expansion.
This patch aims to fix this bug. For that, we use Reference Level dependences to be able to filter dependences according to statement and memory ref. The principle of expansion remains the same as before.
We also noticed that we need to bail out if load come after store (at the same position) in same statement. So a check was added to isExpandable.
Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur, simbuerg
Subscribers: pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D36791
llvm-svn: 311165
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.
This requires `-polly-detect-full-function` to be enabled. So, the pass
asserts that `-polly-detect-full-function` is enabled.
Differential Revision: https://reviews.llvm.org/D36832
llvm-svn: 311126
Summary:
This pass detangles induction variables from functions, which take variables by
reference. Most fortran functions compiled with gfortran pass variables by
reference. Unfortunately a common pattern, printf calls of induction variables,
prevent in this situation the promotion of the induction variable to a register,
which again inhibits any kind of loop analysis. To work around this issue
we developed a specialized pass which introduces separate alloca slots for
known-read-only references, which indicate the mem2reg pass that the induction
variables can be promoted to registers and consquently enable SCEV to work.
We currently hardcode the information that a function
_gfortran_transfer_integer_write does not read its second parameter, as
dragonegg does not add the right annotations and we cannot change old dragonegg
releases. Hopefully flang will produce the right annotations.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: mgorny, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36800
llvm-svn: 311066
We are working towards removing uses of Scop::getStmtFor(BB). In this
patch, we remove dependency of Scop::getStmtFor(Inst) on getStmtFor(BB).
To do so, we introduce a map of instructions to their corresponding scop
statements and use it to get the instructions' statement.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35663
llvm-svn: 310494
The previous value of "polly-delicm" was forgotten to to be changed when
ForwardOpTree was split from DeLICM.
Thanks to Tobias for noticing!
llvm-svn: 310465
distributeDomain() and filterKnownValInst() are used in a scop
of ForwardOpTree that limits the number of isl operations.
Therefore some isl functions may return null after any operation.
Remove assertion that assume non-null results and handle
isl_*_foreach returning isl::stat::error.
I hope this fixes the crash of the asop buildbot at ihevc_recon.c.
llvm-svn: 310461
Currently, only convex isolation sets can be efficiently processed by isl.
Consequently, as a temporary solution, we use a different algorithm for partial
tile isolation that helps to build convex isolation sets in some cases.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36278
llvm-svn: 310374
It is possible that partial writes are empty (write is never executed).
In this case, when in PHINode's incoming edge is never taken such that
the incoming write becomes an empty partial write, if enabled. The
issue is that when converting the union_map to an map, it's space
cannot be derived from the union_map itself. Rather, we need to
determine its space independently.
This fixes test-suite's MultiSource/Benchmarks/ASC_Sequoia/CrystalMk.
llvm-svn: 310348
In certain cases delicm might decide to not leave the original array write in
the loop body, but to remove it and instead leave a transformed phi node as
write access. This commit teached the matmul pattern detection to order the
memory accesses according to when the access actually happens and use this
information to detect the new pattern. This makes pattern based matmul
optimization work for 2mm and 3mm in polybench 4 after
polly-position=before-vectorizer has been enabled.
llvm-svn: 310338
This allows us to remove more scalar dependences. While this feature is still
rather experimental, we want to give it sufficient test coverage.
llvm-svn: 310314
Two write statements which write into the very same array slot generally are
conflicting. However, in case the value that is written is identical, this
does not cause any problem. Hence, allow such write pairs in this specific
situation.
llvm-svn: 310311
This commit implements the initial version of fully-indexed static
expansion.
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[j] = j;
T: A[i] = B[i]
```
After the pass, we want this :
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[i][j] = j;
T: A[i] = B[i][i]
```
For now we bail (fail) in the following cases:
- Scalar access
- Multiple writes per SAI
- MayWrite Access
- Expansion that leads to an access to the original array
Furthermore: We still miss checks for escaping references to the array
base pointers. A future commit will add the missing escape-checks to
stay correct in those cases. The expansion is still locked behind a
CLI-Option and should not yet be used.
Patch contributed by: Nicholas Bonfante <bonfante.nicolas@gmail.com>
Reviewers: simbuerg, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: mgorny, llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D34982
llvm-svn: 310304