Whether a partial write is tautological or unsatisfiable depends not only
on the access domain, but also on the domain covered by the AST node the
write is generated in.
In the example below, Stmt_cond_false appears in two places. It may carry a partial write access that is never executed in the instance Stmt_cond_false(0).
```
for (int c0 = 0; c0 < tmp5; c0 += 1) {
  Stmt_for_body344(c0);
  if (tmp5 >= c0 + 2)
    Stmt_cond_false(c0);
  Stmt_cond_end(c0);
}
if (tmp5 <= 0) {
  Stmt_for_body344(0);
  Stmt_cond_false(0);
  Stmt_cond_end(0);
}
```
isl cannot derive a subscript expression for an array element that is never
accessed. This led to an error: IslNodeBuilder::createNewAccesses generated
no subscript expression, but BlockGenerator expected one to exist, because
the write is executed, just not in that AST node.
Fix this by not checking whether the access domain is empty, but instead
checking whether isl generated a constant "false" AST expression for the
access in the current AST node.
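A minimal sketch of what such a check could look like, using the isl C API (the helper name and exact placement are illustrative, not the actual Polly code):
```
#include <isl/ast.h>
#include <isl/val.h>

// Illustrative helper: isl typically renders an unsatisfiable condition in
// the generated AST as the integer constant 0 ("false"), so a partial write
// whose access never executes in the current AST node shows up as a
// constant-false expression.
static bool isConstantFalse(__isl_keep isl_ast_expr *Expr) {
  if (isl_ast_expr_get_type(Expr) != isl_ast_expr_int)
    return false;

  isl_val *Val = isl_ast_expr_get_val(Expr);
  bool IsZero = isl_val_is_zero(Val) == isl_bool_true;
  isl_val_free(Val);
  return IsZero;
}
```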
This should fix a compiler crash seen on the AOSP buildbot.
llvm-svn: 311663
This is a stylistic change to make the function a little more readable.
Also add a debug print to show what instruction contains a use of a
function we don't understand in the kernel.
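A rough sketch of what such a debug print can look like (the DEBUG_TYPE and names below are illustrative, not the exact ones used):
```
#define DEBUG_TYPE "polly-codegen-ppcg"

#include "llvm/IR/Instruction.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Print the instruction that uses a function we cannot handle in the kernel.
// Only emitted in debug builds, e.g. with -debug-only=polly-codegen-ppcg.
static void reportUnknownFunctionUser(Instruction &UserInst) {
  DEBUG(dbgs() << "Cannot handle use of unknown function in: " << UserInst
               << "\n");
}
```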
Differential Revision: https://reviews.llvm.org/D37058
llvm-svn: 311648
Summary:
This patch directly follows https://reviews.llvm.org/D34982, which allows fully indexed expansion of MemoryKind::Array. This patch allows expansion of MemoryKind::Value and MemoryKind::PHI.
MemoryKind::Value seems to work with no major modifications of D34982. A test case has been added. Unfortunately, no "run time" checks can be done for now because, as @Meinersbur explains in a comment on D34982, DependenceInfo needs to be cleared and recomputed to take expansion into account in the remaining part of the Polly pipeline. There is currently no way to do that in Polly.
MemoryKind::PHI is not working. A test case is in place, but failing. To expand MemoryKind::Array, we first expand the write and then the reads. For MemoryKind::PHI, the idea of the current implementation is to exchange the "roles" of the read and the write: first expand the read according to its domain, and then the writes.
With this strategy, however, I still end up with a union_map as the new access map.
For example, with the following source code (the source code of the test case):
```
void mse(double A[Ni], double B[Nj]) {
  int i, j;
  double tmp = 6;
  for (i = 0; i < Ni; i++) {
    for (int j = 0; j < Nj; j++) {
      tmp = tmp + 2;
    }
    B[i] = tmp;
  }
}
```
Polly gives us the following statements and memory accesses:
```
Statements {
    Stmt_for_body
        Domain :=
            { Stmt_for_body[i0] : 0 <= i0 <= 9999 };
        Schedule :=
            { Stmt_for_body[i0] -> [i0, 0, 0] };
        ReadAccess := [Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
        MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
        Instructions {
            %tmp.04 = phi double [ 6.000000e+00, %entry.split ], [ %add.lcssa, %for.end ]
        }
    Stmt_for_inc
        Domain :=
            { Stmt_for_inc[i0, i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9999 };
        Schedule :=
            { Stmt_for_inc[i0, i1] -> [i0, 1, i1] };
        MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
        ReadAccess := [Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
        MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
        Instructions {
            %tmp.11 = phi double [ %tmp.04, %for.body ], [ %add, %for.inc ]
            %add = fadd double %tmp.11, 2.000000e+00
            %exitcond = icmp ne i32 %inc, 10000
        }
    Stmt_for_end
        Domain :=
            { Stmt_for_end[i0] : 0 <= i0 <= 9999 };
        Schedule :=
            { Stmt_for_end[i0] -> [i0, 2, 0] };
        MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
        ReadAccess := [Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
        MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
            { Stmt_for_end[i0] -> MemRef_B[i0] };
        Instructions {
            %add.lcssa = phi double [ %add, %for.inc ]
            store double %add.lcssa, double* %arrayidx, align 8
            %exitcond5 = icmp ne i64 %indvars.iv.next, 10000
        }
}
```
and the following dependences:
```
{ Stmt_for_inc[i0, 9999] -> Stmt_for_end[i0] : 0 <= i0 <= 9999;
Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998;
Stmt_for_body[i0] -> Stmt_for_inc[i0, 0] : 0 <= i0 <= 9999;
Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : 0 <= i0 <= 9998 }
```
When trying to expand this memory access:
```
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
```
The new access map would look like this:
```
{ Stmt_for_inc[i0, 9999] -> MemRef_tmp_11__phi_exp[i0] : 0 <= i0 <= 9999;
  Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi_exp[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998 }
```
The idea of implementing the expansion for PHI accesses this way comes from @Meinersbur, and I do not understand why my implementation does not work. I must have missed something in my understanding of the idea.
Contributed by: Nicolas Bonfante <nicolas.bonfante@gmail.com>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev, Meinersbur
Differential Revision: https://reviews.llvm.org/D36647
llvm-svn: 311619
Add statistics about
- which optimizations are applied
- the number of loops in SCoPs at various stages
- the number of scalar/singleton writes at various stages, representative of
  scalar false dependences
- the number of parallel loops
These will be useful to find regressions caused by moving Polly further
down LLVM's pass pipeline.
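Such counters are typically declared with LLVM's STATISTIC macro; a minimal sketch (the DEBUG_TYPE and counter names below are illustrative, not the exact ones added):
```
#define DEBUG_TYPE "polly-stats"

#include "llvm/ADT/Statistic.h"

// Counters are printed when the compiler is run with -stats.
STATISTIC(NumLoopsOverall, "Number of loops in SCoPs overall");
STATISTIC(NumParallelLoops, "Number of loops generated as parallel loops");
STATISTIC(NumScalarWrites, "Number of scalar/singleton writes after optimization");

// Incrementing a counter at the relevant point in the pass.
static void noteParallelLoop() { ++NumParallelLoops; }
```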
Differential Revision: https://reviews.llvm.org/D37049
llvm-svn: 311553
Loops with zero iterations are, syntactically, still loops. They were
excluded from the loop counters, even from the counters that do not check
profitability. This seems to be unintentional, since even the sentinel value
of '0' minimal iterations excluded such loops.
Fix this by never considering the iteration count when the sentinel value
of 0 is given.
This makes the recently added NumTotalLoops counter redundant with
NumLoopsOverall, which is now equivalent. Hence, NumTotalLoops is removed
as well.
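A minimal sketch of the intended counting logic (names are illustrative; this is not the actual Polly code):
```
// With the sentinel value 0, the iteration count is never consulted, so
// zero-iteration loops are counted like any other loop.
static bool countsTowardsLoopStatistics(unsigned TripCount,
                                        unsigned MinIterations) {
  if (MinIterations == 0) // sentinel: no minimal iteration count requested
    return true;
  return TripCount >= MinIterations;
}
```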
Note: The test case 'ScopDetect/statistics.ll' effectively does not
check profitability, because -polly-process-unprofitable is passed
to all test cases.
llvm-svn: 311551
MSVC warns about comparisons between signed and unsigned integers. The rules
of C(++) require that an unsigned comparison be carried out in this case,
which is unlikely to be what was intended.
Fix this by assigning the loop's upper bound to a signed integer first.
This also avoids repeated evaluation of the invariant upper bound.
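An illustrative example of the pattern (not the actual Polly loop): comparing a signed counter against the unsigned result of size() triggers the warning and forces an unsigned comparison; hoisting the bound into a signed variable once avoids both the warning and the repeated evaluation.
```
#include <vector>

void iterate(const std::vector<int> &Items) {
  // Evaluate the invariant bound once, as a signed value.
  long NumItems = static_cast<long>(Items.size());
  for (long i = 0; i < NumItems; ++i) {
    // ... use Items[i] ...
  }
}
```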
llvm-svn: 311548
Summary:
ScopDetection used to check whether a loop within a region was infinite and emitted a diagnostic in such cases. After r310940 there is no point in checking for that situation, as infinite loops no longer appear in regions.
The test failure was observed on these two Polly buildbots:
http://lab.llvm.org:8011/builders/polly-arm-linux/builds/8368
http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/10310
This patch XFAILs `ReportLoopHasNoExit.ll` and turns infinite loop detection into an assert.
Reviewers: grosser, sanjoy, bollu
Reviewed By: grosser
Subscribers: efriedma, aemerson, kristof.beyls, dberlin, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36776
llvm-svn: 311503
Summary:
There is no need to emit alias metadata for scalars, as basicaa will easily
distinguish them from arrays. This reduces the size of the metadata we generate.
This is especially useful after the move to -polly-position=before-vectorizer,
which introduces a lot more scalar dependences. These increased the size of the
alias-analysis metadata and made us commonly hit the limits (introduced to
prevent quadratic growth of this metadata) beyond which we emit no alias
metadata at all.
This improves 2mm performance from 1.5 seconds to 0.17 seconds.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: Meinersbur
Subscribers: pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37028
llvm-svn: 311498
Currently, in the case of GEMM and the pattern-matching-based optimizations, we
use only the SLP Vectorizer out of the two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop by inserting mark nodes and emitting the
corresponding metadata.
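A rough sketch of the LLVM side of such an annotation (this shows only how loop metadata of this kind is typically attached, not Polly's isl mark-node machinery):
```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Type.h"

using namespace llvm;

// Attach a loop ID with "llvm.loop.vectorize.enable" = false to the latch
// terminator of the innermost loop.
static void disableLoopVectorizer(BasicBlock *Latch) {
  LLVMContext &Ctx = Latch->getContext();

  Metadata *DisableOps[] = {
      MDString::get(Ctx, "llvm.loop.vectorize.enable"),
      ConstantAsMetadata::get(ConstantInt::get(Type::getInt1Ty(Ctx), 0))};
  MDNode *Disable = MDNode::get(Ctx, DisableOps);

  // The loop ID is a distinct node whose first operand refers to itself.
  Metadata *LoopIDOps[] = {nullptr, Disable};
  MDNode *LoopID = MDNode::getDistinct(Ctx, LoopIDOps);
  LoopID->replaceOperandWith(0, LoopID);

  Latch->getTerminator()->setMetadata(LLVMContext::MD_loop, LoopID);
}
```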
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36928
llvm-svn: 311473
The implementation of computeArrayUnused did not consider writes that are not
preceded by a read, except for the first write in the SCoP. This caused it to
'forget' writes directly following another write.
This patch re-adds the entire reaching definition of a write that has not
already been covered by a read.
This fixes Polybench 4.2 2mm, where only one of the matrix multiplications
was detected.
llvm-svn: 311403
Dragonegg generates most function parameters as pointers to the actual
parameters. However, it does not mark these parameters with the
dereferenceable attribute.
Polly is conservative when it comes to invariant load hoisting: we add
run-time checks to invariant-load-hoisted pointers when we do not know that
the pointers are dereferenceable. This is correct behaviour, but it carries
a performance penalty.
Add a flag that allows treating all pointer parameters as dereferenceable.
That way, Polly can speculatively hoist loads from function parameters
without run-time checks.
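A minimal sketch of what such a command-line flag typically looks like in Polly (the flag and variable names here are illustrative, not necessarily the ones introduced by this patch):
```
#include "llvm/Support/CommandLine.h"

static llvm::cl::opt<bool> AssumeParamsDereferenceable(
    "polly-assume-parameters-dereferenceable",
    llvm::cl::desc("Treat all pointer function parameters as dereferenceable "
                   "and hoist invariant loads from them without run-time checks"),
    llvm::cl::init(false));
```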
Differential Revision: https://reviews.llvm.org/D36461
llvm-svn: 311329
This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.
Differential Revision: https://reviews.llvm.org/D36934
llvm-svn: 311328
The pattern recognition for MatMul was restrictive: the isl_map containing
the constraint information was previously required to consist of a single
"disjunct" (as produced by isl_*_coalesce, which should ideally yield a map
with a single disjunct, but does not under some circumstances).
This requirement has been relaxed to make the pattern match more flexible.
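A sketch of the kind of check that was relaxed (illustrative, not the exact Polly code):
```
#include <isl/map.h>

// Previously the pattern match required the coalesced map to consist of
// exactly one disjunct, i.e. one basic map.
static bool hasSingleDisjunct(__isl_keep isl_map *Map) {
  return isl_map_n_basic_map(Map) == 1;
}
```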
Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D36460
llvm-svn: 311302
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.
This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.
llvm-svn: 311268
Instead of using Twines and temporary expressions, we do the string
manipulation through a std::string. This resolves a memory corruption issue,
which was likely caused by Twines losing their underlying string too soon.
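An illustrative example of the pitfall (not the exact Polly code): a Twine only references its operands, so keeping one that refers to temporaries can leave dangling references; materializing the result into a std::string right away avoids the problem.
```
#include "llvm/ADT/Twine.h"

#include <string>

using namespace llvm;

std::string buildExpandedName(const std::string &BaseName) {
  // Do not keep the Twine itself around; convert it to a std::string while
  // all of its operands are still alive.
  return (Twine(BaseName) + "_expanded").str();
}
```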
llvm-svn: 311264
- We should iterate over `I`, which is `Cur` expanded out to an
instruction, and not `Cur` itself.
- This is a bugfix.
Differential Revision: https://reviews.llvm.org/D36923
llvm-svn: 311261
Summary:
This information is necessary for PPCG to perform correct live-range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.
We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36929
llvm-svn: 311259
Kernel argument sizes are now only appended to the kernel launch parameter list if the OpenCL runtime is selected, not if the CUDA runtime is chosen.
Differential revision: D36925
llvm-svn: 311248
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.
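A minimal sketch of the kind of alignment step described above, using the isl C API (the wrapper and variable names are illustrative; only isl_space_align_params is the real entry point):
```
#include <isl/space.h>

// Align the kernel's parameter space with the parameters collected from the
// memory accesses, so all of them are available when code-generating the
// kernel.
static __isl_give isl_space *
alignKernelParams(__isl_take isl_space *KernelSpace,
                  __isl_take isl_space *CollectedParams) {
  return isl_space_align_params(KernelSpace, CollectedParams);
}
```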
llvm-svn: 311239
Summary:
When trying to expand memory accesses, the current version of Polly uses statement-level dependences. The current implementation does not work when a statement has multiple dependences. For example, in the following source code:
```
void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
  int i, j;
  for (j = 0; j < Ni; j++) {
    for (int i = 0; i < Nj; i++)
S:    B[i] = i;
    for (int i = 0; i < Nj; i++)
T:    D[i] = i;
U:  A[j] = B[j];
    C[j] = D[j];
  }
}
```
Statement U has dependences on both S and T. The current version of Polly fails during expansion in this case.
This patch aims to fix that bug. For that, we use reference-level dependences so that dependences can be filtered by statement and memory reference. The principle of expansion remains the same as before.
We also noticed that we need to bail out if a load comes after a store (at the same position) in the same statement, so a check was added to isExpandable.
Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur, simbuerg
Subscribers: pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D36791
llvm-svn: 311165
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. The computed dependences in particular tend to accumulate a lot of
parameters that are present in the input memory accesses, but often unnecessary
to express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly improves code
generation performance.
This reduces compile time from 17 to 11 seconds for our test case. While this
is not impressive by itself, this patch helped me to identify the previous two
performance improvements and additionally increases the readability of the isl
data structures we use.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36869
llvm-svn: 311161
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36868
llvm-svn: 311157
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.
Even though the above would be motivation enough, this is also a step towards
having PPCG not emit to-device and from-device copy calls at all, which would
be incorrect if we still relied on these calls to place our synchronization
statements.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36867
llvm-svn: 311155
This avoids the construction of very large sets and in many cases also keeps
the number of parameters low. As a result, we see a compile-time reduction from
5 minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.
This requires `-polly-detect-full-function` to be enabled, so the pass
asserts that this flag is set.
Differential Revision: https://reviews.llvm.org/D36832
llvm-svn: 311126
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:
```
call void bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```
- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.
Differential Revision: https://reviews.llvm.org/D36825
llvm-svn: 311121
- If we have global arrays, we would like to rewrite them to global
pointers which are allocated using `cudaMallocManaged`.
- If we have allocas in a function, we would like to rewrite them to
heap-allocations with `cudaMallocManaged` and `cudaFree`.
- With these rewrite mechanisms, we can offload _any_ function to the
GPU with no code rewrite whatsoever (see the sketch below).
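A conceptual before/after of the alloca rewrite, written by hand for illustration (this is not the generated IR):
```
#include <cuda_runtime.h>

// Before the rewrite the function would use a stack array, e.g.
//   double Buffer[1024];
// After the rewrite the storage comes from managed memory, which both the
// host and GPU kernels can access.
void compute() {
  double *Buffer = nullptr;
  cudaMallocManaged(&Buffer, 1024 * sizeof(double));

  // ... work on Buffer, parts of which may be offloaded to the GPU ...

  cudaFree(Buffer);
}
```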
Differential Revision: https://reviews.llvm.org/D36516
llvm-svn: 311080