In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.
llvm-svn: 312410
In Polly, we specifically add a paramter to represent the outermost dimension
size of fortran arrays. We do this because this information is statically
available from the fortran metadata generated by dragonegg.
However, we were only materializing these parameters (meaning, creating an
llvm::Value to back the isl_id) from *memory accesses*. This is wrong,
we should materialize parameters from *scop array info*.
It is wrong because if there is a case where we detect 2 fortran arrays,
but only one of them is accessed, we may not materialize the other array's
dimensions at all.
This is incorrect. We fix this by looping over all
`polly::ScopArrayInfo` in a scop, rather that just all `polly::MemoryAccess`.
Differential Revision: https://reviews.llvm.org/D37379
llvm-svn: 312350
Mark scalar dependences for different statements belonging to same BB
as 'Inter'.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D37147
llvm-svn: 312324
Currently, GVN can be necessary to eliminate redundant instructions in case
of, for instance, GEMM and float type. This patch makes GVN be run during
the cleanup.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D37340
llvm-svn: 312307
Summary:
After region statements now also have instruction lists, this is a
straightforward extension.
Reviewers: Meinersbur, bollu, singam-sanjay, gareevroman
Reviewed By: Meinersbur
Subscribers: hfinkel, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37298
llvm-svn: 312249
This is useful when we face certain intrinsics such as `llvm.exp.*`
which cannot be lowered by the NVPTX backend while other intrinsics can.
So, we would need to keep blacklists of intrinsics that cannot be
handled by the NVPTX backend. It is much simpler to try and promote
all intrinsics to libdevice versions.
This patch makes function/intrinsic very uniform, and will always try to use
a libdevice version if it exists.
Differential Revision: https://reviews.llvm.org/D37056
llvm-svn: 312239
The adds code generation support for the previous commit.
This patch has been re-applied, after the memory issue in the previous patch
has been fixed.
llvm-svn: 312211
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.
We currently only model the entry block of a region statements, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.
This change set is reapplied, after a memory corruption issue had been fixed.
llvm-svn: 312210
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.
We currently only model the entry block of a region statements, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.
llvm-svn: 312128
Reduction detection is only executed in the SCoP building phase.
Hence it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312118
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312117
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312116
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
This mostly mechanical change makes ScopBuilder directly access some of
ScopStmt/MemoryAccess private fields. We add ScopBuilder as a friend
class and will add proper accessor functions sometime later.
llvm-svn: 312115
This patch allows annotating of metadata in ir instruction
(with "polly_split_after"), which specifies where to split a particular
scop statement.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D36402
llvm-svn: 312107
The intrinsics memset, memcopy and memmove do have their memory accesses
modeled by ScopBuilder. Do not consider them error-case behavior.
Test case will come with a future patch that requires memory intrinsics
outside of error blocks.
llvm-svn: 312021
Commit r252725 introduced a "return false" if an ignored intrinsics was
found. The consequence of this was that the mere existence of an ignored
intrinsic (such as llvm.dbg.value) before a call that would have
qualified the block to be an error block, to not be an error block.
The obvious goal was to just skip ignored intrinsics, not changing the
meaning of what an error block is.
llvm-svn: 312020
ZoneAlgo used to bail out for the complete SCoP if it encountered
something violating its assumption. This meant the neither OpTree can
forward any load nor DeLICM do anything in such cases, even if their
transformations are unrelated to the violations.
This patch adds a list of compatible elements (currently with the
granularity of entire arrays) that can be used for analysis. OpTree
and DeLICM can then check whether their transformations only concern
compatible elements, and skip non-compatible ones.
This will be useful for e.g. Polybench's benchmarks covariance,
correlation, bicg, doitgen, durbin, gramschmidt, adi that have
assumption violation, but which are not necessarily relevant
for all transformations.
Differential Revision: https://reviews.llvm.org/D37219
llvm-svn: 311929
Properly require and preserve the OptimizationRemarkEmitter for use in
ScopPass. Previously one had to get the ORE from ScopDetection because
CodeGeneration did not mark it as preserved. It would need to be
recomputed which results in the legacy PM to throw away all previous
SCoP analysis.
This also changes the implementation of ScopPass::getAnalysisUsage to
not unconditionally preserve all passes, but only those needed to be
preserved by any SCoP pass (at least when using the legacy PM). This
allows invalidating DependenceInfo (and IslAstInfo) in case the pass
would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion)
JSONImporter should also invalidate the DependenceInfo. In this patch
it marks DependenceInfo as preserved anyway because some regression
tests depend on it.
Differential Revision: https://reviews.llvm.org/D37010
llvm-svn: 311888
In cases where the entry block of a scop was not contained in a loop that was
part of the scop region and at the same time there was a loop surrounding the
scop, we missed to count the loops in the scop and consequently did not consider
the scop profitable. We correct this by only moving to the loop parent, in case
the current loop is loop contained in the scop.
This increases the number of loops in COSMO which we assume to be profitable
from 3974 to 4981.
llvm-svn: 311863
Whether a partial write is tautological/unsatisfiable not only
depends on the access domain, but also on the domain covered
by its node in the AST.
In the example below, there are two instances of Stmt_cond_false. It may have a partial write access that is not executed in instance Stmt_cond_false(0).
for (int c0 = 0; c0 < tmp5; c0 += 1) {
Stmt_for_body344(c0);
if (tmp5 >= c0 + 2)
Stmt_cond_false(c0);
Stmt_cond_end(c0);
}
if (tmp5 <= 0) {
Stmt_for_body344(0);
Stmt_cond_false(0);
Stmt_cond_end(0);
}
Isl cannot derive a subscript for an array element that is never accessed.
This caused an error in that no subscript expression has been generated
in IslNodeBuilder::createNewAccesses, but BlockGenerator expected one
to exist because there is an execution of that write, just not in that
ast node.
Fixed by instead of determining whether the access domain is empty,
inspect whether isl generated a constant "false" ast expression in
the current ast node.
This should fix a compiler crash of the aosp buildbot.
llvm-svn: 311663
This is a stylistic change to make the function a little more readable.
Also add a debug print to show what instruction contains a use of a
function we don't understand in the kernel.
Differential Revision: https://reviews.llvm.org/D37058
llvm-svn: 311648
Summary:
This patch comes directly after https://reviews.llvm.org/D34982 which allows fully indexed expansion of MemoryKind::Array. This patch allows expansion for MemoryKind::Value and MemoryKind::PHI.
MemoryKind::Value seems to be working with no majors modifications of D34982. A test case has been added. Unfortunatly, no "run time" checks can be done for now because as @Meinersbur explains in a comment on D34982, DependenceInfo need to be cleared and reset to take expansion into account in the remaining part of the Polly pipeline. There is no way to do that in Polly for now.
MemoryKind::PHI is not working. Test case is in place, but not working. To expand MemoryKind::Array, we expand first the write and then after the reads. For MemoryKind::PHI, the idea of the current implementation is to exchange the "roles" of the read and write and expand first the read according to its domain and after the writes.
But with this strategy, I still encounter the problem of union_map in new access map.
For example with the following source code (source code of the test case) :
```
void mse(double A[Ni], double B[Nj]) {
int i,j;
double tmp = 6;
for (i = 0; i < Ni; i++) {
for (int j = 0; j<Nj; j++) {
tmp = tmp + 2;
}
B[i] = tmp;
}
}
```
Polly gives us the following statements and memory accesses :
```
Statements {
Stmt_for_body
Domain :=
{ Stmt_for_body[i0] : 0 <= i0 <= 9999 };
Schedule :=
{ Stmt_for_body[i0] -> [i0, 0, 0] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
Instructions {
%tmp.04 = phi double [ 6.000000e+00, %entry.split ], [ %add.lcssa, %for.end ]
}
Stmt_for_inc
Domain :=
{ Stmt_for_inc[i0, i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9999 };
Schedule :=
{ Stmt_for_inc[i0, i1] -> [i0, 1, i1] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
Instructions {
%tmp.11 = phi double [ %tmp.04, %for.body ], [ %add, %for.inc ]
%add = fadd double %tmp.11, 2.000000e+00
%exitcond = icmp ne i32 %inc, 10000
}
Stmt_for_end
Domain :=
{ Stmt_for_end[i0] : 0 <= i0 <= 9999 };
Schedule :=
{ Stmt_for_end[i0] -> [i0, 2, 0] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_end[i0] -> MemRef_B[i0] };
Instructions {
%add.lcssa = phi double [ %add, %for.inc ]
store double %add.lcssa, double* %arrayidx, align 8
%exitcond5 = icmp ne i64 %indvars.iv.next, 10000
}
}
```
and the following dependences :
```
{ Stmt_for_inc[i0, 9999] -> Stmt_for_end[i0] : 0 <= i0 <= 9999;
Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998;
Stmt_for_body[i0] -> Stmt_for_inc[i0, 0] : 0 <= i0 <= 9999;
Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : 0 <= i0 <= 9998 }
```
When trying to expand this memory access :
```
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
```
The new access map would look like this :
```
{ Stmt_for_inc[i0, 9999] -> MemRef_tmp_11__phi_exp[i0] : 0 <= i0 <= 9999; Stmt_for_inc[i0, i1] ->MemRef_tmp_11__phi_exp[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998 }
```
The idea to implement the expansion for PHI access is an idea from @Meinersbur and I don't understand why my implementation does not work. I should have miss something in the understanding of the idea.
Contributed by: Nicolas Bonfante <nicolas.bonfante@gmail.com>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev, Meinersbur
Differential Revision: https://reviews.llvm.org/D36647
llvm-svn: 311619
Add statistics about
- Which optimizations are applied
- Number of loops in Scops at various stages
- Number of scalar/singleton writes at various stages representative
for scalar false dependencies
- Number of parallel loops
These will be useful to find regressions due to moving Polly further
down of LLVM's pass pipeline.
Differential Revision: https://reviews.llvm.org/D37049
llvm-svn: 311553
Loop with zero iteration are, syntactically, loops. They have been
excluded from the loop counter even for the non-profitable counters.
This seems to be unintentially as the sentinel value of '0' minimal
iterations does exclude such loops.
Fix by never considering the iteration count when the sentinel
value of 0 is found.
This makes the recently added NumTotalLoops couter redundant
with NumLoopsOverall, which now is equivalent. Hence, NumTotalLoops
is removed as well.
Note: The test case 'ScopDetect/statistics.ll' effectively does not
check profitability, because -polly-process-unprofitable is passed
to all test cases.
llvm-svn: 311551
MSVC warns about comparison between a signed and unsigned integer.
The rules of C(++) define that an unsigned comparison has to be
carried-out in this case. This is unlikely to be intended.
Fix by assigning the loop's upper bound to a signed integer first.
This also avoids repeated evaluation of the invariant upper bound.
llvm-svn: 311548
Summary:
ScopDetection used to check if a loop withing a region was infinite and emitted a diagnostic in such cases. After r310940 there's no point checking against that situation, as infinite loops don't appear in regions anymore.
The test failure was observed on these two polly buildbots:
http://lab.llvm.org:8011/builders/polly-arm-linux/builds/8368http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/10310
This patch XFAILs `ReportLoopHasNoExit.ll` and turns infinite loop detection into an assert.
Reviewers: grosser, sanjoy, bollu
Reviewed By: grosser
Subscribers: efriedma, aemerson, kristof.beyls, dberlin, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36776
llvm-svn: 311503
Summary:
There is no need to emit alias metadata for scalars, as basicaa will easily
distinguish them from arrays. This reduces the size of the metadata we generate.
This is especially useful after we moved to -polly-position=before-vectorizer,
where a lot more scalar dependences are introduced, which increased the size of
the alias analysis metadata and made us commonly reach the limits after which
we do not emit alias metadata that have been introduced to prevent quadratic
growth of this alias metadata.
This improves 2mm performance from 1.5 seconds to 0.17 seconds.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: Meinersbur
Subscribers: pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37028
llvm-svn: 311498
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36928
llvm-svn: 311473
This was originally a `#define`. It is much easier to play around with
this as an environment variable when we run on large programs.
Differential Revision: https://reviews.llvm.org/D37012
llvm-svn: 311471
The implementation of computeArrayUnused did not consider writes without
reads before, except for the first write in the SCoP. This caused it to
'forget' writes directly following another write.
This patch re-adds the entire reaching defintion of a write that has not
been covered before by a read.
This fixes Polybench 4.2 2mm where only one of the matrix-multiplication
was detected.
llvm-svn: 311403
Dragonegg generates most function parameters as pointers to the actual
parameters. However, it does not mark these parameters with the
dereferencable attribute.
Polly is conservative when it comes to invariant load
hoisting, thus we add runtime checks to invariant load hoisted pointers
when we do not know that pointers are dereferencable. This is correct behaviour,
but is a performance penalty.
Add a flag that allows all pointer parameters to be dereferencable. That
way, polly can speculatively load-hoist paramters to functions without
runtime checks.
Differential Revision: https://reviews.llvm.org/D36461
llvm-svn: 311329
This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.
Differential Revision: https://reviews.llvm.org/D36934
llvm-svn: 311328
The pattern recognition for MatMul is restrictive.
The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).
This was changed and made more flexible.
Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D36460
llvm-svn: 311302
We now load the function pointer for `cuMemAllocManaged` dynamically, so
it should be possible to compile `GPUJIT` on non-CUDA systems again.
It should now be possible to link on non-cuda systems again.
Thanks to Philipp Schaad for noticing this inconsitency.
Differential Revision: https://reviews.llvm.org/D36921
llvm-svn: 311289
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.
This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.
llvm-svn: 311268
Instead of using Twines and temporary expressions, we do string manipulation
through a std::string. This resolves a memory corruption issue, which likely
was caused by twines loosing their underlying string too soon.
llvm-svn: 311264
- We should iterate over `I`, which is `Cur` expanded out to an
instruction, and not `Cur` itself.
- This is a bugfix.
Differential Revision: https://reviews.llvm.org/D36923
llvm-svn: 311261
Summary:
This information is necessary for PPCG to perform correct life range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.
We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36929
llvm-svn: 311259
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.
Differential revision: D36925
llvm-svn: 311248
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.
llvm-svn: 311239
Summary:
When trying to expand memory accesses, the current version of Polly uses statement Level dependences. The actual implementation is not working in case of multiple dependences per statement. For example in the following source code :
```
void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
int i,j;
for (j = 0; j < Ni; j++) {
for (int i = 0; i<Nj; i++)
S: B[i] = i;
for (int i = 0; i<Nj; i++)
T: D[i] = i;
U: A[j] = B[j];
C[j] = D[j];
}
}
```
The statement U has two dependences with S and T. The current version of polly fails during expansion.
This patch aims to fix this bug. For that, we use Reference Level dependences to be able to filter dependences according to statement and memory ref. The principle of expansion remains the same as before.
We also noticed that we need to bail out if load come after store (at the same position) in same statement. So a check was added to isExpandable.
Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur, simbuerg
Subscribers: pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D36791
llvm-svn: 311165
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.
This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36869
llvm-svn: 311161
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36868
llvm-svn: 311157
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.
Even though the above motivation would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36867
llvm-svn: 311155
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.
This requires `-polly-detect-full-function` to be enabled. So, the pass
asserts that `-polly-detect-full-function` is enabled.
Differential Revision: https://reviews.llvm.org/D36832
llvm-svn: 311126
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:
```
call void \
bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```
- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.
Differential Revision: https://reviews.llvm.org/D36825
llvm-svn: 311121
In release builds LLVM may not pass along LLVM names consistently. We make the
test cases independent of the LLVM-IR names to avoid spurious test case
failures.
llvm-svn: 311118
- If we have global arrays, we would like to rewrite them to global
pointers which are allocated using `cudaMallocManaged`.
- If we have allocas in a function, we would like to rewrite them to
heap-allocations with `cudaMallocManaged` and `cudaFree`.
- With these rewrite mechanisms, we can offload _any_ function to the
GPU with no code rewrite whatsover.
Differential Revision: https://reviews.llvm.org/D36516
llvm-svn: 311080
Summary:
This pass detangles induction variables from functions, which take variables by
reference. Most fortran functions compiled with gfortran pass variables by
reference. Unfortunately a common pattern, printf calls of induction variables,
prevent in this situation the promotion of the induction variable to a register,
which again inhibits any kind of loop analysis. To work around this issue
we developed a specialized pass which introduces separate alloca slots for
known-read-only references, which indicate the mem2reg pass that the induction
variables can be promoted to registers and consquently enable SCEV to work.
We currently hardcode the information that a function
_gfortran_transfer_integer_write does not read its second parameter, as
dragonegg does not add the right annotations and we cannot change old dragonegg
releases. Hopefully flang will produce the right annotations.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: mgorny, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36800
llvm-svn: 311066
ReportLoopHasNoExit started failing after r310940 that added
infinite loops to postdominators. The change made regions not
contain infinite loops anymore.
This patch unbreaks the polly tree by XFAILING the
ReportLoopHasNoExit test. Full fix is under review in D36776.
llvm-svn: 310980
Requesting size 0 allocations from `cuMalloc` / `cuMallocManaged` fails.
If there is a size 0 allocation that can be statically proved, the we
fail at PPCGCodeGeneration. This is because if size 0 allocation could
take place, we should not generate code that tries to use this array.
However, there are cases where we cannot statically prove this, and at
runtime we get a request for 0 bytes of memory. We choose to allocate
size 1 to allow the program to continue running.
Differential Revision: https://reviews.llvm.org/D36751
llvm-svn: 310941
Summary:
Before, if we fail to parse a jscop file, this will be reported as an
error and importing is aborted. However, this isn't actually strong
enough, since although the import is aborted, the scop has already been
modified and is very likely broken. Instead, make this a hard failure
and throw an LLVM error. This new behaviour requires small changes to
the tests for the legacy pass, namely using `not` to verify the error.
Further, fixed the jscop file for the
base_pointer_load_is_inst_inside_invariant_1 testcase.
Reviewed By: Meinersbur
Split out of D36578.
llvm-svn: 310599
Summary:
I pulled out all functionality into static functions, and use those both
in the legacy passes and in the new ones.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D36578
llvm-svn: 310597
Summary:
During code generation for a Scop we modify the IR of a function.
While this shouldn't affect a Scop in the formal sense, the implementation
caches various information about the IR such as SCEV expressions for bounds or
parameters. This cached information needs to be updated or invalidated. To this
end, SPMUpdater allows passes to report when they've invalidated a Scop to the
PassManager, which will then flush and recompute all Scops. This in turn
invalidates all iterators, so references to Scops shouldn't be held.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D36524
llvm-svn: 310551
We are working towards removing uses of Scop::getStmtFor(BB). In this
patch, we remove dependency of Scop::getStmtFor(Inst) on getStmtFor(BB).
To do so, we introduce a map of instructions to their corresponding scop
statements and use it to get the instructions' statement.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35663
llvm-svn: 310494
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.
Currently, rewrite malloc and free to the `polly_{malloc,free}Managed` variants.
A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.
Differential Revision: https://reviews.llvm.org/D36513
llvm-svn: 310471
Codegen with -polly-parallel queried the unmapped MemoryAccess, but only
the MemoryKind after mapping is relevant for codegen.
This should fix various fails of the
perf-x86_64-penryn-O3-polly-parallel-fast buildbot.
llvm-svn: 310466
The previous value of "polly-delicm" was forgotten to to be changed when
ForwardOpTree was split from DeLICM.
Thanks to Tobias for noticing!
llvm-svn: 310465
distributeDomain() and filterKnownValInst() are used in a scop
of ForwardOpTree that limits the number of isl operations.
Therefore some isl functions may return null after any operation.
Remove assertion that assume non-null results and handle
isl_*_foreach returning isl::stat::error.
I hope this fixes the crash of the asop buildbot at ihevc_recon.c.
llvm-svn: 310461
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.
To fix this, ask data layout what the size in bytes should be.
Differential Revision: https://reviews.llvm.org/D36459
llvm-svn: 310448
We introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. To distinguish two
accesses, the comparison of raw pointers representing base pointers is used.
In case of, for example, ublas's prod function that implements GEMM, and
DeLiCM we can get accesses to same location represented by different raw
pointers. Consequently, we create different alias sets that can prevent
accesses from, for example, being sinked or hoisted.
To avoid the issue, we compare the corresponding SCEV information instead
of the corresponding raw pointers.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D35761
llvm-svn: 310380
Currently, only convex isolation sets can be efficiently processed by isl.
Consequently, as a temporary solution, we use a different algorithm for partial
tile isolation that helps to build convex isolation sets in some cases.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36278
llvm-svn: 310374
This allows us to get rid of stores that are overwritten within the very same
basic block, without ever being read beforehand. This simplification is
necessary for delicm to run on pb4's correlation.
llvm-svn: 310369
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
Differential Revision: https://reviews.llvm.org/D36457
llvm-svn: 310350
It is possible that partial writes are empty (write is never executed).
In this case, when in PHINode's incoming edge is never taken such that
the incoming write becomes an empty partial write, if enabled. The
issue is that when converting the union_map to an map, it's space
cannot be derived from the union_map itself. Rather, we need to
determine its space independently.
This fixes test-suite's MultiSource/Benchmarks/ASC_Sequoia/CrystalMk.
llvm-svn: 310348
- It's useful to know the amount of memory asked for since, for example,
asking for `0` bytes of memory is illegal.
- Line number is helpful since we print the same message in the function
at different points.
llvm-svn: 310340
In certain cases delicm might decide to not leave the original array write in
the loop body, but to remove it and instead leave a transformed phi node as
write access. This commit teached the matmul pattern detection to order the
memory accesses according to when the access actually happens and use this
information to detect the new pattern. This makes pattern based matmul
optimization work for 2mm and 3mm in polybench 4 after
polly-position=before-vectorizer has been enabled.
llvm-svn: 310338
Polly has traditionally always been executed at the beginning of the pass
pipeline as LLVM's inliner and DeLICM passes introduced plenty of scalar
dependences which prevented any kind of useful high-level loop optimizations
later in the pass pipeline. With DeLICM now being available, Polly can also
run optimizations when folded into the pass pipeline. This has the benefit
that Polly should now be more effective on C++ code and as an additional bonus,
no additional early canonicalization phase must be run. As a result, Polly
touches the code only if it applies a transformation. Code that does not
benefit from Polly is not touched and consequently will have the very same
execution time as without Polly enabled. Random performance changes, as could
sometimes be observed with polly-position=early are consequently not possible
any more. If performance is changed, this is due to Polly is choosing to
perform a transformation. If this choice is wrong, it can be fixed directly
in Polly.
http://polly.llvm.org/docs/Architecture.html#polly-in-the-llvm-pass-pipeline
llvm-svn: 310319
This allows us to remove more scalar dependences. While this feature is still
rather experimental, we want to give it sufficient test coverage.
llvm-svn: 310314
Two write statements which write into the very same array slot generally are
conflicting. However, in case the value that is written is identical, this
does not cause any problem. Hence, allow such write pairs in this specific
situation.
llvm-svn: 310311
This commit implements the initial version of fully-indexed static
expansion.
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[j] = j;
T: A[i] = B[i]
```
After the pass, we want this :
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[i][j] = j;
T: A[i] = B[i][i]
```
For now we bail (fail) in the following cases:
- Scalar access
- Multiple writes per SAI
- MayWrite Access
- Expansion that leads to an access to the original array
Furthermore: We still miss checks for escaping references to the array
base pointers. A future commit will add the missing escape-checks to
stay correct in those cases. The expansion is still locked behind a
CLI-Option and should not yet be used.
Patch contributed by: Nicholas Bonfante <bonfante.nicolas@gmail.com>
Reviewers: simbuerg, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: mgorny, llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D34982
llvm-svn: 310304
This is an addition to the -polly-optree pass that reuses the array
content analysis from DeLICM to find array elements that contain the
same value as the value loaded when the target statement instance
is executed.
The analysis is now enabled by default.
The known content analysis could also be used to rematerialize any
llvm::Value that was written to some array element, but currently
only loads are forwarded.
Differential Revision: https://reviews.llvm.org/D36380
llvm-svn: 310279
This update is mostly a maintenance update, but also exposes a couple of new
functions that will be needed for the next version of the isl++ bindings.
llvm-svn: 310205
Summary:
Small patch to fix the JSON exporter.
Currently, using "opt -polly-export-jscop" does not generate jscop files, but gives an error:
*** Error in `opt': corrupted double-linked list: 0x0000000000bc4bb0 ***
Updated the function getAccessRelationStr() to work with the current version of getAccessRelation(), fixing the JSON exporter
Reviewers: bollu, grosser
Reviewed By: grosser
Subscribers: grosser, llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36370
llvm-svn: 310199
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D36372
llvm-svn: 310196
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.
The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.
Differential Revision: https://reviews.llvm.org/D36290
llvm-svn: 310193
This logic is duplicated, so we refactor it into a separate function.
This will be used in a later patch to teach PPCGCodeGen code generation
for loops that are outside the scop.
Differential Revision: https://reviews.llvm.org/D36310
llvm-svn: 310192
In https://reviews.llvm.org/D36278 it was pointed out that the behavior of
getPartialTilePrefixes is not very well understood. To allow for a better
understanding, we first provide some basic unittests.
llvm-svn: 310175
The method forwardSpeculatable forwards speculatively executable
instructions and is currently the only way to forward an
instruction.
In the future we intend to add more methods.
llvm-svn: 310056
Summary:
Testing the new-pm passes becomes much easier once they behave more like the
old passes in terms of the order in which Scops are processed and printed. This
requires three changes:
- ScopInfo: Use an ordered map to store scops
- ScopInfo: Iterate and print Scops in reverse order to match legacy PM behaviour
- ScopDetection: print function name in ScopAnalysisPrinter
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D36303
llvm-svn: 310052
The complication of bspatch.cc of the AOSP buildbot currently fails
presumably because the occurance of a MetadataAsValue in an operand.
This kind of value can occur as operands of intrinsics, the typical
example being the debug intrinsics.
Polly currently ignores the debug intrinsics and it is not yet clear
which other intrinic might occur. For such cases, and to unbreak the
AOSP buildbot, treat a MetadataAsValue as a constant because it can be
referenced without modification in generated code.
llvm-svn: 309992
With this patch, we get rid of the last use of getStmtFor(BB). Here
this is done by getting the last statement of the incoming block in
case the user is a phi node; otherwise just fetching the statement
comprising the instruction for which the virtual use is being created.
Differential Revision: https://reviews.llvm.org/D36268
llvm-svn: 309947
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.
To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.
Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36243
llvm-svn: 309939
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.
The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.
We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.
llvm-svn: 309934
These passes have been tested over the last month and should generally help
to remove scalar data dependences in Polly. We enable them to give them even
wider test coverage. Large performance regressions and any kind of correctness
regressions are not expected.
llvm-svn: 309878
When compiling with clang, explicit instantiation of the
OwningScopAnalysisManagerFunctionProxy needs to happen within the polly
namespace. Same goes with the specialization of its run method.
llvm-svn: 309835
Summary:
This patch is a first attempt at registering Polly passes with the LLVM tools. Tool plugins are still unsupported, but this registration is usable from the tools if Polly is linked into them (albeit requiring minimal patches to those tools). Registration requires a small amount of machinery (the owning analysis proxies), necessary for injecting ScopAnalysisManager objects into the calling tools.
This patch is marked WIP because the registration is incomplete. Parsing manual pipelines is fully supported, but default pass injection into the O3 pipeline is lacking, mostly because there is opportunity for some redesign here, I believe. The first point of order would be insertion points. I think it makes sense to run before the vectorizers. Running Polly Early, however, is weird. Mostly because it actually is the default (which to me is unexpected), and because Polly runs it's own O1 pipeline. Why not instead insert it at an appropriate place somewhere after simplification happend? Running after the loop optimizers seems intuitive, but it also seems wasteful, since multiple consecutive loops might well be a single scop, and we don't need to run for all of them.
My second request for comments would be regarding all those smallish helper passes we have, like PollyViewer, PollyPrinter, PollyImportJScop. Right now these are controlled by command line options, deciding whether they should be part of the Polly pipeline. What is your opinion on treating them like real passes, and have the user write an appropriate pipeline if they want to use any of them?
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35458
llvm-svn: 309826
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**
When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.
The only debug metadata of the instruction, the DebugLoc, is unset by this patch.
This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU.
Reviewers: grosser, bollu
Reviewed By: grosser
Subscribers: pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36161
llvm-svn: 309822
Summary: I made a mistake in handling transitive invalidation of analysis results. I've updated the list of preserved analyses as well as the correct result dependences.
The Invalidator passed through the invalidate() path can be used to
transitively invalidate analyses. It frequently happens that analysis
results depend on other analyses, and thus store references to their
results. When the dependee now gets invalidated, the depender needs to
be invalidated as well. This is the purpose of the Invalidator object,
which can be used to check whether some dependee analysis is in the
process of being invalidated. I originally was checking the wrong
dependee analyses, which is an actual error, you can only check analysis
results that are in the cache (which they are if you've captured their
reference). The invalidation I'm handling inside the proxy deals with
the standard analyses the proxy passes into the Scop pipeline, since I'm
capturing their reference.
This checking allows us to actually preserve a couple of results outside
of the proxy, since the Scop pipeline shouldn't break those, or
otherwise should update them accordingly.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D36216
llvm-svn: 309811
We introduce `polly_mallocManaged` and `polly_freeManaged` as
proxies for `cudaMallocManaged` / `cudaFree`. This is currently not
used by Polly. It is auxiliary code that is used in `COSMO`.
This is useful because `polly_mallocManaged` matches the signature of `malloc`,
while `cudaMallocManaged` does not. We introduce `polly_freeManaged` for
symmetry.
We use this in COSMO to use the unified memory feature of the newer
CUDA APIs (>= 6).
Differential Revision: https://reviews.llvm.org/D35991
llvm-svn: 309808
On mixing the driver and runtime APIs, it is quite possible that a
context already exists due to runtime API usage. In this case, Polly should
try to use the same context.
This patch teaches GPUJIT to detect that a context exists and how to
pick up this context.
Without this, calling `cudaMallocManaged`, for example, before a
polly-generated kernel launch causes P100 to *hang*.
This is a part of (https://reviews.llvm.org/D35991) that was extracted
out.
Differential Revision: https://reviews.llvm.org/D36162
llvm-svn: 309802