Represent PHIs by their incoming values instead of an opaque value of
themselves. This allows ForwardOpTree to "look through" the PHIs and
forward the incoming values since forwardings PHIs is currently not
supported.
This is particularly useful to cope with PHIs inserted by GVN LoadPRE.
The incoming values all resolve to a load from a single array element
which then can be forwarded.
It should in theory also reduce spurious conflicts in value mapping
(DeLICM), but I have not yet found a profitable case yet, so it is
not included here.
To avoid transitive closure and potentially necessary overapproximations
of those, PHIs that may reference themselves are excluded from
normalization and keep their opaque self-representation.
Differential Revision: https://reviews.llvm.org/D39333
llvm-svn: 317008
Summary:
When GPUNodeBuilder creates loops inside the kernel, it dispatches to
IslNodeBuilder. This however is surprisingly dangerous, since it accesses the
AST Node's user through the wrong type. This patch fixes this problem by
overriding createFor correctly.
This fixes PR35010.
Reviewers: grosser, bollu, Meinersbur
Reviewed By: Meinersbur
Subscribers: Meinersbur, nemanjai, pollydev, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D39364
llvm-svn: 316872
The option splits BasicBlocks into minimal statements such that no
additional scalar dependencies are introduced.
The algorithm is based on a union-find structure, and unites sets if
putting them into separate statements would introduce a scalar
dependencies. As a consequence, instructions may be split into separate
statements such their relative order is different than the statements
they are in. This is accounted for instructions whose relative order
matters (e.g. memory accesses).
The algorithm is generic in that heuristic changes can be made
relatively easily. We might relax the order requirement for read-reads
or accesses to different base pointers. Forwardable instructions can be
made to not cause a join.
This implementation gives us a speed-up of 82% in SPEC 2006 456.hmmer
benchmark by allowing loop-distribution in a hot loop such that one of
the loops can be vectorized.
Differential Revision: https://reviews.llvm.org/D38403
llvm-svn: 314983
The option is introduced with only one possible value
-polly-stmt-granularity=bb which represents the current behaviour, which
is outlined into the new function buildSequentialBlockStmts().
More options will be added in future commits.
llvm-svn: 314900
Decouple handling of exit block PHIs and other MemoryAccesses. Exit PHIs
only need the PHI handling part of buildAccessFunctions but requires
code for skipping them in while creating other MemoryAcesses.
This change will make it easier to modify how statement MemoryAccesses
are created without considering the exit block special case.
llvm-svn: 314662
These functions print a multi-line and sorted representation of unions
of polyhedra. Each polyhedron (basic_{ast/map}) has its own line.
First sort key is the polyhedron's hierachical space structure.
Secondary sort key is the lower bound of the polyhedron, which should
ensure that the polyhedral are printed in approximately ascending order.
Example output of dumpPw():
[p_0, p_1, p_2] -> {
Stmt0[0] -> [0, 0];
Stmt0[i0] -> [i0, 0] : 0 < i0 <= 5 - p_2;
Stmt1[0] -> [0, 2] : p_1 = 1 and p_0 = -1;
Stmt2[0] -> [0, 1] : p_1 >= 3 + p_0;
Stmt3[0] -> [0, 3];
}
In contrast dumpExpanded() prints each point in the sets, unless there
is an unbounded dimension that cannot be expandend.
This is useful for reduced test cases where the loop counts are set to
some constant to understand a bug.
Example output of dumpExpanded(
{ [MemRef_A[i0] -> [i1]] : (exists (e0 = floor((1 + i1)/3): i0 = 1 and
3e0 <= i1 and 3e0 >= -1 + i1 and i1 >= 15 and i1 <= 25)) or (exists (e0
= floor((i1)/3): i0 = 0 and 3e0 < i1 and 3e0 >= -2 + i1 and i1 > 0 and
i1 <= 11)) }):
{
[MemRef_A[0] ->[1]];
[MemRef_A[0] ->[2]];
[MemRef_A[0] ->[4]];
[MemRef_A[0] ->[5]];
[MemRef_A[0] ->[7]];
[MemRef_A[0] ->[8]];
[MemRef_A[0] ->[10]];
[MemRef_A[0] ->[11]];
[MemRef_A[1] ->[15]];
[MemRef_A[1] ->[16]];
[MemRef_A[1] ->[18]];
[MemRef_A[1] ->[19]];
[MemRef_A[1] ->[21]];
[MemRef_A[1] ->[22]];
[MemRef_A[1] ->[24]];
[MemRef_A[1] ->[25]]
}
Differential Revision: https://reviews.llvm.org/D38349
llvm-svn: 314525
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.
This is a recommit of r312663 after fixing
test/Isl/CodeGen/phi_after_error_block_outside_of_scop.ll
llvm-svn: 314075
Such RTCs may introduce integer wrapping intrinsics with more than 64 bit,
which are translated to library calls on AOSP that are not part of the
runtime and will consequently cause linker errors.
Thanks to Eli Friedman for reporting this issue and reducing the test case.
llvm-svn: 314065
Before this patch, ScopInfo::getValueDef(SAI) used
getStmtFor(Instruction*) to find the MemoryAccess that writes a
MemoryKind::Value. In cases where the value is synthesizable within the
statement that defines, the instruction is not added to the statement's
instruction list, which means getStmtFor() won't return anything.
If the synthesiable instruction is not synthesiable in a different
statement (due to being defined in a loop that and ScalarEvolution
cannot derive its escape value), we still need a MemoryKind::Value
and a write to it that makes it available in the other statements.
Introduce a separate map for this purpose.
This fixes MultiSource/Benchmarks/MallocBench/cfrac where
-polly-simplify could not find the writing MemoryAccess for a use. The
write was not marked as required and consequently was removed.
Because this could in principle happen as well for PHI scalars,
add such a map for PHI reads as well.
llvm-svn: 313881
Since -polly-codegen reports itself to preserve DependenceInfo and IslAstInfo,
we might get those analysis that were computed by a different ScopInfo for a
different Scop structure. This would be unfortunate because DependenceInfo and
IslAstInfo hold references to resources allocated by
ScopInfo/ScopBuilder/Scop (e.g. isl_id). If -polly-codegen and
DependenceInfo/IslAstInfo do not agree on which Scop to use, unpredictable
things can happen.
When the ScopInfo/Scop object is freed, there is a high probability that the
new ScopInfo/Scop object will be created at the same heap position with the
same address. Comparing whether the Scop or ScopInfo address is the expected
therefore is unreliable.
Instead, we compare the address of the isl_ctx object. Both, DependenceInfo
and IslAstInfo must hold a reference to the isl_ctx object to ensure it is
not freed before the destruction of those analyses which might happen after
the destruction of the Scop/ScopInfo they refer to. Hence, the isl_ctx
will not be freed and its address not reused as long there is a
DependenceInfo or IslAstInfo around.
This fixes llvm.org/PR34441
llvm-svn: 313842
Computing the reaching definition in forwardTree() can take a long time
if the coefficients are large. When the forwarding is
carried-out (doIt==true), forwardTree() must execute entirely or not at
all to get a consistent output, which means we cannot just allow
out-of-quota errors to happen in the middle of the processing.
We introduce the class IslQuotaScope which allows to opt-in code that is
conformant and has been tested with out-of-quota events. In case of
ForwardOpTree, out-of-quota is allowed during the operand tree
examination, but not during the transformation. The same forwardTree()
recursion is used for examination and execution, meaning that the
reaching definition has already been computed in the examination tree
walk and cached for reuse in the transformation tree walk.
This should fix the time-out of grtestutils.ll of the asop buildbot. If
the compilation still takes too long, we can reduce the max-operations
allows for -polly-optree.
Differential Revision: https://reviews.llvm.org/D37984
llvm-svn: 313690
This reverts commit
r312410 - [ScopDetect/Info] Look through PHIs that follow an error block
The commit caused generation of invalid IR due to accessing a parameter
that does not dominate the SCoP.
llvm-svn: 312663
Up to now ZoneAlgo considered array elements access by something else
than a LoadInst or StoreInst as not analyzable. This patch removes that
restriction by using the unknown ValInst to describe the written
content, repectively the element type's null value in case of memset.
Differential Revision: https://reviews.llvm.org/D37362
llvm-svn: 312630
Since r312249 instructions of a entry block of region statements are
not marked as root anymore and hence can theoretically be removed
if unused. Theoretically, because the instruction list was not changed.
Still, MemoryAccesses for unused instructions were removed. This lead
to a failed assertion in the code generator when the MemoryAccess for
the still listed instruction was not found.
This hould fix the
Assertion failed: ArrayAccess && "No array access found for instruction!",
file ScopInfo.h, line 1494
compiler crashes.
llvm-svn: 312566
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.
llvm-svn: 312410
In Polly, we specifically add a paramter to represent the outermost dimension
size of fortran arrays. We do this because this information is statically
available from the fortran metadata generated by dragonegg.
However, we were only materializing these parameters (meaning, creating an
llvm::Value to back the isl_id) from *memory accesses*. This is wrong,
we should materialize parameters from *scop array info*.
It is wrong because if there is a case where we detect 2 fortran arrays,
but only one of them is accessed, we may not materialize the other array's
dimensions at all.
This is incorrect. We fix this by looping over all
`polly::ScopArrayInfo` in a scop, rather that just all `polly::MemoryAccess`.
Differential Revision: https://reviews.llvm.org/D37379
llvm-svn: 312350
Summary:
After region statements now also have instruction lists, this is a
straightforward extension.
Reviewers: Meinersbur, bollu, singam-sanjay, gareevroman
Reviewed By: Meinersbur
Subscribers: hfinkel, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37298
llvm-svn: 312249
The adds code generation support for the previous commit.
This patch has been re-applied, after the memory issue in the previous patch
has been fixed.
llvm-svn: 312211
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.
We currently only model the entry block of a region statements, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.
This change set is reapplied, after a memory corruption issue had been fixed.
llvm-svn: 312210
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.
We currently only model the entry block of a region statements, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.
llvm-svn: 312128
Reduction detection is only executed in the SCoP building phase.
Hence it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312118
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312117
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312116
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
This mostly mechanical change makes ScopBuilder directly access some of
ScopStmt/MemoryAccess private fields. We add ScopBuilder as a friend
class and will add proper accessor functions sometime later.
llvm-svn: 312115
This patch allows annotating of metadata in ir instruction
(with "polly_split_after"), which specifies where to split a particular
scop statement.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D36402
llvm-svn: 312107
ZoneAlgo used to bail out for the complete SCoP if it encountered
something violating its assumption. This meant the neither OpTree can
forward any load nor DeLICM do anything in such cases, even if their
transformations are unrelated to the violations.
This patch adds a list of compatible elements (currently with the
granularity of entire arrays) that can be used for analysis. OpTree
and DeLICM can then check whether their transformations only concern
compatible elements, and skip non-compatible ones.
This will be useful for e.g. Polybench's benchmarks covariance,
correlation, bicg, doitgen, durbin, gramschmidt, adi that have
assumption violation, but which are not necessarily relevant
for all transformations.
Differential Revision: https://reviews.llvm.org/D37219
llvm-svn: 311929
Properly require and preserve the OptimizationRemarkEmitter for use in
ScopPass. Previously one had to get the ORE from ScopDetection because
CodeGeneration did not mark it as preserved. It would need to be
recomputed which results in the legacy PM to throw away all previous
SCoP analysis.
This also changes the implementation of ScopPass::getAnalysisUsage to
not unconditionally preserve all passes, but only those needed to be
preserved by any SCoP pass (at least when using the legacy PM). This
allows invalidating DependenceInfo (and IslAstInfo) in case the pass
would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion)
JSONImporter should also invalidate the DependenceInfo. In this patch
it marks DependenceInfo as preserved anyway because some regression
tests depend on it.
Differential Revision: https://reviews.llvm.org/D37010
llvm-svn: 311888
Add statistics about
- Which optimizations are applied
- Number of loops in Scops at various stages
- Number of scalar/singleton writes at various stages representative
for scalar false dependencies
- Number of parallel loops
These will be useful to find regressions due to moving Polly further
down of LLVM's pass pipeline.
Differential Revision: https://reviews.llvm.org/D37049
llvm-svn: 311553
Summary:
ScopDetection used to check if a loop withing a region was infinite and emitted a diagnostic in such cases. After r310940 there's no point checking against that situation, as infinite loops don't appear in regions anymore.
The test failure was observed on these two polly buildbots:
http://lab.llvm.org:8011/builders/polly-arm-linux/builds/8368http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/10310
This patch XFAILs `ReportLoopHasNoExit.ll` and turns infinite loop detection into an assert.
Reviewers: grosser, sanjoy, bollu
Reviewed By: grosser
Subscribers: efriedma, aemerson, kristof.beyls, dberlin, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36776
llvm-svn: 311503
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36928
llvm-svn: 311473
This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.
Differential Revision: https://reviews.llvm.org/D36934
llvm-svn: 311328
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.
llvm-svn: 311239
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127