Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23740
llvm-svn: 279395
getAccessFunctions() is dead code and the 'BB' argument
of getOrCreateAccessFunctions() is not used. This patch deletes
getAccessFunctions and transforms AccFuncMap into
a std::vector<std::unique_ptr<MemoryAccess>> AccessFunctions.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23759
llvm-svn: 279394
The existing code would add the operands in the wrong order, and eventually
crash because the SCEV expression doesn't exactly match the parameter SCEV
expression in SCEVAffinator::visit. (SCEV doesn't sort the operands to
getMulExpr in general.)
Differential Revision: https://reviews.llvm.org/D23592
llvm-svn: 279087
We already invalidated a couple of critical values earlier on, but we now
invalidate all instructions contained in a scop after the scop has been code
generated. This is necessary as later scops may otherwise obtain SCEV
expressions that reference values in the earlier scop that before dominated
the later scop, but which had been moved into the conditional branch and
consequently do not dominate the later scop any more. If these very values are
then used during code generation of the later scop, we generate used that are
dominated by the values they use.
This fixes: http://llvm.org/PR28984
llvm-svn: 279047
Normally this is ensured when adding PHI nodes, but as PHI node dependences
do not need to be added in case all incoming blocks are within the same
non-affine region, this was missed.
This corrects an issue visible in LNT's sqlite3, in case invariant load hoisting
was disabled.
llvm-svn: 278792
With invariant load hoisting enabled the LLVM buildbots currently show some
miscompiles, which are possibly caused by invariant load hosting itself.
Confirming and fixing this requires a more in-depth analysis. To meanwhile get
back green buildbots that allow us to observe other regressions, we disable
invariant code hoisting temporarily. The relevant bug is tracked at:
http://llvm.org/PR28985
llvm-svn: 278681
This will make it easier to switch the default of Polly's invariant load
hoisting strategy and also makes it very clear that these test cases
indeed require invariant code hoisting to work.
llvm-svn: 278667
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D22187
llvm-svn: 278666
To do so we change the way array exents are computed. Instead of the precise
set of memory locations accessed, we now compute the extent as the range between
minimal and maximal address in the first dimension and the full extent defined
by the sizes of the inner array dimensions.
We also move the computation of the may_persist region after the construction
of the arrays, as it relies on array information. Without arrays being
constructed no useful information is computed at all.
llvm-svn: 278212
Ensure the right scalar allocations are used as the host location of data
transfers. For the device code, we clear the allocation cache before device
code generation to be able to generate new device-specific allocation and
we need to make sure to add back the old host allocations as soon as the
device code generation is finished.
llvm-svn: 278126
This increases the readability of the IR and also clarifies that the GPU
inititialization is executed _after_ the scalar initialization which needs
to before the code of the transformed scop is executed.
Besides increased readability, the IR should not change. Specifically, I
do not expect any changes in program semantics due to this patch.
llvm-svn: 278125
In case some code -- not guarded by control flow -- would be emitted directly in
the start block, it may happen that this code would use uninitalized scalar
values if the scalar initialization is only emitted at the end of the start
block. This is not a problem today in normal Polly, as all statements are
emitted in their own basic blocks, but Polly-ACC emits host-to-device copy
statements into the start block.
Additional Polly-ACC test coverage will be added in subsequent changes that
improve the handling of PHI nodes in Polly-ACC.
llvm-svn: 278124
After having generated the code for a ScopStmt, we run a simple dead-code
elimination that drops all instructions that are known to be and remain unused.
Until this change, we only considered instructions for dead-code elimination, if
they have a corresponding instruction in the original BB that belongs to
ScopStmt. However, when generating code we do not only copy code from the BB
belonging to a ScopStmt, but also generate code for operands referenced from BB.
After this change, we now also considers code for dead code elimination, which
does not have a corresponding instruction in BB.
This fixes a bug in Polly-ACC where such dead-code referenced CPU code from
within a GPU kernel, which is possible as we do not guarantee that all variables
that are used in known-dead-code are moved to the GPU.
llvm-svn: 278103
The function expandRegion() frees Region* objects again when it determines that
these are not valid SCoPs. However, the DetectionContext added to the
DetectionContextMap still holds a reference. The validity is checked using the
ValidRegions lookup table. When a new Region is added to that list, it might
share the same address, such that the DetectionContext contains two
Region* associations that are in ValidRegions, but that are unrelated and of
which one has already been free.
Also remove the DetectionContext when not a valid expansion.
llvm-svn: 278062
When adding code that avoids to pass values used in isl expressions and
LLVM instructions twice, we forgot to make single variable passed to the
kernel available in the ValueMap that makes it usable for instructions that
are not replaced with isl ast expressions. This change adds the variable
that is passed to the kernel to the ValueMap to ensure it is available
for such use cases as well.
llvm-svn: 278039
There is no need to reset the position of the builder, as we can just continue
to insert code at the current position of the IRBuilder, which happens to
be precisely the location we reset the builder to.
llvm-svn: 278014
... instead of adding instructions at the end of the basic block the builder
is currently at. This makes it easier to reason about where IR is generated,
as with the IRBuilder there is just a single location that specificies where
IR is generated.
llvm-svn: 278013
The map is iterated over when generating the values escaping the SCoP. The
indeterministic iteration order of DenseMap causes the output IR to change at
every compilation, adding noise to comparisons.
Replace DenseMap by a MapVector to ensure the same iteration order at every
compilation.
llvm-svn: 277832
When entering the dependence computation and the max_operations is set, the
operations counter may have already exceeded the counter, thus aborting any ISL
computation from the start. The counter is reset at the end of the dependence
calculation such that a follow-up recomputation might succeed, ie. the success
of the first dependence calculation depends on unrelated ISL operations that
happened before, giving it a disadvantage to the following calculations.
This patch resets the operations counter at the beginning of the dependence
recalculation to not depend on previous actions. Otherwise additional
preprocessing of the Scop that aims to improve its schedulability (eg. DeLICM)
do have the effect that DependenceInfo and hence the scheduling fail more
likely, contraproductive to the goal of said preprocessing.
llvm-svn: 277810
Before this commit we generated the array type in reverse order and we also
added the outermost dimension size to the new array declaration, which is
incorrect as Polly additionally assumed an additional unsized outermost
dimension, such that we had an off-by-one error in the linearization of access
expressions.
llvm-svn: 277802
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.
llvm-svn: 277799
Pass the content of scalar array references to the alloca on the kernel side
and do not pass them additional as normal LLVM scalar value.
llvm-svn: 277699
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves, which does not work as the
GPURuntime is not available on the accelerator and also does not make any
sense.
llvm-svn: 277589
Extend the jscop interface to allow the user to export arrays. It is required
that already existing arrays of the list of arrays correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array in case it has the same element type.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D22828
llvm-svn: 277263