The check-polly-tests target runs regression/unit tests but without checking
formatting. This is useful to not having to reload a file in an open editor
(which eg. clears the undo buffer, moves cursor/window position) when running
polly-update-format.
After this change, the following test targets exist:
- check-polly-unittests to run unittests only
- check-polly-tests to run unit and regression tests
- polly-check-format to check formatting using clang-format
- check-polly to run them all
As a side-effect, when running check-polly, polly-check-format and run in
parallel (instead of polly-check-format first).
Differential Revision: https://reviews.llvm.org/D24191
llvm-svn: 280654
Change the code around setNewAccessRelation to allow to use a an existing array
element for memory instead of an ad-hoc alloca. This facility will be used for
DeLICM/DeGVN to convert scalar dependencies into regular ones.
The changes necessary include:
- Make the code generator use the implicit locations instead of the alloca ones.
- A test case
- Make the JScop importer accept changes of scalar accesses for that test case.
- Adapt the MemoryAccess interface to the fact that the MemoryKind can change.
They are named (get|is)OriginalXXX() to get the status of the memory access
before any change by setNewAccessRelation() (some properties such as
getIncoming() do not change even if the kind is changed and are still
required). To get the modified properties, there is (get|is)LatestXXX(). The
old accessors without Original|Latest become synonyms of the
(get|is)OriginalXXX() to not make functional changes in unrelated code.
Differential Revision: https://reviews.llvm.org/D23962
llvm-svn: 280408
Add the infrastructure for unittests to Polly and two simple tests for
conversion between isl_val and APInt. In addition, a build target
check-polly-unittests is added to run only the unittests but not the regression
tests.
Clang's unittest mechanism served as as a blueprint which then was adapted to
Polly.
Differential Revision: https://reviews.llvm.org/D23833
llvm-svn: 279734
configure_lit_site_cfg defines some more parameters that are used in
lit.site.cfg.in. configure_file would leave those empty. These additional
definitions seem to be unimportant for regression tests, but unittests do not
work without them.
In case of out-of-tree builds, define the additional parameters with default
values. These may not take all configuration parameters into account, as
configure_lit_site_cfg would.
llvm-svn: 279733
Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23740
llvm-svn: 279395
The existing code would add the operands in the wrong order, and eventually
crash because the SCEV expression doesn't exactly match the parameter SCEV
expression in SCEVAffinator::visit. (SCEV doesn't sort the operands to
getMulExpr in general.)
Differential Revision: https://reviews.llvm.org/D23592
llvm-svn: 279087
We already invalidated a couple of critical values earlier on, but we now
invalidate all instructions contained in a scop after the scop has been code
generated. This is necessary as later scops may otherwise obtain SCEV
expressions that reference values in the earlier scop that before dominated
the later scop, but which had been moved into the conditional branch and
consequently do not dominate the later scop any more. If these very values are
then used during code generation of the later scop, we generate used that are
dominated by the values they use.
This fixes: http://llvm.org/PR28984
llvm-svn: 279047
Normally this is ensured when adding PHI nodes, but as PHI node dependences
do not need to be added in case all incoming blocks are within the same
non-affine region, this was missed.
This corrects an issue visible in LNT's sqlite3, in case invariant load hoisting
was disabled.
llvm-svn: 278792
This will make it easier to switch the default of Polly's invariant load
hoisting strategy and also makes it very clear that these test cases
indeed require invariant code hoisting to work.
llvm-svn: 278667
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D22187
llvm-svn: 278666
To do so we change the way array exents are computed. Instead of the precise
set of memory locations accessed, we now compute the extent as the range between
minimal and maximal address in the first dimension and the full extent defined
by the sizes of the inner array dimensions.
We also move the computation of the may_persist region after the construction
of the arrays, as it relies on array information. Without arrays being
constructed no useful information is computed at all.
llvm-svn: 278212
Ensure the right scalar allocations are used as the host location of data
transfers. For the device code, we clear the allocation cache before device
code generation to be able to generate new device-specific allocation and
we need to make sure to add back the old host allocations as soon as the
device code generation is finished.
llvm-svn: 278126
This increases the readability of the IR and also clarifies that the GPU
inititialization is executed _after_ the scalar initialization which needs
to before the code of the transformed scop is executed.
Besides increased readability, the IR should not change. Specifically, I
do not expect any changes in program semantics due to this patch.
llvm-svn: 278125
In case some code -- not guarded by control flow -- would be emitted directly in
the start block, it may happen that this code would use uninitalized scalar
values if the scalar initialization is only emitted at the end of the start
block. This is not a problem today in normal Polly, as all statements are
emitted in their own basic blocks, but Polly-ACC emits host-to-device copy
statements into the start block.
Additional Polly-ACC test coverage will be added in subsequent changes that
improve the handling of PHI nodes in Polly-ACC.
llvm-svn: 278124
After having generated the code for a ScopStmt, we run a simple dead-code
elimination that drops all instructions that are known to be and remain unused.
Until this change, we only considered instructions for dead-code elimination, if
they have a corresponding instruction in the original BB that belongs to
ScopStmt. However, when generating code we do not only copy code from the BB
belonging to a ScopStmt, but also generate code for operands referenced from BB.
After this change, we now also considers code for dead code elimination, which
does not have a corresponding instruction in BB.
This fixes a bug in Polly-ACC where such dead-code referenced CPU code from
within a GPU kernel, which is possible as we do not guarantee that all variables
that are used in known-dead-code are moved to the GPU.
llvm-svn: 278103
When adding code that avoids to pass values used in isl expressions and
LLVM instructions twice, we forgot to make single variable passed to the
kernel available in the ValueMap that makes it usable for instructions that
are not replaced with isl ast expressions. This change adds the variable
that is passed to the kernel to the ValueMap to ensure it is available
for such use cases as well.
llvm-svn: 278039
Before this commit we generated the array type in reverse order and we also
added the outermost dimension size to the new array declaration, which is
incorrect as Polly additionally assumed an additional unsized outermost
dimension, such that we had an off-by-one error in the linearization of access
expressions.
llvm-svn: 277802
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.
llvm-svn: 277799
Pass the content of scalar array references to the alloca on the kernel side
and do not pass them additional as normal LLVM scalar value.
llvm-svn: 277699
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves, which does not work as the
GPURuntime is not available on the accelerator and also does not make any
sense.
llvm-svn: 277589
Extend the jscop interface to allow the user to export arrays. It is required
that already existing arrays of the list of arrays correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array in case it has the same element type.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D22828
llvm-svn: 277263
Before this change we used the array index, which would result in us accessing
the parameter array out-of-bounds. This bug was visible for test cases where not
all arrays in a scop are passed to a given kernel.
llvm-svn: 276961
Adding a new pass PolyhedralInfo. This pass will be the interface to Polly.
Initially, we will provide the following interface:
- #IsParallel(Loop *L) - return a bool depending on whether the loop is
parallel or not for the given program order.
Patch by Utpal Bora <cs14mtech11017@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D21486
llvm-svn: 276637
Also factor out getArraySize() to avoid code dupliciation and reorder some
function arguments to indicate the direction into which data is transferred.
llvm-svn: 276636
At the beginning of each SCoP, we allocate device arrays for all arrays
used on the GPU and we free such arrays after the SCoP has been executed.
llvm-svn: 276635
Do not process SCoPs with infeasible runtime context in the new
ScopInfoWrapperPass. Do not compute dependences for such SCoPs in the new
DependenceInfoWrapperPass.
Patch by Utpal Bora <cs14mtech11017@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D22402
llvm-svn: 276631
This is the second patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus
two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update. In this change
we create the BLIS macro-kernel by applying a combination of tiling
and interchanging. In subsequent changes we will implement the packing
transformation.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D21491
llvm-svn: 276627
There is no need to expose the selected device at the moment. We also pass back
pointers as return values, as this simplifies the interface.
llvm-svn: 276623
Run the NVPTX backend over the GPUModule IR and write the resulting assembly
code in a string.
To work correctly, it is important to invalidate analysis results that still
reference the IR in the kernel module. Hence, this change clears all references
to dominators, loop info, and scalar evolution.
Finally, the NVPTX backend has troubles to generate code for various special
floating point types (not surprising), but also for uncommon integer types. This
commit does not resolve these issues, but pulls out problematic test cases into
separate files to XFAIL them individually and resolve them in future (not
immediate) changes one by one.
llvm-svn: 276396
This change introduces the actual compute code in the GPU kernels. To ensure
all values referenced from the statements in the GPU kernel are indeed available
we scan all ScopStmts in the GPU kernel for references to llvm::Values that
are not yet covered by already modeled outer loop iterators, parameters, or
array base pointers and also pass these additional llvm::Values to the
GPU kernel.
For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which
is referenced by the newly generated access functions within the GPU kernel and
which is used to help with code generation.
llvm-svn: 276270
This ensures that no trivially dead code is generated. This is not only cleaner,
but also avoids troubles in case code is generated in a separate function and
some of this dead code contains references to values that are not available.
This issue may happen, in case the memory access functions have been updated
and old getelementptr instructions remain in the code. With normal Polly,
a test case is difficult to draft, but the upcoming GPU code generation can
possibly trigger such problems. We will later extend this dead-code elimination
to region and vector statements.
llvm-svn: 276263
It seems the order in which we generated memory accesses changed such that
the import of these updated memory accesses failed for the 'loop3' statement
in this test case. Unfortunately, the existing CHECK lines were not strict
enough to catch this. Hence, besides fixing the order of the memory access
lines we also ensure that the memory access changes are both clearly visibly
and well checked.
llvm-svn: 276247
This is currently not supported and will only be added later. Also update the
test cases to ensure no invariant code hoisting is applied.
llvm-svn: 275987
We use this opportunity to further classify the different user statements that
can arise and add TODOs for the ones not yet implemented.
llvm-svn: 275957
Create for each kernel a separate LLVM-IR module containing a single function
marked as kernel function and taking one pointer for each array referenced
by this kernel. Add debugging output to verify the kernels are generated
correctly.
llvm-svn: 275952
Initialize the list of references to a GPU array to ensure that the arrays that
need to be passed to kernel calls are computed correctly. Furthermore, the very
same information is also necessary to compute synchronization correctly. As the
functionality to compute these references is already available, what is left for
us to do is only to connect the necessary functionality to compute array
reference information.
llvm-svn: 275798
Create LLVM-IR for all host-side control flow of a given GPU AST. We implement
this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder. The
IslNodeBuilder will take care of generating all general-purpose ast nodes, but
we provide our own createUser implementation to handle the different GPU
specific user statements. For now, we just skip any user statement and only
generate a host-code sceleton, but in subsequent commits we will add handling of
normal ScopStmt's performing computations, kernel calls, as well as host-device
data transfers. We will also introduce run-time check generation and LICM in
subsequent commits.
llvm-svn: 275783
Otherwise ppcg would try to call into pet functionality that this not available,
which obviously will cause trouble. As we can easily print these statements
ourselves, we just do so.
llvm-svn: 275579
This option increases the scalability of the scheduler and allows us to remove
the 'gisting' workaround we introduced in r275565 to handle a more complicated
test case. Another benefit of using this option is also that the generated
code looks a lot more streamlined.
Thanks to Sven Verdoolaege for reminding me of this option.
llvm-svn: 275573
This works around a shortcoming of the isl scheduler, which even for some
smaller test cases does not terminate in case domain constraints are part
of the flow dependences.
llvm-svn: 275565
Arrays with integer base type are similar to arrays with floating point types,
with the exception that LLVM's integer types can take some odd values. We
add a selection of different values to make sure we correctly round these
types when necessary.
References to scalar integer types are special, as we currently do not model
these types as array accesses as they are considered 'synthesizable' by Polly.
As a result, we do not generate explicit data-transfers for them, but instead
will need to keep track of all references to 'synthesizable' values separately.
At the current stage, this is only visible by missing host-to-device
data-transfer calls. In the future, we will also require special code generation
strategies.
llvm-svn: 275551
We currently only test that the code structure we generate for these scalar
parameters is correct and we add these types to make sure later code generation
additions have sufficient test coverage.
In case some of these types cannot be mapped due to missing hardware support
on the GPU some of these test cases may need to be updated later on.
llvm-svn: 275548
A sequence of CHECK lines allows additional statements to appear in the
output of the tested program without any test failures appearing. As we do
not want this to happen, switch this test case to use CHECK-NEXT.
llvm-svn: 275534
For this we need to provide an explicit list of statements as they occur in
the polly::Scop to ppcg.
We also setup basic AST printing facilities to facilitate debugging. To allow
code reuse some (minor) changes in ppcg are have been necessary.
llvm-svn: 275436
The tile size was previously uninitialized. As a result, it was often zero (aka.
no tiling), which is not what we want in general. More importantly, there was
the risk for arbitrary tile sizes to be choosen, which we did not observe, but
which still is highly problematic.
llvm-svn: 275418
This change now applies ppcg's GPU mapping on our initial schedule. For this
to work, we need to also initialize the set of all names (isl_ids) used in
the scop as well as the program context.
llvm-svn: 275396
To do so we copy the necessary information to compute an initial schedule from
polly::Scop to ppcg's scop. Most of the necessary information is directly
available and only needs to be passed on to ppcg, with the exception of 'tagged'
access relations, access relations that additionally carry information about
which memory access an access relation originates from.
We could possibly perform the construction of tagged accesses as part of
ScopInfo, but as this format is currently specific to ppcg we do not do this
yet, but keep this functionality local to our GPU code generation.
After the scop has been initialized, we compute data dependences and ask ppcg to
compute an initial schedule. Some of this functionality is already available in
polly::DependenceInfo and polly::ScheduleOptimizer, but to keep differences
to ppcg small we use ppcg's functionality here. We may later investiage if
a closer integration of these tools makes sense.
llvm-svn: 275390
At this stage, we do not yet modify the IR but just generate a default
initialized ppcg_scop and gpu_prog and free both immediately. Both will later be
filled with data from the polly::Scop and are needed to use PPCG for GPU
schedule generation. This commit does not yet perform any GPU code generation,
but ensures that the basic infrastructure has been put in place.
We also add a simple test case to ensure the new code is run and use this
opportunity to verify that GPU_CODEGEN tests are only run if GPU code generation
has been enabled in cmake.
llvm-svn: 275389
Check not only that the compiler is not crashing, but also whether the
probablematic part (The sequence of instructions simplified to '4') is reflected
in the output.
Thanks to Tobias for the hint.
llvm-svn: 275189
An assertion in visitSDivInstruction() checked whether the divisor is constant
by checking whether the argument is a ConstantInt. However, SCEVValidator allows
the divisor to be simplified to a constant by ScalarEvolution.
We synchronize the implementation of SCEVValidator and SCEVAffinator to both
accept simplified SCEV expressions.
llvm-svn: 275174
For llvm the memory accesses from nonaffine loops should be visible,
however for polly those nonaffine loops should be invisible/boxed.
This fixes llvm.org/PR28245
Cointributed-by: Huihui Zhang <huihuiz@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D21591
llvm-svn: 274842
Reject and report regions that contains loops overlapping nonaffine region.
This situation typically happens in the presence of inifinite loops.
This addresses bug llvm.org/PR28071.
Differential Revision: http://reviews.llvm.org/D21312
Contributed-by: Huihui Zhang <huihuiz@codeaurora.org>
llvm-svn: 273905
This patch addresses:
- A new function pass to compute polyhedral dependences. This is
required to avoid the region pass manager.
- Stores a map of Scop to Dependence object for all the scops present
in a function. By default, access wise dependences are stored.
Patch by Utpal Bora <cs14mtech11017@iith.ac.in>
Differential Revision: http://reviews.llvm.org/D21105
llvm-svn: 273881
This patch adds a new function pass ScopInfoWrapperPass so that the
polyhedral description of a region, the SCoP, can be constructed and
used in a function pass.
Patch by Utpal Bora <cs14mtech11017@iith.ac.in>
Differential Revision: http://reviews.llvm.org/D20962
llvm-svn: 273856
This is the first patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel,
plus two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update.
In this change we create the BLIS micro-kernel by applying
a combination of tiling and unrolling. In subsequent changes
we will add the extraction of the BLIS macro-kernel
and implement the packing transformation.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D21140
llvm-svn: 273397
With this update the isl AST generation extracts disjunctive constraints early
on. As a result, code that previously resulted in two branches with (close-to)
identical code within them:
if (P <= -1) {
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
} else if (P >= 1)
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
results now in only a single branch body:
if (P <= -1 || P >= 1)
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
This resolves http://llvm.org/PR27559
Besides the above change, this isl update brings better simplification of
sets/maps containing existentially quantified dimensions and fixes a bug in
isl's coalescing.
llvm-svn: 272500
As these test cases will be changed in a subsequent commit, we expand and
tighten them to make the subsequent changes to them more obvious. As part of
this we add more context to some test cases and add CHECK-NEXT lines to ensure
no intermediate lines are missed by accident.
llvm-svn: 272499
IntToPtr and PtrToInt instructions are basically no-ops that we can handle as
such. In order to generate them properly as parameters we had to improve the
ScopExpander, though the change is the first in the direction of a more
aggressive scalar synthetization.
This patch was originally contributed by Johannes Doerfert in r271888, but was
in conflict with the revert in r272483. This is a recommit with some minor
adjustment to the test cases to take care of differing instruction names.
llvm-svn: 272485
The recent expression type changes still need more discussion, which will happen
on phabricator or on the mailing list. The precise list of commits reverted are:
- "Refactor division generation code"
- "[NFC] Generate runtime checks after the SCoP"
- "[FIX] Determine insertion point during SCEV expansion"
- "Look through IntToPtr & PtrToInt instructions"
- "Use minimal types for generated expressions"
- "Temporarily promote values to i64 again"
- "[NFC] Avoid unnecessary comparison for min/max expressions"
- "[Polly] Fix -Wunused-variable warnings (NFC)"
- "[NFC] Simplify min/max expression generation"
- "Simplify the type adjustment in the IslExprBuilder"
Some of them are just reverted as we would otherwise get conflicts. I will try
to re-commit them if possible.
llvm-svn: 272483
This patch refactors the code generation for divisions. This allows to
always generate a shift for a power-of-two division and to utilize
information about constant divisors in order to truncate the result
type.
llvm-svn: 271898
We now generate runtime checks __after__ the SCoP code generation and
not before, though they are still inserted at the same position int
the code. This allows to modify the runtime check during SCoP code
generation.
llvm-svn: 271894