Commit Graph

943 Commits

Author SHA1 Message Date
Tobias Grosser aaabbbf886 GPGPU: Do not assume arrays start at 0
Our alias checks precisely check that the minimal and maximal accessed elements
do not overlap in a kernel. Hence, we must ensure that our host <-> device
transfers do not touch additional memory locations that are not covered in
the alias check. To ensure this, we make sure that the data we copy for a
given array is only the data from the smallest element accessed to the largest
element accessed.

We also adjust the size of the array according to the offset at which the array
is actually accessed.

An interesting result of this is: In case array are accessed with negative
subscripts ,e.g., A[-100], we automatically allocate and transfer _more_ data to
cover the full array. This is important as such code indeed exists in the wild.

llvm-svn: 281611
2016-09-15 14:05:58 +00:00
Roman Gareev b3224adfb6 Perform copying to created arrays according to the packing transformation
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23260

llvm-svn: 281441
2016-09-14 06:26:09 +00:00
Tobias Grosser a82c4b5df8 GPGPU: Allow region statements
llvm-svn: 281305
2016-09-13 08:42:10 +00:00
Tobias Grosser b79f4d3970 GPGPU: Extend types when array sizes have smaller types
This prevents a compiler crash.

llvm-svn: 281303
2016-09-13 08:02:14 +00:00
Tobias Grosser b51d507c74 Adapt test case to recent change in Global Variable Definition
llvm-svn: 281295
2016-09-13 05:19:26 +00:00
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser 5857b701a3 GPGPU: Bail out gracefully in case of invalid IR
Instead of aborting, we now bail out gracefully in case the kernel IR we
generate is invalid. This can currently happen in case the SCoP stores
pointer values, which we model as arrays, as data values into other arrays. In
this case, the original pointer value is not available on the device and can
consequently not be stored. As detecting this ahead of time is not so easy, we
detect these situations after the invalid IR has been generated and bail out.

llvm-svn: 281193
2016-09-12 06:06:31 +00:00
Tobias Grosser 0bf4cc6499 Add missing 'REQUIRES' line
llvm-svn: 281166
2016-09-11 13:42:42 +00:00
Tobias Grosser 02293ed755 GPGPU: Do not fail in case of arrays never accessed
If these arrays have never been accessed we failed to derive an upper bound
of the accesses and consequently a size for the outermost dimension. We
now explicitly check for empty access sets and then just use zero as size
for the outermost dimension.

llvm-svn: 281165
2016-09-11 13:30:12 +00:00
Michael Kruse 7886bd7ca5 Add -polly-flatten-schedule pass.
The -polly-flatten-schedule pass reduces the number of scattering
dimensions in its isl_union_map form to make them easier to understand.
It is not meant to be used in production, only for debugging and
regression tests.

To illustrate, how it can make sets simpler, here is a lifetime set
used computed by the porposed DeLICM pass without flattening:

    { Stmt_reduction_for[0, 4] -> [0, 2, o2, o3] : o2 < 0;
      Stmt_reduction_for[0, 4] -> [0, 1, o2, o3] : o2 >= 5;
      Stmt_reduction_for[0, 4] -> [0, 1, 4, o3] : o3 > 0;
      Stmt_reduction_for[0, i1] -> [0, 1, i1, 1] : 0 <= i1 <= 3;
      Stmt_reduction_for[0, 4] -> [0, 2, 0, o3] : o3 <= 0 }

And here the same lifetime for a semantically identical one-dimensional
schedule:

    { Stmt_reduction_for[0, i1] -> [2 + 3i1] : 0 <= i1 <= 4 }

Differential Revision: https://reviews.llvm.org/D24310

llvm-svn: 280948
2016-09-08 15:02:36 +00:00
Michael Kruse 564579726a Add check-polly-tests build target.
The check-polly-tests target runs regression/unit tests but without checking
formatting. This is useful to not having to reload a file in an open editor
(which eg. clears the undo buffer, moves cursor/window position) when running
polly-update-format.

After this change, the following test targets exist:
 - check-polly-unittests to run unittests only
 - check-polly-tests to run unit and regression tests
 - polly-check-format to check formatting using clang-format
 - check-polly to run them all

As a side-effect, when running check-polly, polly-check-format and run in
parallel (instead of polly-check-format first).

Differential Revision: https://reviews.llvm.org/D24191

llvm-svn: 280654
2016-09-05 10:54:16 +00:00
Michael Kruse 2fa3519463 Allow mapping scalar MemoryAccesses to array elements.
Change the code around setNewAccessRelation to allow to use a an existing array
element for memory instead of an ad-hoc alloca. This facility will be used for
DeLICM/DeGVN to convert scalar dependencies into regular ones.

The changes necessary include:
- Make the code generator use the implicit locations instead of the alloca ones.
- A test case
- Make the JScop importer accept changes of scalar accesses for that test case.
- Adapt the MemoryAccess interface to the fact that the MemoryKind can change.
  They are named (get|is)OriginalXXX() to get the status of the memory access
  before any change by setNewAccessRelation() (some properties such as
  getIncoming() do not change even if the kind is changed and are still
  required). To get the modified properties, there is (get|is)LatestXXX(). The
  old accessors without Original|Latest become synonyms of the
  (get|is)OriginalXXX() to not make functional changes in unrelated code.

Differential Revision: https://reviews.llvm.org/D23962

llvm-svn: 280408
2016-09-01 19:53:31 +00:00
Michael Kruse d262feff80 Add space between access string and follow-up.
llvm-svn: 279826
2016-08-26 15:43:52 +00:00
Michael Kruse 6b6e38d9b1 Add "New access function" to update_check.py classifier.
Lines with this prefix are printed by JSONImporter.

llvm-svn: 279825
2016-08-26 15:43:43 +00:00
Roman Gareev 44aeef7ecf [FIX] Access dimensions should correspond to number of dimensions of the accesses array.
llvm-svn: 279821
2016-08-26 13:41:53 +00:00
Michael Kruse 05cf9c22f1 Introduce unittests.
Add the infrastructure for unittests to Polly and two simple tests for
conversion between isl_val and APInt. In addition, a build target
check-polly-unittests is added to run only the unittests but not the regression
tests.

Clang's unittest mechanism served as as a blueprint which then was adapted to
Polly.

Differential Revision: https://reviews.llvm.org/D23833

llvm-svn: 279734
2016-08-25 12:36:15 +00:00
Michael Kruse 0e63ab4243 Use configure_lit_site_cfg instead of configure_file.
configure_lit_site_cfg defines some more parameters that are used in
lit.site.cfg.in. configure_file would leave those empty. These additional
definitions seem to be unimportant for regression tests, but unittests do not
work without them.

In case of out-of-tree builds, define the additional parameters with default
values. These may not take all configuration parameters into account, as
configure_lit_site_cfg would.

llvm-svn: 279733
2016-08-25 12:03:33 +00:00
Michael Kruse 4a080de057 Add %loadPolly to test command line.
Required for out-of-tree builds of Polly.

llvm-svn: 279657
2016-08-24 19:12:48 +00:00
Roman Gareev 5f99f8656e Add a flag to dump SCoP optimized with the IslScheduleOptimizer pass
Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23740

llvm-svn: 279395
2016-08-21 11:20:39 +00:00
Eli Friedman 28671c83d6 [SCEVValidator] Don't reorder multiplies in extractConstantFactor.
The existing code would add the operands in the wrong order, and eventually
crash because the SCEV expression doesn't exactly match the parameter SCEV
expression in SCEVAffinator::visit. (SCEV doesn't sort the operands to
getMulExpr in general.)

Differential Revision: https://reviews.llvm.org/D23592

llvm-svn: 279087
2016-08-18 16:30:42 +00:00
Tobias Grosser 1c18440958 [BlockGenerator] Invalidate SCEV values for instructions in scop
We already invalidated a couple of critical values earlier on, but we now
invalidate all instructions contained in a scop after the scop has been code
generated. This is necessary as later scops may otherwise obtain SCEV
expressions that reference values in the earlier scop that before dominated
the later scop, but which had been moved into the conditional branch and
consequently do not dominate the later scop any more. If these very values are
then used during code generation of the later scop, we generate used that are
dominated by the values they use.

This fixes: http://llvm.org/PR28984

llvm-svn: 279047
2016-08-18 10:45:57 +00:00
Tobias Grosser b143e31164 [ScopInfo] Make scalars used by PHIs in non-affine regions available
Normally this is ensured when adding PHI nodes, but as PHI node dependences
do not need to be added in case all incoming blocks are within the same
non-affine region, this was missed.

This corrects an issue visible in LNT's sqlite3, in case invariant load hoisting
was disabled.

llvm-svn: 278792
2016-08-16 11:44:48 +00:00
Tobias Grosser c80c15bd50 [ScopDetect] Do not assert in case of AddRecs with non-constant start expression
llvm-svn: 278738
2016-08-15 20:59:30 +00:00
Tobias Grosser 13e55a32fd [test] Force invariant load hoisting one last time
Without invariant load hoisting an (unrelated) bug is exposed in this test
case: http://llvm.org/PR28984

llvm-svn: 278680
2016-08-15 16:43:33 +00:00
Tobias Grosser 7cb809983d [tests] Force invariant load hoisting for test cases that need it -- III
llvm-svn: 278673
2016-08-15 15:56:24 +00:00
Tobias Grosser ad61c170d5 [tests] Force invariant load hoisting for test cases that need it II
llvm-svn: 278669
2016-08-15 13:58:16 +00:00
Tobias Grosser 75b9c7df4d [test] Correct spelling in test case
and explicitly enable invariant load hoisting for this test case.

llvm-svn: 278668
2016-08-15 13:58:04 +00:00
Tobias Grosser 6e6264c142 [tests] Force invariant load hoisting for test cases that need it
This will make it easier to switch the default of Polly's invariant load
hoisting strategy and also makes it very clear that these test cases
indeed require invariant code hoisting to work.

llvm-svn: 278667
2016-08-15 13:27:49 +00:00
Roman Gareev 1c892e91e3 Perform replacement of access relations and creation of new arrays according to the packing transformation
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D22187

llvm-svn: 278666
2016-08-15 12:22:54 +00:00
Tobias Grosser d58acf866a [GPGPU] Ensure arrays where only parts are modified are copied to GPU
To do so we change the way array exents are computed. Instead of the precise
set of memory locations accessed, we now compute the extent as the range between
minimal and maximal address in the first dimension and the full extent defined
by the sizes of the inner array dimensions.

We also move the computation of the may_persist region after the construction
of the arrays, as it relies on array information. Without arrays being
constructed no useful information is computed at all.

llvm-svn: 278212
2016-08-10 10:58:19 +00:00
Tobias Grosser b06ff4574e [GPGPU] Support PHI nodes used in GPU kernel
Ensure the right scalar allocations are used as the host location of data
transfers. For the device code, we clear the allocation cache before device
code generation to be able to generate new device-specific allocation and
we need to make sure to add back the old host allocations as soon as the
device code generation is finished.

llvm-svn: 278126
2016-08-09 15:35:06 +00:00
Tobias Grosser 750160e260 [GPGPU] Use separate basic block for GPU initialization code
This increases the readability of the IR and also clarifies that the GPU
inititialization is executed _after_ the scalar initialization which needs
to before the code of the transformed scop is executed.

Besides increased readability, the IR should not change. Specifically, I
do not expect any changes in program semantics due to this patch.

llvm-svn: 278125
2016-08-09 15:35:03 +00:00
Tobias Grosser 776700d0b7 [BlockGenerator] Insert initializations at beginning of start block
In case some code -- not guarded by control flow -- would be emitted directly in
the start block, it may happen that this code would use uninitalized scalar
values if the scalar initialization is only emitted at the end of the start
block. This is not a problem today in normal Polly, as all statements are
emitted in their own basic blocks, but Polly-ACC emits host-to-device copy
statements into the start block.

Additional Polly-ACC test coverage will be added in subsequent changes that
improve the handling of PHI nodes in Polly-ACC.

llvm-svn: 278124
2016-08-09 15:34:59 +00:00
Tobias Grosser 77f76788dc [tests] Add two missing 'REQUIRES' lines
llvm-svn: 278104
2016-08-09 09:11:39 +00:00
Tobias Grosser c59b3ce044 [BlockGenerator] Also eliminate dead code not originating from BB
After having generated the code for a ScopStmt, we run a simple dead-code
elimination that drops all instructions that are known to be and remain unused.
Until this change, we only considered instructions for dead-code elimination, if
they have a corresponding instruction in the original BB that belongs to
ScopStmt. However, when generating code we do not only copy code from the BB
belonging to a ScopStmt, but also generate code for operands referenced from BB.
After this change, we now also considers code for dead code elimination, which
does not have a corresponding instruction in BB.

This fixes a bug in Polly-ACC where such dead-code referenced CPU code from
within a GPU kernel, which is possible as we do not guarantee that all variables
that are used in known-dead-code are moved to the GPU.

llvm-svn: 278103
2016-08-09 08:59:05 +00:00
Tobias Grosser cf66ef26f3 [GPGPU] Pass parameters always by using their own type
llvm-svn: 278100
2016-08-09 07:22:08 +00:00
Tobias Grosser 124534038a [GPGPU] Support Values referenced from both isl expr and llvm instructions
When adding code that avoids to pass values used in isl expressions and
LLVM instructions twice, we forgot to make single variable passed to the
kernel available in the ValueMap that makes it usable for instructions that
are not replaced with isl ast expressions. This change adds the variable
that is passed to the kernel to the ValueMap to ensure it is available
for such use cases as well.

llvm-svn: 278039
2016-08-08 19:22:19 +00:00
Tobias Grosser cb1aef8de4 [GPGPU] Create code to verify run-time conditions
llvm-svn: 278026
2016-08-08 17:35:55 +00:00
Tobias Grosser 928d7573dd GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly
Before this commit we generated the array type in reverse order and we also
added the outermost dimension size to the new array declaration, which is
incorrect as Polly additionally assumed an additional unsized outermost
dimension, such that we had an off-by-one error in the linearization of access
expressions.

llvm-svn: 277802
2016-08-05 08:27:24 +00:00
Tobias Grosser 470608e3e4 Add missing 'REQUIRES' line
llvm-svn: 277800
2016-08-05 07:08:45 +00:00
Tobias Grosser c1c6a2a61b GPGPU: Add cuda annotations to specify maximal number of threads per block
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.

llvm-svn: 277799
2016-08-05 06:47:43 +00:00
Tobias Grosser f919d8b360 GPGPU: Support scalars that are mapped to shared memory
llvm-svn: 277726
2016-08-04 13:57:29 +00:00
Tobias Grosser 130ca30f92 GPGPU: Add private memory support
llvm-svn: 277722
2016-08-04 12:39:03 +00:00
Tobias Grosser b513b4916b GPGPU: Add support for shared memory
llvm-svn: 277721
2016-08-04 12:18:14 +00:00
Tobias Grosser 00bb5a99f5 GPGPU: Handle scalar array references
Pass the content of scalar array references to the alloca on the kernel side
and do not pass them additional as normal LLVM scalar value.

llvm-svn: 277699
2016-08-04 06:55:59 +00:00
Tobias Grosser 576932728d GPGPU: Pass subtree values correctly to the kernel
llvm-svn: 277697
2016-08-04 06:55:49 +00:00
Tobias Grosser 629109b633 GPGPU: Mark kernel functions as polly.skip
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves, which does not work as the
GPURuntime is not available on the accelerator and also does not make any
sense.

llvm-svn: 277589
2016-08-03 12:00:07 +00:00
Roman Gareev 0c09a3af00 Add missing prefixes.
llvm-svn: 277264
2016-07-30 11:15:00 +00:00
Roman Gareev d7754a1245 Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions
Extend the jscop interface to allow the user to export arrays. It is required
that already existing arrays of the list of arrays correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array in case it has the same element type.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D22828

llvm-svn: 277263
2016-07-30 09:25:51 +00:00
Tobias Grosser 8af38ecaa3 Add missing REQUIRES line
llvm-svn: 276964
2016-07-28 07:08:34 +00:00