Commit Graph

708 Commits

Author SHA1 Message Date
Michael Kruse 0446d81e2d [Simplify] Add -polly-simplify pass.
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.

It removes write accesses that write a loaded value back to the location
it was loaded from. Such stores are a typical artifact of DeLICM; removing
them gets rid of bogus dependencies later in the dependency analysis.

It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.
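
As a concrete illustration of the first simplification, here is a minimal
C-style sketch (not Polly's actual test case; A, B, and N are placeholders) of
the store pattern such a pass can drop:

  // The store of 'tmp' back to A[i] writes exactly the value that was just
  // loaded from A[i], so it can be removed without changing the semantics.
  void redundantStore(double *A, double *B, int N) {
    for (int i = 0; i < N; i++) {
      double tmp = A[i]; // load
      A[i] = tmp;        // write of the loaded value back to its own location
      B[i] += tmp;       // the loaded value is still used here
    }
  }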

Differential Revision: https://reviews.llvm.org/D30820

llvm-svn: 297473
2017-03-10 16:05:24 +00:00
Tobias Grosser 24222c7357 Fix namespaces after clang-format update
llvm-svn: 296635
2017-03-01 15:54:27 +00:00
Roman Gareev bc3fbe49c5 Disable the parallel code generation in case of extension nodes
We cannot perform the dependence analysis and, consequently, the parallel
code generation if the schedule tree contains extension nodes.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30394

llvm-svn: 296325
2017-02-27 08:03:11 +00:00
Michael Kruse 52ab4943b4 Remove all references to PostDominators. NFC.
Marking a pass as preserved is necessary if any Polly pass uses it, even
if it is not preserved within the generated code. Not marking it would
cause the Polly pass chain to be interrupted. It is not used by any
Polly pass anymore, hence we can remove all references to it.

llvm-svn: 295983
2017-02-23 15:16:22 +00:00
Tobias Grosser 583be06fb2 [BlockGenerator] Use MemoryAccess::getAccessValue to get load instruction
When generating code in the BlockGenerator we copy all (interesting)
instructions and keep track of the new values in a basic block map. To obtain
the original llvm::Value that belongs to a load memory access, we use
getAccessValue() instead of getOriginalBaseAddr(). The former always references
the instruction we use to load values from. The latter, on the other hand,
is obtained from the corresponding ScopArrayInfo and would not be unique in
case ScopArrayInfo objects at some point allow memory accesses with different
base addresses.

This change is an update of r294566, which only clarified that we need the
original memory access, but still remained dependent on having one
base pointer per scop.

This change removes unnecessary uses of MemoryAccess::getOriginalBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294669
2017-02-09 23:54:23 +00:00
Tobias Grosser 4553463be4 [IRBuilder] Extract base pointers directly from ScopArray
Instead of iterating over statements and their memory accesses to extract the
set of available base pointers, just directly iterate over all ScopArray
objects. This better reflects the actual intent of the code: collect all arrays
(and their base pointers) to emit alias information that specifies that accesses
to different arrays cannot alias.

This change removes unnecessary uses of MemoryAccess::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294574
2017-02-09 09:34:42 +00:00
Tobias Grosser 26fb7d7517 [IslAst] Print the ScopArray name to mark reductions
Before this change we used the name of the base pointer to mark reductions. This
is imprecise as the canonical reference is the ScopArray itself and not the
base pointer of a reduction. Using the base pointer of reductions is problematic
in cases where a single ScopArray is referenced through two different base
pointers.

This change removes unnecessary uses of MemoryAccess::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294568
2017-02-09 08:06:15 +00:00
Tobias Grosser 02400a0e0c [BlockGenerator] BBMap uses original BaseAddress for scalar loads [NFC]
When regenerating code in the BlockGenerator we copy instructions that may
reference scalar values, for which the new value of a given scalar is looked up
in BBMap using the original scalar llvm::Value as index. It is consequently
necessary that (re)loaded scalar values are made available in BBMap using the
original llvm::Value as key, independently of whether the llvm::Value was
(re)loaded from the original scalar or a new access function has been specified
that caused the value to be reloaded from an array with a different base
address. We make this
clear by using MemoryAccess::getOriginalBaseAddr() instead of
MemoryAccess::getBaseAddr() as index to BBMap.

This change removes unnecessary uses of MemoryAccess::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294566
2017-02-09 08:05:50 +00:00
Tobias Grosser ff40087a6a Update to recent formatting changes
llvm-svn: 293756
2017-02-01 10:12:09 +00:00
Tobias Grosser 682c51143d [BlockGenerator] Comment corrections for r293374 [NFC]
This addresses some additional comments from Michael Kruse for commit r293374
as expressed in https://reviews.llvm.org/D28901.

llvm-svn: 293378
2017-01-28 11:39:02 +00:00
Tobias Grosser 587f1f57ad [Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap
Instead of keeping two separate maps from Value to Allocas, one for
MemoryType::Value and the other for MemoryType::PHI, we introduce a single map
from ScopArrayInfo to the corresponding Alloca. This change is intended both as
a general simplification and cleanup and as a way to reduce our use of
MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure
we have only a single place where the array (and its base pointer) for which we
generate code is specified, which means we can more easily introduce new
access functions that use a different ScopArrayInfo as base. We already
experiment today with modifiable access functions, so this change does not
address a specific bug, but it just reduces the scope one needs to reason about.
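
A minimal sketch of the unified mapping described above (illustrative only;
the exact key and value types used in Polly may differ):

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/IR/Instructions.h"
  #include "polly/ScopInfo.h"

  // One map from the canonical array object to its stack slot, instead of two
  // separate maps keyed by llvm::Value for the Value and PHI memory kinds.
  llvm::DenseMap<const polly::ScopArrayInfo *, llvm::AllocaInst *> ScalarMap;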

Another motivation for this patch is https://reviews.llvm.org/D28518, where
memory accesses with different base pointers could possibly be mapped to a
single ScopArrayInfo object. Such a mapping is currently not possible, as we
currently generate alloca instructions according to the base addresses of the
memory accesses, not according to the ScopArrayInfo object they belong to.  By
making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo
object will automatically mean that the same stack slot is used for these
arrays. For D28518 this is not a problem, as only MemoryType::Array objects are
mapped, but resolving this inconsistency will hopefully avoid confusion.

llvm-svn: 293374
2017-01-28 07:42:10 +00:00
Tobias Grosser 75dfaa1dbe BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts
Before this change we created an additional reload in the copy of the incoming
block of a PHI node to reload the incoming value, even though the necessary
value has already been made available by the normally generated scalar loads.
In this change, we drop the code that generates this redundant reload and
instead just reuse the scalar value already available.

Besides making the generated code slightly cleaner, this change also makes sure
that scalar loads go through the normal logic, which means they can be remapped
(e.g. to array slots) and corresponding code is generated to load from the
remapped location. Without this change, the original scalar load at the
beginning of the non-affine region would have been remapped, but the redundant
scalar load would continue to load from the old PHI slot location.

It might be possible to further simplify the code in addOperandToPHI,
but this would not only mean pulling out getNewValue, but also changing the
insertion point update logic. As this did not work when trying it the first
time, this change is likely not trivial. To not introduce bugs last minute, we
postpone further simplifications to a subsequent commit.

We also document the current behavior a little bit better.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D28892

llvm-svn: 292486
2017-01-19 14:12:45 +00:00
Tobias Grosser 943c369c60 BlockGenerator: remove obfuscating const and const casts
Making certain values 'const' only to cast the constness away a little later
mainly obfuscates the code. Hence, we just drop the 'const' parts.

Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 292480
2017-01-19 13:25:52 +00:00
Tobias Grosser 97b8490982 Use range-based for loop [NFC]
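
For illustration, the general shape of such a conversion (hypothetical helper
names, unrelated to the actual Polly code touched by this commit):

  #include <vector>

  void process(int Value);

  void visitAll(const std::vector<int> &Values) {
    // Before: explicit iterator loop.
    //   for (auto I = Values.begin(), E = Values.end(); I != E; ++I)
    //     process(*I);
    // After: range-based for loop.
    for (int V : Values)
      process(V);
  }
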
llvm-svn: 292471
2017-01-19 05:09:23 +00:00
Tobias Grosser e1ff0cf2eb Relax assert when setting access functions with invariant base pointers
Summary:
Instead of forbidding such access functions completely, we verify that their
base pointer has been hoisted and only assert in case the base pointer was
not hoisted.

I was trying for a little while to get a test case that ensures the assert is
correctly fired in case of invariant load hoisting being disabled, but I could
not find a good way to do so, as llvm-lit immediately aborts if a command
yields a non-zero return value. As we do not generally test our asserts,
not having a test case here seems OK.

This resolves http://llvm.org/PR31494

Suggested-by: Michael Kruse <llvm@meinersbur.de>

Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D28798

llvm-svn: 292213
2017-01-17 12:00:42 +00:00
Tobias Grosser 21a059af09 Adjust formatting to commit r292110 [NFC]
llvm-svn: 292123
2017-01-16 14:08:10 +00:00
Tobias Grosser 4d5a917287 Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo
To benefit from the type-safety guarantees of C++11 typed enums, which would
have caught the type mismatch fixed in r291960, we make MemoryKind a typed enum.
This change also allows us to drop the 'MK_' prefix and to instead use the more
descriptive full name of the enum as prefix. To reduce the amount of typing
needed, we use this opportunity to move MemoryKind from ScopArrayInfo to
global scope, which means the ScopArrayInfo:: prefix is not needed. This move
also makes sense historically. In the beginning of Polly we had different
MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later
canonicalized to one. During this canonicalization we just chose the enum in
ScopArrayInfo, but did not consider moving this shared enum to global scope.
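
A sketch of the shape of this change (the member names follow the MK_* kinds
mentioned elsewhere in this log; the exact definition in Polly may differ):

  // Old: plain enum nested in ScopArrayInfo, members prefixed with 'MK_',
  // used as ScopArrayInfo::MK_Array, ScopArrayInfo::MK_PHI, ...
  //
  // New: a typed enum at global (namespace) scope; the enum name itself acts
  // as the prefix, e.g. MemoryKind::Array or MemoryKind::PHI.
  enum class MemoryKind { Array, Value, PHI, ExitPHI };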

Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 292030
2017-01-14 20:25:44 +00:00
Tobias Grosser e29db2173b Update to recent clang-format changes
llvm-svn: 291810
2017-01-12 21:05:19 +00:00
Roman Gareev bd5c6039c6 Align newly created arrays to the first level cache line boundary
Aligning data to cache-line boundaries helps to avoid overheads related to
accessing it ([1]). This patch aligns newly created arrays and adds an
option to specify the first-level cache line size. By default we use 64 bytes,
which is a typical cache-line size ([2]).
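
For illustration, a generic sketch of allocating a buffer that starts on a
64-byte cache-line boundary (this is not Polly's code generation, just the
general idea on a POSIX system):

  #include <cstdlib>

  // Allocate Count doubles starting at a 64-byte (cache-line) boundary.
  double *allocCacheLineAligned(std::size_t Count) {
    void *Ptr = nullptr;
    if (posix_memalign(&Ptr, 64, Count * sizeof(double)) != 0)
      return nullptr; // allocation failed
    return static_cast<double *>(Ptr);
  }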

In case of Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8

it helps to improve the performance from 11.303 GFlops/sec (39.247% of
theoretical peak) to 12.63 GFlops/sec (43.8542% of theoretical peak).

Refs.:

[1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access
[2] - http://igoro.com/archive/gallery-of-processor-cache-effects/

Differential Revision: https://reviews.llvm.org/D28020

Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 290253
2016-12-21 12:37:36 +00:00
Tobias Grosser b6945e3301 Fix clang-format
llvm-svn: 290103
2016-12-19 14:06:40 +00:00
Tobias Grosser dc6b87c56e Add newline at end of debug print
In '[DBG] Allow to emit the RTC value at runtime' the diagnostics were printed
without a newline at the end of each diagnostic. We add such a newline to
improve readability.

llvm-svn: 288323
2016-12-01 08:08:47 +00:00
Michael Kruse 11c5e07925 canSynthesize: Remove unused argument LI. NFC.
The helper function polly::canSynthesize() does not directly use the LoopInfo
analysis, hence remove it from its argument list.

llvm-svn: 288144
2016-11-29 15:11:04 +00:00
Tobias Grosser df8f35b7b8 Update for clang-format change in r288119
llvm-svn: 288134
2016-11-29 12:52:08 +00:00
Tobias Grosser b3c3d149b9 [CodeGen] Add flag to code-generate most memory access expressions
Introduce the new flag -polly-codegen-generate-expressions, which forces Polly
to code generate AST expressions instead of using our SCEV-based access
expression generation, even for cases where the original memory access relation
was not changed and the SCEV-based access expression could be code generated
without any issue.

This is an experimental option for better testing the isl ast expression
generation. The default behavior of Polly remains unchanged. We also exclude
a couple of cases for which the AST expression is not yet working.

llvm-svn: 287694
2016-11-22 20:21:16 +00:00
Johannes Doerfert 81aa6e882f [NFC] Adjust naming scheme of statistic variables
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 287347
2016-11-18 14:37:08 +00:00
Johannes Doerfert dae2e9287d [DBG] Collect statistics about actually versioned SCoPs
llvm-svn: 287267
2016-11-17 21:55:43 +00:00
Johannes Doerfert 8c5464a715 [DBG] Allow to emit the RTC value at runtime
The new command line flag "polly-codegen-emit-rtc-print" can be used to
place a "printf" in the generated code that will print the RTC value and
the overflow state.
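
A hedged illustration of the kind of diagnostic such a flag would add to the
generated code (the format string and variable names here are invented):

  #include <cstdio>

  // Conceptually, the generated code gains a call like this right before
  // branching on the run-time check (RTC).
  void reportRTC(long RTCValue, bool Overflow) {
    std::printf("polly RTC value: %ld, overflow: %d\n", RTCValue, (int)Overflow);
  }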

llvm-svn: 287265
2016-11-17 21:49:19 +00:00
Tobias Grosser d0b9173caa IslAst: always use the context during ast generation
Providing the context to the AST generator allows for additional simplifications
and -- more importantly -- allows us to generate loops with only partially
bounded domains, assuming the domains are bounded for all parameter
configurations that are valid as defined by the context.

This change fixes the crash reported in http://llvm.org/PR30956

The original reason why we did not include the context when generating an
AST was that CLooG and later isl used to sometimes transfer some of the
constraints that bound the size of parameters from the context into the
generated AST. This resulted in operations with very large constants, which
sometimes introduced problematic integer overflows. The latest versions of
the isl AST generator are careful to not introduce such constants.

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286442
2016-11-10 09:39:58 +00:00
Tobias Grosser 16480186f8 IslNodeBuilder: Ensure newly generated memory accesses are well-defined
Add some additional asserts that ensure newly code-generated memory accesses
are defined on all domain and schedule domain instances.

llvm-svn: 286050
2016-11-05 21:46:01 +00:00
Eli Friedman acf8006471 [Polly CodeGen] Break critical edge from RTC to original loop.
This makes polly generate a CFG which is closer to what we want
in LLVM IR, with a loop preheader for the original loop. This is
just a cleanup, but it exposes some fragile assumptions.

I'm not completely happy with the changes related to expandCodeFor;
RTCBB->getTerminator() is basically a random insertion point which
happens to work due to the way we generate runtime checks. I'm not
sure what the right answer looks like, though.

Differential Revision: https://reviews.llvm.org/D26053

llvm-svn: 285864
2016-11-02 22:32:23 +00:00
Mandeep Singh Grang 5b1abfc88e [polly] Fix non-determinism in polly BlockGenerators
Summary: Iterating over SeenBlocks, which is a SmallPtrSet, results in non-determinism in codegen.

Reviewers: jdoerfert, zinob, grosser

Tags: #polly

Differential Revision: https://reviews.llvm.org/D25778

llvm-svn: 284622
2016-10-19 17:56:49 +00:00
Eli Friedman 3c1a75bf9c Handle multi-dimensional invariant load.
If the address of a load depends on another load, make sure to emit
the loads in the right order.
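
A small C-style example of the situation described, with hypothetical names:

  // The address of the inner load depends on the result of another load:
  // Idx[i] has to be loaded before Rows[Idx[i]] can even be addressed, so the
  // hoisted (invariant) loads must be emitted in exactly this order.
  double loadIndirect(double **Rows, const int *Idx, int i, int j) {
    double *Row = Rows[Idx[i]]; // first load Idx[i], then Rows[...]
    return Row[j];              // finally load the element itself
  }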

llvm-svn: 284426
2016-10-17 21:04:26 +00:00
Michael Kruse fa53c86dc1 [ScopInfo/CodeGen] ExitPHI reads are implicit.
Under some conditions MK_Value read accesses were converted to MK_ExitPHI read
accesses. This is unexpected because MK_ExitPHI read accesses are implicit after
the scop execution. This behaviour was introduced in r265261, which fixed a
failed assertion/crash in CodeGen.

Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(),
despite its name, also handles accesses of kind MK_Value, only to skip them
because they access values that are usually not PHI nodes in the SCoP region's
exit block. Except in the situation observed in r265261.

Do not convert value accesses to ExitPHI accesses and do not handle
value accesses like ExitPHI accesses in CodeGen anymore.

llvm-svn: 284023
2016-10-12 16:31:09 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC; however, it also fixes
what I believe was undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.
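
For context, a sketch of what an option list looks like with the variadic
cl::values; before this change the argument list additionally had to be closed
with a sentinel (clEnumValEnd). The option and enum names here are invented:

  #include "llvm/Support/CommandLine.h"

  enum class ExampleMode { Fast, Precise };

  static llvm::cl::opt<ExampleMode> OptMode(
      "example-mode", llvm::cl::desc("Select the example mode"),
      llvm::cl::values(
          clEnumValN(ExampleMode::Fast, "fast", "Prefer compile time"),
          clEnumValN(ExampleMode::Precise, "precise", "Prefer precision")));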

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Michael Kruse 4b0c5aea78 [CodeGen] Add assertion for indirect array index expression generation. NFC.
Currently Polly cannot generate code for index expressions if the base pointer
is computed within the scop. The base pointer must be generated as well, but
there is no code that triggers that.

Add an assertion to detect when this would occur and lead to a miscompile. The IR verifier
should catch it as well.

llvm-svn: 282893
2016-09-30 18:29:37 +00:00
Michael Kruse 888ab55140 [CodeGen] Change 'Scalar' to 'Array' in method names. NFC.
generateScalarLoad() and generateScalarStore() are used for explicit (MK_Array)
memory accesses; therefore, the method names were misleading. The names were
also similar to generateScalarLoads() and generateScalarStores() (plural forms),
which indeed handle scalar accesses. Presumably, they were originally named to
contrast VectorBlockGenerator::generateLoad().

Rename the two methods to generateArrayLoad(),
respectively generateArrayStore().

llvm-svn: 282861
2016-09-30 14:34:05 +00:00
Michael Kruse 77394f1394 [CodeGen] Add assertion for partial scalar accesses. NFC.
The code generator always adds unconditional LoadInst and StoreInst, hence the
MemoryAccess must be defined over all statement instances.

llvm-svn: 282853
2016-09-30 14:01:46 +00:00
Tobias Grosser bc653f2031 GPGPU: Do not run mostly sequential kernels in GPU
In case sequential kernels are found deeper in the loop tree than any parallel
kernel, the overall scop is probably mostly sequential. Hence, run it on the
CPU.

llvm-svn: 281849
2016-09-18 08:31:09 +00:00
Tobias Grosser 82f2af3508 GPGPU: Dynamically ensure 'sufficient compute'
Offloading to a GPU is only beneficial if there is a sufficient amount of
compute that can be accelerated. Many kernels just have a very small amount
of dynamic compute, which means GPU acceleration is not beneficial. We
compute at run-time an approximation of how many dynamic instructions will be
executed and fall back to CPU code in case this number is not sufficiently
large. To keep the run-time checking code simple, we over-approximate the
number of instructions executed in each statement by computing the volume of
the rectangular hull of its iteration space.
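
A hedged sketch of the approximation described above (the threshold and all
names are invented; Polly builds the actual expression with isl at compile
time and evaluates it at run time):

  // Over-approximate the dynamic compute of a statement as the volume of the
  // rectangular hull of its iteration space times its instruction count.
  long approxDynamicInsts(const long *Lower, const long *Upper, int Dims,
                          long InstsPerIteration) {
    long Volume = 1;
    for (int d = 0; d < Dims; ++d)
      Volume *= Upper[d] - Lower[d];
    return Volume * InstsPerIteration;
  }

  // Fall back to the CPU version unless the approximated work is large enough.
  bool sufficientCompute(long ApproxInsts) {
    const long MinCompute = 1 << 20; // invented threshold
    return ApproxInsts >= MinCompute;
  }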

llvm-svn: 281848
2016-09-18 06:50:35 +00:00
Tobias Grosser 51dfc27589 GPGPU: Store back non-read-only scalars
We may generate GPU kernels that store into scalars in case we run some
sequential code on the GPU because the remaining data is expected to already be
on the GPU. For these kernels it is important to not keep the scalar values
in thread-local registers, but to store them back to the corresponding device
memory objects that back them up.

We currently only store scalars back at the end of a kernel. This is only
correct if precisely one thread is executed. In case more than one thread may
be run, we currently invalidate the scop. To support such cases correctly,
we would need to always load and store back from a corresponding global
memory slot instead of a thread-local alloca slot.

llvm-svn: 281838
2016-09-17 19:22:31 +00:00
Tobias Grosser fe74a7a1f5 GPGPU: Detect read-only scalar arrays ...
and pass these by value rather than by reference.

llvm-svn: 281837
2016-09-17 19:22:18 +00:00
Tobias Grosser aaabbbf886 GPGPU: Do not assume arrays start at 0
Our alias checks precisely check that the minimal and maximal accessed elements
do not overlap in a kernel. Hence, we must ensure that our host <-> device
transfers do not touch additional memory locations that are not covered in
the alias check. To ensure this, we make sure that the data we copy for a
given array is only the data from the smallest element accessed to the largest
element accessed.

We also adjust the size of the array according to the offset at which the array
is actually accessed.

An interesting result of this is: in case arrays are accessed with negative
subscripts, e.g., A[-100], we automatically allocate and transfer _more_ data to
cover the full array. This is important as such code indeed exists in the wild.

llvm-svn: 281611
2016-09-15 14:05:58 +00:00
Roman Gareev b3224adfb6 Perform copying to created arrays according to the packing transformation
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.
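
For orientation, a heavily simplified schematic of the loop structure described
above, with packing of the operands into the created arrays; the block sizes,
the row-major layout, and the scalar stand-in for the micro-kernel are
placeholders, and problem sizes are assumed to be multiples of the block sizes:

  // C += A * B following the BLIS structure: three loops around a macro-kernel
  // plus packing routines, with the macro-kernel wrapping a micro-kernel.
  void gemmPacked(int M, int N, int K, const double *A, const double *B,
                  double *C) {
    const int MC = 64, NC = 64, KC = 64;    // cache block sizes (placeholders)
    static double Ac[64 * 64], Bc[64 * 64]; // packed copies: the created arrays
    for (int jc = 0; jc < N; jc += NC)
      for (int pc = 0; pc < K; pc += KC) {
        for (int p = 0; p < KC; ++p)        // pack a KC x NC block of B into Bc
          for (int j = 0; j < NC; ++j)
            Bc[p * NC + j] = B[(pc + p) * N + (jc + j)];
        for (int ic = 0; ic < M; ic += MC) {
          for (int i = 0; i < MC; ++i)      // pack an MC x KC block of A into Ac
            for (int p = 0; p < KC; ++p)
              Ac[i * KC + p] = A[(ic + i) * K + (pc + p)];
          for (int i = 0; i < MC; ++i)      // macro-kernel on the packed blocks
            for (int j = 0; j < NC; ++j) {
              double Sum = C[(ic + i) * N + (jc + j)];
              for (int p = 0; p < KC; ++p)  // simplified micro-kernel stand-in
                Sum += Ac[i * KC + p] * Bc[p * NC + j];
              C[(ic + i) * N + (jc + j)] = Sum;
            }
        }
      }
  }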

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23260

llvm-svn: 281441
2016-09-14 06:26:09 +00:00
Tobias Grosser 0a893f7df4 GPGPU: Use const_cast to avoid compiler warning [NFC]
llvm-svn: 281333
2016-09-13 13:22:27 +00:00
Tobias Grosser a82c4b5df8 GPGPU: Allow region statements
llvm-svn: 281305
2016-09-13 08:42:10 +00:00
Tobias Grosser b79f4d3970 GPGPU: Extend types when array sizes have smaller types
This prevents a compiler crash.

llvm-svn: 281303
2016-09-13 08:02:14 +00:00
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser 5857b701a3 GPGPU: Bail out gracefully in case of invalid IR
Instead of aborting, we now bail out gracefully in case the kernel IR we
generate is invalid. This can currently happen in case the SCoP stores
pointer values, which we model as arrays, as data values into other arrays. In
this case, the original pointer value is not available on the device and can
consequently not be stored. As detecting this ahead of time is not so easy, we
detect these situations after the invalid IR has been generated and bail out.
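
A small, hypothetical example of the problematic pattern: the SCoP stores
pointer values (which Polly models as arrays) into another array, and the host
pointer value itself is not available on the device:

  // The value stored into Ptrs[i] is a host pointer into A; inside a GPU
  // kernel that value cannot be produced, so the generated kernel IR is
  // rejected and we fall back to the original code.
  void storePointers(float *A, float **Ptrs, int N) {
    for (int i = 0; i < N; i++)
      Ptrs[i] = &A[i];
  }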

llvm-svn: 281193
2016-09-12 06:06:31 +00:00
Tobias Grosser 02293ed755 GPGPU: Do not fail in case of arrays never accessed
If these arrays have never been accessed we failed to derive an upper bound
of the accesses and consequently a size for the outermost dimension. We
now explicitly check for empty access sets and then just use zero as size
for the outermost dimension.

llvm-svn: 281165
2016-09-11 13:30:12 +00:00
Tobias Grosser a3afe44d6c IslNodeBuilder: Add missing __isl_take annotation
llvm-svn: 281034
2016-09-09 11:16:50 +00:00
Tobias Grosser f3600dfa2d IslNodeBuilder: Add missing __isl_take annotations
llvm-svn: 280936
2016-09-08 13:48:55 +00:00
Tobias Grosser c80d6979bd Drop '@brief' from doxygen comments
LLVM's coding guideline suggests not using @brief for one-sentence doxygen
comments, to improve readability. Switch this once and for all to ensure people
do not copy @brief comments from other parts of Polly, when writing new code.
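
For illustration, the style difference in question (the comment text is made
up; with doxygen's autobrief setting the first sentence already serves as the
brief description):

  // Discouraged:
  ///  @brief Compute the schedule for this SCoP.
  // Preferred:
  ///  Compute the schedule for this SCoP.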

llvm-svn: 280468
2016-09-02 06:33:33 +00:00
Michael Kruse 2fa3519463 Allow mapping scalar MemoryAccesses to array elements.
Change the code around setNewAccessRelation to allow using an existing array
element for memory instead of an ad-hoc alloca. This facility will be used for
DeLICM/DeGVN to convert scalar dependencies into regular ones.

The changes necessary include:
- Make the code generator use the implicit locations instead of the alloca ones.
- A test case
- Make the JScop importer accept changes of scalar accesses for that test case.
- Adapt the MemoryAccess interface to the fact that the MemoryKind can change.
  They are named (get|is)OriginalXXX() to get the status of the memory access
  before any change by setNewAccessRelation() (some properties such as
  getIncoming() do not change even if the kind is changed and are still
  required). To get the modified properties, there is (get|is)LatestXXX(). The
  old accessors without Original|Latest become synonyms of the
  (get|is)OriginalXXX() to not make functional changes in unrelated code.

Differential Revision: https://reviews.llvm.org/D23962

llvm-svn: 280408
2016-09-01 19:53:31 +00:00
Tobias Grosser 1c18440958 [BlockGenerator] Invalidate SCEV values for instructions in scop
We already invalidated a couple of critical values earlier on, but we now
invalidate all instructions contained in a scop after the scop has been code
generated. This is necessary as later scops may otherwise obtain SCEV
expressions that reference values in the earlier scop that before dominated
the later scop, but which had been moved into the conditional branch and
consequently do not dominate the later scop any more. If these very values are
then used during code generation of the later scop, we would generate uses that
are not dominated by the values they use.

This fixes: http://llvm.org/PR28984

llvm-svn: 279047
2016-08-18 10:45:57 +00:00
Tobias Grosser d58acf866a [GPGPU] Ensure arrays where only parts are modified are copied to GPU
To do so we change the way array extents are computed. Instead of the precise
set of memory locations accessed, we now compute the extent as the range between
minimal and maximal address in the first dimension and the full extent defined
by the sizes of the inner array dimensions.

We also move the computation of the may_persist region after the construction
of the arrays, as it relies on array information. Without arrays being
constructed no useful information is computed at all.

llvm-svn: 278212
2016-08-10 10:58:19 +00:00
Tobias Grosser b06ff4574e [GPGPU] Support PHI nodes used in GPU kernel
Ensure the right scalar allocations are used as the host location of data
transfers. For the device code, we clear the allocation cache before device
code generation to be able to generate new device-specific allocations, and
we need to make sure to add back the old host allocations as soon as the
device code generation is finished.

llvm-svn: 278126
2016-08-09 15:35:06 +00:00
Tobias Grosser 750160e260 [GPGPU] Use separate basic block for GPU initialization code
This increases the readability of the IR and also clarifies that the GPU
initialization is executed _after_ the scalar initialization, which needs to
happen before the code of the transformed scop is executed.

Besides increased readability, the IR should not change. Specifically, I
do not expect any changes in program semantics due to this patch.

llvm-svn: 278125
2016-08-09 15:35:03 +00:00
Tobias Grosser 776700d0b7 [BlockGenerator] Insert initializations at beginning of start block
In case some code -- not guarded by control flow -- would be emitted directly in
the start block, it may happen that this code would use uninitialized scalar
values if the scalar initialization is only emitted at the end of the start
block. This is not a problem today in normal Polly, as all statements are
emitted in their own basic blocks, but Polly-ACC emits host-to-device copy
statements into the start block.

Additional Polly-ACC test coverage will be added in subsequent changes that
improve the handling of PHI nodes in Polly-ACC.

llvm-svn: 278124
2016-08-09 15:34:59 +00:00
Tobias Grosser c59b3ce044 [BlockGenerator] Also eliminate dead code not originating from BB
After having generated the code for a ScopStmt, we run a simple dead-code
elimination that drops all instructions that are known to be and remain unused.
Until this change, we only considered instructions for dead-code elimination if
they have a corresponding instruction in the original BB that belongs to the
ScopStmt. However, when generating code we do not only copy code from the BB
belonging to a ScopStmt, but also generate code for operands referenced from BB.
After this change, we now also consider code for dead-code elimination that
does not have a corresponding instruction in BB.

This fixes a bug in Polly-ACC where such dead-code referenced CPU code from
within a GPU kernel, which is possible as we do not guarantee that all variables
that are used in known-dead-code are moved to the GPU.

llvm-svn: 278103
2016-08-09 08:59:05 +00:00
Tobias Grosser cf66ef26f3 [GPGPU] Pass parameters always by using their own type
llvm-svn: 278100
2016-08-09 07:22:08 +00:00
Tobias Grosser 124534038a [GPGPU] Support Values referenced from both isl expr and llvm instructions
When adding code that avoids passing values used in isl expressions and
LLVM instructions twice, we forgot to make a single variable passed to the
kernel available in the ValueMap that makes it usable for instructions that
are not replaced with isl ast expressions. This change adds the variable
that is passed to the kernel to the ValueMap to ensure it is available
for such use cases as well.

llvm-svn: 278039
2016-08-08 19:22:19 +00:00
Tobias Grosser cb1aef8de4 [GPGPU] Create code to verify run-time conditions
llvm-svn: 278026
2016-08-08 17:35:55 +00:00
Tobias Grosser fa9abd1f03 Fix compilation in 'asserts' mode
llvm-svn: 278025
2016-08-08 17:35:52 +00:00
Tobias Grosser 0aa29532b7 [IslNodeBuilder] Move run-time check generation to NodeBuilder [NFC]
This improves the structure of the code and allows us to reuse the runtime
code generation in the PPCGCodeGeneration.

llvm-svn: 278017
2016-08-08 15:41:52 +00:00
Tobias Grosser 219feac456 [CodeGeneration] Do not set insert position redundantly
There is no need to reset the position of the builder, as we can just continue
to insert code at the current position of the IRBuilder, which happens to
be precisely the location we reset the builder to.

llvm-svn: 278014
2016-08-08 15:25:50 +00:00
Tobias Grosser 000db70754 [IslNodeBuilder] Directly use the insert location of our Builder
... instead of adding instructions at the end of the basic block the builder
is currently at. This makes it easier to reason about where IR is generated,
as with the IRBuilder there is just a single location that specifies where
IR is generated.

llvm-svn: 278013
2016-08-08 15:25:46 +00:00
Michael Kruse fbde435517 [CodeGen] Use MapVector instead of DenseMap.
The map is iterated over when generating the values escaping the SCoP. The
non-deterministic iteration order of DenseMap causes the output IR to change at
every compilation, adding noise to comparisons.

Replace DenseMap by a MapVector to ensure the same iteration order at every
compilation.
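
A minimal sketch of the substitution described (illustrative declarations, not
the exact types used in Polly):

  #include "llvm/ADT/MapVector.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Value.h"

  // Before: hash-ordered, so iteration order depends on pointer values.
  //   llvm::DenseMap<llvm::Instruction *, llvm::Value *> EscapeMap;
  // After: MapVector preserves insertion order, giving deterministic output IR.
  llvm::MapVector<llvm::Instruction *, llvm::Value *> EscapeMap;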

llvm-svn: 277832
2016-08-05 16:45:51 +00:00
Tobias Grosser 928d7573dd GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly
Before this commit we generated the array type in reverse order and we also
added the outermost dimension size to the new array declaration, which is
incorrect as Polly assumes an additional unsized outermost
dimension, such that we had an off-by-one error in the linearization of access
expressions.

llvm-svn: 277802
2016-08-05 08:27:24 +00:00
Tobias Grosser c1c6a2a61b GPGPU: Add cuda annotations to specify maximal number of threads per block
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.

llvm-svn: 277799
2016-08-05 06:47:43 +00:00
Tobias Grosser f919d8b360 GPGPU: Support scalars that are mapped to shared memory
llvm-svn: 277726
2016-08-04 13:57:29 +00:00
Tobias Grosser 8950cead7f GPGPU: Disable verbose debug output
llvm-svn: 277724
2016-08-04 12:44:03 +00:00
Tobias Grosser b0dd95bcd2 Remove leftover debug output
llvm-svn: 277723
2016-08-04 12:41:28 +00:00
Tobias Grosser 130ca30f92 GPGPU: Add private memory support
llvm-svn: 277722
2016-08-04 12:39:03 +00:00
Tobias Grosser b513b4916b GPGPU: Add support for shared memory
llvm-svn: 277721
2016-08-04 12:18:14 +00:00
Tobias Grosser 00bb5a99f5 GPGPU: Handle scalar array references
Pass the content of scalar array references to the alloca on the kernel side
and do not additionally pass them as normal LLVM scalar values.

llvm-svn: 277699
2016-08-04 06:55:59 +00:00
Tobias Grosser 3216f8546c BlockGenerator: Assert that we do not get alloca of array access
llvm-svn: 277698
2016-08-04 06:55:53 +00:00
Tobias Grosser 576932728d GPGPU: Pass subtree values correctly to the kernel
llvm-svn: 277697
2016-08-04 06:55:49 +00:00
Tobias Grosser 629109b633 GPGPU: Mark kernel functions as polly.skip
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves, which does not work as the
GPURuntime is not available on the accelerator and also does not make any
sense.

llvm-svn: 277589
2016-08-03 12:00:07 +00:00
Tobias Grosser 2219d15748 Fix a couple of spelling mistakes
llvm-svn: 277569
2016-08-03 05:28:09 +00:00
Roman Gareev d7754a1245 Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions
Extend the jscop interface to allow the user to export arrays. It is required
that the already existing arrays in the list of arrays correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array in case it has the same element type.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D22828

llvm-svn: 277263
2016-07-30 09:25:51 +00:00
Tobias Grosser d8b94bcac1 GPGPU: Pass context parameters to GPU kernel
llvm-svn: 276963
2016-07-28 06:47:59 +00:00
Tobias Grosser a490147c90 GPGPU: Pass host iterators to kernel
llvm-svn: 276962
2016-07-28 06:47:56 +00:00
Tobias Grosser 44143bb927 GPGPU: use current 'Index' to find slot in parameter array
Before this change we used the array index, which would result in us accessing
the parameter array out-of-bounds. This bug was visible for test cases where not
all arrays in a scop are passed to a given kernel.

llvm-svn: 276961
2016-07-28 06:47:53 +00:00
Tobias Grosser 4e18d71c71 GPGPU: Generate kernel parameter allocation with right size
Before this change we miscounted the number of function parameters.

llvm-svn: 276960
2016-07-28 06:47:50 +00:00
Tobias Grosser 79a947c233 GPGPU: Add basic support for kernel launches
llvm-svn: 276863
2016-07-27 13:20:16 +00:00
Tobias Grosser 5779359624 GPGPU: Load GPU kernels
We embed the PTX code into the host IR as a global variable and compile it
at run-time into a GPU kernel.
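
A hedged sketch of the mechanism with the CUDA driver API (Polly's GPURuntime
wraps calls of this kind; names are simplified and error checking is omitted):

  #include <cuda.h>

  // The PTX string produced at compile time is embedded in the host module as
  // a global variable and JIT-compiled into a kernel when the SCoP executes.
  CUfunction loadEmbeddedKernel(const char *PTXString, const char *KernelName) {
    CUmodule Module;
    cuModuleLoadData(&Module, PTXString);             // JIT-compile the PTX
    CUfunction Kernel;
    cuModuleGetFunction(&Kernel, Module, KernelName); // look up the kernel
    return Kernel;
  }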

llvm-svn: 276645
2016-07-25 16:31:21 +00:00
Tobias Grosser 13c78e4d51 GPGPU: Emit data-transfer code
Also factor out getArraySize() to avoid code duplication and reorder some
function arguments to indicate the direction in which data is transferred.

llvm-svn: 276636
2016-07-25 12:47:39 +00:00
Tobias Grosser 7287aeddf1 GPGPU: Complete code to allocate and free device arrays
At the beginning of each SCoP, we allocate device arrays for all arrays
used on the GPU and we free such arrays after the SCoP has been executed.

llvm-svn: 276635
2016-07-25 12:47:33 +00:00
Tobias Grosser fa7b080218 GPGPU: initialize GPU context and simplify the corresponding GPURuntime interface.
There is no need to expose the selected device at the moment. We also pass back
pointers as return values, as this simplifies the interface.

llvm-svn: 276623
2016-07-25 09:16:01 +00:00
Tobias Grosser 8ed5e5999f IslNodeBuilder: Make finalize() virtual
This allows the finalization routine of the IslNodeBuilder to be overwritten
by derived classes. Being here, we also drop the unnecessary 'Scop' postfix
and the unnecessary 'Scop' parameter.

llvm-svn: 276622
2016-07-25 09:15:57 +00:00
Tobias Grosser 9a18d55947 GPGPU: Optimize kernel IR before generating assembly code
We optimize the kernel _after_ dumping the IR we generate to make the IR we
dump easier to read and independent of possible changes in the general
purpose LLVM optimizers.

llvm-svn: 276551
2016-07-24 06:43:21 +00:00
Tobias Grosser e1a98343a1 GPGPU: Verify kernel IR before generating assembly
llvm-svn: 276550
2016-07-24 06:43:17 +00:00
Tobias Grosser 74dc3cb431 GPGPU: Generate PTX assembly code for the kernel modules
Run the NVPTX backend over the GPUModule IR and write the resulting assembly
code in a string.

To work correctly, it is important to invalidate analysis results that still
reference the IR in the kernel module. Hence, this change clears all references
to dominators, loop info, and scalar evolution.

Finally, the NVPTX backend has trouble generating code for various special
floating point types (not surprising), but also for uncommon integer types. This
commit does not resolve these issues, but pulls out problematic test cases into
separate files to XFAIL them individually and resolve them in future (not
immediate) changes one by one.

llvm-svn: 276396
2016-07-22 07:11:12 +00:00
Tobias Grosser edb885cb12 GPGPU: generate code for ScopStatements
This change introduces the actual compute code in the GPU kernels. To ensure
all values referenced from the statements in the GPU kernel are indeed available
we scan all ScopStmts in the GPU kernel for references to llvm::Values that
are not yet covered by already modeled outer loop iterators, parameters, or
array base pointers and also pass these additional llvm::Values to the
GPU kernel.

For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which
is referenced by the newly generated access functions within the GPU kernel and
which is used to help with code generation.

llvm-svn: 276270
2016-07-21 13:15:59 +00:00
Tobias Grosser 86083da0ec IslNodeBuilder: expose addReferencesFromStmt [NFC]
This will be used by Polly GPGPU to determine the values that need to be
passed to GPU kernels.

llvm-svn: 276269
2016-07-21 13:15:55 +00:00
Tobias Grosser 04b909fcca IslExprBuilder: allow to specify an external isl_id to ScopArrayInfo mapping
This is useful for external users using IslExprBuilder, in case they cannot
embed ScopArrayInfo data into their isl_ids, because the isl_ids either already
carry other information or the isl_ids have been created and their user pointers
cannot be updated any more.

llvm-svn: 276268
2016-07-21 13:15:51 +00:00
Tobias Grosser 9d12d8ade3 BlockGenerator: remove dead instructions in normal statements
This ensures that no trivially dead code is generated. This is not only cleaner,
but also avoids trouble in case code is generated in a separate function and
some of this dead code contains references to values that are not available.
This issue may happen, in case the memory access functions have been updated
and old getelementptr instructions remain in the code. With normal Polly,
a test case is difficult to draft, but the upcoming GPU code generation can
possibly trigger such problems. We will later extend this dead-code elimination
to region and vector statements.

llvm-svn: 276263
2016-07-21 11:48:36 +00:00
Tobias Grosser 2d58a64e7f GPGPU: Bail out of scops with hoisted invariant loads
This is currently not supported and will only be added later. Also update the
test cases to ensure no invariant code hoisting is applied.

llvm-svn: 275987
2016-07-19 15:56:25 +00:00
Tobias Grosser 5260c041ea GPGPU: Emit in-kernel synchronization statements
We use this opportunity to further classify the different user statements that
can arise and add TODOs for the ones not yet implemented.

llvm-svn: 275957
2016-07-19 07:33:16 +00:00
Tobias Grosser 59ab070523 GPGPU: generate control flow within the kernel
llvm-svn: 275956
2016-07-19 07:33:11 +00:00