This ensures that no trivially dead code is generated. This is not only cleaner,
but also avoids troubles in case code is generated in a separate function and
some of this dead code contains references to values that are not available.
This issue may happen, in case the memory access functions have been updated
and old getelementptr instructions remain in the code. With normal Polly,
a test case is difficult to draft, but the upcoming GPU code generation can
possibly trigger such problems. We will later extend this dead-code elimination
to region and vector statements.
llvm-svn: 276263
It seems the order in which we generated memory accesses changed such that
the import of these updated memory accesses failed for the 'loop3' statement
in this test case. Unfortunately, the existing CHECK lines were not strict
enough to catch this. Hence, besides fixing the order of the memory access
lines we also ensure that the memory access changes are both clearly visibly
and well checked.
llvm-svn: 276247
This is currently not supported and will only be added later. Also update the
test cases to ensure no invariant code hoisting is applied.
llvm-svn: 275987
We use this opportunity to further classify the different user statements that
can arise and add TODOs for the ones not yet implemented.
llvm-svn: 275957
Create for each kernel a separate LLVM-IR module containing a single function
marked as kernel function and taking one pointer for each array referenced
by this kernel. Add debugging output to verify the kernels are generated
correctly.
llvm-svn: 275952
Initialize the list of references to a GPU array to ensure that the arrays that
need to be passed to kernel calls are computed correctly. Furthermore, the very
same information is also necessary to compute synchronization correctly. As the
functionality to compute these references is already available, what is left for
us to do is only to connect the necessary functionality to compute array
reference information.
llvm-svn: 275798
Create LLVM-IR for all host-side control flow of a given GPU AST. We implement
this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder. The
IslNodeBuilder will take care of generating all general-purpose ast nodes, but
we provide our own createUser implementation to handle the different GPU
specific user statements. For now, we just skip any user statement and only
generate a host-code sceleton, but in subsequent commits we will add handling of
normal ScopStmt's performing computations, kernel calls, as well as host-device
data transfers. We will also introduce run-time check generation and LICM in
subsequent commits.
llvm-svn: 275783
Otherwise ppcg would try to call into pet functionality that this not available,
which obviously will cause trouble. As we can easily print these statements
ourselves, we just do so.
llvm-svn: 275579
This option increases the scalability of the scheduler and allows us to remove
the 'gisting' workaround we introduced in r275565 to handle a more complicated
test case. Another benefit of using this option is also that the generated
code looks a lot more streamlined.
Thanks to Sven Verdoolaege for reminding me of this option.
llvm-svn: 275573
This works around a shortcoming of the isl scheduler, which even for some
smaller test cases does not terminate in case domain constraints are part
of the flow dependences.
llvm-svn: 275565
Arrays with integer base type are similar to arrays with floating point types,
with the exception that LLVM's integer types can take some odd values. We
add a selection of different values to make sure we correctly round these
types when necessary.
References to scalar integer types are special, as we currently do not model
these types as array accesses as they are considered 'synthesizable' by Polly.
As a result, we do not generate explicit data-transfers for them, but instead
will need to keep track of all references to 'synthesizable' values separately.
At the current stage, this is only visible by missing host-to-device
data-transfer calls. In the future, we will also require special code generation
strategies.
llvm-svn: 275551
We currently only test that the code structure we generate for these scalar
parameters is correct and we add these types to make sure later code generation
additions have sufficient test coverage.
In case some of these types cannot be mapped due to missing hardware support
on the GPU some of these test cases may need to be updated later on.
llvm-svn: 275548
A sequence of CHECK lines allows additional statements to appear in the
output of the tested program without any test failures appearing. As we do
not want this to happen, switch this test case to use CHECK-NEXT.
llvm-svn: 275534
For this we need to provide an explicit list of statements as they occur in
the polly::Scop to ppcg.
We also setup basic AST printing facilities to facilitate debugging. To allow
code reuse some (minor) changes in ppcg are have been necessary.
llvm-svn: 275436
The tile size was previously uninitialized. As a result, it was often zero (aka.
no tiling), which is not what we want in general. More importantly, there was
the risk for arbitrary tile sizes to be choosen, which we did not observe, but
which still is highly problematic.
llvm-svn: 275418
This change now applies ppcg's GPU mapping on our initial schedule. For this
to work, we need to also initialize the set of all names (isl_ids) used in
the scop as well as the program context.
llvm-svn: 275396
To do so we copy the necessary information to compute an initial schedule from
polly::Scop to ppcg's scop. Most of the necessary information is directly
available and only needs to be passed on to ppcg, with the exception of 'tagged'
access relations, access relations that additionally carry information about
which memory access an access relation originates from.
We could possibly perform the construction of tagged accesses as part of
ScopInfo, but as this format is currently specific to ppcg we do not do this
yet, but keep this functionality local to our GPU code generation.
After the scop has been initialized, we compute data dependences and ask ppcg to
compute an initial schedule. Some of this functionality is already available in
polly::DependenceInfo and polly::ScheduleOptimizer, but to keep differences
to ppcg small we use ppcg's functionality here. We may later investiage if
a closer integration of these tools makes sense.
llvm-svn: 275390
At this stage, we do not yet modify the IR but just generate a default
initialized ppcg_scop and gpu_prog and free both immediately. Both will later be
filled with data from the polly::Scop and are needed to use PPCG for GPU
schedule generation. This commit does not yet perform any GPU code generation,
but ensures that the basic infrastructure has been put in place.
We also add a simple test case to ensure the new code is run and use this
opportunity to verify that GPU_CODEGEN tests are only run if GPU code generation
has been enabled in cmake.
llvm-svn: 275389
Check not only that the compiler is not crashing, but also whether the
probablematic part (The sequence of instructions simplified to '4') is reflected
in the output.
Thanks to Tobias for the hint.
llvm-svn: 275189
An assertion in visitSDivInstruction() checked whether the divisor is constant
by checking whether the argument is a ConstantInt. However, SCEVValidator allows
the divisor to be simplified to a constant by ScalarEvolution.
We synchronize the implementation of SCEVValidator and SCEVAffinator to both
accept simplified SCEV expressions.
llvm-svn: 275174
For llvm the memory accesses from nonaffine loops should be visible,
however for polly those nonaffine loops should be invisible/boxed.
This fixes llvm.org/PR28245
Cointributed-by: Huihui Zhang <huihuiz@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D21591
llvm-svn: 274842
Reject and report regions that contains loops overlapping nonaffine region.
This situation typically happens in the presence of inifinite loops.
This addresses bug llvm.org/PR28071.
Differential Revision: http://reviews.llvm.org/D21312
Contributed-by: Huihui Zhang <huihuiz@codeaurora.org>
llvm-svn: 273905
This patch addresses:
- A new function pass to compute polyhedral dependences. This is
required to avoid the region pass manager.
- Stores a map of Scop to Dependence object for all the scops present
in a function. By default, access wise dependences are stored.
Patch by Utpal Bora <cs14mtech11017@iith.ac.in>
Differential Revision: http://reviews.llvm.org/D21105
llvm-svn: 273881
This patch adds a new function pass ScopInfoWrapperPass so that the
polyhedral description of a region, the SCoP, can be constructed and
used in a function pass.
Patch by Utpal Bora <cs14mtech11017@iith.ac.in>
Differential Revision: http://reviews.llvm.org/D20962
llvm-svn: 273856
This is the first patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel,
plus two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update.
In this change we create the BLIS micro-kernel by applying
a combination of tiling and unrolling. In subsequent changes
we will add the extraction of the BLIS macro-kernel
and implement the packing transformation.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D21140
llvm-svn: 273397
With this update the isl AST generation extracts disjunctive constraints early
on. As a result, code that previously resulted in two branches with (close-to)
identical code within them:
if (P <= -1) {
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
} else if (P >= 1)
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
results now in only a single branch body:
if (P <= -1 || P >= 1)
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
This resolves http://llvm.org/PR27559
Besides the above change, this isl update brings better simplification of
sets/maps containing existentially quantified dimensions and fixes a bug in
isl's coalescing.
llvm-svn: 272500
As these test cases will be changed in a subsequent commit, we expand and
tighten them to make the subsequent changes to them more obvious. As part of
this we add more context to some test cases and add CHECK-NEXT lines to ensure
no intermediate lines are missed by accident.
llvm-svn: 272499
IntToPtr and PtrToInt instructions are basically no-ops that we can handle as
such. In order to generate them properly as parameters we had to improve the
ScopExpander, though the change is the first in the direction of a more
aggressive scalar synthetization.
This patch was originally contributed by Johannes Doerfert in r271888, but was
in conflict with the revert in r272483. This is a recommit with some minor
adjustment to the test cases to take care of differing instruction names.
llvm-svn: 272485
The recent expression type changes still need more discussion, which will happen
on phabricator or on the mailing list. The precise list of commits reverted are:
- "Refactor division generation code"
- "[NFC] Generate runtime checks after the SCoP"
- "[FIX] Determine insertion point during SCEV expansion"
- "Look through IntToPtr & PtrToInt instructions"
- "Use minimal types for generated expressions"
- "Temporarily promote values to i64 again"
- "[NFC] Avoid unnecessary comparison for min/max expressions"
- "[Polly] Fix -Wunused-variable warnings (NFC)"
- "[NFC] Simplify min/max expression generation"
- "Simplify the type adjustment in the IslExprBuilder"
Some of them are just reverted as we would otherwise get conflicts. I will try
to re-commit them if possible.
llvm-svn: 272483
This patch refactors the code generation for divisions. This allows to
always generate a shift for a power-of-two division and to utilize
information about constant divisors in order to truncate the result
type.
llvm-svn: 271898
We now generate runtime checks __after__ the SCoP code generation and
not before, though they are still inserted at the same position int
the code. This allows to modify the runtime check during SCoP code
generation.
llvm-svn: 271894
IntToPtr and PtrToInt instructions are basically no-ops that we can handle as
such. In order to generate them properly as parameters we had to improve the
ScopExpander, though the change is the first in the direction of a more
aggressive scalar synthetization.
llvm-svn: 271888
We now use the minimal necessary bit width for the generated code. If
operations might overflow (add/sub/mul) we will try to adjust the types in
order to ensure a non-wrapping computation. If the type adjustment is not
possible, thus the necessary type is bigger than the type value of
--polly-max-expr-bit-width, we will use assumptions to verify the computation
will not wrap. However, for run-time checks we cannot build assumptions but
instead utilize overflow tracking intrinsics.
llvm-svn: 271878