Given the following code
for (i = 0; i < 10; i++) {
;
}
S: A[i] = 0
When code generating S using scev based code generation, we need to retrieve
the scev of 'i' at the location of 'S'. If we do not do this the scev that
we obtain will be expressed as {0,+,1}_for and will reference loop iterators
that do not surround 'S' and that we consequently do not know how to code
generate. What we really want is the scev to be instantiated to the value of 'i'
after the loop. This value is {10} and it can be code generated without
troubles.
llvm-svn: 177777
Scev code generation can now handle scops with non canonical induction
variables. Hence there is no need to introduce canonical ones any more.
llvm-svn: 177644
We now detect scops without a canonical induction variable and can generate a
polyhedral representation for them. There was no modification necessary to
code generate these scops.
llvm-svn: 177643
When using the scev based code generation, we now do not rely on the presence
of a canonical induction variable any more. This commit prepares the path to
(conditionally) disable the induction variable canonicalization pass.
llvm-svn: 177548
This allows us to test Polly and the Polly optimizer without actually doing
code generation at the end. By enabling this option, we can also measure the
compile time overhead due to code generation and the cost of LLVM optimizing the
newly generated code.t
llvm-svn: 177516
When doing SCEV based code generation, we ignore instructions calculating values
that are fully defined by a SCEV expression. The values that are calculated by
this instructions are recalculated on demand.
This commit improves the check to verify if certain instructions can be ignored
and recalculated on demand.
llvm-svn: 177313
We now show the all members of the alias set that may couse possible aliasing.
In case a alias set member is not a named instruction (unnamed instructions or
constant expressions), we show the expression itself.
This improves our error message
from:
Possible aliasing for value: .reg2mem
to:
Possible aliasing: ".reg2mem",
"[0 x double]* inttoptr (i64 47255179264 to [0 x double]*)
llvm-svn: 174329
We fix the following formatting problems found by clang-format:
- 80 cols violations
- Obvious problems with missing or too many spaces
- multiple new lines in a row
clang-format suggests many more changes, most of them falling in the following
two categories:
1) clang-format does not at all format a piece of code nicely
2) The style that clang-format suggests does not match the style used in
Polly/LLVM
I consider differences caused by reason 1) bugs, which should be fixed by
improving clang-format. Differences due to 2) need to be investigated closer
to understand the cause of the difference and the solution that should be taken.
llvm-svn: 171241
Recent changes in isl:
- Allow analysis of loops during code generation
This simplifies the detection of parallel loops.
- Simplify the way costumized ast printers are defined
This enables us to highlight parallel / vector loops in our debug output.
- Compile time improvements for codegen contexts that include parameters
- Various bug fixes
This update also gets us in sync for the isl 0.11 release.
llvm-svn: 169100
generation.
We don't use the exact same way to build loop body for GPGPU codegen as openmp
codegen and other transformations do currently, in which cases 'createLoop'
function is called recursively. GPGPU codegen may fail due to improper restore
of ValueMap and ClastVars .
Contributed by: Yabin Hu <yabin.hwu@gmail.com>
llvm-svn: 168966
Caught while compiling polly without cloog:
../tools/polly/lib/RegisterPasses.cpp:77: error: use of enum 'CodegenChoice' without previous declaration
llvm-svn: 168624
Instead of calculating exact value (flow) dependences, it is also possible to
calculate memory based dependences. Sometimes memory based dependences are a lot
easier to calculate. To evaluate the benefits, we add an option to calculate
memory based dependences (use -polly-value-dependences=false).
llvm-svn: 167251
If the flags '-polly-report -g' are given, we print file name and line numbers
for the beginning and end of all detected scops.
linear-algebra/kernels/gemm/gemm.c:23: Scop start
linear-algebra/kernels/gemm/gemm.c:42: Scop end
linear-algebra/kernels/gemm/gemm.c:77: Scop start
linear-algebra/kernels/gemm/gemm.c:82: Scop end
llvm-svn: 167235
The detection of values that need to be copied in to the generated OpenMP
subfunction also detects the array base addresses needed in the SCoP. Hence, it
is not necessary to unconditionally copy all the base addresses to the generated
function.
Test cases are modified to reflect this change. Arrays which are global
variables do not occur in the struct passed to the subfunction anymore. A test
case for base address copy-in is added in copy_in_array.{c,ll}.
Committed with slight modifications
Contributed by: Armin Groesslinger <armin.groesslinger@uni-passau.de>
llvm-svn: 167215
In addition to the arrays and clast variables a SCoP statement may also refer to
values defined before the SCoP or to function arguments. Detect these values and
add them to the set of values passed to the function generated for OpenMP
parallel execution of a clast.
Committed with additional test cases and some refactoring.
Contributed by: Armin Groesslinger <armin.groesslinger@uni-passau.de>
llvm-svn: 167214
When generating OpenMP or GPGPU code the original ValueMap and ClastVars must be
kept. We already recovered the original ClastVars by reverting the changes, but
we did not keep the content of the ValueMap. This patch keeps now an explicit
copy of both maps and restores them after generating OpenMP or GPGPU code.
This is an adapted version of a patch contributed by:
Armin Groesslinger <armin.groesslinger@uni-passau.de>
llvm-svn: 167213
I like to make w/o being able to build, but I don't have the dependencies to
build and test polly. I'll revert if the build bots don't like it.
llvm-svn: 166670
This change ensures that isl is only detected if it includes code generation
support. This allows us to remove a lot of conditional compilation and also
avoids missing test cases in case the feature is not available.
llvm-svn: 166403
The bug was within isl. To fix it, we simply update the isl version that
is used by Polly. We still have some changes within Polly to be able to
write a proper test case.
Reported-by: Sameer Sahasrabuddhe <Sameer.Sahasrabuddhe@amd.com>
llvm-svn: 166021
Previously isl always generated '<=' or '>='. However, in many cases '<' or '>'
leads to simpler code. This commit updates isl and adds the relevant code
generation support to Polly.
llvm-svn: 166020
Scoplib only supports access functions, but not the more generic
access relations. This commit now also supports access functions
that where not directly expresses as A[sub] with sub = i + 5b,
but with A[sub] with -sub = -i + (-5b).
Test case to come.
Contributed by: Dustin Feld <d3.feld@gmail.com>
llvm-svn: 165379
This pass implements a new code generator that uses the code generation
algorithm included in isl.
For the moment the new code generation is limited to sequential code.
llvm-svn: 165037
Older versions of libpluto crashed, if no schedule was found. Recent
versions return NULL. We detect this and keep the original schedule.
llvm-svn: 164376
This ensures that the isl sets/maps we operate on have the same parameter
dimensions. Operations on objects with different parameter dimensions are not
allow and trigger assertions.
llvm-svn: 163618
The IndVarSimplify pass in Polly uses the intrinsics header. We need to ensure
that the header is generated, before we use it. This patch fixes the problem
for the cmake build (it did not show up in the autoconf one).
Contributed by: Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
llvm-svn: 163130
This includes:
- The isl_id of the domain of the scattering must be copied from the original
domain
- Remove outdated references to a 'FinalRead' statement
- Print of the Pocc output, if -debug is provided.
- Add line breaks to some error messages.
Reported and Debugged by: Dustin Feld <d3.feld@gmail.com>
llvm-svn: 162901
Translate the selected parallel loop body into a ptx string and run it with the
cuda driver API. We limit this preliminary implementation to target the
following special test cases:
- Support only 2-dimensional parallel loops with or without only one innermost
non-parallel loop.
- Support write memory access to only one array in a SCoP.
The patch was committed with smaller changes to the build system:
There is now a flag to enable gpu code generation explictly. This was required
as we need the llvm.codegen() patch applied on the llvm sources, to compile this
feature correctly. Also, enabling gpu code generation does not require cuda.
This requirement was removed to allow 'make polly-test' runs, even without an
installed cuda runtime.
Contributed by: Yabin Hu <yabin.hwu@gmail.com>
llvm-svn: 161239
The Apple linker fails by default, if some function calls can not be resolved at
link time. However, all functions that are part of LLVM itself will not be
linked into Polly, but will be provided by the compiler that Polly is loaded
into. Hence, during linking we need to ignore failures due to unresolved
function calls.
llvm-svn: 161234
Cast instruction do not have side effects and can consequently be part of a
scop. We special cased them earlier, as they may be problematic within array
subscripts or loop bounds. However, the scalar evolution validator already
checks for them such that there is no need to also check the instructions within
the basic blocks. Checking them is actually overly conservative as the precence
of casts may invalidate a scop, even though scalar evolution is not influenced
by it.
llvm-svn: 160261
I did not take into account, that this patch fails to compile without the
llvm.codegen patch applied. This breaks buildbots.
I revert this until we found a solution to commit this without buildbots
complaining.
This reverts commit cb43ab80e94434e780a66be3b9a6ad466822fe33.
llvm-svn: 160165
Translate the selected parallel loop body into a ptx string and run it
with cuda driver API. We limit this preliminary implementation to
target the following special test cases:
- Support only 2-dimensional parallel loops with or without only one
innermost non-parallel loop.
- Support write memory access to only one array in a SCoP.
Contributed by: Yabin Hu <yabin.hwu@gmail.com>
llvm-svn: 160164
CLooG and the CLooG based code generation does not yet correctly derive the
types of the expressions, but just uses i64 for everything. This is incorrect,
but works normally pretty well. However, the recent change of adding parameter
bounds to the context made CLooG generate expressions that contain a lot of very
large integers that possibly don't fit into an i64. This broke the code
generation for several benchmarks.
To get the CLooG based code generation working again, we just don't take into
account any constraints in the context. This brings us back to the theoretical
incorrect, but in practice generally correct code.
The next step will be the isl based code generation. Here we will derive
automatically correct types.
llvm-svn: 158015
Store a pointer to each ScopStmt in the isl_id associated with the space of its
domain. This will later allow us to recover the statement during code
generation with isl.
llvm-svn: 157607
Derive the maximal and minimal values of a parameter from the type it has. Add
this information to the scop context. This information is needed, to derive
optimal types during code generation.
llvm-svn: 157245
There is no need for special code to handle SCEVUnknowns. SCEVUnkowns are always
parameters and will be handled by the generic parameter handling code in
visit().
llvm-svn: 157243
This is an incomplete implementation of the SCEV based code generation.
When finished it will remove the need for -indvars -enable-iv-rewrite.
For the moment it is still disabled. Even though it passes 'make polly-test',
there are still loose ends especially in respect of OpenMP code generation.
llvm-svn: 155717
This fixes two crashes that appeared in case of:
- A load of a non vectorizable type (e.g. float**)
- An instruction that is not vectorizable (e.g. call)
llvm-svn: 154586
Grouped unrolling means that we unroll a loop such that the different instances
of a certain statement are scheduled right after each other, but we do
not generate any vector code. The idea here is that we can schedule the
bb vectorizer right afterwards and use it heuristics to decide when
vectorization should be performed.
llvm-svn: 154251
To avoid overflows we still use a larger type (i64) while calculating the value
of the old ivs. However, we truncate the result to the type of the old iv when
providing it to the new code.
A corresponding test case is added to the polly test suite. Also, a failing test
case is fixed.
This fixes PR12311.
Contributed by: Tsingray Liu <tsingrayliu@gmail.com>
llvm-svn: 153952
When deriving new values for the statements of a SCoP, we assumed that parameter
values are constant within the SCoP and consquently do not need to be rewritten.
For OpenMP code generation this assumption is wrong, as such values are not
available in the OpenMP subfunction and consequently also may need to be
rewritten.
Committed with some changes.
Contributed-By: Johannes Doerfert <s9jodoer@stud.uni-saarland.de>
llvm-svn: 153838
We create a new file LoopGenerators that provides utility classes for the
generation of OpenMP parallel and scalar loops. This means we move a lot
of the OpenMP generation out of the Polly specific code generator.
llvm-svn: 153325