Providing the context to the ast generator allows for additional simplifcations
and -- more importantly -- allows to generate loops with only partially bounded
domains, assuming the domains are bounded for all parameter configurations
that are valid as defined by the context.
This change fixes the crash reported in http://llvm.org/PR30956
The original reason why we did not include the context when generating an
AST was that CLooG and later isl used to sometimes transfer some of the
constraints that bound the size of parameters from the context into the
generated AST. This resulted in operations with very large constants, which
sometimes introduced problematic integer overflows. The latest versions of
the isl AST generator are careful to not introduce such constants.
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286442
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D23991
llvm-svn: 281234
Change the code around setNewAccessRelation to allow to use a an existing array
element for memory instead of an ad-hoc alloca. This facility will be used for
DeLICM/DeGVN to convert scalar dependencies into regular ones.
The changes necessary include:
- Make the code generator use the implicit locations instead of the alloca ones.
- A test case
- Make the JScop importer accept changes of scalar accesses for that test case.
- Adapt the MemoryAccess interface to the fact that the MemoryKind can change.
They are named (get|is)OriginalXXX() to get the status of the memory access
before any change by setNewAccessRelation() (some properties such as
getIncoming() do not change even if the kind is changed and are still
required). To get the modified properties, there is (get|is)LatestXXX(). The
old accessors without Original|Latest become synonyms of the
(get|is)OriginalXXX() to not make functional changes in unrelated code.
Differential Revision: https://reviews.llvm.org/D23962
llvm-svn: 280408
Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23740
llvm-svn: 279395
We already invalidated a couple of critical values earlier on, but we now
invalidate all instructions contained in a scop after the scop has been code
generated. This is necessary as later scops may otherwise obtain SCEV
expressions that reference values in the earlier scop that before dominated
the later scop, but which had been moved into the conditional branch and
consequently do not dominate the later scop any more. If these very values are
then used during code generation of the later scop, we generate used that are
dominated by the values they use.
This fixes: http://llvm.org/PR28984
llvm-svn: 279047
This will make it easier to switch the default of Polly's invariant load
hoisting strategy and also makes it very clear that these test cases
indeed require invariant code hoisting to work.
llvm-svn: 278667
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D22187
llvm-svn: 278666
In case some code -- not guarded by control flow -- would be emitted directly in
the start block, it may happen that this code would use uninitalized scalar
values if the scalar initialization is only emitted at the end of the start
block. This is not a problem today in normal Polly, as all statements are
emitted in their own basic blocks, but Polly-ACC emits host-to-device copy
statements into the start block.
Additional Polly-ACC test coverage will be added in subsequent changes that
improve the handling of PHI nodes in Polly-ACC.
llvm-svn: 278124
Extend the jscop interface to allow the user to export arrays. It is required
that already existing arrays of the list of arrays correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array in case it has the same element type.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D22828
llvm-svn: 277263
It seems the order in which we generated memory accesses changed such that
the import of these updated memory accesses failed for the 'loop3' statement
in this test case. Unfortunately, the existing CHECK lines were not strict
enough to catch this. Hence, besides fixing the order of the memory access
lines we also ensure that the memory access changes are both clearly visibly
and well checked.
llvm-svn: 276247
With this update the isl AST generation extracts disjunctive constraints early
on. As a result, code that previously resulted in two branches with (close-to)
identical code within them:
if (P <= -1) {
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
} else if (P >= 1)
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
results now in only a single branch body:
if (P <= -1 || P >= 1)
for (int c0 = 0; c0 < N; c0 += 1)
Stmt_store(c0);
This resolves http://llvm.org/PR27559
Besides the above change, this isl update brings better simplification of
sets/maps containing existentially quantified dimensions and fixes a bug in
isl's coalescing.
llvm-svn: 272500
As these test cases will be changed in a subsequent commit, we expand and
tighten them to make the subsequent changes to them more obvious. As part of
this we add more context to some test cases and add CHECK-NEXT lines to ensure
no intermediate lines are missed by accident.
llvm-svn: 272499
IntToPtr and PtrToInt instructions are basically no-ops that we can handle as
such. In order to generate them properly as parameters we had to improve the
ScopExpander, though the change is the first in the direction of a more
aggressive scalar synthetization.
This patch was originally contributed by Johannes Doerfert in r271888, but was
in conflict with the revert in r272483. This is a recommit with some minor
adjustment to the test cases to take care of differing instruction names.
llvm-svn: 272485
The recent expression type changes still need more discussion, which will happen
on phabricator or on the mailing list. The precise list of commits reverted are:
- "Refactor division generation code"
- "[NFC] Generate runtime checks after the SCoP"
- "[FIX] Determine insertion point during SCEV expansion"
- "Look through IntToPtr & PtrToInt instructions"
- "Use minimal types for generated expressions"
- "Temporarily promote values to i64 again"
- "[NFC] Avoid unnecessary comparison for min/max expressions"
- "[Polly] Fix -Wunused-variable warnings (NFC)"
- "[NFC] Simplify min/max expression generation"
- "Simplify the type adjustment in the IslExprBuilder"
Some of them are just reverted as we would otherwise get conflicts. I will try
to re-commit them if possible.
llvm-svn: 272483
This patch refactors the code generation for divisions. This allows to
always generate a shift for a power-of-two division and to utilize
information about constant divisors in order to truncate the result
type.
llvm-svn: 271898
We now generate runtime checks __after__ the SCoP code generation and
not before, though they are still inserted at the same position int
the code. This allows to modify the runtime check during SCoP code
generation.
llvm-svn: 271894
IntToPtr and PtrToInt instructions are basically no-ops that we can handle as
such. In order to generate them properly as parameters we had to improve the
ScopExpander, though the change is the first in the direction of a more
aggressive scalar synthetization.
llvm-svn: 271888
We now use the minimal necessary bit width for the generated code. If
operations might overflow (add/sub/mul) we will try to adjust the types in
order to ensure a non-wrapping computation. If the type adjustment is not
possible, thus the necessary type is bigger than the type value of
--polly-max-expr-bit-width, we will use assumptions to verify the computation
will not wrap. However, for run-time checks we cannot build assumptions but
instead utilize overflow tracking intrinsics.
llvm-svn: 271878
In case of modulo compared to zero, we need to do signed modulo
operation as unsigned can give different results based on whether the
dividend is negative or not.
This addresses llvm.org/PR27707
Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>
Reviewers: _jdoerfert, grosser, Meinersbur
Differential Revision: http://reviews.llvm.org/D20145
llvm-svn: 271707
Operands of binary operations that might overflow will be temporarily
promoted to i64 again, though that is not a sound solution for the problem.
llvm-svn: 271538
We now have a simple function to adjust/unify the types of two (or three)
operands before an operation that requieres the same type for all operands.
Due to this change we will not promote parameters that are added to i64
anymore if that is not needed.
llvm-svn: 271513
Since the base pointer of a possibly aliasing pointer might not alias
with any other pointer it (the base pointer) might not be tagged as
"required invariant". However, we need it do be in order to compare
the accessed addresses of the derived (possibly aliasing) pointer.
This patch also tries to clean up the load hoisting a little bit.
llvm-svn: 270412
Truncate operations are basically modulo operations, thus we can model
them that way. However, for large types we assume the operand to fit
in the new type size instead of introducing a modulo with a very large
constant.
llvm-svn: 269300
We utilize assumptions on the input to model IR in polyhedral world.
To verify these assumptions we version the code and guard it with a
runtime-check (RTC). However, since the RTCs are themselves generated
from the polyhedral representation we generate them under the same
assumptions that they should verify. In other words, the guarantees
that we try to provide with the RTCs do not hold for the RTCs
themselves. To this end it is necessary to employ a different check
for the RTCs that will verify the assumptions did hold for them too.
Differential Revision: http://reviews.llvm.org/D20165
llvm-svn: 269299
Min/max expressions are easier to read and can in some cases also result in
more concise IR that is generated as the min/max --- when lowered to a
cmp+select pattern -- commonly has a simpler condition then the ternary
condition isl would normally generate.
llvm-svn: 268855
After zero-extend operations and unsigned comparisons we now allow
unsigned divisions. The handling is basically the same as for signed
division, except the interpretation of the operands. As the divisor
has to be constant in both cases we can simply interpret it as an
unsigned value without additional complexity in the representation.
For the dividend we could choose from the different representation
schemes introduced for zero-extend operations but for now we will
simply use an assumption.
llvm-svn: 268032
Instead of matching for %6, we use a regexp to match for the result strings.
This test case caused unrelated noise in http://reviews.llvm.org/D15722.
llvm-svn: 267875
If the base pointer of an invariant load is is loaded conditionally, that
condition needs to hold for the invariant load too. The structure of the
program will imply this for domain constraints but not for imprecisions in
the modeling. To this end we will propagate the execution context of base
pointers during code generation and thus ensure the derived pointer does
not access an invalid base pointer.
llvm-svn: 267707
A zero-extended value can be interpreted as a piecewise defined signed
value. If the value was non-negative it stays the same, otherwise it
is the sum of the original value and 2^n where n is the bit-width of
the original (or operand) type. Examples:
zext i8 127 to i32 -> { [127] }
zext i8 -1 to i32 -> { [256 + (-1)] } = { [255] }
zext i8 %v to i32 -> [v] -> { [v] | v >= 0; [256 + v] | v < 0 }
However, LLVM/Scalar Evolution uses zero-extend (potentially lead by a
truncate) to represent some forms of modulo computation. The left-hand side
of the condition in the code below would result in the SCEV
"zext i1 <false, +, true>for.body" which is just another description
of the C expression "i & 1 != 0" or, equivalently, "i % 2 != 0".
for (i = 0; i < N; i++)
if (i & 1 != 0 /* == i % 2 */)
/* do something */
If we do not make the modulo explicit but only use the mechanism described
above we will get the very restrictive assumption "N < 3", because for all
values of N >= 3 the SCEVAddRecExpr operand of the zero-extend would wrap.
Alternatively, we can make the modulo in the operand explicit in the
resulting piecewise function and thereby avoid the assumption on N. For the
example this would result in the following piecewise affine function:
{ [i0] -> [(1)] : 2*floor((-1 + i0)/2) = -1 + i0;
[i0] -> [(0)] : 2*floor((i0)/2) = i0 }
To this end we can first determine if the (immediate) operand of the
zero-extend can wrap and, in case it might, we will use explicit modulo
semantic to compute the result instead of emitting non-wrapping assumptions.
Note that operands with large bit-widths are less likely to be negative
because it would result in a very large access offset or loop bound after the
zero-extend. To this end one can optimistically assume the operand to be
positive and avoid the piecewise definition if the bit-width is bigger than
some threshold (here MaxZextSmallBitWidth).
We choose to go with a hybrid solution of all modeling techniques described
above. For small bit-widths (up to MaxZextSmallBitWidth) we will model the
wrapping explicitly and use a piecewise defined function. However, if the
bit-width is bigger than MaxZextSmallBitWidth we will employ overflow
assumptions and assume the "former negative" piece will not exist.
llvm-svn: 267408