To support non-aligned accesses we introduce a virtual element size
for arrays that divides each access function used for this array. The
adjustment of the access function based on the element size of the
array was therefore moved after this virtual element size was
determined, thus after all accesses have been created.
Differential Revision: http://reviews.llvm.org/D17246
llvm-svn: 261226
A load can only be invariant if its base pointer is invariant too. To
this end, we check if the base pointer is defined inside the region or
outside. In the former case we recursively check if we can (and
therefore will) hoist the base pointer too. Only if that happends we
can hoist the load.
llvm-svn: 260886
This reverts commit 98efa006c96ac981c00d2e386ec1102bce9f549a.
The fix was broken since we do not use AA in the ScopDetection anymore to
check for invariant accesses.
llvm-svn: 260884
Before this patch it could happen that we did not hoist a load that
was a base pointer of another load even though AA already declared the
first one as invariant (during ScopDetection). If this case arises we
will now skipt the "can be overwriten" check because in this case the
over-approximating nature causes us to generate broken code.
llvm-svn: 260862
So far we separated constant factors from multiplications, however,
only when they are at the outermost level of a parameter SCEV. Now,
we also separate constant factors from the parameter SCEV if the
outermost expression is a SCEVAddRecExpr. With the changes to the
SCEVAffinator we can now improve the extractConstantFactor(...)
function at will without worrying about any other code part. Thus,
if needed we can implement a more comprehensive
extractConstantFactor(...) function that will traverse the SCEV
instead of looking only at the outermost level.
Four test cases were affected. One did not change much and the other
three were simplified.
llvm-svn: 260859
We now distinguish invariant loads to the same memory location if they
have different types. This will cause us to pre-load an invariant
location once for each type that is used to access it. However, we can
thereby avoid invalid casting, especially if an array is accessed
though different typed/sized invariant loads.
This basically reverts the changes in r260023 but keeps the test
cases.
llvm-svn: 260045
We also disable this feature by default, as there are still some issues in
combination with invariant load hoisting that slipped through my initial
testing.
llvm-svn: 260025
Invariant load hoisting of memory accesses with non-canonical element
types lacks support for equivalence classes that contain elements of
different width/size. This support should be added, but to get our buildbots
back to green, we disable load hoisting for memory accesses with non-canonical
element size for now.
llvm-svn: 260023
Always use access-instruction pointer type to load the invariant values.
Otherwise mismatches between ScopArrayInfo element type and memory access
element type will result in invalid casts. These type mismatches are after
r259784 a lot more common and also arise with types of different size, which
have not been handled before.
Interestingly, this change actually simplifies the code, as we now have only
one code path that is always taken, rather then a standard code path for the
common case and a "fixup" code path that replaces the standard code path in
case of mismatching types.
llvm-svn: 260009
The previously implemented approach is to follow value definitions and
create write accesses ("push defs") while searching for uses. This
requires the same relatively validity- and requirement conditions to be
replicated at multiple locations (PHI instructions, other instructions,
uses by PHIs).
We replace this by iterating over the uses in a SCoP ("pull in
requirements"), and add writes only when at least one read has been
added. It turns out to be simpler code because each use is only iterated
over once and writes are added for the first access that reads it. We
need another iteration to identify escaping values (uses not in the
SCoP), which also makes the difference between such accesses more
obvious. As a side-effect, the order of scalar MemoryAccess can change.
Differential Revision: http://reviews.llvm.org/D15706
llvm-svn: 259987
This allows code such as:
void multiple_types(char *Short, char *Float, char *Double) {
for (long i = 0; i < 100; i++) {
Short[i] = *(short *)&Short[2 * i];
Float[i] = *(float *)&Float[4 * i];
Double[i] = *(double *)&Double[8 * i];
}
}
To model such code we use as canonical element type of the modeled array the
smallest element type of all original array accesses, if type allocation sizes
are multiples of each other. Otherwise, we use a newly created iN type, where N
is the gcd of the allocation size of the types used in the accesses to this
array. Accesses with types larger as the canonical element type are modeled as
multiple accesses with the smaller type.
For example the second load access is modeled as:
{ Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }
To support code-generating these memory accesses, we introduce a new method
getAccessAddressFunction that assigns each statement instance a single memory
location, the address we load from/store to. Currently we obtain this address by
taking the lexmin of the access function. We may consider keeping track of the
memory location more explicitly in the future.
We currently do _not_ handle multi-dimensional arrays and also keep the
restriction of not supporting accesses where the offset expression is not a
multiple of the access element type size. This patch adds tests that ensure
we correctly invalidate a scop in case these accesses are found. Both types of
accesses can be handled using the very same model, but are left to be added in
the future.
We also move the initialization of the scop-context into the constructor to
ensure it is already available when invalidating the scop.
Finally, we add this as a new item to the 2.9 release notes
Reviewers: jdoerfert, Meinersbur
Differential Revision: http://reviews.llvm.org/D16878
llvm-svn: 259784
We support now code such as:
void multiple_types(char *Short, char *Float, char *Double) {
for (long i = 0; i < 100; i++) {
Short[i] = *(short *)&Short[2 * i];
Float[i] = *(float *)&Float[4 * i];
Double[i] = *(double *)&Double[8 * i];
}
}
To support such code we use as element type of the modeled array the smallest
element type of all original array accesses. Accesses with larger types are
modeled as multiple accesses with the smaller type.
For example the second load access is modeled as:
{ Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }
To support jscop-rewritable memory accesses we need each statement instance to
only be assigned a single memory location, which will be the address at which
we load the value. Currently we obtain this address by taking the lexmin of
the access function. We may consider keeping track of the memory location more
explicitly in the future.
llvm-svn: 259587
Before adding a MK_Value READ MemoryAccess, check whether the read is
necessary or synthesizable. Synthesizable values are later generated by
the SCEVExpander and therefore do not need to be transferred
explicitly. This can happen because the check for synthesizability has
presumbly been forgotten in the case where a phi's incoming value has
been defined in a different statement.
Differential Revision: http://reviews.llvm.org/D15687
llvm-svn: 258998
Ensure that there is at most one phi write access per PHINode and
ScopStmt. In particular, this would be possible for non-affine
subregions with multiple exiting blocks. We replace multiple MAY_WRITE
accesses by one MUST_WRITE access. The written value is constructed
using a PHINode of all exiting blocks. The interpretation of the PHI
WRITE's "accessed value" changed from the incoming value to the PHI like
for PHI READs since there is no unique incoming value.
Because region simplification shuffles around PHI nodes -- particularly
with exit node PHIs -- the PHINodes at analysis time does not always
exist anymore in the code generation pass. We instead remember the
incoming block/value pair in the MemoryAccess.
Differential Revision: http://reviews.llvm.org/D15681
llvm-svn: 258809
Both functions implement the same functionality, with the difference that
getNewScalarValue assumes that globals and out-of-scop scalars can be directly
reused without loading them from their corresponding stack slot. This is correct
for sequential code generation, but causes issues with outlining code e.g. for
OpenMP code generation. getNewValue handles such cases correctly.
Hence, we can replace getNewScalarValue with getNewValue. This is not only more
future proof, but also eliminates a bunch of code.
The only functionality that was available in getNewScalarValue that is lost
is the on-demand creation of scalar values. However, this is not necessary any
more as scalars are always loaded at the beginning of each basic block and will
consequently always be available when scalar stores are generated. As this was
not the case in older versions of Polly, it seems the on-demand loading is just
some older code that has not yet been removed.
Finally, generateScalarLoads also generated loads for values that are loop
invariant, available in GlobalMap and which are preferred over the ones loaded
in generateScalarLoads. Hence, we can just skip the code generation of such
scalar values, avoiding the generation of dead code.
Differential Revision: http://reviews.llvm.org/D16522
llvm-svn: 258799
In Polly, after hoisting loop invariant loads outside loop, the alignment
information for hoisted loads are missing, this patch restore them.
Contributed-by: Lawrence Hu <lawrence@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16160
llvm-svn: 258105
When generating scalar loads/stores separately the vector code has not been
updated. This commit adds code to generate scalar loads for vector code as well
as code to assert in case scalar stores are encountered within a vector loop.
llvm-svn: 255714
When rewriting the access functions of load/store statements, we are only
interested in the actual array memory location. The current code just took
the very first memory access, which could be a scalar or an array access. As
a result, we failed to update access functions even though this was requested
via .jscop.
llvm-svn: 255713
When introducing separate control flow for the original and optimized code we
introduce now a special 'ExitingBlock':
\ /
EnteringBB
|
SplitBlock---------\
_____|_____ |
/ EntryBB \ StartBlock
| (region) | |
\_ExitingBB_/ ExitingBlock
| |
MergeBlock---------/
|
ExitBB
/ \
This 'ExitingBlock' contains code such as the final_reloads for scalars, which
previously were just added to whichever statement/loop_exit/branch-merge block
had been generated last. Having an explicit basic block makes it easier to
find these constructs when looking at the CFG.
llvm-svn: 255107
In case the original parameter instruction does not have a name, but it comes
from a load instruction where the base pointer has a name we used the name of
the load instruction to give some more intuition of where the parameter came
from. To ensure this works also through GEPs which may have complex offsets,
we originally just dropped the offsets and _only_ used the base pointer name.
As this can result in multiple parameters to get the same name, we now prefix
the parameter ID to ensure parameter names are unique. This will make it easier
to understand debug output.
This change does not affect correctness, as parameter IDs (even of the same
name) can always be distinguished through the SCEV pointer stored inside them.
llvm-svn: 253330
IVs of loops for which the loop header is in the subregion, but not the entire
loop may be incremented outside of the subregion and can consequently not be
kept private to the subregion. Instead, they need to and are modeled as virtual
loops in the iteration domains. As this is the case, generating new subregion
induction variables for such loops is not needed and indeed wrong as they would
hide the virtual induction variables modeled in the scop.
This fixes a miscompile in MultiSource/Benchmarks/Ptrdist/bc and
MultiSource/Benchmarks/nbench/. Thanks Michael and Johannes for their
investiagations and helpful observations regarding this bug.
llvm-svn: 252860
Especially for structs, the SAI object of a base pointer does not
describe all the types that the user might expect when he loads from
that base pointer. While we will still cast integers and pointers we
will now reload the value with the correct type if floating point and
non-floating point values are involved. However, there are now TODOs
where we use bitcasts instead of a proper conversion or reloading.
This fixes bug 25479.
llvm-svn: 252706
We now create all invariant equivalence classes for required invariant loads
instead of creating them on-demand. This way we can check if a parameter
references an invariant load that is actually not executed and was therefor
not materialized. If that happens the parameter is not materialized either.
This fixes bug 25469.
llvm-svn: 252701
Scalar reloads in the generated entering block were not recognized as
dominating the subregions locks when there were multiple entering
nodes. This resulted in values defined in there not being copied.
As a fix, we unconditionally add the BBMap of the generated entering
node to the generated entry. This fixes part of llvm.org/PR25439.
This reverts 252449 and reapplies r252445. Its test was failing
indeterministically due to r252375 which was reverted in r252522.
llvm-svn: 252540
The dominance of the generated non-affine subregion block was based on
the scop's merge block, therefore resulted in an invalid DominanceTree.
It resulted in some values as assumed to be unusable in the actual
generated exit block.
We detect the case that the exit block has been moved and decide
dominance using the BB at the original exit. If we create another exit
node, that exit nodes is dominated by the one generated from where the
original exit resides. This fixes llvm.org/PR25438 and part of
llvm.org/PR25439.
llvm-svn: 252526
It introduced indeterminism as it was iterating over an address-indexed
hashtable. The corresponding bug PR25438 will be fixed in a successive
commit.
llvm-svn: 252522
This reverts commit 9775824b265e574fc541e975d64d3e270243b59d due to a
failing unit test.
Please check and correct the unit test and commit again.
llvm-svn: 252449
Scalar reloads in the generated entering block were not recognized as
dominating the subregions locks when there were multiple entering
nodes. This resulted in values defined in there not being copied.
As a fix, we unconditionally add the BBMap of the generated entering
node to the generated entry. This fixes part of llvm.org/PR25439.
llvm-svn: 252445
If a SCoP contains error blocks we cannot use the domain constraints
to simplify the assumptions as the domain is already influenced by the
assumptions we took. Before this patch we did that and some assumptions
became self-fulfilling as they were implied by the domain constraints.
llvm-svn: 252424
When we bail out early we make the partially build new code path
practically dead, though it was not unreachable. To remove dominance
problems we now make it not only dead but also prevent the control
flow to join with the original code path, thus allow to use original
values after the SCoP without any PHI nodes.
This fixes bug 25447.
llvm-svn: 252420
While the program cannot cause a dependence cycle between invariant
loads, additional constraints (e.g., to ensure finite loops) can
introduce them. It is hard to detect them in the SCoP description,
thus we will only check for them at code generation time. If such a
recursion is detected we will bail out the code generation and place a
"false" runtime check to guarantee the original code is used.
This fixes bug 25443.
llvm-svn: 252412
After loop versioning, a dominance check of a non-affine subregion's
exit node causes the dominance check to always fail on any block in the
subregion if it shares the same exit block with the scop. The
subregion's exit block has become polly_merge_new_and_old, which also
receives the control flow of the generated code. This would cause that
any value for implicit stores is assumed to be not from the scop.
We check dominance with the generated exit node instead.
This fixes llvm.org/PR25438
llvm-svn: 252375
We were adding all generated values in non-affine subregions to be used
for the subregions generated exit block. The thought was that only
values that are dominating the original exit block can be used there.
But it is possible for synthesizable values to be expanded in any
block. If the same values is also used for implicit writes, it would
try to reuse already synthesized values even if not dominating the exit
block.
The fix is to only add values to the list of values usable in the exit
block only if it is dominating the exit block. This fixes
llvm.org/PR25412.
llvm-svn: 252301
Before this commit memory reference identifiers have only been unique per
basic block, but not per (non-affine) ScopStmt. This commit now uses the
MemoryAccess base pointer to uniquely identify each Memory access.
llvm-svn: 252200
For generating scalar writes of non-affine subregions, all except phi
writes are generated in the exit block. The phi writes are generated in
the incoming block for which we errornously used the same BBMap. This
can conflict if a value for one block is synthesized, and then reused
for another block which is not dominated by the first block. This is
fixed by using block-specific BBMaps for phi writes.
llvm-svn: 252172
To simplify and correct the preloading of a base pointer origin, e.g.,
the base pointer for the current indirect invariant load, we now just
check if there is an invariant access class that involves the base
pointer of the current class.
llvm-svn: 251962
If a base pointer of a preloaded value has a base pointer origin, thus it is
an indirect invariant load, we have to make sure the base pointer origin is
preloaded first.
llvm-svn: 251946
If a base pointer load is preloaded, we have change the base pointer of
the derived SAI. However, as the derived SAI relationship is is
coarse grained, we need to check if we actually preloaded the base
pointer or a different element of the base pointer SAI array.
llvm-svn: 251881
When verifying if a scop is still valid we rerun all analysis, but did not
update DetectionContextMap. This change ensures that information, e.g. about
non-affine regions, is correctly updated
llvm-svn: 251227
Such PHI nodes can not only appear in the ExitBlock of the Scop, but indeed
any scalar PHI node above the scop and used in the scop is modeled as scalar
read access.
llvm-svn: 251198
New values were always synthesized in the block of the instruction
that needed them. This is incorrect for PHI node whose' value must be
defined in the respective incoming block. This patch temporarily moves
the builder's insert point to the incoming block while synthesizing phi
node arguments.
This fixes PR25241 (http://llvm.org/bugs/show_bug.cgi?id=25241)
llvm-svn: 250693
Accesses that have a relative offset (in bytes) that is not divisible
by the type size (in bytes) will be represented as empty in the SCoP
description. This is on its own not good but it also crashed the
invariant load hoisting. This patch will fix the latter problem while
the former should be addressed too.
This fixes bug 25236.
llvm-svn: 250664
If the base pointer of a load is invariant and defined in the SCoP but
not loaded we cannot hoist the load as we would not hoist the base
pointer definition.
This fixes bug 25237.
llvm-svn: 250663
Sorting is replaced by a demand driven code generation that will pre-load a
value when it is needed or, if it was not needed before, at some point
determined by the order of invariant accesses in the program. Only in very
little cases this demand driven pre-loading will kick in, though it will
prevent us from generating faulty code. An example where it is needed is
shown in:
test/ScopInfo/invariant_loads_complicated_dependences.ll
Invariant loads that appear in parameters but are not on the top-level (e.g.,
the parameter is not a SCEVUnknown) will now be treated correctly.
Differential Revision: http://reviews.llvm.org/D13831
llvm-svn: 250655
Polly can now be used as a analysis only tool as long as the code
generation is disabled. However, we do not have an alternative to the
independent blocks pass in place yet, though in the relevant cases
this does not seem to impact the performance much. Nevertheless, a
virtual alternative that allows the same transformations without
changing the input region will follow shortly.
llvm-svn: 250652
Expressing this in terms of BlockGenerator::getOrCreateAlloca(const
ScopArrayInfo *Array) does not work as the MemoryAccess BasePtr is in case of
invariant load hoisting different to the ScopArrayInfo BasePtr. Until this is
investigated and fixed, we move back to code that just uses the baseptr of
MemoryAccess.
llvm-svn: 250637
Instead of generating implicit loads within basic blocks, put them
before the instructions of the statment itself, including non-affine
subregions. The region's entry node is dominating all blocks in the
region and therefore the loaded value will be available there.
Implicit writes in block-stmts were already stored back at the end of
the block. Now, also generate the stores of non-affine subregions when
leaving the statement, i.e. in the exiting block.
This change is required for array-mapped implicits ("De-LICM") to
ensure that there are no dependencies of demoted scalars within
statments. Statement load all required values, operator on copied in
registers, and then write back the changed value to the demoted memory.
Lifetimes analysis within statements becomes unecessary.
Differential Revision: http://reviews.llvm.org/D13487
llvm-svn: 250625
In r250408 'CHECK-NEXT: br' lines were removed as they also matched a
'%polly.subregion.iv.inc' instruction and did consequently not check what they
were supposed to check. However, without these lines we can not test that the
.s2a instructions that are not any more generated since r250411 really are not
emitted. Hence, we add back the CHECK-NEXT lines to ensure there are really no
instructions generated between the store that we check for and the branch at the
end of the basic block. To ensure we do not match too early, we now check for
'br i1' or 'br label'.
llvm-svn: 250435
When pulling a llvm::Value to be written as a PHI write, the former
code did only check whether it is within the same basic block, but it
could also be the same non-affine subregion. In that case some
unecessary pair of MemoryAccesses would have been created.
Two unit test were explicitely checking for the unecessary writes,
including the comments that the writes are unecessary.
llvm-svn: 250411
They happen to match
%polly.subregion.iv.inc = add i32 %polly.subregion.iv, 1
^^ ^^
that is, are misleading in what they actually check.
llvm-svn: 250408
When sharing the same map from old to new value, CodeGeneration would
reuse the same new value for each basic block. However, the SCEV
expander might emit code in a basic block that does not dominate a use
of the SCEV in another basic block. This test checks whether both such
blocks have their own expanded new values.
llvm-svn: 250389
We harden one test case by ensuring no additional stores may possibly be
introduced between the stores we check for and the basic block terminator
statements.
We also add a test case for the situation where a value that is passed from
a non-affine region to a PHI node does not dominate the exit of the non-affine
region. This case has come up in patch reviews, so we make sure it is properly
handled today and in the future.
llvm-svn: 250217
If a (assumed) invariant location is loaded multiple times we
generated a parameter for each location. However, this caused compile
time problems for several benchmarks (e.g., 445_gobmk in SPEC2006 and
BT in the NAS benchmarks). Additionally, the code we generate is
suboptimal as we preload the same location multiple times and perform
the same checks on all the parameters that refere to the same value.
With this patch we consolidate the invariant loads in three steps:
1) During SCoP initialization required invariant loads are put in
equivalence classes based on their pointer operand. One
representing load is used to generate a parameter for the whole
class, thus we never generate multiple parameters for the same
location.
2) During the SCoP simplification we remove invariant memory
accesses that are in the same equivalence class. While doing so
we build the union of all execution domains as it is only
important that the location is at least accessed once.
3) During code generation we only preload one element of each
equivalence class with the unified execution domain. All others
are mapped to that preloaded value.
Differential Revision: http://reviews.llvm.org/D13338
llvm-svn: 249853
This patch allows invariant loads to be used in the SCoP description,
e.g., as loop bounds, conditions or in memory access functions.
First we collect "required invariant loads" during SCoP detection that
would otherwise make an expression we care about non-affine. To this
end a new level of abstraction was introduced before
SCEVValidator::isAffineExpr() namely ScopDetection::isAffine() and
ScopDetection::onlyValidRequiredInvariantLoads(). Here we can decide
if we want a load inside the region to be optimistically assumed
invariant or not. If we do, it will be marked as required and in the
SCoP generation we bail if it is actually not invariant. If we don't
it will be a non-affine expression as before. At the moment we
optimistically assume all "hoistable" (namely non-loop-carried) loads
to be invariant. This causes us to expand some SCoPs and dismiss them
later but it also allows us to detect a lot we would dismiss directly
if we would ask e.g., AliasAnalysis::canBasicBlockModify(). We also
allow potential aliases between optimistically assumed invariant loads
and other pointers as our runtime alias checks are sound in case the
loads are actually invariant. Together with the invariant checks this
combination allows to handle a lot more than LICM can.
The code generation of the invariant loads had to be extended as we
can now have dependences between parameters and invariant (hoisted)
loads as well as the other way around, e.g.,
test/Isl/CodeGen/invariant_load_parameters_cyclic_dependence.ll
First, it is important to note that we cannot have real cycles but
only dependences from a hoisted load to a parameter and from another
parameter to that hoisted load (and so on). To handle such cases we
materialize llvm::Values for parameters that are referred by a hoisted
load on demand and then materialize the remaining parameters. Second,
there are new kinds of dependences between hoisted loads caused by the
constraints on their execution. If a hoisted load is conditionally
executed it might depend on the value of another hoisted load. To deal
with such situations we sort them already in the ScopInfo such that
they can be generated in the order they are listed in the
Scop::InvariantAccesses list (see compareInvariantAccesses). The
dependences between hoisted loads caused by indirect accesses are
handled the same way as before.
llvm-svn: 249607
These flags are now always passed to all tests and need to be disabled if
not needed. Disabling these flags, rather than passing them to almost all
tests, significantly simplfies our RUN: lines.
llvm-svn: 249422
A statement with an empty domain complicates the invariant load
hoisting and does not help any subsequent analysis or transformation.
In fact it might introduce parameter dimensions or increase the
schedule dimensionality. To this end, we remove statements with an
empty domain early in the SCoP simplification.
llvm-svn: 249276
We have to skip accesses in non-affine subregions during hoisting as
they might not be executed under the same condition as the entry of
the non-affine subregion.
llvm-svn: 249139
If a value is globally mapped (IslNodeBuilder::ValueMap) and
referenced in the code that will be put into a subfunction, we hand
down the new value to the subfunction.
This patch also removes code that handed down all invariant loads to
the subfunction. Instead, only needed invariant loads are given to the
subfunction. There are two possible reasons for an invariant load to
be handed down:
1) The invariant load is used in a block that is placed in the
subfunction but which is not the parent of the load. In this
case, the scalar access that will read the loaded value, will
cause its base pointer (the preloaded value) to be handed down to
the subfunction.
2) The invariant load is defined and used in a block that is placed
in the subfunction. With this patch we will hand down the
preloaded value to the subfunction as the invariant load is
globally mapped to that value.
llvm-svn: 249126
Instructions which we can synthesis from a SCEV expression are not
generated directly, but only when they are used as an operand of
another instruction. This avoids generating unnecessary instructions
and works more reliably than first inserting them and then deleting
them later on.
This commit was reverted in r248860 due to a remaining miscompile, where
we forgot to synthesis the operand values that were referenced from scalar
writes. test/Isl/CodeGen/scalar-store-from-same-bb.ll tests that we do this
now correctly.
llvm-svn: 248900
Before we unconditinoally forced all users outside the SCoP to use
the preloaded value. However, if the SCoP is not executed due to the
runtime checks, we need to use the original value because it might not
be invariant in the first place.
llvm-svn: 248881
As a first step in the direction of assumed invariant loads (loads
that are not written in some context) we now detect and hoist
definitively invariant loads. These invariant loads will be preloaded
in the code generation and used in the optimized version of the SCoP.
If the load is only conditionally executed the preloaded version will
also only be executed under the same condition, hence we will never
access memory that wouldn't have been accessed otherwise. This is also
the most distinguishing feature to licm.
As hoisting can make statements empty we will simplify the SCoP and
remove empty statements that would otherwise cause artifacts in the
code generation.
Differential Revision: http://reviews.llvm.org/D13194
llvm-svn: 248861
This reverts commit 07830c18d789ee72812d5b5b9b4f8ce72ebd4207.
The commit broke at least one test in lnt,
MultiSource/Benchmarks/Ptrdist/bc/number.c
was miss compiled and the test produced a wrong result.
One Polly test case that was added later was adjusted too.
llvm-svn: 248860
Every once in a while we see code that accesses memory with different types,
e.g. to perform operations on a piece of memory using type 'float', but to copy
data to this memory using type 'int'. Modeled in C, such codes look like:
void foo(float A[], float B[]) {
for (long i = 0; i < 100; i++)
*(int *)(&A[i]) = *(int *)(&B[i]);
for (long i = 0; i < 100; i++)
A[i] += 10;
}
We already used the correct types during normal operations, but fall back to our
detected type as soon as we import changed memory access functions. For these
memory accesses we may generate invalid IR due to a mismatch between the element
type of the array we detect and the actual type used in the memory access. To
address this issue, we always cast the newly created address of a memory access
back to the type of the memory access where the address will be used.
llvm-svn: 248781
Instructions which we can synthesis from a SCEV expression are not generated
directly, but only when they are used as an operand of another instruction. This
avoids generating unnecessary instruction and works more reliably than first
inserting them and then deleting them later on.
Suggested-by: Johannes Doerfert <doerfert@cs.uni-saarland.de>
Differential Revision: http://reviews.llvm.org/D13208
llvm-svn: 248712
This patch allows switch instructions with affine conditions in the
SCoP. Also switch instructions in non-affine subregions are allowed.
Both did not require much changes to the code, though there was some
refactoring needed to integrate them without code duplication.
In the llvm-test suite the number of profitable SCoPs increased from
135 to 139 but more importantly we can handle more benchmarks and user
inputs without preprocessing.
Differential Revision: http://reviews.llvm.org/D13200
llvm-svn: 248701
We now only delete trivially dead instructions in the BB we copy (copyBB), but
not in any other BB. Only for copyBB we know that there will _never_ be any
future uses of instructions that have no use after copyBB has been generated.
Other instructions in the AST that have been generated by IslNodeBuilder may
look dead at the moment, but may possibly still be referenced by GlobalMaps. If
we delete them now, later uses would break surprisingly.
We do not have a test case that breaks due to us deleting too many instructions.
This issue was found by inspection.
llvm-svn: 248688
After having generated a new user statement a couple of inefficient or
trivially dead instructions may remain. This commit runs instruction
simplification over the newly generated blocks to ensure unneeded
instructions are removed right away.
This commit does adds simplification for non-affine subregions which was not
yet part of 248681.
llvm-svn: 248683
Otherwise, part of the computation will be just simplified away when we add
instruction simplification support to the RegionGenerator.
llvm-svn: 248682
After having generated a new user statement a couple of inefficient or trivially
dead instructions may remain. This commit runs instruction simplification over
the newly generated blocks to ensure unneeded instructions are removed right
away.
This commit does not yet add simplification for non-affine subregions.
llvm-svn: 248681
This commit basically reverts r246427 but still solves the issue
tackled by that commit. Instead of emitting initialization code in the
beginning of the start block we now generate parallel code in its own
block and thereby guarantee separation. This is necessary as we cannot
generate code for hoisted loads prior to the start block but it still
needs to be placed prior to everything else.
llvm-svn: 248674
We now add loop carried information during the second traversal of the
region instead of in a intermediate step in-between. This makes the
generation simpler, removes code and should even be faster.
llvm-svn: 248125
If the GEP instructions give us enough insights, model scalar accesses as
multi-dimensional (and generate the relevant run-time checks to ensure
correctness). This will allow us to simplify the dependence computation in
a subsequent commit.
llvm-svn: 247906
This will allow to generate non-wrap assumptions for integer expressions
that are part of the SCoP. We compare the common isl representation of
the expression with one computed with modulo semantic. For all parameter
combinations they are not equal we can have integer overflows.
The nsw flags are respected when the modulo representation is computed,
nuw and nw flags are ignored for now.
In order to not increase compile time to much, the non-wrap assumptions
are collected in a separate boundary context instead of the assumed
context. This helps compile time as the boundary context can become
complex and it is therefor not advised to use it in other operations
except runtime check generation. However, the assumed context is e.g.,
used to tighten dependences. While the boundary context might help to
tighten the assumed context it is doubtful that it will help in practice
(it does not effect lnt much) as the boundary (or no-wrap assumptions)
only restrict the very end of the possible value range of parameters.
PET uses a different approach to compute the no-wrap context, though lnt runs
have shown that this version performs slightly better for us.
llvm-svn: 247732
At some point we build loop trip counts using this method. It was replaced by
a simpler trick that works only for affine (e.g., not modulo) constraints and
relies on the removal of unbounded parts. In order to allow modulo constrains
again we go back to the former, more accurate method.
llvm-svn: 247540
As we do not rely on ScalarEvolution any more we do not need to get
the backedge taken count. Additionally, our domain generation handles
everything that is affine and has one latch and our ScopDetection will
over-approximate everything else.
This change will therefor allow loops with:
- one latch
- exiting conditions that are affine
Additionally, it will not check for structured control flow anymore.
Hence, loops and conditionals are not necessarily single entry single
exit regions any more.
Differential Version: http://reviews.llvm.org/D12758
llvm-svn: 247289
This patch replaces the last legacy part of the domain generation, namely the
ScalarEvolution part that was used to obtain loop bounds. We now iterate over
the loops in the region and propagate the back edge condition to the header
blocks. Afterwards we propagate the new information once through the whole
region. In this process we simply ignore unbounded parts of the domain and
thereby assume the absence of infinite loops.
+ This patch already identified a couple of broken unit tests we had for
years.
+ We allow more loops already and the step to multiple exit and multiple back
edges is minimal.
+ It allows to model the overflow checks properly as we actually visit
every block in the SCoP and know where which condition is evaluated.
- It is currently not compatible with modulo constraints in the
domain.
Differential Revision: http://reviews.llvm.org/D12499
llvm-svn: 247279
The support for modulo expressions is not comlete and makes the new
domain generation harder. As the currently broken domain generation
needs to be replaced, we will first swap in the new, fixed domain
generation and make it compatible with the modulo expressions later.
llvm-svn: 247278
The support for pointer expressions is broken as it can only handle
some patterns in the IslExprBuilder. We should to treat pointers in
expressions the same as integers at some point and revert this patch.
llvm-svn: 247147
While we do not need to model PHI nodes in the region exit (as it is not part
of the SCoP), we need to prepare for the case that the exit block is split in
code generation to create a single exiting block. If this will happen, hence
if the region did not have a single exiting block before, we will model the
operands of the PHI nodes as escaping scalars in the SCoP.
Differential Revision: http://reviews.llvm.org/D12051
llvm-svn: 247078
Instead of having two separate options
-polly-detect-scops-in-functions-without-loops and
-polly-detect-scops-in-regions-without-loops we now just use
-polly-detect-unprofitable to force the detection of scops ignoring any compile
time saving bailout heuristics.
llvm-svn: 247057
Certain backends, e.g. NVPTX, do not support '.' in function names. Hence,
we ensure all '.' are replaced by '_' when generating function names for
subfunctions. For the current OpenMP code generation, this is not strictly
necessary, but future uses cases (e.g. GPU offloading) need this issue to be
fixed.
llvm-svn: 246980
Our alias metadata is currently not emitted in a deterministic order. As it
is not needed in this test, we just drop it for now (but keep in mind to fix
this).
llvm-svn: 246942
When this option is enabled, Polly will emit printf calls for each scalar
load/and store which dump the scalar value loaded/stored at run time.
This patch also refactors the RuntimeDebugBuilder to use variadic templates
when generating CPU printfs. As result, it now becomes easier to print
strings that consist of a set of arguments. Also, as a single printf
call is emitted, it is more likely for such strings to be emitted atomically
if executed multi-threaded.
llvm-svn: 246941
When computing the index expressions for new, multi-dimensional memory accesses
these new index expressions may reference original llvm::Values that are not
transfered into the OpenMP subfunction. Using GlobalMap we now replace
references to such values with the rewritten values that have e.g. been passed
to the OpenMP subfunction.
llvm-svn: 246923
Originally, we disallowed the import of multi-dimensional access functions due
to our code generation not supporting the generation of new address expressions
for multi-dimensional memory accesses. When building our run-time alias check
infrastructure we added code generation support for multi-dimensional address
calculations. Hence, we can now savely allow the import of new
multi-dimensional access functions.
llvm-svn: 246917
Before this commit we did this only for Arguments or Constants, but indeed
an instruction may define a value a lot higher up in the dominance tree, but
the actual write generally needs to happen right before branching to the
PHI node. Otherwise, the writes of different branches into PHI nodes may get
intermixed if they lay higher up in the dominance tree.
llvm-svn: 246441
Our OpenMP code generation generated part of its launching code directly into
the start basic block and without this change the scalar initialization was
run _after_ the OpenMP threads have been launched. This resulted in
uninitialized scalar values to be used.
llvm-svn: 246427
In order to compute domain conditions for conditionals we will now
traverse the region in the ScopInfo once and build the domains for
each block in the region. The SCoP statements can then use these
constraints when they build their domain.
The reason behind this change is twofold:
1) This removes a big chunk of preprocessing logic from the
TempScopInfo, namely the Conditionals we used to build there.
Additionally to moving this logic it is also simplified. Instead
of walking the dominance tree up for each basic block in the
region (as we did before), we now traverse the region only
once in order to collect the domain conditions.
2) This is the first step towards the isl based domain creation.
The second step will traverse the region similar to this step,
however it will propagate back edge conditions. Once both are in
place this conditional handling will allow multiple exit loops
additional logic.
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12428
llvm-svn: 246398
We already modeled read-only dependences to scalar values defined outside the
scop as memory reads and also generated read accesses from the corresponding
alloca instructions that have been used to pass these scalar values around
during code generation. However, besides for PHI nodes that have already been
handled, we failed to store the orignal read-only scalar values into these
alloc. This commit extends the initialization of scalar values to all read-only
scalar values used within the scop.
llvm-svn: 246394
Our code generation currently does not support scalar references to metadata
values. Hence, it would crash if we try to model scalar dependences to metadata
values. Fortunately, for one of the common uses, debug information, we can
for now just ignore the relevant intrinsics and consequently the issue of how
to model scalar dependences to metadata.
llvm-svn: 246388
I ran the script from r246327 and it touched all the right files;
committing now to hopefully right the bots, but if my check-polly
doesn't come back clean I'll keep looking.
http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/33648
llvm-svn: 246341
If a region does not have more than one loop, we do not identify it as
a Scop in ScopDetection. The main optimizations Polly is currently performing
(tiling, preparation for outer-loop vectorization and loop fusion) are unlikely
to have a positive impact on individual loops. In some cases, Polly's run-time
alias checks or conditional hoisting may still have a positive impact, but those
are mostly enabling transformations which LLVM already performs for individual
loops. As we do not focus on individual loops, we leave them untouched to not
introduce compile time regressions and execution time noise. This results in
good compile time reduction (oourafft: -73.99%, smg2000: -56.25%).
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12268
llvm-svn: 246161
If nothing is executed we can bail out early. Otherwise we can use the
constraints that ensure at least one statement is executed for
simplification.
llvm-svn: 245585
Instead of generating code for an empty assumed context we bail out
early. As the number of assumptions we generate increases this becomes
more and more important. Additionally, this change will allow us to
hide internal contexts that are only used in runtime checks e.g., a
boundary context with constraints not suited for simplifications.
llvm-svn: 245540
To make alias scope metadata generation work in OpenMP mode we now provide
the ScopAnnotator with information about the base pointer rewrite that happens
when passing arrays into the OpenMP subfunction.
llvm-svn: 245451
executeScopConditionally would destroy a predecessor region if it the
scop's entry was the region's exit block by forking it to polly.start
and thus creating a secnd exit out of the region. This patch "shrinks"
the predecessor region s.t. polly.split_new_and_old is not the
region's exit anymore.
llvm-svn: 245294
The SCEVExpander cannot deal with all SCEVs Polly allows in all kinds
of expressions. To this end we introduce a ScopExpander that handles
the additional expressions separatly and falls back to the
SCEVExpander for everything else.
Reviewers: grosser, Meinersbur
Subscribers: #polly
Differential Revision: http://reviews.llvm.org/D12066
llvm-svn: 245288
This allows the code generation to continue working even if a needed
value (that is reloaded anyway) was not yet demoted. Instead of
failing it will now create the location for future demotion to memory
and load from that location. The stores will use the same location and
by construction execute before the load even if the textual order in
the generated AST is otherwise.
Reviewers: grosser, Meinersbur
Subscribers: #polly
Differential Revision: http://reviews.llvm.org/D12072
llvm-svn: 245203
This test case crashes the scalar code generation as we are not
consistent with the usage of the assumed context. To be precise, we
use the assumed context for the dependence analysis but not to
restrict the domains of the statements.
A step by step explanation of the problem is given in the test case.
llvm-svn: 245176
This change extends the BlockGenerator to not only allow Instructions as
base elements of scalar dependences, but any llvm::Value. This allows
us to code-generate scalar dependences which reference function arguments, as
they arise when moddeling read-only scalar dependences.
llvm-svn: 244874
Before we only modeled PHI nodes if at least one incoming basic block was itself
part of the region, now we always model them except if all of their operands are
part of a single non-affine subregion which we model as a black-box.
This change only affects PHI nodes in the entry block, that have exactly one
incoming edge. Before this change, we did not model them and as a result code
generation would not know how to code generate them. With this change, code
generation can code generate them like any other PHI node.
This issue was exposed by r244606. Before this change simplifyRegion would have
moved these PHI nodes out of the SCoP, so we would never have tried to code
generate them. We could implement this behavior again, but changing the IR
after the scop has been modeled and transformed always adds a risk of us
invalidating earlier analysis results. It seems more save and overall also more
consistent to just model and handle this one-entry-edge PHI nodes like any
other PHI node in the scop.
Solution proposed by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 244721
This one was extracted from the test-suite's pifft and caused a
miscompilation because a scalar was not written to its alloca address.
llvm-svn: 244720
The previous code had several problems:
For newly created BasicBlocks it did not (always) call RegionInfo::setRegionFor in order to update its analysis. At the moment RegionInfo does not verify its BBMap, but will in the future. This is fixed by determining the region new BBs belong to and set it accordingly. The new executeScopConditionally() requires accurate getRegionFor information.
Which block is created by SplitEdge depends on the incoming and outgoing edges of the blocks it connects, which makes handling its output more difficult than it needs to be. Especially for finding which block has been created an to assign a region to it for the setRegionFor problem above. This patch uses an implementation for splitEdge that always creates a block between the predecessor and successor. simplifyRegion has also been simplified by using SplitBlockPredecessors instead of SplitEdge. Isolating the entries and exits have been refectored into individual functions.
Previously simplifyRegion did more than just ensuring that there is only one entering and one exiting edge. It ensured that the entering block had no other outgoing edge which was necessary for executeScopConditionally(). Now the latter uses the alternative splitEdge implementation which can handle this situation so simplifyRegion really only needs to simplify the region.
Also, executeScopConditionally assumed that there can be no PHI nodes in blocks with one incoming edge. This is wrong and LCSSA deliberately produces such edges. However, previous passes ensured that there can be no such PHIs in exit nodes, but which will no longer hold in the future.
The new code that the property that it preserves the identity of region block (the property that the memory address of the BasicBlock containing the instructions remains the same; new blocks only contain PHI nodes and a terminator), especially the entry block. As a result, there is no need to update the reference to the BasicBlock of ScopStmt that contain its instructions because they have been moved to other basic blocks.
Reviewers: grosser
Part of Differential Revision: http://reviews.llvm.org/D11867
llvm-svn: 244606
We use the branch instruction as the location at which a PHI-node write takes
place, instead of the PHI-node itself. This allows us to identify the
basic-block in a region statement which is on the incoming edge of the PHI-node
and for which the write access was originally introduced. As a result we can,
during code generation, avoid generating PHI-node write accesses for basic
blocks that do not preceed the PHI node without having to look at the IR
again.
This change fixes a bug which was introduced in r243420, when we started to
explicitly model PHI-node reads and writes, but dropped some additional checks
that where still necessary during code generation to not emit PHI-node writes
for basic-blocks that are not on incoming edges of the original PHI node.
Compared to the code before r243420 the new code does not need to inspect the IR
any more and we also do not generate multiple redundant writes.
llvm-svn: 243852
SCEVExpander, which we are using during code generation, only allows
instructions as insert locations, but breaks in case BasicBlock->end() iterators
are passed to it due to it trying to obtain the basic block in which code should
be generated by calling Instruction->getParent(), which is not defined for
->end() iterators.
This change adds an assert to Polly that ensures we only pass valid instructions
to SCEVExpander and it fixes one case, where we used IRBuilder->SetInsertBlock()
to set an ->end() insert location which was later passed to SCEVExpander.
In general, Polly is always trying to build up the CFG first, before we actually
insert instructions into the CFG sceleton. As a result, each basic block should
already have at least one branch instruction before we start adding code. Hence,
always requiring the IRBuilder insert location to be set to a real instruction
should always be possible.
Thanks Utpal Bora <cs14mtech11017@iith.ac.in> for his help with test case
reduction.
llvm-svn: 243830
As specified in PR23888, run-time alias check generation is expensive
in terms of compile-time. This reduces the compile time by computing
minimal/maximal access only once for each base pointer
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
llvm-svn: 243024
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be choosen and possibly replaced.
This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with). We also can not yet perform the
reduction analysis on schedule trees.
For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238
llvm-svn: 242130
This removes old code that has been disabled since several weeks and was hidden
behind the flags -disable-polly-intra-scop-scalar-to-array=false and
-polly-model-phi-nodes=false. Earlier, Polly used to translate scalars and
PHI nodes to single element arrays, as this avoided the need for their special
handling in Polly. With Johannes' patches adding native support for such scalar
references to Polly, this code is not needed any more. After this commit both
-polly-prepare and -polly-independent are now mostly no-ops. Only a couple of
simple transformations still remain, but they are scheduled for removal too.
Thanks again to Johannes Doerfert for his nice work in making all this code
obsolete.
llvm-svn: 240766
Remainder operations with constant divisor can be modeled as quasi-affine
expression. This patch adds support for detecting and modeling them. We also
add a test that ensures they are correctly code generated.
This patch was extracted from a larger patch contributed by Johannes Doerfert
in http://reviews.llvm.org/D5293
llvm-svn: 240518
LLVM's instcombine already translates power-of-two sdivs that are known to be
exact to fast ashr instructions. Hence, there is no need to add this logic
ourselves.
Pointed-out-by: Johannes Doerfert
llvm-svn: 239025
We now verify that memory access functions imported via JSON are indeed defined
for the full iteration domain. Before this change we accidentally imported
memory mappings such as i -> i / 127, which only defined a mapped for values of
i that are evenly divisible by 127, but which did not define any mapping for the
remaining values, with the result that isl just generated an access expression
that had undefined behavior for all the unmapped values.
In the incorrect test cases, we now either use floor(i/127) or we use p/127 and
provide the information that p is indeed a multiple of 127.
llvm-svn: 239024
isl marks known non-negative numerators in modulo (and soon also division)
operations. We now exploit this by generating unsigned operations. This is
beneficial as unsigned operations with power-of-two denominators will be
translated by isl to fast bitshift or bitwise and operations.
llvm-svn: 238577
This ensures we pass all tests independently of how we set the options
-disable-polly-intra-scop-scalar-to-array and -polly-model-phi-nodes.
(At least if we enable both or disable both. Enabling them individually makes
little sense, as they will hopefully disappear soon anyhow).
llvm-svn: 238087
To reduce compile time and to allow more and better quality SCoPs in
the long run we introduced scalar dependences and PHI-modeling. This
patch will now allow us to generate code if one or both of those
options are set. While the principle of demoting scalars as well as
PHIs to memory in order to communicate their value stays the same,
this allows to delay the demotion till the very end (the actual code
generation). Consequently:
- We __almost__ do not modify the code if we do not generate code
for an optimized SCoP in the end. Thus, the early exit as well as
the unprofitable option will now actually preven us from
introducing regressions in case we will probably not get better
code.
- Polly can be used as a "pure" analyzer tool as long as the code
generator is set to none.
- The original SCoP is almost not touched when the optimized version
is placed next to it. Runtime regressions if the runtime checks
chooses the original are not to be expected and later
optimizations do not need to revert the demotion for that part.
- We will generate direct accesses to the demoted values, thus there
are no "trivial GEPs" that select the first element of a scalar we
demoted and treated as an array.
Differential Revision: http://reviews.llvm.org/D7513
llvm-svn: 238070
Modified two test cases to adjust to the above change in renaming.
These two files were causing the buildbot failure in Polly, #30204 for example.
Details in http://reviews.llvm.org/D9483
This checkin goes with r237150 and r237151
llvm-svn: 237203
Besides class, function and file names, we also change the command line option
from -polly-codegen-isl to just -polly-codegen. The isl postfix is a leftover
from the times when we still had the CLooG based -polly-codegen. Today it is
just redundant and we drop it.
llvm-svn: 237099
This option is enabled since a long time and there does not seem to be a
situation in which we would not want to print alias scopes. Remove this option
to reduce the set of command-line option combinations that may expose bugs.
llvm-svn: 235861
I just learned that target triples prevent test cases to be run on other
architectures. Polly test cases are until now sufficiently target independent
to not require any target triples. Hence, we drop them.
llvm-svn: 235384
This change ensures that we sign-extend integer types in case non-matching
operands are encountered when generating a multi-dimensional access offset.
This fixes http://llvm.org/PR23124
Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 234122
This allows us to delinerize code such as:
A[][n]
for (i
for (j
A[i][n-j-1] = ...
which would previously have been delinearize to an access A[i+1][-j-1].
To recover the correct access we apply the piecewise expression:
{ A[i][j] -> A[i-1][i+N]: i < 0; A[i][j] -> A[i][i]: i >= 0}
This approach generalizes to higher dimensions.
llvm-svn: 233566
When creating parameters the SCEVexpander may introduce new induction variables,
that possibly create scalar dependences in the original scop, before we code
generate the scop. The resulting scalar dependences may then inhibit correct
code generation of the scop. To prevent this, we first version the code without
a run-time check and only then introduce new parameters and the run-time
condition. The if-condition that guards the original scop from being modified by
the SCEVexpander.
This change causes some test case changes as the run-time conditions are now
introduced in the split basic block rather than in the entry basic block.
This fixes http://llvm.org/PR22069
Test case reduced by: Karthik Senthil
llvm-svn: 233477
This options was earlier used for experiments with the vectorizer, but to my
knowledge is not really used anymore. If anybody needs this, we can always
reintroduce this feature.
llvm-svn: 232934
The BB vectorizer is deprecated and there is no point in generating code for it
any more. This option was introduced when there was not yet any loop vectorizer
in sight. Now being matured, Polly should target the loop vectorizer.
llvm-svn: 232099
When code generating array index expressions the types of the different
components of the index expressions may not always match. We extend the type of
the index expression (if possible) and assert otherwise.
llvm-svn: 231592
When we generate code for a whole region we have to respect dominance
and update it too.
The first is achieved with multiple "BBMap"s. Each copied block in the
region gets its own map. It is initialized only with values mapped in
the immediate dominator block, if this block is in the region and was
therefor already copied. This way no values defined in a block that
doesn't dominate the current one will be used.
To update dominance information we check if the immediate dominator of
the original block we want to copy is in the region. If so we set the
immediate dominator of the current block to the copy of the immediate
dominator of the original block.
llvm-svn: 230774
isl recently introduced a new interface to create run-time checks from
constraint sets. Use this interface to simplify our run-time check generation.
llvm-svn: 230640
This is the code generation for region statements that are created
when non-affine control flow was present in the input. A new
generator, similar to the block or vector generator, for regions is
used to traverse and copy the region statement and to adjust the
control flow inside the new region in the end.
llvm-svn: 230340
Scops that only read seem generally uninteresting and scops that only write are
most likely initializations where there is also little to optimize. To not
waste compile time we bail early.
Differential Revision: http://reviews.llvm.org/D7735
llvm-svn: 229820
This commit imports the latest isl version into lib/External/isl. The changes
relavant for Polly are:
1) Schedule trees [1] have been introduced as a more structured way to
describe schedules. Polly does not yet use them, but we may switch to them
in the near future.
2) Another set of coalescing changes [2] simplifies some data dependences and
removes a couple of code generation artifacts.
We now understand that the following sets can be merged:
{ Stmt_S1[i0, i1] -> Stmt_S2[i0 + i1] :
i0 >= 0 and i1 <= 1023 - i0 and i1 >= 1
Stmt_S1[i0, 0] -> Stmt_S2[i0] : i0 <= 1023 and i0 >= 1}
into:
{ Stmt_S1[i0, i1] -> Stmt_S2[i0 + i1] : i1 <= 1023 - i0 and i1 >= 0 and
i1 >= 1 - i0 and i0 >= 0 }
Changes of this kind reduce unnecessary specialization during code
generation.
- for (int c3 = 0; c3 <= 1023; c3 += 1) {
- if (c3 % 2 == 0) {
- Stmt_for_body3(c1, c3);
- } else
- Stmt_for_body3(c1, c3);
- }
+ for (int c3 = 0; c3 <= 1023; c3 += 1)
+ Stmt_for_body3(c1, c3);
[1] http://impact.gforge.inria.fr/impact2014/papers/impact2014-verdoolaege.pdf
[2] http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf
llvm-svn: 229423
This allows us to skip ast and code generation if we did not optimize
a SCoP and will not generate parallel or alias annotations. The
initial heuristic to exit is simple but allows improvements later on.
All failing test cases have been modified to disable early exit, thus
to keep their coverage.
Differential Revision: http://reviews.llvm.org/D7254
llvm-svn: 228851
The support is currently limited as we only allow them in the input but do
not emit them in the transformed SCoP due to the possible semantic changes.
Differential Revision: http://reviews.llvm.org/D5225
llvm-svn: 227054
This change ensures that the values that represent the array size of a
multi-dimensional access are correctly sign-extended when used to compute a
memory address used in the run-time alias check.
To make the test case more readable, we name the instructions that we generate.
llvm-svn: 225818
Schedule dimensions that have the same constant value accross all statements do
not carry any information, but due to the increased dimensionality of the
schedule cost compile time. To not pay this cost, we remove constant dimensions
if possible.
llvm-svn: 225067
Isl now specifically marks modulo operations that are compared against zero.
They can be implemented with the C/LLVM remainder operation.
We also update a couple of test cases where the output of isl has slightly
changed.
llvm-svn: 223607
This commit drops the Cloog support for Polly. The scripts and
documentation are changed to only use isl as prerequisity. In the code
all Cloog specific parts have been removed and all relevant tests have
been ported to the isl backend when it was created.
llvm-svn: 223141
SCEV based code generation has been the default for two weeks after having
been tested for a long time. We now drop the support the non-scev-based code
generation.
llvm-svn: 222978
In case a GEP instruction references into a fixed size array e.g., an access
A[i][j] into an array A[100x100], LLVM-IR does not guarantee that the subscripts
always compute values that are within array bounds. We now derive the set of
parameter values for which all accesses are within bounds and add the assumption
that the scop is only every executed with this set of parameter values.
Example:
void foo(float A[][20], long n, long m {
for (long i = 0; i < n; i++)
for (long j = 0; j < m; j++)
A[i][j] = ...
This loop yields out-of-bound accesses if m is at least 20 and at the same time
at least one iteration of the outer loop is executed. Hence, we assume:
n <= 0 or m <= 20.
Doing so simplifies the dependence analysis problem, allows us to perform
more optimizations and generate better code.
TODO: The location where the GEP instruction is executed is not necessarily the
location where the memory is actually accessed. As a result scanning for GEP[s]
is imprecise. Even though this is not a correctness problem, this imprecision
may result in missed optimizations or non-optimal run-time checks.
In polybench where this mismatch between parametric loop bounds and fixed size
arrays is common, we see with this patch significant reductions in compile time
(up to 50%) and execution time (up to 70%). We see two significant compile time
regressions (fdtd-2d, jacobi-2d-imper), and one execution time regression
(trmm). Both regressions arise due to additional optimizations that have been
enabled by this patch. They can be addressed in subsequent commits.
http://reviews.llvm.org/D6369
llvm-svn: 222754
This patch includes tests where we actually need to adjust the CHECK lines
for SCEV based code generation. Besides these adjustments we add explicit
calls to -polly-codegen-scev=[true|false] and make sure we test both cases.
llvm-svn: 222112
This prevents SCEVs to reference values not valid any more and as a consequence
solves a bug where such values reintroduced during ast generation caused the
independent blocks pass to fail validation.
http://llvm.org/PR21204
llvm-svn: 222103
The isl based backend has been tested since a long time and with the recently
commited OpenMP support the last missing piece of functionality was ported from
the CLooG backend.
The isl based backend gives us interesting new functionality:
- Run-time alias checks (enabled by default)
Optimize scops that contain possibly aliasing pointers. This feature has
largely increased the number of loop nests we consider for optimization.
Thanks Johannes!
- Delinearization (not yet enabled by default)
Model accesses to multi-dimensional arrays precisely. This will allow us to
understand kernels with multi-dimensional VLAs written in Julia, boost::ublas,
coremark or C99.
Thanks Sebastian!
- Generation of higher quality code
Sven and me spent a long time to optimize the quality of the generated code. A
major focus were expressions as they result from modulos/divisions or
piecewise affine expressions (a ? b : c).
- Full/Partial tile separation, polyhedral unrolling
The isl code generation provides functionality to generate specialized code
for core and cleanup loops and to specialize code using polyhedral context
information while unrolling statements.
(not yet exploited in Polly)
- Modifieable access functions
We can now use standard isl functionality to remap memory accesses to new
data locations. A standard use case is the use of shared memory, where
accesses to a larger region in global memory need to be mapped to a smaller
shared memory region using a modulo mapping.
(not yet exploited in Polly)
The cloog based code generation is still available for comparision, but is
scheduled for removal.
llvm-svn: 222101
Instead of parallelizing every parallel outermost loop, we now use a very
minimalistic cost model. Specifically, we assume innermost loops are not
worth parallelising and all non-innermost loops are.
When parallelizing all loops in LNT we got several slowdowns/timeouts due to
us parallelizing innermost loops that are executed only a couple of times
(number of iterations not known statically). With this basic heuristic enabled
LNT does not show any more timeouts, while several interesting loops are still
parallelized.
There are many ways to obtain an improved heuristic. Constructing such an
improvide heuristic from a position of minimal slow-down and zero code size
increase seems to be the best, as it allows us to track progress on LNT.
llvm-svn: 222096
This backend supports besides the classical code generation the upcoming SCEV
based code generation (which the existing CLooG backend does not support
robustly).
OpenMP code generation in the isl backend benefits from our run-time alias
checks such that the set of loops that can possibly be parallelized is a lot
larger.
The code was tested on LNT. We do not regress on builds without -polly-parallel.
When using -polly-parallel most tests work flawlessly, but a few issues still
remain and will be addressed in follow up commits.
SCEV/non-SCEV codegen:
- Compile time failure in ldecod and TimberWolfMC due a problem in our
run-time alias check generation triggered by pointers that escape through
the OpenMP subfunction (OpenMP specific).
- Several execution time failures. Due to the larger set of loops that we now
parallelize (compared to the classical code generation), we currently run
into some timeouts in tests with a lot loops that have a low trip count and
are slowed down by parallelizing them.
SCEV only:
- One existing failure in lencod due to llvm.org/PR21204 (not OpenMP specific)
OpenMP code generation is the last feature that was only available in the CLooG
backend. With the isl backend being the only one supporting features such as
run-time alias checks and delinearization, we will soon switch to use the isl
ast generator by the default and subsequently remove our dependency on CLooG.
http://reviews.llvm.org/D5517
llvm-svn: 222088
Polly was accidently modifying a debug info metadata node when
attempting to generate a new unique metadata node for the loop id.
The problem was that we had dwarf metadata that referred to a
metadata node with a null value, like this:
!6 = ... some dwarf metadata referring to !7 ...
!7 = {null}
When we attempt to generate a new metadata node, we reserve the
first space for self-referential node by setting the first argument
to null and then mutating the node later to refer to itself.
However, because the nodes are uniqued based on pointer values, when
we get the new metadata node it actually referred to an existing
node (!7 in the example). When we went to modify the metadata to
point to itself, we were accidently mutating the dwarf metatdata. We
ended up in this situation:
!6 = ... some dwarf metadata referring to !7 ...
!7 = {!7}
and this causes an assert when generating the debug info. The fix is
simple, we just need to use a unique value when getting a new
metadata node. The MDNode::getTemporary() provides exactly the API
we need (and it is used in clang to generate the unique nodes).
Differential Revision: http://reviews.llvm.org/D6174
llvm-svn: 221550
We introduces a new flag -polly-parallel and use it to annotate the for-nodes in
the isl ast that we want to execute thread parallel (e.g., using OpenMP). We
previously already emmitted openmp annotations, but we did this for various
kinds of parallel loops, including some which we can not run in parallel.
With this patch we now have three annotations:
1) #pragma known-parallel [reduction]
2) #pragma omp for
3) #pragma simd
meaning:
1) loop has no loop carried dependences
2) loop will be executed thread-parallel
3) loop can possibly be vectorized
This patch introduces 1) and reduces the use of 2) to only the cases where we
will actually generate thread parallel code.
It is in preparation of openmp code generation in our isl backend.
Legacy:
- We also have a command line option -enable-polly-openmp. This option controls
the OpenMP code generation in CLooG. It will become an alias of
-polly-parallel after the CLooG code generation has been dropped.
http://reviews.llvm.org/D6142
llvm-svn: 221479
This patch moves the SCEV based (re)generation of values before the checking for
scop-constant terms. It enables us to provide SCEV based replacements, which
are necessary to correctly generate OpenMP subfunctions when using the SCEV
based code generation.
When recomputing a new value for a value used in the code of the original scop,
we previously directly returned the same original value for all scop-constant
expressions without even trying to regenerate these values using our SCEV
expression. This is correct when the newly generated code remains fully in the
same function, however in case we want to outline parts of the newly generated
scop into subfunctions, this approach means we do not have any opportunity to
update these values in the SCEV based code generation. (In the non-SCEV based
code generation, we can provide such updates through the GlobalMap). To ensure
we have this opportunity, we first try to regenerate scalar terms with our SCEV
builder and will only return scop-constant expressions if SCEV based code
generation was not possible.
This change should not affect the results of the existing code generation
passes. It only impacts the upcoming OpenMP based code generation.
This commit also adds a test case. This test case passes before and after this
commit. It was added to ensure test coverage for the changed code.
llvm-svn: 221393
We restricted the new access functions to be a subset of the old one
because we want to keep the alignment, however if the alignment is
"not special", thus the default for the type, we can allow any access.
Differential Revision: http://reviews.llvm.org/D5680
llvm-svn: 219503
In case the pieceweise affine function used to create an isl_ast_expr
had empty cases (e.g., with contradicting constraints on the
parameters), it was possible that the condition of the isl_ast_expr
select was not a comparison but a constant (thus of type i64).
This patch does two thing:
1) Handle the case the condition of a select is not a i1 type like C.
2) Try to simplify the pieceweise affine functions for the min/max
access when we generate runtime alias checks. That step can often
remove empty or redundant cases as well as redundant constrains.
This fixes bug: http://llvm.org/PR21167
Differential Revision: http://reviews.llvm.org/D5627
llvm-svn: 219208
This resolved the issues with delinearized accesses that might alias,
thus delinearization doesn't deactivate runtime alias checks anymore.
Differential Revision: http://reviews.llvm.org/D5614
llvm-svn: 219078
This class allows to store information about the arrays in the SCoP.
For each base pointer in the SCoP one object is created storing the
type and dimension sizes of the array. The objects can be obtained via
the SCoP, a MemoryAccess or the isl_id associated with the output
dimension of a MemoryAccess (the description of what is accessed).
So far we use the information in the IslExprBuilder to create the
right base type before indexing into the base array. This fixes the
bug http://llvm.org/bugs/show_bug.cgi?id=21113 (both test cases are
included). On top of that we can now build runtime alias checks for
delinearized arrays as the dimension sizes are also part of the
ScopArrayInfo objects.
Differential Revision: http://reviews.llvm.org/D5613
llvm-svn: 219077
Update debug info testcases for the LLVM metadata schema change in
r219010 to fold metadata constant operands into a single `MDString`.
Part of PR17891.
llvm-svn: 219019
This also forbids the json importer to access other memory locations
than the original instruction as we to reuse the alignment of the
original load/store.
Differential Revision: http://reviews.llvm.org/D5560
llvm-svn: 218883
The command line flag -polly-annotate-alias-scopes controls whether or not
Polly annotates alias scopes in the new SCoP (default ON). This can improve
later optimizations as the new SCoP is basically an alias free environment for
them.
llvm-svn: 218877
This change allows to annotate all parallel loops with loop id metadata.
Furthermore, it will annotate memory instructions with
llvm.mem.parallel_loop_access metadata for all surrounding parallel loops.
This is especially usefull if an external paralleliser is used.
This also removes the PollyLoopInfo class and comments the
LoopAnnotator.
A test case for multiple parallel loops is attached.
llvm-svn: 218793
If there are multiple read only base addresses in an alias group
we can split it into multiple alias groups each with only one
read only access. This way we might reduce the number of
comparisons significantly as it grows linear in the number of
alias groups but exponential in their size.
Differential Revision: http://reviews.llvm.org/D5435
llvm-svn: 218757
This fixes two problems which are usualy caused together:
1) The elements of an isl AST access expression could be pointers
not only integers, floats and vectores thereof.
2) The runtime alias checks need to compare pointers but if they
are of a different type we need to cast them into a "max" type
similar to the non pointer case.
llvm-svn: 218113
This change will build all alias groups (minimal/maximal accesses
to possible aliasing base pointers) we have to check before
we can assume an alias free environment. It will also use these
to create Runtime Alias Checks (RTC) in the ISL code generation
backend, thus allow us to optimize SCoPs despite possibly aliasing
pointers when this backend is used.
This feature will be enabled for the isl code generator, e.g.,
--polly-code-generator=isl, but disabled for:
- The cloog code generator (still the default).
- The case delinearization is enabled.
- The case non-affine accesses are allowed.
llvm-svn: 218046
We use SplitEdge to split a conditional entry edge of the SCoP region.
However, SplitEdge can cause two different situations (depending on
whether or not the edge is critical). This patch tests
which one is present and deals with the former unhandled one.
It also refactors and unifies the case we have to change the basic
blocks of the SCoP to new ones (see replaceScopAndRegionEntry).
llvm-svn: 217802
During the IslAst parallelism check also compute the minimal dependency
distance and store it in the IstAst for node.
Reviewer: sebpop
Differential Revision: http://reviews.llvm.org/D4987
llvm-svn: 217729
This allows us to omit the GuardBB in front of created loops
if we can show the loop trip count is at least one. It also
simplifies the dominance relation inside the new created region.
A GuardBB (even with a constant branch condition) might trigger
false dominance errors during function verification.
Differential Revision: http://reviews.llvm.org/D5297
llvm-svn: 217525
Summary:
+ Refactor the runtime check (RTC) build function
+ Added helper function to create an PollyIRBuilder
+ Change the simplify region function to create not
only unique entry and exit edges but also enfore that
the entry edge is unconditional
+ Cleaned the IslCodeGeneration runOnScop function:
- less post-creation changes of the created IR
+ Adjusted and added test cases
Reviewers: grosser, sebpop, simbuerg, dpeixott
Subscribers: llvm-commits, #polly
Differential Revision: http://reviews.llvm.org/D5076
llvm-svn: 217508
There was a bug in the IslAst which caused that no more outermost
parallel loops were detected/checked after a parallel outermost loop
of depth 1.
+ Test case attached
llvm-svn: 217452
In Polly we used to have a mix of test cases, some that used 'opt %s' and others
that used 'opt < %s'. We now change all to use 'opt < %s'. Piping in test files
is preferable as it does prevent temporary files to be written to disk. This
brings us in line with what is usus in LLVM.
llvm-svn: 216816