should perform loop interchanges itself.
This also fixes a bug we see due to the "loop-interchange" pass producing
incorrect IR when compiling linpack-pc.c from the LLVM test-suite with
"-polly-position=before-vectorizer".
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 257495
Call assumeNoOutOfBound only in updateDimensionality to process situations
when new dimensions are added and new bounds checks are required.
Contributed-by: Tobias Grosser, Gareev Roman
llvm-svn: 257170
This change clarifies that for Not-NonAffine-SubRegions we actually iterate over
the subnodes and for both NonAffine-SubRegions and BasicBlocks, we perform the
schedule construction. As a result, the tree traversal becomes trivial, the
special case for a scop consisting just of a single non-affine region
disappears and the indentation of the code is reduced.
No functional change intended.
llvm-svn: 256940
At code generation, scalar reads are generated before the other
statement's instructions, respectively scalar writes after them, in
contrast to array accesses which are "executed" with the instructions
they are linked to. Therefore it makes sense to not map the scalar
accesses to a place of execution. Follow-up patches will also remove
some of the directs links from a scalar access to a single instruction,
such that only having array accesses in InstructionToAccess ensures
consistency.
Differential Revision: http://reviews.llvm.org/D13676
llvm-svn: 256298
We clarify that certain code is only executed if LSchedule is != nullptr.
Previously some of these functions have been executed, but they only passed
a nullptr through. This caused some confusion when reading the code.
llvm-svn: 256209
Besides improving the documentation and the code we now assert in case the input
is invalid (N < 0) and also do not any more return a nullptr in case USet is
empty. This should make the code more readable.
llvm-svn: 256208
If a loop has a sufficiently large amount of compute instruction in its loop
body, it is unlikely that our rewrite of the loop iterators introduces large
performance changes. As Polly can also apply beneficical optimizations (such
as parallelization) to such loop nests, we mark them as profitable.
This option is currently "disabled" by default, but can be used to run
experiments. If enabled by setting it e.g. to 40 instructions, we currently
see some compile-time increases on LNT without any significant run-time
changes.
llvm-svn: 256199
.. and add some documentation. We also simplify the code by dropping an early
check that is also covered by the the later checks. This might have a small
compile time impact, but as the scops that are skipped are small we should
probably only add this back in the unlikely case that this has a notable
compile-time cost.
No functional change intended.
llvm-svn: 256149
As we already log an error when calling invalid, scops unprofitable scops are in
any case marked invalid, but returning immediately safes (a tiny bit of) compile
time and is consistent with our use of 'invalid' in the remainder of the file.
Found by inspection.
llvm-svn: 256140
Without this return we still log the incorrect array size (and do not detect
this scop), but we would unnecessarily continue to verify that access functions
are affine. As we do not need to do this, we can return right ahead and
consequently safe compile time.
This issue was found by inspection.
llvm-svn: 256139
Instead of counting all array memory accesses associated with a load
instruction, we now explicitly check that the single array access that could
(potentially) be associated with a load instruction does not exist. This helps
to document the current behavior of Polly where load instructions can indeed
have at most one associated array access. In the unlikely case this changes
in the future, we add an assert for the case where two load accesses would
prevent us to return a single memory access, but we still should communicate
that not all array memory accesses have been removed.
This addresses post-commit comments from Johannes Doerfert for commit 255776.
llvm-svn: 256136
Scops that contain many complex branches are likely to result in complex domain
conditions that consist of a large (> 100) number of conjucts. Transforming
such domains is expensive and unlikely to result in efficient code. To avoid
long compile times we detect this case and skip such scops. In the future we may
improve this by either using non-affine subregions to hide such complex
condition structures or by exploiting in certain cases properties (e.g.,
dominance) that allow us to construct the domains of a scop in a way that
results in a smaller number improving conjuncts.
Example of a code that results in complex iteration spaces:
loop.header
/ | \ \
A0 A2 A4 \
\ / \ / \
A1 A3 \
/ \ / \ |
B0 B2 B4 |
\ / \ / |
B1 B3 ^
/ \ / \ |
C0 C2 C4 |
\ / \ / /
C1 C3 /
\ / /
loop backedge
llvm-svn: 256123
The patch fixes Bug 25759 produced by inappropriate handling of unsigned
maximum SCEV expressions by SCEVRemoveMax. Without a fix, we get an infinite
loop and a segmentation fault, if we try to process, for example,
'((-1 + (-1 * %b1)) umax {(-1 + (-1 * %yStart)),+,-1}<%.preheader>)'.
It also fixes a potential issue related to signed maximum SCEV expressions.
Tested-by: Roman Gareev <gareevroman@gmail.com>
Fixed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D15563
llvm-svn: 255922
When running 'clang -O3 -mllvm -polly -mllvm -polly-show' we now only show the
CFGs of functions with at least one detected scop. For larger files/projects
this reduces the number of graphs printed significantly and is likely what
developers want to see. The new option -polly-view-all enforces all graphs to be
printed and the exiting option -poll-view-only limites the graph printing to
functions that match a certain pattern.
This patch requires https://llvm.org/svn/llvm-project/llvm/trunk@255889 (and
vice versa) to compile correctly.
llvm-svn: 255891
Load instructions may possibly be related to multiple memory accesses, but we
are only interested in the array read access that describes the memory location
the load instructions loads from. By using getArrayAccessfor we ensure to always
obtain the right memory access.
This issue was found by inspection without having a failing test case.
llvm-svn: 255716
getAccessFor does not guarantee a certain access to be returned in case an
instruction is related to multiple accesses. However, in the vector code
generation we want to know the stride of the array access of a store
instruction. By using getArrayAccessFor we ensure we always get the correct
memory access.
This patch fixes a potential bug, but I was unable to produce a failing test
case. Several existing test cases cover this code, but all of them already
passed out of luck (or the specific but not-guaranteed order in which we build
memory accesses).
llvm-svn: 255715
When generating scalar loads/stores separately the vector code has not been
updated. This commit adds code to generate scalar loads for vector code as well
as code to assert in case scalar stores are encountered within a vector loop.
llvm-svn: 255714
When rewriting the access functions of load/store statements, we are only
interested in the actual array memory location. The current code just took
the very first memory access, which could be a scalar or an array access. As
a result, we failed to update access functions even though this was requested
via .jscop.
llvm-svn: 255713
This change should not change the behavior of Polly today, but it allows
external constants to be remapped e.g. when targetting multiple LLVM modules.
llvm-svn: 255506
This reverts commit r255471.
Johannes raised in the post-commit review of r255471 the concern that PHI
writes in non-affine regions with two exiting blocks are not really MUST_WRITE,
but we just know that at least one out of the set of all possible PHI writes
will be executed. Modeling all PHI nodes as MUST_WRITEs is probably save, but
adding the needed documentation for such a special case is probably not worth
the effort. Michael will be proposing a new patch that ensures only a single
PHI_WRITE is created for non-affine regions, which - besides other benefits -
should also allow us to use a single well-defined MUST_WRITE for such PHI
writes.
(This is not a full revert, but the condition and documentation have been
slightly extended)
llvm-svn: 255503
Before this commit, only the region's entry block was assumed to always
execute in a non-affine subregion. We replace this by a test whether it
dominates the exit block (this necessarily includes the entry block)
which should be more accurate.
llvm-svn: 255473
LLVM's IR guarantees that a value definition occurs before any use, and
also the value of a PHI must be one of the incoming values, "written"
in one of the incoming blocks. Hence, such writes are never conditional
in the context of a non-affine subregion.
llvm-svn: 255471
Over time different vocabulary has been introduced to describe the different
memory objects in Polly, resulting in different - often inconsistent - naming
schemes in different parts of Polly. We now standartize this to the following
scheme:
KindArray, KindValue, KindPHI, KindExitPHI
| ------- isScalar -----------|
In most cases this naming scheme has already been used previously (this
minimizes changes and ensures we remain consistent with previous publications).
The main change is that we remove KindScalar to clearify the difference between
a scalar as a memory object of kind Value, PHI or ExitPHI and a value (former
KindScalar) which is a memory object modeling a llvm::Value.
We also move all documentation to the Kind* enum in the ScopArrayInfo class,
remove the second enum in the MemoryAccess class and update documentation to be
formulated from the perspective of the memory object, rather than the memory
access. The terms "Implicit"/"Explicit", formerly used to describe memory
accesses, have been dropped. From the perspective of memory accesses they
described the different memory kinds well - especially from the perspective of
code generation - but just from the perspective of a memory object it seems more
straightforward to talk about scalars and arrays, rather than explicit and
implicit arrays. The last comment is clearly subjective, though. A less
subjective reason to go for these terms is the historic use both in mailing list
discussions and publications.
llvm-svn: 255467
Use it to print "null" if a MemoryAccess's access relation is not
available instead of printing nothing.
Suggested-by: Johannes Doerfert
llvm-svn: 255466
Introduce a function getStmtForRegionNode() to the corresponding
ScopStmt of a RegionNode. We can use it to call the existing
ScopStmt::isEmpty() function instead of searching for accesses.
llvm-svn: 255465
When introducing separate control flow for the original and optimized code we
introduce now a special 'ExitingBlock':
\ /
EnteringBB
|
SplitBlock---------\
_____|_____ |
/ EntryBB \ StartBlock
| (region) | |
\_ExitingBB_/ ExitingBlock
| |
MergeBlock---------/
|
ExitBB
/ \
This 'ExitingBlock' contains code such as the final_reloads for scalars, which
previously were just added to whichever statement/loop_exit/branch-merge block
had been generated last. Having an explicit basic block makes it easier to
find these constructs when looking at the CFG.
llvm-svn: 255107
This update brings in improvements to isl's 'isolate' option that reduce the
number of code versions generated. This results in both code-size and compile
time reduction for outer loop vectorization.
Thanks to Roman Garev and Sven Verdoolaege for working on this improvement.
llvm-svn: 254706
The motivation is to fix a compilation error with Visual Studio 2013.
See http://reviews.llvm.org/D14886.
Thanks to Sumanth Gundapaneni for finding the issue and suggesting a
patch.
llvm-svn: 254498
The script will checkout the most recent master from
http://repo.or.cz/isl.git into /tmp, create a distribution tarball, and
extract it as replacement of lib/External/isl. After that it can be
committed to the Polly repository.
llvm-svn: 254497
Acc==MA implies Acc->getAccessInstruction() == MA->getAccessInstruction().
Suggested as post-commit review for 254305 by Michael Kruse.
llvm-svn: 254327
The use of C++'s high-level iterator functionality instead of two while loops
and explicit iterator handling improves readability of this code.
Proposed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: http://reviews.llvm.org/D15068
llvm-svn: 254305
Re-run canonicalization passes after Polly's code generation.
The set of passes currently added here are nearly all the passes between
--polly-position=early and --polly-position=before-vectorizer, i.e. all
passes that would usually run after Polly.
In order to run these only if Polly actually modified the code, we add a
function attribute "polly-optimzed" to a function that contains
generated code. The cleanup pass is skipped if the function does not
have this attribute.
There is no support by the (legacy) PassManager to run passes only under
some conditions. One could have wrapped all transformation passes to run
only when CodeGeneration changed the code, but the analyses would run
anyway. This patch creates an independent pass manager. The
disadvantages are that all analyses have to re-run even if preserved and
it does not honor compiler switches like the PassManagerBuilder does.
Differential Revision: http://reviews.llvm.org/D14333
llvm-svn: 254150
Previously, accesses that originate from PHI nodes in the exit block
were registered as SCALAR. In some context they are treated as scalars,
but it makes a difference in others. We used to check whether the
AccessInstruction is a terminator to differentiate the cases.
This patch introduces an MemoryAccess origin EXIT_PHI and a
ScopArrayInfo kind KIND_EXIT_PHI to make this case more explicit. No
behavioural change intended.
Differential Revision: http://reviews.llvm.org/D14688
llvm-svn: 254149
gfortran (and fortran in general?) does not compute the address of an array
element directly from the array sizes (e.g., %s0, %s1), but takes first the
maximum of the sizes and 0 (e.g., max(0, %s0)) before multiplying the resulting
value with the per-dimension array subscript expressions. To successfully
delinearize index expressions as we see them in fortran, we first filter 'smax'
expressions out of the SCEV expression, use them to guess array size parameters
and only then continue with the existing delinearization.
llvm-svn: 253995
Trying to build up access functions for any of these blocks is likely to fail,
as error blocks may contain invalid/non-representable instructions, and blocks
dominated by error blocks may reference such instructions, which wil also cause
failures. As all of these blocks are anyhow assumed to not be executed, we can
just remove them early on.
This fixes http://llvm.org/PR25596
llvm-svn: 253818
The most interesting change for Polly in this isl update is 4d5654af which
in certain cases can speed up the construction of run-time checks from an isl
set consisting of several disjuncts significantly.
llvm-svn: 253794
At some point we enforced lcssa for the loop surrounding the entry block.
This is not only questionable as it does not check any other loop but also
not needed any more.
llvm-svn: 253789
In case the original parameter instruction does not have a name, but it comes
from a load instruction where the base pointer has a name we used the name of
the load instruction to give some more intuition of where the parameter came
from. To ensure this works also through GEPs which may have complex offsets,
we originally just dropped the offsets and _only_ used the base pointer name.
As this can result in multiple parameters to get the same name, we now prefix
the parameter ID to ensure parameter names are unique. This will make it easier
to understand debug output.
This change does not affect correctness, as parameter IDs (even of the same
name) can always be distinguished through the SCEV pointer stored inside them.
llvm-svn: 253330
Without this change we may start to refuse scops in larger compilation units
just because a lot of code has already been compiled earlier.
Found by inspection. I do not yet have a good test case for this.
llvm-svn: 253050
Only when we check for wrapping we want to use the store size, for all
other cases we use the alloc size now.
Suggested by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 252941
IVs of loops for which the loop header is in the subregion, but not the entire
loop may be incremented outside of the subregion and can consequently not be
kept private to the subregion. Instead, they need to and are modeled as virtual
loops in the iteration domains. As this is the case, generating new subregion
induction variables for such loops is not needed and indeed wrong as they would
hide the virtual induction variables modeled in the scop.
This fixes a miscompile in MultiSource/Benchmarks/Ptrdist/bc and
MultiSource/Benchmarks/nbench/. Thanks Michael and Johannes for their
investiagations and helpful observations regarding this bug.
llvm-svn: 252860
If an llvm.assume dominates the SCoP entry block and the assumed condition
can be expressed as an affine inequality we will now add it to the context.
Differential Revision: http://reviews.llvm.org/D14413
llvm-svn: 252851
Error blocks may contain arbitrary instructions, among them some which we can
not modeled correctly. As we do not generate ScopStmts for error blocks anyhow
there is no point in trying to generate access functions for them.
This fixes llvm.org/PR25494
llvm-svn: 252794
In certain cases isl will not free the return values of operations for which
a computeout has been triggered. Hence, make sure we free it explicitly.
No test, as I did not manage to reduce one yet.
llvm-svn: 252766
For complex inputs our current approach of construction the boundary context
may in rare cases become computationally so expensive that it is better to
abort. This change adds a compute out check that bounds the compuations we
spend on boundary context construction and bails out if this limit is reached.
We can probably make our boundary construction algorithm more efficient, but
this requires some more investigation and probably also some additional changes
to isl. Until these have been added, we bound the compile time to ensure our
buildbots are green.
llvm-svn: 252758
In certain rare cases (mostly -polly-process-unprofitable on large sequences
of conditions - often without any loop), we see some compile-time timeouts due
to the construction of an overly complex assumption context. This change limits
the number of disjuncts to 150 (adjustable), to prevent us from creating
assumptions contexts that are too large for even the compilation to finish.
The limit has been choosen as large as possible to make sure we do not
unnecessarily drop test coverage. If such cases also appear in
-polly-process-unprofitable=false mode we may need to think about this again,
as the current limitations may still allow assumptions that are way to complex
to be checked profitably at run-time.
There is also certainly room for improvement regarding how (and how efficient)
we construct an assumed context, but this requires some more thinking.
This completes llvm.org/PR25458
llvm-svn: 252750
Basic blocks that are always executed can not be error blocks as their execution
can not possibly be an unlikely event. In this commit we tighten the check
if an error block to basic blcoks that do not dominate the exit condition, but
that dominate all exiting blocks of the scop.
llvm-svn: 252726
r252713 introduced a couple of regressions due to later basic blocks refering
to instructions defined in error blocks which have not yet been modeled.
This commit is currently just encoding limitations of our modeling and code
generation backends to ensure correctness. In theory, we should be able to
generate and optimize such regions, as everything that is dominated by an error
region is assumed to not be executed anyhow. We currently just lack the code
to make this happen in practice.
llvm-svn: 252725
Previously, we just skipped error blocks during scop construction. With
this change we make sure we can construct domains for error blocks such that
these domains can be forwarded to subsequent basic blocks.
This change ensures that basic blocks that post-dominate and are dominated by
a basic block that branches to an error condition have the very same iteration
domain as the branching basic block. Before, this change we would construct
a domain that excludes all error conditions. Such domains could become _very_
complex and were undesirable to build.
Another solution would have been to drop these constraints using a
dominance/post-dominance check instead of modeling the error blocks. Such
a solution could also work in case of unreachable statements or infinite
loops in the scop. However, as we currently (to my believe incorrectly) model
unreachable basic blocks in the post-dominance tree, such a solution is not
yet feasible and requires first a change to LLVM's post-dominance tree
construction.
This commit addresses the most sever compile time issue reported in:
http://llvm.org/PR25458
llvm-svn: 252713
Especially for structs, the SAI object of a base pointer does not
describe all the types that the user might expect when he loads from
that base pointer. While we will still cast integers and pointers we
will now reload the value with the correct type if floating point and
non-floating point values are involved. However, there are now TODOs
where we use bitcasts instead of a proper conversion or reloading.
This fixes bug 25479.
llvm-svn: 252706
We now create all invariant equivalence classes for required invariant loads
instead of creating them on-demand. This way we can check if a parameter
references an invariant load that is actually not executed and was therefor
not materialized. If that happens the parameter is not materialized either.
This fixes bug 25469.
llvm-svn: 252701
Since 252422 we do not only distinguish two ScopArrayInfo kinds, PHI nodes
and others, but work with three kind of ScopArrayInfo objects. SCALAR, PHI and
ARRAY objects. Instead of keeping two boolean flags isPHI and isScalar and
wonder what an ScopArrayInfo object of kind (!isScalar && isPHI) is, we
list now explicitly the three different possible types of memory objects.
This change also allows us to remove the confusing nested pairs that have
been used in ArrayInfoMapTy.
llvm-svn: 252620
In polly the first dimensions of an array as well as all scalars do not carry
any size information. This commit makes this explicit in the interface of
getDimensionSize. Before this commit getDimensionSize(0) returned the size of
the first dimension that carried a size. After this commit getDimensionSize(i)
will either return the size of dimension 'i' or assert in case 'i' does not
carry a size or does not exist at all.
This very same behaviour was already present in getDimensionSizePw(). This
commit also adds assertions that ensure getDimensionSizePw() is called
appropriately.
llvm-svn: 252607
Memory references are now printed as follows:
Old New
Scalars: i64 MemRef_val[*] i64 MemRef_val;
Arrays: i64 MemRef_A[*][%m][%o][8] i64 MemRef_A[*][%m][%o];
We do not print any more information about the element size in the type. Such
information has already been available in a comment after the scalar/array
declaration. It was redundant and did not match well with what people were used
from C.
llvm-svn: 252602
Scalar reloads in the generated entering block were not recognized as
dominating the subregions locks when there were multiple entering
nodes. This resulted in values defined in there not being copied.
As a fix, we unconditionally add the BBMap of the generated entering
node to the generated entry. This fixes part of llvm.org/PR25439.
This reverts 252449 and reapplies r252445. Its test was failing
indeterministically due to r252375 which was reverted in r252522.
llvm-svn: 252540
The dominance of the generated non-affine subregion block was based on
the scop's merge block, therefore resulted in an invalid DominanceTree.
It resulted in some values as assumed to be unusable in the actual
generated exit block.
We detect the case that the exit block has been moved and decide
dominance using the BB at the original exit. If we create another exit
node, that exit nodes is dominated by the one generated from where the
original exit resides. This fixes llvm.org/PR25438 and part of
llvm.org/PR25439.
llvm-svn: 252526
It introduced indeterminism as it was iterating over an address-indexed
hashtable. The corresponding bug PR25438 will be fixed in a successive
commit.
llvm-svn: 252522
This reverts commit 9775824b265e574fc541e975d64d3e270243b59d due to a
failing unit test.
Please check and correct the unit test and commit again.
llvm-svn: 252449
Scalar reloads in the generated entering block were not recognized as
dominating the subregions locks when there were multiple entering
nodes. This resulted in values defined in there not being copied.
As a fix, we unconditionally add the BBMap of the generated entering
node to the generated entry. This fixes part of llvm.org/PR25439.
llvm-svn: 252445
If a SCoP contains error blocks we cannot use the domain constraints
to simplify the assumptions as the domain is already influenced by the
assumptions we took. Before this patch we did that and some assumptions
became self-fulfilling as they were implied by the domain constraints.
llvm-svn: 252424