Even before we build the domain the branch condition can become very
complex, especially if we have to build the complement of a lot of
equality constraints. With this patch we bail if the branch condition
has a lot of basic sets and parameters.
After this patch we now successfully compile
External/SPEC/CINT2000/186_crafty/186_crafty
with "-polly-process-unprofitable -polly-position=before-vectorizer".
llvm-svn: 265286
As a CFG is often structured we can simplify the steps performed during
domain generation. When we push domain information we can utilize the
information from a block A to build the domain of a block B, if A dominates B
and there is no loop backedge on a path from A to B. When we pull domain
information we can use information from a block A to build the domain of a
block B if B post-dominates A. This patch implements both ideas and thereby
simplifies domains that were not simplified by isl. For the FINAL basic block
in test/ScopInfo/complex-successor-structure-3.ll we used to build a universe
set with 81 basic sets. Now it actually is represented as a universe set.
While the initial idea to utilize the graph structure depended on the
dominator and post-dominator tree we can use the available region
information as a coarse grained replacement. To this end we push the
region entry domain to the region exit and pull it from the region
entry for the region exit if applicable.
With this patch we now successfully compile
External/SPEC/CINT2006/400_perlbench/400_perlbench
and
SingleSource/Benchmarks/Adobe-C++/loop_unroll.
Differential Revision: http://reviews.llvm.org/D18450
llvm-svn: 265285
If a loop has no exiting blocks the region covering we use during
schedule generation might not cover that loop properly. For now we bail
out as we would not optimize these loops anyway.
llvm-svn: 265280
If an exit PHI is written and also read in the SCoP we should not create two
SAI objects but only one. As the read is only modeled to ensure OpenMP code
generation knows about it we can simply use the EXIT_PHI MemoryKind for both
accesses.
llvm-svn: 265261
If a loop has no exiting blocks the region covering we use during
schedule generation might not cover that loop properly. For now we bail
out as we would not optimize these loops anyway.
llvm-svn: 265260
... instead of hardcoding something that has been free at some point. This fixes
a crash triggered by r265084, where the diagnostic IDs have been shifted in a
way that resulted in our hardcoded ID not being assigned any implementation. Our ID
was likely already wrong earlier on, but this time we really crashed nicely.
llvm-svn: 265114
These caused LNT failures due to new assertions when running with
-polly-position=before-vectorizer -polly-process-unprofitable for:
FAIL: clamscan.compile_time
FAIL: cjpeg.compile_time
FAIL: consumer-jpeg.compile_time
FAIL: shapes.compile_time
FAIL: clamscan.execution_time
FAIL: cjpeg.execution_time
FAIL: consumer-jpeg.execution_time
FAIL: shapes.execution_time
The failures have been introduced by r264782, but r264789 had to be reverted
as it depended on the earlier patch.
llvm-svn: 264885
As a CFG is often structured we can simplify the steps performed
during domain generation. When we push domain information we can
utilize the information from a block A to build the domain of a
block B, if A dominates B. When we pull domain information we can
use information from a block A to build the domain of a block B
if B post-dominates A. This patch implements both ideas and thereby
simplifies domains that were not simplified by isl. For the FINAL
basic block in
test/ScopInfo/complex-successor-structure-3.ll
we used to build a universe set with 81 basic sets. Now it actually is
represented as a universe set.
While the initial idea to utilize the graph structure depended on the
dominator and post-dominator tree we can use the available region
information as a coarse grained replacement. To this end we push the
region entry domain to the region exit and pull it from the region
entry for the region exit.
Differential Revision: http://reviews.llvm.org/D18450
llvm-svn: 264789
Instead of waiting for the domain construction to finish we will now
bail as early as possible in case a complexity problem is encountered.
This might save compile time but more importantly it makes the "abort"
explicit. While we can always check if we invalidated the assumed
context we can simply propagate the result of the construction back.
This also removes the HasComplexCFG flag that was used for the very
same reason.
Differential Revision: http://reviews.llvm.org/D18504
llvm-svn: 264775
This patch applies the restrictions on the number of domain conjuncts
also to the domain parts of piecewise affine expressions we generate.
To this end the wording is changed slightly. It was needed to support
complex additions featuring zext-instructions but it also fixes PR27045.
LNT profitable runs report only small changes that might be noise:
Compile Time:
Polybench/[...]/2mm +4.34%
SingleSource/[...]/stepanov_container -2.43%
Execution Time:
External/[...]/186_crafty -2.32%
External/[...]/188_ammp -1.89%
External/[...]/473_astar -1.87%
llvm-svn: 264514
This fixes PR27035. While we now exclude MemIntrinsics from the
polyhedral model if they would access "null" we could exploit this
even more, e.g., remove all parameter combinations that would lead to
the execution of this statement from the context.
llvm-svn: 264284
Similar to r262612 we need to check not only the pointer SCEV and the
type of an alias group but also the actual access instruction. The
reason is again the same: The pointer SCEV is not flow sensitive but the
access function is. In r262612 we avoided consolidating alias groups
even though the pointer SCEV and the type were the same but the access
function was not. Here it is simpler as we can simply check all members
of an alias group against the given access instruction.
llvm-svn: 264274
This might be useful to evaluate the benefit of us handling modref function
calls. Also, a new bug that was triggered by modref function calls was
recently reported http://llvm.org/PR27035. To ensure the same issue does not
cause troubles for other people, we temporarily disable this until the bug
is resolved.
llvm-svn: 264140
ISL can conclude additional conditions on parameters from restrictions
on loop variables. Such conditions persist when leaving the loop and the
loop variable is projected out. This results in a narrower domain for
exiting the loop than entering it and is logically impossible for
non-infinite loops.
We fix this by not adding a lower bound i>=0 when constructing BB
domains, but defer it to when also the upper bound is computed, which
was done redundantly even before this patch.
This reduces the number of LNT fails with -polly-process-unprofitable
-polly-position=before-vectorizer from 8 to 6.
llvm-svn: 264118
We bail out if current scop has a complex control flow as this could lead to
building of large domain conditions. This is to reduce compile time. This
addresses r26382.
Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D18362
llvm-svn: 264105
Affine branches are fully modeled and regenerated from the polyhedral domain and
consequently do not require any input conditions to be propagated.
llvm-svn: 263678
The scope will be required in the following fix. This commit separates
the large changes that do not change behaviour from the small, but
functional change.
llvm-svn: 262664
This should fix PR19422.
Thanks to Jeremy Huddleston Sequoia for reporting this.
Thanks to Roman Gareev for his investigation and the reduced test case.
llvm-svn: 262612
Polly recognizes affine loops that ScalarEvolution does not, in
particular those with loop conditions that depend on hoisted invariant
loads. Check for SCEVAddRec dependencies on such loops and do not
consider their exit values as synthesizable because SCEVExpander would
generate them as expressions that depend on the original induction
variables. These are not available in generated code.
llvm-svn: 262404
In order to speed up compile time and to avoid random timeouts we now
separately track assumptions and restrictions. In this context
assumptions describe parameter valuations we need and restrictions
describe parameter valuations we do not allow. During AST generation
we create a runtime check for both, whereas the one for the
restrictions is negated before a conjunction is built.
Except for the In-Bounds assumptions we currently only track restrictions.
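The way the two sets are combined into one runtime check can be illustrated
with a small sketch. This is not Polly's code: ParamPredicate and runtimeCheck
are hypothetical names, and plain predicates over a single parameter N stand in
for the isl parameter sets.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a set of parameter valuations: a predicate over
// one parameter N. Polly uses isl sets; this is only an illustration.
using ParamPredicate = std::function<bool(long)>;

// The check passes if the valuation satisfies the assumptions and does not
// fall into the restrictions. The restriction check is negated before the
// conjunction is built.
bool runtimeCheck(const ParamPredicate &Assumed,
                  const ParamPredicate &Restricted, long N) {
  assert(Assumed && Restricted && "both predicates must be callable");
  return Assumed(N) && !Restricted(N);
}
```

With Assumed as "N >= 0" and Restricted as "N > 1000", the check accepts
N = 10 and rejects both N = -1 and N = 2000.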
Differential Revision: http://reviews.llvm.org/D17247
llvm-svn: 262328
removeCachedResults deletes the DetectionContext from
DetectionContextMap such that it cannot be used anymore.
Unfortunately invalid<ReportUnprofitable> and RejectLogs.insert still do
use it. Because the memory is part of a map and not returned to the
OS immediately, the only observable effect was a memory leak
due to reference counters not being decreased when the second call to
removeCachedResults does not remove the DetectionContext because
it has already been removed.
Fix by not removing the DetectionContext prematurely. The second call to
removeCachedResults will handle it anyway.
llvm-svn: 262235
We move verifyInvariantLoads out of this function to allow for an early return
without the need for code duplication. A similar transformation was suggested
by Johannes Doerfert in post commit review of r262033.
llvm-svn: 262203
This debug output distracts from the -debug-only=polly-scops output. As it is
rather verbose and only really needed for debugging the domain construction
I drop this output. The domain construction is meanwhile stable enough to
not require regular debugging.
llvm-svn: 262117
The functions buildAccessMultiDimFixed and buildAccessMultiDimParam were
refactored from buildMemoryAccess. In their own functions, the control
flow can be shortcut and simplified using returns.
Suggested-by: etherzhhb
llvm-svn: 262029
Check the ModRefBehaviour of functions in order to decide whether or
not a call instruction might be acceptable.
Differential Revision: http://reviews.llvm.org/D5227
llvm-svn: 261866
From now on we bail only if a non-trivial alias group contains a non-affine
access, not when we discover aliasing and non-affine accesses are allowed.
llvm-svn: 261863
Replace Scop::getStmtForBasicBlock and Scop::getStmtForRegionNode, and
add overloads for llvm::Instruction and llvm::RegionNode.
getStmtFor and overloads become the common interface to get the Stmt
that contains something. Named after LoopInfo::getLoopFor and
RegionInfo::getRegionFor.
llvm-svn: 261791
This patch adds support for memcpy, memset and memmove intrinsics. They are
represented as one (memset) or two (memcpy, memmove) memory accesses in the
polyhedral model. These accesses have an access range that describes the
summarized effect of the intrinsic, i.e.,
memset(&A[i], '$', N);
is represented as a write access from A[i] to A[i+N].
Differential Revision: http://reviews.llvm.org/D5226
llvm-svn: 261489
To support non-aligned accesses we introduce a virtual element size
for arrays that divides each access function used for this array. The
adjustment of the access function based on the element size of the
array was therefore moved after this virtual element size was
determined, thus after all accesses have been created.
Differential Revision: http://reviews.llvm.org/D17246
llvm-svn: 261226
After we moved isl_ctx into Scop, we need to free the isl_ctx after
freeing all isl objects, which requires the ScopInfo pass to be freed
last. But this is not guaranteed by the PassManager, and we need
extra code to free the isl_ctx at the right time.
We introduced a shared pointer to manage the isl_ctx, and distribute
it to all analyses that create isl objects. As such, whenever we free
an analysis with the shared_ptr (and also free the isl objects which
are created by the analysis), we decrease the (shared) reference
counter of the shared_ptr by 1. Whenever the reference counter reaches
0 in the releaseMemory function of an analysis, that analysis is
the last one that holds any isl objects, and we can safely free the
isl_ctx with that analysis.
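The ownership scheme can be sketched as follows. This is a minimal
illustration, not the code of this patch: FakeCtx, makeSharedCtx and
FakeAnalysis are hypothetical stand-ins, and the deleter only sets a flag where
real code would free the isl_ctx.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for isl_ctx, which is an opaque C struct in isl.
struct FakeCtx {
  bool *Freed; // set to true when the context is released
};

// Create a shared context whose deleter runs once the last owner is gone.
std::shared_ptr<FakeCtx> makeSharedCtx(bool *Freed) {
  assert(Freed && "flag must not be null");
  return std::shared_ptr<FakeCtx>(new FakeCtx{Freed}, [](FakeCtx *Ctx) {
    *Ctx->Freed = true; // real code would free the isl_ctx here
    delete Ctx;
  });
}

// Hypothetical analysis that keeps the context alive while it owns objects.
struct FakeAnalysis {
  std::shared_ptr<FakeCtx> Ctx;
  // Objects created in the context would be freed first, then the reference.
  void releaseMemory() { Ctx.reset(); }
};
```

Whichever analysis calls releaseMemory last triggers the deleter, so the order
in which the analyses are freed no longer matters.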
Differential Revision: http://reviews.llvm.org/D17241
llvm-svn: 261100
First support for this feature was committed in r259784. Support for
loop invariant load hoisting with different types was added by
Johannes Doerfert in r260045 and r260886.
llvm-svn: 260965
A load can only be invariant if its base pointer is invariant too. To
this end, we check if the base pointer is defined inside the region or
outside. In the former case we recursively check if we can (and
therefore will) hoist the base pointer too. Only if that happens we
can hoist the load.
llvm-svn: 260886
This reverts commit 98efa006c96ac981c00d2e386ec1102bce9f549a.
The fix was broken since we do not use AA in the ScopDetection anymore to
check for invariant accesses.
llvm-svn: 260884
Eliminate the global variable "InsnToMemAcc" to make Scop/ScopInfo become
more portable, such that we can safely use them in a CallGraphSCC pass.
Differential Revision: http://reviews.llvm.org/D17238
llvm-svn: 260863
Before this patch it could happen that we did not hoist a load that
was a base pointer of another load even though AA already declared the
first one as invariant (during ScopDetection). If this case arises we
will now skip the "can be overwritten" check because in this case the
over-approximating nature causes us to generate broken code.
llvm-svn: 260862
The former ScopArrayInfo::updateSizes was implicitly divided into an
updateElementType and an updateSizes. Now this partitioning is
explicit.
llvm-svn: 260860
This reverts commit https://llvm.org/svn/llvm-project/polly/trunk@260853
We unfortunately still have two bugs left which only show up with
-polly-process-unprofitable and which I forgot to test before committing.
llvm-svn: 260854
First support for this feature was committed in r259784. Support for
loop invariant load hoisting with different types was added by Johannes
Doerfert in r260045. This fixed the last known bug.
llvm-svn: 260853
The original AccFuncMap in ScopInfo is used by the underlying Scop
only, and it must stay alive until we delete the Scop. It is therefore better
to simply move the original AccFuncMap from ScopInfo into the Scop class.
llvm-svn: 260820
Make Scop become more portable such that we can use it in a CallGraphSCC pass.
The first step is to drop the analyses that are only used during Scop construction.
This patch drops LoopInfo from Scop.
llvm-svn: 260819
Make Scop become more portable such that we can use it in a CallGraphSCC pass.
The first step is to drop the analyses that are only used during Scop construction.
This patch drops DominatorTree from Scop.
llvm-svn: 260818
Make Scop become more portable such that we can use it in a CallGraphSCC pass.
The first step is to drop the analyses that are only used during Scop construction.
This patch drops ScopDetection from Scop.
llvm-svn: 260817
We now distinguish invariant loads to the same memory location if they
have different types. This will cause us to pre-load an invariant
location once for each type that is used to access it. However, we can
thereby avoid invalid casting, especially if an array is accessed
through differently typed/sized invariant loads.
This basically reverts the changes in r260023 but keeps the test
cases.
llvm-svn: 260045
We also disable this feature by default, as there are still some issues in
combination with invariant load hoisting that slipped through my initial
testing.
llvm-svn: 260025
Invariant load hoisting of memory accesses with non-canonical element
types lacks support for equivalence classes that contain elements of
different width/size. This support should be added, but to get our buildbots
back to green, we disable load hoisting for memory accesses with non-canonical
element size for now.
llvm-svn: 260023
The previously implemented approach is to follow value definitions and
create write accesses ("push defs") while searching for uses. This
requires the same validity and requirement conditions to be
replicated at multiple locations (PHI instructions, other instructions,
uses by PHIs).
We replace this by iterating over the uses in a SCoP ("pull in
requirements"), and add writes only when at least one read has been
added. It turns out to be simpler code because each use is only iterated
over once and writes are added for the first access that reads it. We
need another iteration to identify escaping values (uses not in the
SCoP), which also makes the difference between such accesses more
obvious. As a side-effect, the order of scalar MemoryAccess can change.
Differential Revision: http://reviews.llvm.org/D15706
llvm-svn: 259987
This allows code such as:
void multiple_types(char *Short, char *Float, char *Double) {
for (long i = 0; i < 100; i++) {
Short[i] = *(short *)&Short[2 * i];
Float[i] = *(float *)&Float[4 * i];
Double[i] = *(double *)&Double[8 * i];
}
}
To model such code we use as canonical element type of the modeled array the
smallest element type of all original array accesses, if type allocation sizes
are multiples of each other. Otherwise, we use a newly created iN type, where N
is the gcd of the allocation size of the types used in the accesses to this
array. Accesses with types larger than the canonical element type are modeled as
multiple accesses with the smaller type.
For example the second load access is modeled as:
{ Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }
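The choice of the canonical element size can be sketched as below. The helper
names are hypothetical and sizes are plain bit counts rather than LLVM types.
The sketch relies on the observation that the gcd of all access sizes equals
the smallest size whenever the sizes are multiples of each other, so one gcd
computation covers both cases described above.

```cpp
#include <cassert>
#include <vector>

// Euclid's algorithm; gcdBits(0, X) == X, which makes 0 a neutral start value.
unsigned gcdBits(unsigned A, unsigned B) {
  while (B != 0) {
    unsigned Rest = A % B;
    A = B;
    B = Rest;
  }
  return A;
}

// Hypothetical helper: canonical element size (in bits) for an array that is
// accessed with the given element sizes.
unsigned canonicalElementBits(const std::vector<unsigned> &AccessBits) {
  assert(!AccessBits.empty() && "need at least one access");
  unsigned Result = 0;
  for (unsigned Bits : AccessBits)
    Result = gcdBits(Result, Bits);
  return Result;
}

// Number of canonical elements a single access of AccessBits bits touches.
unsigned numCanonicalElements(unsigned AccessBits, unsigned CanonicalBits) {
  assert(CanonicalBits != 0 && AccessBits % CanonicalBits == 0);
  return AccessBits / CanonicalBits;
}
```

For an array accessed as 8-bit and 32-bit values, the canonical size is 8 bits
and each 32-bit access covers four elements, which corresponds to the range
4i0 <= o0 <= 3 + 4i0 in the access relation above. For 32-bit and 48-bit
accesses neither size divides the other, and the sketch yields 16 bits.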
To support code-generating these memory accesses, we introduce a new method
getAccessAddressFunction that assigns each statement instance a single memory
location, the address we load from/store to. Currently we obtain this address by
taking the lexmin of the access function. We may consider keeping track of the
memory location more explicitly in the future.
We currently do _not_ handle multi-dimensional arrays and also keep the
restriction of not supporting accesses where the offset expression is not a
multiple of the access element type size. This patch adds tests that ensure
we correctly invalidate a scop in case these accesses are found. Both types of
accesses can be handled using the very same model, but are left to be added in
the future.
We also move the initialization of the scop-context into the constructor to
ensure it is already available when invalidating the scop.
Finally, we add this as a new item to the 2.9 release notes.
Reviewers: jdoerfert, Meinersbur
Differential Revision: http://reviews.llvm.org/D16878
llvm-svn: 259784
We now support code such as:
void multiple_types(char *Short, char *Float, char *Double) {
for (long i = 0; i < 100; i++) {
Short[i] = *(short *)&Short[2 * i];
Float[i] = *(float *)&Float[4 * i];
Double[i] = *(double *)&Double[8 * i];
}
}
To support such code we use as element type of the modeled array the smallest
element type of all original array accesses. Accesses with larger types are
modeled as multiple accesses with the smaller type.
For example the second load access is modeled as:
{ Stmt_bb2[i0] -> MemRef_Float[o0] : 4i0 <= o0 <= 3 + 4i0 }
To support jscop-rewritable memory accesses we need each statement instance to
only be assigned a single memory location, which will be the address at which
we load the value. Currently we obtain this address by taking the lexmin of
the access function. We may consider keeping track of the memory location more
explicitly in the future.
llvm-svn: 259587
We create separate functions for fixed-size multi-dimensional, parametric-sized
multi-dimensional, as well as single-dimensional memory accesses to reduce the
complexity of a large monolithic function.
Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 259522
There is no need to pass the size of the elements as the last size dimension
to ScopArrayInfo. This information is already available through the ElementType.
Tracking it twice is not only redundant but may result in inconsistencies.
llvm-svn: 259521
For schedule generation we assumed that the reverse post order traversal used by
the domain generation is sufficient, however it is not. Once a loop is
discovered, we have to completely traverse it, before we can generate the
schedule for any block/region that is only reachable through a loop exiting
block.
To this end, we add a "loop stack" that will keep track of loops we
discovered during the traversal but have not yet traversed completely.
We will never visit a basic block (or region) outside the most recent
(thus smallest) loop in the loop stack but instead queue such blocks
(or regions) in a waiting list. If the waiting list is not empty and
(might) contain blocks from the most recent loop in the loop stack the
next block/region to visit is drawn from there, otherwise from the
reverse post order iterator.
We exploit the new property of loops being always completed before additional
loops are processed, by removing the LoopSchedules map and instead keep all
information in LoopStack. This clarifies that we indeed always only keep a
stack of in-process loops, but will never keep incomplete schedules for an
arbitrary set of loops. As a result, we can simplify some of the existing code.
This patch also adds some more documentation about how our schedule construction
works.
This fixes http://llvm.org/PR25879
This patch is a modified version of Johannes Doerfert's initial fix.
Differential Revision: http://reviews.llvm.org/D15679
llvm-svn: 259354
In https://llvm.org/svn/llvm-project/polly/trunk@251870 code was committed to
avoid a failure in the presence of infinite loops, but the test case committed
along with this change passes without the actual change. I looked back into the
code and also checked with the original committer (Johannes), but could not find
the reason why the code is needed. The introduction of LoopStacks for
buildSchedule in one of the next commits will make it even more clear that this
code is not needed, but I remove this ahead of time to facilitate bisecting in
case I missed something.
llvm-svn: 259347
Before adding a MK_Value READ MemoryAccess, check whether the read is
necessary or synthesizable. Synthesizable values are later generated by
the SCEVExpander and therefore do not need to be transferred
explicitly. This can happen because the check for synthesizability has
presumably been forgotten in the case where a phi's incoming value has
been defined in a different statement.
Differential Revision: http://reviews.llvm.org/D15687
llvm-svn: 258998
MemAccInst wraps the common members of LoadInst and StoreInst. Also make
use of this class in:
- ScopInfo::buildMemoryAccess
- BlockGenerator::generateLocationAccessed
- ScopInfo::addArrayAccess
- Scop::buildAliasGroups
- Replace every use of polly::getPointerOperand
Reviewers: jdoerfert, grosser
Differential Revision: http://reviews.llvm.org/D16530
llvm-svn: 258947
Ensure that there is at most one phi write access per PHINode and
ScopStmt. In particular, this would be possible for non-affine
subregions with multiple exiting blocks. We replace multiple MAY_WRITE
accesses by one MUST_WRITE access. The written value is constructed
using a PHINode of all exiting blocks. The interpretation of the PHI
WRITE's "accessed value" changed from the incoming value to the PHI like
for PHI READs since there is no unique incoming value.
Because region simplification shuffles around PHI nodes -- particularly
with exit node PHIs -- the PHINodes at analysis time do not always
exist anymore in the code generation pass. We instead remember the
incoming block/value pair in the MemoryAccess.
Differential Revision: http://reviews.llvm.org/D15681
llvm-svn: 258809
Keep at most one value read MemoryAccess per value and statement;
multiple generated loads do not have any additional effect. As one such
MemoryAccess can cater to multiple uses within the statement, the
AccessInstruction property is not unique any more and set to nullptr.
Differential Revision: http://reviews.llvm.org/D15510
llvm-svn: 258808
Ensure there is at most one write access per definition of an
llvm::Value. Keep track of already created value write access by using
a (dense) map.
Replace addValueWriteAccess by ensureValueStore which can be used more
liberally without worrying about adding redundant accesses. It will be used,
e.g. in a logical counterpart for value reads -- ensureValueReload --
to ensure that the expected definition has been written when loading it.
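The "ensure" idiom can be sketched with a map lookup that creates the access
only on first request. This is a simplified illustration, not the patch
itself: AccessBuilder and WriteAccess are hypothetical, values are identified
by name, and std::unordered_map stands in for the dense map.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified write access: it only records the written value.
struct WriteAccess {
  std::string Value;
};

class AccessBuilder {
  std::vector<WriteAccess> Accesses;
  // Maps a value to the index of its (single) write access.
  std::unordered_map<std::string, std::size_t> ValueWrites;

public:
  // Returns the index of the write access for Value and creates the access
  // only if none exists yet, so repeated calls never add duplicates.
  std::size_t ensureValueStore(const std::string &Value) {
    assert(!Value.empty() && "value needs a name");
    auto Inserted = ValueWrites.emplace(Value, Accesses.size());
    if (Inserted.second)
      Accesses.push_back(WriteAccess{Value});
    return Inserted.first->second;
  }

  std::size_t numWrites() const { return Accesses.size(); }
};
```

Callers can request the store wherever a definition is needed without first
checking whether an earlier use already created it.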
Differential Revision: http://reviews.llvm.org/D15483
llvm-svn: 258807
Polly currently does not support irreducible control and it is probably not
worth supporting. This patch adds code that checks for irreducible control
and refuses regions containing irreducible control.
Polly traditionally had rather restrictive checks on the control flow structure
which would have refused irregular control, but within the last couple of months
most of the control flow restrictions have been removed. As part of this
generalization we accidentally allowed irregular control flow.
Contributed-by: Karthik Senthil and Ajith Pandel
llvm-svn: 258497
Call assumeNoOutOfBound only in updateDimensionality to process situations
when new dimensions are added and new bounds checks are required.
Contributed-by: Tobias Grosser, Gareev Roman
llvm-svn: 257170
This change clarifies that for Not-NonAffine-SubRegions we actually iterate over
the subnodes and for both NonAffine-SubRegions and BasicBlocks, we perform the
schedule construction. As a result, the tree traversal becomes trivial, the
special case for a scop consisting just of a single non-affine region
disappears and the indentation of the code is reduced.
No functional change intended.
llvm-svn: 256940
At code generation, scalar reads are generated before the other
statement's instructions, respectively scalar writes after them, in
contrast to array accesses which are "executed" with the instructions
they are linked to. Therefore it makes sense to not map the scalar
accesses to a place of execution. Follow-up patches will also remove
some of the direct links from a scalar access to a single instruction,
such that only having array accesses in InstructionToAccess ensures
consistency.
Differential Revision: http://reviews.llvm.org/D13676
llvm-svn: 256298
We clarify that certain code is only executed if LSchedule is != nullptr.
Previously some of these functions have been executed, but they only passed
a nullptr through. This caused some confusion when reading the code.
llvm-svn: 256209
Besides improving the documentation and the code we now assert in case the input
is invalid (N < 0) and also no longer return a nullptr in case USet is
empty. This should make the code more readable.
llvm-svn: 256208
If a loop has a sufficiently large number of compute instructions in its loop
body, it is unlikely that our rewrite of the loop iterators introduces large
performance changes. As Polly can also apply beneficial optimizations (such
as parallelization) to such loop nests, we mark them as profitable.
This option is currently "disabled" by default, but can be used to run
experiments. If enabled by setting it e.g. to 40 instructions, we currently
see some compile-time increases on LNT without any significant run-time
changes.
llvm-svn: 256199
.. and add some documentation. We also simplify the code by dropping an early
check that is also covered by the later checks. This might have a small
compile time impact, but as the scops that are skipped are small we should
probably only add this back in the unlikely case that this has a notable
compile-time cost.
No functional change intended.
llvm-svn: 256149
As we already log an error when calling invalid, unprofitable scops are in
any case marked invalid, but returning immediately saves (a tiny bit of) compile
time and is consistent with our use of 'invalid' in the remainder of the file.
Found by inspection.
llvm-svn: 256140
Without this return we still log the incorrect array size (and do not detect
this scop), but we would unnecessarily continue to verify that access functions
are affine. As we do not need to do this, we can return right ahead and
consequently save compile time.
This issue was found by inspection.
llvm-svn: 256139
Instead of counting all array memory accesses associated with a load
instruction, we now explicitly check that the single array access that could
(potentially) be associated with a load instruction does not exist. This helps
to document the current behavior of Polly where load instructions can indeed
have at most one associated array access. In the unlikely case this changes
in the future, we add an assert for the case where two load accesses would
prevent us from returning a single memory access, but we still should communicate
that not all array memory accesses have been removed.
This addresses post-commit comments from Johannes Doerfert for commit 255776.
llvm-svn: 256136
Scops that contain many complex branches are likely to result in complex domain
conditions that consist of a large (> 100) number of conjuncts. Transforming
such domains is expensive and unlikely to result in efficient code. To avoid
long compile times we detect this case and skip such scops. In the future we may
improve this by either using non-affine subregions to hide such complex
condition structures or by exploiting in certain cases properties (e.g.,
dominance) that allow us to construct the domains of a scop in a way that
results in a smaller number of conjuncts.
Example of a code that results in complex iteration spaces:
                loop.header
               /  |    \     \
             A0   A2    A4    \
               \  /  \  /      \
                A1   A3         \
               /  \  /  \       |
             B0   B2    B4      |
               \  /  \  /       |
                B1   B3         ^
               /  \  /  \       |
             C0   C2    C4      |
               \  /  \  /      /
                C1   C3       /
                 \   /       /
                loop backedge
llvm-svn: 256123
The patch fixes Bug 25759 produced by inappropriate handling of unsigned
maximum SCEV expressions by SCEVRemoveMax. Without a fix, we get an infinite
loop and a segmentation fault, if we try to process, for example,
'((-1 + (-1 * %b1)) umax {(-1 + (-1 * %yStart)),+,-1}<%.preheader>)'.
It also fixes a potential issue related to signed maximum SCEV expressions.
Tested-by: Roman Gareev <gareevroman@gmail.com>
Fixed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D15563
llvm-svn: 255922
When running 'clang -O3 -mllvm -polly -mllvm -polly-show' we now only show the
CFGs of functions with at least one detected scop. For larger files/projects
this reduces the number of graphs printed significantly and is likely what
developers want to see. The new option -polly-view-all enforces all graphs to be
printed and the existing option -polly-view-only limits the graph printing to
functions that match a certain pattern.
This patch requires https://llvm.org/svn/llvm-project/llvm/trunk@255889 (and
vice versa) to compile correctly.
llvm-svn: 255891
Load instructions may possibly be related to multiple memory accesses, but we
are only interested in the array read access that describes the memory location
the load instruction loads from. By using getArrayAccessfor we ensure that we always
obtain the right memory access.
This issue was found by inspection without having a failing test case.
llvm-svn: 255716
This reverts commit r255471.
Johannes raised in the post-commit review of r255471 the concern that PHI
writes in non-affine regions with two exiting blocks are not really MUST_WRITE,
but we just know that at least one out of the set of all possible PHI writes
will be executed. Modeling all PHI nodes as MUST_WRITEs is probably safe, but
adding the needed documentation for such a special case is probably not worth
the effort. Michael will be proposing a new patch that ensures only a single
PHI_WRITE is created for non-affine regions, which - besides other benefits -
should also allow us to use a single well-defined MUST_WRITE for such PHI
writes.
(This is not a full revert, but the condition and documentation have been
slightly extended)
llvm-svn: 255503
Before this commit, only the region's entry block was assumed to always
execute in a non-affine subregion. We replace this by a test whether it
dominates the exit block (this necessarily includes the entry block)
which should be more accurate.
llvm-svn: 255473
LLVM's IR guarantees that a value definition occurs before any use, and
also the value of a PHI must be one of the incoming values, "written"
in one of the incoming blocks. Hence, such writes are never conditional
in the context of a non-affine subregion.
llvm-svn: 255471
Over time different vocabulary has been introduced to describe the different
memory objects in Polly, resulting in different - often inconsistent - naming
schemes in different parts of Polly. We now standardize this to the following
scheme:
  KindArray, KindValue, KindPHI, KindExitPHI
             | ------- isScalar -----------|
In most cases this naming scheme has already been used previously (this
minimizes changes and ensures we remain consistent with previous publications).
The main change is that we remove KindScalar to clarify the difference between
a scalar as a memory object of kind Value, PHI or ExitPHI and a value (former
KindScalar) which is a memory object modeling a llvm::Value.
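
As a minimal sketch of the scheme, the four kinds and the derived isScalar
property could look as follows. The enumerator names are taken from the list
above; the enum name and the free helper function are assumptions made for
illustration only.

```cpp
// Hypothetical enum name; the enumerators follow the scheme listed above.
enum MemoryKind { KindArray, KindValue, KindPHI, KindExitPHI };

// Everything that is not an array counts as a scalar memory object.
inline bool isScalar(MemoryKind Kind) { return Kind != KindArray; }
```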
We also move all documentation to the Kind* enum in the ScopArrayInfo class,
remove the second enum in the MemoryAccess class and update documentation to be
formulated from the perspective of the memory object, rather than the memory
access. The terms "Implicit"/"Explicit", formerly used to describe memory
accesses, have been dropped. From the perspective of memory accesses they
described the different memory kinds well - especially from the perspective of
code generation - but just from the perspective of a memory object it seems more
straightforward to talk about scalars and arrays, rather than explicit and
implicit arrays. The last comment is clearly subjective, though. A less
subjective reason to go for these terms is the historic use both in mailing list
discussions and publications.
llvm-svn: 255467
Use it to print "null" if a MemoryAccess's access relation is not
available instead of printing nothing.
Suggested-by: Johannes Doerfert
llvm-svn: 255466
Introduce a function getStmtForRegionNode() to get the corresponding
ScopStmt of a RegionNode. We can use it to call the existing
ScopStmt::isEmpty() function instead of searching for accesses.
llvm-svn: 255465
Acc==MA implies Acc->getAccessInstruction() == MA->getAccessInstruction().
Suggested as post-commit review for 254305 by Michael Kruse.
llvm-svn: 254327
The use of C++'s high-level iterator functionality instead of two while loops
and explicit iterator handling improves readability of this code.
Proposed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: http://reviews.llvm.org/D15068
llvm-svn: 254305
Previously, accesses that originate from PHI nodes in the exit block
were registered as SCALAR. In some context they are treated as scalars,
but it makes a difference in others. We used to check whether the
AccessInstruction is a terminator to differentiate the cases.
This patch introduces a MemoryAccess origin EXIT_PHI and a
ScopArrayInfo kind KIND_EXIT_PHI to make this case more explicit. No
behavioural change intended.
Differential Revision: http://reviews.llvm.org/D14688
llvm-svn: 254149
gfortran (and Fortran in general?) does not compute the address of an array
element directly from the array sizes (e.g., %s0, %s1), but takes first the
maximum of the sizes and 0 (e.g., max(0, %s0)) before multiplying the resulting
value with the per-dimension array subscript expressions. To successfully
delinearize index expressions as we see them in Fortran, we first filter 'smax'
expressions out of the SCEV expression, use them to guess array size parameters
and only then continue with the existing delinearization.
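
A small numeric illustration of the address computation may help. It is only a
sketch: the real delinearization works symbolically on SCEV expressions, and
the function names, the two-dimensional shape and the use of concrete integers
are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Offset of an access with subscripts (Outer, Inner) where the inner
// dimension has run-time size S1. The size is clamped with max(0, S1) before
// it is multiplied with the subscript, as described above.
std::int64_t linearizedOffset(std::int64_t Outer, std::int64_t Inner,
                              std::int64_t S1) {
  std::int64_t InnerSize = std::max<std::int64_t>(0, S1);
  return Outer * InnerSize + Inner;
}

struct Subscripts {
  std::int64_t Outer;
  std::int64_t Inner;
};

// Once the clamped size has been guessed as the array size parameter, the
// subscripts can be recovered (assuming 0 <= Inner < GuessedInnerSize).
Subscripts delinearize(std::int64_t Offset, std::int64_t GuessedInnerSize) {
  assert(GuessedInnerSize > 0 && "need a positive size to delinearize");
  return {Offset / GuessedInnerSize, Offset % GuessedInnerSize};
}
```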
llvm-svn: 253995
Trying to build up access functions for any of these blocks is likely to fail,
as error blocks may contain invalid/non-representable instructions, and blocks
dominated by error blocks may reference such instructions, which will also cause
failures. As all of these blocks are anyhow assumed to not be executed, we can
just remove them early on.
This fixes http://llvm.org/PR25596
llvm-svn: 253818
At some point we enforced lcssa for the loop surrounding the entry block.
This is not only questionable as it does not check any other loop but also
not needed any more.
llvm-svn: 253789
In case the original parameter instruction does not have a name, but it comes
from a load instruction where the base pointer has a name we used the name of
the load instruction to give some more intuition of where the parameter came
from. To ensure this works also through GEPs which may have complex offsets,
we originally just dropped the offsets and _only_ used the base pointer name.
As this can result in multiple parameters to get the same name, we now prefix
the parameter ID to ensure parameter names are unique. This will make it easier
to understand debug output.
This change does not affect correctness, as parameter IDs (even of the same
name) can always be distinguished through the SCEV pointer stored inside them.
llvm-svn: 253330
Without this change we may start to refuse scops in larger compilation units
just because a lot of code has already been compiled earlier.
Found by inspection. I do not yet have a good test case for this.
llvm-svn: 253050
We only want to use the store size when we check for wrapping; for all
other cases we now use the alloc size.
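
For readers unfamiliar with the two sizes: the alloc size is the store size
rounded up to the ABI alignment of the type, so it also covers the padding
between consecutive array elements. A minimal sketch of that relation, with a
hypothetical helper name and plain integers instead of LLVM types:

```cpp
#include <cassert>
#include <cstdint>

// Round the store size up to the next multiple of the ABI alignment.
std::uint64_t allocSizeFromStoreSize(std::uint64_t StoreSize,
                                     std::uint64_t ABIAlign) {
  assert(ABIAlign > 0 && "alignment must be positive");
  return (StoreSize + ABIAlign - 1) / ABIAlign * ABIAlign;
}
```

For example, a type with a 10 byte store size and 16 byte alignment occupies
16 bytes per array element, so using the store size as element stride would be
wrong.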
Suggested by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 252941
If an llvm.assume dominates the SCoP entry block and the assumed condition
can be expressed as an affine inequality we will now add it to the context.
Differential Revision: http://reviews.llvm.org/D14413
llvm-svn: 252851
Error blocks may contain arbitrary instructions, among them some which we
cannot model correctly. As we do not generate ScopStmts for error blocks anyhow
there is no point in trying to generate access functions for them.
This fixes llvm.org/PR25494
llvm-svn: 252794
In certain cases isl will not free the return values of operations for which
a computeout has been triggered. Hence, make sure we free it explicitly.
No test, as I did not manage to reduce one yet.
llvm-svn: 252766
For complex inputs our current approach of constructing the boundary context
may in rare cases become computationally so expensive that it is better to
abort. This change adds a compute out check that bounds the computations we
spend on boundary context construction and bails out if this limit is reached.
We can probably make our boundary construction algorithm more efficient, but
this requires some more investigation and probably also some additional changes
to isl. Until these have been added, we bound the compile time to ensure our
buildbots are green.
llvm-svn: 252758
In certain rare cases (mostly -polly-process-unprofitable on large sequences
of conditions - often without any loop), we see some compile-time timeouts due
to the construction of an overly complex assumption context. This change limits
the number of disjuncts to 150 (adjustable), to prevent us from creating
assumption contexts that are too large for even the compilation to finish.
The limit has been chosen as large as possible to make sure we do not
unnecessarily drop test coverage. If such cases also appear in
-polly-process-unprofitable=false mode we may need to think about this again,
as the current limitations may still allow assumptions that are way too complex
to be checked profitably at run-time.
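
The idea of the limit can be sketched as follows. This is a simplification
under stated assumptions: the assumption context is modeled as a plain list of
disjuncts instead of an isl set, and the type and function names are
hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Placeholder for the constraints of one disjunct (one basic set).
struct Disjunct {};

// The default limit mentioned above; it is adjustable in the real option.
const std::size_t MaxDisjuncts = 150;

// Try to add new disjuncts to the assumed context. Returns false and leaves
// the context unchanged if the result would exceed the limit, in which case
// the caller is expected to give up on the scop.
bool addAssumption(std::vector<Disjunct> &Context,
                   const std::vector<Disjunct> &New) {
  if (Context.size() + New.size() > MaxDisjuncts)
    return false;
  Context.insert(Context.end(), New.begin(), New.end());
  return true;
}
```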
There is also certainly room for improvement regarding how (and how efficiently)
we construct an assumed context, but this requires some more thinking.
This completes llvm.org/PR25458
llvm-svn: 252750
r252713 introduced a couple of regressions due to later basic blocks referring
to instructions defined in error blocks which have not yet been modeled.
This commit is currently just encoding limitations of our modeling and code
generation backends to ensure correctness. In theory, we should be able to
generate and optimize such regions, as everything that is dominated by an error
region is assumed to not be executed anyhow. We currently just lack the code
to make this happen in practice.
llvm-svn: 252725
Previously, we just skipped error blocks during scop construction. With
this change we make sure we can construct domains for error blocks such that
these domains can be forwarded to subsequent basic blocks.
This change ensures that basic blocks that post-dominate and are dominated by
a basic block that branches to an error condition have the very same iteration
domain as the branching basic block. Before this change, we would construct
a domain that excludes all error conditions. Such domains could become _very_
complex and were undesirable to build.
Another solution would have been to drop these constraints using a
dominance/post-dominance check instead of modeling the error blocks. Such
a solution could also work in case of unreachable statements or infinite
loops in the scop. However, as we currently (to my belief incorrectly) model
unreachable basic blocks in the post-dominance tree, such a solution is not
yet feasible and requires first a change to LLVM's post-dominance tree
construction.
This commit addresses the most severe compile time issue reported in:
http://llvm.org/PR25458
llvm-svn: 252713
We now create all invariant equivalence classes for required invariant loads
instead of creating them on-demand. This way we can check if a parameter
references an invariant load that is actually not executed and was therefore
not materialized. If that happens the parameter is not materialized either.
This fixes bug 25469.
llvm-svn: 252701
Since 252422 we do not only distinguish two ScopArrayInfo kinds, PHI nodes
and others, but work with three kinds of ScopArrayInfo objects: SCALAR, PHI and
ARRAY objects. Instead of keeping two boolean flags isPHI and isScalar and
wondering what a ScopArrayInfo object of kind (!isScalar && isPHI) is, we
now explicitly list the three different possible types of memory objects.
This change also allows us to remove the confusing nested pairs that have
been used in ArrayInfoMapTy.
llvm-svn: 252620
In Polly the first dimension of an array as well as all scalars do not carry
any size information. This commit makes this explicit in the interface of
getDimensionSize. Before this commit getDimensionSize(0) returned the size of
the first dimension that carried a size. After this commit getDimensionSize(i)
will either return the size of dimension 'i' or assert in case 'i' does not
carry a size or does not exist at all.
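
A minimal sketch of the new contract, using a hypothetical stand-in class
rather than Polly's ScopArrayInfo (scalars are not modeled here):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical array description. Only dimensions 1..N-1 have a size stored;
// dimension 0 never carries one.
class ArrayShape {
  std::vector<long> InnerSizes;

public:
  explicit ArrayShape(std::vector<long> Sizes) : InnerSizes(std::move(Sizes)) {}

  std::size_t getNumberOfDimensions() const { return InnerSizes.size() + 1; }

  // Return the size of dimension Dim, or assert if Dim carries no size or
  // does not exist.
  long getDimensionSize(std::size_t Dim) const {
    assert(Dim > 0 && "the first dimension carries no size");
    assert(Dim < getNumberOfDimensions() && "dimension does not exist");
    return InnerSizes[Dim - 1];
  }
};
```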
This very same behaviour was already present in getDimensionSizePw(). This
commit also adds assertions that ensure getDimensionSizePw() is called
appropriately.
llvm-svn: 252607
Memory references are now printed as follows:
          Old                          New
Scalars:  i64 MemRef_val[*]            i64 MemRef_val;
Arrays:   i64 MemRef_A[*][%m][%o][8]   i64 MemRef_A[*][%m][%o];
We do not print any more information about the element size in the type. Such
information has already been available in a comment after the scalar/array
declaration. It was redundant and did not match well with what people were used
to from C.
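
The new output format can be reproduced with a short helper. The function and
its parameters are hypothetical and only meant to show how the two forms
differ; this is not the printing code from the patch.

```cpp
#include <string>
#include <vector>

// Build the declaration string for a memory reference in the new format.
// Scalars get no brackets; arrays get "[*]" for the unsized first dimension
// followed by one bracket per sized dimension. The element size is omitted.
std::string printMemRef(const std::string &ElemType, const std::string &Name,
                        bool IsScalar,
                        const std::vector<std::string> &InnerSizes) {
  std::string Out = ElemType + " " + Name;
  if (!IsScalar) {
    Out += "[*]";
    for (const std::string &Size : InnerSizes)
      Out += "[" + Size + "]";
  }
  return Out + ";";
}
```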
llvm-svn: 252602
If a SCoP contains error blocks we cannot use the domain constraints
to simplify the assumptions as the domain is already influenced by the
assumptions we took. Before this patch we did that and some assumptions
became self-fulfilling as they were implied by the domain constraints.
llvm-svn: 252424
Even if a scalar and a memory access have the same base pointer, we cannot use
one SAI object for both, as the type and also the number of dimensions would be
wrong. For
the attached test case this caused a crash in the invariant load hoisting,
though it could cause various other problems too.
This fixes bug 25428 and an execution time bug in MallocBench/cfrac.
Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 252422
Remove all the implicit ilist iterator conversions from polly, in
preparation for making them illegal in ADT. There was one oddity I came
across: at line 95 of lib/CodeGen/LoopGenerators.cpp, there was a
post-increment `Builder.GetInsertPoint()++`.
Since it was a no-op, I removed it, but I admit I wonder if it might be
a bug (both before and after this change)? Perhaps it should be a
pre-increment?
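
Why the post-increment was a no-op can be shown with a toy builder. The types
below are assumptions for illustration (a std::list iterator instead of an
ilist iterator, and a hypothetical ToyBuilder), but the effect is the same:
the getter returns the iterator by value, so the increment only modifies a
temporary copy.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical builder that hands out its insert point by value.
struct ToyBuilder {
  std::list<int>::iterator InsertPt;
  std::list<int>::iterator GetInsertPoint() const { return InsertPt; }
  void SetInsertPoint(std::list<int>::iterator It) { InsertPt = It; }
};

// Position of the insert point after `GetInsertPoint()++`.
std::ptrdiff_t positionAfterPostIncrement(std::list<int> &L) {
  ToyBuilder B{L.begin()};
  B.GetInsertPoint()++; // increments a temporary; B.InsertPt is unchanged
  return std::distance(L.begin(), B.GetInsertPoint());
}

// Position of the insert point after explicitly storing the advanced iterator.
std::ptrdiff_t positionAfterExplicitAdvance(std::list<int> &L) {
  assert(!L.empty() && "cannot advance past the end");
  ToyBuilder B{L.begin()};
  B.SetInsertPoint(std::next(B.GetInsertPoint()));
  return std::distance(L.begin(), B.GetInsertPoint());
}
```

A pre-increment on the returned temporary would not move the builder's insert
point either; only writing the advanced iterator back does.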
llvm-svn: 252357
Before this commit memory reference identifiers have only been unique per
basic block, but not per (non-affine) ScopStmt. This commit now uses the
MemoryAccess base pointer to uniquely identify each Memory access.
llvm-svn: 252200
An incoming value from a block that is not inside the scop is an
external use, even if the phi is inside the scop. A previous fix in
r251208 did not apply if the phi is inside a non-affine subregion. We
move the check for this phi case before the non-affine subregion check.
llvm-svn: 252157
We do not need to model read-only statements in the SCoP as they will
not cause any side effects that are visible to the outside anyway.
Removing them should save us time and might even simplify the ASTs we
generate.
Differential Revision: http://reviews.llvm.org/D14272
llvm-svn: 251948
ScalarEvolution doesn't allow the operands of an AddRec to be variant in the
loop of the AddRec. When we rewrite parameter SCEVs it might seem like the
new SCEV violates this property and ScalarEvolution will trigger an
assertion. To avoid this we move the start part out of an AddRec when we
rewrite it, so that its operands can no longer be variant in the loop.
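
For an affine recurrence the rewrite relies on the identity
{Start,+,Step} = Start + {0,+,Step}. A numeric sketch with hypothetical types
(the actual rewrite operates on SCEV expressions, not on integers):

```cpp
#include <cstdint>

// Affine add recurrence {Start,+,Step}: value in iteration Iter.
struct ToyAddRec {
  std::int64_t Start;
  std::int64_t Step;
  std::int64_t evaluateAt(std::int64_t Iter) const {
    return Start + Step * Iter;
  }
};

// Same value, but with the start hoisted out of the recurrence so that the
// recurrence itself only has the constant 0 and the step as operands.
std::int64_t evaluateWithHoistedStart(std::int64_t Start, std::int64_t Step,
                                      std::int64_t Iter) {
  ToyAddRec ZeroBased = {0, Step};
  return Start + ZeroBased.evaluateAt(Iter);
}
```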
llvm-svn: 251945
In some cases different memory accesses access the very same array using a
different multi-dimensional array layout where the same dimensions have
different sizes. Instead of asserting when encountering this issue, we
gracefully bail out for this scop.
This fixes llvm.org/PR25252
llvm-svn: 251791
We remove -polly-detect-unprofitable and -polly-no-early-exit. Both have been
superseded by -polly-process-unprofitable and were only kept as aliases for
our buildbots to continue to work. As all buildbots have been moved to the new
options, we can now remove the old ones for good.
llvm-svn: 251787
Volatile or atomic memory accesses are currently not supported. Neither did
we think about any special handling needed nor do we support the unknown
instructions the alias set tracker sometimes turns them into. Before this
patch, our lack of support for unknown instructions in an alias set caused the
following assertion failure:
Assertion `AG.size() > 1 && "Alias groups should contain at least two accesses"'
failed
llvm-svn: 251234
When verifying if a scop is still valid we rerun all analysis, but did not
update DetectionContextMap. This change ensures that information, e.g. about
non-affine regions, is correctly updated.
llvm-svn: 251227
the size expression.
We previously only checked if the size expression is 'undef', but allowed size
expressions of the form 'undef * undef' by accident. After this change we now
require size expressions to be affine which implies no 'undef' appears anywhere
in the expression.
llvm-svn: 251225
of the Region are external.
During code generation we split off the parts of the PHI nodes in the entry
block, which have incoming blocks that are not part of the region. As these
split-off PHI nodes then are external uses, we consequently also need to model
these uses in ScopInfo.
llvm-svn: 251208
There are several different kinds of constants that could occur in a
branch condition; however, we can only handle the most interesting one,
namely constant integers. To this end we have to treat others as
non-affine.
This fixes bug 25244.
llvm-svn: 250669
We build the schedule based on a traversal of the region and accumulate
information for each loop in it. The total schedule is associated with the
loop surrounding the SCoP, though it can happen that there are blocks in the
SCoP which are part of loops that are only partially in the SCoP. Instead of
associating information with them (they are not part of the SCoP and
consequently are not modeled) we have to associate the schedule information
with the surrounding loop if any.
This fixes bug 25240.
llvm-svn: 250668
Accesses that have a relative offset (in bytes) that is not divisible
by the type size (in bytes) will be represented as empty in the SCoP
description. This is on its own not good but it also crashed the
invariant load hoisting. This patch will fix the latter problem while
the former should be addressed too.
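
The divisibility condition itself is simple; a sketch with a hypothetical
helper and plain integers:

```cpp
#include <cassert>
#include <cstdint>

// Convert a byte offset into an element index. Returns false (and leaves
// ElemIndex untouched) if the offset is not a multiple of the element size,
// which is the case that has no exact element-wise representation.
bool byteOffsetToElementIndex(std::int64_t ByteOffset, std::int64_t ElemSize,
                              std::int64_t &ElemIndex) {
  assert(ElemSize > 0 && "element size must be positive");
  if (ByteOffset % ElemSize != 0)
    return false;
  ElemIndex = ByteOffset / ElemSize;
  return true;
}
```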
This fixes bug 25236.
llvm-svn: 250664
If the base pointer of a load is invariant and defined in the SCoP but
not loaded we cannot hoist the load as we would not hoist the base
pointer definition.
This fixes bug 25237.
llvm-svn: 250663
Sorting is replaced by a demand driven code generation that will pre-load a
value when it is needed or, if it was not needed before, at some point
determined by the order of invariant accesses in the program. Only in very
few cases this demand driven pre-loading will kick in, though it will
prevent us from generating faulty code. An example where it is needed is
shown in:
test/ScopInfo/invariant_loads_complicated_dependences.ll
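
The demand driven scheme can be sketched as a memoized depth-first traversal.
Everything below is a simplified assumption: loads are identified by name, the
dependences are given as an explicit map, and they are assumed to be acyclic.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps an invariant load to the invariant loads it needs.
typedef std::map<std::string, std::vector<std::string> > DepMap;

// Pre-load Load, but first pre-load everything it depends on.
void preloadOnDemand(const std::string &Load, const DepMap &Deps,
                     std::set<std::string> &Done,
                     std::vector<std::string> &Order) {
  if (!Done.insert(Load).second)
    return; // already pre-loaded because an earlier load needed it
  DepMap::const_iterator It = Deps.find(Load);
  if (It != Deps.end())
    for (const std::string &Needed : It->second)
      preloadOnDemand(Needed, Deps, Done, Order);
  Order.push_back(Load);
}

// Walk the loads in program order; dependences are pulled forward on demand.
std::vector<std::string> preloadAll(const std::vector<std::string> &ProgramOrder,
                                    const DepMap &Deps) {
  std::set<std::string> Done;
  std::vector<std::string> Order;
  for (const std::string &Load : ProgramOrder)
    preloadOnDemand(Load, Deps, Done, Order);
  return Order;
}
```

If no load depends on a later one, the result is simply the program order,
which matches the remark that the demand driven part rarely kicks in.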
Invariant loads that appear in parameters but are not on the top-level (e.g.,
the parameter is not a SCEVUnknown) will now be treated correctly.
Differential Revision: http://reviews.llvm.org/D13831
llvm-svn: 250655
Polly can now be used as an analysis-only tool as long as the code
generation is disabled. However, we do not have an alternative to the
independent blocks pass in place yet, though in the relevant cases
this does not seem to impact the performance much. Nevertheless, a
virtual alternative that allows the same transformations without
changing the input region will follow shortly.
llvm-svn: 250652
While clang-format takes care that the line-length is not surpassed, the
resulting comments sometimes do not look optimal. We re-flow the text in the
comment to avoid these ugly single-word lines.
llvm-svn: 250626
Instead of generating implicit loads within basic blocks, put them
before the instructions of the statement itself, including non-affine
subregions. The region's entry node is dominating all blocks in the
region and therefore the loaded value will be available there.
Implicit writes in block-stmts were already stored back at the end of
the block. Now, also generate the stores of non-affine subregions when
leaving the statement, i.e. in the exiting block.
This change is required for array-mapped implicits ("De-LICM") to
ensure that there are no dependencies of demoted scalars within
statements. Statements load all required values, operate on copies in
registers, and then write back the changed values to the demoted memory.
Lifetime analysis within statements becomes unnecessary.
Differential Revision: http://reviews.llvm.org/D13487
llvm-svn: 250625
Accesses for exit node phis will be handled separately by
buildPHIAccesses if there is more than one exiting edge, so
buildScalarDependences does not need to create additional SCALAR
accesses.
This is a corrected version of r250517, which was reverted in r250607.
Differential Revision: http://reviews.llvm.org/D13848
llvm-svn: 250622
When pulling a llvm::Value to be written as a PHI write, the former
code only checked whether it is within the same basic block, but it
could also be within the same non-affine subregion. In that case an
unnecessary pair of MemoryAccesses would have been created.
Two unit tests were explicitly checking for the unnecessary writes,
including the comments that the writes are unnecessary.
llvm-svn: 250411