The dominance of the generated non-affine subregion block was based on
the scop's merge block, therefore resulted in an invalid DominanceTree.
It resulted in some values as assumed to be unusable in the actual
generated exit block.
We detect the case that the exit block has been moved and decide
dominance using the BB at the original exit. If we create another exit
node, that exit nodes is dominated by the one generated from where the
original exit resides. This fixes llvm.org/PR25438 and part of
llvm.org/PR25439.
llvm-svn: 252526
It introduced indeterminism as it was iterating over an address-indexed
hashtable. The corresponding bug PR25438 will be fixed in a successive
commit.
llvm-svn: 252522
This reverts commit 9775824b265e574fc541e975d64d3e270243b59d due to a
failing unit test.
Please check and correct the unit test and commit again.
llvm-svn: 252449
Scalar reloads in the generated entering block were not recognized as
dominating the subregions locks when there were multiple entering
nodes. This resulted in values defined in there not being copied.
As a fix, we unconditionally add the BBMap of the generated entering
node to the generated entry. This fixes part of llvm.org/PR25439.
llvm-svn: 252445
If a SCoP contains error blocks we cannot use the domain constraints
to simplify the assumptions as the domain is already influenced by the
assumptions we took. Before this patch we did that and some assumptions
became self-fulfilling as they were implied by the domain constraints.
llvm-svn: 252424
Even if a scalar and memory access have the same base pointer, we cannot use
one SAI object as the type but also the number of dimensions are wrong. For
the attached test case this caused a crash in the invariant load hoisting,
though it could cause various other problems too.
This fixes bug 25428 and a execution time bug in MallocBench/cfrac.
Reported-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
llvm-svn: 252422
When we bail out early we make the partially build new code path
practically dead, though it was not unreachable. To remove dominance
problems we now make it not only dead but also prevent the control
flow to join with the original code path, thus allow to use original
values after the SCoP without any PHI nodes.
This fixes bug 25447.
llvm-svn: 252420
The bail out in r252412 left the code generation without verifying the (so
far) generated IR. This will change now and ensure we always run the
verifier.
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 252419
While the program cannot cause a dependence cycle between invariant
loads, additional constraints (e.g., to ensure finite loops) can
introduce them. It is hard to detect them in the SCoP description,
thus we will only check for them at code generation time. If such a
recursion is detected we will bail out the code generation and place a
"false" runtime check to guarantee the original code is used.
This fixes bug 25443.
llvm-svn: 252412
After loop versioning, a dominance check of a non-affine subregion's
exit node causes the dominance check to always fail on any block in the
subregion if it shares the same exit block with the scop. The
subregion's exit block has become polly_merge_new_and_old, which also
receives the control flow of the generated code. This would cause that
any value for implicit stores is assumed to be not from the scop.
We check dominance with the generated exit node instead.
This fixes llvm.org/PR25438
llvm-svn: 252375
Remove all the implicit ilist iterator conversions from polly, in
preparation for making them illegal in ADT. There was one oddity I came
across: at line 95 of lib/CodeGen/LoopGenerators.cpp, there was a
post-increment `Builder.GetInsertPoint()++`.
Since it was a no-op, I removed it, but I admit I wonder if it might be
a bug (both before and after this change)? Perhaps it should be a
pre-increment?
llvm-svn: 252357
We were adding all generated values in non-affine subregions to be used
for the subregions generated exit block. The thought was that only
values that are dominating the original exit block can be used there.
But it is possible for synthesizable values to be expanded in any
block. If the same values is also used for implicit writes, it would
try to reuse already synthesized values even if not dominating the exit
block.
The fix is to only add values to the list of values usable in the exit
block only if it is dominating the exit block. This fixes
llvm.org/PR25412.
llvm-svn: 252301
Before this commit memory reference identifiers have only been unique per
basic block, but not per (non-affine) ScopStmt. This commit now uses the
MemoryAccess base pointer to uniquely identify each Memory access.
llvm-svn: 252200
For generating scalar writes of non-affine subregions, all except phi
writes are generated in the exit block. The phi writes are generated in
the incoming block for which we errornously used the same BBMap. This
can conflict if a value for one block is synthesized, and then reused
for another block which is not dominated by the first block. This is
fixed by using block-specific BBMaps for phi writes.
llvm-svn: 252172
An incoming value from a block the is not inside the scop is an
external use, even if the phi is inside the scop. A previous fix in
r251208 did not apply if the phi is inside a non-affine subregion. We
move the check for this phi case before the non-affine subregion check.
llvm-svn: 252157
To simplify and correct the preloading of a base pointer origin, e.g.,
the base pointer for the current indirect invariant load, we now just
check if there is an invariant access class that involves the base
pointer of the current class.
llvm-svn: 251962
We do not need to model read-only statements in the SCoP as they will
not cause any side effects that are visible to the outside anyway.
Removing them should safe us time and might even simplify the ASTs we
generate.
Differential Revision: http://reviews.llvm.org/D14272
llvm-svn: 251948
If a base pointer of a preloaded value has a base pointer origin, thus it is
an indirect invariant load, we have to make sure the base pointer origin is
preloaded first.
llvm-svn: 251946
ScalarEvolution doesn't allow the operands of an AddRec to be variant in the
loop of the AddRec. When we rewrite parameter SCEVs it might seem like the
new SCEV violates this property and ScalarEvolution will trigger an
assertion. To avoid this we move the start part out of an AddRec when we
rewrite it, thus avoid the operands to be possibly variant completely.
llvm-svn: 251945
If a base pointer load is preloaded, we have change the base pointer of
the derived SAI. However, as the derived SAI relationship is is
coarse grained, we need to check if we actually preloaded the base
pointer or a different element of the base pointer SAI array.
llvm-svn: 251881
In some cases different memory accesses access the very same array using a
different multi-dimensional array layout where the same dimensions have
different sizes. Instead of asserting when encountering this issue, we
gracefully bail out for this scop.
This fixes llvm.org/PR25252
llvm-svn: 251791
We remove -polly-detect-unprofitable and -polly-no-early-exit. Both have been
superseeded by -polly-process-unprofitable and were only kept as aliases for
our buildbots to continue to work. As all buildbots have been moved to the new
options, we can now remove the old ones for good.
llvm-svn: 251787
These maps are only needed during the construction of a single region statement.
Clearing them is important, as we otherwise get an assert in case some of the
referenced values are erased before the RegionGenerator is deleted.
llvm-svn: 251341
Volatile or atomic memory accesses are currently not supported. Neither did
we think about any special handling needed nor do we support the unknown
instructions the alias set tracker turns them into sometimes. Before this
patch, us not supporting unkown instructions in an alias set caused the
following assertion failures:
Assertion `AG.size() > 1 && "Alias groups should contain at least two accesses"'
failed
llvm-svn: 251234
When verifying if a scop is still valid we rerun all analysis, but did not
update DetectionContextMap. This change ensures that information, e.g. about
non-affine regions, is correctly updated
llvm-svn: 251227
the size expression.
We previously only checked if the size expression is 'undef', but allowed size
expressions of the form 'undef * undef' by accident. After this change we now
require size expressions to be affine which implies no 'undef' appears anywhere
in the expression.
llvm-svn: 251225
of the Region are external.
During code generation we split off the parts of the PHI nodes in the entry
block, which have incoming blocks that are not part of the region. As these
split-off PHI nodes then are external uses, we consequently also need to model
these uses in ScopInfo.
llvm-svn: 251208
Such PHI nodes can not only appear in the ExitBlock of the Scop, but indeed
any scalar PHI node above the scop and used in the scop is modeled as scalar
read access.
llvm-svn: 251198
This change adds code to directly code-generate multi-exit PHI nodes, instead
of trying to reuse the EscapeMap infrastructure for this. Using escape maps
adds a level of indirection that is hard to understand and - more importantly -
breaks in certain cases.
Specifically, the original code relied on simplifyRegion() to split the original
PHI node in two PHI nodes, one merging the values coming from within the scop
and a second that merges the first PHI node with the values that come from
outside the scop. To generate code the first PHI node is then just handled like
any other in-scop value that is used somewhere outside the scop. This fails for
the case where all values from inside the scop are identical, as the first PHI
node is in such cases automatically simplified and eliminated by LLVM right at
construction. As a result, there is no instruction that can be pass to the
EscapeMap handling, which means the references in the second PHI node are not
updated and may still reference values from within the original scop that do not
dominate it.
Our new code iterates directly over all modeled ScopArrayInfo objects that
represent multi-exit PHI nodes and generates code for them without relying on
the EscapeMap infrastructure. Hence, it works also for the case where the first
PHI node is eliminated.
llvm-svn: 251191
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.
If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).
The current full/partial tile separation increases compile-time more than
necessary. As a result, we see in compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investiagation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewers: jdoerfert, grosser
Subscribers: grosser, #polly
Differential Revision: http://reviews.llvm.org/D13779
llvm-svn: 250809
New values were always synthesized in the block of the instruction
that needed them. This is incorrect for PHI node whose' value must be
defined in the respective incoming block. This patch temporarily moves
the builder's insert point to the incoming block while synthesizing phi
node arguments.
This fixes PR25241 (http://llvm.org/bugs/show_bug.cgi?id=25241)
llvm-svn: 250693
There are several different kinds of constants that could occur in a
branch condition, however we can only handle the most interesting one
namely constant integers. To this end we have to treat others as
non-affine.
This fixes bug 25244.
llvm-svn: 250669
We build the schedule based on a traversal of the region and accumulate
information for each loop in it. The total schedule is associated with the
loop surrounding the SCoP, though it can happen that there are blocks in the
SCoP which are part of loops that are only partially in the SCoP. Instead of
associating information with them (they are not part of the SCoP and
consequently are not modeled) we have to associate the schedule information
with the surrounding loop if any.
This fixes bug 25240.
llvm-svn: 250668
Accesses that have a relative offset (in bytes) that is not divisible
by the type size (in bytes) will be represented as empty in the SCoP
description. This is on its own not good but it also crashed the
invariant load hoisting. This patch will fix the latter problem while
the former should be addressed too.
This fixes bug 25236.
llvm-svn: 250664
If the base pointer of a load is invariant and defined in the SCoP but
not loaded we cannot hoist the load as we would not hoist the base
pointer definition.
This fixes bug 25237.
llvm-svn: 250663
Sorting is replaced by a demand driven code generation that will pre-load a
value when it is needed or, if it was not needed before, at some point
determined by the order of invariant accesses in the program. Only in very
little cases this demand driven pre-loading will kick in, though it will
prevent us from generating faulty code. An example where it is needed is
shown in:
test/ScopInfo/invariant_loads_complicated_dependences.ll
Invariant loads that appear in parameters but are not on the top-level (e.g.,
the parameter is not a SCEVUnknown) will now be treated correctly.
Differential Revision: http://reviews.llvm.org/D13831
llvm-svn: 250655
Polly can now be used as a analysis only tool as long as the code
generation is disabled. However, we do not have an alternative to the
independent blocks pass in place yet, though in the relevant cases
this does not seem to impact the performance much. Nevertheless, a
virtual alternative that allows the same transformations without
changing the input region will follow shortly.
llvm-svn: 250652
Expressing this in terms of BlockGenerator::getOrCreateAlloca(const
ScopArrayInfo *Array) does not work as the MemoryAccess BasePtr is in case of
invariant load hoisting different to the ScopArrayInfo BasePtr. Until this is
investigated and fixed, we move back to code that just uses the baseptr of
MemoryAccess.
llvm-svn: 250637
This allows the caller to get the alloca locations of an array without the
need to thank if Array is a PHI or a non-PHI Array. We directly make use of this
in BlockGenerator::getOrCreateAlloca(MemoryAccess &Access).
llvm-svn: 250628
Other places (e.g. hoistInvariantLoads) assume that an empty lookup
will return nullptr. The situation can currently not arise because
MemoryAccesses are not removed before hoistInvariantLoads.
llvm-svn: 250627
While clang-format takes care that the line-length is not surpassed, the
resulting comments sometimes look not optimal. We re-flow the text in the
comment to avoid these ugly single-word lines.
llvm-svn: 250626
Instead of generating implicit loads within basic blocks, put them
before the instructions of the statment itself, including non-affine
subregions. The region's entry node is dominating all blocks in the
region and therefore the loaded value will be available there.
Implicit writes in block-stmts were already stored back at the end of
the block. Now, also generate the stores of non-affine subregions when
leaving the statement, i.e. in the exiting block.
This change is required for array-mapped implicits ("De-LICM") to
ensure that there are no dependencies of demoted scalars within
statments. Statement load all required values, operator on copied in
registers, and then write back the changed value to the demoted memory.
Lifetimes analysis within statements becomes unecessary.
Differential Revision: http://reviews.llvm.org/D13487
llvm-svn: 250625
Accesses for exit node phis will be handled separately by
buildPHIAccesses if there is more than one exiting edge,
buildScalarDependences does not need to create additional SCALAR
accesses.
This is a corrected version of r250517, which was reverted in r250607.
Differential Revision: http://reviews.llvm.org/D13848
llvm-svn: 250622
Instead of checking at code generation time for each ScopStmt if a scalar has
external uses, we just iterate over the ScopArrayInfo descriptions we have and
check each of these for possible external uses.
Besides being somehow clearer, this approach has the benefit that we will always
create valid LLVM-IR even in case we disable the code generation of ScopStmt
bodies e.g. for testing purposes.
llvm-svn: 250608
In r250408 'CHECK-NEXT: br' lines were removed as they also matched a
'%polly.subregion.iv.inc' instruction and did consequently not check what they
were supposed to check. However, without these lines we can not test that the
.s2a instructions that are not any more generated since r250411 really are not
emitted. Hence, we add back the CHECK-NEXT lines to ensure there are really no
instructions generated between the store that we check for and the branch at the
end of the basic block. To ensure we do not match too early, we now check for
'br i1' or 'br label'.
llvm-svn: 250435
When pulling a llvm::Value to be written as a PHI write, the former
code did only check whether it is within the same basic block, but it
could also be the same non-affine subregion. In that case some
unecessary pair of MemoryAccesses would have been created.
Two unit test were explicitely checking for the unecessary writes,
including the comments that the writes are unecessary.
llvm-svn: 250411
They happen to match
%polly.subregion.iv.inc = add i32 %polly.subregion.iv, 1
^^ ^^
that is, are misleading in what they actually check.
llvm-svn: 250408
When sharing the same map from old to new value, CodeGeneration would
reuse the same new value for each basic block. However, the SCEV
expander might emit code in a basic block that does not dominate a use
of the SCEV in another basic block. This test checks whether both such
blocks have their own expanded new values.
llvm-svn: 250389
We harden one test case by ensuring no additional stores may possibly be
introduced between the stores we check for and the basic block terminator
statements.
We also add a test case for the situation where a value that is passed from
a non-affine region to a PHI node does not dominate the exit of the non-affine
region. This case has come up in patch reviews, so we make sure it is properly
handled today and in the future.
llvm-svn: 250217
This will allow us to optimize C++ template code with Polly. This support is
mostly for debugging purpose and individual experiments. The ultimate goal is
still to run Polly later in the pass manager when inlining already happened.
llvm-svn: 250092
instead of llvm::PassManagerBuilder::EP_EarlyAsPossible. This will allow us
to run actual module passes in Polly's canonicalization sequence, but should
otherwise have only little impact.
llvm-svn: 250091
We also allow such products for cases where 'Parameter' is loaded within the
scop, but where we can dynamically verify that the value of 'Parameter' remains
unchanged during the execution of the scop.
This change relies on Polly's new RequiredILS tracking infrastructure recently
contributed by Johannes.
llvm-svn: 250019
The domain generation can handle lazy && and || by default but eager
evaluated expressions were dismissed as non-affine. With this patch we
will allow arbitrary combinations of and/or bit-operations in the
conditions of branches.
Differential Revision: http://reviews.llvm.org/D13624
llvm-svn: 249971
Helper functions in the BlockGenerators.h/cpp introduce dependences
from the frontend to the backend of Polly. As they are used in
ScopDetection, ScopInfo, etc. we move them to the ScopHelper file.
llvm-svn: 249919
If a (assumed) invariant location is loaded multiple times we
generated a parameter for each location. However, this caused compile
time problems for several benchmarks (e.g., 445_gobmk in SPEC2006 and
BT in the NAS benchmarks). Additionally, the code we generate is
suboptimal as we preload the same location multiple times and perform
the same checks on all the parameters that refere to the same value.
With this patch we consolidate the invariant loads in three steps:
1) During SCoP initialization required invariant loads are put in
equivalence classes based on their pointer operand. One
representing load is used to generate a parameter for the whole
class, thus we never generate multiple parameters for the same
location.
2) During the SCoP simplification we remove invariant memory
accesses that are in the same equivalence class. While doing so
we build the union of all execution domains as it is only
important that the location is at least accessed once.
3) During code generation we only preload one element of each
equivalence class with the unified execution domain. All others
are mapped to that preloaded value.
Differential Revision: http://reviews.llvm.org/D13338
llvm-svn: 249853
This was left out from the original patch proposed in
http://reviews.llvm.org/D13195
even though it is needed to define an order invariant loads
are hoisted.
llvm-svn: 249680
Drop an unused flag polly-allow-non-scev-backedge-taken-count and also
its occurrences from the tests.
Contributed-by: Chris Jenneisch <chrisj@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D13400
llvm-svn: 249675
ScopDetection users are interested in the detection context and access
these via different get-methods. However, not all information was
exposed though the number of maps to hold it was increasing steadily.
With this change only the detection contexts the rejection log and the
ValidRegions set are mapped. The former is needed, the second could be
integrated in the first and the ValidRegions set is only needed for the
deterministic order of the regions.
llvm-svn: 249614
This replaces the support for user defined error functions by a
heuristic that tries to determine if a call to a non-pure function
should be considered "an error". If so the block is assumed not to be
executed at runtime. While treating all non-pure function calls as
errors will allow a lot more regions to be analyzed, it will also
cause us to dismiss a lot again due to an infeasible runtime context.
This patch tries to limit that effect. A non-pure function call is
considered an error if it is executed only in conditionally with
regards to a cheap but simple heuristic.
llvm-svn: 249611
This patch allows invariant loads to be used in the SCoP description,
e.g., as loop bounds, conditions or in memory access functions.
First we collect "required invariant loads" during SCoP detection that
would otherwise make an expression we care about non-affine. To this
end a new level of abstraction was introduced before
SCEVValidator::isAffineExpr() namely ScopDetection::isAffine() and
ScopDetection::onlyValidRequiredInvariantLoads(). Here we can decide
if we want a load inside the region to be optimistically assumed
invariant or not. If we do, it will be marked as required and in the
SCoP generation we bail if it is actually not invariant. If we don't
it will be a non-affine expression as before. At the moment we
optimistically assume all "hoistable" (namely non-loop-carried) loads
to be invariant. This causes us to expand some SCoPs and dismiss them
later but it also allows us to detect a lot we would dismiss directly
if we would ask e.g., AliasAnalysis::canBasicBlockModify(). We also
allow potential aliases between optimistically assumed invariant loads
and other pointers as our runtime alias checks are sound in case the
loads are actually invariant. Together with the invariant checks this
combination allows to handle a lot more than LICM can.
The code generation of the invariant loads had to be extended as we
can now have dependences between parameters and invariant (hoisted)
loads as well as the other way around, e.g.,
test/Isl/CodeGen/invariant_load_parameters_cyclic_dependence.ll
First, it is important to note that we cannot have real cycles but
only dependences from a hoisted load to a parameter and from another
parameter to that hoisted load (and so on). To handle such cases we
materialize llvm::Values for parameters that are referred by a hoisted
load on demand and then materialize the remaining parameters. Second,
there are new kinds of dependences between hoisted loads caused by the
constraints on their execution. If a hoisted load is conditionally
executed it might depend on the value of another hoisted load. To deal
with such situations we sort them already in the ScopInfo such that
they can be generated in the order they are listed in the
Scop::InvariantAccesses list (see compareInvariantAccesses). The
dependences between hoisted loads caused by indirect accesses are
handled the same way as before.
llvm-svn: 249607
Value maps are created and used in many places and it is not always
possible to include CodeGen/Blockgenerators.h. To this end, ValueMapT
now lives in the ScopHelper.h which does not have any dependences itself.
This patch also replaces uses of different other value map types with
the ValueMapT.
llvm-svn: 249606
Do not use "Map[Key] == nullptr" to check if a Key is in the map, but use
"Map.find(Key) == Map.end()". Map[Key] always adds Key into the map, a
side-effect we do not want.
Found by inspection. This is hard to test outside of a targetted unit test,
which seems too much overhead for this individual issue.
llvm-svn: 249544
Clang's been taught to warn on more things here (unused values when
calling pure functions and ignoring their result, for example).
It might be better to figure out how to have cmake compile these tests
without -Werror/without any warnings enabled. But this'll do for now & I
don't know enough about cmake to fix it any other way, or to understand
the tradeoffs in that space.
llvm-svn: 249472
This single option replaces -polly-detect-unprofitable and -polly-no-early-exit
and is supposed to be the only option that disables compile-time heuristics that
aim to bail out early on scops that are believed to not benefit from Polly
optimizations.
Suggested-by: Johannes Doerfert
llvm-svn: 249426
These flags are now always passed to all tests and need to be disabled if
not needed. Disabling these flags, rather than passing them to almost all
tests, significantly simplfies our RUN: lines.
llvm-svn: 249422