We enumerated the cross product Domain x Scatter, but sorted only be the
scatter key. In case there are are multiple statement instances per
scatter value, the order between statement instances of the same loop
iteration was undefined.
Propertly enumerate and sort only by the scatter value, and group the
domains using the scatter dimension again.
Thanks to Leonard Chan for the report.
Make Polly look for unrolling metadata (https://llvm.org/docs/TransformMetadata.html#loop-unrolling) that is usually only interpreted by the LoopUnroll pass and apply it to the SCoP's schedule.
While not that useful by itself (there already is an unroll pass), it introduces mechanism to apply arbitrary loop transformation directives in arbitrary order to the schedule. Transformations are applied until no more directives are found. Since ISL's rescheduling would discard the manual transformations and it is assumed that when the user specifies the sequence of transformations, they do not want any other transformations to apply. Applying user-directed transformations can be controlled using the `-polly-pragma-based-opts` switch and is enabled by default.
This does not influence the SCoP detection heuristic. As a consequence, loop that do not fulfill SCoP requirements or the initial profitability heuristic will be ignored. `-polly-process-unprofitable` can be used to disable the latter.
Other than manually editing the IR, there is currently no way for the user to add loop transformations in an order other than the order in the default pipeline, or transformations other than the one supported by clang's LoopHint. See the `unroll_double.ll` test as example that clang currently is unable to emit. My own extension of `#pragma clang loop` allowing an arbitrary order and additional transformations is available here: https://github.com/meinersbur/llvm-project/tree/pragma-clang-loop. An effort to upstream this functionality as `#pragma clang transform` (because `#pragma clang loop` has an implicit transformation order defined by the loop pipeline) is D69088.
Additional transformations from my downstream pragma-clang-loop branch are tiling, interchange, reversal, unroll-and-jam, thread-parallelization and array packing. Unroll was chosen because it uses already-defined metadata and does not require correctness checks.
Reviewed By: sebastiankreutzer
Differential Revision: https://reviews.llvm.org/D97977
Regenerate the C++ wrapper header from the current isl version's
headers.
The most notable change is that some dimension sizes are represented by
an isl_size (instead of unsigned), which is a signed int. Additionally,
some function may return -1 in case of an error which already had been
fixed in the past. The C++ may no return -1 instead of UINT_MAX which
caused the problems.
Some types in Polly had been changed from unsigned to isl_size
(that were not already auto) and some loops/comparision had to be
changed to avoid unsigned/signed comparison warnings.
These are implementation details of the IslScheduleOptimizer pass
implementation and not use anywhere else. Hence, we can move them to the
cpp file and into an anonymous namespace.
Only getPartialTilePrefixes is, aside from the pass itself, used
externally (by the ScheduleOptimizerTest) and moved into the polly
namespace.
Move SimplifiyVisitor from Simplify.h to Simplify.cpp. It is not
relevant for applying the pass in either the NewPM or the legacyPM.
Rename it to SimplifyImpl to account for that.
This is possible due its state not being necessary to be preserved
between runs and thefore SimplifyImpl not needed to be held in the
pass object. Instead, SimplifyImpl is only instatiated for the
current Scop. In the NewPM as a function-local variable, and in the
legacy PM inside a llvm::Optional object because the state must be
preserved between the printScop (invoked by opt -analyze) and the most
recent runOnScop calls.
"using namespace" pollutes the namespace of every file that includes
such a header and universally considered a bad thing. Even the variant
namespace polly {
using namespace llvm;
}
(previously used by LoopGenerators.h) imports more symbols than the file
is in control of. The header may include a fixed set of files from LLVM,
but the header itself may by be included together with other headers
from LLVM. For instance, LLVM's MemorySSA.h and Polly's ScopInfo.h both
declare a class 'MemoryAccess' which may conflict.
Instead of prefixing everything in Polly's header files, this patch adds
'using' statements to import only the symbols that are actually
referenced in Polly. This approach is also used by MLIR to import
commonly used symbols into the mlir namespace.
This patch also puts the symbols declared in IslNodeBuilder.h into the
Polly namespace to also be able to use the imported symbols.
ZoneAlgorithms's computePHI relies on being provided with consistent a
schedule to compute the statement prodecessors of a statement containing
PHINodes. Otherwise unexpected results such as PHI nodes with multiple
predecessors can occur which would result in problems in the
algorithms expecting consistent data.
In the added test case, statement instances are scrubbed from the
SCoP their execution would result in undefined behavior (Due to a nsw
overflow). As already being undefined behavior in LLVM-IR, neither
AssumedContext nor InvalidContext are updated, giving computePHI no
means to avoid these cases.
Intoduce a new SCoP property, the DefinedBehaviorContext, that among
the runtime-checked conditions, also tracks the assumptions not needing
a runtime check, in particular those affecting the assumed control flow.
This replaces the manual combination of the 3 other contexts that was
already done in computePHI and setNewAccessRelation. Currently, the only
additional assumption is that loop induction variables will nsw flag for
not wrap, but potentially more can be added. Use in
hasFeasibleRuntimeContext, isl::ast_build and gisting are other
potential uses.
To limit computational complexity, the DefinedBehaviorContext is not
availabe if it grows too large (atm hardcoded to 8 disjuncts).
Possible other fixes include bailing out in computePHI when
inconsistencies are detected, choose an arbitrary value for inconsistent
cases (since it is undefined behavior anyways), or make the code
receiving the result from ComputePHI handle inconsistent data. All of
them reduce the quality of implementation having to bail out more often
and disabling the ability to assert on actually wrong results.
This fixes llvm.org/PR48783.
Declarations in headers should not be in the anonymous
namespace. Compilers also warn about the use of
<anon namespace>::SimplifyVisitor as a public field in
polly::SimplifyPass and polly::SimplifyPrinterPass.
Operand tree forwarding can cause the change of an access kind; in
particular change from a scalar kind to an array kind if the scalar
dependency is not necessary. Such an access cannot and doesn't need to
be forwarded anymore.
Fixes llvm.org/PR48034
Print to dbgs() any taken action.
Also, read-only scalars do not require any action unless
-polly-analyze-read-only-scalars=true is used. Better refect this by
using ForwardingAction::triviallyForwardable and thus not bumping the
statistics.
Recursively traversing the operand tree leads to an exponential blowup
if instructions are used multiple times due to every path leading to an
additional copy of the instructions after forwarding. This problem was
marked as a TODO in the code and was reported as a bug in llvm.org/PR47340.
Fix by caching already visited instructions and returning the cached
version when already visited. Instead of calling forwardTree() twice,
return a ForwardingAction structure that contains a lambda which will
carry-out the forwarding when requested. The lambdas are executed in
reverse-postorder to mimic the previous recursive calls unless there
is a reuse.
Fixes llvm.org/PR47340
VirtualUse of type UseKind::Inter expects the definition of a
llvm::Value to be represented in another statement. In the bug report
that statement has been removed due to its domain being empty.
Scop::InstStmtMap for the llvm::Value's defintion still pointed to the
removed statement, which resulted in the use-after-free.
The defintion statement was removed by Simplify because it was
considered to not be reachable by other uses; trivially because it is
never executed due to its empty domain. However, no such thing happend
to the using statement using the value altough its domain is also empty.
Fix by always removing statements with empty domains in Simplify since
these are not properly analyzable. A UseKind::Inter should always have a
statement with its defintion due to LLVM's SSA form.
Scop::removeStmtNotInDomainMap() also removes statements with empty
domains but does so without considering the context as used by
Simplify's analyzes.
In another angle, InstStmtMap pointing to removed statements should not
happen either and ForwardOpTree would have bailed out if the llvm::Value
definition was not represented by a statement. This will be corrected in
a followup-commit.
This fixes llvm.org/PR47098
The schedule of a fused loop has one isl_space per statement, such that
a conversion to a isl_map fails. However, the prevectorization is
interested in the schedule space only: Converting to the non-union
representation only after extracting the schedule range fixes the problem.
This fixes llvm.org/PR46578
The member LastSchedule was never set, such that printScop would always
print "n/a" instead of the last schedule.
To ensure that the isl_ctx lives as least as long as the stored
schedule, also store a shared_ptr.
Also set the schedule tree output style to ISL_YAML_STYLE_BLOCK to avoid
printing everything on a single line.
`opt -polly-opt-isl -analyze` will be used in the next commit.
../polly/lib/Transform/ScheduleOptimizer.cpp:812:54: warning: comparison of integers of different signs: 'isl_size' (aka 'int') and 'const unsigned int' [-Wsign-compare]
isl_schedule_node_band_n_member(Node.get()) >
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
The primary motivation is to fix an assertion failure in
isl_basic_map_alloc_equality:
isl_assert(ctx, room_for_con(bmap, 1), return -1);
Although the assertion does not occur anymore, I could not identify
which of ISL's commits fixed it.
Compared to the previous ISL version, Polly requires some changes for this update
* Since ISL commit
20d3574 "perform parameter alignment by modifying both arguments to function"
isl_*_gist_* and similar functions do not always align the paramter
list anymore. This caused the parameter lists in JScop files to
become out-of-sync. Since many regression tests use JScop files with
a fixed parameter list and order, we explicitly call align_params to
ensure a predictable parameter list.
* ISL changed some return types to isl_size, a typedef of (signed) int.
This caused some issues where the return type was unsigned int before:
- No overload for std::max(unsigned,isl_size)
- It cause additional 'mixed signed/unsigned comparison' warnings.
Since they do not break compilation, and sizes larger than 2^31
were never supported, I am going to fix it separately.
* With the change to isl_size, commit
57d547 "isl_*_list_size: return isl_size"
also changed the return value in case of an error from 0 to -1. This
caused undefined looping over isl_iterator since the 'end iterator'
got index -1, never reached from the 'begin iterator' with index 0.
* Some internal changes in ISL caused the number of operations to
increase when determining access ranges to determine aliasing
overlaps. In one test, this caused exceeding the default limit of
800000. The operations-limit was disabled for this test.
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
lib/Transform/ScheduleOptimizer.cpp fails to compile on Solaris, both on the 9.x
branch (first noticed when running test-release.sh without -no-polly) and on trunk:
/var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp: In function ‘MicroKernelParamsTy getMicroKernelParams(const llvm::TargetTransformInfo*, polly::MatMulInfoTy)’:
/var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:914:62: error: call of overloaded ‘sqrt(long unsigned int)’ is ambiguous
914 | ceil(sqrt(Nvec * LatencyVectorFma * ThroughputVectorFma) / Nvec) * Nvec;
| ^
In file included from /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/math.h:24,
from /usr/gcc/9/include/c++/9.1.0/cmath:45,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm-c/DataTypes.h:28,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/Support/DataTypes.h:16,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/Hashing.h:47,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/ArrayRef.h:12,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/include/polly/ScheduleOptimizer.h:12,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:48:
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:220:21: note: candidate: ‘long double std::sqrt(long double)’
220 | inline long double sqrt(long double __X) { return __sqrtl(__X); }
| ^~~~
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:186:15:
note: candidate: ‘float std::sqrt(float)’
186 | inline float sqrt(float __X) { return __sqrtf(__X); }
| ^~~~
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:74:15:
note: candidate: ‘double std::sqrt(double)’
74 | extern double sqrt __P((double));
| ^~~~
/var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:915:67:
error: call of overloaded ‘ceil(long unsigned int)’ is ambiguous
915 | int Mr = ceil(Nvec * LatencyVectorFma * ThroughputVectorFma / Nr);
| ^
In file included from /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/math.h:24,
from /usr/gcc/9/include/c++/9.1.0/cmath:45,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm-c/DataTypes.h:28,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/Support/DataTypes.h:16,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/Hashing.h:47,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/ArrayRef.h:12,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/include/polly/ScheduleOptimizer.h:12,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:48:
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:196:21: note: candidate: ‘long double std::ceil(long double)’
196 | inline long double ceil(long double __X) { return __ceill(__X); }
| ^~~~
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:160:15:
note: candidate: ‘float std::ceil(float)’
160 | inline float ceil(float __X) { return __ceilf(__X); }
| ^~~~
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:76:15:
note: candidate: ‘double std::ceil(double)’
76 | extern double ceil __P((double));
| ^~~~
Fixed by adding casts to disambiguate, checked that it now compiles on both
amd64-pc-solaris2.11 and sparcv9-sun-solaris2.11 and on x86_64-pc-linux-gnu.
Differential Revision: https://reviews.llvm.org/D67442
llvm-svn: 371825
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
Differential revision: https://reviews.llvm.org/D66259
llvm-svn: 368935
FunctionAnalysisManagerModuleProxy started to be used by the
AlwaysInlinerPass in r363287 and therefore had to be registered in the
New PassManager.
Should fix the regression tests
Polly :: ScopInliner/invariant-load-func.ll
Polly :: ScopInliner/simple-inline-loop.ll
llvm-svn: 363572
Extension nodes make schedule trees are less flexible: Many operations,
such as rescheduling, do not work on such schedule trees with extension.
As such, some functionality such as determining parallel loops in isl's
AST are disabled.
Currently, only the pattern-matching generalized matrix-matrix
multiplication optimization adds extension nodes (to add copy-in
statements).
This patch removes all extension nodes as the last step of the schedule
optimization by hoisting the extension node's added domain up to the
root domain node. All following passes can assume that schedule trees
work without restrictions, including the parallelism test. Mark the
outermost loop of the optimized matrix-matrix multiplication as parallel
such that -polly-parallel is able to parallelize that loop.
Differential Revision: https://reviews.llvm.org/D58202
llvm-svn: 362257
isl_map_from_union_map cannot determine the map's space if the union_map
is empty. polly::singleton was designed for this case. We pass the
expected map space to avoid crashing in isl_map_from_union_map.
This fixes an issue found by the aosp buildbot. Thanks to Eli Friedman
for the reproducer.
llvm-svn: 361290
PHI nodes (reads) could point to multiple instances of predecessor
blocks (PHI writes) when in an invalid context. Fix by removing PHI
instances that are in an invalid or ouside assumed context.
This fixes llvm.org/PR41656.
llvm-svn: 360454
This removes unused includes (and forward declarations) as
suggested by include-what-you-use. If a transitive include of a removed
include is required to compile a file, I added the required header (or
forward declaration if suggested by include-what-you-use).
This should reduce compilation time and reduce the number of iterative
recompilations when a header was changed.
llvm-svn: 357209
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
The main difference in this change is that isl_stat is now always
checked by default. As we elminiated most used of isl_stat, thanks to
Philip Pfaffe's implementation of foreach, only a small set of changes
is needed.
This change does not include the following recent changes to isl's C++
bindings:
- stricter error handling for isl_bool
- dropping of the isl::namespace qualifiers
The former requires a larger patch in Polly and consequently should go
through a patch-review. The latter will be applied in the next commit to
keep this commit free from noise.
We also still apply a couple of other changes on top of the official isl
bindings. This delta is expected to shrink over time.
llvm-svn: 338504
This time we replace for loops where the return isl::stat::error has
been used to carry status information.
There are still two uses of foreach remaining as we do not have a
corresponding for implementation for pw_aff functions.
llvm-svn: 337239
Move the optimized getDefToTarget() from ForwardOpTree to ZoneAlgo such
that it can be used by makeValInst.
This reduces the compile time of GrTestUtils of the aosp buildbot from
2m46s to 21s, which should fix the timeout issue.
Differential Revision: https://reviews.llvm.org/D48579
llvm-svn: 335606
In case the schedule has not changed and the operand tree root uses a
value defined in an ancestor loop, the def-to-target mapping is trivial.
For instance, the SCoP
for (int i < 0; i < N; i+=1) {
DefStmt:
D = ...;
for (int j < 0; j < N; j+=1) {
TargetStmt:
use(D);
}
}
has DefStmt-to-TargetStmt mapping of
{ DefStmt[i] -> TargetStmt[i,j] }
This should apply on the majority of def-to-target mappings.
This patch detects this case and directly constructs the expected
mapping. It assumes that the mapping never crosses the loop header
DefStmt is in, which ForwardOpTree does not support at the moment
anyway.
Differential Revision: https://reviews.llvm.org/D47752
llvm-svn: 334134
The aosp-O3-polly-before-vectorizer-unprofitable buildbot currently
fails in ZoneAlgorithm::isNormalized, presumably because an
out-of-quota happens in that function.
Modify ZoneAlgorithm::isNormalized to return an isl::boolean such
it can report an error.
In the failing case, it was called in an assertion in ForwardOpTree.
Allow to pass the assertion in an out-of-quota event, a condition that
is later checked before forwarding an operand tree.
llvm-svn: 333709
When forwarding a LoadInst to another statement, a map that translates
their domain is needed. Before this patch, is was computed by appending
the def-to-use map to the def-to-target of the operand tree's target.
This patch lets the new method getDefToTarget do this. This is
computationally less expensive due to:
* Caching of the result such that it can be used for multiple operands
tree to the same target.
* The map is only computed when there is a LoadInst that needs it.
* It is only computed for the statement requiring the translator map,
instead of having an intermediate result for every edge in the
operand tree.
The downside is that this scheme cannot handle forwarding from a
previous loop iteration (which would require the entire path from
statement to target). Since ForwardOpTree currently does not support
forwarding across loop iterations (SCEV expressions would need to be
transformed), this was not needed anyway.
Differential Revision: https://reviews.llvm.org/D47385
llvm-svn: 333426
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
Differential Revision: https://reviews.llvm.org/D44978
llvm-svn: 332352
As part of this cleanup a couple of unnecessary isl::manage(obj.copy()) pattern
are eliminated as well.
We checked for all potential cleanups by scanning for:
"grep -R isl::manage\( lib/ | grep copy"
llvm-svn: 325558
Summary:
Most changes are mechanical, but in one place I changed the program semantics
by fixing a likely bug:
In `Scop::hasFeasibleRuntimeContext()`, I'm now explicitely handling the
error-case. Before, when the call to `addNonEmptyDomainConstraints()`
returned a null set, this (probably) accidentally worked because
isl_bool_error converts to true. I'm checking for nullptr now.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D39971
llvm-svn: 318632
Put the analysis part of reloadKnownContent under an isl
max-operations quota scope, as has already been done for
forwardKnownLoad.
This should fix the aosp timeout of "GrTestUtils.cpp".
llvm-svn: 317495
Represent PHIs by their incoming values instead of an opaque value of
themselves. This allows ForwardOpTree to "look through" the PHIs and
forward the incoming values since forwardings PHIs is currently not
supported.
This is particularly useful to cope with PHIs inserted by GVN LoadPRE.
The incoming values all resolve to a load from a single array element
which then can be forwarded.
It should in theory also reduce spurious conflicts in value mapping
(DeLICM), but I have not yet found a profitable case yet, so it is
not included here.
To avoid transitive closure and potentially necessary overapproximations
of those, PHIs that may reference themselves are excluded from
normalization and keep their opaque self-representation.
Differential Revision: https://reviews.llvm.org/D39333
llvm-svn: 317008
ForwardOpTree may already transform a scalar access to an array
accesses. The access remains implicit (isOriginalScalarKind(), meaning
that the access is always executed at the begin/end of a statement), but
targets an array (isLatestArrayKind(), which is unrelated to whether the
execution is implicit/explicit).
Fix by properly using isOriginalXXX() to determine execution order.
This fixes the buildbots on MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG.
llvm-svn: 316995
For scalar accesses, change the access target to an array element that
is known to contain the same value.
This may become an alternative to forwardKnownLoad which creates new
loads (and therefore closer to forwarding speculatives). Reloading does
not require the known value originating from a load, but can be a store
as well.
Differential Revision: https://reviews.llvm.org/D39325
llvm-svn: 316766
Add check and skip when the store used to determine the target accesses
multiple array elements. Only a single array location should for
mapping the scalar. Having multiple creates problems when deciding which
element to load from. While MemoryAccess::getAddressFunction() should
select just one of them, other problems arise in code that assumes
that there is just one target element per statement instance.
This fixes llvm.org/PR34989
This also reverts r313902 which fixed llvm.org/PR34485 also caused by
a non-functional target array element. This patch avoids the situation
to occur in the first place.
llvm-svn: 316432
These functions print a multi-line and sorted representation of unions
of polyhedra. Each polyhedron (basic_{ast/map}) has its own line.
First sort key is the polyhedron's hierachical space structure.
Secondary sort key is the lower bound of the polyhedron, which should
ensure that the polyhedral are printed in approximately ascending order.
Example output of dumpPw():
[p_0, p_1, p_2] -> {
Stmt0[0] -> [0, 0];
Stmt0[i0] -> [i0, 0] : 0 < i0 <= 5 - p_2;
Stmt1[0] -> [0, 2] : p_1 = 1 and p_0 = -1;
Stmt2[0] -> [0, 1] : p_1 >= 3 + p_0;
Stmt3[0] -> [0, 3];
}
In contrast dumpExpanded() prints each point in the sets, unless there
is an unbounded dimension that cannot be expandend.
This is useful for reduced test cases where the loop counts are set to
some constant to understand a bug.
Example output of dumpExpanded(
{ [MemRef_A[i0] -> [i1]] : (exists (e0 = floor((1 + i1)/3): i0 = 1 and
3e0 <= i1 and 3e0 >= -1 + i1 and i1 >= 15 and i1 <= 25)) or (exists (e0
= floor((i1)/3): i0 = 0 and 3e0 < i1 and 3e0 >= -2 + i1 and i1 > 0 and
i1 <= 11)) }):
{
[MemRef_A[0] ->[1]];
[MemRef_A[0] ->[2]];
[MemRef_A[0] ->[4]];
[MemRef_A[0] ->[5]];
[MemRef_A[0] ->[7]];
[MemRef_A[0] ->[8]];
[MemRef_A[0] ->[10]];
[MemRef_A[0] ->[11]];
[MemRef_A[1] ->[15]];
[MemRef_A[1] ->[16]];
[MemRef_A[1] ->[18]];
[MemRef_A[1] ->[19]];
[MemRef_A[1] ->[21]];
[MemRef_A[1] ->[22]];
[MemRef_A[1] ->[24]];
[MemRef_A[1] ->[25]]
}
Differential Revision: https://reviews.llvm.org/D38349
llvm-svn: 314525
Remove an assertion that tests the injectivity of the
PHIRead -> PHIWrite relation. That is, allow a single PHI write to be
used by multiple PHI reads. This may happen due to some statements
containing the PHI write not having the statement instances that would
overwrite the previous incoming value due to (assumed/invalid) contexts.
This result in that PHI write is mapped to multiple targets which is not
supported. Codegen will select one one of the targets using
getAddressFunction(). However, the runtime check should protect us from
this case ever being executed.
We therefore allow injective PHI relations. Additional calculations to
detect/santitize this case would probably not be worth the compuational
effort.
This fixes llvm.org/PR34485
llvm-svn: 313902
Since -polly-codegen reports itself to preserve DependenceInfo and IslAstInfo,
we might get those analysis that were computed by a different ScopInfo for a
different Scop structure. This would be unfortunate because DependenceInfo and
IslAstInfo hold references to resources allocated by
ScopInfo/ScopBuilder/Scop (e.g. isl_id). If -polly-codegen and
DependenceInfo/IslAstInfo do not agree on which Scop to use, unpredictable
things can happen.
When the ScopInfo/Scop object is freed, there is a high probability that the
new ScopInfo/Scop object will be created at the same heap position with the
same address. Comparing whether the Scop or ScopInfo address is the expected
therefore is unreliable.
Instead, we compare the address of the isl_ctx object. Both, DependenceInfo
and IslAstInfo must hold a reference to the isl_ctx object to ensure it is
not freed before the destruction of those analyses which might happen after
the destruction of the Scop/ScopInfo they refer to. Hence, the isl_ctx
will not be freed and its address not reused as long there is a
DependenceInfo or IslAstInfo around.
This fixes llvm.org/PR34441
llvm-svn: 313842
Fix walking over the schedule tree to collect its properties
(Number of permutable bands etc.).
Also add regression tests for these statistics.
llvm-svn: 313750
Computing the reaching definition in forwardTree() can take a long time
if the coefficients are large. When the forwarding is
carried-out (doIt==true), forwardTree() must execute entirely or not at
all to get a consistent output, which means we cannot just allow
out-of-quota errors to happen in the middle of the processing.
We introduce the class IslQuotaScope which allows to opt-in code that is
conformant and has been tested with out-of-quota events. In case of
ForwardOpTree, out-of-quota is allowed during the operand tree
examination, but not during the transformation. The same forwardTree()
recursion is used for examination and execution, meaning that the
reaching definition has already been computed in the examination tree
walk and cached for reuse in the transformation tree walk.
This should fix the time-out of grtestutils.ll of the asop buildbot. If
the compilation still takes too long, we can reduce the max-operations
allows for -polly-optree.
Differential Revision: https://reviews.llvm.org/D37984
llvm-svn: 313690
cl::opt<unsigned long> is not specialized and hence the option
-polly-optree-max-ops impossible to use.
Replace by supported option cl::opt<unsigned>.
Also check for an error state when computing the written value, which
happens when the quota runs out.
llvm-svn: 313546
The remaining parts produced by the full partial tile isolation can contain
hot spots that are worth to be optimized. Currently, we rely on the simple
loop unrolling pass, LiCM and the SLP vectorizer to optimize such parts.
However, the approach can suffer from the lack of the information about
aliasing that Polly provides using additional alias metadata or/and the lack
of the information required by simple loop unrolling pass.
This patch is the first step to optimize the remaining parts. To do it, we
unroll and separate them. In case of, for instance, Intel Kaby Lake, it helps
to increase the performance of the generated code from 39.87 GFlop/s to
49.23 GFlop/s.
The next possible step is to avoid unrolling performed by Polly in case of
isolated and remaining parts and rely only on simple loop unrolling pass and
the Loop vectorizer.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D37692
llvm-svn: 312929
Up to now ZoneAlgo considered array elements access by something else
than a LoadInst or StoreInst as not analyzable. This patch removes that
restriction by using the unknown ValInst to describe the written
content, repectively the element type's null value in case of memset.
Differential Revision: https://reviews.llvm.org/D37362
llvm-svn: 312630
Since r312249 instructions of a entry block of region statements are
not marked as root anymore and hence can theoretically be removed
if unused. Theoretically, because the instruction list was not changed.
Still, MemoryAccesses for unused instructions were removed. This lead
to a failed assertion in the code generator when the MemoryAccess for
the still listed instruction was not found.
This hould fix the
Assertion failed: ArrayAccess && "No array access found for instruction!",
file ScopInfo.h, line 1494
compiler crashes.
llvm-svn: 312566
Before this patch, OpTree did not consider forwarding an operand tree consisting
of only single LoadInst as useful. The motivation was that, like an access to a
read-only variable, it would just replace one MemoryAccess by another. However,
in contrast to read-only accesses, this would replace a scalar access by an
array access, which is something worth doing.
In addition, leaving scalar MemoryAccess is problematic in that VirtualUse
prioritizes inter-Stmt use over intra-Stmt. It was possible that the same LLVM
value has a MemoryAccess for accessing the remote Stmt's LoadInst as well as
having the same LoadInst in its own instruction list (due to being forwarded
from another operand tree).
With this patch we ensure that if a LoadInst is forwarded is any operand tree,
also the operand tree containing just the LoadInst is forwarded as well, which
effectively removes the scalar MemoryAccess such that only the array access
remains, not both.
Thanks Michael for the detailed explanation.
Reviewers: Meinersbur, bellu, singam-sanjay, gareevroman
Subscribers: hfinkel, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37424
llvm-svn: 312456