Do not add a "_last" suffix to the statement name if there is no (other)
main statement for a basic block. In other words, it becomes the main
statement itself. This further reduces the statement naming difference
between -polly-stmt-granularity=bb and
-polly-stmt-granularity=scalar-indep.
llvm-svn: 324168
Summary:
This change is step four in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: jdoerfert, grosser, bollu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41677
llvm-svn: 323618
Theoretically, a PHI write can be added to any statement that represents
the incoming basic block. We previously always chose the last because
the incoming value's definition is guaranteed to be defined.
With this patch the PHI write is added to the statement that defines the
incoming value. It avoids the requirement for a scalar dependency between
the defining statement and the statement containing the write. As such the
logic for -polly-stmt-granularity=scalar-indep that ensures that there is
such scalar dependencies can be removed.
Differential Revision: https://reviews.llvm.org/D42147
llvm-svn: 323284
VirtualUse::create is only called for MemoryKind::Value, but its
consistency nonetheless checked in verifyUses(). PHI uses are always
inter-stmt dependencies, which was not considered by the constructor
method. The virtual and non-virtual execution paths were the same, such
that verifyUses did not encounter any inconsistencies.
llvm-svn: 323283
except Darwin and Windows. This prevents inserting an environment
variable with an empty name (which is illegal and leads to a Python
exception) on any of the BSDs.
llvm-svn: 323041
Summary:
Upstream LLVM is changing the the prototypes of the @llvm.memcpy/memmove/memset
intrinsics. This change updates the polly tests for this change.
The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.
This change removes the alignment argument in favour of placing the alignment
attribute on the source and destination pointers of the memory intrinsic call.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
At this time the source and destination alignments must be the same (Step 1).
Step 2 of the change, to be landed shortly, will relax that contraint and allow
the source and destination to have different alignments.
llvm-svn: 322963
The goal is to have -polly-stmt-granularity=bb and
-polly-stmt-granularity=scalar-indep to have the same names if there is
just one statement per basic block.
This fixes a fluke when Polybench's jacobi-2d is optimized differently
depending on the -polly-stmt-granularity option, although both options
create the same SCoP, just with different statement names.
The new naming scheme is:
With -polly-use-llvm-names=0:
Stmt<BBIdx as decimal><Idx within BB as letter>
With -polly-use-llvm-names=1:
Stmt_BBName_<Idx within BB as letter>
The <Idx within BB> suffix is omitted for the main statement of a BB. The
main statement is either the one containing the first store or call
(those cannot be removed by the simplifyer), or if there is no such
instruction, the first. If after simplification there is just a single
statement left, it should be the main statement and have the same names as
with -polly-stmt-granularity=bb.
Differential Revision: https://reviews.llvm.org/D42136
llvm-svn: 322852
This will give control of the statement's name to the caller.
Required to give -polly-stmt-granularity=scalar-indep more control
over the name of the generated statement in a follow-up commit.
llvm-svn: 322851
isl_val_get_num_si crashes on overflow, so don't use it on arbitrary
integers.
Testcase only crashes on platforms where long is 32 bits because of the
signature of isl_val_get_num_si; not sure if it's possible to write a
testcase which crashes if long is 64 bits.
There are a few other places in polly which use isl_val_get_num_si;
they probably need to be fixed as well. I don't think polly uses any
of the other "long" isl APIs in an unsafe manner.
Differential Revision: https://reviews.llvm.org/D42129
llvm-svn: 322766
Print same or similar structure elements together. Previously, the
value could take more importance that the space structure if visited
first in the space nest tree.
Before:
{
Left[0] -> Right[i]: i >= 0;
Left[1] -> AnotherRight[i];
Left[2] -> Right[-1]
}
After:
{
Left[0] -> Right[i]: i >= 0;
Left[2] -> Right[-1];
Left[1] -> AnotherRight[i]
}
llvm-svn: 322581
CMake insists that for each target, one uses only the non-keyword
version of target_link_library
target_link_library(mytarget lib)
or the one with PUBLIC/PRIVATE/INTERFACE keyword:
target_link_library(mytarget PUBLIC lib)
Otherwise, CMake fails with the error message:
The keyword signature for target_link_libraries has already been used with
the target "mytarget". All uses of target_link_libraries with a target
must be either all-keyword or all-plain.
Change all occurances of target_link_library to the newer keyworded
version to avoid such errors. Some already have been changed in r319840,
but might not be sufficient for all build configurations to build
the doxygen manual.
Reported-by: Tanya Lattner <tanyalattner@llvm.org>
llvm-svn: 322376
Memory transfer instructions take two pointers. It is not defined to
which of those a noalias annotation applies. To ensure correctness,
do not add noalias annotations to memcpy/memmove instructions anymore.
The caused a miscompile with test-suite's MultiSource/Applications/obsequi.
Since r321138, the MemCpyOpt pass would remove memcpy/memmove calls if
known to copy uninitialized memory. In that case, it was initialized
by another memcpy, but the annotation for the target pointer said
it would not alias. The annotation was actually meant for the source
pointer, which was was an alloca and could not alias with the target
pointer.
llvm-svn: 321371
If an out-of-quota error occurred, the last error would be
isl_error_quota unless a different error occured. We typically check
whether the max-operations occured by comparing to that error value
after leaving the quota guard. This would check whether there ever
was a quota-error, not just in the last quota guards.
The observable bug occurred if the max-operations limit was reached in
DeLICM, and if -polly-dependences-computout=0, DependenceInfo would
think that the quota for computing dependencies was the reason,
i.e., fail the operation even if the calculation itself was successful.
Fix by reseting the last error to isl_error_none when entering a
quota guard, signaling that no quota error occured unless in the
guard's scope.
llvm-svn: 321329
We currently use target_link_libraries without an explicit scope
specifier (INTERFACE, PRIVATE or PUBLIC) when linking executables.
Dependencies added in this way apply to both the target and its
dependencies, i.e. they become part of the executable's link interface
and are transitive.
Transitive dependencies generally don't make sense for executables,
since you wouldn't normally be linking against an executable. This also
causes issues for generating install export files when using
LLVM_DISTRIBUTION_COMPONENTS. For example, clang has a lot of LLVM
library dependencies, which are currently added as interface
dependencies. If clang is in the distribution components but the LLVM
libraries it depends on aren't (which is a perfectly legitimate use case
if the LLVM libraries are being built static and there are therefore no
run-time dependencies on them), CMake will complain about the LLVM
libraries not being in export set when attempting to generate the
install export file for clang. This is reasonable behavior on CMake's
part, and the right thing is for LLVM's build system to explicitly use
PRIVATE dependencies for executables.
Unfortunately, CMake doesn't allow you to mix and match the keyword and
non-keyword target_link_libraries signatures for a single target; i.e.,
if a single call to target_link_libraries for a particular target uses
one of the INTERFACE, PRIVATE, or PUBLIC keywords, all other calls must
also be updated to use those keywords. This means we must do this change
in a single shot. I also fully expect to have missed some instances; I
tested by enabling all the projects in the monorepo (except dragonegg),
and configuring both with and without shared libraries, on both Darwin
and Linux, but I'm planning to rely on the buildbots for other
configurations (since it should be pretty easy to fix those).
Even after this change, we still have a lot of target_link_libraries
calls that don't specify a scope keyword, mostly for shared libraries.
I'm thinking about addressing those in a follow-up, but that's a
separate change IMO.
Differential Revision: https://reviews.llvm.org/D40823
llvm-svn: 319840
Summary:
This can be seen as a follow-up on my previous differential [D33411](https://reviews.llvm.org/D33411).
We received a bug report where this error was triggered. I have tried my best to recreate the issue in a minimal lit testcase which is also part of this differential.
I only handle return instructions as predecessors to a virtual TLR-exit right now. From inspecting the codebase, it seems `unreachable` instructions may also be of interest here. If requested, I can extend my patches to consider them as well. I would also apply this on `ScopHelper.cpp::isErrorBlock` (see D33411), of course.
Reviewers: philip.pfaffe, bollu
Reviewed By: bollu
Subscribers: Meinersbur, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D40492
llvm-svn: 319431
Summary: We want to automatically copy the appropriate mailing list
for review requests to the polly repository.
For context, see the proposal and discussion here:
http://lists.llvm.org/pipermail/cfe-dev/2017-November/056032.html
Similar to D40179, I set up a new Diffusion repository with callsign
"PLO" for polly:
https://reviews.llvm.org/source/polly/
This explicitly updates polly's .arcconfig to point to the new C
repository in Diffusion, which will let us use Herald rule H270.
llvm-svn: 319056
Isl does not allow generating isl_ast_expr from an isl_pw_aff that has an
empty domain (i.e. has no pieces). We already detected the case if the
isl_pw_aff comes with an empty domain.
isl_ast_build also considers the domain empty if it is disjoint with the
parameter context (e.g. parameters values that we exclude by runtime
versioning).
Intersect the access relation domain with the parameter context to
also detect such practically empty access domains. The effective
pointer used in the generated code is unimportand because it will never
be executed.
This fixes llvm.org/PR35362
llvm-svn: 318806
Summary:
Most changes are mechanical, but in one place I changed the program semantics
by fixing a likely bug:
In `Scop::hasFeasibleRuntimeContext()`, I'm now explicitely handling the
error-case. Before, when the call to `addNonEmptyDomainConstraints()`
returned a null set, this (probably) accidentally worked because
isl_bool_error converts to true. I'm checking for nullptr now.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D39971
llvm-svn: 318632
Summary:
There is a potential use-after-free bug in Scop::buildSchedule(Region *,
LoopStackTy &, LoopInfo &). Before, we took a reference to LoopStack.back()
which is a use after free, since back is popped off further below. This didn't
crash before by pure chance, since LoopStack is actually a vector, and the
memory isn't freed upon pop. I turned this into an iterator-based algorithm.
Reviewers: grosser, bollu, Meinersbur
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D39979
llvm-svn: 318415
Put the analysis part of reloadKnownContent under an isl
max-operations quota scope, as has already been done for
forwardKnownLoad.
This should fix the aosp timeout of "GrTestUtils.cpp".
llvm-svn: 317495
Represent PHIs by their incoming values instead of an opaque value of
themselves. This allows ForwardOpTree to "look through" the PHIs and
forward the incoming values since forwardings PHIs is currently not
supported.
This is particularly useful to cope with PHIs inserted by GVN LoadPRE.
The incoming values all resolve to a load from a single array element
which then can be forwarded.
It should in theory also reduce spurious conflicts in value mapping
(DeLICM), but I have not yet found a profitable case yet, so it is
not included here.
To avoid transitive closure and potentially necessary overapproximations
of those, PHIs that may reference themselves are excluded from
normalization and keep their opaque self-representation.
Differential Revision: https://reviews.llvm.org/D39333
llvm-svn: 317008
ForwardOpTree may already transform a scalar access to an array
accesses. The access remains implicit (isOriginalScalarKind(), meaning
that the access is always executed at the begin/end of a statement), but
targets an array (isLatestArrayKind(), which is unrelated to whether the
execution is implicit/explicit).
Fix by properly using isOriginalXXX() to determine execution order.
This fixes the buildbots on MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG.
llvm-svn: 316995
When collecting base pointers that need to be made available in parallel
subfunctions, use the base pointer associated with the latest
ScopArrayInfo, instead of the original one.
llvm-svn: 316983
Summary:
When GPUNodeBuilder creates loops inside the kernel, it dispatches to
IslNodeBuilder. This however is surprisingly dangerous, since it accesses the
AST Node's user through the wrong type. This patch fixes this problem by
overriding createFor correctly.
This fixes PR35010.
Reviewers: grosser, bollu, Meinersbur
Reviewed By: Meinersbur
Subscribers: Meinersbur, nemanjai, pollydev, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D39364
llvm-svn: 316872
Add missing %loadPolly directive to support out of tree builds. One of
the changes is somewhat bigger, because the directive turns on LLVM
names, and the testcase deosn't use those.
llvm-svn: 316870
For scalar accesses, change the access target to an array element that
is known to contain the same value.
This may become an alternative to forwardKnownLoad which creates new
loads (and therefore closer to forwarding speculatives). Reloading does
not require the known value originating from a load, but can be a store
as well.
Differential Revision: https://reviews.llvm.org/D39325
llvm-svn: 316766
Previously we marked scalars based on the original access function. However,
when a scalar read access is redirected, the original definition
(or incoming values of a PHI) is not used anymore, and can be deleted
(unless referenced by use that has not been redirected).
llvm-svn: 316660
Add check and skip when the store used to determine the target accesses
multiple array elements. Only a single array location should for
mapping the scalar. Having multiple creates problems when deciding which
element to load from. While MemoryAccess::getAddressFunction() should
select just one of them, other problems arise in code that assumes
that there is just one target element per statement instance.
This fixes llvm.org/PR34989
This also reverts r313902 which fixed llvm.org/PR34485 also caused by
a non-functional target array element. This patch avoids the situation
to occur in the first place.
llvm-svn: 316432
After rL315683 (improve SCEV to calculate max BETakenCount when end
bound of loop is variant and loop is of form {Start,+1, Stride} LT End)
this test in polly started failing.
However, as discussed in https://reviews.llvm.org/rL315683,
this polly test is not a loops bound test and the MaxBECount calculated by
SCEV looks correct. The max BECount is the value calculated even when the end
bound of loop is invariant.
As discussed with Tobias offline, I'm marking this as an XFAIL, until he
gets a chance to update the testcase, so the build bot goes to green.
llvm-svn: 315912
The option splits BasicBlocks into minimal statements such that no
additional scalar dependencies are introduced.
The algorithm is based on a union-find structure, and unites sets if
putting them into separate statements would introduce a scalar
dependencies. As a consequence, instructions may be split into separate
statements such their relative order is different than the statements
they are in. This is accounted for instructions whose relative order
matters (e.g. memory accesses).
The algorithm is generic in that heuristic changes can be made
relatively easily. We might relax the order requirement for read-reads
or accesses to different base pointers. Forwardable instructions can be
made to not cause a join.
This implementation gives us a speed-up of 82% in SPEC 2006 456.hmmer
benchmark by allowing loop-distribution in a hot loop such that one of
the loops can be vectorized.
Differential Revision: https://reviews.llvm.org/D38403
llvm-svn: 314983
The option is introduced with only one possible value
-polly-stmt-granularity=bb which represents the current behaviour, which
is outlined into the new function buildSequentialBlockStmts().
More options will be added in future commits.
llvm-svn: 314900
We make sure that the final reload of an invariant scalar memory access uses the
same stack slot into which the invariant memory access was stored originally.
Earlier, this was broken as we introduce a new stack slot aside of the preload
stack slot, which remained uninitialized and caused our escaping loads to
contain garbage. This happened due to us clearing the pre-populated values
in EscapeMap after kernel code generation. We address this issue by preserving
the original host values and restoring them after kernel code generation.
EscapeMap is not expected to be used during kernel code generation, hence we
clear it during kernel generation to make sure that any unintended uses are
noticed.
llvm-svn: 314894
This test XFAILs two test that start to fail when verifying DT's
DFS numbers, as per Tobias' suggestion.
Related VerifyDFSNumbers patch: D38331.
llvm-svn: 314800
Iterate over statement instructions instead over basic block
instructions when creating MemoryAccesses. It allows making the creation
of MemoryAccesses independent of how the basic blocks are split into
multiple ScopStmts.
llvm-svn: 314665
Create the MemoryAccesses of invariant loads separately and before
all other MemoryAccesses.
Invariant loads are classified as synthesizable and therefore are not
contained in any statement. When iterating over all instructions of all
statements, the invariant loads are consequently not processed and
iterating over them separately becomes necessary.
This patch can change the order in which MemoryAccesses are created, but
otherwise has no functional change.
Some temporary code is introduced to ensure correctness, but will be
removed in the next commit.
llvm-svn: 314664
Instructions that compute escaping values might be synthesizable and
therefore not contained in any ScopStmt. When buildAccessFunctions is
changed to only iterate over the instruction list of statement,
"free" instructions still need to be written. We do this after the
main MemoryAccesses have been created.
This can change the order in which MemoryAccesses are created, but has
otherwise no functional change.
llvm-svn: 314663
Decouple handling of exit block PHIs and other MemoryAccesses. Exit PHIs
only need the PHI handling part of buildAccessFunctions but requires
code for skipping them in while creating other MemoryAcesses.
This change will make it easier to modify how statement MemoryAccesses
are created without considering the exit block special case.
llvm-svn: 314662
Loads before the SCoP are always invariant within the SCoP and
therefore are no "required invariant loads". An assertion failes in
ScopBuilder when it finds such an invariant load.
Fix by not adding such loads to the required invariant load list. This
likely will cause the region to be not considered a valid SCoP.
We may want to unconditionally accept instructions defined before
the region as valid invariant conditions instead of rejecting them.
This fixes a compilation crash of SPEC CPU2006 453.povray's
render.cpp.
llvm-svn: 314636
This matches the behavior we already have in lib/Codegen/CodeGeneration.cpp and
makes sure that we fall back to the original code. It seems when invariant load
hoisting was introduced to the GPGPU backend we missed to reset the RTC flag,
such that kernels where invariant load hoisting failed executed the 'optimized'
SCoP, which however is set to a simple 'unreachable'. Unsurprisingly, this
results in hard to debug issues that are a lot of fun to debug.
llvm-svn: 314624
These functions print a multi-line and sorted representation of unions
of polyhedra. Each polyhedron (basic_{ast/map}) has its own line.
First sort key is the polyhedron's hierachical space structure.
Secondary sort key is the lower bound of the polyhedron, which should
ensure that the polyhedral are printed in approximately ascending order.
Example output of dumpPw():
[p_0, p_1, p_2] -> {
Stmt0[0] -> [0, 0];
Stmt0[i0] -> [i0, 0] : 0 < i0 <= 5 - p_2;
Stmt1[0] -> [0, 2] : p_1 = 1 and p_0 = -1;
Stmt2[0] -> [0, 1] : p_1 >= 3 + p_0;
Stmt3[0] -> [0, 3];
}
In contrast dumpExpanded() prints each point in the sets, unless there
is an unbounded dimension that cannot be expandend.
This is useful for reduced test cases where the loop counts are set to
some constant to understand a bug.
Example output of dumpExpanded(
{ [MemRef_A[i0] -> [i1]] : (exists (e0 = floor((1 + i1)/3): i0 = 1 and
3e0 <= i1 and 3e0 >= -1 + i1 and i1 >= 15 and i1 <= 25)) or (exists (e0
= floor((i1)/3): i0 = 0 and 3e0 < i1 and 3e0 >= -2 + i1 and i1 > 0 and
i1 <= 11)) }):
{
[MemRef_A[0] ->[1]];
[MemRef_A[0] ->[2]];
[MemRef_A[0] ->[4]];
[MemRef_A[0] ->[5]];
[MemRef_A[0] ->[7]];
[MemRef_A[0] ->[8]];
[MemRef_A[0] ->[10]];
[MemRef_A[0] ->[11]];
[MemRef_A[1] ->[15]];
[MemRef_A[1] ->[16]];
[MemRef_A[1] ->[18]];
[MemRef_A[1] ->[19]];
[MemRef_A[1] ->[21]];
[MemRef_A[1] ->[22]];
[MemRef_A[1] ->[24]];
[MemRef_A[1] ->[25]]
}
Differential Revision: https://reviews.llvm.org/D38349
llvm-svn: 314525
Summary:
Add a document which describes:
- GEMM performance comparison.
- An experiment that measures the compile time impact
of enabling Polly when compiling LLVM+Clang+Polly.
Contributed-by: Theodoros Theodoridis<theodoros.theodoridis@inf.ethz.ch>
Differential Revision: https://reviews.llvm.org/D38330
llvm-svn: 314419
In order for debuggers to be able to call an inline method, it must have
been instantiated somewhere. The dump() methods are usually not used, so
add an instantiation in debug builds.
This allows to call .dump() on any isl++ object from the gcc/gdb and
Visual Studio debugger in debug builds with assertions enabled.
In optimized builds, even with assertions enabled, the dump() methods
are also inlined in GICHelper.cpp, so no externally visible symbols
will be available either.
Differential Revision: https://reviews.llvm.org/D38198
llvm-svn: 314395
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.
This is a recommit of r312663 after fixing
test/Isl/CodeGen/phi_after_error_block_outside_of_scop.ll
llvm-svn: 314075
Such RTCs may introduce integer wrapping intrinsics with more than 64 bit,
which are translated to library calls on AOSP that are not part of the
runtime and will consequently cause linker errors.
Thanks to Eli Friedman for reporting this issue and reducing the test case.
llvm-svn: 314065
Remove an assertion that tests the injectivity of the
PHIRead -> PHIWrite relation. That is, allow a single PHI write to be
used by multiple PHI reads. This may happen due to some statements
containing the PHI write not having the statement instances that would
overwrite the previous incoming value due to (assumed/invalid) contexts.
This result in that PHI write is mapped to multiple targets which is not
supported. Codegen will select one one of the targets using
getAddressFunction(). However, the runtime check should protect us from
this case ever being executed.
We therefore allow injective PHI relations. Additional calculations to
detect/santitize this case would probably not be worth the compuational
effort.
This fixes llvm.org/PR34485
llvm-svn: 313902
Before this patch, ScopInfo::getValueDef(SAI) used
getStmtFor(Instruction*) to find the MemoryAccess that writes a
MemoryKind::Value. In cases where the value is synthesizable within the
statement that defines, the instruction is not added to the statement's
instruction list, which means getStmtFor() won't return anything.
If the synthesiable instruction is not synthesiable in a different
statement (due to being defined in a loop that and ScalarEvolution
cannot derive its escape value), we still need a MemoryKind::Value
and a write to it that makes it available in the other statements.
Introduce a separate map for this purpose.
This fixes MultiSource/Benchmarks/MallocBench/cfrac where
-polly-simplify could not find the writing MemoryAccess for a use. The
write was not marked as required and consequently was removed.
Because this could in principle happen as well for PHI scalars,
add such a map for PHI reads as well.
llvm-svn: 313881
Since -polly-codegen reports itself to preserve DependenceInfo and IslAstInfo,
we might get those analysis that were computed by a different ScopInfo for a
different Scop structure. This would be unfortunate because DependenceInfo and
IslAstInfo hold references to resources allocated by
ScopInfo/ScopBuilder/Scop (e.g. isl_id). If -polly-codegen and
DependenceInfo/IslAstInfo do not agree on which Scop to use, unpredictable
things can happen.
When the ScopInfo/Scop object is freed, there is a high probability that the
new ScopInfo/Scop object will be created at the same heap position with the
same address. Comparing whether the Scop or ScopInfo address is the expected
therefore is unreliable.
Instead, we compare the address of the isl_ctx object. Both, DependenceInfo
and IslAstInfo must hold a reference to the isl_ctx object to ensure it is
not freed before the destruction of those analyses which might happen after
the destruction of the Scop/ScopInfo they refer to. Hence, the isl_ctx
will not be freed and its address not reused as long there is a
DependenceInfo or IslAstInfo around.
This fixes llvm.org/PR34441
llvm-svn: 313842
Fix walking over the schedule tree to collect its properties
(Number of permutable bands etc.).
Also add regression tests for these statistics.
llvm-svn: 313750
Computing the reaching definition in forwardTree() can take a long time
if the coefficients are large. When the forwarding is
carried-out (doIt==true), forwardTree() must execute entirely or not at
all to get a consistent output, which means we cannot just allow
out-of-quota errors to happen in the middle of the processing.
We introduce the class IslQuotaScope which allows to opt-in code that is
conformant and has been tested with out-of-quota events. In case of
ForwardOpTree, out-of-quota is allowed during the operand tree
examination, but not during the transformation. The same forwardTree()
recursion is used for examination and execution, meaning that the
reaching definition has already been computed in the examination tree
walk and cached for reuse in the transformation tree walk.
This should fix the time-out of grtestutils.ll of the asop buildbot. If
the compilation still takes too long, we can reduce the max-operations
allows for -polly-optree.
Differential Revision: https://reviews.llvm.org/D37984
llvm-svn: 313690
cl::opt<unsigned long> is not specialized and hence the option
-polly-optree-max-ops impossible to use.
Replace by supported option cl::opt<unsigned>.
Also check for an error state when computing the written value, which
happens when the quota runs out.
llvm-svn: 313546
In r301670 IR verification was disabled. Since then, CodeGen writing
malformed IR would only be noticed by unpredictable behavior in
follow-up passes (e.g. segfaults, infinite loops) or IR verification in
the backend assert builds.
Re-enable -polly-codegen-verify at for the regression tests to ensure
that malformed IR is detected where Polly generated malformed IR in the
past and changes in CodeGen are at least partially covered by
check-polly
(otherwise malformed IR may only get noticed when the buildbots run the
test-suite).
Differential Revision: https://reviews.llvm.org/D37969
llvm-svn: 313527
This is a resubmission of r313270. It broke standalone builds of
compiler-rt because we were not correctly generating the llvm-lit
script in the standalone build directory.
The fixes incorporated here attempt to find llvm/utils/llvm-lit
from the source tree returned by llvm-config. If present, it
will generate llvm-lit into the output directory. Regardless,
the user can specify -DLLVM_EXTERNAL_LIT to point to a specific
lit.py on their file system. This supports the use case of
someone installing lit via a package manager. If it cannot find
a source tree, and -DLLVM_EXTERNAL_LIT is either unspecified or
invalid, then we print a warning that tests will not be able
to run.
Differential Revision: https://reviews.llvm.org/D37756
llvm-svn: 313407
This patch is still breaking several multi-stage compiler-rt bots.
I already know what the fix is, but I want to get the bots green
for now and then try re-applying in the morning.
llvm-svn: 313335
This patch simplifies LLVM's lit infrastructure by enforcing an ordering
that a site config is always run before a source-tree config.
A significant amount of the complexity from lit config files arises from
the fact that inside of a source-tree config file, we don't yet know if
the site config has been run. However it is *always* required to run
a site config first, because it passes various variables down through
CMake that the main config depends on. As a result, every config
file has to do a bunch of magic to try to reverse-engineer the location
of the site config file if they detect (heuristically) that the site
config file has not yet been run.
This patch solves the problem by emitting a mapping from source tree
config file to binary tree site config file in llvm-lit.py. Then, during
discovery when we find a config file, we check to see if we have a
target mapping for it, and if so we use that instead.
This mechanism is generic enough that it does not affect external users
of lit. They will just not have a config mapping defined, and everything
will work as normal.
On the other hand, for us it allows us to make many simplifications:
* We are guaranteed that a site config will be executed first
* Inside of a main config, we no longer have to assume that attributes
might not be present and use getattr everywhere.
* We no longer have to pass parameters such as --param llvm_site_config=<path>
on the command line.
* It is future-proof, meaning you don't have to edit llvm-lit.in to add
support for new projects.
* All of the duplicated logic of trying various fallback mechanisms of
finding a site config from the main config are now gone.
One potentially noteworthy thing that was required to implement this
change is that whereas the ninja check targets previously used the first
method to spawn lit, they now use the second. In particular, you can no
longer run lit.py against the source tree while specifying the various
`foo_site_config=<path>` parameters. Instead, you need to run
llvm-lit.py.
Differential Revision: https://reviews.llvm.org/D37756
llvm-svn: 313270
The remaining parts produced by the full partial tile isolation can contain
hot spots that are worth to be optimized. Currently, we rely on the simple
loop unrolling pass, LiCM and the SLP vectorizer to optimize such parts.
However, the approach can suffer from the lack of the information about
aliasing that Polly provides using additional alias metadata or/and the lack
of the information required by simple loop unrolling pass.
This patch is the first step to optimize the remaining parts. To do it, we
unroll and separate them. In case of, for instance, Intel Kaby Lake, it helps
to increase the performance of the generated code from 39.87 GFlop/s to
49.23 GFlop/s.
The next possible step is to avoid unrolling performed by Polly in case of
isolated and remaining parts and rely only on simple loop unrolling pass and
the Loop vectorizer.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D37692
llvm-svn: 312929
Update CodegenCleanup using the function-level passes added by
populatePassManager that run between EP_EarlyAsPossible and
EP_VectorizerStart in -O3.
The changes in particular are:
- Added pass create arguments, e.g. ExpensiveCombines for InstCombine.
- Remove reroll pass. The option -reroll-loops is disabled by default.
- Add passes run with UnitAtATime, which is the default.
- Add instances of LibCallsShrinkWrap, TailCallElimination, SCCP
(sparse conditional constant propagation), Float2Int
that did not run before.
- Add instances of GVN as in the default pipeline.
Notes:
- GVNHoist, GVNSink, NewGVN are still disabled in the -O3 pipeline.
- The optimization level and other optimization parameters are not
accessible outside of PassManagerBuilder, hence we cannot add passes
depending on these.
Differential Revision: https://reviews.llvm.org/D37571
llvm-svn: 312875
The type of NewValue might change due to ScalarEvolution
looking though bitcasts. The synthesized NewValue therefore
becomes the type before the bitcast.
llvm-svn: 312718
This reverts commit
r312410 - [ScopDetect/Info] Look through PHIs that follow an error block
The commit caused generation of invalid IR due to accessing a parameter
that does not dominate the SCoP.
llvm-svn: 312663
Up to now ZoneAlgo considered array elements access by something else
than a LoadInst or StoreInst as not analyzable. This patch removes that
restriction by using the unknown ValInst to describe the written
content, repectively the element type's null value in case of memset.
Differential Revision: https://reviews.llvm.org/D37362
llvm-svn: 312630
Since r312249 instructions of a entry block of region statements are
not marked as root anymore and hence can theoretically be removed
if unused. Theoretically, because the instruction list was not changed.
Still, MemoryAccesses for unused instructions were removed. This lead
to a failed assertion in the code generator when the MemoryAccess for
the still listed instruction was not found.
This hould fix the
Assertion failed: ArrayAccess && "No array access found for instruction!",
file ScopInfo.h, line 1494
compiler crashes.
llvm-svn: 312566
Before this patch, OpTree did not consider forwarding an operand tree consisting
of only single LoadInst as useful. The motivation was that, like an access to a
read-only variable, it would just replace one MemoryAccess by another. However,
in contrast to read-only accesses, this would replace a scalar access by an
array access, which is something worth doing.
In addition, leaving scalar MemoryAccess is problematic in that VirtualUse
prioritizes inter-Stmt use over intra-Stmt. It was possible that the same LLVM
value has a MemoryAccess for accessing the remote Stmt's LoadInst as well as
having the same LoadInst in its own instruction list (due to being forwarded
from another operand tree).
With this patch we ensure that if a LoadInst is forwarded is any operand tree,
also the operand tree containing just the LoadInst is forwarded as well, which
effectively removes the scalar MemoryAccess such that only the array access
remains, not both.
Thanks Michael for the detailed explanation.
Reviewers: Meinersbur, bellu, singam-sanjay, gareevroman
Subscribers: hfinkel, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37424
llvm-svn: 312456
In certain situations, the context in the isl_ast_build could result for the
min/max locations of our alias sets to become empty, which would cause an
internal error in isl, which is then unable to derive a value for these
expressions. Check these conditions before code generating expressions and
instead assume that alias check succeeded. This is valid, as the corresponding
memory accesses will not be executed under any valid context.
This fixed llvm.org/PR34432. Thanks to Qirun Zhang for reporting.
llvm-svn: 312455
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.
llvm-svn: 312410
In Polly, we specifically add a paramter to represent the outermost dimension
size of fortran arrays. We do this because this information is statically
available from the fortran metadata generated by dragonegg.
However, we were only materializing these parameters (meaning, creating an
llvm::Value to back the isl_id) from *memory accesses*. This is wrong,
we should materialize parameters from *scop array info*.
It is wrong because if there is a case where we detect 2 fortran arrays,
but only one of them is accessed, we may not materialize the other array's
dimensions at all.
This is incorrect. We fix this by looping over all
`polly::ScopArrayInfo` in a scop, rather that just all `polly::MemoryAccess`.
Differential Revision: https://reviews.llvm.org/D37379
llvm-svn: 312350
Mark scalar dependences for different statements belonging to same BB
as 'Inter'.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D37147
llvm-svn: 312324
Currently, GVN can be necessary to eliminate redundant instructions in case
of, for instance, GEMM and float type. This patch makes GVN be run during
the cleanup.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D37340
llvm-svn: 312307
Summary:
After region statements now also have instruction lists, this is a
straightforward extension.
Reviewers: Meinersbur, bollu, singam-sanjay, gareevroman
Reviewed By: Meinersbur
Subscribers: hfinkel, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37298
llvm-svn: 312249
This is useful when we face certain intrinsics such as `llvm.exp.*`
which cannot be lowered by the NVPTX backend while other intrinsics can.
So, we would need to keep blacklists of intrinsics that cannot be
handled by the NVPTX backend. It is much simpler to try and promote
all intrinsics to libdevice versions.
This patch makes function/intrinsic very uniform, and will always try to use
a libdevice version if it exists.
Differential Revision: https://reviews.llvm.org/D37056
llvm-svn: 312239
The adds code generation support for the previous commit.
This patch has been re-applied, after the memory issue in the previous patch
has been fixed.
llvm-svn: 312211
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.
We currently only model the entry block of a region statements, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.
This change set is reapplied, after a memory corruption issue had been fixed.
llvm-svn: 312210
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.
We currently only model the entry block of a region statements, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.
llvm-svn: 312128
Reduction detection is only executed in the SCoP building phase.
Hence it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312118
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312117
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
llvm-svn: 312116
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.
This mostly mechanical change makes ScopBuilder directly access some of
ScopStmt/MemoryAccess private fields. We add ScopBuilder as a friend
class and will add proper accessor functions sometime later.
llvm-svn: 312115
This patch allows annotating of metadata in ir instruction
(with "polly_split_after"), which specifies where to split a particular
scop statement.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D36402
llvm-svn: 312107
The intrinsics memset, memcopy and memmove do have their memory accesses
modeled by ScopBuilder. Do not consider them error-case behavior.
Test case will come with a future patch that requires memory intrinsics
outside of error blocks.
llvm-svn: 312021
Commit r252725 introduced a "return false" if an ignored intrinsics was
found. The consequence of this was that the mere existence of an ignored
intrinsic (such as llvm.dbg.value) before a call that would have
qualified the block to be an error block, to not be an error block.
The obvious goal was to just skip ignored intrinsics, not changing the
meaning of what an error block is.
llvm-svn: 312020
ZoneAlgo used to bail out for the complete SCoP if it encountered
something violating its assumption. This meant the neither OpTree can
forward any load nor DeLICM do anything in such cases, even if their
transformations are unrelated to the violations.
This patch adds a list of compatible elements (currently with the
granularity of entire arrays) that can be used for analysis. OpTree
and DeLICM can then check whether their transformations only concern
compatible elements, and skip non-compatible ones.
This will be useful for e.g. Polybench's benchmarks covariance,
correlation, bicg, doitgen, durbin, gramschmidt, adi that have
assumption violation, but which are not necessarily relevant
for all transformations.
Differential Revision: https://reviews.llvm.org/D37219
llvm-svn: 311929
Properly require and preserve the OptimizationRemarkEmitter for use in
ScopPass. Previously one had to get the ORE from ScopDetection because
CodeGeneration did not mark it as preserved. It would need to be
recomputed which results in the legacy PM to throw away all previous
SCoP analysis.
This also changes the implementation of ScopPass::getAnalysisUsage to
not unconditionally preserve all passes, but only those needed to be
preserved by any SCoP pass (at least when using the legacy PM). This
allows invalidating DependenceInfo (and IslAstInfo) in case the pass
would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion)
JSONImporter should also invalidate the DependenceInfo. In this patch
it marks DependenceInfo as preserved anyway because some regression
tests depend on it.
Differential Revision: https://reviews.llvm.org/D37010
llvm-svn: 311888
In cases where the entry block of a scop was not contained in a loop that was
part of the scop region and at the same time there was a loop surrounding the
scop, we missed to count the loops in the scop and consequently did not consider
the scop profitable. We correct this by only moving to the loop parent, in case
the current loop is loop contained in the scop.
This increases the number of loops in COSMO which we assume to be profitable
from 3974 to 4981.
llvm-svn: 311863
Whether a partial write is tautological/unsatisfiable not only
depends on the access domain, but also on the domain covered
by its node in the AST.
In the example below, there are two instances of Stmt_cond_false. It may have a partial write access that is not executed in instance Stmt_cond_false(0).
for (int c0 = 0; c0 < tmp5; c0 += 1) {
Stmt_for_body344(c0);
if (tmp5 >= c0 + 2)
Stmt_cond_false(c0);
Stmt_cond_end(c0);
}
if (tmp5 <= 0) {
Stmt_for_body344(0);
Stmt_cond_false(0);
Stmt_cond_end(0);
}
Isl cannot derive a subscript for an array element that is never accessed.
This caused an error in that no subscript expression has been generated
in IslNodeBuilder::createNewAccesses, but BlockGenerator expected one
to exist because there is an execution of that write, just not in that
ast node.
Fixed by instead of determining whether the access domain is empty,
inspect whether isl generated a constant "false" ast expression in
the current ast node.
This should fix a compiler crash of the aosp buildbot.
llvm-svn: 311663
This is a stylistic change to make the function a little more readable.
Also add a debug print to show what instruction contains a use of a
function we don't understand in the kernel.
Differential Revision: https://reviews.llvm.org/D37058
llvm-svn: 311648
Summary:
This patch comes directly after https://reviews.llvm.org/D34982 which allows fully indexed expansion of MemoryKind::Array. This patch allows expansion for MemoryKind::Value and MemoryKind::PHI.
MemoryKind::Value seems to be working with no majors modifications of D34982. A test case has been added. Unfortunatly, no "run time" checks can be done for now because as @Meinersbur explains in a comment on D34982, DependenceInfo need to be cleared and reset to take expansion into account in the remaining part of the Polly pipeline. There is no way to do that in Polly for now.
MemoryKind::PHI is not working. Test case is in place, but not working. To expand MemoryKind::Array, we expand first the write and then after the reads. For MemoryKind::PHI, the idea of the current implementation is to exchange the "roles" of the read and write and expand first the read according to its domain and after the writes.
But with this strategy, I still encounter the problem of union_map in new access map.
For example with the following source code (source code of the test case) :
```
void mse(double A[Ni], double B[Nj]) {
int i,j;
double tmp = 6;
for (i = 0; i < Ni; i++) {
for (int j = 0; j<Nj; j++) {
tmp = tmp + 2;
}
B[i] = tmp;
}
}
```
Polly gives us the following statements and memory accesses :
```
Statements {
Stmt_for_body
Domain :=
{ Stmt_for_body[i0] : 0 <= i0 <= 9999 };
Schedule :=
{ Stmt_for_body[i0] -> [i0, 0, 0] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
Instructions {
%tmp.04 = phi double [ 6.000000e+00, %entry.split ], [ %add.lcssa, %for.end ]
}
Stmt_for_inc
Domain :=
{ Stmt_for_inc[i0, i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9999 };
Schedule :=
{ Stmt_for_inc[i0, i1] -> [i0, 1, i1] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
Instructions {
%tmp.11 = phi double [ %tmp.04, %for.body ], [ %add, %for.inc ]
%add = fadd double %tmp.11, 2.000000e+00
%exitcond = icmp ne i32 %inc, 10000
}
Stmt_for_end
Domain :=
{ Stmt_for_end[i0] : 0 <= i0 <= 9999 };
Schedule :=
{ Stmt_for_end[i0] -> [i0, 2, 0] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
ReadAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_for_end[i0] -> MemRef_B[i0] };
Instructions {
%add.lcssa = phi double [ %add, %for.inc ]
store double %add.lcssa, double* %arrayidx, align 8
%exitcond5 = icmp ne i64 %indvars.iv.next, 10000
}
}
```
and the following dependences :
```
{ Stmt_for_inc[i0, 9999] -> Stmt_for_end[i0] : 0 <= i0 <= 9999;
Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998;
Stmt_for_body[i0] -> Stmt_for_inc[i0, 0] : 0 <= i0 <= 9999;
Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : 0 <= i0 <= 9998 }
```
When trying to expand this memory access :
```
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
```
The new access map would look like this :
```
{ Stmt_for_inc[i0, 9999] -> MemRef_tmp_11__phi_exp[i0] : 0 <= i0 <= 9999; Stmt_for_inc[i0, i1] ->MemRef_tmp_11__phi_exp[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998 }
```
The idea to implement the expansion for PHI access is an idea from @Meinersbur and I don't understand why my implementation does not work. I should have miss something in the understanding of the idea.
Contributed by: Nicolas Bonfante <nicolas.bonfante@gmail.com>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev, Meinersbur
Differential Revision: https://reviews.llvm.org/D36647
llvm-svn: 311619
Add statistics about
- Which optimizations are applied
- Number of loops in Scops at various stages
- Number of scalar/singleton writes at various stages representative
for scalar false dependencies
- Number of parallel loops
These will be useful to find regressions due to moving Polly further
down of LLVM's pass pipeline.
Differential Revision: https://reviews.llvm.org/D37049
llvm-svn: 311553
Loop with zero iteration are, syntactically, loops. They have been
excluded from the loop counter even for the non-profitable counters.
This seems to be unintentially as the sentinel value of '0' minimal
iterations does exclude such loops.
Fix by never considering the iteration count when the sentinel
value of 0 is found.
This makes the recently added NumTotalLoops couter redundant
with NumLoopsOverall, which now is equivalent. Hence, NumTotalLoops
is removed as well.
Note: The test case 'ScopDetect/statistics.ll' effectively does not
check profitability, because -polly-process-unprofitable is passed
to all test cases.
llvm-svn: 311551
MSVC warns about comparison between a signed and unsigned integer.
The rules of C(++) define that an unsigned comparison has to be
carried-out in this case. This is unlikely to be intended.
Fix by assigning the loop's upper bound to a signed integer first.
This also avoids repeated evaluation of the invariant upper bound.
llvm-svn: 311548
Summary:
ScopDetection used to check if a loop withing a region was infinite and emitted a diagnostic in such cases. After r310940 there's no point checking against that situation, as infinite loops don't appear in regions anymore.
The test failure was observed on these two polly buildbots:
http://lab.llvm.org:8011/builders/polly-arm-linux/builds/8368http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/10310
This patch XFAILs `ReportLoopHasNoExit.ll` and turns infinite loop detection into an assert.
Reviewers: grosser, sanjoy, bollu
Reviewed By: grosser
Subscribers: efriedma, aemerson, kristof.beyls, dberlin, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36776
llvm-svn: 311503
Summary:
There is no need to emit alias metadata for scalars, as basicaa will easily
distinguish them from arrays. This reduces the size of the metadata we generate.
This is especially useful after we moved to -polly-position=before-vectorizer,
where a lot more scalar dependences are introduced, which increased the size of
the alias analysis metadata and made us commonly reach the limits after which
we do not emit alias metadata that have been introduced to prevent quadratic
growth of this alias metadata.
This improves 2mm performance from 1.5 seconds to 0.17 seconds.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: Meinersbur
Subscribers: pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37028
llvm-svn: 311498
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36928
llvm-svn: 311473
This was originally a `#define`. It is much easier to play around with
this as an environment variable when we run on large programs.
Differential Revision: https://reviews.llvm.org/D37012
llvm-svn: 311471
The implementation of computeArrayUnused did not consider writes without
reads before, except for the first write in the SCoP. This caused it to
'forget' writes directly following another write.
This patch re-adds the entire reaching defintion of a write that has not
been covered before by a read.
This fixes Polybench 4.2 2mm where only one of the matrix-multiplication
was detected.
llvm-svn: 311403
Dragonegg generates most function parameters as pointers to the actual
parameters. However, it does not mark these parameters with the
dereferencable attribute.
Polly is conservative when it comes to invariant load
hoisting, thus we add runtime checks to invariant load hoisted pointers
when we do not know that pointers are dereferencable. This is correct behaviour,
but is a performance penalty.
Add a flag that allows all pointer parameters to be dereferencable. That
way, polly can speculatively load-hoist paramters to functions without
runtime checks.
Differential Revision: https://reviews.llvm.org/D36461
llvm-svn: 311329
This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.
Differential Revision: https://reviews.llvm.org/D36934
llvm-svn: 311328
The pattern recognition for MatMul is restrictive.
The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).
This was changed and made more flexible.
Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D36460
llvm-svn: 311302
We now load the function pointer for `cuMemAllocManaged` dynamically, so
it should be possible to compile `GPUJIT` on non-CUDA systems again.
It should now be possible to link on non-cuda systems again.
Thanks to Philipp Schaad for noticing this inconsitency.
Differential Revision: https://reviews.llvm.org/D36921
llvm-svn: 311289
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.
This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.
llvm-svn: 311268
Instead of using Twines and temporary expressions, we do string manipulation
through a std::string. This resolves a memory corruption issue, which likely
was caused by twines loosing their underlying string too soon.
llvm-svn: 311264
- We should iterate over `I`, which is `Cur` expanded out to an
instruction, and not `Cur` itself.
- This is a bugfix.
Differential Revision: https://reviews.llvm.org/D36923
llvm-svn: 311261
Summary:
This information is necessary for PPCG to perform correct life range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.
We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36929
llvm-svn: 311259
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.
Differential revision: D36925
llvm-svn: 311248
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.
llvm-svn: 311239
Summary:
When trying to expand memory accesses, the current version of Polly uses statement Level dependences. The actual implementation is not working in case of multiple dependences per statement. For example in the following source code :
```
void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
int i,j;
for (j = 0; j < Ni; j++) {
for (int i = 0; i<Nj; i++)
S: B[i] = i;
for (int i = 0; i<Nj; i++)
T: D[i] = i;
U: A[j] = B[j];
C[j] = D[j];
}
}
```
The statement U has two dependences with S and T. The current version of polly fails during expansion.
This patch aims to fix this bug. For that, we use Reference Level dependences to be able to filter dependences according to statement and memory ref. The principle of expansion remains the same as before.
We also noticed that we need to bail out if load come after store (at the same position) in same statement. So a check was added to isExpandable.
Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>
Reviewers: Meinersbur, simbuerg, bollu
Reviewed By: Meinersbur, simbuerg
Subscribers: pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D36791
llvm-svn: 311165
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.
This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36869
llvm-svn: 311161
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36868
llvm-svn: 311157
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.
Even though the above motivation would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36867
llvm-svn: 311155
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.
This requires `-polly-detect-full-function` to be enabled. So, the pass
asserts that `-polly-detect-full-function` is enabled.
Differential Revision: https://reviews.llvm.org/D36832
llvm-svn: 311126
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:
```
call void \
bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```
- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.
Differential Revision: https://reviews.llvm.org/D36825
llvm-svn: 311121
In release builds LLVM may not pass along LLVM names consistently. We make the
test cases independent of the LLVM-IR names to avoid spurious test case
failures.
llvm-svn: 311118
- If we have global arrays, we would like to rewrite them to global
pointers which are allocated using `cudaMallocManaged`.
- If we have allocas in a function, we would like to rewrite them to
heap-allocations with `cudaMallocManaged` and `cudaFree`.
- With these rewrite mechanisms, we can offload _any_ function to the
GPU with no code rewrite whatsover.
Differential Revision: https://reviews.llvm.org/D36516
llvm-svn: 311080
Summary:
This pass detangles induction variables from functions, which take variables by
reference. Most fortran functions compiled with gfortran pass variables by
reference. Unfortunately a common pattern, printf calls of induction variables,
prevent in this situation the promotion of the induction variable to a register,
which again inhibits any kind of loop analysis. To work around this issue
we developed a specialized pass which introduces separate alloca slots for
known-read-only references, which indicate the mem2reg pass that the induction
variables can be promoted to registers and consquently enable SCEV to work.
We currently hardcode the information that a function
_gfortran_transfer_integer_write does not read its second parameter, as
dragonegg does not add the right annotations and we cannot change old dragonegg
releases. Hopefully flang will produce the right annotations.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: mgorny, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36800
llvm-svn: 311066