Dragonegg generates most function parameters as pointers to the actual
parameters. However, it does not mark these parameters with the
dereferencable attribute.
Polly is conservative when it comes to invariant load
hoisting, thus we add runtime checks to invariant load hoisted pointers
when we do not know that pointers are dereferencable. This is correct behaviour,
but is a performance penalty.
Add a flag that allows all pointer parameters to be dereferencable. That
way, polly can speculatively load-hoist paramters to functions without
runtime checks.
Differential Revision: https://reviews.llvm.org/D36461
llvm-svn: 311329
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127
Summary:
During code generation for a Scop we modify the IR of a function.
While this shouldn't affect a Scop in the formal sense, the implementation
caches various information about the IR such as SCEV expressions for bounds or
parameters. This cached information needs to be updated or invalidated. To this
end, SPMUpdater allows passes to report when they've invalidated a Scop to the
PassManager, which will then flush and recompute all Scops. This in turn
invalidates all iterators, so references to Scops shouldn't be held.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D36524
llvm-svn: 310551
We are working towards removing uses of Scop::getStmtFor(BB). In this
patch, we remove dependency of Scop::getStmtFor(Inst) on getStmtFor(BB).
To do so, we introduce a map of instructions to their corresponding scop
statements and use it to get the instructions' statement.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35663
llvm-svn: 310494
This is an addition to the -polly-optree pass that reuses the array
content analysis from DeLICM to find array elements that contain the
same value as the value loaded when the target statement instance
is executed.
The analysis is now enabled by default.
The known content analysis could also be used to rematerialize any
llvm::Value that was written to some array element, but currently
only loads are forwarded.
Differential Revision: https://reviews.llvm.org/D36380
llvm-svn: 310279
Summary:
Small patch to fix the JSON exporter.
Currently, using "opt -polly-export-jscop" does not generate jscop files, but gives an error:
*** Error in `opt': corrupted double-linked list: 0x0000000000bc4bb0 ***
Updated the function getAccessRelationStr() to work with the current version of getAccessRelation(), fixing the JSON exporter
Reviewers: bollu, grosser
Reviewed By: grosser
Subscribers: grosser, llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36370
llvm-svn: 310199
Summary:
Testing the new-pm passes becomes much easier once they behave more like the
old passes in terms of the order in which Scops are processed and printed. This
requires three changes:
- ScopInfo: Use an ordered map to store scops
- ScopInfo: Iterate and print Scops in reverse order to match legacy PM behaviour
- ScopDetection: print function name in ScopAnalysisPrinter
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D36303
llvm-svn: 310052
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.
To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.
Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36243
llvm-svn: 309939
ScopBuilder and Simplify (through VirtualInstruction.cpp) previously
used this functionality in their own implementation. Refactor them
both into a common one into the Scop class.
BlockGenerator also makes use of a similiar functionality, but also
records outside users and takes place after region simplification.
Merging it as well would be more complicated.
llvm-svn: 309273
A region statement's instruction list is always empty and ignored by the code
generator. Don't give the impression that it means anything.
llvm-svn: 309197
Since there will be no more a 1:1 correspondence between statements and
basic blocks, we would like to get rid of the method getStmtFor(BB)
and its uses. Here we remove one of its uses in ScopInfo by fetching
the statement in which the call instruction lies.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35691
llvm-svn: 309110
Read-only values (values defined before the SCoP) require special
handing with -polly-analyze-read-only-scalars=true (which is the
default). If active, each use of a value requires a read access.
When a copied value uses a read-only value, we must also ensure that
such a MemoryAccess is available or is created.
Differential Revision: https://reviews.llvm.org/D35764
llvm-svn: 308876
Change the indention of the last brace to align with the opening line.
Before:
Instructions {
%val = fadd double %arg, 2.100000e+01
store double %val, double* %A
}
After:
Instructions {
%val = fadd double %arg, 2.100000e+01
store double %val, double* %A
}
llvm-svn: 308828
Print a statement's instruction on dump() regardless of
-polly-print-instructions. dump() is supposed to be used in the debugger
only and never in regression tests. While debugging, get all the
information we have and we are not bound to break anything. For non-dump
purposes of print, forward the setting of -polly-print-instructions as
parameters.
Some calls to print() had to be changed because the
PollyPrintInstructions setting is only available in ScopInfo.cpp.
In ScheduleOptimizer.cpp, dump() was used in regression tests.
That's not what dump() is for.
The print parameter "PrintInstructions" will also be useful for an
explicit print SCoP pass in a future patch.
llvm-svn: 308746
When performing invariant load hoisting we check that invariant load expressions
are not too complex. Up to this commit, we performed this check by counting the
sum of dimensions in the access range as a very simple heuristic. This heuristic
is a little too conservative, as it prevents hoisting for any scops with a
very large number of parameters. Hence, we update the heuristic to only count
existentially quantified dimensions and set dimensions. We expect this to still
detect the problematic expressions in h264 because of which this check was
originally introduced.
For some unknown reason, this complexity check was originally committed in
IslNodeBuilder. It really belongs in ScopInfo, as there is no point in
optimizing a program which we could have known earlier cannot be code generated.
The benefit of running the check early is that we can avoid to even hoist checks
that are expensive to code generate as invariant loads. This can be seen in
the changed tests, where we now indeed detect the scop, but just not invariant
load hoist the complicated access.
We also improve the formatting of the code, document it, and use isl++ to
simplify expressions.
llvm-svn: 308659
When constructing a schedule true and there are multiple statements for
a basic block, create a sequence node for these statements.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35679
llvm-svn: 308635
We are working towards removing uses of Scop::getStmtFor(BB). In this
patch, we remove dependency of Scop::getLastStmtFor(BB) on
getStmtFor(BB). To do so, we get the list of all statements
corresponding to the BB and then fetch the last one.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35665
llvm-svn: 308633
Introduce previously missing PHIReads analogous the the already existing
PHIWrites/ValueWrites/ValueReads maps. PHIReads was initially not
required and the later introduced lookupPHIReadOf() used a linear
search instead.
With PHIReads, lookupPHIReadOf() can now also do a map lookup and remove
any surprising performance/behaviour differences to lookupPHIWriteOf(),
lookupValueWriteOf() and lookupValueReadOf().
llvm-svn: 308630
Use a mark-and-sweep algorithm to find and remove unused instructions
and MemoryAccesses. This is useful in particular to remove scalar
writes that are never used anywhere. A scalar write in a loop induces
a write-after-write dependency that stops the loop iterations to be
rescheduled. Such writes can be a result of previous transformations
such as DeLICM and operand tree forwarding.
It adds a new class VirtualInstruction that represents an instruction in
a particular statement. At the moment an instruction can only belong to
the statement that represents a BasicBlock. In the future, instructions
can be in one of multiple statements representing a BasicBlock
(Nandini's work), in different statements than its BasicBlock would
indicate, and even multiple statements at once (by forwarding operand
trees). It also integrates nicely with the VirtualUse class.
ScopStmt::contains(Instruction*) currently uses the instruction's parent
BasicBlock to check whether it contains the instruction. It will need to
check the actual statement list when one of the aforementioned features
become possible.
Differential Revision: https://reviews.llvm.org/D35656
llvm-svn: 308626
This is one possible solution to implement wrap-arounds for integers in
unsigned icmp operations. For example,
store i32 -1, i32* %A_addr
%0 = load i32, i32* %A_addr
%1 = icmp ult i32 %0, 0
%1 should hold false, because under the assumption of unsigned integers,
-1 should wrap around to 2^32-1. However, previously. it was assumed
that the MSB (Most Significant Bit - aka the Sign bit) was never set for
integers in unsigned operations.
This patch modifies the buildConditionSets function in ScopInfo.cpp to
give better information about the integers in these unsigned
comparisons.
Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35464
llvm-svn: 308608
Before this patch, ScalarDefUseChain was a tool used by DeLICM to find
all reads and writes of scalar accesses. It iterated once over all
accesses and stores the accesses into maps.
By integrating it into the Scop class, we can keep the maps up-to-date
without the need for recomputing them. It will be needed for more than
DeLICM in the future, such as SCoP simplification, code movement between
virtual statements, and array expansion (GSoC project).
Compared to ScalarUseDefChain, we save two maps by finding the ScopStmt
a Def/PHIRead must reside in, and use its already existing lookup
function to find the MemoryAccess.
Differential Revision: https://reviews.llvm.org/D35631
llvm-svn: 308495
Once statements are split, a BasicBlock will comprise of multiple
statements. To prepare for this change in future, we introduce a list
of statements in the statement map.
Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D35301
llvm-svn: 308318
Utilizing newer LLVM diagnostic remark API in order to enable use of
opt-viewer tool. Polly Diagnostic Remarks also now appear in YAML
remark file.
In this patch, I've added the OptimizationRemarkEmitter into certain
classes where remarks are being emitted and update the remark emit calls
itself. I also provide each remark a BasicBlock or Instruction from where
it is being called, in order to compute the hotness of the remark.
Patch by Tarun Rajendran!
Differential Revision: https://reviews.llvm.org/D35399
llvm-svn: 308233
Summary:
We do not keep domain constraints on access functions when building the
scop. Hence, for consistency reasons, it makes also sense to not include
them when storing a new access function. This change results in simpler
access functions that make output easier to read.
This patch also helps to make DeLICMed memory accesses to be understood by
our matrix multiplication pattern matching pass. Further changes to the
matrix multiplication pattern matching are needed for this to work, so the
corresponding test case will be added in a future commit.
Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
Subscribers: pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35237
llvm-svn: 308215
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops.
Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.
Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.
Reviewers: grosser, bollu
Reviewed By: grosser
Subscribers: mehdi_amini, nemanjai, pollydev, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35176
llvm-svn: 307814
Summary:
Since r306667, propagateInvalidStmtDomains gets a reference to an
InvalidDomainMap. As part of the branch leading to return false, the respective
domain is freed. It is, however, not removed from the InvalidDomainMap, leaking
a pointer to a freed object which results in a use-after-free. Fix this be
removing the domain from the map before returning.
We tried to derive a test case that reliably failes, but did not succeed in
producing one. Hence, for now the failures in our LNT bots must be sufficient
to keep this issue tested.
Reviewers: grosser, Meinersbur, bollu
Subscribers: bollu, nandini12396, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D34971
llvm-svn: 307499
Summary:
Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU.
When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.
In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: kbarton, nemanjai, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D34054
llvm-svn: 306863
ScopStmts were being used in the computation of the Domain of the SCoPs
in ScopInfo. Once statements are split, there will not be a 1-to-1
correspondence between Stmts and Basic blocks. Thus this patch avoids
the use of getStmtFor() by creating a map of BB to InvalidDomain and
using it to compute the domain of the statements.
Contributed-by: Nanidini Singhal <cs15mtech01004@iith.ac.in>
Differential Revision: https://reviews.llvm.org/D33942
llvm-svn: 306667
This patch aims to implement the option of allocating new arrays created
by polly on heap instead of stack. To enable this option, a key named
'allocation' must be written in the imported json file with the value
'heap'.
We need such a feature because in a next iteration, we will implement a
mechanism of maximal static expansion which will need a way to allocate
arrays on heap. Indeed, the expansion is very costly in terms of memory
and doing the allocation on stack is not worth considering.
The malloc and the free are added respectively at polly.start and
polly.exiting such that there is no use-after-free (for instance in case
of Scop in a loop) and such that all memory cells allocated with a
malloc are free'd when we don't need them anymore.
We also add :
- In the class ScopArrayInfo, we add a boolean as member called IsOnHeap
which represents the fact that the array in allocated on heap or not.
- A new branch in the method allocateNewArrays in the ISLNodeBuilder for
the case of heap allocation. allocateNewArrays now takes a BBPair
containing polly.start and polly.exiting. allocateNewArrays takes this
two blocks and add the malloc and free calls respectively to
polly.start and polly.exiting.
- As IntPtrTy for the malloc call, we use the DataLayout one.
To do that, we have modified :
- createScopArrayInfo and getOrCreateScopArrayInfo such that it returns
a non-const SAI, in order to be able to call setIsOnHeap in the
JSONImporter.
- executeScopConditionnaly such that it return both start block and end
block of the scop, because we need this two blocs to be able to add
the malloc and the free calls at the right position.
Differential Revision: https://reviews.llvm.org/D33688
llvm-svn: 306540
This reduces the compilation time of one reduced test case from Android from
16 seconds to 100 mseconds (we bail out), without negatively impacting any
other test case we currently have.
We still saw occasionally compilation timeouts on the AOSP buildbot. Hopefully,
those will go away with this change.
llvm-svn: 306235
This allows us to bail out both in case the lexmin/max computation is too
expensive, but also in case the commulative cost across an alias group is
too expensive. This is an improvement of r303404, which did not seem to
be sufficient to keep the Android Buildbot quiet.
llvm-svn: 306087
r303971 added an assertion that SCEV addition involving an AddRec
and a SCEVUnknown must involve a dominance relation: either the
SCEVUnknown value dominates the AddRec's loop, or the AddRec's
loop header dominates the SCEVUnknown. This is generally fine
for most usage of SCEV because it isn't possible to write an
expression in IR which would violate it, but it's a bit inconvenient
here for polly.
To solve the issue, just avoid creating a SCEV expression which
triggers the asssertion.
I'm not really happy with this solution, but I don't have any better
ideas.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33464.
Differential Revision: https://reviews.llvm.org/D34259
llvm-svn: 305864
Previously, we would generate one performance counter for all scops.
Now, we generate both the old information, as well as a per-scop
performance counter to generate finer grained information.
This patch needed a way to generate a unique name for a `Scop`.
The start region, end region, and function name combined provides a
unique `Scop` name. So, `Scop` has a new public API to provide its start
and end region names.
Differential Revision: https://reviews.llvm.org/D33723
llvm-svn: 304528
Should not have 'fixed' the formatting issue, I did not have the most
recent version of `clang-format`.
This reverts commit 761b1268359e14e59142f253d77864a29d55c56c.
llvm-svn: 304148
- Fix formatting in `RegisterPasses.cpp`.
- `assert` tried to compare `isl::boolean` against `long`. Explicitly
construct `bool` from `isl::boolean`. This allows the implicit cast of
`bool` to `long.
llvm-svn: 304146
Side-effect free function calls with only constant parameters can be easily
re-generated and consequently do not prevent us from modeling a SCEV. This
change allows array subscripts to reference function calls such as
'get_global_id()' as used in OpenCL.
We use the function name plus the constant operands to name the parameter. This
is possible as the function name is required and is not dropped in release
builds the same way names of llvm::Values are dropped. We also provide more
readable names for common OpenCL functions, to make it easy to understand the
polyhedral model we generate.
llvm-svn: 304074
Summary: This patch outputs all the list of instructions in BlockStmts.
Reviewers: Meinersbur, grosser, bollu
Subscribers: bollu, llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D33163
llvm-svn: 304062
It seems we are still spending too much time on rare inputs, which continue to
timeout the AOSP buildbot. Let's see if a further reduction is sufficient.
llvm-svn: 303807
Summary:
My goal is to make the newly added `AllowWholeFunctions` options more usable/powerful.
The changes to ScopBuilder.cpp are exclusively checks to prevent `Region.getExit()` from being dereferenced, since Top Level Regions (TLRs) don't have an exit block.
In ScopDetection's `isValidCFG`, I removed a check that disallowed ReturnInstructions to have return values. This might of course have been intentional, so I would welcome your feedback on this and maybe a small explanation why return values are forbidden. Maybe it can be done but needs more changes elsewhere?
The remaining changes in ScopDetection are simply to consider the AllowWholeFunctions option in more places, i.e. allow TLRs when it is set and once again avoid derefererncing `getExit()` if it doesn't exist.
Finally, in ScopHelper.cpp I extended `polly::isErrorBlock` to handle regions without exit blocks as well: The original check was if a given BasicBlock dominates all predecessors of the exit block. Therefore I do the same for TLRs by regarding all BasicBlocks terminating with a ReturnInst as predecessors of a "virtual" function exit block.
Patch by: Lukas Boehm
Reviewers: philip.pfaffe, grosser, Meinersbur
Reviewed By: grosser
Subscribers: pollydev, llvm-commits, bollu
Tags: #polly
Differential Revision: https://reviews.llvm.org/D33411
llvm-svn: 303790
This speeds up scop modeling for scops with many redundent existentially
quantified constraints. For the attached test case, this change reduces
scop modeling time from minutes (hours?) to 0.15 seconds.
This change resolves a compilation timeout on the AOSP build.
Thanks Eli for reporting _and_ reducing the test case!
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 303600
Allow the BlockGenerator to generate memory writes that are not defined
over the complete statement domain, but only over a subset of it. It
generates a condition that evaluates to 1 if executing the subdomain,
and only then execute the access.
Only write accesses are supported. Read accesses would require a PHINode
which has a value if the access is not executed.
Partial write makes DeLICM able to apply mappings that are not defined
over the entire domain (for instance, a branch that leaves a loop with
a PHINode in its header; a MemoryKind::PHI write when leaving is never
read by its PHI read).
Differential Revision: https://reviews.llvm.org/D33255
llvm-svn: 303517
- We use the outermost dimension of arrays since we need this
information to generate GPU transfers.
- In general, if we do not know the outermost dimension of the array
(because the indexing expression is non-affine, for example) then we
simply cannot generate transfer code.
- However, for Fortran arrays, we can use the Fortran array
representation which stores the dimensions of all arrays.
- This patch uses the Fortran array representation to generate code that
computes the outermost dimension size.
Differential Revision: https://reviews.llvm.org/D32967
llvm-svn: 303429
In r302231 we mistakenly use bitwise or (|) instead of logical
or (||). This patch fixes that.
Contributed-by: Sameer AbuAsal <sabuasal@codeaurora.org>
Differential Revision: https://reviews.llvm.org/D33337
llvm-svn: 303386
This patch adds both a ScopAnalysisManager and a ScopPassManager.
The ScopAnalysisManager is itself a Function-Analysis, and manages
analyses on Scops. The ScopPassManager takes care of building Scop pass
pipelines.
This patch is marked WIP because I've left two FIXMEs which I need to
think about some more. Both of these deal with invalidation:
Deferred invalidation is currently not implemented. Deferred
invalidation deals with analyses which cache references to other
analysis results. If these results are invalidated, invalidation needs
to be propagated into the caching analyses.
The ScopPassManager as implemented assumes that ScopPasses do not affect
other Scops in any way. There has been some discussion about this on
other patch threads, however it makes sense to reiterate this for this
specific patch.
I'm uploading this patch even though it's incomplete to encourage
discussion and give you an impression of how this is going to work.
Differential Revision: https://reviews.llvm.org/D33192
llvm-svn: 303062
- This breaks the previous assumption that Fortran Arrays are `GlobalValue`.
- The names of functions were getting unwieldy. So, I renamed the
Fortran related functions.
Differential Revision: https://reviews.llvm.org/D33075
llvm-svn: 303040
Summary: This is a proof of concept of how to port polly-passes to the new PassManager architecture. This approach works ootb for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now.
Reviewers: Meinersbur, grosser
Reviewed By: grosser
Subscribers: pollydev, sanjoy, nemanjai, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31459
llvm-svn: 302902
Previous to this patch, we used VirtualUse to determine the input
access of an llvm::Value in a statement. The input access is the
READ MemoryAccess that makes a value available in that statement,
which can either be a READ of a MemoryKind::Value or the
MemoryKind::PHI for a PHINode in the statement. DeLICM uses the input
access to heuristically find a candidate to map without searching all
possible values.
This might modify the behaviour in that previously PHI accesses were
not considered input accesses before. This was unintentially lost when
"VirtualUse" was extracted from the "Known Knowledge" patch.
llvm-svn: 302838
When removing a MemoryAccess, also remove it from maps pointing to it.
This was already done for InstructionToAccess, but not yet for
ValueReads, ValueWrites and PHIWrites as those were only used during
the ScopBuilder phase. Keeping them updated allows us to use them
later as well.
llvm-svn: 302836
Add the ability to tag certain memory accesses as those belonging to
Fortran arrays. We do this by pattern matching against known patterns
of Dragonegg's LLVM IR output from Fortran code.
Fortran arrays have metadata stored with them in a struct. This struct
is called the "Fortran array descriptor", and a reference to this is
stored in each MemoryAccess.
Differential Revision: https://reviews.llvm.org/D32639
llvm-svn: 302653
Summary:
In case two arrays share base pointers in the same invariant load equivalence
class, we canonicalize all memory accesses to the first of these arrays
(according to their order in the equivalence class).
This enables us to optimize kernels such as boost::ublas by ensuring that
different references to the C array are interpreted as accesses to the same
array. Before this change the runtime alias check for ublas would fail, as it
would assume models of the C array with differing (but identically valued) base
pointers would reference distinct regions of memory whereas the referenced
memory regions were indeed identical.
As part of this change we remove most of the MemoryAccess::get*BaseAddr
interface. We removed already all references to get*BaseAddr in previous
commits to ensure that no code relies on matching base pointers between
memory accesses and scop arrays -- except for three remaining uses where we
need the original base pointer. We document for these situations that
MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct
to the base pointer of the scop array referenced by this memory access.
Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert
Reviewed By: Meinersbur
Subscribers: etherzhhb
Tags: #polly
Differential Revision: https://reviews.llvm.org/D28518
llvm-svn: 302636
Extend the Knowledge class to store information about the contents
of array elements and which values are written. Two knowledges do
not conflict the known content is the same. The content information
if computed from writes to and loads from the array elements, and
represented by "ValInst": isl spaces that compare equal if the value
represented is the same.
Differential Revision: https://reviews.llvm.org/D31247
llvm-svn: 302339
Scop::init is used only during SCoP construction. Therefore ScopBuilder
seems the more appropriate place for it. We integrate it onto its only
caller ScopBuilder::buildScop where some other construction steps
already took place.
Differential Revision: https://reviews.llvm.org/D32908
llvm-svn: 302276
Since r294891, in MemoryAccess::computeBoundsOnAccessRelation(), we skip
manually bounding the access relation in case the parameter of the load
instruction is already a wrapped set. Later on we assume that the lower
bound on the set is always smaller or equal to the upper bound on the
set. Bug 32715 manages to construct a sign wrapped set, in which case
the assertion does not necessarily hold. Fix this by handling a sign
wrapped set similar to a normal wrapped set, that is skipping the
computation.
Contributed-by: Maximilian Falkenstein <falkensm@student.ethz.ch>
Reviewers: grosser
Subscribers: pollydev, llvm-commits
Tags: #Polly
Differential Revision: https://reviews.llvm.org/D32893
llvm-svn: 302231
LLVM-IR names are commonly available in debug builds, but often not in release
builds. Hence, using LLVM-IR names to identify statements or memory reference
results makes the behavior of Polly depend on the compile mode. This is
undesirable. Hence, we now just number the statements instead of using LLVM-IR
names to identify them (this issue has previously been brought up by Zino
Benaissa).
However, as LLVM-IR names help in making test cases more readable, we add an
option '-polly-use-llvm-names' to still use LLVM-IR names. This flag is by
default set in the polly tests to make test cases more readable.
This change reduces the time in ScopInfo from 32 seconds to 2 seconds for the
following test case provided by Eli Friedman <efriedma@codeaurora.org> (already
used in one of the previous commits):
struct X { int x; };
void a();
#define SIG (int x, X **y, X **z)
typedef void (*fn)SIG;
#define FN { for (int i = 0; i < x; ++i) { (*y)[i].x += (*z)[i].x; } a(); }
#define FN5 FN FN FN FN FN
#define FN25 FN5 FN5 FN5 FN5
#define FN125 FN25 FN25 FN25 FN25 FN25
#define FN250 FN125 FN125
#define FN1250 FN250 FN250 FN250 FN250 FN250
void x SIG { FN1250 }
For a larger benchmark I have on-hand (10000 loops), this reduces the time for
running -polly-scops from 5 minutes to 4 minutes, a reduction by 20%.
The reason for this large speedup is that our previous use of printAsOperand
had a quadratic cost, as for each printed and unnamed operand the full function
was scanned to find the instruction number that identifies the operand.
We do not need to adjust the way memory reference ids are constructured, as
they do not use LLVM values.
Reviewed by: efriedma
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32789
llvm-svn: 302072
Before this change a memory reference identifier had the form:
<STMT>_<ACCESSTYPE><ID>_<MEMREF>, e.g., Stmt_bb9_Write0_MemRef_tmp11
After this change, we use the format:
<STMT>_<ACCESSTYPE><ID>, e.g., Stmt_bb9_Write0
The name of the array that is accessed through a memory reference is not
necessary to uniquely identify a memory reference, but was only added to
provide additional information for debugging. We drop this information now
for the following two reasons:
1) This shortens the names and consequently improves readability
2) This removes a second location where we decide on the name of a scop array,
leaving us only with the location where the actual scop array is created.
Having after 2) only a single location to name scop arrays will allow us to
change the naming convention of scop arrays more easily, which we will do
in a future commit to reduce compilation time.
llvm-svn: 302004
When we introduced in r297375 support for hoisting loads that are known
to be dereferencable without any conditional guard, we forgot to keep the check
to verify that no other write into the very same location exists. This
change ensures now that dereferencable loads are allowed to access everything,
but can only be hoisted in case no conflicting write exists.
This resolves llvm.org/PR32778
Reported-by: Huihui Zhang <huihuiz@codeaurora.org>
llvm-svn: 301582
The AssumptionCache removal of r289756 has been reverted in
r290086/r290087. A different solution has been implemented in r291671
which keeps the AssumptionCache. We can therefore use it again in Polly.
This reverts r289791.
llvm-svn: 298089
In the previous default ScopInfo applied the profitability heuristic for
scalar accesses (-polly-unprofitable-scalar-accs=true) and the
-polly-prune-unprofitable was disabled by default
(-polly-enable-prune-unprofitable=false) as that pruning was already done.
This changes switches the defaults to -polly-unprofitable-scalar-accs=true
-polly-enable-prune-unprofitable=false such that the scalar access
heuristic check is done by the pass. This allows passes between ScopInfo
and PruneUnprofitable to optimize away scalar accesses.
Without enabling such intermediate passes, there is no change in
behaviour of profitability checks in a PassManagerBuilder built
pass chain, but it allows us to cover this configuration with the
buildbots.
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 298081
ScopInfo's normal profitability heuristic considers SCoPs where all
statements have scalar writes as not profitably optimizable and
invalidate the SCoP in that case. However, -polly-delicm and
-polly-simplify may be able to remove some of the scalar writes such
that the flag -polly-unprofitable-scalar-accs=false allows disabling
that part of the heuristic.
In cases where DeLICM (or other passes after ScopInfo) are not
successful in removing scalar writes, the SCoP is still not profitably
optimizable. The schedule optimizer would again try computing another
schedule, resulting in slower compilation.
The -polly-prune-unprofitable pass applies the profitability heuristic
again before the schedule optimizer Polly can still bail out even with
-polly-unprofitable-scalar-accs=false.
Differential Revision: https://reviews.llvm.org/D31033
llvm-svn: 298080
For experiments it is sometimes helpful to provide parameter bound information
to polly and to not use these parameter bounds for simplification.
Add a new option "-polly-ignore-parameter-bounds" which does precisely this.
llvm-svn: 298077
For experiments it is sometimes helpful to not take any inbounds assumptions.
Add a new option "-polly-ignore-inbounds" which does precisely this.
llvm-svn: 298073
In subsequent changes we will make Polly a little bit more lazy in adding
parameter dimensions to different sets. As a result, not all parameters will
always be part of the parameter space. This change ensures that we do not use
the '-1' returned when a parameter dimension cannot be found, but instead
just do not try to eliminate the anyhow non-existing dimension.
llvm-svn: 298054
Since several years, isl can perform most operations on sets with differing
parameter spaces, by expanding the parameter space on demand relying using
named isl ids to distinguish different parameter dimensions.
By not always expanding to full dimensionality the set remain smaller and can
likely be operated on faster. This change by itself did not yet result in
measurable performance benefits, but it is a step into the right direction
needed to ensure that subsequent changes indeed can work with lower-dimensional
sets and these sets do not get blown up by accident when later intersected with
the domain context.
llvm-svn: 298053
Introduce ScopStmt::getSurroundingLoop() to replace getFirstNonBoxedLoopFor.
getSurroundingLoop() returns the precomputed surrounding/first non-boxed
loop. Except in ScopDetection, the list of boxed loops is only used to
get the surrounding loop. getFirstNonBoxedLoopFor also requires LoopInfo
at every use which is not necessarily available everywhere where we may
want to use it.
Differential Revision: https://reviews.llvm.org/D30985
llvm-svn: 297899
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.
It removes write accesses that write a loaded value back to the location
it was loaded from. It is a typical artifact from DeLICM. Removing it
will get rid of bogus dependencies later in dependency analysis.
It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.
Differential Revision: https://reviews.llvm.org/D30820
llvm-svn: 297473
In case LLVM pointers are annotated with !dereferencable attributes/metadata
or LLVM can look at the allocation from which a pointer is derived, we can know
that dereferencing pointers is safe and can be done unconditionally. We use this
information to proof certain pointers as save to hoist and then hoist them
unconditionally.
llvm-svn: 297375
Our current scop modeling enters an infinite loop when trying to model code
that has unreachable instructions (e.g.,
test/ScopInfo/BoundChecks/single-loop.ll), as the number of basic blocks
returned by the LLVM Loop* does not include unreachable basic blocks that
branch off from the core loop body. This arises for example in the following
piece of code:
for (i = 0; i < N; i++) {
if (i > 1024)
abort(); <- this abort might be translated to an
unreachable
A[i] = ...
}
This patch adds these unreachable basic blocks in our per loop basic block
count to ensure that the schedule construction does not assume a loop has been
processed completely, despite certain unreachable basic blocks still remaining.
The infinite loop is only observable in combination with
https://reviews.llvm.org/D12676 or a similar patch.
llvm-svn: 297156
Multi-disjunct access maps can easily result in inbound assumptions which
explode in case of many memory accesses and many parameters. This change reduces
compilation time of some larger kernel from over 15 minutes to less than 16
seconds.
Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll
which has a memory access
[n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] }
which requires folding, but where only a single disjunct remains. We can still
model this test case even when only using limited memory folding.
For people only reading commit messages, here the comment that explains what
memory folding is:
To recover memory accesses with array size parameters in the subscript
expression we post-process the delinearization results.
We would normally recover from an access A[exp0(i) * N + exp1(i)] into an
array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid
delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the
range of exp1(i) - may be preferrable. Specifically, for cases where we
know exp1(i) is negative, we want to choose the latter expression.
As we commonly do not have any information about the range of exp1(i),
we do not choose one of the two options, but instead create a piecewise
access function that adds the (-1, N) offsets as soon as exp1(i) becomes
negative. For a 2D array such an access function is created by applying
the piecewise map:
[i,j] -> [i, j] : j >= 0
[i,j] -> [i-1, j+N] : j < 0
After this patch we generate only the first case, except for situations where
we can proove the first case to be invalid and can consequently select the
second without introducing disjuncts.
llvm-svn: 296679
Without this simplification for a loop nest:
void foo(long n1_a, long n1_b, long n1_c, long n1_d,
long p1_b, long p1_c, long p1_d,
float A_1[][p1_b][p1_c][p1_d]) {
for (long i = 0; i < n1_a; i++)
for (long j = 0; j < n1_b; j++)
for (long k = 0; k < n1_c; k++)
for (long l = 0; l < n1_d; l++)
A_1[i][j][k][l] += i + j + k + l;
}
the assumption:
n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and
p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d)
is taken rather than the simpler assumption:
p9_b >= n9_b and p9_c >= n9_c and p9_d >= n9_d.
The former is less strict, as it allows arbitrary values of p1_* in case, the
loop is not executed at all. However, in practice these precise constraints
explode when combined across different accesses and loops. For now it seems
to make more sense to take less precise, but more scalable constraints by
default. In case we find a practical example where more precise constraints
are needed, we can think about allowing such precise constraints in specific
situations where they help.
This change speeds up the new test case from taking very long (waited at least
a minute, but it probably takes a lot more) to below a second.
llvm-svn: 296456
Instead of counting the number of read-only accesses, we now count the number of
distinct read-only array references when checking if a run-time alias check
may be too complex. The run-time alias check is quadratic in the number of
base pointers, not the number of accesses.
Before this change we accidentally skipped SPEC's lbm test case.
llvm-svn: 295567
Trying to fold such kind of dimensions will result in a division by zero,
which crashes the compiler. As such arrays are likely to invalidate the
scop anyhow (but are not illegal in LLVM-IR), there is no point in trying
to optimize the array layout. Hence, we just avoid the folding of
constant dimensions of size zero.
llvm-svn: 295415