A partial write is a write where the domain of the values written is a subset of
the execution domain of the parent statement containing the write. Originally,
we directly checked this subset relation whereas it is indeed only important
that the subset relation holds for the parameter values that are known to be
valid in the execution context of the scop. We update our check to avoid the
unnecessary introduction of partial writes in situations where the write appears
to be partial without context information, but where context information allows
us to understand that a full write can be generated.
This change fixes (hides) a recent regression introduced in r303517, which broke
our AOSP builds. The part that is correctly fixed in this change is that we do
not any more unnecessarily generate a partial write. This is good performance
wise and, as we currently do not yet explicitly introduce partial writes in the
default configuration, this also hides possible bugs in the partial writes
implementation. The crashes that we have originally seen were caused by such
a bug, where partial writes were incorrectly generated in region statements. An
additional patch in a subsequent commit is needed to address this problem.
Reported-by: Reported-by: Eli Friedman <efriedma@codeaurora.org>
Differential Revision: https://reviews.llvm.org/D33759
llvm-svn: 304398
This change removes the requirement for explicit conversions from isl::boolean
to isl::bool, which resolves a compilation error on OSX.
Suggested-by: Siddharth Bhat <siddu.druid@gmail.com>
llvm-svn: 304288
Such instructions are generates on-demand by the CodeGenerator and thus
do not need representation in a statement.
Differential Revision: https://reviews.llvm.org/D33642
llvm-svn: 304151
Should not have 'fixed' the formatting issue, I did not have the most
recent version of `clang-format`.
This reverts commit 761b1268359e14e59142f253d77864a29d55c56c.
llvm-svn: 304148
- Fix formatting in `RegisterPasses.cpp`.
- `assert` tried to compare `isl::boolean` against `long`. Explicitly
construct `bool` from `isl::boolean`. This allows the implicit cast of
`bool` to `long.
llvm-svn: 304146
Certain affine memory accesses which we model today might contain products of
parameters which we might combined into a new parameter to be able to create an
affine expression that represents these memory accesses. Especially in the
context of OpenCL, this approach looses information as memory accesses such as
A[get_global_id(0) * N + get_global_id(1)] are assumed to be linear. We
correctly recover their multi-dimensional structure by assuming that parameters
that are the result of a function call at IR level likely are not parameters,
but indeed induction variables. The resulting access is now
A[get_global_id(0)][get_global_id(1)] for an array A[][N].
llvm-svn: 304075
Side-effect free function calls with only constant parameters can be easily
re-generated and consequently do not prevent us from modeling a SCEV. This
change allows array subscripts to reference function calls such as
'get_global_id()' as used in OpenCL.
We use the function name plus the constant operands to name the parameter. This
is possible as the function name is required and is not dropped in release
builds the same way names of llvm::Values are dropped. We also provide more
readable names for common OpenCL functions, to make it easy to understand the
polyhedral model we generate.
llvm-svn: 304074
Summary: This patch outputs all the list of instructions in BlockStmts.
Reviewers: Meinersbur, grosser, bollu
Subscribers: bollu, llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D33163
llvm-svn: 304062
It seems we are still spending too much time on rare inputs, which continue to
timeout the AOSP buildbot. Let's see if a further reduction is sufficient.
llvm-svn: 303807
Summary:
My goal is to make the newly added `AllowWholeFunctions` options more usable/powerful.
The changes to ScopBuilder.cpp are exclusively checks to prevent `Region.getExit()` from being dereferenced, since Top Level Regions (TLRs) don't have an exit block.
In ScopDetection's `isValidCFG`, I removed a check that disallowed ReturnInstructions to have return values. This might of course have been intentional, so I would welcome your feedback on this and maybe a small explanation why return values are forbidden. Maybe it can be done but needs more changes elsewhere?
The remaining changes in ScopDetection are simply to consider the AllowWholeFunctions option in more places, i.e. allow TLRs when it is set and once again avoid derefererncing `getExit()` if it doesn't exist.
Finally, in ScopHelper.cpp I extended `polly::isErrorBlock` to handle regions without exit blocks as well: The original check was if a given BasicBlock dominates all predecessors of the exit block. Therefore I do the same for TLRs by regarding all BasicBlocks terminating with a ReturnInst as predecessors of a "virtual" function exit block.
Patch by: Lukas Boehm
Reviewers: philip.pfaffe, grosser, Meinersbur
Reviewed By: grosser
Subscribers: pollydev, llvm-commits, bollu
Tags: #polly
Differential Revision: https://reviews.llvm.org/D33411
llvm-svn: 303790
Enable the use for partial writes for PHI write accesses with a switch.
This simply skips the test for whether a PHI write would be partial.
The analog test for partial value writes also protects for partial reads
which we do not support (yet). It is possible to test for partial reads
separately such that we could skip the partial write check as well. In
case this shows up to be useful, I can implement it as well.
Differential Revision: https://reviews.llvm.org/D33487
llvm-svn: 303762
Without this patch, the JSONImporter did not verify if the data it loads
were correct or not (Bug llvm.org/PR32543). I add some checks in the
JSONImporter class and some test cases.
Here are the checks (and test cases) I added :
JSONImporter::importContext
- The "context" key does not exist.
- The context was not parsed successfully by ISL.
- The isl_set has the wrong number of parameters.
- The isl_set is not a parameter set.
JSONImporter::importSchedule
- The "statements" key does not exist.
- There is not the right number of statement in the file.
- The "schedule" key does not exist.
- The schedule was not parsed successfully by ISL.
JSONImporter::importAccesses
- The "statements" key does not exist.
- There is not the right number of statement in the file.
- The "accesses" key does not exist.
- There is not the right number of memory accesses in the file.
- The "relation" key does not exist.
- The memory access was not parsed successfully by ISL.
JSONImporter::areArraysEqual
- The "type" key does not exist.
- The "sizes" key does not exist.
- The "name" key does not exist.
JSONImporter::importArrays
/!\ Do not check if there is an key name "arrays" because it is not
considered as an error.
All checks are already in place or implemented in
JSONImporter::areArraysEqual.
Contributed-by: Nicolas Bonfante <nicolas.bonfante@insa-lyon.fr>
Differential Revision: https://reviews.llvm.org/D32739
llvm-svn: 303759
Summary: To move CG to the new PM I outlined the various helper that were previously members of the CG class into free static functions. The CG class itself I moved into a header, which is required because we need to include it in `RegisterPasses` eventually.
Reviewers: grosser, Meinersbur
Reviewed By: grosser
Subscribers: pollydev, llvm-commits, sanjoy
Tags: #polly
Differential Revision: https://reviews.llvm.org/D33423
llvm-svn: 303624
Summary: This patch ports IslAst to the new PM. The change is mostly straightforward. The only major modification required is making IslAst move-only, to correctly manage the isl resources it owns.
Reviewers: grosser, Meinersbur
Reviewed By: grosser
Subscribers: nemanjai, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D33422
llvm-svn: 303622
Summary: This patch ports DependenceInfo to the new ScopPassManager. Printing is implemented as a seperate printer pass.
Reviewers: grosser, Meinersbur
Reviewed By: grosser
Subscribers: llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D33421
llvm-svn: 303621
This speeds up scop modeling for scops with many redundent existentially
quantified constraints. For the attached test case, this change reduces
scop modeling time from minutes (hours?) to 0.15 seconds.
This change resolves a compilation timeout on the AOSP build.
Thanks Eli for reporting _and_ reducing the test case!
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 303600
The SCEVs of loops surrounding the escape users of a merge blocks are
forgotten, so that loop trip counts based on old values can be revoked.
This fixes llvm.org//PR32536
Contributed-by: Baranidharan Mohan <mbdharan@gmail.com>
Differential Revision: https://reviews.llvm.org/D33195
llvm-svn: 303561
Allow the BlockGenerator to generate memory writes that are not defined
over the complete statement domain, but only over a subset of it. It
generates a condition that evaluates to 1 if executing the subdomain,
and only then execute the access.
Only write accesses are supported. Read accesses would require a PHINode
which has a value if the access is not executed.
Partial write makes DeLICM able to apply mappings that are not defined
over the entire domain (for instance, a branch that leaves a loop with
a PHINode in its header; a MemoryKind::PHI write when leaving is never
read by its PHI read).
Differential Revision: https://reviews.llvm.org/D33255
llvm-svn: 303517
This commit exports the majority of the isl functions to the isl C++ interface.
The official isl C++ bindings still require discussions to define the set of
functions that are officially supported. As a result, the officially exported
functionality will be rather limited until these discussions conclude and a
non-trivial set of isl functions is officially supported through the isl C++
bindings. Starting from this commit we ship with Polly an extended version of
the official isl C++ bindings to ensure sufficient functionality is available
such that LLVM developers can make efficient use of isl through C++. The
practical experience Polly gathers with its bindings will then be used to
gradually upstream patches to isl to extend the official bindings.
llvm-svn: 303506
This reduces the diff to the official isl C++ bindings and solves a correctness
issue with isl::booleans, where isl_bool_error results were accidentally
converted to isl::boolean::true.
llvm-svn: 303505
Instead of relying on these functions to be part of the isl C++ bindings, we
just define this functionality independently. This allows us to use isl C++
bindings that do not contain LLVM specific functionality.
llvm-svn: 303503
- We use the outermost dimension of arrays since we need this
information to generate GPU transfers.
- In general, if we do not know the outermost dimension of the array
(because the indexing expression is non-affine, for example) then we
simply cannot generate transfer code.
- However, for Fortran arrays, we can use the Fortran array
representation which stores the dimensions of all arrays.
- This patch uses the Fortran array representation to generate code that
computes the outermost dimension size.
Differential Revision: https://reviews.llvm.org/D32967
llvm-svn: 303429
In r302231 we mistakenly use bitwise or (|) instead of logical
or (||). This patch fixes that.
Contributed-by: Sameer AbuAsal <sabuasal@codeaurora.org>
Differential Revision: https://reviews.llvm.org/D33337
llvm-svn: 303386
Summary:
Implements PR889
Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:
https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing
This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal. However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue. I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.
I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.
Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv
Reviewed By: chandlerc
Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D31261
llvm-svn: 303362
Summary:
- Rename global / local naming convention that did not make much sense
to Visible / Invisible, where the visible refers to whether the ALLOCATE
call to the Fortran array is present in the current module or not.
- This match now works on both cross fortran module globals and on
parameters to functions since neither of them are necessarily allocated
at the point of their usage.
- Add testcase that matches against both a load and a store against
function parameters.
Differential Revision: https://reviews.llvm.org/D33190
llvm-svn: 303356
This patch adds both a ScopAnalysisManager and a ScopPassManager.
The ScopAnalysisManager is itself a Function-Analysis, and manages
analyses on Scops. The ScopPassManager takes care of building Scop pass
pipelines.
This patch is marked WIP because I've left two FIXMEs which I need to
think about some more. Both of these deal with invalidation:
Deferred invalidation is currently not implemented. Deferred
invalidation deals with analyses which cache references to other
analysis results. If these results are invalidated, invalidation needs
to be propagated into the caching analyses.
The ScopPassManager as implemented assumes that ScopPasses do not affect
other Scops in any way. There has been some discussion about this on
other patch threads, however it makes sense to reiterate this for this
specific patch.
I'm uploading this patch even though it's incomplete to encourage
discussion and give you an impression of how this is going to work.
Differential Revision: https://reviews.llvm.org/D33192
llvm-svn: 303062
- This breaks the previous assumption that Fortran Arrays are `GlobalValue`.
- The names of functions were getting unwieldy. So, I renamed the
Fortran related functions.
Differential Revision: https://reviews.llvm.org/D33075
llvm-svn: 303040
- auto + decltype + template use was not inferrable in
`Transform/Simplify.cpp accessesInOrder`.
- changed code to explicitly construct required vector instead of using
higher order iterator helpers.
- Failing compiler spec:
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
llvm-svn: 303039
At the time of code generation, an instruction with an llvm intrinsic is ignored
in copyBB. However, if the value of the instruction is used later in the
program, the value needs to be synthesized. However, this is causing some issues
with the instructions being generated in a hoisted basic block.
Removing llvm.expect from the list of ignored intrinsics fixes this bug.
This resolves http://llvm.org/PR32324.
Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32992
llvm-svn: 303006
Removal of overwritten writes currently encompasses all the cases
of the identical write removal.
There is an observable behavioral change in that the last, instead
of the first, MemoryAccess is kept. This should not affect the
generated code, however.
Differential Revision: https://reviews.llvm.org/D33143
llvm-svn: 302987
Remove memory writes that are overwritten by later writes. This works
for StoreInsts:
store double 21.0, double* %A
store double 42.0, double* %A
scalar writes at the end of a statement and mixes of these.
Multiple writes can be the result of DeLICM, which might map multiple
writes to the same location when it knows that these do no conflict
(for instance because they write the same value). Such writes
interfere with pattern-matched optimization such as gemm and may not
get removed by other LLVM passes after code generation.
Differential Revision: https://reviews.llvm.org/D33142
llvm-svn: 302986
Summary: This is a proof of concept of how to port polly-passes to the new PassManager architecture. This approach works ootb for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now.
Reviewers: Meinersbur, grosser
Reviewed By: grosser
Subscribers: pollydev, sanjoy, nemanjai, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31459
llvm-svn: 302902
Today Polly generates induction variable in this way:
polly.indvar = phi 0, polly.indvar.next
...
polly.indvar.next = polly.indvar + stide
polly.loop_cond = predicate polly.indvar, (UB - stride)
Instead of:
polly.indvar = phi 0, polly.indvar.next
...
polly.indvar.next = polly.indvar + stide
polly.loop_cond = predicate polly.indvar.next, UB
The way Polly generate induction variable cause some problem in the indvar simplify pass.
This patch make polly generate the later form, by assuming the induction variable never overflow
Differential Revision: https://reviews.llvm.org/D33089
llvm-svn: 302866
As with the scalar operand of the initial StoreInst, also use input
accesses when searching for new opportunities after mapping a
PHI write.
The same rational applies here: After LICM has been applied, the
promoted value will either be an instruction in the same statement
(in which case we fall back to try every scalar access of the
statement), or in another statement such that there will be such
an input access. In the latter case other scalars cannot have
originated from the same register promotion, at least not by LICM.
This mostly helps to decrease compilation time and makes debugging
easier by not pursuing unpromising routes. In some circumstances,
it may change the compiler's output.
llvm-svn: 302839
Previous to this patch, we used VirtualUse to determine the input
access of an llvm::Value in a statement. The input access is the
READ MemoryAccess that makes a value available in that statement,
which can either be a READ of a MemoryKind::Value or the
MemoryKind::PHI for a PHINode in the statement. DeLICM uses the input
access to heuristically find a candidate to map without searching all
possible values.
This might modify the behaviour in that previously PHI accesses were
not considered input accesses before. This was unintentially lost when
"VirtualUse" was extracted from the "Known Knowledge" patch.
llvm-svn: 302838
When removing a MemoryAccess, also remove it from maps pointing to it.
This was already done for InstructionToAccess, but not yet for
ValueReads, ValueWrites and PHIWrites as those were only used during
the ScopBuilder phase. Keeping them updated allows us to use them
later as well.
llvm-svn: 302836
After DeLICM, it is possible to have two writes of the same value to
the same location in the same statement when it determined that those
writes do not conflict (write the same value).
Teach -polly-simplify to remove one of the writes. It interferes with
the pattern matching of matrix-multiplication kernels and also seem
to not be optimized away by LLVM.
The algorthm is simple, has O(n^2) behaviour (n = max number of
MemoryAccesses in a statement) and only matches the most obvious cases,
but seem to be enough to pattern-match Boost ublas gemm.
Not handled cases include:
- StoreInst instructions (a.k.a. explicit writes), since the value might
be loaded or overwritten between the two stores.
- PHINode, especially LCSSA, when the PHI value matches with on other's.
- Partial writes (in preparation)
llvm-svn: 302805
Some isl functions can simplify their __isl_keep arguments. The
argument object after the call uses different contraints to represent
the same set. Different contraints can result in different outputs
when printed to a string.
In assert builds additional isl functions are called (in assert() or
mentioned, these can change the internal representation of its read-only
arguments such that printed strings are different in debug and non-debug
builds.
What happened here is that a call to isl_set_is_equal inside an assert
in getScatterFor normalizes one of its arguments such that one redundant
constraint is removed. The redundant constraint therefore does not appear
in the string representing the domain, which FileCheck notices as a
regression test failure compared to a build with assertions disabled.
This fix removes the redundant contraints the domain from the start such
that the redundant contraint is removed in assert and non-assert builds.
Isl adds a flag to such sets such that the removal of redundancies is
not done multiple times (here: by isl_set_is_equal).
Thanks to Tobias Grosser for reporting and hinting to the cause.
llvm-svn: 302711
Add the ability to tag certain memory accesses as those belonging to
Fortran arrays. We do this by pattern matching against known patterns
of Dragonegg's LLVM IR output from Fortran code.
Fortran arrays have metadata stored with them in a struct. This struct
is called the "Fortran array descriptor", and a reference to this is
stored in each MemoryAccess.
Differential Revision: https://reviews.llvm.org/D32639
llvm-svn: 302653
Summary:
In case two arrays share base pointers in the same invariant load equivalence
class, we canonicalize all memory accesses to the first of these arrays
(according to their order in the equivalence class).
This enables us to optimize kernels such as boost::ublas by ensuring that
different references to the C array are interpreted as accesses to the same
array. Before this change the runtime alias check for ublas would fail, as it
would assume models of the C array with differing (but identically valued) base
pointers would reference distinct regions of memory whereas the referenced
memory regions were indeed identical.
As part of this change we remove most of the MemoryAccess::get*BaseAddr
interface. We removed already all references to get*BaseAddr in previous
commits to ensure that no code relies on matching base pointers between
memory accesses and scop arrays -- except for three remaining uses where we
need the original base pointer. We document for these situations that
MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct
to the base pointer of the scop array referenced by this memory access.
Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert
Reviewed By: Meinersbur
Subscribers: etherzhhb
Tags: #polly
Differential Revision: https://reviews.llvm.org/D28518
llvm-svn: 302636
Summary: PPCGCodeGeneration now attaches the size of the kernel launch parameters at the end of the parameter list. For the existing CUDA Runtime, this gets ignored, but the OpenCL Runtime knows to check for kernel-argument size at the end of the parameter list. (The resulting parameters list is twice as long. This has been accounted for in the corresponding test cases).
Reviewers: grosser, Meinersbur, bollu
Reviewed By: bollu
Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32961
llvm-svn: 302515
Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).
Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).
Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay
Reviewed By: grosser, Meinersbur
Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32431
llvm-svn: 302379
Extend the Knowledge class to store information about the contents
of array elements and which values are written. Two knowledges do
not conflict the known content is the same. The content information
if computed from writes to and loads from the array elements, and
represented by "ValInst": isl spaces that compare equal if the value
represented is the same.
Differential Revision: https://reviews.llvm.org/D31247
llvm-svn: 302339
Allow using a system's install jsoncpp library instead of the bundled
one with the setting POLLY_BUNDLED_JSONCPP=OFF.
This fixes llvm.org/PR32929
Differential Revision: https://reviews.llvm.org/D32922
llvm-svn: 302336
Scop::init is used only during SCoP construction. Therefore ScopBuilder
seems the more appropriate place for it. We integrate it onto its only
caller ScopBuilder::buildScop where some other construction steps
already took place.
Differential Revision: https://reviews.llvm.org/D32908
llvm-svn: 302276
SCoPs with unfeasible runtime context are thrown away and therefore
do not need their uses verified.
The added test case requires a complexity limit to exceed.
Normally, error statements are removed from the SCoP and for that
reason are skipped during the verification. If there is a unfeasible
runtime context (here: because of the complexity limit being reached),
the removal of error statements and other SCoP construction steps are
skipped to not waste time. Error statements are not modeled in SCoPs
and therefore have no requirements on whether the scalars used in
them are available.
llvm-svn: 302234
Since r294891, in MemoryAccess::computeBoundsOnAccessRelation(), we skip
manually bounding the access relation in case the parameter of the load
instruction is already a wrapped set. Later on we assume that the lower
bound on the set is always smaller or equal to the upper bound on the
set. Bug 32715 manages to construct a sign wrapped set, in which case
the assertion does not necessarily hold. Fix this by handling a sign
wrapped set similar to a normal wrapped set, that is skipping the
computation.
Contributed-by: Maximilian Falkenstein <falkensm@student.ethz.ch>
Reviewers: grosser
Subscribers: pollydev, llvm-commits
Tags: #Polly
Differential Revision: https://reviews.llvm.org/D32893
llvm-svn: 302231
This reverts commit 17a84e414adb51ee375d14836d4c2a817b191933.
Patches should have been submitted in the order of:
1. D32852
2. D32854
3. D32431
I mistakenly pushed D32431(3) first. Reverting to push in the correct
order.
llvm-svn: 302217
Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).
Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).
Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay
Reviewed By: grosser, Meinersbur
Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32431
llvm-svn: 302215
If a ScopStmt references a (scalar) value, there are multiple
possibilities where this value can come. The decision about what kind of
use it is must be handled consistently at different places, which can be
error-prone. VirtualUse is meant to centralize the handling of the
different types of value uses.
This patch makes ScopBuilder and CodeGeneration use VirtualUse. This
already helps to show inconsistencies with the value handling. In order
to keep this patch NFC, exceptions to the general rules are added.
These might be fixed later if they turn to problems. Overall, this
should result in fewer post-codegen IR-verification errors, but instead
assertion failures in `getNewValue` that are closer to the actual error.
Differential Revision: https://reviews.llvm.org/D32667
llvm-svn: 302157
For certain test cases we spent over 50% of the scop detection time in
checking if a load is likely invariant. We can avoid most of these checks by
testing early on if a load is expected to be invariant. Doing this reduces
scop-detection time on a large benchmark from 52 seconds to just 25 seconds.
No functional change is expected.
llvm-svn: 302134
LLVM-IR names are commonly available in debug builds, but often not in release
builds. Hence, using LLVM-IR names to identify statements or memory reference
results makes the behavior of Polly depend on the compile mode. This is
undesirable. Hence, we now just number the statements instead of using LLVM-IR
names to identify them (this issue has previously been brought up by Zino
Benaissa).
However, as LLVM-IR names help in making test cases more readable, we add an
option '-polly-use-llvm-names' to still use LLVM-IR names. This flag is by
default set in the polly tests to make test cases more readable.
This change reduces the time in ScopInfo from 32 seconds to 2 seconds for the
following test case provided by Eli Friedman <efriedma@codeaurora.org> (already
used in one of the previous commits):
struct X { int x; };
void a();
#define SIG (int x, X **y, X **z)
typedef void (*fn)SIG;
#define FN { for (int i = 0; i < x; ++i) { (*y)[i].x += (*z)[i].x; } a(); }
#define FN5 FN FN FN FN FN
#define FN25 FN5 FN5 FN5 FN5
#define FN125 FN25 FN25 FN25 FN25 FN25
#define FN250 FN125 FN125
#define FN1250 FN250 FN250 FN250 FN250 FN250
void x SIG { FN1250 }
For a larger benchmark I have on-hand (10000 loops), this reduces the time for
running -polly-scops from 5 minutes to 4 minutes, a reduction by 20%.
The reason for this large speedup is that our previous use of printAsOperand
had a quadratic cost, as for each printed and unnamed operand the full function
was scanned to find the instruction number that identifies the operand.
We do not need to adjust the way memory reference ids are constructured, as
they do not use LLVM values.
Reviewed by: efriedma
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32789
llvm-svn: 302072
Before this change a memory reference identifier had the form:
<STMT>_<ACCESSTYPE><ID>_<MEMREF>, e.g., Stmt_bb9_Write0_MemRef_tmp11
After this change, we use the format:
<STMT>_<ACCESSTYPE><ID>, e.g., Stmt_bb9_Write0
The name of the array that is accessed through a memory reference is not
necessary to uniquely identify a memory reference, but was only added to
provide additional information for debugging. We drop this information now
for the following two reasons:
1) This shortens the names and consequently improves readability
2) This removes a second location where we decide on the name of a scop array,
leaving us only with the location where the actual scop array is created.
Having after 2) only a single location to name scop arrays will allow us to
change the naming convention of scop arrays more easily, which we will do
in a future commit to reduce compilation time.
llvm-svn: 302004
As has been reported in the previous commit, codegen verification can result in
quadratic compile time increases for large functions with many scops. This is
certainly not something we would like to have in the Polly default
configuration. Hence, we disable codegen verification by default -- also to see
if this resolves some of the compilation timeouts we currently see on the AOSP
buildbots. We still leave this feature in Polly as it has shown _very_ useful
for debugging. In fact, we may want to have a discussion if we can bring this
feature back in a way that does not impact compilation time so much.
Thanks to Eli Friedman <efriedma@codeaurora.org> for reporting this issue and
for providing the test case in the previous commit (where I forgot to
acknowledge him).
llvm-svn: 301670
Before this change, we always tried to verify the function and printed
verification errors, but just did not abort in case -polly-codegen-verify=false
was set and verification failed. As verification can become very cosly -- for
large functions with many scops we may verify the very same function very often
-- this can affect compile time very negatively. Hence, we respect the
-polly-codegen-verify flag with this check, ensuring that no verification is run
if -polly-codegen-verify=false.
This reduces code generation time from 26 seconds to 4 seconds on the test
case below with -polly-codegen-verify=false:
struct X { int x; };
void a();
#define SIG (int x, X **y, X **z)
typedef void (*fn)SIG;
#define FN { for (int i = 0; i < x; ++i) { (*y)[i].x += (*z)[i].x; } a(); }
#define FN5 FN FN FN FN FN
#define FN25 FN5 FN5 FN5 FN5
#define FN125 FN25 FN25 FN25 FN25 FN25
#define FN250 FN125 FN125
#define FN1250 FN250 FN250 FN250 FN250 FN250
void x SIG { FN1250 }
llvm-svn: 301669
generation.
This needs changes to GPURuntime to expose synchronization between host
and device.
1. Needs better function naming, I want a better name than
"getOrCreateManagedDeviceArray"
2. DeviceAllocations is used by both the managed memory and the
non-managed memory path. This exploits the fact that the two code paths
are never run together. I'm not sure if this is the best design decision
Reviewed by: PhilippSchaad
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32215
llvm-svn: 301640
When we introduced in r297375 support for hoisting loads that are known
to be dereferencable without any conditional guard, we forgot to keep the check
to verify that no other write into the very same location exists. This
change ensures now that dereferencable loads are allowed to access everything,
but can only be hoisted in case no conflicting write exists.
This resolves llvm.org/PR32778
Reported-by: Huihui Zhang <huihuiz@codeaurora.org>
llvm-svn: 301582
Polly comes in two library flavors: One loadable module to use the
LLVM framework -load mechanism, and another one that host applications
can link to. These have very different requirements for Polly's
own dependencies.
The loadable module assumes that all its LLVM dependencies are already
available in the address space of the host application, and is not allowed
to bring in its own copy of any LLVM library (including the NVPTX
backend in case of Polly-ACC).
The non-module library is intended to be linked to using
target_link_libraries. CMake would then resolve all of its dependencies,
including NVPTX and ensure that only a single instance of each library
will be used.
Differential Revision: https://reviews.llvm.org/D32442
llvm-svn: 301558
Do not conflict if a write writes the same value as already known.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32026
llvm-svn: 301460
Do not conflict if the value of Existing and Proposed are the same.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32025
llvm-svn: 301301
Added a small change to the way pointer arguments are set in the kernel
code generation. The way the pointer is retrieved now, specifically requests
global address space to be annotated. This is necessary, if the IR should be
run through NVPTX to generate OpenCL compatible PTX.
The changes do not affect the PTX Strings generated for the CUDA target
(nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl).
Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends.
Contributed-by: Philipp Schaad
Reviewers: Meinersbur, grosser, bollu
Reviewed By: grosser, bollu
Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32215
llvm-svn: 301299
Earlier, the call to buildFlow was:
WAR = buildFlow(Write, Read, MustWrite, Schedule).
This meant that Read could block another Read, since must-sources can
block each other.
Fixed the call to buildFlow to correctly compute Read. The resulting
code needs to do some ISL juggling to get the output we want.
Bug report: https://bugs.llvm.org/show_bug.cgi?id=32623
Reviewers: Meinersbur
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32011
llvm-svn: 301266
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.
Differential Revision: https://reviews.llvm.org/D32027
llvm-svn: 300874
After the isl C++ binding generator is now close to being upstreamed to isl, we
synchronize the latest changes to Polly. These are mostly formatting changes
plus a small interface change for the foreach callback function and some naming
changes in isl::boolean.
llvm-svn: 300398
This commit switches Polly over to the isl::obj::foreach_* implementation, which
is part of the new isl bindings and follows the foreach pattern established in
Polly by Michael Kruse.
The original isl C function:
isl_stat isl_union_set_foreach_set(__isl_keep isl_union_set *uset,
isl_stat (*fn)(__isl_take isl_set *set, void *user), void *user);
which required the user to define a static callback function to which all
interesting parameters are passed via a 'void *' user-pointer, is on the
C++ side available as a function that takes a std::function<>, which can
carry any additional arguments without the need for a user pointer:
stat UnionSet::foreach_set(const std::function<stat(set)> &fn) const;
The following code illustrates the use of the new C++ interface:
auto Lambda = [=, &Result](isl::set Set) -> isl::stat {
auto Shifted = shiftDimension(Set, Pos, Amount);
Result = Result.add(Shifted);
return isl::stat::ok;
}
UnionSet.foreach_set(Lambda);
Polly had some specialized foreach functions which did not require the lambdas
to return a status flag. We remove these functions in this commit to move Polly
completely over to the new isl interface. We may in the future discuss if
functors without return values can be supported easily.
Another extension proposed by Michael Kruse is the use of C++ iterators to allow
the use of normal for loops to iterate over these sets. Such an extension would
allow us to further simplify the code.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D30620
llvm-svn: 300323
Dimensions of band nodes can be implicitly permuted by the algorithm applied
during the schedule generation.
For example, in case of the following matrix-matrix multiplication,
for (i = 0; i < 1024; i++)
for (k = 0; k < 1024; k++)
for (j = 0; j < 1024; j++)
C[i][j] += A[i][k] * B[k][j];
it can produce the following schedule tree
domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and
0 <= i2 <= 1023 }"
child:
schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] },
{ Stmt_for_body6[i0, i1, i2] -> [(i1)] },
{ Stmt_for_body6[i0, i1, i2] -> [(i2)] }]"
permutable: 1
coincident: [ 1, 1, 0 ]
The current implementation of the pattern matching optimizations relies on the
initial ordering of dimensions. Otherwise, it can produce the miscompilation
(e.g., [1]).
This patch helps to restore the initial ordering of dimensions by recreating
the band node when the corresponding conditions are satisfied.
Refs.:
[1] - https://bugs.llvm.org/show_bug.cgi?id=32500
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D31741
llvm-svn: 299662
Because Polly exposes parameters that directly influence tile size
calculations, one can setup situations like divide-by-zero.
Check against a possible divide-by-zero in getMacroKernelParams
and return early.
Also assert at the end of getMacroKernelParams that the block sizes
computed for matrices are positive (>= 1).
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31708
llvm-svn: 299633
The current StackColoring algorithm does not correctly handle the
situation when some, but not all paths from a BB to the entry node
cross a llvm.lifetime.start. According to an interpretation of the
language reference at
http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic
this might be correct, but it would cost too much effort to handle
in StackColoring.
To be on the safe side, remove all lifetime markers even in the original
code version (they have never been copied to the optimized version)
to ensure that no path to the entry block will cross a
llvm.lifetime.start.
The same principle applies to paths the a function return and the
llvm.lifetime.end marker, so we remove them as well.
This fixes llvm.org/PR32251.
Also see the discussion at
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111551.html
llvm-svn: 299585
= Change of WAR, WAW generation: =
- `buildFlow(Sink, MustSource, MaySource, Sink)` treates any flow of the form
`sink <- may source <- must source` as a *may* dependence.
- we used to call:
```lang=cpp, name=old-flow-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```
- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.
- Now, we call WAR and WAW correctly.
== Correct WAW: ==
```lang=cpp, name=new-waw-call.cpp
Flow = buildFlow(Write, MustWrite, MayWrite, Schedule);
WAW = isl_union_flow_get_may_dependence(Flow);
isl_union_flow_free(Flow);
```
== Correct WAR: ==
```lang=cpp, name=new-war-call.cpp
Flow = buildFlow(Write, Read, MustaWrite, Schedule);
WAR = isl_union_flow_get_must_dependence(Flow);
isl_union_flow_free(Flow);
```
- We want the "shortest" WAR possible (exact dependences).
- We mark all the *must-writes* as may-source, reads as must-souce.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a *must-write*
before reaching a sink.
- Note that we only block ealier writes with *must-writes*. This is
intuitively correct, as we do not want may-writes to block
must-writes.
- Leaves us with direct (R -> W).
- This affects reduction generation since RED is built using WAW and WAR.
= New StrictWAW for Reductions: =
- We used to call:
```lang=cpp,name=old-waw-war-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```
- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any interfering reductions.
= Explanation: Why the new WAR dependences in tests are correct: =
- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.
== Code: ==
```lang=llvm, name=new-war-dependence.ll
; void manyreductions(long *A) {
; for (long i = 0; i < 1024; i++)
; for (long j = 0; j < 1024; j++)
; S0: *A += 42;
;
; for (long i = 0; i < 1024; i++)
; for (long j = 0; j < 1024; j++)
; S1: *A += 42;
;
```
=== WAR dependence: ===
{ S0[1023, 1023] -> S1[0, 0] }
- Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences:
```lang=cpp, name=dependence-incorrect, counterexample
S0[1023, 1023]:
*-- tmp = *A (load0)--*
WAR 2 add = tmp + 42 |
*-> *A = add (store0) |
WAR 1
S1[0, 0]: |
tmp = *A (load1) |
add = tmp + 42 |
A = add (store1)<-*
```
- One may assume that WAR2 *hides* WAR1 (since store0 happens before
store1). However, within a statement, Polly has no idea about the
ordering of loads and stores.
- Hence, according to Polly, the code may have looked like this:
```lang=cpp, name=dependence-correct
S0[1023, 1023]:
A = add (store0)
tmp = A (load0) ---*
add = A + 42 |
WAR 1
S1[0, 0]: |
tmp = A (load1) |
add = A + 42 |
A = add (store1) <-*
```
- So, Polly generates (correct) WAR dependences. It does not make sense
to remove these dependences, since they are correct with respect to
Polly's model.
Reviewers: grosser, Meinersbur
tags: #polly
Differential revision: https://reviews.llvm.org/D31386
llvm-svn: 299429
Summary:
A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly.
This shouldn't introduce any function changes, although the API technically allowed to obtain two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however.
Reviewers: grosser, Meinersbur
Reviewed By: grosser
Subscribers: nemanjai, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31653
llvm-svn: 299423
Instead of creating the declaration ourselves, we obtain it directly from the
LLVM intrinsic definitions. This addresses a post-review comment for r299359.
Suggested-by: Hongzing Zheng <etherzhhb@gmail.com>
llvm-svn: 299360
Add support for -polly-codegen-perf-monitoring. When performance monitoring
is enabled, we emit performance monitoring code during code generation that
prints after program exit statistics about the total number of cycles executed
as well as the number of cycles spent in scops. This gives an estimate on how
useful polyhedral optimizations might be for a given program.
Example output:
Polly runtime information
-------------------------
Total: 783110081637
Scops: 663718949365
In the future, we might also add functionality to measure how much time is spent
in optimized scops and how many cycles are spent in the fallback code.
Reviewers: bollu,sebpop
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31599
llvm-svn: 299359
No-alias metadata grows quadratic in the size of arrays involved, which can
become very costly for large programs. This commit bounds the number of arrays
for which we construct no-alias information to ten. This is conservatively
correct, as we just provide less information to LLVM and speeds up the compile
time of one of my internal test cases from 'does-not-terminate' to
'finishes-in-less-than-a-minute'. In the future we might try to be more clever
here, but this change should provide a good baseline.
llvm-svn: 299352
Provide an common way for testing if a statement contains something
for region and block statements. First user is
RegionGenerator::addOperandToPHI.
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 298617
Add shiftDim and convertZoneToTimepoints overloads for isl maps.
Add distributeDomain, liftDomains and applyDomainRange functions.
These are going to be used in https://reviews.llvm.org/D31247
(Add known array contents to Knowledge)
llvm-svn: 298543
The isl C++ bindings now has implicit conversions from isl::set to
isl::union_set. Therefore the additional overload accepting isl::set
is not required anymore.
llvm-svn: 298529
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30606
llvm-svn: 298510
Map the new load to the base pointer of the invariant load hoisted load
to be able to find the alias information for it.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30605
llvm-svn: 298507
"Write" is an overloaded term. In collectInfo() till buildFlow(), it is
used to mean "must writes". However, within the memory based analysis,
it is used to mean "both may and must writes". Renaming the Write
variable helps clarify this difference.
Reviewers: grosser
Tags: #polly
Differential Revision: https://reviews.llvm.org/D31181
llvm-svn: 298361
When not adding constraints on parameters using -polly-ignore-parameter-bounds,
the context may not necessarily list all parameter dimensions. To support code
generation in this situation, we now always iterate over the actual parameter
list, rather than relying on the context to list all parameter dimensions.
llvm-svn: 298197
After this change, enabling -polly-codegen-add-debug-printing in combination
with -polly-codegen-generate-expressions allows us to instrument the compiled
binaries to not only print the values stored and loaded to a given memory
access, but also to print the accessed location with array name and
per-dimension offset:
MemRef_A[3][2]
Store to 6299784: 5.000000
MemRef_A[3][3]
Load from 6299788: 0.000000
MemRef_A[3][3]
Store to 6299788: 6.000000
This can be very helpful for debugging.
llvm-svn: 298194
In commit r219005 lifetime markers have been introduced to mark the lifetime of
the OpenMP context data structure. However, their use seems incorrect and
recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was
not at all related to r298053. r298053 only caused a change in the loop order,
as this change resulted in a different isl internal representation which caused
the scheduler to derive a different schedule. This change then caused the IR to
change, which apparently created a pattern in which LLVM exploites the lifetime
markers. It seems we are using the OpenMP context outside of the lifetime
markers. Even though CrystalMk could probably be fixed by expanding the scope of
the lifetime markers, it is not clear what happens in case the OpenMP function
call is in a loop which will cause a sequence of starting and ending lifetimes.
As it is unlikely that the lifetime markers give any performance benefit, we
just drop them to remove complexity.
llvm-svn: 298192
The AssumptionCache removal of r289756 has been reverted in
r290086/r290087. A different solution has been implemented in r291671
which keeps the AssumptionCache. We can therefore use it again in Polly.
This reverts r289791.
llvm-svn: 298089
In the previous default ScopInfo applied the profitability heuristic for
scalar accesses (-polly-unprofitable-scalar-accs=true) and the
-polly-prune-unprofitable was disabled by default
(-polly-enable-prune-unprofitable=false) as that pruning was already done.
This changes switches the defaults to -polly-unprofitable-scalar-accs=true
-polly-enable-prune-unprofitable=false such that the scalar access
heuristic check is done by the pass. This allows passes between ScopInfo
and PruneUnprofitable to optimize away scalar accesses.
Without enabling such intermediate passes, there is no change in
behaviour of profitability checks in a PassManagerBuilder built
pass chain, but it allows us to cover this configuration with the
buildbots.
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 298081
ScopInfo's normal profitability heuristic considers SCoPs where all
statements have scalar writes as not profitably optimizable and
invalidate the SCoP in that case. However, -polly-delicm and
-polly-simplify may be able to remove some of the scalar writes such
that the flag -polly-unprofitable-scalar-accs=false allows disabling
that part of the heuristic.
In cases where DeLICM (or other passes after ScopInfo) are not
successful in removing scalar writes, the SCoP is still not profitably
optimizable. The schedule optimizer would again try computing another
schedule, resulting in slower compilation.
The -polly-prune-unprofitable pass applies the profitability heuristic
again before the schedule optimizer Polly can still bail out even with
-polly-unprofitable-scalar-accs=false.
Differential Revision: https://reviews.llvm.org/D31033
llvm-svn: 298080
For experiments it is sometimes helpful to provide parameter bound information
to polly and to not use these parameter bounds for simplification.
Add a new option "-polly-ignore-parameter-bounds" which does precisely this.
llvm-svn: 298077
Dependences::calculateDependences.
This ensures that we handle may-writes correctly when building
dependence information. Also add a test case checking correctness of
may-write information. Not handling it before was an oversight.
Differential Revision: https://reviews.llvm.org/D31075
llvm-svn: 298074
For experiments it is sometimes helpful to not take any inbounds assumptions.
Add a new option "-polly-ignore-inbounds" which does precisely this.
llvm-svn: 298073
In subsequent changes we will make Polly a little bit more lazy in adding
parameter dimensions to different sets. As a result, not all parameters will
always be part of the parameter space. This change ensures that we do not use
the '-1' returned when a parameter dimension cannot be found, but instead
just do not try to eliminate the anyhow non-existing dimension.
llvm-svn: 298054
Since several years, isl can perform most operations on sets with differing
parameter spaces, by expanding the parameter space on demand relying using
named isl ids to distinguish different parameter dimensions.
By not always expanding to full dimensionality the set remain smaller and can
likely be operated on faster. This change by itself did not yet result in
measurable performance benefits, but it is a step into the right direction
needed to ensure that subsequent changes indeed can work with lower-dimensional
sets and these sets do not get blown up by accident when later intersected with
the domain context.
llvm-svn: 298053
Introduce ScopStmt::getSurroundingLoop() to replace getFirstNonBoxedLoopFor.
getSurroundingLoop() returns the precomputed surrounding/first non-boxed
loop. Except in ScopDetection, the list of boxed loops is only used to
get the surrounding loop. getFirstNonBoxedLoopFor also requires LoopInfo
at every use which is not necessarily available everywhere where we may
want to use it.
Differential Revision: https://reviews.llvm.org/D30985
llvm-svn: 297899
The bindings currently need to be generated manually, as they are not yet
part of the official isl distribution. Hence, we keep them across updates
assuming they only need to be updated when new functions or functionality
should be exposed.
llvm-svn: 297710
In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.
The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".
Reviewers: grosser
Subscribers: gareevroman, pollydev
Tags: #polly
Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>
Differential Revision: https://reviews.llvm.org/D30815
llvm-svn: 297587
If a SCoP is most probably sequential, then it's better to run it on a CPU.
Hence, there's no point in running it on a GPU.
Reviewers: grosser
Subscribers: nemanjai
Tags: #polly
Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com>
Differential Revision: https://reviews.llvm.org/D30864
llvm-svn: 297578
As most discussions about these bindings have concluded and only the final
patch review on the isl mailing list is missing, we drop the experimental
warning tag to match the patchset we will submit to isl, which is expected to
not change notably any more.
llvm-svn: 297519
Instead of declaring a function as:
inline val plain_get_val_if_fixed(enum dim type, unsigned int pos) const;
we use:
inline isl::val plain_get_val_if_fixed(isl::dim type, unsigned int pos) const;
The first argument caused the following compile time error on windows:
"error C3431: 'dim': a scoped enumeration cannot be redeclared as an
unscoped enumeration"
In some cases it is sufficient to just drop the 'enum' prefix, but for example
for isl::set the 'enum class dim' type collides with the function name
isl::set::dim and can consequently not be referenced. To avoid such kind of
ambiguities in the future we add the isl:: prefix consistently to all types
used.
Reported-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 297478
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.
It removes write accesses that write a loaded value back to the location
it was loaded from. It is a typical artifact from DeLICM. Removing it
will get rid of bogus dependencies later in dependency analysis.
It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.
Differential Revision: https://reviews.llvm.org/D30820
llvm-svn: 297473
This pass is a small and self-contained example of a piece of code that was
written with the isl C interface. The diff of this change nicely shows how the
C++ bindings can improve the readability of the code by avoiding the long C
function names and by avoiding any need for memory management.
As you will see, no calls to isl_*_copy or isl_*_free are needed anymore.
Instead the C++ interface takes care of automatically managing the objects.
This may introduce internally additional copies, but due to the isl reference
counting, such copies are expected to be cheap. For performance critical
operations, we will later exploit move semantics to eliminate unnecessary
copies that have shown to be costly.
Below we give a set of examples that shows the benefit of the C++ interface vs.
the pure C interface.
Check properties
----------------
Before:
if (isl_aff_is_zero(aff) || isl_aff_is_one(aff))
return true;
After:
if (Aff.is_zero() || Aff.is_one())
return true;
Type conversion
---------------
Before:
isl_union_pw_multi_aff *UPMA = isl_union_pw_multi_aff_from_union_map(umap);
After:
isl::union_pw_multi_aff UPMA = UMap;
Type construction
-----------------
Before:
auto *Empty = isl_union_map_empty(space);
After:
auto Empty = isl::union_map::empty(Space);
Operations
----------
Before:
set = isl_union_set_intersect(set, set2);
After:
Set = Set.intersect(Set2);
The use of isl::boolean in return types also adds an increases the robustness
of Polly, as on conversion to true or false, we verify that no isl_bool_error
has been returned and assert in case an error was returned. Before this change
we would have just ignored the error and proceeded with (some) exection path.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30619
llvm-svn: 297466
Translate the full algorithm to use the new isl C++ bindings
This is a large piece of code that has been written with the Polly IslPtr<>
memory management tool, which only performed memory management, but did not
provide a method interface. As such the code was littered with calls to
give(), copy(), keep(), and take(). The diff of this change should give a
good example how the new method interface simplifies the code by removing the
need for switching between managed types and C functions all the time
and consequently also the need to use the long C function names.
These are a couple of examples comparing the old IslPtr memory management
interface with the complete method interface.
Check properties
----------------
Before:
if (isl_aff_is_zero(Aff.get()) || isl_aff_is_one(Aff.get()))
return true;
After:
if (Aff.is_zero() || Aff.is_one())
return true;
Type conversion
---------------
Before:
isl_union_pw_multi_aff *UPMA =
give(isl_union_pw_multi_aff_from_union_map(UMap.copy());
After:
isl::union_pw_multi_aff UPMA = UMap;
Type construction
-----------------
Before:
auto Empty = give(isl_union_map_empty(Space.copy());
After:
auto Empty = isl::union_map::empty(Space);
Operations
----------
Before:
Set = give(isl_union_set_intersect(Set.copy(), Set2.copy());
After:
Set = Set.intersect(Set2);
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30617
llvm-svn: 297463
The isl C++ binding method interface introduces a thin C++ layer that allows
to call isl methods directly on the memory managed C++ objects. This makes the
relevant methods directly available via code-completion interfaces, allows for
the use of overloading, conversion constructors, and many other nice C++
features that make using isl a lot easier.
The individual features will be highlighted in the subsequent commits.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30616
llvm-svn: 297462
Over the last couple of months several authors of independent isl C++ bindings
worked together to jointly design an official set of isl C++ bindings which
combines their experience in developing isl C++ bindings. The new bindings have
been designed around a value pointer style interface and remove the need for
explicit pointer managenent and instead use C++ language features to manage isl
objects.
This commit introduces the smart-pointer part of the isl C++ bindings and
replaces the current IslPtr<T> classes, which served the very same purpose, but
had to be manually maintained. Instead, we now rely on automatically generated
classes for each isl object, which provide value_ptr semantics.
An isl object has the following smart pointer interface:
inline set manage(__isl_take isl_set *ptr);
class set {
friend inline set manage(__isl_take isl_set *ptr);
isl_set *ptr = nullptr;
inline explicit set(__isl_take isl_set *ptr);
public:
inline set();
inline set(const set &obj);
inline set &operator=(set obj);
inline ~set();
inline __isl_give isl_set *copy() const &;
inline __isl_give isl_set *copy() && = delete;
inline __isl_keep isl_set *get() const;
inline __isl_give isl_set *release();
inline bool is_null() const;
}
The interface and behavior of the new value pointer style classes is inspired
by http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf, which
proposes a std::value_ptr, a smart pointer that applies value semantics to its
pointee.
We currently only provide a limited set of public constructors and instead
require provide a global overloaded type constructor method "isl::obj
isl::manage(isl_obj *)", which allows to convert an isl_set* to an isl::set by
calling 'S = isl::manage(s)'. This pattern models the make_unique() constructor
for unique pointers.
The next two functions isl::obj::get() and isl::obj::release() are taken
directly from the std::value_ptr proposal:
S.get() extracts the raw pointer of the object managed by S.
S.release() extracts the raw pointer of the object managed by S and sets the
object in S to null.
We additionally add std::obj::copy(). S.copy() returns a raw pointer refering
to a copy of S, which is a shortcut for "isl::obj(oldobj).release()", a
functionality commonly needed when interacting directly with the isl C
interface where all methods marked with __isl_take require consumable raw
pointers.
S.is_null() checks if S manages a pointer or if the managed object is currently
null. We add this function to provide a more explicit way to check if the
pointer is empty compared to a direct conversion to bool.
This commit also introduces a couple of polly-specific extensions that cover
features currently not handled by the official isl C++ bindings draft, but
which have been provided by IslPtr<T> and are consequently added to avoid code
churn. These extensions include:
- operator bool() : Conversion from objects to bool
- construction from nullptr_t
- get_ctx() method
- take/keep/give methods, which match the currently used naming
convention of IslPtr<T> in Polly. They just forward to
(release/get/manage).
- raw_ostream printers
We expect that these extensions are over time either removed or upstreamed to
the official isl bindings.
We also export a couple of classes that have not yet been exported in isl (e.g.,
isl::space)
As part of the code review, the following two questions were asked:
- Why do we not use a standard smart pointer?
std::value_ptr was a proposal that has not been accepted. It is consequently
not available in the standard library. Even if it would be available, we want
to expand this interface with a complete method interface that is conveniently
available from each managed pointer. The most direct way to achieve this is to
generate a specialiced value style pointer class for each isl object type and
add any additional methods to this class. The relevant changes follow in
subsequent commits.
- Why do we not use templates or macros to avoid code duplication?
It is certainly possible to use templates or macros, but as this code is
auto-generated there is no need to make writing this code more efficient. Also,
most of these classes will be specialized with individual member functions in
subsequent commits, such that there will be little code reuse to exploit. Hence,
we decided to do so at the moment.
These bindings are not yet officially part of isl, but the draft is already very
stable. The smart pointer interface itself did not change since serveral months.
Adding this code to Polly is against our normal policy of only importing
official isl code. In this case however, we make an exception to showcase a
non-trivial use case of these bindings which should increase confidence in these
bindings and will help upstreaming them to isl.
Tags: #polly
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D30325
llvm-svn: 297452
This pass allows writing the LLVM-IR just before and after the Polly
passes to a file.
Dumping the IR before Polly helps reproducing bugs that occur in code
generated by clang. It is the only reliable way to get the IR that
triggers a bug. The alternative is to emit the IR with
clang -c -emit-llvm -S -o dump.ll
then pass it through all optimization passes
opt dump.ll -basicaa -sroa ... -S -o optdump.ll
to then reproduce the error with
opt optdump.ll -polly-opt-isl -polly-codegen -analyze
However, the IR is not the same. -O3 uses a PassBuilder than creates passes
with different parameters than the default.
Dumping the IR after Polly is useful to compare a miscompilation with
a known-good configuration.
Differential Revision: https://reviews.llvm.org/D30788
llvm-svn: 297415
In case LLVM pointers are annotated with !dereferencable attributes/metadata
or LLVM can look at the allocation from which a pointer is derived, we can know
that dereferencing pointers is safe and can be done unconditionally. We use this
information to proof certain pointers as save to hoist and then hoist them
unconditionally.
llvm-svn: 297375
One of the current limitations of DeLICM is that it only creates
PHI WRITEs that it knows are read by some PHI. Such writes may not span
all instances of a statement. Polly's code generator currently does not
support MemoryAccesses that are not executed in all instances
('partial accesses') and so has to give up on a possible mapping.
This workaround has once been suggested by Tobias Grosser: Try to
interpolate an arbitrary expansion to all instances. It will be checked
for possible conflicts with the existing Knowledge and can be applied if
the conflict checking result is that no semantics are changed.
Expansion is done by simplifying the mapping by coalescing with the hope
that coalescing will find a polyhedral 'rule' of the relevant map. It is
then 'gist'-ed using the domain of the relevant instances such that the
rule is expanded to the universe and finally intersected with the domain
of all statement instances.
The expansion makes conflicts become more likely, the found rule may
still not encompass all statement instances and the found rule exposes
internals of isl's implementation of coalesce and gist. The latter means
that the result depends on how much effort the implementation invests
into finding a rule which may change between versions of isl. Trivial
implementations of gist and coalesce just return the input arguments.
A patch that makes codegen support partial accesses is in preparation
as well.
Differential Revision: https://reviews.llvm.org/D30763
llvm-svn: 297373
Simplify ScopDetection::isInvariant(). Essentially deny everything that
is defined within the SCoP and is not load-hoisted.
The previous understanding of "invariant" has a few holes:
- Expressions without side-effects with only invariant arguments, but
are defined withing the SCoP's region with the exception of selects
and PHIs. These should be part of the index expression derived by
ScalarEvolution and not of the base pointer.
- Function calls with that are !mayHaveSideEffects() (typically
functions with "readnone nounwind" attributes). An example is given
below.
@C = external global i32
declare float* @getNextBasePtr(float*) readnone nounwind
...
%ptr = call float* @getNextBasePtr(float* %A, float %B)
The call might return:
* %A, so %ptr aliases with it in the SCoP
* %B, so %ptr aliases with it in the SCoP
* @C, so %ptr aliases with it in the SCoP
* a new pointer everytime it is called, such as malloc()
* a pointer into the allocated block of one of the aforementioned
* any of the above, at random at each call
Hence and contrast to a comment in the base_pointer.ll regression
test, %ptr is not necessarily the same all the time. It might also
alias with anything and no AliasAnalysis can tell otherwise if the
definition is external. It is hence not suitable in the role of a
base pointer.
The practical problem with base pointers defined in SCoP statements is
that it is not available globally in the SCoP. The statement instance
must be executed first before the base pointer can be used. This is no
problem if the base pointer is transferred as a scalar value between
statements. Uses of MemoryAccess::setNewAccessRelation may add a use of
the base pointer anywhere in the array. setNewAccessRelation is used by
JSONImporter, DeLICM and D28518. Indeed, BlockGenerator currently
assumes that base pointers are available globally and generates invalid
code for new access relation (referring to the base pointer of the
original code) if not, even if the base pointer would be available in
the statement.
This could be fixed with some added complexity and restrictions. The
ExprBuilder must lookup the local BBMap and code that call
setNewAccessRelation must check whether the base pointer is available
first.
The code would still be incorrect in the presence of aliasing. There
is the switch -polly-ignore-aliasing to explicitly allow this, but
it is hardly a justification for the additional complexity. It would
still be mostly useless because in most cases either getNextBasePtr()
has external linkage in which case the readnone nounwind attributes
cannot be derived in the translation unit itself, or is defined in the
same translation unit and gets inlined.
Reviewed By: grosser
Differential Revision: https://reviews.llvm.org/D30695
llvm-svn: 297281
Only when load-hoisted we can be sure the base pointer is invariant
during the SCoP's execution. Most of the time it would be added to
the required hoists for the alias checks anyway, except with
-polly-ignore-aliasing, -polly-use-runtime-alias-checks=0 or if
AliasAnalysis is already sure it doesn't alias with anything
(for instance if there is no other pointer to alias with).
Two more parts in Polly assume that this load-hoisting took place:
- setNewAccessRelation() which contains an assert which tests this.
- BlockGenerator which would use to the base ptr from the original
code if not load-hoisted (if the access expression is regenerated)
Differential Revision: https://reviews.llvm.org/D30694
llvm-svn: 297195
Our current scop modeling enters an infinite loop when trying to model code
that has unreachable instructions (e.g.,
test/ScopInfo/BoundChecks/single-loop.ll), as the number of basic blocks
returned by the LLVM Loop* does not include unreachable basic blocks that
branch off from the core loop body. This arises for example in the following
piece of code:
for (i = 0; i < N; i++) {
if (i > 1024)
abort(); <- this abort might be translated to an
unreachable
A[i] = ...
}
This patch adds these unreachable basic blocks in our per loop basic block
count to ensure that the schedule construction does not assume a loop has been
processed completely, despite certain unreachable basic blocks still remaining.
The infinite loop is only observable in combination with
https://reviews.llvm.org/D12676 or a similar patch.
llvm-svn: 297156
Scops that exit with an unreachable are today still permitted, but make little
sense to optimize. We therefore can already skip them during scop detection.
This speeds up scop detection in certain cases and also ensures that bugpoint
does not introduce unreachables when reducing test cases.
In practice this change should have little impact, as the performance of
unreachable code is unlikely to matter.
This commit is part of a series that makes Polly more robust in the presence
of unreachables.
llvm-svn: 297151
These loads cannot be savely hoisted as the condition guarding the
non-affine region cannot be duplicated to also protect the hoisted load
later on. Today they are dropped in ScopInfo. By checking for this early, we
do not even try to model them and possibly can still optimize smaller regions
not containing this specific required-invariant load.
llvm-svn: 296744
Multi-disjunct access maps can easily result in inbound assumptions which
explode in case of many memory accesses and many parameters. This change reduces
compilation time of some larger kernel from over 15 minutes to less than 16
seconds.
Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll
which has a memory access
[n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] }
which requires folding, but where only a single disjunct remains. We can still
model this test case even when only using limited memory folding.
For people only reading commit messages, here the comment that explains what
memory folding is:
To recover memory accesses with array size parameters in the subscript
expression we post-process the delinearization results.
We would normally recover from an access A[exp0(i) * N + exp1(i)] into an
array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid
delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the
range of exp1(i) - may be preferrable. Specifically, for cases where we
know exp1(i) is negative, we want to choose the latter expression.
As we commonly do not have any information about the range of exp1(i),
we do not choose one of the two options, but instead create a piecewise
access function that adds the (-1, N) offsets as soon as exp1(i) becomes
negative. For a 2D array such an access function is created by applying
the piecewise map:
[i,j] -> [i, j] : j >= 0
[i,j] -> [i-1, j+N] : j < 0
After this patch we generate only the first case, except for situations where
we can proove the first case to be invalid and can consequently select the
second without introducing disjuncts.
llvm-svn: 296679
Without this simplification for a loop nest:
void foo(long n1_a, long n1_b, long n1_c, long n1_d,
long p1_b, long p1_c, long p1_d,
float A_1[][p1_b][p1_c][p1_d]) {
for (long i = 0; i < n1_a; i++)
for (long j = 0; j < n1_b; j++)
for (long k = 0; k < n1_c; k++)
for (long l = 0; l < n1_d; l++)
A_1[i][j][k][l] += i + j + k + l;
}
the assumption:
n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or
(n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and
p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d)
is taken rather than the simpler assumption:
p9_b >= n9_b and p9_c >= n9_c and p9_d >= n9_d.
The former is less strict, as it allows arbitrary values of p1_* in case, the
loop is not executed at all. However, in practice these precise constraints
explode when combined across different accesses and loops. For now it seems
to make more sense to take less precise, but more scalable constraints by
default. In case we find a practical example where more precise constraints
are needed, we can think about allowing such precise constraints in specific
situations where they help.
This change speeds up the new test case from taking very long (waited at least
a minute, but it probably takes a lot more) to below a second.
llvm-svn: 296456
This patch adds an option to build against a version of libisl already
installed on the system. The installation is autodetected using the
pkg-config file shipped with isl.
The detection of the library is in the FindISL.cmake module that creates
an imported target.
Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>
Differential Revision: https://reviews.llvm.org/D30043
llvm-svn: 296361
We can not perform the dependence analysis and, consequently, the parallel
code generation in case the schedule tree contains extension nodes.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30394
llvm-svn: 296325
Control flow would flow-through after the check whether the operations
quota exceeded, with the intention that it would later be caught by
Knowledge::isUsable(). However, the Knowledge constructor has its own
assertions to check consistency which would fail if its fields have only
been initialized partially because some sets have been computed correctly
before the operations quota takes effect.
Fix by erroring-out early instead of falling-throught into the code that
might expect that everything has been computed correctly. For robustness,
also bail-out if any of the fields contain nullptr values instead of
relying on isl always setting exactly this error code if something went
wrong.
This should fix the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable
(-polly-process-unprofitable -polly-position=before-vectorizer
-polly-enable-delicm) buildbot.
llvm-svn: 296022
NonowningIslPtr<isl_X> was used as types of function parameters when the
function does not consume the isl object, i.e. an __isl_keep parameter.
The alternatives are:
1. IslPtr<isl_X>
This has additional calls to isl_X_copy and isl_X_free to
increase/decrease the reference counter even though not needed. The
caller already owns a reference to the isl object.
2. const IslPtr<isl_X>&
This does not change the reference counter, but requires an
additional load to get the pointer to the isl object (instead of just
passing the pointer itself).
Moreover, the compiler cannot rely on the constness of the pointer
and has to reload the pointer every time it writes to memory (unless
alias analysis such as TBAA says it is not possible).
The isl C++ bindings currently in development do not have an equivalent
to NonowningIslPtr and adding one would make the binding more
complicated and its advantage in performance is small. In order to
simplify the transition to these C++ bindings, remove NonowningIslPtr.
Change every former use of it to alternative 2 mentioned aboce
(const IslPtr<isl_X>&).
llvm-svn: 295998
Once a StmtSchedule is created, only its domain is used anywhere within
DependenceInfo::calculateDependences. So, we choose to return the
wrapped domain of the union_map rather than the entire union_map.
However, we still build the union_map first within collectInfo(). It is
cleaner to first build the entire union_map and then pull the domain out in
one shot, rather than repeatedly extracting the domain in bits and pieces
from accdom.
Contributed-by: Siddharth Bhat <siddu.druid@gmail.com>
Differential Revision: https://reviews.llvm.org/D30208
llvm-svn: 295984
Marking a pass as preserved is necessary if any Polly pass uses it, even
if it is not preserved within the generated code. Not marking it would
cause the the Polly pass chain to be interrupted. It is not used by any
Polly pass anymore, hence we can remove all references to it.
llvm-svn: 295983
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D30293
llvm-svn: 295958
These tests were not included in the main DeLICM commit. These check the
cases where zone analysis cannot be successful because of assumption
violations.
We use the LLVM optimization remark infrastructure as it seems to be the
best fit for this kind of messages. I tried to make use if the
OptimizationRemarkEmitter. However, it would insert additional function
passes into the pass manager to get the hotness information. The pass
manager would insert them between the flatten pass and delicm, causing
the ScopInfo with the flattened schedule being thrown away.
Differential Revision: https://reviews.llvm.org/D30253
llvm-svn: 295846
There is no template specialization for cl::parser<unsigned long> such
that parsing an cl::opt<unsigned long> command line argument will fail.
Use opt<int> instead which has an associated parser.
llvm-svn: 295832
We only ever use the wrapped domain of AccessSchedule, so stop
creating an entire union_map and then pulling the domain out.
Reviewers: grosser
Tags: #polly
Contributed-by: Siddharth Bhat <siddu.druid@gmail.com>
Differential Revision: https://reviews.llvm.org/D30179
llvm-svn: 295726
Implement the -polly-delicm pass. The pass intends to undo the
effects of LoopInvariantCodeMotion (LICM) which adds additional scalar
dependencies into SCoPs. DeLICM will try to map those scalars back to
the array elements they were promoted from, as long as the array
element is unused.
The is the main patch from the DeLICM/DePRE patch series. It does not
yet undo GVN PRE for which additional information about known values
is needed and does not handle PHI write accesses that have have no
target. As such its usefulness is limited. Patches for these issues
including regression tests for error situatons will follow.
Reviewers: grosser
Differential Revision: https://reviews.llvm.org/D24716
llvm-svn: 295713
isl headers are currently missing in a Polly installation. Because the
Polly headers depend on those, code can't be compiled against an
installed Polly.
This patch installs the isl headers. I left a TODO, as optionally it
should be possible to use a system version of isl instead of the one
shipped with Polly.
When compiling, clients of the installation need to add
-I${PREFIX}/include/polly/ to there include path right now, because
there currently is no way to export this path automatically.
Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>
Differential Revision: https://reviews.llvm.org/D29931
llvm-svn: 295671
Instead of counting the number of read-only accesses, we now count the number of
distinct read-only array references when checking if a run-time alias check
may be too complex. The run-time alias check is quadratic in the number of
base pointers, not the number of accesses.
Before this change we accidentally skipped SPEC's lbm test case.
llvm-svn: 295567
This change gets rid of the need for zero padding, makes the reduction
computation code more similar to the normal dependence computation, and also
better documents what we do at the moment.
Making the dependence computation for reductions a little bit easier to
understand will hopefully help us to further reduce code duplication.
This reduces the time spent only in the reduction dependence pass from 260ms to
150ms for test/DependenceInfo/reduction_sequence.ll. This is a reduction of over
40% in dependence computation time.
This change was inspired by discussions with Michael Kruse, Utpal Bora,
Siddharth Bhat, and Johannes Doerfert. It can hopefully lay the base for further
cleanups of the reduction code.
llvm-svn: 295550
Trying to fold such kind of dimensions will result in a division by zero,
which crashes the compiler. As such arrays are likely to invalidate the
scop anyhow (but are not illegal in LLVM-IR), there is no point in trying
to optimize the array layout. Hence, we just avoid the folding of
constant dimensions of size zero.
llvm-svn: 295415
Before this change wrapping range metadata resulted in exponential growth of
the context, which made context construction of large scops very slow. Instead,
we now just do not model the range information precisely, in case the number
of disjuncts in the context has already reached a certain limit.
llvm-svn: 295360
Commit r230230 introduced the use of range metadata to derive bounds for
parameters, instead of just looking at the type of the parameter. As part of
this commit support for wrapping ranges was added, where the lower bound of a
parameter is larger than the upper bound:
{ 255 < p || p < 0 }
However, at the same time, for wrapping ranges support for adding bounds given
by the size of the containing type has acidentally been dropped. As a result,
the range of the parameters was not guaranteed to be bounded any more. This
change makes sure we always add the bounds given by the size of the type and
then additionally add bounds based on signed wrapping, if available. For a
parameter p with a type size of 32 bit, the valid range is then:
{ -2147483648 <= p <= 2147483647 and (255 < p or p < 0) }
llvm-svn: 295349
The Knowledge class remembers the state of data at any timepoint of a SCoP's
execution. Currently, it tracks whether an array element is unused or is
occupied by some value, and the writes to it. A future addition will be to also
remember which value it contains.
Objects are used to determine whether two Knowledge contain conflicting
information, i.e. two states cannot be true a the same time.
This commit was extracted from the DeLICM algorithm at
https://reviews.llvm.org/D24716.
llvm-svn: 295197
Formatting unnamed array names is expensive in LLVM as the this requires
deriving the numbered virtual instruction name (e.g., %12) for an llvm::Value,
which is currently not implemented efficiently. As instruction numberes anyhow
do not really carry a lot of information for the user, we just print 'unknown'
instead.
This change reduces the scop detection time from 24 to 19 seconds, for one of
our large-scale inputs. This is a reduction by 21%.
llvm-svn: 294894
When deriving the range of valid values of a scalar evolution expression might
be a range [12, 8), where the upper bound is smaller than the lower bound and
where the range is expected to possibly wrap around. We theoretically could
model such a range as a union of two non-wrapping ranges, but do not do this
as of yet. Instead, we just do not derive any bounds. Before this change,
we could have obtained bounds where the maximal possible value is strictly
smaller than the minimal possible value, which is incorrect and also caused
assertions during scop modeling.
llvm-svn: 294891
To determine parameters of the matrix multiplication, we check RAW dependencies
that can be expressed using only reduction dependencies. Consequently, we
should check the reduction dependencies, if this is the case.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Sven Verdoolaege <skimo-polly@kotnet.org>
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29814
llvm-svn: 294836
The size of the operands type is the one of the parameters required
to determine the BLIS micro-kernel. We get the size of the widest type
of the matrix multiplication operands in case there are several
different types.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29269
llvm-svn: 294828
This change clarfies that we want to indeed use the original base address
when creating the ScopArrayInfo that corresponds to a given memory access.
This change prepares for https://reviews.llvm.org/D28518.
llvm-svn: 294734
This replaces the use of getOriginalAddrPtr, a value that is stored in
ScopArrayInfo and might at some point not be unique any more. However, the
access value is defined to be unique.
This change is an update on r294576, which only clarified that we need the
original memory access, but where we still remained dependent to have one base
pointer per scop.
This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294733
When generating code in the BlockGenerator we copy all (interesting)
instructions and keep track of the new values in a basic block map. To obtain
the original llvm::Value that belongs to a load memory access, we use
getAccessValue() instead of getOriginalBaseAddr(). The former always references
the instruction we use to load values from. The latter, on the other hand,
is obtaine from the corresponding ScopArrayInfo and would not be unique in
case ScopArrayInfo objects at some point allow memory accesses with different
base addresses.
This change is an update on r294566, which only clarified that we need the
original memory access, but where we still remained dependent to have one
base pointer per scop.
This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294669
By using the public interface MemoryAccess::getScopArrayInfo() we avoid the
direct access to the ScopArrayInfoMap and as a result also do not need to
use the BasePtr as key. This change makes the code cleaner.
The const-cast we introduce is a little ugly. We may consider to drop const
correctness for getScopArrayInfo() at some point.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294655
LLVM's coding conventions suggest to use auto only in obvious cases. Hence,
we move this code to actually declare the types used. We also replace the
variable name 'SAI', with the name 'Array', as this improves readability.
llvm-svn: 294654
When building alias groups, we sort different ScopArrays into unrelated groups.
Historically we identified arrays through their base pointer, as no
ScopArrayInfo class was yet available. This change changes the alias group
construction to reference arrays through their ScopArrayInfo object.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294649
During SCoP construction we sometimes inspect the underlying IR by looking at
the base address of a MemoryAccess. In such cases, we always want the original
base address. Make this clear by calling getOriginalBaseAddr().
This is a non-functional change as getBaseAddr maps to getOriginalBaseAddr
at the moment.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294576
The base address of a memory access is already an llvm::Value. Hence, there is
no need to go through SCEV, but we can directly work with the llvm::Value.
Also use 'Value *' instead of 'auto' for cases where the type is not obvious.
llvm-svn: 294575
Instead of iterating over statements and their memory accesses to extract the
set of available base pointers, just directly iterate over all ScopArray
objects. This reflects more the actual intend of the code: collect all arrays
(and their base pointers) to emit alias information that specifies that accesses
to different arrays cannot alias.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294574
Before this change we used the name of the base pointer to mark reductions. This
is imprecise as the canonical reference is the ScopArray itself and not the
basepointer of a reduction. Using the base pointer of reductions is problematic
in cases where a single ScopArray is referenced through two different base
pointers.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294568
When computing reduction dependences we first identify all ScopArrays which are
part of reductions and then only compute for these ScopArrays the more detailed
data dependences that allow us to identify reductions and optimize across them.
Instead of using the base pointer as identifier of a ScopArray, it is clearer
and more understandable to directly use the ScopArray as identifier. This change
implements such a switch.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294567
When regenerating code in the BlockGenerator we copy instructions that may
references scalar values, for which the new value of a given scalar is looked up
in BBMap using the original scalar llvm::Value as index. It is consequently
necessary that (re)loaded scalar values are made available in BBMap using the
original llvm::Value as key independently if the llvm::Value was (re)loaded from
the original scalar or a new access function has been specified that caused the
value to be reloaded from an array with a differnet base address. We make this
clear by using MemoryAccess::getOriginalBaseAddr() instead of
MemoryAccess::getBaseAddr() as index to BBMap.
This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.
llvm-svn: 294566
optimization
Isolate a set of partial tile prefixes to allow hoisting and sinking out of
the unrolled innermost loops produced by the optimization of the matrix
multiplication.
In case it cannot be proved that the number of loop iterations can be evenly
divided by tile sizes and we tile and unroll the point loop, the isl generates
conditional expressions. Subsequently, the conditional expressions can prevent
stores and loads of the unrolled loops from being sunk and hoisted.
The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr
iterations of the two innermost loops, the result of the loop tiling performed
by the matrix multiplication optimization, where Mr and Mr are parameters of
the micro-kernel. This helps to get rid of the conditional expressions of
the unrolled innermost loops. Probably this approach can be replaced with
padding in future.
In case of, for example, the gemm from Polybench/C 3.2 and parametric loop
bounds, it helps to increase the performance from 7.98 GFlops (27.71% of
theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we
get the same performance as in case of scalar loops bounds.
It also cause compile time regression. The compile-time is increased from
0.795 seconds to 0.837 seconds in case of scalar loops bounds and from 1.222
seconds to 1.490 seconds in case of parametric loops bounds.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D29244
llvm-svn: 294564
with optimizeMatMulPattern
This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node
optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate
option, because standardBandOpts could try to tile a band node with anchored
subtree and get the error, since the use of the isolate option causes any tree
containing the node to be considered anchored. Furthermore, it is not intended
to apply standard optimizations, when the matrix multiplication has been
detected.
llvm-svn: 294444
This function has been extracted from the upcoming DeLICM patch
(https://reviews.llvm.org/D24716).
In contrast to computeReachingWrite and computeArrayUnused,
convertZoneToTimepoints implies a format for zones (ranges between timepoints).
Zones at the moment are unique to DeLICM, but convertZoneToTimepoints makes most
sense in conjunction with the previous two functions.
llvm-svn: 294094
multiplication
The current identification of a SCoP statement that implement a matrix
multiplication does not help to identify different permutations of loops that
contain it and check for dependencies, which can prevent it from being
optimized. It also requires external determination of the operands of
the matrix multiplication. This patch contains the implementation of a new
algorithm that helps to avoid these issues. It also modifies the test cases
that generate matrix multiplications with linearized accesses, because
the new algorithm does not support them.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28357
llvm-svn: 293890
Add a simple example to update the documentation on how the packing
transformation is implemented.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D28021
llvm-svn: 293429
Instead of keeping two separate maps from Value to Allocas, one for
MemoryType::Value and the other for MemoryType::PHI, we introduce a single map
from ScopArrayInfo to the corresponding Alloca. This change is intended, both as
a general simplification and cleanup, but also to reduce our use of
MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure
we have only a single place where the array (and its base pointer) for which we
generate code for is specified, which means we can more easily introduce new
access functions that use a different ScopArrayInfo as base. We already today
experiment with modifiable access functions, so this change does not address
a specific bug, but it just reduces the scope one needs to reason about.
Another motivation for this patch is https://reviews.llvm.org/D28518, where
memory accesses with different base pointers could possibly be mapped to a
single ScopArrayInfo object. Such a mapping is currently not possible, as we
currently generate alloca instructions according to the base addresses of the
memory accesses, not according to the ScopArrayInfo object they belong to. By
making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo
object will automatically mean that the same stack slot is used for these
arrays. For D28518 this is not a problem, as only MemoryType::Array objects are
mapping, but resolving this inconsistency will hopefully avoid confusion.
llvm-svn: 293374
Add some generally useful isl tools into a their own new ISLTools.cpp.
These are the helpers were extracted from and will be use by the DeLICM
algorithm (https://reviews.llvm.org/D24716).
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 293340
Before this change the user only saw "Unspecified Error", when a region
contained the entry block. Now we report:
"Scop contains function entry (not yet supported)."
llvm-svn: 293169
Before this change we created an additional reload in the copy of the incoming
block of a PHI node to reload the incoming value, even though the necessary
value has already been made available by the normally generated scalar loads.
In this change, we drop the code that generates this redundant reload and
instead just reuse the scalar value already available.
Besides making the generated code slightly cleaner, this change also makes sure
that scalar loads go through the normal logic, which means they can be remapped
(e.g. to array slots) and corresponding code is generated to load from the
remapped location. Without this change, the original scalar load at the
beginning of the non-affine region would have been remapped, but the redundant
scalar load would continue to load from the old PHI slot location.
It might be possible to further simplify the code in addOperandToPHI,
but this would not only mean to pull out getNewValue, but to also change the
insertion point update logic. As this did not work when trying it the first
time, this change is likely not trivial. To not introduce bugs last minute, we
postpone further simplications to a subsequent commit.
We also document the current behavior a little bit better.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D28892
llvm-svn: 292486
Making certain values 'const' to just cast it away a little later mainly
obfuscates the code. Hence, we just drop the 'const' parts.
Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 292480
Summary:
Instead of forbidding such access functions completely, we verify that their
base pointer has been hoisted and only assert in case the base pointer was
not hoisted.
I was trying for a little while to get a test case that ensures the assert is
correctly fired in case of invariant load hoisting being disabled, but I could
not find a good way to do so, as llvm-lit immediately aborts if a command
yields a non-zero return value. As we do not generally test our asserts,
not having a test case here seems OK.
This resolves http://llvm.org/PR31494
Suggested-by: Michael Kruse <llvm@meinersbur.de>
Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D28798
llvm-svn: 292213
Move the function getFirstNonBoxedLoopFor which is used in ScopBuilder
and in ScopInfo to Support/ScopHelpers to make it reusable in other
locations. No functionality change.
Patch by Sameer Abu Asal.
Differential Revision: https://reviews.llvm.org/D28754
llvm-svn: 292168
Before this change, this code has been mixed with a check for non-affine
loops (and when originally introduce was also duplicated). By creating
a separate loop and explicitly documenting this property, the current
behavior becomes a lot more clear.
llvm-svn: 292140
The loop body in buildAliasGroups is still too large to easily scan it. Hence,
we split the loop body out into a separate function to improve readability.
llvm-svn: 292138
Instead of modifying the original alias group and repurposing it as read-write
access group when splitting accesses in read-only and read-write accesses, we
just keep all three groups: the original alias group, the set of read-only
accesses and the set of read-write accesses. This allows us to remove some
complicated iterator handling and also allows for more code-reuse in
calculateMinMaxAccess.
llvm-svn: 292137
It seems over time we added an additional map that maps from the base address
of a read-only access to the actual access. However this map is never used.
Drop the creation and use of this map to simplify our alias check generation
code.
llvm-svn: 292126
The alias group will anyhow be cleared at the end of this function and is not
used afterwards. We avoid an explicit clear() call at multiple places to
improve readability of this code.
llvm-svn: 292125
Hoisting small vectors out of a loop seems to be a pure performance
optimization, which is unlikely to have great impact in practice. As this
hoisting just increases code-complexity, we fold the SmallVectors back into
the loop.
In subsequent commits, we will further simplify and structure this code, but
we committed this change separately to provide an explanation to make clear
that we purposefully reverted this optimization.
llvm-svn: 292122
The function buildAliasGroups got very large. We extract out the splitting
of alias groups to reduce its size and to better document the current behavior.
llvm-svn: 292121
The function buildAliasGroups got very large. We extract out the actual
construction of alias groups to reduce its size and to better document the
current behavior.
llvm-svn: 292120
There is no point in regularly committing a binary file to the repository, as
this just unnecessarily increases the repository size. Interested people can
find the isl manual for example at isl.gforge.inria.fr/manual.pdf.
llvm-svn: 292105
To benefit of the type safety guarantees of C++11 typed enums, which would have
caught the type mismatch fixed in r291960, we make MemoryKind a typed enum.
This change also allows us to drop the 'MK_' prefix and to instead use the more
descriptive full name of the enum as prefix. To reduce the amount of typing
needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a
global scope, which means the ScopArrayInfo:: prefix is not needed. This move
also makes historically sense. In the beginning of Polly we had different
MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later
canonicalized to one. During this canonicalization we just choose the enum in
ScopArrayInfo, but did not consider to move this shared enum to global scope.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D28090
llvm-svn: 292030
This update improves isl's ability to coalesce different convex sets/maps,
especially when the contain existentially quantified variables.
llvm-svn: 290538
If the parameters of the target cache (i.e., cache level sizes, cache level
associativities) are not specified or have wrong values, we use ones for
parameters of the macro-kernel and do not perform data-layout optimizations of
the matrix multiplication. In this patch we specify the default values of the
cache parameters to be able to apply the pattern matching optimizations even in
this case. Since there is no typical values of this parameters, we use the
parameters of Intel Core i7-3820 SandyBridge that also help to attain the
high-performance on IBM POWER System S822 and IBM Power 730 Express server.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28090
llvm-svn: 290518
Typically processor architectures do not include an L3 cache, which means that
Nc, the parameter of the micro-kernel, is, for all practical purposes,
redundant ([1]). However, its small values can cause the redundant packing of
the same elements of the matrix A, the first operand of the matrix
multiplication. At the same time, big values of the parameter Nc can cause
segmentation faults in case the available stack is exceeded.
This patch adds an option to specify the parameter Nc as a multiple of
the parameter of the micro-kernel Nr.
In case of Intel Core i7-3820 SandyBridge and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8
it helps to improve the performance from 11.303 GFlops/sec (39,247% of
theoretical peak) to 17.896 GFlops/sec (62,14% of theoretical peak).
Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D28019
llvm-svn: 290256
Aligning data to cache lines boundaries helps to avoid overheads related to
an access to it ([1]). This patch aligns newly created arrays and adds an
option to specify the first level cache line size. By default we use 64 bytes,
which is a typical cache-line size ([2]).
In case of Intel Core i7-3820 SandyBridge and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8
it helps to improve the performance from 11.303 GFlops/sec (39,247% of
theoretical peak) to 12.63 GFlops/sec (43,8542% of theoretical peak).
Refs.:
[1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access
[2] - http://igoro.com/archive/gallery-of-processor-cache-effects/
Differential Revision: https://reviews.llvm.org/D28020
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 290253
multiplication
Previously we had two-dimensional accesses to store packed operands of
the matrix multiplication for the sake of simplicity of the packed arrays.
However, addition of the third dimension helps to simplify the corresponding
memory access, reduce the execution time of isl operations applied to it, and
consequently reduce the compile-time of Polly. For example, in case of
Intel Core i7-3820 SandyBridge and the following options,
clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=7
it helps to reduce the compile-time from about 361.456 seconds to about 0.816
seconds.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D27878
llvm-svn: 290251
To prevent copy statements from accessing arrays out of bounds, ranges of their
extension maps are restricted, according to the constraints of domains.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D25655
llvm-svn: 289815
gemm ([1]). In particular, elements of the matrix B, the second operand of
matrix multiplication, are reused between iterations of the innermost loop.
To keep the reused data in cache, only elements of matrix A, the first operand
of matrix multiplication, should be evicted during an iteration of the
innermost loop. To provide such a cache replacement policy, elements of the
matrix A can, in particular, be loaded first and, consequently, be
least-recently-used.
In our case matrices are stored in row-major order instead of column-major
order used in the BLIS implementation ([1]). One of the ways to address it is
to accordingly change the order of the loops of the loop nest. However, it
makes elements of the matrix A to be reused in the innermost loop and,
consequently, requires to load elements of the matrix B first. Since the LLVM
vectorizer always generates loads from the matrix A before loads from the
matrix B and we can not provide it. Consequently, we only change the BLIS micro
kernel and the computation of its parameters instead. In particular, reused
elements of the matrix B are successively multiplied by specific elements of
the matrix A .
Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D25653
llvm-svn: 289806
The AssumptionCache was removed in r289756 after being replaced by the an
addtional operand list of affected values in r289755. The absence of that cache
means that we have now have to manually search for llvm.assume intrinsics as
now done by other passes (LazyValueInfo, CodeMetrics) do not take into
account an llvm::Instruction's user lists (ScalarEvolution).
llvm-svn: 289791
clang-format has been updated in r289531 to keep labels and values on
the same line. This change updates Polly to the new formatting style.
llvm-svn: 289533
Add and implement foreachElt for isl_map, isl_set and isl_union_set. These are
used by an out-of-tree patch which is in process of being upstreamed.
llvm-svn: 288924
Add traits for isl_id and isl_multi_aff, required by out-of-tree patches
currently in progress of upstreaming.
isl_union_pw_aff_dump has been added to ISL during one of the last ISL
updates, such that we can also enable its dump() trait.
llvm-svn: 288915
This version includes an update for imath (isl-0.17.1-49-g2f1c129). It fixes
the compilation under windows, which does not know ssize_t.
In addition, isl-0.17.1-288-g0500299 changed the way isl_test finds the source
directory. It now generates a file isl_srcdir.c at configure-time, containing
the source path, to not require setting the environment variable "srcdir" at
test-time. The cmake build system had to be modified to also generate that file.
llvm-svn: 288811
Unsigned operations are often useful to support but the heuristics are
not yet tuned. This options allows to disable them if necessary.
llvm-svn: 288521
Relational comparisons should not involve multiple potentially
aliasing pointers. Similarly this should hold for switch conditions
and the two conditions involved in equality comparisons (separately!).
This is a heuristic based on the C semantics that does only allow such
operations when the base pointers do point into the same object.
Since this makes aliasing likely we will bail out early instead of
producing a probably failing runtime check.
llvm-svn: 288516
It did happen that after the inliner finished we end up with promotable
allocas in a function. We now run mem2reg to make sure everything is
promoted if possible.
llvm-svn: 288514
This allows us to delinearize code such as the one below, where the array
sizes are A[][2 * n] as there are n times two elements in the innermost
dimension. Alternatively, we could try to generate another dimension for the
struct in the innermost dimension, but as the struct has constant size,
recovering this dimension is easy.
struct com {
double Real;
double Img;
};
void foo(long n, struct com A[][n]) {
for (long i = 0; i < 100; i++)
for (long j = 0; j < 1000; j++)
A[i][j].Real += A[i][j].Img;
}
int main() {
struct com A[100][1000];
foo(1000, A);
llvm-svn: 288489
After having built memory accesses we perform some additional transformations
on them to increase the chances that our delinearization guesses the right
shape. Only after these transformations, we take the assumptions that the
array shape we predict is such that no out-of-bounds memory accesses arise.
Before this change, the construction of the memory access, the access folding
that improves the represenation for certain parametric subscripts, and taking
the assumption was all done right after a memory access was created. In this
change we split this now into three separate iterations over all memory
accesses. This means only after all memory accesses have been built, we start
to canonicalize accesses, and to take assumptions. This split prepares for
future canonicalizations that must consider all memory accesses for deriving
additional beneficial transformations.
llvm-svn: 288479
Feasibility is checked late on its own but early it is hidden behind
the "PollyProcessUnprofitable" guard. This change will make sure we opt
out early if the runtime context is infeasible anyway.
llvm-svn: 288329
In '[DBG] Allow to emit the RTC value at runtime' the diagnostics were printed
without a newline at the end of each diagnostic. We add such a newline to
improve readability.
llvm-svn: 288323
Add an empty DeLICM pass, without any functional parts.
Extracting the boilerplate from the the functional part reduces the size of the
code to review (https://reviews.llvm.org/D24716)
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 288160
We now collect:
Number of total loops
Number of loops in scops
Number of scops
Number of scops with maximal loop depth 1
Number of scops with maximal loop depth 2
Number of scops with maximal loop depth 3
Number of scops with maximal loop depth 4
Number of scops with maximal loop depth 5
Number of scops with maximal loop depth 6 and larger
Number of loops in scops (profitable scops only)
Number of scops (profitable scops only)
Number of scops with maximal loop depth 1 (profitable scops only)
Number of scops with maximal loop depth 2 (profitable scops only)
Number of scops with maximal loop depth 3 (profitable scops only)
Number of scops with maximal loop depth 4 (profitable scops only)
Number of scops with maximal loop depth 5 (profitable scops only)
Number of scops with maximal loop depth 6 and larger (profitable scops only)
These statistics are certainly completely accurate as we might drop scops
when building up their polyhedral representation, but they should give a good
indication of the number of scops we detect.
llvm-svn: 287973
Our original statistics were added before we introduced a more fine-grained
diagnostic system, but the granularity of our statistics has never been
increased accordingly. This change introduces now one statistic counter per
diagnostic to enable us to collect fine-grained statistics about who certain
scops are not detected. In case coarser grained statistics are needed, the
user is expected to combine counters manually.
llvm-svn: 287968
Introduce the new flag -polly-codegen-generate-expressions which forces Polly
to code generate AST expressions instead of using our SCEV based access
expression generation even for cases where the original memory access relation
was not changed and the SCEV based access expression could be code generated
without any issue.
This is an experimental option for better testing the isl ast expression
generation. The default behavior of Polly remains unchanged. We also exclude
a couple of cases for which the AST expression is not yet working.
llvm-svn: 287694
Do not assume a load to be hoistable/invariant if the pointer is used by
another instruction in the SCoP that might write to memory and that is
always executed.
llvm-svn: 287272
The declaration as an "error block" is currently aggressive and not very
smart. This patch allows to disable error blocks completely. This might
be useful to prevent SCoP expansion to a point where the assumed context
becomes infeasible, thus the SCoP has to be discarded.
llvm-svn: 287271
Since we do not necessarily treat memory intrinsics as non-affine
anymore, we have to check for them explicitly before we try to hoist an
access.
llvm-svn: 287270
The new command line flag "polly-codegen-emit-rtc-print" can be used to
place a "printf" in the generated code that will print the RTC value and
the overflow state.
llvm-svn: 287265
In r286430 "SCEVValidator: add new parameters resulting from constant
extraction" we added functionality to scan for parameters after constant
extraction has taken place to ensure newly created parameters are correctly
registered. This addition made the already existing registration of parameters
redundant. Hence, we remove the corresponding call in this commit.
An alternative solution would have been to also perform constant extraction when
validating SCEV expressions and to then scan for parameters when validating
a SCEV expression. However, as SCEV validation is used during SCoP detection
where we want to be especially fast, adding additional functionality on this
hot path should be avoided if good alternatives exist. In this case, we can
choose to continue to only transform SCEV expression when actually modeling
them. As all transformations we perform are expected to not change the validity
of the SCEV expressions, this solution seems preferable.
Suggested-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286780
Commit r286294 introduced support for inaccessiblememonly and
inaccessiblemem_or_argmemonly attributes to BasicAA, which we need to
support to avoid undefined behavior. This change just refuses all calls
which are annotated with these attributes, which is conservatively correct.
In the future we may consider to model and support such function calls
in Polly.
llvm-svn: 286771
The validity of a branch condition must be verified at the location of the
branch (the branch instruction), not the location of the icmp that is
used in the branch instruction. When verifying at the wrong location, we
may accept an icmp that is defined within a loop which itself dominates, but
does not contain the branch instruction. Such loops cannot be modeled as
we only introduce domain dimensions for surrounding loops. To address this
problem we change the scop detection to evaluate and verify SCEV expressions at
the right location.
This issue has been around since at least r179148 "scop detection: properly
instantiate SCEVs to the place where they are used", where we explicitly
set the scope to the wrong location. Before this commit the scope
was not explicitly set, which probably also resulted in the scope around the
ICmp to be choosen.
This resolves http://llvm.org/PR30989
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286769
Assumptions can either be added for a given basic block, in which case the set
describing the assumptions is expected to match the dimensions of its domain.
In case no basic block is provided a parameter-only set is expected to describe
the assumption.
The piecewise expressions that are generated by the SCEVAffinator sometimes
have a zero-dimensional domain (e.g., [p] -> { [] : p <= -129 or p >= 128 }),
which looks similar to a parameter-only domain, but is still a set domain.
This change adds an assert that checks that we always pass parameter domains to
addAssumptions if BB is empty to make mismatches here fail early.
We also change visitTruncExpr to always convert to parameter sets, if BB is
null. This change resolves http://llvm.org/PR30941
Another alternative to this change would have been to inspect all code to make
sure we directly generate in the SCEV affinator parameter sets in case of empty
domains. However, this would likely complicate the code which combines parameter
and non-parameter domains when constructing a statement domain. We might still
consider doing this at some point, but as this likely requires several non-local
changes this should probably be done as a separate refactoring.
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286444
Providing the context to the ast generator allows for additional simplifcations
and -- more importantly -- allows to generate loops with only partially bounded
domains, assuming the domains are bounded for all parameter configurations
that are valid as defined by the context.
This change fixes the crash reported in http://llvm.org/PR30956
The original reason why we did not include the context when generating an
AST was that CLooG and later isl used to sometimes transfer some of the
constraints that bound the size of parameters from the context into the
generated AST. This resulted in operations with very large constants, which
sometimes introduced problematic integer overflows. The latest versions of
the isl AST generator are careful to not introduce such constants.
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286442
When extracting constant expressions out of SCEVs, new parameters may be
introduced, which have not been registered before. This change scans
SCEV expressions after constant extraction again to make sure newly
introduced parameters are registered.
We may for example extract the constant '8' from the expression '((8 * ((%a *
%b) + %c)) + (-8 * %a))' and obtain the expression '(((-1 + %b) * %a) + %c)'.
The new expression has a new parameter '(-1 + %b) * %a)', which was not
registered before, but must be registered to not crash.
This closes http://llvm.org/PR30953
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286430
In r248701 "Allow switch instructions in SCoPs" support for switch statements
has been introduced, but support for switch statements in loop latches was
incomplete. This change completely disables switch statements in loop latches.
The original commit changed addLoopBoundsToHeaderDomain to support non-branch
terminator instructions, but this change was incorrect: it added a check for
BI != null to the if-branch of a condition, but BI was used in the else branch
es well. As a result, when a non-branch terminator instruction is encounted a
nullptr dereference is triggered. Due to missing test coverage, this bug was
overlooked.
r249273 "[FIX] Approximate non-affine loops correctly" added code to disallow
switch statements for non-affine loops, if they appear in either a loop latch
or a loop exit. We adapt this code to now prohibit switch statements in
loop latches even if the control condition is affine.
We could possibly add support for switch statements in loop latches, but such
support should be evaluated and tested separately.
This fixes llvm.org/PR30952
Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286426
Add asserts that verify that the memory accesses of a new copy statement
are defined for all domain instances the copy statement is defined for.
llvm-svn: 286047
This makes polly generate a CFG which is closer to what we want
in LLVM IR, with a loop preheader for the original loop. This is
just a cleanup, but it exposes some fragile assumptions.
I'm not completely happy with the changes related to expandCodeFor;
RTCBB->getTerminator() is basically a random insertion point which
happens to work due to the way we generate runtime checks. I'm not
sure what the right answer looks like, though.
Differential Revision: https://reviews.llvm.org/D26053
llvm-svn: 285864
We don't actually check whether a MemoryAccess is affine in very many
places, but one important one is in checks for aliasing.
Differential Revision: https://reviews.llvm.org/D25706
llvm-svn: 285746
When adding an llvm.memcpy instruction to AliasSetTracker, it uses the raw
source and target pointers which preserve bitcasts.
MemAccInst::getPointerOperand() also returns the raw target pointers, but
Scop::buildAliasGroups() did not for the source pointer. This lead to mismatches
between AliasSetTracker and ScopInfo on which pointer to use.
Fixed by also using raw pointers in Scop::buildAliasGroups().
llvm-svn: 285071
Integer math in LLVM IR is modular. Integer math in isl is
arbitrary-precision. Modeling LLVM IR math correctly in isl requires
either adding assumptions that math doesn't actually overflow, or
explicitly wrapping the math. However, expressions with the "nsw" flag
are special; we can pretend they're arbitrary-precision because it's
undefined behavior if the result wraps. SCEV expressions based on IR
instructions with an nsw flag also carry an nsw flag (roughly; actually,
the real rule is a bit more complicated, but the details don't matter
here).
Before this patch, SCEV flags were also overloaded with an additional
function: the ZExt code was mutating SCEV expressions as a hack to
indicate to checkForWrapping that we don't need to add assumptions to
the operand of a ZExt; it'll add explicit wrapping itself. This kind of
works... the problem is that if anything else ever touches that SCEV
expression, it'll get confused by the incorrect flags.
Instead, with this patch, we make the decision about whether to
explicitly wrap the math a bit earlier, basing the decision purely on
the SCEV expression itself, and not its users.
Differential Revision: https://reviews.llvm.org/D25287
llvm-svn: 284848
Summary: Otherwise the lack of an iteration order results in non-determinism in codegen.
Reviewers: _jdoerfert, zinob, grosser
Tags: #polly
Differential Revision: https://reviews.llvm.org/D25863
llvm-svn: 284845
Apply the __attribute__((unused)) before the function to unambiguously apply to
the function declaration.
Add more casts-to-void to mark return values unused as intended.
Contributed-by: Andy Gibbs <andyg1001@hotmail.co.uk>
llvm-svn: 284718
Summary: Iterating over SeenBlocks which is a SmallPtrSet results in non-determinism in codegen
Reviewers: jdoerfert, zinob, grosser
Tags: #polly
Differential Revision: https://reviews.llvm.org/D25778
llvm-svn: 284622
Under some conditions MK_Value read accessed where converted to MK_ExitPHI read
accessed. This is unexpected because MK_ExitPHI read accesses are implicit after
the scop execution. This behaviour was introduced in r265261, which fixed a
failed assertion/crash in CodeGen.
Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(),
despite its name, also handles accesses of kind MK_Value, only to skip them
because they access values that are usually not PHI nodes in the SCoP region's
exit block. Except in the situation observed in r265261.
Do not convert value accessed to ExitPHI accesses and do not handle
value accesses like ExitPHI accessed in CodeGen anymore.
llvm-svn: 284023
ISL tries to simplify the polyhedral operations before printing its objects.
This increases the operations counter and therefore can contribute to hitting
the operations limit. Therefore the result could be different when -debug output
is enabled, making debugging harder.
llvm-svn: 283745
IslMaxOperationsGuard defines a scope where ISL may abort operations because if
it takes too many operations. Replace the call to the raw ISL interface by a
use of the guard.
IslMaxOperationsGuard provides a uniform way to define a maximal computation
time for a code region in C++ using RAII.
llvm-svn: 283744
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:
va_start(ValueArgs, Desc);
with Desc being a StringRef.
Differential Revision: https://reviews.llvm.org/D25342
llvm-svn: 283671
Handle MSVC, ISL and PPCG in one place. The only functional change is that
warnings are also disabled for MSVC compiling PPCG (Which currently fails
anyway).
llvm-svn: 283547
Folders in Visual Studio solutions help organize the build artifacts from all
LLVM projects. There is a folder to keep Polly-built files in.
llvm-svn: 283546
Running isl tests is important to gain confidence that the isl build we created
works as expected. Besides the actual isl tests, there are also isl AST
generation tests shipped with isl. This change only adds support for the isl
unit tests. AST generation test support is left for a later commit.
There is a choice to run tests directly through the build system or in the
context of lit. We choose to run tests as part of lit to as this allows us to
easily set environment variables, print output only on error and generally run
the tests directly from the lit command.
Reviewers: brad.king, Meinersbur
Subscribers: modocache, brad.king, pollydev, beanz, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D25155
llvm-svn: 283245
With this option one can disable the heuristic that assumes that statements with
a scalar write access cannot be profitably optimized. Such a statement instances
necessarily have WAW-dependences to itself. With DeLICM scalar accesses can be
changed to array accesses, which can avoid these WAW-dependence.
llvm-svn: 283233
ScopArrayInfo used to determine base pointer origins by looking up whether the
base pointer is a load. The "base pointer" for scalar accesses is the
llvm::Value being accessed. This is only a symbolic base pointer, it
represents the alloca variable (.s2a or .phiops) generated for it at code
generation.
This patch disables determining base pointer origin for scalars.
A test case where this caused a crash will be added in the next commit. In that
test SAI tried to get the origin base pointer that was only declared later,
therefore not existing. This is probably only possible for scalars used in
PHINode incoming blocks.
llvm-svn: 283232
Currently Polly cannot generate code for index expressions if the base pointer
is computed within the scop. The base pointer must be generated as well, but
there is no code that triggers that.
Add an assertion to detect when this would occur and miscompile. The IR verifier
should catch it as well.
llvm-svn: 282893
gcc 5.4 insists on template specialization to be in a namespace polly { ... }
block, instead of being prefixed with 'polly::'. Error message:
root/src/llvm/tools/polly/lib/Support/GICHelper.cpp:203:54: error: specialization of ‘template<class T> void polly::IslPtr<T>::dump() const’ in different namespace [-fpermissive]
template <> void polly::IslPtr<isl_##TYPE>::dump() const { \
^
msvc14 and clang 3.8 did not complain.
llvm-svn: 282874
The dump() methods can be called from a debugger instead of e.g.
isl_*_dump(Var.Obj)
where Var is a variable of type IslPtr/NonowningIslPtr. To ensure that the
existence of the function pointers do not depdend on whether the methods are
used somwhere, they are declared with external linkage.
llvm-svn: 282870
generateScalarLoad() and generateScalarStore() are used for explicit (MK_Array)
memory accesses, therefore the method names were misleading. The names also
were similar to generateScalarLoads() and generateScalarStores() (plural forms)
which indeed handle scalar accesses. Presumbly, they were originally named to
contrast VectorBlockGenerator::generateLoad().
Rename the two methods to generateArrayLoad(),
respectively generateArrayStore().
llvm-svn: 282861
The code generator always adds unconditional LoadInst and StoreInst, hence the
MemoryAccess must be defined over all statement instances.
llvm-svn: 282853
Summary:
Both `canUseISLTripCount()` and `addOverApproximatedRegion()` contained checks
to reject endless loops which are now removed and replaced by a single check
in `isValidLoop()`.
For reporting such loops the `ReportLoopOverlapWithNonAffineSubRegion` is
renamed to `ReportLoopHasNoExit`. The test case
`ReportLoopOverlapWithNonAffineSubRegion.ll` is adapted and renamed as well.
The schedule generation in `buildSchedule()` is based on the following
assumption:
Given some block B that is contained in a loop L and a SESE region R,
we assume that L is contained in R or the other way around.
However, this assumption is broken in the presence of endless loops that are
nested inside other loops. Therefore, in order to prevent erroneous behavior
in `buildSchedule()`, r265280 introduced a corresponding check in
`canUseISLTripCount()` to reject endless loops. Unfortunately, it was possible
to bypass this check with -polly-allow-nonaffine-loops which was fixed by adding
another check to reject endless loops in `allowOverApproximatedRegion()` in
r273905. Hence there existed two separate locations that handled this case.
Thank you Johannes Doerfert for helping to provide the above background
information.
Reviewers: Meinersbur, grosser
Subscribers: _jdoerfert, pollydev
Differential Revision: https://reviews.llvm.org/D24560
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 281987
In case sequential kernels are found deeper in the loop tree than any parallel
kernel, the overall scop is probably mostly sequential. Hence, run it on the
CPU.
llvm-svn: 281849
Offloading to a GPU is only beneficial if there is a sufficient amount of
compute that can be accelerated. Many kernels just have a very small number
of dynamic compute, which means GPU acceleration is not beneficial. We
compute at run-time an approximation of how many dynamic instructions will be
executed and fall back to CPU code in case this number is not sufficiently
large. To keep the run-time checking code simple, we over-approximate the
number of instructions executed in each statement by computing the volume of
the rectangular hull of its iteration space.
llvm-svn: 281848
We may generate GPU kernels that store into scalars in case we run some
sequential code on the GPU because the remaining data is expected to already be
on the GPU. For these kernels it is important to not keep the scalar values
in thread-local registers, but to store them back to the corresponding device
memory objects that backs them up.
We currently only store scalars back at the end of a kernel. This is only
correct if precisely one thread is executed. In case more than one thread may
be run, we currently invalidate the scop. To support such cases correctly,
we would need to always load and store back from a corresponding global
memory slot instead of a thread-local alloca slot.
llvm-svn: 281838
Our alias checks precisely check that the minimal and maximal accessed elements
do not overlap in a kernel. Hence, we must ensure that our host <-> device
transfers do not touch additional memory locations that are not covered in
the alias check. To ensure this, we make sure that the data we copy for a
given array is only the data from the smallest element accessed to the largest
element accessed.
We also adjust the size of the array according to the offset at which the array
is actually accessed.
An interesting result of this is: In case array are accessed with negative
subscripts ,e.g., A[-100], we automatically allocate and transfer _more_ data to
cover the full array. This is important as such code indeed exists in the wild.
llvm-svn: 281611
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D23260
llvm-svn: 281441
This line makes BUILD_SHARED_LIBS=ON work for Polly-ACC. Without it, ld
complains about missing isl symbols when constructing the shared library.
llvm-svn: 281396
The alias to the array element is read-only and a primitive type (pointer),
therefore use the value directly instead of a reference to it.
llvm-svn: 281311
The flag -fvisibility=hidden flag was used for the integrated Integer
Set Library (and PPCG) to keep their definitions local to Polly. The
motivation was the be loaded into a DragonEgg-powered GCC, where GCC
might itself use ISL for its Graphite extension. The symbols of Polly's
ISL and GCC's ISL would clash.
The DragonEgg project is not actively developed anymore, but Polly's
unittests need to call ISL functions to set up a testing environment.
Unfortunately, the -fvisibility=hidden flag means that the ISL symbols
are not available to the gtest executable as it resides outside of
libPolly when linked dynamically. Currently, CMake links a second copy
of ISL into the unittests which leads to subtle bugs. What got observed
is that two isl_ids for isl_id_none exist, one for each library
instance. Because isl_id's are compared by address, isl_id_none could
happen to be different from isl_id_none, depending on which library
instance set the address and does the comparison.
Also remove the FORCE_STATIC flag which was introduced to keep the ISL
symbols visible inside the same libPolly shared object, even when build
with BUILD_SHARED_LIBS.
Differential Revision: https://reviews.llvm.org/D24460
llvm-svn: 281242
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D23991
llvm-svn: 281234
Instead of aborting, we now bail out gracefully in case the kernel IR we
generate is invalid. This can currently happen in case the SCoP stores
pointer values, which we model as arrays, as data values into other arrays. In
this case, the original pointer value is not available on the device and can
consequently not be stored. As detecting this ahead of time is not so easy, we
detect these situations after the invalid IR has been generated and bail out.
llvm-svn: 281193
If these arrays have never been accessed we failed to derive an upper bound
of the accesses and consequently a size for the outermost dimension. We
now explicitly check for empty access sets and then just use zero as size
for the outermost dimension.
llvm-svn: 281165
The -polly-flatten-schedule pass reduces the number of scattering
dimensions in its isl_union_map form to make them easier to understand.
It is not meant to be used in production, only for debugging and
regression tests.
To illustrate, how it can make sets simpler, here is a lifetime set
used computed by the porposed DeLICM pass without flattening:
{ Stmt_reduction_for[0, 4] -> [0, 2, o2, o3] : o2 < 0;
Stmt_reduction_for[0, 4] -> [0, 1, o2, o3] : o2 >= 5;
Stmt_reduction_for[0, 4] -> [0, 1, 4, o3] : o3 > 0;
Stmt_reduction_for[0, i1] -> [0, 1, i1, 1] : 0 <= i1 <= 3;
Stmt_reduction_for[0, 4] -> [0, 2, 0, o3] : o3 <= 0 }
And here the same lifetime for a semantically identical one-dimensional
schedule:
{ Stmt_reduction_for[0, i1] -> [2 + 3i1] : 0 <= i1 <= 4 }
Differential Revision: https://reviews.llvm.org/D24310
llvm-svn: 280948
... to preserve reference counting logic.
In practice the missing assignment would not have caused any issues. We still
fix it as the code is wrong and it also causes noise in the clang static
analysis runs.
llvm-svn: 280946
When running the clang static analyser to check for memory issues, this code
originally showed a double free, as the analyser was unable to understand that
isl_set_free always returns NULL and consequently later uses of the isl object
we just freed will never be reached. Without this knowledge, the analyser has
to issue a warning.
We refactor the code to make it clear that for empty maps the current loop
iteration is aborted.
llvm-svn: 280940