This matches the behavior we already have in lib/Codegen/CodeGeneration.cpp and
makes sure that we fall back to the original code. It seems when invariant load
hoisting was introduced to the GPGPU backend we missed to reset the RTC flag,
such that kernels where invariant load hoisting failed executed the 'optimized'
SCoP, which however is set to a simple 'unreachable'. Unsurprisingly, this
results in hard to debug issues that are a lot of fun to debug.
llvm-svn: 314624
Such RTCs may introduce integer wrapping intrinsics with more than 64 bit,
which are translated to library calls on AOSP that are not part of the
runtime and will consequently cause linker errors.
Thanks to Eli Friedman for reporting this issue and reducing the test case.
llvm-svn: 314065
Since -polly-codegen reports itself to preserve DependenceInfo and IslAstInfo,
we might get those analysis that were computed by a different ScopInfo for a
different Scop structure. This would be unfortunate because DependenceInfo and
IslAstInfo hold references to resources allocated by
ScopInfo/ScopBuilder/Scop (e.g. isl_id). If -polly-codegen and
DependenceInfo/IslAstInfo do not agree on which Scop to use, unpredictable
things can happen.
When the ScopInfo/Scop object is freed, there is a high probability that the
new ScopInfo/Scop object will be created at the same heap position with the
same address. Comparing whether the Scop or ScopInfo address is the expected
therefore is unreliable.
Instead, we compare the address of the isl_ctx object. Both, DependenceInfo
and IslAstInfo must hold a reference to the isl_ctx object to ensure it is
not freed before the destruction of those analyses which might happen after
the destruction of the Scop/ScopInfo they refer to. Hence, the isl_ctx
will not be freed and its address not reused as long there is a
DependenceInfo or IslAstInfo around.
This fixes llvm.org/PR34441
llvm-svn: 313842
Update CodegenCleanup using the function-level passes added by
populatePassManager that run between EP_EarlyAsPossible and
EP_VectorizerStart in -O3.
The changes in particular are:
- Added pass create arguments, e.g. ExpensiveCombines for InstCombine.
- Remove reroll pass. The option -reroll-loops is disabled by default.
- Add passes run with UnitAtATime, which is the default.
- Add instances of LibCallsShrinkWrap, TailCallElimination, SCCP
(sparse conditional constant propagation), Float2Int
that did not run before.
- Add instances of GVN as in the default pipeline.
Notes:
- GVNHoist, GVNSink, NewGVN are still disabled in the -O3 pipeline.
- The optimization level and other optimization parameters are not
accessible outside of PassManagerBuilder, hence we cannot add passes
depending on these.
Differential Revision: https://reviews.llvm.org/D37571
llvm-svn: 312875
The type of NewValue might change due to ScalarEvolution
looking though bitcasts. The synthesized NewValue therefore
becomes the type before the bitcast.
llvm-svn: 312718
In certain situations, the context in the isl_ast_build could result for the
min/max locations of our alias sets to become empty, which would cause an
internal error in isl, which is then unable to derive a value for these
expressions. Check these conditions before code generating expressions and
instead assume that alias check succeeded. This is valid, as the corresponding
memory accesses will not be executed under any valid context.
This fixed llvm.org/PR34432. Thanks to Qirun Zhang for reporting.
llvm-svn: 312455
In Polly, we specifically add a paramter to represent the outermost dimension
size of fortran arrays. We do this because this information is statically
available from the fortran metadata generated by dragonegg.
However, we were only materializing these parameters (meaning, creating an
llvm::Value to back the isl_id) from *memory accesses*. This is wrong,
we should materialize parameters from *scop array info*.
It is wrong because if there is a case where we detect 2 fortran arrays,
but only one of them is accessed, we may not materialize the other array's
dimensions at all.
This is incorrect. We fix this by looping over all
`polly::ScopArrayInfo` in a scop, rather that just all `polly::MemoryAccess`.
Differential Revision: https://reviews.llvm.org/D37379
llvm-svn: 312350
Currently, GVN can be necessary to eliminate redundant instructions in case
of, for instance, GEMM and float type. This patch makes GVN be run during
the cleanup.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D37340
llvm-svn: 312307
This is useful when we face certain intrinsics such as `llvm.exp.*`
which cannot be lowered by the NVPTX backend while other intrinsics can.
So, we would need to keep blacklists of intrinsics that cannot be
handled by the NVPTX backend. It is much simpler to try and promote
all intrinsics to libdevice versions.
This patch makes function/intrinsic very uniform, and will always try to use
a libdevice version if it exists.
Differential Revision: https://reviews.llvm.org/D37056
llvm-svn: 312239
The adds code generation support for the previous commit.
This patch has been re-applied, after the memory issue in the previous patch
has been fixed.
llvm-svn: 312211
Properly require and preserve the OptimizationRemarkEmitter for use in
ScopPass. Previously one had to get the ORE from ScopDetection because
CodeGeneration did not mark it as preserved. It would need to be
recomputed which results in the legacy PM to throw away all previous
SCoP analysis.
This also changes the implementation of ScopPass::getAnalysisUsage to
not unconditionally preserve all passes, but only those needed to be
preserved by any SCoP pass (at least when using the legacy PM). This
allows invalidating DependenceInfo (and IslAstInfo) in case the pass
would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion)
JSONImporter should also invalidate the DependenceInfo. In this patch
it marks DependenceInfo as preserved anyway because some regression
tests depend on it.
Differential Revision: https://reviews.llvm.org/D37010
llvm-svn: 311888
Whether a partial write is tautological/unsatisfiable not only
depends on the access domain, but also on the domain covered
by its node in the AST.
In the example below, there are two instances of Stmt_cond_false. It may have a partial write access that is not executed in instance Stmt_cond_false(0).
for (int c0 = 0; c0 < tmp5; c0 += 1) {
Stmt_for_body344(c0);
if (tmp5 >= c0 + 2)
Stmt_cond_false(c0);
Stmt_cond_end(c0);
}
if (tmp5 <= 0) {
Stmt_for_body344(0);
Stmt_cond_false(0);
Stmt_cond_end(0);
}
Isl cannot derive a subscript for an array element that is never accessed.
This caused an error in that no subscript expression has been generated
in IslNodeBuilder::createNewAccesses, but BlockGenerator expected one
to exist because there is an execution of that write, just not in that
ast node.
Fixed by instead of determining whether the access domain is empty,
inspect whether isl generated a constant "false" ast expression in
the current ast node.
This should fix a compiler crash of the aosp buildbot.
llvm-svn: 311663
This is a stylistic change to make the function a little more readable.
Also add a debug print to show what instruction contains a use of a
function we don't understand in the kernel.
Differential Revision: https://reviews.llvm.org/D37058
llvm-svn: 311648
Add statistics about
- Which optimizations are applied
- Number of loops in Scops at various stages
- Number of scalar/singleton writes at various stages representative
for scalar false dependencies
- Number of parallel loops
These will be useful to find regressions due to moving Polly further
down of LLVM's pass pipeline.
Differential Revision: https://reviews.llvm.org/D37049
llvm-svn: 311553
MSVC warns about comparison between a signed and unsigned integer.
The rules of C(++) define that an unsigned comparison has to be
carried-out in this case. This is unlikely to be intended.
Fix by assigning the loop's upper bound to a signed integer first.
This also avoids repeated evaluation of the invariant upper bound.
llvm-svn: 311548
Summary:
There is no need to emit alias metadata for scalars, as basicaa will easily
distinguish them from arrays. This reduces the size of the metadata we generate.
This is especially useful after we moved to -polly-position=before-vectorizer,
where a lot more scalar dependences are introduced, which increased the size of
the alias analysis metadata and made us commonly reach the limits after which
we do not emit alias metadata that have been introduced to prevent quadratic
growth of this alias metadata.
This improves 2mm performance from 1.5 seconds to 0.17 seconds.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: Meinersbur
Subscribers: pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D37028
llvm-svn: 311498
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D36928
llvm-svn: 311473
This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.
Differential Revision: https://reviews.llvm.org/D36934
llvm-svn: 311328
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.
This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.
llvm-svn: 311268
Instead of using Twines and temporary expressions, we do string manipulation
through a std::string. This resolves a memory corruption issue, which likely
was caused by twines loosing their underlying string too soon.
llvm-svn: 311264
- We should iterate over `I`, which is `Cur` expanded out to an
instruction, and not `Cur` itself.
- This is a bugfix.
Differential Revision: https://reviews.llvm.org/D36923
llvm-svn: 311261
Summary:
This information is necessary for PPCG to perform correct life range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.
We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36929
llvm-svn: 311259
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.
Differential revision: D36925
llvm-svn: 311248
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.
llvm-svn: 311239
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.
This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36869
llvm-svn: 311161
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36868
llvm-svn: 311157
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.
Even though the above motivation would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36867
llvm-svn: 311155
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:
```
call void \
bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```
- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.
Differential Revision: https://reviews.llvm.org/D36825
llvm-svn: 311121
- If we have global arrays, we would like to rewrite them to global
pointers which are allocated using `cudaMallocManaged`.
- If we have allocas in a function, we would like to rewrite them to
heap-allocations with `cudaMallocManaged` and `cudaFree`.
- With these rewrite mechanisms, we can offload _any_ function to the
GPU with no code rewrite whatsover.
Differential Revision: https://reviews.llvm.org/D36516
llvm-svn: 311080
Summary:
During code generation for a Scop we modify the IR of a function.
While this shouldn't affect a Scop in the formal sense, the implementation
caches various information about the IR such as SCEV expressions for bounds or
parameters. This cached information needs to be updated or invalidated. To this
end, SPMUpdater allows passes to report when they've invalidated a Scop to the
PassManager, which will then flush and recompute all Scops. This in turn
invalidates all iterators, so references to Scops shouldn't be held.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: grosser
Subscribers: llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D36524
llvm-svn: 310551
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.
Currently, rewrite malloc and free to the `polly_{malloc,free}Managed` variants.
A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.
Differential Revision: https://reviews.llvm.org/D36513
llvm-svn: 310471
Codegen with -polly-parallel queried the unmapped MemoryAccess, but only
the MemoryKind after mapping is relevant for codegen.
This should fix various fails of the
perf-x86_64-penryn-O3-polly-parallel-fast buildbot.
llvm-svn: 310466
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.
To fix this, ask data layout what the size in bytes should be.
Differential Revision: https://reviews.llvm.org/D36459
llvm-svn: 310448
We introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. To distinguish two
accesses, the comparison of raw pointers representing base pointers is used.
In case of, for example, ublas's prod function that implements GEMM, and
DeLiCM we can get accesses to same location represented by different raw
pointers. Consequently, we create different alias sets that can prevent
accesses from, for example, being sinked or hoisted.
To avoid the issue, we compare the corresponding SCEV information instead
of the corresponding raw pointers.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D35761
llvm-svn: 310380
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
Differential Revision: https://reviews.llvm.org/D36457
llvm-svn: 310350
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D36372
llvm-svn: 310196
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.
The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.
Differential Revision: https://reviews.llvm.org/D36290
llvm-svn: 310193
This logic is duplicated, so we refactor it into a separate function.
This will be used in a later patch to teach PPCGCodeGen code generation
for loops that are outside the scop.
Differential Revision: https://reviews.llvm.org/D36310
llvm-svn: 310192
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.
To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.
Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36243
llvm-svn: 309939
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.
The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.
We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.
llvm-svn: 309934
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**
When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.
The only debug metadata of the instruction, the DebugLoc, is unset by this patch.
This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU.
Reviewers: grosser, bollu
Reviewed By: grosser
Subscribers: pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36161
llvm-svn: 309822
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.
Differential Revision: https://reviews.llvm.org/D36001
llvm-svn: 309681
We populate `IslNodeBuilder::ValueMap` which contains replacements for
`llvm::Value`s. There was no simple method to pick up a replacement if
it exists, otherwise fall back to the original.
Create a method `IslNodeBuilder::getLatestValue` which provides this
functionality.
This will be used in a later patch to fix bugs in `PPCGCodeGeneration`
where the latest value is not being used.
Differential Revision: https://reviews.llvm.org/D36000
llvm-svn: 309674
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.
Reviewers: bollu, singam-sanjay
Reviewed By: bollu
Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35703
llvm-svn: 309560
This reverts commit r309490 as it triggers on our AOSP buildbut error messages
of the form:
inlinable function call in a function with debug info must have a !dbg location
llvm-svn: 309556
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**
When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.
The only debug metadata of the instruction, the DebugLoc, is unset by this patch.
Reviewers: grosser, bollu, Meinersbur
Reviewed By: grosser, bollu
Subscribers: pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D35630
llvm-svn: 309490
In future, there will be no more a 1:1 correspondence between statements
and basic blocks, the name `contains` does not correctly capture their
relationship. A BB may infact comprise of multiple statements; hence we
describe a statement 'representing' a basic block.
Differential Revision: https://reviews.llvm.org/D35838
llvm-svn: 308982
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.
Invariant load hoisted scalars are sent to the kernel directly as parameters.
Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.
Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.
Differential Revision: https://reviews.llvm.org/D35795
llvm-svn: 308971