"using namespace" pollutes the namespace of every file that includes
such a header and universally considered a bad thing. Even the variant
namespace polly {
using namespace llvm;
}
(previously used by LoopGenerators.h) imports more symbols than the file
is in control of. The header may include a fixed set of files from LLVM,
but the header itself may by be included together with other headers
from LLVM. For instance, LLVM's MemorySSA.h and Polly's ScopInfo.h both
declare a class 'MemoryAccess' which may conflict.
Instead of prefixing everything in Polly's header files, this patch adds
'using' statements to import only the symbols that are actually
referenced in Polly. This approach is also used by MLIR to import
commonly used symbols into the mlir namespace.
This patch also puts the symbols declared in IslNodeBuilder.h into the
Polly namespace to also be able to use the imported symbols.
In particular, print the ast with -debug-only=polly-ast, print a
per-scop header with print<polly-ast> and force-add the analysis with
-polly-code-generation=ast.
In addition to that regression tests should not test the intire pass
pipeline (unless they are testing the pipeline itself), the Polly-ACC
currently does not support the new pass manager. If enabled by default,
such tests will therefore fail.
Use the -polly-gpu-runtime and -polly-gpu-arch options also as default
values for the PPCGCodeGeneration pass. This requires to move the option
to be moved from the pipeline-building Register passes to the
PPCGCodeGeneration implementation.
Fixes the spir-typesize.ll buildbot fail.
ScalarEvolution::getSCEV cannot be used during codegen. ScalarEvolution
assumes a stable IR and control flow which is under construction during
Polly's CodeGen. In particular, it uses DominatorTree for compute the
backedge taken count. However the DominatorTree is not updated during
codegen.
In this case, SCEV was used to determine the base pointer of an array
access. Replace it by our own function. Polly generates only GEP and
BitCasts for array acceses, i.e. it is sufficient to handle these to to
find the base pointer.
Fixes llvm.org/PR48422
There's a small number of users of this function, they are all updated.
This updates the C API adding a new method LLVMGetTypeByName2 that takes a context and a name.
Differential Revision: https://reviews.llvm.org/D78793
This is a long-delayed follow-up to
5e5b85098d.
`TempMDNode` includes a bunch of machinery for RAUW, and should only be
used when necessary. RAUW wasn't being used in any of these cases... it
was just a placeholder for a self-reference.
Where the real node was using `MDNode::getDistinct`, just replace the
temporary argument with `nullptr`.
Where the real node was using `MDNode::get`, the `replaceOperandWith`
call was "promoting" the node to a distinct one implicitly due to
self-reference detection in `MDNode::handleChangedOperand`. The
`TempMDNode` was serving a purpose by delaying uniquing, but it's way
simpler to just call `MDNode::getDistinct` in the first place.
Note that using a self-reference at all in these places is a hold-over
from before `distinct` metadata existed. It was an old trick to create
distinct nodes. It would be intrusive to change, including bitcode
upgrades, etc., and it's harmless so I'm not sure there's much value in
removing it from existing schemas. After this commit it still has a tiny
memory cost (in the extra metadata operand) but no more overhead in
construction.
Differential Revision: https://reviews.llvm.org/D90079
getVectorPtrTy is private to VectorBlockGenerator, and all uses query
the address space from the passed-in pointer prior to calling it.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D89745
Polly incorrectly dropped the address space specified for a load instruction when it vectorized the code.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D88907
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
Simply dropping the createPollyIRBuilder() function here, because
it doesn't do much. Also directly initialize Expander in
ScopExpander instead of going through the copy-constructor.
Static chunked OpenMP scheduling has not been treated correctly.
This patch fixes the problem that threads would not process their
(work-)chunks as intended.
Differential Revision: https://reviews.llvm.org/D61081
The primary motivation is to fix an assertion failure in
isl_basic_map_alloc_equality:
isl_assert(ctx, room_for_con(bmap, 1), return -1);
Although the assertion does not occur anymore, I could not identify
which of ISL's commits fixed it.
Compared to the previous ISL version, Polly requires some changes for this update
* Since ISL commit
20d3574 "perform parameter alignment by modifying both arguments to function"
isl_*_gist_* and similar functions do not always align the paramter
list anymore. This caused the parameter lists in JScop files to
become out-of-sync. Since many regression tests use JScop files with
a fixed parameter list and order, we explicitly call align_params to
ensure a predictable parameter list.
* ISL changed some return types to isl_size, a typedef of (signed) int.
This caused some issues where the return type was unsigned int before:
- No overload for std::max(unsigned,isl_size)
- It cause additional 'mixed signed/unsigned comparison' warnings.
Since they do not break compilation, and sizes larger than 2^31
were never supported, I am going to fix it separately.
* With the change to isl_size, commit
57d547 "isl_*_list_size: return isl_size"
also changed the return value in case of an error from 0 to -1. This
caused undefined looping over isl_iterator since the 'end iterator'
got index -1, never reached from the 'begin iterator' with index 0.
* Some internal changes in ISL caused the number of operations to
increase when determining access ranges to determine aliasing
overlaps. In one test, this caused exceeding the default limit of
800000. The operations-limit was disabled for this test.
Commit 777180a "[ADT] Make StringRef's std::string conversion operator explicit"
caused Polly's GPU code generator to not compile anymore. The rest of
Polly has already been fixed in commit
0257a9 "Fix polly build after StringRef change."
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
Adapt for 05da2fe521 "Sink all InitializePasses.h includes" which
forgot the GPGPU files (presumably because POLLY_ENABLE_GPGPU_CODEGEN
is OFF by default).
Avoids the need to include TargetMachine.h from various places just for
an enum. Various other enums live here, such as the optimization level,
TLS model, etc. Data suggests that this change probably doesn't matter,
but it seems nice to have anyway.
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
Root cause is VectorBlockGenerator::copyStmt iterates all instructions
in basic block, however some load instructions may be not unnecessary
thus removed by simplification. As a result, these load instructions
don't have a corresponding array.
Looking at BlockGenerator::copyBB, it only iterates instructions list
of ScopStmt. Given it must be a block type scop in case of
vectorization, I think we should do the same in
VectorBlockGenerator::copyStmt.
Patch by bin.narwal <bin.narwal@gmail.com>
Differential Revision: https://reviews.llvm.org/D70076
Since the removal of extensions nodes from schedule trees in r362257 it
is possible to emit parallel code for SCoPs containing
matrix-multiplications. However, the code looking for references used in
outlined statement was not prepared to handle CopyStmts introduced by
the matrix-matrix multiplication detection.
In this case, CopyStmts do not introduce references in addition to the
ones captured by MemoryAccesses, i.e. we change the assertion to accept
CopyStmts and add a regression test for this case.
This fixes llvm.org/PR43164
llvm-svn: 372188
Extension nodes make schedule trees are less flexible: Many operations,
such as rescheduling, do not work on such schedule trees with extension.
As such, some functionality such as determining parallel loops in isl's
AST are disabled.
Currently, only the pattern-matching generalized matrix-matrix
multiplication optimization adds extension nodes (to add copy-in
statements).
This patch removes all extension nodes as the last step of the schedule
optimization by hoisting the extension node's added domain up to the
root domain node. All following passes can assume that schedule trees
work without restrictions, including the parallelism test. Mark the
outermost loop of the optimized matrix-matrix multiplication as parallel
such that -polly-parallel is able to parallelize that loop.
Differential Revision: https://reviews.llvm.org/D58202
llvm-svn: 362257
At the end of a region statement, the PHINode must be generated
while the current IRBuilder's block is the region's exit node. For
obvious reasons: The PHINode references the region's exiting block.
A partial write would insert new control flow, i.e. insert new basic
blocks between the exiting blocks and the current block.
We fix this by generating the PHI nodes (region exit values) before
generating any MemoryAccess's stores.
This should fix the AOSP buildbot.
Reported-by: Eli Friedman <efriedma@quicinc.com>
llvm-svn: 361204
This removes unused includes (and forward declarations) as
suggested by include-what-you-use. If a transitive include of a removed
include is required to compile a file, I added the required header (or
forward declaration if suggested by include-what-you-use).
This should reduce compilation time and reduce the number of iterative
recompilations when a header was changed.
llvm-svn: 357209
The ParallelLoopGenerator class is changed such that GNU OpenMP specific
code was removed, allowing to use it as super class in a
template-pattern. Therefore, the code has been reorganized and one may
not use the ParallelLoopGenerator directly anymore, instead specific
implementations have to be provided. These implementations contain the
library-specific code. As such, the "GOMP" (code completely taken from
the existing backend) and "KMP" variant were created.
For "check-polly" all tests that involved "GOMP": equivalents were added
that test the new functionalities, like static scheduling and different
chunk sizes. "docs/UsingPollyWithClang.rst" shows how the alternative
backend may be used.
Patch by Michael Halkenhäuser <michaelhalk@web.de>
Differential Revision: https://reviews.llvm.org/D59100
llvm-svn: 356434
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This removes the primary remaining API producing `TerminatorInst` which
will reduce the rate at which code is introduced trying to use it and
generally make it much easier to remove the remaining APIs across the
codebase.
Also clean up some of the stragglers that the previous mechanical update
of variables missed.
Users of LLVM and out-of-tree code generally will need to update any
explicit variable types to handle this. Replacing `TerminatorInst` with
`Instruction` (or `auto`) almost always works. Most of these edits were
made in prior commits using the perl one-liner:
```
perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g'
```
This also my break some rare use cases where people overload for both
`Instruction` and `TerminatorInst`, but these should be easily fixed by
removing the `TerminatorInst` overload.
llvm-svn: 344504
IslAst could mark two nested outer loops as "OutermostParallel". It
caused that the code generator tried to OpenMP-parallelize both loops,
which it is not prepared loop.
It was because the recursive AST build algorithm managed a flag
"InParallelFor" to ensure that no nested loop is also marked as
"OutermostParallel". Unfortunatetly the same flag was used by nodes
marked as SIMD, and reset to false after the SIMD node. Since loops can
be marked as SIMD inside "OutermostParallel" loops, the recursive
algorithm again tried to mark loops as "OutermostParellel" although
still nested inside another "OutermostParallel" loop.
The fix exposed another bug: The function "astScheduleDimIsParallel" was
only called when a loop was potentially "OutermostParallel" or
"InnermostParallel", but as a side-effect also determines the minimum
dependence distance. Hence, changing when we need to know whether a loop
is "OutermostParallel" also changed which loop was annotated with
"#pragma minimal dependence distance".
Moreover, some complex condition linked with "InParallelFor" determined
whether a loop should be an "InnermostParallel" loop. It missed some
situations where it would not use mark as such although being inside an
SIMD mark node, and therefore not be annotated using "#pragma simd".
The changes in particular:
1. Split the "InParallelFor" flag into an "InParallelFor" and an
"InSIMD" flag.
2. Unconditionally call "astScheduleDimIsParallel" for its side-effects
and store the result in "InParallel" for later use.
3. Simplify the condition when a loop is "InnermostParallel".
Fixes llvm.org/PR33153 and llvm.org/PR38073.
llvm-svn: 343212
Summary:
Update all rdtscp callsites in PerfMonitor so that they conform with the signature changes introduced in r341698.
Reviewers: grosser, bollu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51928
llvm-svn: 341946
The latest version of the isl C++ bindings does not export the 'set'
method yet. Fall back to the C interface until this method can be
exported.
llvm-svn: 338512
ScalarEvolution::getSCEV dereferences its argument, s.t. passing nullptr
leads to undefined behaviour.
Check for nullptr before calling it instead of checking its argument
afterwards.
llvm-svn: 336350
Summary:
It appears that llvm uses unbuffered C++ streams. So, we should not
mix C and C++ stream operations, because that will give us mixed
up output.
Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev, grosser, singam-sanjay, philip.pfaffe
Reviewed By: philip.pfaffe
Subscribers: nemanjai, kbarton
Differential Revision: https://reviews.llvm.org/D40126
llvm-svn: 336288
Summary:
This patch changes the return types for ocl_get_* functions during SPIR code generation. Because these functions return size_t types, the return type needs to be changed to the actual size of size_t on the device.
Based on work by Michal Babej and Pekka Jääskeläinen
Patch by: Alain Denzler
Reviewers: grosser, philip.pfaffe, bollu
Reviewed By: grosser, philip.pfaffe
Subscribers: nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D48774
llvm-svn: 336080
Summary:
The function is currently awfully complicated. Drop the IILE and use
StringRef over std::string.
Reviewers: Meinersbur, grosser, bollu
Reviewed By: Meinersbur
Subscribers: nemanjai, kbarton, bollu, llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D48070
llvm-svn: 334695
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
Differential Revision: https://reviews.llvm.org/D44978
llvm-svn: 332352
Add the options -polly-codegen-trace-stmts and
-polly-codegen-trace-scalars. When enabled, adds a call to the
beginning of every generated statement that prints the executed
statement instance. With -polly-codegen-trace-scalars, it also prints
the value of all scalars that are used in the statement, and PHIs
defined in the beginning of the statement.
Differential Revision: https://reviews.llvm.org/D45743
llvm-svn: 330864
In r330292 this assert was turned incorrectly into an unreachable, but
the correct behavior (thanks Michael) is to assert for anything that is
not 64 bit, but falltrough for 64 bit. I document this in the source
code.
llvm-svn: 330309
A check in assert-builds was meant to verify that a load provides a
value in all statement instances (i.e. its domain). The domain is
commonly gist'ed within the parameter context to contain fewer
constraints. However, statement instances outside the context are
no valid executions, hence the value provided can be undefined.
Refine the check for valid loads to only needed to be defined within
the SCoP context.
In addition, the JSONImporter had to be changed to allow importing
access relations that are broader than the current access relation,
but still defined over all statement instances.
This should fix the compiler crash in test-suite's oggenc of the
-polly-process-unprofitable buildbot.
llvm-svn: 329655
Summary:
When checking the parallelism of a scheduling dimension, we first check if excluding reduction dependences the loop is parallel or not.
If the loop is not parallel, then we need to return the minimal dependence distance of all data dependences, including the previously subtracted reduction dependences.
Reviewers: grosser, Meinersbur, efriedma, eli.friedman, jdoerfert, bollu
Reviewed By: Meinersbur
Subscribers: llvm-commits, pollydev
Tags: #polly
Differential Revision: https://reviews.llvm.org/D45236
llvm-svn: 329214
During codegen, Polly attempts to clear all loops from ScalarEvolution
and LoopInfo, and it does so one block at a time. This causes undefined
behaviour, since this way a loop header might be removed from a loop
before the entire loop is erased, causing ScalarEvolution to run into an
error.
Instead, just delete the entire loop atomically. This fixes currently
failing testcases.
llvm-svn: 326643
As part of this cleanup a couple of unnecessary isl::manage(obj.copy()) pattern
are eliminated as well.
We checked for all potential cleanups by scanning for:
"grep -R isl::manage\( lib/ | grep copy"
llvm-svn: 325558
Memory transfer instructions take two pointers. It is not defined to
which of those a noalias annotation applies. To ensure correctness,
do not add noalias annotations to memcpy/memmove instructions anymore.
The caused a miscompile with test-suite's MultiSource/Applications/obsequi.
Since r321138, the MemCpyOpt pass would remove memcpy/memmove calls if
known to copy uninitialized memory. In that case, it was initialized
by another memcpy, but the annotation for the target pointer said
it would not alias. The annotation was actually meant for the source
pointer, which was was an alloca and could not alias with the target
pointer.
llvm-svn: 321371
Isl does not allow generating isl_ast_expr from an isl_pw_aff that has an
empty domain (i.e. has no pieces). We already detected the case if the
isl_pw_aff comes with an empty domain.
isl_ast_build also considers the domain empty if it is disjoint with the
parameter context (e.g. parameters values that we exclude by runtime
versioning).
Intersect the access relation domain with the parameter context to
also detect such practically empty access domains. The effective
pointer used in the generated code is unimportand because it will never
be executed.
This fixes llvm.org/PR35362
llvm-svn: 318806
Summary:
Most changes are mechanical, but in one place I changed the program semantics
by fixing a likely bug:
In `Scop::hasFeasibleRuntimeContext()`, I'm now explicitely handling the
error-case. Before, when the call to `addNonEmptyDomainConstraints()`
returned a null set, this (probably) accidentally worked because
isl_bool_error converts to true. I'm checking for nullptr now.
Reviewers: grosser, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D39971
llvm-svn: 318632
When collecting base pointers that need to be made available in parallel
subfunctions, use the base pointer associated with the latest
ScopArrayInfo, instead of the original one.
llvm-svn: 316983
Summary:
When GPUNodeBuilder creates loops inside the kernel, it dispatches to
IslNodeBuilder. This however is surprisingly dangerous, since it accesses the
AST Node's user through the wrong type. This patch fixes this problem by
overriding createFor correctly.
This fixes PR35010.
Reviewers: grosser, bollu, Meinersbur
Reviewed By: Meinersbur
Subscribers: Meinersbur, nemanjai, pollydev, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D39364
llvm-svn: 316872
We make sure that the final reload of an invariant scalar memory access uses the
same stack slot into which the invariant memory access was stored originally.
Earlier, this was broken as we introduce a new stack slot aside of the preload
stack slot, which remained uninitialized and caused our escaping loads to
contain garbage. This happened due to us clearing the pre-populated values
in EscapeMap after kernel code generation. We address this issue by preserving
the original host values and restoring them after kernel code generation.
EscapeMap is not expected to be used during kernel code generation, hence we
clear it during kernel generation to make sure that any unintended uses are
noticed.
llvm-svn: 314894
This matches the behavior we already have in lib/Codegen/CodeGeneration.cpp and
makes sure that we fall back to the original code. It seems when invariant load
hoisting was introduced to the GPGPU backend we missed to reset the RTC flag,
such that kernels where invariant load hoisting failed executed the 'optimized'
SCoP, which however is set to a simple 'unreachable'. Unsurprisingly, this
results in hard to debug issues that are a lot of fun to debug.
llvm-svn: 314624
Such RTCs may introduce integer wrapping intrinsics with more than 64 bit,
which are translated to library calls on AOSP that are not part of the
runtime and will consequently cause linker errors.
Thanks to Eli Friedman for reporting this issue and reducing the test case.
llvm-svn: 314065
Since -polly-codegen reports itself to preserve DependenceInfo and IslAstInfo,
we might get those analysis that were computed by a different ScopInfo for a
different Scop structure. This would be unfortunate because DependenceInfo and
IslAstInfo hold references to resources allocated by
ScopInfo/ScopBuilder/Scop (e.g. isl_id). If -polly-codegen and
DependenceInfo/IslAstInfo do not agree on which Scop to use, unpredictable
things can happen.
When the ScopInfo/Scop object is freed, there is a high probability that the
new ScopInfo/Scop object will be created at the same heap position with the
same address. Comparing whether the Scop or ScopInfo address is the expected
therefore is unreliable.
Instead, we compare the address of the isl_ctx object. Both, DependenceInfo
and IslAstInfo must hold a reference to the isl_ctx object to ensure it is
not freed before the destruction of those analyses which might happen after
the destruction of the Scop/ScopInfo they refer to. Hence, the isl_ctx
will not be freed and its address not reused as long there is a
DependenceInfo or IslAstInfo around.
This fixes llvm.org/PR34441
llvm-svn: 313842
Update CodegenCleanup using the function-level passes added by
populatePassManager that run between EP_EarlyAsPossible and
EP_VectorizerStart in -O3.
The changes in particular are:
- Added pass create arguments, e.g. ExpensiveCombines for InstCombine.
- Remove reroll pass. The option -reroll-loops is disabled by default.
- Add passes run with UnitAtATime, which is the default.
- Add instances of LibCallsShrinkWrap, TailCallElimination, SCCP
(sparse conditional constant propagation), Float2Int
that did not run before.
- Add instances of GVN as in the default pipeline.
Notes:
- GVNHoist, GVNSink, NewGVN are still disabled in the -O3 pipeline.
- The optimization level and other optimization parameters are not
accessible outside of PassManagerBuilder, hence we cannot add passes
depending on these.
Differential Revision: https://reviews.llvm.org/D37571
llvm-svn: 312875
The type of NewValue might change due to ScalarEvolution
looking though bitcasts. The synthesized NewValue therefore
becomes the type before the bitcast.
llvm-svn: 312718
In certain situations, the context in the isl_ast_build could result for the
min/max locations of our alias sets to become empty, which would cause an
internal error in isl, which is then unable to derive a value for these
expressions. Check these conditions before code generating expressions and
instead assume that alias check succeeded. This is valid, as the corresponding
memory accesses will not be executed under any valid context.
This fixed llvm.org/PR34432. Thanks to Qirun Zhang for reporting.
llvm-svn: 312455
In Polly, we specifically add a paramter to represent the outermost dimension
size of fortran arrays. We do this because this information is statically
available from the fortran metadata generated by dragonegg.
However, we were only materializing these parameters (meaning, creating an
llvm::Value to back the isl_id) from *memory accesses*. This is wrong,
we should materialize parameters from *scop array info*.
It is wrong because if there is a case where we detect 2 fortran arrays,
but only one of them is accessed, we may not materialize the other array's
dimensions at all.
This is incorrect. We fix this by looping over all
`polly::ScopArrayInfo` in a scop, rather that just all `polly::MemoryAccess`.
Differential Revision: https://reviews.llvm.org/D37379
llvm-svn: 312350
Currently, GVN can be necessary to eliminate redundant instructions in case
of, for instance, GEMM and float type. This patch makes GVN be run during
the cleanup.
Reviewed-by: Tobias Grosser <tobias@grosser.es>,
Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D37340
llvm-svn: 312307
This is useful when we face certain intrinsics such as `llvm.exp.*`
which cannot be lowered by the NVPTX backend while other intrinsics can.
So, we would need to keep blacklists of intrinsics that cannot be
handled by the NVPTX backend. It is much simpler to try and promote
all intrinsics to libdevice versions.
This patch makes function/intrinsic very uniform, and will always try to use
a libdevice version if it exists.
Differential Revision: https://reviews.llvm.org/D37056
llvm-svn: 312239