The commit in question changed the syntax but did not update the runner
tests. This also required registering the MemRef dialect for custom
parser to work correctly.
The lit test suite uses python 3.6 features. Rather than a strange
python syntax error upon running the lit tests, we will require the
correct version in CMake.
Reviewed By: serge-sans-paille, yln
Differential Revision: https://reviews.llvm.org/D95635
This was seemingly dropped in e2310704d8,
potentially due to a misrebase. The absence of this trait makes aliasing
analysis incorrect, leading to, e.g., buffer deallocation pass inserting
deallocations too early.
The commit in question moved some ops across dialects but did not update
some of the target-specific integration tests that use these ops,
presumably because the corresponding target hardware was not available.
Fix these tests.
A previous commit moved multiple ops from Standard to MemRef dialect.
Some of these ops are exercised in Python bindings. Enable bindings for
the newly created MemRef dialect and update a test accordingly.
The patch in question broke the build with shared libraries due to
missing dependencies, one of which would have been circular between
MLIRStandard and MLIRMemRef if added. Fix this by moving more code
around and swapping the dependency direction. MLIRMemRef now depends on
MLIRStandard, but MLIRStandard does _not_ depend on MLIRMemRef.
Arguably, this is the right direction anyway since numerous libraries
depend on MLIRStandard and don't necessarily need to depend on
MLIRMemref.
Other otable changes include:
- some EDSC code is moved inline to MemRef/EDSC/Intrinsics.h because it
creates MemRef dialect operations;
- a utility function related to shape moved to BuiltinTypes.h/cpp
because it only realtes to shaped types and not any particular dialect
(standard dialect is erroneously believed to contain MemRefType);
- a Python test for the standard dialect is disabled completely because
the ops it tests moved to the new MemRef dialect, but it is not
exposed to Python bindings, and the change for that is non-trivial.
Start the description from a new line instead of putting the first
paragraph in the section header. Wrap the class name in backticks to
make it clear that it relates to the code.
This reverts commit b5d9a3c923.
The commit introduced a memory error in canonicalization/operation
walking that is exposed when compiled with ASAN. It leads to crashes in
some "release" configurations.
We know that all ConstantLike operations have one result and no operands,
so check this first before doing the trait check. This change speeds up
Canonicalize on a CIRCT testcase by ~5%.
Differential Revision: https://reviews.llvm.org/D98615
Two changes:
1) Change the canonicalizer to walk the function in top-down order instead of
bottom-up order. This composes well with the "top down" nature of constant
folding and simplification, reducing iterations and re-evaluation of ops in
simple cases.
2) Explicitly enter existing constants into the OperationFolder table before
canonicalizing. Previously we would "constant fold" them and rematerialize
them, wastefully recreating a bunch fo constants, which lead to pointless
memory traffic.
Both changes together provide a 33% speedup for canonicalize on some mid-size
CIRCT examples.
One artifact of this change is that the constants generated in normal pattern
application get inserted at the top of the function as the patterns are applied.
Because of this, we get "inverted" constants more often, which is an aethetic
change to the IR but does permute some testcases.
Differential Revision: https://reviews.llvm.org/D98609
This is a temporary work-around to get our all-annotations-all-flags
stress testing effort run clean. In the long run, we want to provide
efficient implementations of strided loads and stores though
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D98563
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.
There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
Functions used only in `assert` cause warnings in release mode
Reviewed By: mehdi_amini, dcaballe, ftynse
Differential Revision: https://reviews.llvm.org/D98476
NestedPattern uses a BumpPtrAllocator to store child (nested) pattern
objects to decrease the overhead of dynamic allocation. This assumes all
allocations happen inside the allocator that will be freed as a whole.
However, NestedPattern contains `std::function` as a member, which
allocates internally using `new`, unaware of the BumpPtrAllocator. Since
NestedPattern only holds pointers to the nested patterns allocated in
the BumpPtrAllocator, it never calls their destructors, so the
destructor of the `std::function`s they contain are never called either,
leaking the allocated memory.
Make NestedPattern explicitly call destructors of nested patterns. This
additionally requires to actually copy the nested patterns in
copy-construction and copy-assignment instead of just sharing the
pointer to the arena-allocated list of children to avoid double-free. An
alternative solution would be to add reference counting to the list of
arena-allocated list of children.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98485
Change CUDA integration tests to use mlir-opt + mlir-cpu-runner instead.
Depends On D98203
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98396
Forward references to blocks lead to `Block`s being allocated in the
parser, but they are not necessarily included into a region if parsing
fails, leading to a leak. Clean them up in parser destructor.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98403
This restricts the attributes to integers for constants of type
IndexType. So far an attribute like StringAttr as in
%c1 = constant "" : index
is valid.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98216
This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.
Reviewed By: dcaballe, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97822
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97892
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
* Removed tracking of root and terminal ops. Existing vectorization
functionality is preserved and extended so that loop nests without
root-terminal chains can be vectorized.
* Vectorizing a loop nest now only requires a single topological traversal.
* A new vector loop nest is incrementally built along the vectorization
process. The original scalar loop is kept intact. No cloning guard is needed
to recover the scalar loop if vectorization fails. This approach also
simplifies the challenging task of replacing a loop operation amid the
vectorization process without invalidating the analysis information that
depends on the original loop.
* Vectorization of specific operations has been implemented as independent,
preparing them to be moved to a potential vectorization interface.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97442
This allows for storage instances to store data that isn't uniqued in the context, or contain otherwise non-trivial logic, in the rare situations that they occur. Storage instances with trivial destructors will still have their destructor skipped. A consequence of this is that the storage instance definition must be visible from the place that registers the type.
Differential Revision: https://reviews.llvm.org/D98311
This patch fixes a heap-use-after-free introduced by the recent changes
in the vectorizer: https://reviews.llvm.org/rG95db7b4aeaad590f37720898e339a6d54313422f
The problem is due to the way candidate loops are visited. All candidate loops
are pattern-matched beforehand using the 'NestedMatch' utility. These matches may
intersect with each other so it may happen that we try to vectorize a loop that
was previously vectorized. The new vectorization algorithm replaces the original
loops that are vectorized with new loops and, therefore, any reference to the
original loops in the pre-computed matches becomes invalid.
This patch fixes the problem by classifying the candidate matches into buckets
before vectorization. Each bucket contains all the matches that intersect. The
vectorizer uses these buckets to make sure that we only vectorize *one* match from
each bucket, at most.
Differential Revision: https://reviews.llvm.org/D98382
For the use in LLVMOps.td I used the getPointerElementType()
escape hatch, as it's not obvious to me how the load type
should be properly obtained here.
Data layout information allows to answer questions about the size and alignment
properties of a type. It enables, among others, the generation of various
linear memory addressing schemes for containers of abstract types and deeper
reasoning about vectors. This introduces the subsystem for modeling data
layouts in MLIR.
The data layout subsystem is designed to scale to MLIR's open type and
operation system. At the top level, it consists of attribute interfaces that
can be implemented by concrete data layout specifications; type interfaces that
should be implemented by types subject to data layout; operation interfaces
that must be implemented by operations that can serve as data layout scopes
(e.g., modules); and dialect interfaces for data layout properties unrelated to
specific types. Built-in types are handled specially to decrease the overall
query cost.
A concrete default implementation of these interfaces is provided in the new
Target dialect. Defaults for built-in types that match the current behavior are
also provided.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97067
verifyCompatibleShapes is not transitive. Create an n-ary version and
update SameOperandShapes and SameOperandAndResultShapes traits to use
it.
Differential Revision: https://reviews.llvm.org/D98331
Clean-up after D98279, remove one call to createConvertGPUKernelToBlobPass().
Depends On D98203
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98360
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.
The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.
Depends On D98279
Reviewed By: herhut, rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98203
The current implementation has some inefficiencies that become noticeable when running on large modules. This revision optimizes the code, and updates some out-dated idioms with newer utilities. The main components of this optimization include:
* Add an overload of Block::eraseArguments that allows for O(N) erasure of disjoint arguments.
* Don't process entry block arguments given that we don't erase them at this point.
* Don't track individual operation results, given that we don't erase them. We can just track the parent operation.
Differential Revision: https://reviews.llvm.org/D98309
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97892
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
* Removed tracking of root and terminal ops. Existing vectorization
functionality is preserved and extended so that loop nests without
root-terminal chains can be vectorized.
* Vectorizing a loop nest now only requires a single topological traversal.
* A new vector loop nest is incrementally built along the vectorization
process. The original scalar loop is kept intact. No cloning guard is needed
to recover the scalar loop if vectorization fails. This approach also
simplifies the challenging task of replacing a loop operation amid the
vectorization process without invalidating the analysis information that
depends on the original loop.
* Vectorization of specific operations has been implemented as independent,
preparing them to be moved to a potential vectorization interface.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97442
Link `MLIRStandardToLLVM` to `MLIRAVX512Transforms`, since
the latter uses `LLVMTypeConverter` defined in the first one.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D98336