The "else" group of an optional element is a collection of elements that get parsed/printed when the anchor of the main element group is *not* present. This is useful when there is a special syntax when an element is not present. The new syntax for an optional element is shown below:
```
optional-group: `(` elements `)` (`:` `(` else-elements `)`)? `?`
```
An example of how this might be used is shown below:
```tablegen
def FooOp : ... {
let arguments = (ins UnitAttr:$foo);
let assemblyFormat = "attr-dict (`foo_is_present` $foo^):(`foo_is_absent`)?";
}
```
would be formatted as such:
```mlir
// When the `foo` attribute is present:
foo.op foo_is_present
// When the `foo` attribute is not present:
foo.op foo_is_absent
```
Differential Revision: https://reviews.llvm.org/D99129
This nicely aligns the naming with RewritePatternSet. This type isn't
as widely used, but we keep a using declaration in to help with
downstream consumption of this change.
Differential Revision: https://reviews.llvm.org/D99131
This doesn't change APIs, this just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
This maintains the old name to have minimal source impact on downstream codes, and
does not do the huge mechanical patch. I expect the huge mechanical patch to land
sometime this week, but we can keep around the old names for a couple weeks to reduce
impact on downstream projects.
Differential Revision: https://reviews.llvm.org/D99119
This allows adding a C function pointer as a matchAndRewrite style pattern, which
is a very common case. This adopts it in ExpandTanh to show how it reduces a level
of nesting.
We could allow C++ lambdas here, but that doesn't work as well with type inference
in the common case. Instead of:
patterns.insert(convertTanhOp);
you need to specify:
patterns.insert<math::TanhOp>(convertTanhOp);
which is boilerplate'y. Capturing state like this is very uncommon, so we choose
to require clients to define their own structs and use the non-convenience method
when they need to do so.
Differential Revision: https://reviews.llvm.org/D99039
In the original structure, it will try to match CHECK-LABEL first then see if
the subsequent doesn't have the target strings. This is not what we are
expected. We are expecting the two functions which will be deleted should be
matched before CHECK-LABEL. Also fixed the function names.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D99060
Multiply-shift requires wider compute types or CPU specific code to avoid
premature truncation, apply_shift fixes this issue
Also, Tosa's mul op supports different input / output types. Added path that
sign-extends input values to int-32 values before multiplying.
Differential Revision: https://reviews.llvm.org/D99011
- Drop unnecessary occurrences of rewriter.eraseOp: dead linalg ops on tensors should be cleaned up by DCE.
- reimplement the part of Linalg on fusion that constructs the body and block arguments: the previous implementation had too much magic. Instead this spells out all cases explicitly and asserts / introduces TODOs for incorrect cases.
As a consequence, we can use the default traversal order for this pattern.
Differential Revision: https://reviews.llvm.org/D99070
GreedyPatternRewriteDriver was changed from bottom-up traversal to top-down traversal. Not all passes work yet with that change for traversal order. To give some time for fixing, add an option to allow to switch back to bottom-up traversal. Use this option in FusionOfTensorOpsPass which fails otherwise.
Differential Revision: https://reviews.llvm.org/D99059
mlir/lib/Dialect/Shape/IR/Shape.cpp:573:26: warning: loop variable 'shape' is always a copy because the range of type '::mlir::Operation::operand_range' (aka 'mlir::OperandRange') does not return a reference [-Wrange-loop-analysis]
for (const auto &shape : shapes()) {
^
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many many more to be removed.
Differential Revision: https://reviews.llvm.org/D99028
This reapplies b5d9a3c / https://reviews.llvm.org/D98609 with a one line fix in
processExistingConstants to skip() when erasing a constant we've already seen.
Original commit message:
1) Change the canonicalizer to walk the function in top-down order instead of
bottom-up order. This composes well with the "top down" nature of constant
folding and simplification, reducing iterations and re-evaluation of ops in
simple cases.
2) Explicitly enter existing constants into the OperationFolder table before
canonicalizing. Previously we would "constant fold" them and rematerialize
them, wastefully recreating a bunch fo constants, which lead to pointless
memory traffic.
Both changes together provide a 33% speedup for canonicalize on some mid-size
CIRCT examples.
One artifact of this change is that the constants generated in normal pattern
application get inserted at the top of the function as the patterns are applied.
Because of this, we get "inverted" constants more often, which is an aethetic
change to the IR but does permute some testcases.
Differential Revision: https://reviews.llvm.org/D99006
* Fold SelectOp when both true and false args are same SSA value
* Fold some cmp + select patterns
Differential Revision: https://reviews.llvm.org/D98576
* Do we need a threshold on maximum number of Yeild arguments processed (maximum number of SelectOp's to be generated)?
* Had to modify some old IfOp tests to not get optimized by this pattern
Differential Revision: https://reviews.llvm.org/D98592
This makes the annotation tied to the operand and the use of a keyword
more explicit/readable on what it means.
Differential Revision: https://reviews.llvm.org/D99001
* Moves this out of a test case where it was being developed to good effect and generalizes it.
* Having tried a number of things like this, I think this balances concerns reasonably well.
Differential Revision: https://reviews.llvm.org/D98989
Now that all of the builtin dialect is generated from ODS, its documentation in LangRef can be split out and replaced with references to Dialects/Builtin.md. LangRef is quite crusty right now and should really have a full cleanup done in a followup.
Differential Revision: https://reviews.llvm.org/D98562
This removes the need to construct an APInt for each value, given that it is guaranteed to contain 32 bit elements.
BEGIN_PUBLIC
...text exposed to open source public git repo...
END_PUBLIC
This allows for notifying callers when operations/blocks get erased, which is especially useful for the greedy pattern driver. The current greedy pattern driver "throws away" all information on constants in the operation folder because it doesn't know if they get erased or not. By passing in RewriterBase, we can directly track this and prevent the need for the pattern driver to rediscover all of the existing constants. In some situations this cuts the compile time of the canonicalizer in half.
Differential Revision: https://reviews.llvm.org/D98755
* IRModules.cpp -> (IRCore.cpp, IRAffine.cpp, IRAttributes.cpp, IRTypes.cpp).
* The individual pieces now compile in the 5-15s range whereas IRModules.cpp was starting to approach a minute (didn't capture a before time).
* More fine grained splitting is possible, but this represents the most obvious.
Differential Revision: https://reviews.llvm.org/D98978
Previously low benefit op-specific patterns never had a chance to match
even if high benefit op-agnostic pattern failed to match.
This was already fixed upstream, this commit just adds testscase
Differential Revision: https://reviews.llvm.org/D98513
Handles lowering from the tosa CastOp to the equivalent linalg lowering. It
includes support for interchange between bool, int, and floating point.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D98828
Adds lowerings for logical_* boolean operations. Each of these ops only operate
on booleans allowing simple lowerings.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D98910
* Makes the wrapped functions of the `@linalg_structured_op` decorator callable such that they emit IR imperatively when invoked.
* There are numerous TODOs that I will keep working through to achieve generality.
* Will true up exception handling tests as the feature progresses (for things that are actually errors once everything is implemented).
* Includes the addition of an `isinstance` method on concrete types in the Python API.
Differential Revision: https://reviews.llvm.org/D98754
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.
I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D98447
When deleting operations in DCE, the algorithm uses a post-order walk of
the IR to ensure that value uses were erased before value defs. Graph
regions do not have the same structural invariants as SSA CFG, and this
post order walk could delete value defs before uses. This problem is
guaranteed to occur when there is a cycle in the use-def graph.
This change stops DCE from visiting the operations and blocks in any
meaningful order. Instead, we rely on explicitly dropping all uses of a
value before deleting it.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D98919
Add extra `type.isa<FloatType>()` check to `FloatAttr::get(Type, double)` method.
Otherwise it tries to call `type.cast<FloatType>()`, which fails with assertion in Debug mode.
The `!type.isa<FloatType>()` case just redirercts the call to `FloatAttr::get(Type, APFloat)`,
which will perform the actual check and emit appropriate error.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98764
This adds a tosa.apply_scale operation that handles the scaling operation
common to quantized operatons. This scalar operation is lowered
in TosaToStandard.
We use a separate ApplyScale factorization as this is a replicable pattern
within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul
for their quantized variants.
Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors
that verify each pass is correct.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D98753
Includes lowering for tosa.concat with indice computation with subtensor insert
operations. Includes tests along two different indices.
Differential Revision: https://reviews.llvm.org/D98813
This reverts commit 32a744ab20.
CI is broken:
test/Dialect/Linalg/bufferize.mlir:274:12: error: CHECK: expected string not found in input
// CHECK: %[[MEMREF:.*]] = tensor_to_memref %[[IN]] : memref<?xf32>
^
`BufferizeAnyLinalgOp` fails because `FillOp` is not a `LinalgGenericOp` and it fails while reading operand sizes attribute.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98671
Do not limit the number of arguments in rewriter pattern.
Introduce separate `FmtStrVecObject` class to handle
format of variadic `std::string` array.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97839
This covers cases that are not folded away because the extent tensor type
becomes more concrete in the process.
Differential Revision: https://reviews.llvm.org/D98782
This has been a TODO for a while, and prevents breakages for attributes/types that contain floats that can't roundtrip outside of the hex format.
Differential Revision: https://reviews.llvm.org/D98808
This fixes broken JIT functionality on emulator platforms.
With Alex' recent movement towards squashing llvm ir dialects
into target specific dialects, we now must ensure these dialects
are registered to the cpu runner to ensure JIT can lower this
to proper LLVM IR before handing this off to the backend.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D98727
Add a feature to `EnumAttr` definition to generate
specialized Attribute class for the particular enumeration.
This class will inherit `StringAttr` or `IntegerAttr` and
will override `classof` and `getValue` methods.
With this class the enumeration predicate can be checked with simple
RTTI calls (`isa`, `dyn_cast`) and it will return the typed enumeration
directly instead of raw string/integer.
Based on the following discussion:
https://llvm.discourse.group/t/rfc-add-enum-attribute-decorator-class/2252
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97836
'ForOpIterArgsFolder' can now remove iterator arguments (and corresponding
results) with no use.
Example:
```
%cst = constant 32 : i32
%0:2 = scf.for %arg1 = %lb to %ub step %step iter_args(%arg2 = %arg0, %arg3 = %cst)
-> (i32, i32) {
%1 = addu %arg2, %cst : i32
scf.yield %1, %1 : i32, i32
}
use(%0#0)
```
%arg3 is not used in the block, and its corresponding result `%0#1` has no use,
thus remove the iter argument.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98711
Returning structs directly in LLVM does not necessarily align with the C ABI of
the platform. This might happen to work on Linux but for small structs this
breaks on Windows. With this change, the wrappers work platform independently.
Differential Revision: https://reviews.llvm.org/D98725
Added additional information about the SSA like properties
that has to be fulfilled in the bufferization steps.
Differential Revision: https://reviews.llvm.org/D95522
This commit fixes the lowering of `Affine.IfOp` to `SCF.IfOp` in the
presence of yield values. These changes have been made as a part of
`-lower-affine` pass.
Differential Revision: https://reviews.llvm.org/D98760
Some parameters to attributes and types rely on special comparison routines other than operator== to ensure equality. This revision adds support for those parameters by allowing them to specify a `comparator` code block that determines if `$_lhs` and `$_rhs` are equal. An example of one of these paramters is APFloat, which requires `bitwiseIsEqual` for bitwise comparison (which we want for attribute equality).
Differential Revision: https://reviews.llvm.org/D98473
Supporting ranges in the byte code requires additional complexity, given that a range can't be easily representable as an opaque void *, as is possible with the existing bytecode value types (Attribute, Type, Value, etc.). To enable representing a range with void *, an auxillary storage is used for the actual range itself, with the pointer being passed around in the normal byte code memory. For type ranges, a TypeRange is stored. For value ranges, a ValueRange is stored. The above problem represents a majority of the complexity involved in this revision, the rest is adapting/adding byte code operations to support the changes made to the PDL interpreter in the parent revision.
After this revision, PDL will have initial end-to-end support for variadic operands/results.
Differential Revision: https://reviews.llvm.org/D95723
This revision extends the PDL Interpreter dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl_interp.check_types : Compare a range of types with a known range.
* pdl_interp.create_types : Create a constant range of types.
* pdl_interp.get_operands : Get a range of operands from an operation.
* pdl_interp.get_results : Get a range of results from an operation.
* pdl_interp.switch_types : Switch on a range of types.
This revision handles adding support in the interpreter dialect and the conversion from PDL to PDLInterp. Support for variadic operands and results in the bytecode will be added in a followup revision.
Differential Revision: https://reviews.llvm.org/D95722
This revision extends the PDL dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl.operands : Define a range of input operands.
* pdl.results : Extract a result group from an operation.
* pdl.types : Define a handle to a range of types.
Support for these in the pdl interpreter dialect and byte code will be added in followup revisions.
Differential Revision: https://reviews.llvm.org/D95721
This has a numerous amount of benefits, given the overly clunky nature of CreateNativeOp:
* Users can now call into arbitrary rewrite functions from inside of PDL, allowing for more natural interleaving of PDL/C++ and enabling for more of the pattern to be in PDL.
* Removes the need for an additional set of C++ functions/registry/etc. The new ApplyNativeRewriteOp will use the same PDLRewriteFunction as the existing RewriteOp. This reduces the API surface area exposed to users.
This revision also introduces a new PDLResultList class. This class is used to provide results of native rewrite functions back to PDL. We introduce a new class instead of using a SmallVector to simplify the work necessary for variadics, given that ranges will require some changes to the structure of PDLValue.
Differential Revision: https://reviews.llvm.org/D95720
Up until now, results have been represented as additional results to a pdl.operation. This is fairly clunky, as it mismatches the representation of the rest of the IR constructs(e.g. pdl.operand) and also isn't a viable representation for operations returned by pdl.create_native. This representation also creates much more difficult problems when factoring in support for variadic result groups, optional results, etc. To resolve some of these problems, and simplify adding support for variable length results, this revision extracts the representation for results out of pdl.operation in the form of a new `pdl.result` operation. This operation returns the result of an operation at a given index, e.g.:
```
%root = pdl.operation ...
%result = pdl.result 0 of %root
```
Differential Revision: https://reviews.llvm.org/D95719
This adds a new integration test. However, it also
adapts to a recent memref.XXX change for existing tests
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D98680
Lit as it exists today has three hacks that allow users to run tests earlier:
1) An entire test suite can set the `is_early` boolean.
2) A very recently introduced "early_tests" feature.
3) The `--incremental` flag forces failing tests to run first.
All of these approaches have problems.
1) The `is_early` feature was until very recently undocumented. Nevertheless it still lacks testing and is a imprecise way of optimizing test starting times.
2) The `early_tests` feature requires manual updates and doesn't scale.
3) `--incremental` is undocumented, untested, and it requires modifying the *source* file system by "touching" the file. This "touch" based approach is arguably a hack because it confuses editors (because it looks like the test was modified behind the back of the editor) and "touching" the test source file doesn't work if the test suite is read only from the perspective of `lit` (via advanced filesystem/build tricks).
This patch attempts to simplify and address all of the above problems.
This patch formalizes, documents, tests, and defaults lit to recording the execution time of tests and then reordering all tests during the next execution. By reordering the tests, high core count machines run faster, sometimes significantly so.
This patch also always runs failing tests first, which is a positive user experience win for those that didn't know about the hidden `--incremental` flag.
Finally, if users want, they can _optionally_ commit the test timing data (or a subset thereof) back to the repository to accelerate bots and first-time runs of the test suite.
Reviewed By: jhenderson, yln
Differential Revision: https://reviews.llvm.org/D98179
Enhance 'ForOpIterArgsFolder' to remove unused iteration arguments in a
scf::ForOp. If the block argument corresponding to the given iterator has no
use and the yielded value equals the input, we fold it away.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98503
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98470
The commit in question changed the syntax but did not update the runner
tests. This also required registering the MemRef dialect for custom
parser to work correctly.
The lit test suite uses python 3.6 features. Rather than a strange
python syntax error upon running the lit tests, we will require the
correct version in CMake.
Reviewed By: serge-sans-paille, yln
Differential Revision: https://reviews.llvm.org/D95635
This was seemingly dropped in e2310704d8,
potentially due to a misrebase. The absence of this trait makes aliasing
analysis incorrect, leading to, e.g., buffer deallocation pass inserting
deallocations too early.
The commit in question moved some ops across dialects but did not update
some of the target-specific integration tests that use these ops,
presumably because the corresponding target hardware was not available.
Fix these tests.
A previous commit moved multiple ops from Standard to MemRef dialect.
Some of these ops are exercised in Python bindings. Enable bindings for
the newly created MemRef dialect and update a test accordingly.
The patch in question broke the build with shared libraries due to
missing dependencies, one of which would have been circular between
MLIRStandard and MLIRMemRef if added. Fix this by moving more code
around and swapping the dependency direction. MLIRMemRef now depends on
MLIRStandard, but MLIRStandard does _not_ depend on MLIRMemRef.
Arguably, this is the right direction anyway since numerous libraries
depend on MLIRStandard and don't necessarily need to depend on
MLIRMemref.
Other otable changes include:
- some EDSC code is moved inline to MemRef/EDSC/Intrinsics.h because it
creates MemRef dialect operations;
- a utility function related to shape moved to BuiltinTypes.h/cpp
because it only realtes to shaped types and not any particular dialect
(standard dialect is erroneously believed to contain MemRefType);
- a Python test for the standard dialect is disabled completely because
the ops it tests moved to the new MemRef dialect, but it is not
exposed to Python bindings, and the change for that is non-trivial.
Start the description from a new line instead of putting the first
paragraph in the section header. Wrap the class name in backticks to
make it clear that it relates to the code.
This reverts commit b5d9a3c923.
The commit introduced a memory error in canonicalization/operation
walking that is exposed when compiled with ASAN. It leads to crashes in
some "release" configurations.
We know that all ConstantLike operations have one result and no operands,
so check this first before doing the trait check. This change speeds up
Canonicalize on a CIRCT testcase by ~5%.
Differential Revision: https://reviews.llvm.org/D98615
Two changes:
1) Change the canonicalizer to walk the function in top-down order instead of
bottom-up order. This composes well with the "top down" nature of constant
folding and simplification, reducing iterations and re-evaluation of ops in
simple cases.
2) Explicitly enter existing constants into the OperationFolder table before
canonicalizing. Previously we would "constant fold" them and rematerialize
them, wastefully recreating a bunch fo constants, which lead to pointless
memory traffic.
Both changes together provide a 33% speedup for canonicalize on some mid-size
CIRCT examples.
One artifact of this change is that the constants generated in normal pattern
application get inserted at the top of the function as the patterns are applied.
Because of this, we get "inverted" constants more often, which is an aethetic
change to the IR but does permute some testcases.
Differential Revision: https://reviews.llvm.org/D98609
This is a temporary work-around to get our all-annotations-all-flags
stress testing effort run clean. In the long run, we want to provide
efficient implementations of strided loads and stores though
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D98563
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.
There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
Functions used only in `assert` cause warnings in release mode
Reviewed By: mehdi_amini, dcaballe, ftynse
Differential Revision: https://reviews.llvm.org/D98476
NestedPattern uses a BumpPtrAllocator to store child (nested) pattern
objects to decrease the overhead of dynamic allocation. This assumes all
allocations happen inside the allocator that will be freed as a whole.
However, NestedPattern contains `std::function` as a member, which
allocates internally using `new`, unaware of the BumpPtrAllocator. Since
NestedPattern only holds pointers to the nested patterns allocated in
the BumpPtrAllocator, it never calls their destructors, so the
destructor of the `std::function`s they contain are never called either,
leaking the allocated memory.
Make NestedPattern explicitly call destructors of nested patterns. This
additionally requires to actually copy the nested patterns in
copy-construction and copy-assignment instead of just sharing the
pointer to the arena-allocated list of children to avoid double-free. An
alternative solution would be to add reference counting to the list of
arena-allocated list of children.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98485
Change CUDA integration tests to use mlir-opt + mlir-cpu-runner instead.
Depends On D98203
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98396
Forward references to blocks lead to `Block`s being allocated in the
parser, but they are not necessarily included into a region if parsing
fails, leading to a leak. Clean them up in parser destructor.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98403
This restricts the attributes to integers for constants of type
IndexType. So far an attribute like StringAttr as in
%c1 = constant "" : index
is valid.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98216
This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.
Reviewed By: dcaballe, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97822
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97892
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
* Removed tracking of root and terminal ops. Existing vectorization
functionality is preserved and extended so that loop nests without
root-terminal chains can be vectorized.
* Vectorizing a loop nest now only requires a single topological traversal.
* A new vector loop nest is incrementally built along the vectorization
process. The original scalar loop is kept intact. No cloning guard is needed
to recover the scalar loop if vectorization fails. This approach also
simplifies the challenging task of replacing a loop operation amid the
vectorization process without invalidating the analysis information that
depends on the original loop.
* Vectorization of specific operations has been implemented as independent,
preparing them to be moved to a potential vectorization interface.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97442
This allows for storage instances to store data that isn't uniqued in the context, or contain otherwise non-trivial logic, in the rare situations that they occur. Storage instances with trivial destructors will still have their destructor skipped. A consequence of this is that the storage instance definition must be visible from the place that registers the type.
Differential Revision: https://reviews.llvm.org/D98311
This patch fixes a heap-use-after-free introduced by the recent changes
in the vectorizer: https://reviews.llvm.org/rG95db7b4aeaad590f37720898e339a6d54313422f
The problem is due to the way candidate loops are visited. All candidate loops
are pattern-matched beforehand using the 'NestedMatch' utility. These matches may
intersect with each other so it may happen that we try to vectorize a loop that
was previously vectorized. The new vectorization algorithm replaces the original
loops that are vectorized with new loops and, therefore, any reference to the
original loops in the pre-computed matches becomes invalid.
This patch fixes the problem by classifying the candidate matches into buckets
before vectorization. Each bucket contains all the matches that intersect. The
vectorizer uses these buckets to make sure that we only vectorize *one* match from
each bucket, at most.
Differential Revision: https://reviews.llvm.org/D98382
For the use in LLVMOps.td I used the getPointerElementType()
escape hatch, as it's not obvious to me how the load type
should be properly obtained here.
Data layout information allows to answer questions about the size and alignment
properties of a type. It enables, among others, the generation of various
linear memory addressing schemes for containers of abstract types and deeper
reasoning about vectors. This introduces the subsystem for modeling data
layouts in MLIR.
The data layout subsystem is designed to scale to MLIR's open type and
operation system. At the top level, it consists of attribute interfaces that
can be implemented by concrete data layout specifications; type interfaces that
should be implemented by types subject to data layout; operation interfaces
that must be implemented by operations that can serve as data layout scopes
(e.g., modules); and dialect interfaces for data layout properties unrelated to
specific types. Built-in types are handled specially to decrease the overall
query cost.
A concrete default implementation of these interfaces is provided in the new
Target dialect. Defaults for built-in types that match the current behavior are
also provided.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97067
verifyCompatibleShapes is not transitive. Create an n-ary version and
update SameOperandShapes and SameOperandAndResultShapes traits to use
it.
Differential Revision: https://reviews.llvm.org/D98331
Clean-up after D98279, remove one call to createConvertGPUKernelToBlobPass().
Depends On D98203
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98360
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.
The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.
Depends On D98279
Reviewed By: herhut, rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98203
The current implementation has some inefficiencies that become noticeable when running on large modules. This revision optimizes the code, and updates some out-dated idioms with newer utilities. The main components of this optimization include:
* Add an overload of Block::eraseArguments that allows for O(N) erasure of disjoint arguments.
* Don't process entry block arguments given that we don't erase them at this point.
* Don't track individual operation results, given that we don't erase them. We can just track the parent operation.
Differential Revision: https://reviews.llvm.org/D98309
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97892
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
* Removed tracking of root and terminal ops. Existing vectorization
functionality is preserved and extended so that loop nests without
root-terminal chains can be vectorized.
* Vectorizing a loop nest now only requires a single topological traversal.
* A new vector loop nest is incrementally built along the vectorization
process. The original scalar loop is kept intact. No cloning guard is needed
to recover the scalar loop if vectorization fails. This approach also
simplifies the challenging task of replacing a loop operation amid the
vectorization process without invalidating the analysis information that
depends on the original loop.
* Vectorization of specific operations has been implemented as independent,
preparing them to be moved to a potential vectorization interface.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97442
Link `MLIRStandardToLLVM` to `MLIRAVX512Transforms`, since
the latter uses `LLVMTypeConverter` defined in the first one.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D98336
The dialect separation was introduced to demarkate ops operating in different
type systems. This is no longer the case after the LLVM dialect has migrated to
using built-in vector types, so the original reason for separation is no longer
valid. Squash the two dialects into one.
The code size decrease isn't quite large: the ops originally in LLVM_AVX512 are
preserved because they match LLVM IR intrinsics specialized for vector element
bitwidth. However, it is still conceptually beneficial to have only one
dialect. I originally considered to use Tablegen multiclasses to define both
the type-polymorphic op and its two intrinsic-related instantiations, but
decided against it given both the complexity of the required Tablegen input and
its dissimilarity with the rest of ODS-defined ops, both potentially resulting
in very poor maintainability.
Depends On D98327
Reviewed By: nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D98328
VectorOfLengthAndType accepts a cartesian product of given lengths and types
rather than types produced by co-indexed values in the corresponding lists.
Update the definitions accordingly. The type validity is already enforced by
op traits.
Reviewed By: nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D98327
It is to use the methods in LinalgInterfaces.cpp for additional static shape verification to match the shaped operands and loop on linalgOps. If I used the existing methods, I would face circular dependency linking issue. Now we can use them as methods of LinalgOp.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D98163
Instead of configuring kernel-to-cubin/rocdl lowering through callbacks, introduce a base class that target-specific passes can derive from.
Put the base class in GPU/Transforms, according to the discussion in D98203.
The mlir-cuda-runner will go away shortly, and the mlir-rocdl-runner as well at some point. I therefore kept the existing code path working and will remove it in a separate step.
Depends On D98168
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98279
Based on the following discussion:
https://llvm.discourse.group/t/rfc-memref-memory-shape-as-attribute/2229
The goal of the change is to make memory space property to have more
expressive representation, rather then "magic" integer values.
It will allow to have more clean ASM form:
```
gpu.func @test(%arg0: memref<100xf32, "workgroup">)
// instead of
gpu.func @test(%arg0: memref<100xf32, 3>)
```
Explanation for `Attribute` choice instead of plain `string`:
* `Attribute` classes allow to use more type safe API based on RTTI.
* `Attribute` classes provides faster comparison operator based on
pointer comparison in contrast to generic string comparison.
* `Attribute` allows to store more complex things, like structs or dictionaries.
It will allows to have more complex memory space hierarchy.
This commit preserve old integer-based API and implements it on top
of the new one.
Depends on D97476
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D96145
This method allows for removing multiple disjoint operands at once, reducing the need to erase operands individually (which results in shifting the operand list).
Differential Revision: https://reviews.llvm.org/D98290
This class provides efficient implementations of symbol queries related to uses, such as collecting the users of a symbol, replacing all uses, etc. This provides similar benefits to use related queries, as SymbolTableCollection did for lookup queries.
Differential Revision: https://reviews.llvm.org/D98071