This is useful for bit-packing types such as vectors and tuples as well as for
exotic architectures that have non-8-bit bytes.
Depends On D98500
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D98524
ModuleOp is a natural place to provide scoped data layout information. However,
it is undesirable for ModuleOp to implement the entirety of
DataLayoutOpInterface because that would require either pushing the interface
inside the IR library instead of a separate library, or putting the default
implementation of the interface as inline functions in headers leading to
binary bloat. Instead, ModuleOp accepts an arbitrary data layout spec attribute
and has a dedicated hook to extract it, and DataLayout is modified to know
about ModuleOp particularities.
Reviewed By: herhut, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98500
This mechanism makes it possible for a dialect to not register all
operations but still answer interface-based queries.
This can useful for dialects that are "open" or connected to an external
system and still interoperate with the compiler. It can also open up the
possibility to have a more extensible compiler at runtime: the compiler
does not need a pre-registration for each operation and the dialect can
inject behavior dynamically.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D93085
To match an interface or trait, users currently have to use the `MatchAny` tag. This tag can be quite problematic for compile time for things like the canonicalizer, as the `MatchAny` patterns may get applied to *every* operation. This revision adds better support by bucketing interface/trait patterns based on which registered operations have them registered. This means that moving forward we will only attempt to match these patterns to operations that have this interface registered. Two simplify defining patterns that match traits and interfaces, two new utility classes have been added: OpTraitRewritePattern and OpInterfaceRewritePattern.
Differential Revision: https://reviews.llvm.org/D98986
This provides a simplified way to implement 'matchAndRewrite' style
canonicalization patterns for ops that don't need the full power of
RewritePatterns. Using this style, you can implement a static method
with a signature like:
```
LogicalResult AssertOp::canonicalize(AssertOp op, PatternRewriter &rewriter) {
return success();
}
```
instead of dealing with defining RewritePattern subclasses. This also
adopts this for a few canonicalization patterns in the std dialect to
show how it works.
Differential Revision: https://reviews.llvm.org/D99143
This revision introduces proper backward slice computation during the hoisting of
PadTensorOp. This allows hoisting padding even across multiple levels of tiling.
Such hoisting requires the proper handling of loop bounds that may depend on enclosing
loop variables.
Differential revision: https://reviews.llvm.org/D98965
This nicely aligns the naming with RewritePatternSet. This type isn't
as widely used, but we keep a using declaration in to help with
downstream consumption of this change.
Differential Revision: https://reviews.llvm.org/D99131
This doesn't change APIs, this just cleans up the many in-tree uses of these
names to use the new preferred names. We'll keep the old names around for a
couple weeks to help transitions.
Differential Revision: https://reviews.llvm.org/D99127
This maintains the old name to have minimal source impact on downstream codes, and
does not do the huge mechanical patch. I expect the huge mechanical patch to land
sometime this week, but we can keep around the old names for a couple weeks to reduce
impact on downstream projects.
Differential Revision: https://reviews.llvm.org/D99119
This allows adding a C function pointer as a matchAndRewrite style pattern, which
is a very common case. This adopts it in ExpandTanh to show how it reduces a level
of nesting.
We could allow C++ lambdas here, but that doesn't work as well with type inference
in the common case. Instead of:
patterns.insert(convertTanhOp);
you need to specify:
patterns.insert<math::TanhOp>(convertTanhOp);
which is boilerplate'y. Capturing state like this is very uncommon, so we choose
to require clients to define their own structs and use the non-convenience method
when they need to do so.
Differential Revision: https://reviews.llvm.org/D99039
GreedyPatternRewriteDriver was changed from bottom-up traversal to top-down traversal. Not all passes work yet with that change for traversal order. To give some time for fixing, add an option to allow to switch back to bottom-up traversal. Use this option in FusionOfTensorOpsPass which fails otherwise.
Differential Revision: https://reviews.llvm.org/D99059
This updates the codebase to pass the context when creating an instance of
OwningRewritePatternList, and starts removing extraneous MLIRContext
parameters. There are many many more to be removed.
Differential Revision: https://reviews.llvm.org/D99028
This reapplies b5d9a3c / https://reviews.llvm.org/D98609 with a one line fix in
processExistingConstants to skip() when erasing a constant we've already seen.
Original commit message:
1) Change the canonicalizer to walk the function in top-down order instead of
bottom-up order. This composes well with the "top down" nature of constant
folding and simplification, reducing iterations and re-evaluation of ops in
simple cases.
2) Explicitly enter existing constants into the OperationFolder table before
canonicalizing. Previously we would "constant fold" them and rematerialize
them, wastefully recreating a bunch fo constants, which lead to pointless
memory traffic.
Both changes together provide a 33% speedup for canonicalize on some mid-size
CIRCT examples.
One artifact of this change is that the constants generated in normal pattern
application get inserted at the top of the function as the patterns are applied.
Because of this, we get "inverted" constants more often, which is an aethetic
change to the IR but does permute some testcases.
Differential Revision: https://reviews.llvm.org/D99006
This makes the annotation tied to the operand and the use of a keyword
more explicit/readable on what it means.
Differential Revision: https://reviews.llvm.org/D99001
Now that all of the builtin dialect is generated from ODS, its documentation in LangRef can be split out and replaced with references to Dialects/Builtin.md. LangRef is quite crusty right now and should really have a full cleanup done in a followup.
Differential Revision: https://reviews.llvm.org/D98562
This allows for notifying callers when operations/blocks get erased, which is especially useful for the greedy pattern driver. The current greedy pattern driver "throws away" all information on constants in the operation folder because it doesn't know if they get erased or not. By passing in RewriterBase, we can directly track this and prevent the need for the pattern driver to rediscover all of the existing constants. In some situations this cuts the compile time of the canonicalizer in half.
Differential Revision: https://reviews.llvm.org/D98755
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.
I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D98447
Add extra `type.isa<FloatType>()` check to `FloatAttr::get(Type, double)` method.
Otherwise it tries to call `type.cast<FloatType>()`, which fails with assertion in Debug mode.
The `!type.isa<FloatType>()` case just redirercts the call to `FloatAttr::get(Type, APFloat)`,
which will perform the actual check and emit appropriate error.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98764
This adds a tosa.apply_scale operation that handles the scaling operation
common to quantized operatons. This scalar operation is lowered
in TosaToStandard.
We use a separate ApplyScale factorization as this is a replicable pattern
within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul
for their quantized variants.
Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors
that verify each pass is correct.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D98753
Do not limit the number of arguments in rewriter pattern.
Introduce separate `FmtStrVecObject` class to handle
format of variadic `std::string` array.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97839
Add a feature to `EnumAttr` definition to generate
specialized Attribute class for the particular enumeration.
This class will inherit `StringAttr` or `IntegerAttr` and
will override `classof` and `getValue` methods.
With this class the enumeration predicate can be checked with simple
RTTI calls (`isa`, `dyn_cast`) and it will return the typed enumeration
directly instead of raw string/integer.
Based on the following discussion:
https://llvm.discourse.group/t/rfc-add-enum-attribute-decorator-class/2252
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97836
Returning structs directly in LLVM does not necessarily align with the C ABI of
the platform. This might happen to work on Linux but for small structs this
breaks on Windows. With this change, the wrappers work platform independently.
Differential Revision: https://reviews.llvm.org/D98725
Some parameters to attributes and types rely on special comparison routines other than operator== to ensure equality. This revision adds support for those parameters by allowing them to specify a `comparator` code block that determines if `$_lhs` and `$_rhs` are equal. An example of one of these paramters is APFloat, which requires `bitwiseIsEqual` for bitwise comparison (which we want for attribute equality).
Differential Revision: https://reviews.llvm.org/D98473
Supporting ranges in the byte code requires additional complexity, given that a range can't be easily representable as an opaque void *, as is possible with the existing bytecode value types (Attribute, Type, Value, etc.). To enable representing a range with void *, an auxillary storage is used for the actual range itself, with the pointer being passed around in the normal byte code memory. For type ranges, a TypeRange is stored. For value ranges, a ValueRange is stored. The above problem represents a majority of the complexity involved in this revision, the rest is adapting/adding byte code operations to support the changes made to the PDL interpreter in the parent revision.
After this revision, PDL will have initial end-to-end support for variadic operands/results.
Differential Revision: https://reviews.llvm.org/D95723
This revision extends the PDL Interpreter dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl_interp.check_types : Compare a range of types with a known range.
* pdl_interp.create_types : Create a constant range of types.
* pdl_interp.get_operands : Get a range of operands from an operation.
* pdl_interp.get_results : Get a range of results from an operation.
* pdl_interp.switch_types : Switch on a range of types.
This revision handles adding support in the interpreter dialect and the conversion from PDL to PDLInterp. Support for variadic operands and results in the bytecode will be added in a followup revision.
Differential Revision: https://reviews.llvm.org/D95722
This revision extends the PDL dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant:
* pdl.operands : Define a range of input operands.
* pdl.results : Extract a result group from an operation.
* pdl.types : Define a handle to a range of types.
Support for these in the pdl interpreter dialect and byte code will be added in followup revisions.
Differential Revision: https://reviews.llvm.org/D95721
This has a numerous amount of benefits, given the overly clunky nature of CreateNativeOp:
* Users can now call into arbitrary rewrite functions from inside of PDL, allowing for more natural interleaving of PDL/C++ and enabling for more of the pattern to be in PDL.
* Removes the need for an additional set of C++ functions/registry/etc. The new ApplyNativeRewriteOp will use the same PDLRewriteFunction as the existing RewriteOp. This reduces the API surface area exposed to users.
This revision also introduces a new PDLResultList class. This class is used to provide results of native rewrite functions back to PDL. We introduce a new class instead of using a SmallVector to simplify the work necessary for variadics, given that ranges will require some changes to the structure of PDLValue.
Differential Revision: https://reviews.llvm.org/D95720
Up until now, results have been represented as additional results to a pdl.operation. This is fairly clunky, as it mismatches the representation of the rest of the IR constructs(e.g. pdl.operand) and also isn't a viable representation for operations returned by pdl.create_native. This representation also creates much more difficult problems when factoring in support for variadic result groups, optional results, etc. To resolve some of these problems, and simplify adding support for variable length results, this revision extracts the representation for results out of pdl.operation in the form of a new `pdl.result` operation. This operation returns the result of an operation at a given index, e.g.:
```
%root = pdl.operation ...
%result = pdl.result 0 of %root
```
Differential Revision: https://reviews.llvm.org/D95719
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98470
This was seemingly dropped in e2310704d8,
potentially due to a misrebase. The absence of this trait makes aliasing
analysis incorrect, leading to, e.g., buffer deallocation pass inserting
deallocations too early.
The patch in question broke the build with shared libraries due to
missing dependencies, one of which would have been circular between
MLIRStandard and MLIRMemRef if added. Fix this by moving more code
around and swapping the dependency direction. MLIRMemRef now depends on
MLIRStandard, but MLIRStandard does _not_ depend on MLIRMemRef.
Arguably, this is the right direction anyway since numerous libraries
depend on MLIRStandard and don't necessarily need to depend on
MLIRMemref.
Other otable changes include:
- some EDSC code is moved inline to MemRef/EDSC/Intrinsics.h because it
creates MemRef dialect operations;
- a utility function related to shape moved to BuiltinTypes.h/cpp
because it only realtes to shaped types and not any particular dialect
(standard dialect is erroneously believed to contain MemRefType);
- a Python test for the standard dialect is disabled completely because
the ops it tests moved to the new MemRef dialect, but it is not
exposed to Python bindings, and the change for that is non-trivial.
This reverts commit b5d9a3c923.
The commit introduced a memory error in canonicalization/operation
walking that is exposed when compiled with ASAN. It leads to crashes in
some "release" configurations.
We know that all ConstantLike operations have one result and no operands,
so check this first before doing the trait check. This change speeds up
Canonicalize on a CIRCT testcase by ~5%.
Differential Revision: https://reviews.llvm.org/D98615
Two changes:
1) Change the canonicalizer to walk the function in top-down order instead of
bottom-up order. This composes well with the "top down" nature of constant
folding and simplification, reducing iterations and re-evaluation of ops in
simple cases.
2) Explicitly enter existing constants into the OperationFolder table before
canonicalizing. Previously we would "constant fold" them and rematerialize
them, wastefully recreating a bunch fo constants, which lead to pointless
memory traffic.
Both changes together provide a 33% speedup for canonicalize on some mid-size
CIRCT examples.
One artifact of this change is that the constants generated in normal pattern
application get inserted at the top of the function as the patterns are applied.
Because of this, we get "inverted" constants more often, which is an aethetic
change to the IR but does permute some testcases.
Differential Revision: https://reviews.llvm.org/D98609
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.
There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
NestedPattern uses a BumpPtrAllocator to store child (nested) pattern
objects to decrease the overhead of dynamic allocation. This assumes all
allocations happen inside the allocator that will be freed as a whole.
However, NestedPattern contains `std::function` as a member, which
allocates internally using `new`, unaware of the BumpPtrAllocator. Since
NestedPattern only holds pointers to the nested patterns allocated in
the BumpPtrAllocator, it never calls their destructors, so the
destructor of the `std::function`s they contain are never called either,
leaking the allocated memory.
Make NestedPattern explicitly call destructors of nested patterns. This
additionally requires to actually copy the nested patterns in
copy-construction and copy-assignment instead of just sharing the
pointer to the arena-allocated list of children to avoid double-free. An
alternative solution would be to add reference counting to the list of
arena-allocated list of children.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98485
This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.
Reviewed By: dcaballe, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97822
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
* Removed tracking of root and terminal ops. Existing vectorization
functionality is preserved and extended so that loop nests without
root-terminal chains can be vectorized.
* Vectorizing a loop nest now only requires a single topological traversal.
* A new vector loop nest is incrementally built along the vectorization
process. The original scalar loop is kept intact. No cloning guard is needed
to recover the scalar loop if vectorization fails. This approach also
simplifies the challenging task of replacing a loop operation amid the
vectorization process without invalidating the analysis information that
depends on the original loop.
* Vectorization of specific operations has been implemented as independent,
preparing them to be moved to a potential vectorization interface.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97442
This allows for storage instances to store data that isn't uniqued in the context, or contain otherwise non-trivial logic, in the rare situations that they occur. Storage instances with trivial destructors will still have their destructor skipped. A consequence of this is that the storage instance definition must be visible from the place that registers the type.
Differential Revision: https://reviews.llvm.org/D98311
This patch fixes a heap-use-after-free introduced by the recent changes
in the vectorizer: https://reviews.llvm.org/rG95db7b4aeaad590f37720898e339a6d54313422f
The problem is due to the way candidate loops are visited. All candidate loops
are pattern-matched beforehand using the 'NestedMatch' utility. These matches may
intersect with each other so it may happen that we try to vectorize a loop that
was previously vectorized. The new vectorization algorithm replaces the original
loops that are vectorized with new loops and, therefore, any reference to the
original loops in the pre-computed matches becomes invalid.
This patch fixes the problem by classifying the candidate matches into buckets
before vectorization. Each bucket contains all the matches that intersect. The
vectorizer uses these buckets to make sure that we only vectorize *one* match from
each bucket, at most.
Differential Revision: https://reviews.llvm.org/D98382
For the use in LLVMOps.td I used the getPointerElementType()
escape hatch, as it's not obvious to me how the load type
should be properly obtained here.
Data layout information allows to answer questions about the size and alignment
properties of a type. It enables, among others, the generation of various
linear memory addressing schemes for containers of abstract types and deeper
reasoning about vectors. This introduces the subsystem for modeling data
layouts in MLIR.
The data layout subsystem is designed to scale to MLIR's open type and
operation system. At the top level, it consists of attribute interfaces that
can be implemented by concrete data layout specifications; type interfaces that
should be implemented by types subject to data layout; operation interfaces
that must be implemented by operations that can serve as data layout scopes
(e.g., modules); and dialect interfaces for data layout properties unrelated to
specific types. Built-in types are handled specially to decrease the overall
query cost.
A concrete default implementation of these interfaces is provided in the new
Target dialect. Defaults for built-in types that match the current behavior are
also provided.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97067
verifyCompatibleShapes is not transitive. Create an n-ary version and
update SameOperandShapes and SameOperandAndResultShapes traits to use
it.
Differential Revision: https://reviews.llvm.org/D98331
If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt.
The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner.
Depends On D98279
Reviewed By: herhut, rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98203
The current implementation has some inefficiencies that become noticeable when running on large modules. This revision optimizes the code, and updates some out-dated idioms with newer utilities. The main components of this optimization include:
* Add an overload of Block::eraseArguments that allows for O(N) erasure of disjoint arguments.
* Don't process entry block arguments given that we don't erase them at this point.
* Don't track individual operation results, given that we don't erase them. We can just track the parent operation.
Differential Revision: https://reviews.llvm.org/D98309
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
* Removed tracking of root and terminal ops. Existing vectorization
functionality is preserved and extended so that loop nests without
root-terminal chains can be vectorized.
* Vectorizing a loop nest now only requires a single topological traversal.
* A new vector loop nest is incrementally built along the vectorization
process. The original scalar loop is kept intact. No cloning guard is needed
to recover the scalar loop if vectorization fails. This approach also
simplifies the challenging task of replacing a loop operation amid the
vectorization process without invalidating the analysis information that
depends on the original loop.
* Vectorization of specific operations has been implemented as independent,
preparing them to be moved to a potential vectorization interface.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97442
The dialect separation was introduced to demarkate ops operating in different
type systems. This is no longer the case after the LLVM dialect has migrated to
using built-in vector types, so the original reason for separation is no longer
valid. Squash the two dialects into one.
The code size decrease isn't quite large: the ops originally in LLVM_AVX512 are
preserved because they match LLVM IR intrinsics specialized for vector element
bitwidth. However, it is still conceptually beneficial to have only one
dialect. I originally considered to use Tablegen multiclasses to define both
the type-polymorphic op and its two intrinsic-related instantiations, but
decided against it given both the complexity of the required Tablegen input and
its dissimilarity with the rest of ODS-defined ops, both potentially resulting
in very poor maintainability.
Depends On D98327
Reviewed By: nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D98328
VectorOfLengthAndType accepts a cartesian product of given lengths and types
rather than types produced by co-indexed values in the corresponding lists.
Update the definitions accordingly. The type validity is already enforced by
op traits.
Reviewed By: nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D98327
It is to use the methods in LinalgInterfaces.cpp for additional static shape verification to match the shaped operands and loop on linalgOps. If I used the existing methods, I would face circular dependency linking issue. Now we can use them as methods of LinalgOp.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D98163
Instead of configuring kernel-to-cubin/rocdl lowering through callbacks, introduce a base class that target-specific passes can derive from.
Put the base class in GPU/Transforms, according to the discussion in D98203.
The mlir-cuda-runner will go away shortly, and the mlir-rocdl-runner as well at some point. I therefore kept the existing code path working and will remove it in a separate step.
Depends On D98168
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D98279
Based on the following discussion:
https://llvm.discourse.group/t/rfc-memref-memory-shape-as-attribute/2229
The goal of the change is to make memory space property to have more
expressive representation, rather then "magic" integer values.
It will allow to have more clean ASM form:
```
gpu.func @test(%arg0: memref<100xf32, "workgroup">)
// instead of
gpu.func @test(%arg0: memref<100xf32, 3>)
```
Explanation for `Attribute` choice instead of plain `string`:
* `Attribute` classes allow to use more type safe API based on RTTI.
* `Attribute` classes provides faster comparison operator based on
pointer comparison in contrast to generic string comparison.
* `Attribute` allows to store more complex things, like structs or dictionaries.
It will allows to have more complex memory space hierarchy.
This commit preserve old integer-based API and implements it on top
of the new one.
Depends on D97476
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D96145
This method allows for removing multiple disjoint operands at once, reducing the need to erase operands individually (which results in shifting the operand list).
Differential Revision: https://reviews.llvm.org/D98290
This class provides efficient implementations of symbol queries related to uses, such as collecting the users of a symbol, replacing all uses, etc. This provides similar benefits to use related queries, as SymbolTableCollection did for lookup queries.
Differential Revision: https://reviews.llvm.org/D98071
Provide default for gpuBinaryAnnotation so that we don't need to specify it in tests.
The annotation likely only needs to be target specific if we want to lower to e.g. both CUDA and ROCDL.
Reviewed By: herhut, bondhugula
Differential Revision: https://reviews.llvm.org/D98168
Instead of storing an array of LoopOpt attributes, which were just
wrapping std::pair<enum, int> anyway, we can have an attribute storing
a sorted ArrayRef<std::pair<enum, int>> as a single unit. This improves
here the textual format and the general API. Note that we're limiting
the options to fit into an int64_t by design, but this isn't a new
constraint.
Building the LoopOptions attribute is likely worth a specific builder
for efficient reason, that'll be the subject of a future patch.
Differential Revision: https://reviews.llvm.org/D98105
Move Target/LLVMIR.h to target/LLVMIR/Import.h to better reflect the purpose of
this file. Also move all LLVM IR target tests under the LLVMIR directory.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98178
Return the vectorization results using a vector passed by reference instead of returning them embedded in a structure.
Differential Revision: https://reviews.llvm.org/D98182
This is using the new Attribute storage generation support in
TableGen to define the LLVM FastMathFlags.
Differential Revision: https://reviews.llvm.org/D98007
This will allow for removing the duplicated type documentation from LangRef and instead link to the builtin dialect documentation.
Differential Revision: https://reviews.llvm.org/D98093
split_at can return an error if the split index is out of bounds. If the
user knows that the index can never be out of bounds it's safe to use
extent tensors. This has a straight-forward lowering to std.subtensor.
Differential Revision: https://reviews.llvm.org/D98177
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere. For ops that
don't have a SPIR-V spec counterpart, we use spv.mlir.snake_case.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D98014
Normally tensors will be stored in buffers before converting to SPIR-V,
given that is how a large amount of data is sent to the GPU. However,
SPIR-V supports converting from tensors directly too. This is for the
cases where the tensor just contains a small amount of elements and it
makes sense to directly inline them as a small data array in the shader.
To handle this, internally the conversion might create new local
variables. SPIR-V consumers in GPU drivers may or may not optimize that
away. So this has implications over register pressure. Therefore, a
threshold is used to control when the patterns should kick in.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D98052
The two dialects are largely redundant. The former was introduced as a mirror
of the latter operating on LLVM dialect types. This is no longer necessary
since the LLVM dialect operates on built-in types. Combine the two dialects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98060
This patch is a follow-up on D97217. It adds a new 'Skip' result to the Operation visitor
so that a callback can stop the ongoing visit of an operation/block/region and
continue visiting the next one without fully interrupting the walk. Skipping is
needed to be able to erase an operation/block in pre-order and do not continue
visiting the internals of that operation/block.
Related to the skipping mechanism, the patch also introduces the following changes:
* Added new TestIRVisitors pass with basic testing for the IR visitors.
* Fixed missing early increment ranges in visitor implementation.
* Updated documentation of walk methods to include erasure information and walk
order information.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97820
This patch extends the Region, Block and Operation visitors to also support pre-order walks.
We introduce a new template argument that dictates the walk order (only pre-order and
post-order are supported for now). The default order for Regions, Blocks and Operations is
post-order. Mixed orders (e.g., Region/Block pre-order + Operation post-order) could easily
be implemented, as shown in NumberOfExecutions.cpp.
Reviewed By: rriddle, frgossen, bondhugula
Differential Revision: https://reviews.llvm.org/D97217
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere. For ops that
don't have a SPIR-V spec counterpart, we use spv.mlir.snake_case.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D98016
To unify the naming scheme across all ops in the SPIR-V dialect,
we are moving from spv.camelCase to spv.CamelCase everywhere.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D97918
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D97919
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from `spv.camelCase` to `spv.CamelCase` everywhere.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D97917
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere.
Differential Revision: https://reviews.llvm.org/D97920
Now that attributes can be generated using ODS, we can move the builtin attributes as well. This revision removes a majority of the builtin attributes with a few left for followup revisions. The attributes moved to ODS in this revision are: AffineMapAttr, ArrayAttr, DictionaryAttr, IntegerSetAttr, StringAttr, SymbolRefAttr, TypeAttr, and UnitAttr.
Differential Revision: https://reviews.llvm.org/D97591
The value type of the attribute can be specified by either overriding the typeBuilder field on the AttrDef, or by providing a parameter of type `AttributeSelfTypeParameter`. This removes the need to define custom storage class constructors for attributes that have a value type other than NoneType.
Differential Revision: https://reviews.llvm.org/D97590
This function simplifies calling the getChecked methods on Attributes and Types from within the parser, and removes any need to use `getEncodedSourceLocation` for these methods (by using an SMLoc instead). This is much more efficient than using an mlir::Location, as the encoding process to produce an mlir::Location is inefficient and undesirable for parsing (locations used during parsing should not persist afterwards unless otherwise necessary).
Differential Revision: https://reviews.llvm.org/D97900
There is no need for the interface implementations to be exposed, opaque
registration functions are sufficient for all users, similarly to passes.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97852
Add a Loop Option attribute and generate llvm metadata attached to
branch instructions to control code generation.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D96820
The support for attributes closely maps that of Types (basically 1-1) given that Attributes are defined in exactly the same way as Types. All of the current ODS TypeDef classes get an Attr equivalent. The generation of the attribute classes themselves share the same generator as types.
Differential Revision: https://reviews.llvm.org/D97589
This better matches the actual IR concept that is being modeled, and is consistent with how the rest of PDL is structured.
Differential Revision: https://reviews.llvm.org/D95718
This type represents a range of positional values. It will be used in followup revisions to add support for variadic constructs to PDL, such as operand and result ranges.
Differential Revision: https://reviews.llvm.org/D95717
The current implementation of Value involves a pointer int pair with several different kinds of owners, i.e. BlockArgumentImpl*, Operation *, TrailingOpResult*. This design arose from the desire to save memory overhead for operations that have a very small number of results (generally 0-2). There are, unfortunately, many problematic aspects of the current implementation that make Values difficult to work with or just inefficient.
Operation result types are stored as a separate array on the Operation. This is very inefficient for many reasons: we use TupleType for multiple results, which can lead to huge amounts of memory usage if multi-result operations change types frequently(they do). It also means that simple methods like Value::getType/Value::setType now require complex logic to get to the desired type.
Value only has one pointer bit free, severely limiting the ability to use it in things like PointerUnion/PointerIntPair. Given that we store the kind of a Value along with the "owner" pointer, we only leave one bit free for users of Value. This creates situations where we end up nesting PointerUnions to be able to use Value in one.
As noted above, most of the methods in Value need to branch on at least 3 different cases which is both inefficient, possibly error prone, and verbose. The current storage of results also creates problems for utilities like ValueRange/TypeRange, which want to efficiently store base pointers to ranges (of which Operation* isn't really useful as one).
This revision greatly simplifies the implementation of Value by the introduction of a new ValueImpl class. This class contains all of the state shared between all of the various derived value classes; i.e. the use list, the type, and the kind. This shared implementation class provides several large benefits:
* Most of the methods on value are now branchless, and often one-liners.
* The "kind" of the value is now stored in ValueImpl instead of Value
This frees up all of Value's pointer bits, allowing for users to take full advantage of PointerUnion/PointerIntPair/etc. It also allows for storing more operation results as "inline", 6 now instead of 2, freeing up 1 word per new inline result.
* Operation result types are now stored in the result, instead of a side array
This drops the size of zero-result operations by 1 word. It also removes the memory crushing use of TupleType for operations results (which could lead up to hundreds of megabytes of "dead" TupleTypes in the context). This also allowed restructured ValueRange, making it simpler and one word smaller.
This revision does come with two conceptual downsides:
* Operation::getResultTypes no longer returns an ArrayRef<Type>
This conceptually makes some usages slower, as the iterator increment is slightly more complex.
* OpResult::getOwner is slightly more expensive, as it now requires a little bit of arithmetic
From profiling, neither of the conceptual downsides have resulted in any perceivable hit to performance. Given the advantages of the new design, most compiles are slightly faster.
Differential Revision: https://reviews.llvm.org/D97804
Different from the definition in Tensorflow and TOSA, the output is [N,H,W,C,M]. This can make transforms easier in LinAlg because the indexing maps are plain. E.g., to determine if the fill op has dependency between the depthwise conv op, the current pipeline only recognizes the dep if they are all projected affine map.
Reviewed By: asaadaldien
Differential Revision: https://reviews.llvm.org/D97798
This offers the ability to create a JIT and invoke a function by passing
ctypes pointers to the argument and the result.
Differential Revision: https://reviews.llvm.org/D97523
This adds minimalistic bindings for the execution engine, allowing to
invoke the JIT from the C API. This is still quite early and
experimental and shouldn't be considered stable in any way.
Differential Revision: https://reviews.llvm.org/D96651
Since Linalg operations have regions by default which are not isolated
from above, add an another method to the interface that will take a
BlockAndValueMapping to remap the values within the region as well.
Differential Revision: https://reviews.llvm.org/D97709
This gets rid of a dubious shape_eq %a, %a fold, that folds shape_eq
even if %a is not an Attribute.
Differential Revision: https://reviews.llvm.org/D97728
Use `StringLiteral` for function return type if it is known to return
constant string literals only.
This will make it visible to API users, that such values can be safely
stored, since they refers to constant data, which will never be deallocated.
`StringRef` is general is not safe to store for a long term,
since it might refer to temporal data allocated in heap.
Add `inline` and `constexpr` methods support to `OpMethod`.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D97390
Some elementwise operations are not scalarizable, vectorizable, or tensorizable.
Split `ElementwiseMappable` trait into the following, more precise traits.
- `Elementwise`
- `Scalarizable`
- `Vectorizable`
- `Tensorizable`
This allows for reuse of `Elementwise` in dialects like HLO.
Differential Revision: https://reviews.llvm.org/D97674
This patch continues detensorizing implementation by detensoring
internal control flow in functions.
In order to detensorize functions, all the non-entry block's arguments
are detensored and branches between such blocks are properly updated to
reflect the detensored types as well. Function entry block (signature)
is left intact.
This continues work towards handling github/google/iree#1159.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97148
Just a pure method renaming.
It is a preparation step for replacing "memory space as raw integer"
with more generic "memory space as attribute", which will be done in
separate commit.
The `MemRefType::getMemorySpace` method will return `Attribute` and
become the main API, while `getMemorySpaceAsInt` will be declared as
deprecated and will be replaced in all in-tree dialects (also in separate
commits).
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D97476
* Moves `batch_matmul`, `matmul`, `matvec`, `vectmat`, `dot` to the new mechanism.
* This is not just an NFC change, in addition to using a new code generation mechanism, it also activates symbolic casting, allowing mixed precision operands and results.
* These definitions were generated from DSL by the tool: https://github.com/stellaraccident/mlir-linalgpy/blob/main/mlir_linalg/oplib/core.py (will be upstreamed in a subsequent set of changes).
Reviewed By: nicolasvasilache, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D97719
Add canonicalizers to subtensor_insert operations need canonicalizers
that propagate the constant arguments within offsets, sizes and
strides. Also add pattern to propogate tensor_cast operations.
Differential Revision: https://reviews.llvm.org/D97704
Move the results in line with the op instead. This results in each
operation having its own types recorded vs single tuple type, but comes
at benefit that every mutation doesn't incurs uniquing. Ran into cases
where updating result type of operation led to very large memory usage.
Differential Revision: https://reviews.llvm.org/D97652
For ops that produces tensor types and implement the shaped type component interface, the type inference interface can be used. Create a grouping of these together to make it easier to specify (it cannot be added into a list of traits, but must rather be appended/concated to one as it isn't a trait but a list of traits).
Differential Revision: https://reviews.llvm.org/D97636
This enables this kind of construct in the DSL to generate a named op that is polymorphic over numeric type variables `T` and `U`, generating the correct arithmetic casts at construction time:
```
@tc_def_op
def polymorphic_matmul(A=TensorDef(T1, S.M, S.K),
B=TensorDef(T2, S.K, S.N),
C=TensorDef(U, S.M, S.N, output=True)):
implements(ContractionOpInterface)
C[D.m, D.n] += cast(U, A[D.m, D.k]) * cast(U, B[D.k, D.n])
```
Presently, this only supports type variables that are bound to the element type of one of the arguments, although a further extension that allows binding a type variable to an attribute would allow some more expressiveness and may be useful for some formulations. This is left to a future patch. In addition, this patch does not yet materialize the verifier support which ensures that types are bound correctly (for such simple examples, failing to do so will yield IR that fails verification, it just won't yet fail with a precise error).
Note that the full grid of extensions/truncation/int<->float conversions are supported, but many of them are lossy and higher level code needs to be mindful of numerics (it is not the job of this level).
As-is, this should be sufficient for most integer matmul scenarios we work with in typical quantization schemes.
Differential Revision: https://reviews.llvm.org/D97603
This also exposed a bug in Dialect loading where it was not correctly identifying identifiers that had the dialect namespace as a prefix.
Differential Revision: https://reviews.llvm.org/D97431
Includes a lowering for tosa.const, tosa.if, and tosa.while to Standard/SCF dialects. TosaToStandard is
used for constant lowerings and TosaToSCF handles the if/while ops.
Resubmission of https://reviews.llvm.org/D97518 with ASAN fixes.
Differential Revision: https://reviews.llvm.org/D97529
Similar to mask-load/store and compress/expand, the gather and
scatter operation now allow for higher dimension uses. Note that
to support the mixed-type index, the new syntax is:
vector.gather %base [%i,%j] [%kvector] ....
The first client of this generalization is the sparse compiler,
which needs to define scatter and gathers on dense operands
of higher dimensions too.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97422
Right now they multiply before casting which means they would frequently
overflow. There are various reasonable ways to do this, but until we
have robust op description infra, this is a simple and safe default. More
careful treatments are likely to be hardware specific, as well (e.g.
using an i8*i8->i16 mul instruction).
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D97505
And then push those change throughout LLVM.
Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).
The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.
Differential Revision: https://reviews.llvm.org/D97223
Includes a lowering for tosa.const, tosa.if, and tosa.while to Standard/SCF dialects. TosaToStandard is
used for constant lowerings and TosaToSCF handles the if/while ops.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97352
This expands the op to support error propagation and also makes it symmetric with "shape.get_extent" op.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97261
This will allow us to define select(pred, in, out) for TC ops, which is useful
for pooling ops.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D97312
A majority of operations have a very small number of interfaces, which means that the cost of using a hash map is generally larger for interface lookups than just a binary search. In the future when there are a number of operations with large amounts of interfaces, we can switch to a hybrid approach that optimizes lookups based on the number of interfaces. For now, however, a binary search is the best approach.
This dropped compile time on a largish TF MLIR module by 20%(half a second).
Differential Revision: https://reviews.llvm.org/D96085
Affine parallel ops may contain and yield results from MemRefsNormalizable ops in the loop body. Thus, both affine.parallel and affine.yield should have the MemRefsNormalizable trait.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96821
This transformation was only used for quick experimentation and is not general enough.
Retire it.
Differential Revision: https://reviews.llvm.org/D97266
DebugCounters allow for selectively enabling the execution of a debug action based upon a "counter". This counter is comprised of two components that are used in the control of execution of an action, a "skip" value and a "count" value. The "skip" value is used to skip a certain number of initial executions of a debug action. The "count" value is used to prevent a debug action from executing after it has executed for a set number of times (not including any executions that have been skipped). For example, a counter for a debug action with `skip=47` and `count=2`, would skip the first 47 executions, then execute twice, and finally prevent any further executions.
This is effectively the same as the DebugCounter infrastructure in LLVM, but using the DebugAction infrastructure in MLIR. We can't simply reuse the DebugCounter support already present in LLVM due to its heavy reliance on global constructors (which are not allowed in MLIR). The DebugAction infrastructure already nicely supports the debug counter use case, and promotes the separation of policy and mechanism design philosophy.
Differential Revision: https://reviews.llvm.org/D96395
This revision adds the infrastructure for `Debug Actions`. This is a DEBUG only
API that allows for external entities to control various aspects of compiler
execution. This is conceptually similar to something like DebugCounters in LLVM, but at a lower level. This framework doesn't make any assumptions about how the higher level driver is controlling the execution, it merely provides a framework for connecting the two together. This means that on top of DebugCounter functionality, we could also provide more interesting drivers such as interactive execution. A high level overview of the workflow surrounding debug actions is
shown below:
* Compiler developer defines an `action` that is taken by the a pass,
transformation, utility that they are developing.
* Depending on the needs, the developer dispatches various queries, pertaining
to this action, to an `action manager` that will provide an answer as to
what behavior the action should do.
* An external entity registers an `action handler` with the action manager,
and provides the logic to resolve queries on actions.
The exact definition of an `external entity` is left opaque, to allow for more
interesting handlers.
This framework was proposed here: https://llvm.discourse.group/t/rfc-debug-actions-in-mlir-debug-counters-for-the-modern-world
Differential Revision: https://reviews.llvm.org/D84986
This commit is the first baby step towards detensoring in
linalg-on-tensors.
Detensoring is the process through which a tensor value is convereted to one
or potentially more primitive value(s). During this process, operations with
such detensored operands are also converted to an equivalen form that works
on primitives.
The detensoring process is driven by linalg-on-tensor ops. In particular, a
linalg-on-tensor op is checked to see whether *all* its operands can be
detensored. If so, those operands are converted to thier primitive
counterparts and the linalg op is replaced by an equivalent op that takes
those new primitive values as operands.
This works towards handling github/google/iree#1159.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96271
`verifyConstructionInvariants` is intended to allow for verifying the invariants of an attribute/type on construction, and `getChecked` is intended to enable more graceful error handling aside from an assert. There are a few problems with the current implementation of these methods:
* `verifyConstructionInvariants` requires an mlir::Location for emitting errors, which is prohibitively costly in the situations that would most likely use them, e.g. the parser.
This creates an unfortunate code duplication between the verifier code and the parser code, given that the parser operates on llvm::SMLoc and it is an undesirable overhead to pre-emptively convert from that to an mlir::Location.
* `getChecked` effectively requires duplicating the definition of the `get` method, creating a quite clunky workflow due to the subtle different in its signature.
This revision aims to talk the above problems by refactoring the implementation to use a callback for error emission. Using a callback allows for deferring the costly part of error emission until it is actually necessary.
Due to the necessary signature change in each instance of these methods, this revision also takes this opportunity to cleanup the definition of these methods by:
* restructuring the signature of `getChecked` such that it can be generated from the same code block as the `get` method.
* renaming `verifyConstructionInvariants` to `verify` to match the naming scheme of the rest of the compiler.
Differential Revision: https://reviews.llvm.org/D97100
Simplifies the way lattices are optimized with less, but more
powerful rules. This also fixes an inaccuracy where too many
lattices resulted (expecting a non-existing universal index).
Also puts no-side-effects on all proper getters and unifies
bufferization flags order in integration tests (for future,
more complex use cases).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97134
This patch adds Linalg named ops for various types of integer matmuls.
Due to limitations in the tc spec/linalg-ods-gen ops cannot be type
polymorphic, so this instead creates new ops (improvements to the
methods for defining Linalg named ops are underway with a prototype at
https://github.com/stellaraccident/mlir-linalgpy).
To avoid the necessity of directly referencing these many new ops, this
adds additional methods to ContractionOpInterface to allow classifying
types of operations based on their indexing maps.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D97006
* It was decided that this was the end of the line for the existing custom tc parser/generator, and this is the first step to replacing it with a declarative format that maps well to mathy source languages.
* One such source language is implemented here: https://github.com/stellaraccident/mlir-linalgpy/blob/main/samples/mm.py
* In fact, this is the exact source of the declarative `polymorphic_matmul` in this change.
* I am working separately to clean this python implementation up and add it to MLIR (probably as `mlir.tools.linalg_opgen` or equiv). The scope of the python side is greater than just generating named ops: the ops are callable and directly emit `linalg.generic` ops fully dynamically, and this is intended to be a feature for frontends like npcomp to define custom linear algebra ops at runtime.
* There is more work required to handle full type polymorphism, especially with respect to integer formulations, since they require more specificity wrt types.
* Followups to this change will bring the new generator to feature parity with the current one and delete the current. Roughly, this involves adding support for interface declarations and attribute symbol bindings.
Differential Revision: https://reviews.llvm.org/D97135
Extracts the relevant dimensions from the map under test to build up the
maps to test against in a permutation-invariant way.
This also includes a fix to the indexing maps used by
isColumnMajorMatmul. The maps as currently written do not describe a
column-major matmul. The linalg named op column_major_matmul has the
correct maps (and notably fails the current test).
If `C = matmul(A, B)` we want an operation that given A in column major
format and B in column major format produces C in column major format.
Given that for a matrix, faux column major is just transpose.
`column_major_matmul(transpose(A), transpose(B)) = transpose(C)`. If
`A` is `NxK` and `B` is `KxM`, then `C` is `NxM`, so `transpose(A)` is
`KxN`, `transpose(B)` is `MxK` and `transpose(C)` is `MxN`, not `NxM`
as these maps currently have.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96984
Static subtensor / subtensor_insert of the same size as the source / destination tensor and root @[0..0] with strides [1..1] are folded away.
Differential revision: https://reviews.llvm.org/D96991
`subtensor_insert` was used instead of `linalg.subtensor_yield` to make this PR
smaller. Verification will be added in a follow-up PR.
Differential Revision: https://reviews.llvm.org/D96943
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h
Working on a fix.
This reverts commit 8aa6c3765b.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.
Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp
The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
Differential Revision: https://reviews.llvm.org/D96425
Resolving the dim of outputs of a tensor_reshape op in terms of its
input shape allows the op to be eliminated when its used only in its
dims. The init_tensor -> tensor_reshape canonicalization can be
simplified to use the dims of the output of the tensor_reshape which
gets canonicalized away later making the tensor_reshape dead.
Differential Revision: https://reviews.llvm.org/D96635
Separating the AffineMapAccessInterface from AffineRead/WriteOp interface so that dialects which extend Affine capabilities (e.g. PlaidML PXA = parallel extensions for Affine) can utilize relevant passes (e.g. MemRef normalization).
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96284
A series of preceding patches changed the mechanism for translating MLIR to
LLVM IR to use dialect interface with delayed registration. It is no longer
necessary for specific dialects to derive from ModuleTranslation. Remove all
virtual methods from ModuleTranslation and factor out the entry point to be a
free function.
Also perform some cleanups in ModuleTranslation internals.
Depends On D96774
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96775
Verification of the LLVM IR produced when translating various MLIR dialects was
only active when calling the translation programmatically. This has led to
several cases of invalid LLVM IR being generated that could not be caught with
textual mlir-translate tests. Add verifiers for these cases and fix the tests
in preparation for enforcing the validation of LLVM IR.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96774
These patterns unrolls transfer read/write ops if the vector consumers/
producers are extract/insert slices op. Transfer ops can map to hardware
load/store functionalities, where the vector size matters for bandwidth
considerations. So these patterns should be collected separately, instead
of being generic canonicalization patterns.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96782
SliceAnalysis originally was developed in the context of affine.for within mlfunc.
It predates the notion of region.
This revision updates it to not hardcode specific ops like scf::ForOp.
When rooted at an op, the behavior of the slice computation changes as it recurses into the regions of the op. This does not support gathering all values transitively depending on a loop induction variable anymore.
Additional variants rooted at a Value are added to also support the existing behavior.
Differential revision: https://reviews.llvm.org/D96702
Allow clients to create a new ShapedType of the same "container" type
but with different element or shape. First use case is when refining
shape during shape inference without needing to consider which
ShapedType is being refined.
Differential Revision: https://reviews.llvm.org/D96682
This corresponds with the previous work to make shape.broadcast nary.
Additionally, simplify the ConvertShapeConstraints pass. It now doesn't
lower an implicit shape.is_broadcastable. This is still the same in
combination with shape-to-standard when the 2 passes are used in either
order.
Differential Revision: https://reviews.llvm.org/D96401
Port the translation of five dialects that define LLVM IR intrinsics
(LLVMAVX512, LLVMArmNeon, LLVMArmSVE, NVVM, ROCDL) to the new dialect
interface-based mechanism. This allows us to remove individual translations
that were created for each of these dialects and just use one common
MLIR-to-LLVM-IR translation that potentially supports all dialects instead,
based on what is registered and including any combination of translatable
dialects. This removal was one of the main goals of the refactoring.
To support the addition of GPU-related metadata, the translation interface is
extended with the `amendOperation` function that allows the interface
implementation to post-process any translated operation with dialect attributes
from the dialect for which the interface is implemented regardless of the
operation's dialect. This is currently applied to "kernel" functions, but can
be used to construct other metadata in dialect-specific ways without
necessarily affecting operations.
Depends On D96591, D96504
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96592
Dialects themselves do not support repeated addition of interfaces with the
same TypeID. However, in case of delayed registration, the registry may contain
such an interface, or have the same interface registered several times due to,
e.g., dependencies. Make sure we delayed registration does not attempt to add
an interface with the same TypeID more than once.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D96606
The patch extends the runner utils by verification methods that compare two memrefs. The methods compare the content of the two memrefs and print success if the data is identical up to a small numerical error. The methods are meant to simplify the development of integration tests that compare the results against a reference implementation (cf. the updates to the linalg matmul integration tests).
Originally landed in 5fa893c (https://reviews.llvm.org/D96326) and reverted in dd719fd due to a Windows build failure.
Changes:
- Remove the max function that requires the "algorithm" header on Windows
- Eliminate the truncation warning in the float specialization of verifyElem by using a float constant
Reviewed By: Kayjukh
Differential Revision: https://reviews.llvm.org/D96593
Currently, vector.contract joins the intermediate result and the accumulator
argument (of ranks K) using summation. We desire more joining operations ---
such as max --- to help vector.contract express reductions. This change extends
Vector_ContractionOp to take an optional attribute (called "kind", of enum type
CombiningKind) specifying the joining operation to be add/mul/min/max for int/fp
, and and/or/xor for int only. By default this attribute has value "add".
To implement this we also need to extend vector.outerproduct, since
vector.contract gets transformed to vector.outerproduct (and that to
vector.fma). The extension for vector.outerproduct is also an optional kind
attribute that uses the same enum type and possible values. The default is
"add". In case of max/min we transform vector.outerproduct to a combination of
compare and select.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D93280
This revision takes advantage of the newly extended `ref` directive in assembly format
to allow better region handling for LinalgOps. Specifically, FillOp and CopyOp now build their regions explicitly which allows retiring older behavior that relied on specific op knowledge in both lowering to loops and vectorization.
This reverts commit 3f22547fd1 and reland 973e133b76 with a workaround for
a gcc bug that does not accept lambda default parameters:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59949
Differential Revision: https://reviews.llvm.org/D96598
Align the vector gather/scatter/expand/compress API with
the vector load/store/maskedload/maskedstore API.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D96396