Added a commutativity utility pattern and a function to populate it. The pattern sorts the operands of an op in ascending order of the "key" associated with each operand iff the op is commutative. This sorting is stable.
The function is intended to be used inside passes to simplify the matching of commutative operations. After the pattern above has been applied, the commutative operands occur in a deterministic order within an op, so matching large DAGs becomes much simpler, i.e., users need to write far fewer checks in their pattern-matching functions.
The "key" associated with an operand is the list of the "AncestorKeys" associated with the ancestors of this operand, in a breadth-first order.
The operand of any op is produced by a set of ops and block arguments. Each of these ops and block arguments is called an "ancestor" of this operand.
Now, the "AncestorKey" associated with:
1. A block argument is `{type: BLOCK_ARGUMENT, opName: ""}`.
2. A non-constant-like op, for example, `arith.addi`, is `{type: NON_CONSTANT_OP, opName: "arith.addi"}`.
3. A constant-like op, for example, `arith.constant`, is `{type: CONSTANT_OP, opName: "arith.constant"}`.
So, if an operand, say `A`, was produced as follows:
```
`<block argument>` `<block argument>`
\ /
\ /
`arith.subi` `arith.constant`
\ /
`arith.addi`
|
returns `A`
```
Then, the block arguments and operations present in the backward slice of `A`, in the breadth-first order are:
`arith.addi`, `arith.subi`, `arith.constant`, `<block argument>`, and `<block argument>`.
Thus, the "key" associated with operand `A` is:
```
{
{type: NON_CONSTANT_OP, opName: "arith.addi"},
{type: NON_CONSTANT_OP, opName: "arith.subi"},
{type: CONSTANT_OP, opName: "arith.constant"},
{type: BLOCK_ARGUMENT, opName: ""},
{type: BLOCK_ARGUMENT, opName: ""}
}
```
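The breadth-first construction of a "key" can be sketched as follows. This is only an illustrative Python model, not the C++ implementation: the `Node` class and `compute_key` function are hypothetical names, and the sketch assumes ancestors are visited once per use, without deduplication.

```python
from collections import deque

# Hypothetical stand-in for an op or a block argument; not an MLIR class.
class Node:
    def __init__(self, kind, name="", operands=()):
        self.kind = kind  # "BLOCK_ARGUMENT", "NON_CONSTANT_OP" or "CONSTANT_OP"
        self.name = name  # op name; empty for block arguments
        self.operands = list(operands)

def compute_key(root):
    """Return the (kind, name) pairs of root and its ancestors in BFS order."""
    key = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        key.append((node.kind, node.name))
        queue.extend(node.operands)  # assumption: no deduplication of shared ancestors
    return key

# The DAG that produces operand `A` in the diagram above.
arg0 = Node("BLOCK_ARGUMENT")
arg1 = Node("BLOCK_ARGUMENT")
sub = Node("NON_CONSTANT_OP", "arith.subi", [arg0, arg1])
cst = Node("CONSTANT_OP", "arith.constant")
add = Node("NON_CONSTANT_OP", "arith.addi", [sub, cst])
key_a = compute_key(add)
```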
Now, if "keyA" is the key associated with operand `A` and "keyB" is the key associated with operand `B`, then:
"keyA" < "keyB" iff:
1. In the first unequal pair of corresponding AncestorKeys, the AncestorKey in operand `A` is smaller, or,
2. Both the AncestorKeys in every pair are the same and the size of operand `A`'s "key" is smaller.
AncestorKeys of type `BLOCK_ARGUMENT` are considered the smallest, those of type `CONSTANT_OP`, the largest, and `NON_CONSTANT_OP` types come in between. Within the types `NON_CONSTANT_OP` and `CONSTANT_OP`, the smaller ones are the ones with smaller op names (lexicographically).
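The ordering rules above can be modeled in a few lines of Python. This is a sketch under assumptions, not the C++ implementation: `TYPE_RANK`, `ancestor_rank`, and `sort_operands` are hypothetical names, and the operands and keys are made up for illustration. Python's list comparison happens to implement both rules (first unequal pair decides, otherwise the shorter key is smaller), and `sorted` is stable.

```python
# Rank of each AncestorKey type: block arguments smallest, constants largest.
TYPE_RANK = {"BLOCK_ARGUMENT": 0, "NON_CONSTANT_OP": 1, "CONSTANT_OP": 2}

def ancestor_rank(ancestor):
    """Map an AncestorKey (kind, op_name) to a comparable tuple."""
    kind, op_name = ancestor
    return (TYPE_RANK[kind], op_name)

def sort_operands(operands, operand_keys):
    """Stably sort operand names in ascending order of their keys."""
    return sorted(operands, key=lambda o: [ancestor_rank(a) for a in operand_keys[o]])

# Two hypothetical operands: one produced by a constant-like op, one by a
# non-constant op that takes two block arguments.
operand_keys = {
    "%1": [("CONSTANT_OP", "foo.const")],
    "%2": [
        ("NON_CONSTANT_OP", "foo.mul"),
        ("BLOCK_ARGUMENT", ""),
        ("BLOCK_ARGUMENT", ""),
    ],
}
sorted_operands = sort_operands(["%1", "%2"], operand_keys)
```

Here the first AncestorKeys already differ, and `NON_CONSTANT_OP` ranks below `CONSTANT_OP`, so `%2` is placed before `%1`.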
---
Some examples of such a sorting:
Assume that the sorting is being applied to `foo.commutative`, which is a commutative op.
Example 1:
> %1 = foo.const 0
> %2 = foo.mul <block argument>, <block argument>
> %3 = foo.commutative %1, %2
Here,
1. The key associated with %1 is:
```
{
{CONSTANT_OP, "foo.const"}
}
```
2. The key associated with %2 is:
```
{
{NON_CONSTANT_OP, "foo.mul"},
{BLOCK_ARGUMENT, ""},
{BLOCK_ARGUMENT, ""}
}
```
The key of %2 < the key of %1
Thus, the sorted `foo.commutative` is:
> %3 = foo.commutative %2, %1
Example 2:
> %1 = foo.const 0
> %2 = foo.mul <block argument>, <block argument>
> %3 = foo.mul %2, %1
> %4 = foo.add %2, %1
> %5 = foo.commutative %1, %2, %3, %4
Here,
1. The key associated with %1 is:
```
{
{CONSTANT_OP, "foo.const"}
}
```
2. The key associated with %2 is:
```
{
{NON_CONSTANT_OP, "foo.mul"},
{BLOCK_ARGUMENT, ""},
{BLOCK_ARGUMENT, ""}
}
```
3. The key associated with %3 is:
```
{
{NON_CONSTANT_OP, "foo.mul"},
{NON_CONSTANT_OP, "foo.mul"},
{CONSTANT_OP, "foo.const"},
{BLOCK_ARGUMENT, ""},
{BLOCK_ARGUMENT, ""}
}
```
4. The key associated with %4 is:
```
{
{NON_CONSTANT_OP, "foo.add"},
{NON_CONSTANT_OP, "foo.mul"},
{CONSTANT_OP, "foo.const"},
{BLOCK_ARGUMENT, ""},
{BLOCK_ARGUMENT, ""}
}
```
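The key of %4 < the key of %2 < the key of %3 < the key of %1 (the keys of %2 and %3 first differ in their second AncestorKey, where `BLOCK_ARGUMENT` is smaller than `NON_CONSTANT_OP`).
Thus, the sorted `foo.commutative` is:
> %5 = foo.commutative %4, %2, %3, %1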
Signed-off-by: Srishti Srivastava <srishti.srivastava@polymagelabs.com>
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124750
Fix the hardcoded check for `FuncOp` in `getCommonBlock` utility: the
check should have been for an op that starts an affine scope. The
incorrect block returned in turn causes dependence analysis to function
incorrectly.
This change allows affine store-load forwarding to work correctly inside
any ops that start an affine scope.
Reviewed By: ftynse, dcaballe
Differential Revision: https://reviews.llvm.org/D130749
The implementation and API of the GEP op have become somewhat convoluted over time. Issues with it are:
* Misleading naming: `indices` actually contains only the dynamic indices, not all of them. To get the number of indices, you need to query the size of `structIndices`.
* Very difficult to iterate over all indices properly: one had to iterate over `structIndices`, check whether each entry is the magic constant `kDynamicIndex` and, if so, access the next value in `indices`, etc.
* Inconvenient to build: one either has to create lots of constant ops for every index or use an odd split, passing both a `ValueRange` and an `ArrayRef<int32_t>` filled with `kDynamicIndex` at the correct places.
* The implementation does verification in the build method.
and more.
This patch attempts to address all these issues via convenience classes and reworking the way GEP Op works:
* Adds the `GEPArg` class, a sum type of an `int32_t` and a `Value`, which gives the builders a single, easy-to-use `ArrayRef<GEPArg>` in place of the previous `ValueRange` + `ArrayRef<int32_t>` builders.
* Adds `GEPIndicesAdapter`, a class for easy random access to and iteration over the indices of a GEP. It is generic and flexible enough to instead return, e.g., a corresponding `Attribute` for an operand inside of `fold`.
* Rename `structIndices` to `rawConstantIndices` and `indices` to `dynamicIndices`: `rawConstantIndices` signifies one shouldn't access it directly as it is encoded, and `dynamicIndices` is more accurate and also frees up the `indices` name.
* Add `getIndices` returning a `GEPIndicesAdapter` to easily iterate over the GEP op's indices.
* Move the verification/asserts out of the build method and into the `verify` method emitting op error messages.
* Add convenient builder methods making use of `GEPArg`.
* Add canonicalizer turning dynamic indices with constant values into constant indices to have a canonical representation.
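The encoding that these classes hide can be modeled roughly as below. This is a Python sketch of the idea only, not the C++ API: `split_gep_args` and `iter_indices` are hypothetical helpers, SSA values are represented as strings, and the sentinel value chosen for `K_DYNAMIC_INDEX` is an assumption made for illustration.

```python
K_DYNAMIC_INDEX = -(2**31)  # assumed sentinel value, for illustration only

def split_gep_args(args):
    """Split mixed indices into (raw_constant_indices, dynamic_indices).

    Python ints model constant indices; anything else models a dynamic Value.
    """
    raw_constant_indices, dynamic_indices = [], []
    for arg in args:
        if isinstance(arg, int):
            raw_constant_indices.append(arg)
        else:
            raw_constant_indices.append(K_DYNAMIC_INDEX)
            dynamic_indices.append(arg)
    return raw_constant_indices, dynamic_indices

def iter_indices(raw_constant_indices, dynamic_indices):
    """Yield all indices in order, resolving each sentinel to the next dynamic index."""
    dynamic = iter(dynamic_indices)
    for raw in raw_constant_indices:
        yield next(dynamic) if raw == K_DYNAMIC_INDEX else raw

args = [0, "%i", 2, "%j"]
raw, dyn = split_gep_args(args)
all_indices = list(iter_indices(raw, dyn))
```

A builder taking one mixed list plays the role of `ArrayRef<GEPArg>` in this model, and `iter_indices` plays the role of iterating a `GEPIndicesAdapter`.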
The only breaking change is for any users building GEPOps that have so far used the old `ValueRange` + `ArrayRef<int32_t>` builder as well as those using the generic syntax.
A follow-up patch then goes through upstream and makes use of the new `ArrayRef<GEPArg>` builder to remove a lot of code building constants for GEP indices.
Differential Revision: https://reviews.llvm.org/D130730
Adding a complex value whose real and imaginary parts are both 0 is a no-op, so the addition can be folded away.
NOTE: This type of canonicalization can be written in an easy and tidy format using `complex.number` once the constant op supports the custom attribute.
Differential Revision: https://reviews.llvm.org/D130748
A group of functions in the Affine dialect provides a mechanism for
building folded-by-construction operations. These functions used to
accept a `RewriterBase` reference because they may need to erase the
operations that were folded and notify the rewriter when called from
rewrite patterns. Adopt a different approach: postpone the builder
notification of the op creation until we are certain that the op will
not be folded away. This removes the need to notify the rewriter about
op deletion following op construction in case of successful folding, and
removes a bunch of one-off `IRRewriter` instances in transform code that
may mess up insertion points.
Reviewed By: springerm, mravishankar
Differential Revision: https://reviews.llvm.org/D130616
We can canonicalize consecutive complex.conj just by removing all conjugate operations.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D130684
Currently, DefaultValuedAttr is, confusingly, both default-valued and
optional. That was an artifact of development and is a longstanding TODO
to address. Add a new attribute that matches this behavior for cases where
it is actually the desired behavior before addressing the TODO (i.e., this
is an incremental step toward fixing DefaultValuedAttr).
Differential Revision: https://reviews.llvm.org/D130679
It is more useful to use ComplexType as type of the attribute than to
use the element type as attribute type. This means when using this
attribute in complex::ConstantOp, we just need to check whether
the types match.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D130703
This is used to fix a bug in SymbolTable::replaceAllSymbolUses where we replace symbols that
we shouldn't.
Differential Revision: https://reviews.llvm.org/D130693
The current implementation of decomposition of Linalg operations wouldn't
work if the `outs` operand values were used within the body of the
operation. Relax this restriction. This potentially sets the stage for
decomposing ops with reduction iterator types (but is not done here
since it requires more study).
Differential Revision: https://reviews.llvm.org/D130527
While the tiling interface provides a mechanism for operations to be
tiled into a tiled version of the op (or another op at the same level of
abstraction), the `generateScalarImplementation` method added here is
the "exit point" after all transformations have been done. Ops that
implement this method are expected to generate IR that is directly
lowerable to backend dialects like the LLVM or SPIR-V dialects.
Differential Revision: https://reviews.llvm.org/D130612
This supports lowering from parse-tree to MLIR and translation from
MLIR to LLVM IR using OMPIRBuilder for OpenMP simdlen clause in SIMD
construct.
Reviewed By: shraiysh, peixin, arnamoy10
Differential Revision: https://reviews.llvm.org/D130195
This commit folds a `tensor.cast` op into a `tensor.collapse_shape` op
when the following two conditions are met:
1. the `tensor.collapse_shape` op consumes result of the `tensor.cast` op.
2. `tensor.cast` op casts to a more dynamic version of the source tensor.
This is added as a canonicalization pattern in `tensor.collapse_shape` op.
Signed-Off-By: Gaurav Shukla <gaurav@nod-labs.com>
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D130650
* https://discourse.llvm.org/t/rfc-removing-the-quant-dialect/3643/8
* Removes most ops. Leaves casts given final comment (can remove more in a followup).
* There are a few uses in Tosa keeping some of the utilities alive. In a followup, I will probably elect to just move simplified versions of them into Tosa itself vs having this quasi-library dependency.
Differential Revision: https://reviews.llvm.org/D120204
This commit extends UnifyAliasedResourcePass to handle the case
where aliased resources have different vector sizes. (It still
requires all scalar types to be of the same bitwidth.) This is
effectively reusing the code for handling different-bitwidth
scalar types.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D130671
This commit fixes the spv.CompositeConstruct assembly to list
operand types, enabling vector construction out of smaller vectors.
Validation is also fixed to properly check the cases for vector
construction.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D130669
This patch adds canonicalization conditions for omp.atomic.update thus
eliminating it when it becomes just a write or a no-op due to other
changes during canonicalization.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D126531
The sparse compiler failed on the provided test (when the sparse kernel is nested in an scf structural operator).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D130609
Add custom attribute for complex dialect. Although this commit does not have significant impact on the conversion framework, it will lead us to construct complex numbers in a readable and tidy manner.
Related discussion: https://reviews.llvm.org/D127476
Reviewed By: pifon2a, akuegel
Differential Revision: https://reviews.llvm.org/D130149
The fold in its current state only checks whether the number of dynamic indices is 1. This does not, however, check for the presence of any struct indices, leading to an incorrect fold.
This patch fixes that issue by checking that the number of struct indices is 1, which, in addition to the pre-existing check that the number of dynamic indices is 1, guarantees that the single index is a dynamic one.
Differential Revision: https://reviews.llvm.org/D129374
Combine the recently added utilities for folded-by-construction affine
operations with the attribute-based Range to enable more folding. This
decreases the amount of emitted code but has little effect on tests
precisely because the tests are not checking for the spurious constants.
The difference in the shape of affine maps comes from the internals of
affine folding.
Depends on D129633
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D130167
While most methods in ViewLikeInterface accept an `OpFoldResult` for
the offset/size/stride that may be static, represented as `Attribute`,
or dynamic, represented as `Value`, the `Range` abstraction only
accepted `Value`s. This can often lead to known-constant
offsets/sizes/strides being materialized into constant operations and
hinder further constant propagation without explicitly running the
constant folding pass. This often leads to more complicated addressing
code than necessary being emitted. Switch `Range` to use
`OpFoldResult`. Code that uses `Range` currently keeps materializing the
constants to minimize the effect of this change on the IR. Further
commits will make use of this.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D129633
The structured op splitting transformation is conceptually similar to
tiling in the sense that it decomposes the iteration space of the
original op into several parts. Therefore, it is possible to implement
it using the TilingInterface to operate on iteration spaces and their
parts. However, the implementation also requires passing updated input
operands, which is not supported by the interface, so the implementation
currently remains Linalg-specific.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D129564
The current support was essentially the amount necessary
to support replacing SymbolRefAttrs, but suffers from various
deficiencies (both ergonomic and functional):
* Replace crashes if unsupported
This makes it really hard to use safely, given that you don't know
if you are going to crash or not when using it.
* Types aren't supported
This seems like a simple missed addition when the attribute replacement
support was originally added.
* The ergonomics are weird
It currently uses an index based replacement, which makes the implementations
quite clunky.
This commit refactors support to be a bit more ergonomic, and also
adds support for types in the process. This was also a great opportunity
to greatly simplify how replacement is done in the symbol table.
Fixes #56355
Differential Revision: https://reviews.llvm.org/D130589
aligned_alloc was added in macOS 10.15, but some users want to support older
versions. The runtime functions make this easy, so just put in a call
to posix_memalign, which provides the same functionality.
Fixes a regression from D117973, that used CMAKE_BINARY_DIR instead of
LLVM_BINARY_DIR in some places.
Differential Revision: https://reviews.llvm.org/D130555
This patch adds a constant folder for Exp2Op; the folder supports only single- and double-precision floating-point types.
Differential Revision: https://reviews.llvm.org/D130472
When this was updated in D127139 the update in-place case was no longer
marked as pessimistic. Add back in.
Differential Revision: https://reviews.llvm.org/D130453
This commit fixes a failure edge case where we accidentally drop forward
declared blocks in the error case. This allows for running the
invalid.mlir test in asan mode now.
Fixes #51387
Differential Revision: https://reviews.llvm.org/D130132
The current Parser library is solely focused on providing API for
the textual MLIR format, but MLIR will soon also provide a binary
format. This commit renames the current Parser library to AsmParser to
better correspond to what the library is actually intended for. A new
Parser library is added which will act as a unified parser interface
between both text and binary formats. Most parser clients are
unaffected, given that the unified interface is essentially the same as
the current interface. Only clients that rely on utilizing the
AsmParserState, or those that want to parse Attributes/Types need to be
updated to point to the AsmParser library.
Differential Revision: https://reviews.llvm.org/D129605
When applying the parent patch, https://reviews.llvm.org/D129475,
clang reports that ValueRange is an incomplete type. ValueRange is only forward-declared in TypeRange.h, and OperationSupport.h already includes TypeRange.h. The classes TypeRange and ValueRange depend on each other.
Reviewed By: rriddle, Mogball
Differential Revision: https://reviews.llvm.org/D130332
Firstly, we make an additional GNUInstallDirs-style variable. With
NixOS, for example, this is crucial as we want those to go in
`${dev}/lib/cmake`, not `${out}/lib/cmake`, as that would be a cmake subdir
of the "regular" libdir, which is installed even when no one needs to do
any development.
Secondly, we make *Config.cmake robust to absolute package install
paths. For NixOS, we will in fact be passing them absolute paths to make
the `${dev}` vs `${out}` distinction mentioned above, and the
GNUInstallDirs-style variables are supposed to support absolute paths in
general, so it's good practice besides the NixOS use-case.
Thirdly, we make `${project}_INSTALL_PACKAGE_DIR` CACHE PATHs like other
install dirs are.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D117973
Lower the Flang parse-tree containing OpenMP reductions to the OpenMP
dialect. The OpenMP dialect models reductions with,
1) A reduction declaration operation that specifies how to initialize, combine,
and atomically combine private reduction variables.
2) The OpenMP operation (like wsloop) that supports reductions has an array of
reduction accumulator variables (operands) and an array attribute of the same
size that points to the reduction declaration to be used for the reduction
accumulation.
3) The OpenMP reduction operation that takes a value and an accumulator.
This operation replaces the original reduction operation in the source.
(1) is implemented by the `createReductionDecl` in OpenMP.cpp,
(2) is implemented while creating the OpenMP operation,
(3) is implemented by the `genOpenMPReduction` function in OpenMP.cpp, and
called from Bridge.cpp. The implementation of (3) is not very robust.
NOTE 1: The patch currently supports only reductions for integer type addition.
NOTE 2: Only supports reduction in the worksharing loop.
NOTE 3: Does not generate atomic combination region.
NOTE 4: Other options for creating the reduction operation include
a) having the reduction operation as a construct containing an assignment
and then handling it appropriately in the Bridge.
b) we can modify `genAssignment` or `genFIR(AssignmentStmt)` in the Bridge to
handle OpenMP reduction but so far we have tried not to mix OpenMP
and non-OpenMP code and this will break that.
I will try (b) in a separate patch.
NOTE 5: OpenMP dialect gained support for reduction with the patches:
D105358, D107343. See https://discourse.llvm.org/t/rfc-openmp-reduction-support/3367
for more details.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D130077
Co-authored-by: Peixin-Qiao <qiaopeixin@huawei.com>
The minimum required version is now 3.19 due to the usage of some
more recent features. Update the version check and error message
accordingly. Also remove some logic that behaved differently before
3.18, since we can assume we are now on version 3.19+.
Reviewed By: stella.stamenova
Differential Revision: https://reviews.llvm.org/D130171
These functions don't depend on the C++ runtime and therefore belong to
CRunnerUtils. Clean up the macros on the way as `_MSC_VER` indicates the
compiler, not the platform, which is indicated by `_WIN32` and will be
present when, e.g., compiling with MinGW.
Reviewed By: rdzhabarov
Differential Revision: https://reviews.llvm.org/D130025
When converted to the LLVM dialect, the memref.alloc and memref.free operations were generating calls to hardcoded 'malloc' and 'free' functions. This didn't leave any freedom to users to provide their custom implementation. Those operations now convert into calls to '_mlir_alloc' and '_mlir_free' functions, which have also been implemented into the runtime support library as wrappers to 'malloc' and 'free'. The same has been done for the 'aligned_alloc' function.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D128791
InsertSliceOp and ParallelInsertSliceOp are very similar and can share some of the bufferization analysis code.
Differential Revision: https://reviews.llvm.org/D130465
In the Transform dialect extensions, provide separate mechanisms to
declare dependent dialects (the dialects the transform IR depends on)
and the generated dialects (the dialects the payload IR may be
transformed into). This allows the Transform dialect clients that are
only constructing the transform IR to avoid loading the dialects
relevant for the payload IR along with the Transform dialect itself,
thus decreasing the build/link time.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D130289
Load dialects that will be generated by the extension. (Except for BufferizationDialect and MemrefDialect which are loaded already.)
Differential Revision: https://reviews.llvm.org/D130463
https://reviews.llvm.org/D130023 added a memory leak in sparse_sampled_matmul.mlir
This diff fixes the memory leak.
Testing: Ran integration tests after building with -DLLVM_USE_SANITIZER=Address flag.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D130428
Copying the folder keeps the original permissions by default. This
creates problems when the source folder is read-only, e.g. in a
packaging environment.
Then, the copied folder in the build directory is read-only as well.
Later on, other files are copied into that directory (in the build
tree), failing when the directory is read-only.
Fix that problem by copying the folder without keeping the original
permissions.
Follow-up to D130254.
Differential Revision: https://reviews.llvm.org/D130338
This patch adds a constant folder for ExpOp; the folder supports only single- and double-precision floating-point types.
Differential Revision: https://reviews.llvm.org/D130318
For arith.constant operations of integer type, the operation generates
result names that include the value of the constant (i.e., the
IntegerAttr that defines the constant's value). That code currently
assumes integer widths of 64 bits or less and hits an assert with wider
constants or would create truncated and potentially ambiguous names when
built with assertions disabled.
To enable printing arith.constant ops for arbitrarily wide integer
types, change to use the IntegerAttr's function getValue() when
generating result names.
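The ambiguity that truncation can cause is easy to see in a small Python model. This is only a sketch: the `c<value>_<type>` naming scheme and both helper functions are assumptions made for illustration, and the truncating variant merely imitates what keeping the low 64 bits of a wider value would do.

```python
def constant_result_name(value, type_name):
    """Build a result name from an arbitrary-precision integer value."""
    return f"c{value}_{type_name}"

def truncated_result_name(value, type_name):
    """Hypothetical truncating variant: keep only the low 64 bits of the value."""
    return f"c{value & (2**64 - 1)}_{type_name}"

wide = 2**64 + 5  # does not fit in 64 bits
small = 5

# With truncation, two different i128 constants get the same name.
ambiguous = truncated_result_name(wide, "i128") == truncated_result_name(small, "i128")
# With the full value, the names stay distinct.
distinct = constant_result_name(wide, "i128") != constant_result_name(small, "i128")
```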
Also, add a regression test.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D129930
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.
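The idea of shuffling before sorting can be illustrated with a small Python model. This is not how llvm::sort is implemented; `shuffled_sort` is a hypothetical helper that only shows why a comparison with ties makes the result depend on the input order, while a total order does not.

```python
import random

def shuffled_sort(items, key, seed=None):
    """Sort items by key; if a seed is given, shuffle first to perturb tie order."""
    items = list(items)
    if seed is not None:
        random.Random(seed).shuffle(items)
    return sorted(items, key=key)

pairs = [("a", 1), ("b", 1), ("c", 2)]

def by_number(pair):
    # Ties between ("a", 1) and ("b", 1): the output order depends on the input order.
    return pair[1]

def by_number_then_name(pair):
    # A total order: the output is the same for every input permutation.
    return (pair[1], pair[0])

stable_results = {tuple(shuffled_sort(pairs, by_number_then_name, seed=s)) for s in range(20)}
tie_results = {tuple(shuffled_sort(pairs, by_number, seed=s)) for s in range(20)}
```

Code that silently relies on the relative order of tied elements would see different results across seeds in this model, which is the kind of non-determinism the shuffle is meant to expose.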
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406
Previously the elements of the notes tuple would be invalid objects when
accessed from a diagnostic handler, resulting in a segfault when used.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D129943
This change adds a new DelinearizeIndexOp to the `arith` dialect. The
operation accepts an `index` type as well as a basis (array of index
values) representing how the index should be decomposed into a
multi-index. The decomposition obeys a canonical semantic that treats
the final basis element as "fastest varying" and the first basis element
as "slowest varying". A naive lowering of the operation using a sequence
of `arith.divui` and `arith.remui` operations is also given.
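As a rough model of the semantics described above, the decomposition can be written in Python with integer division and remainder. This is a sketch, not the lowering itself: `delinearize_index` is a hypothetical helper and it assumes a non-negative index that is in range for the given basis.

```python
def delinearize_index(linear_index, basis):
    """Decompose linear_index into a multi-index; the last basis element varies fastest."""
    # strides[i] is the product of all basis elements after position i.
    strides = [1] * len(basis)
    for i in range(len(basis) - 2, -1, -1):
        strides[i] = strides[i + 1] * basis[i + 1]
    multi_index = []
    remainder = linear_index
    for stride in strides:
        multi_index.append(remainder // stride)  # unsigned division in the lowering
        remainder %= stride                      # unsigned remainder in the lowering
    return multi_index

result = delinearize_index(23, [2, 3, 4])
```

For the basis `[2, 3, 4]` the strides are `[12, 4, 1]`, so index 23 decomposes as 1 * 12 + 2 * 4 + 3 * 1.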
Differential Revision: https://reviews.llvm.org/D129697
Folding of transfer_write into transfer_read is already supported but
this requires the read and write to have the same permutation map.
After linalg vectorization it is common to have different permutation
maps for a write followed by a read even though the written value could be
propagated.
This canonicalization handles cases where the permutation maps are
different but the data read and written match, and replaces the transfer
ops with a broadcast and a permutation.
Differential Revision: https://reviews.llvm.org/D130135
This warning was added because using attribute or type assembly formats
with `skipDefaultBuilders` set could cause compilation errors, since the
required builder prototype may not necessarily be generated and would
need to be checked by hand. This patch removes the warning because a
warning that the generated C++ "might" not compile is not particularly
useful. Attempting to address the TODO (i.e. detect whether a builder of
the correct prototype is provided) would be fragile since it would not
be possible to account for implicit conversions, etc.
In general, ODS should not be emitting warnings in cases like these.
Reviewed By: rriddle, wrengr
Differential Revision: https://reviews.llvm.org/D130102
Scope ops file to ops. Used canonicalization as grouping for canonicalization
patterns and folders (also considered OpTransforms but that felt too generic
and the former two are used together).
Reviewed By: silvas, rsuderman
Differential Revision: https://reviews.llvm.org/D130297
Add a constraint to ensure that the operand and result of the
threadprivate operation are the same.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D128609
This is useful for building small test cases and will be utilized in a subsequent commit that adds a fusion example.
Differential Revision: https://reviews.llvm.org/D130344
This op fuses a given payload op into a given container op. Inside the container, all uses of the producer are replaced (fused) with the newly inserted op. If the producer is tileable and accessed via a tensor.extract_slice, the new op computes only the requested slice ("tile and fuse"). Otherwise, the entire tensor value is computed inside the container ("clone and fuse").
Differential Revision: https://reviews.llvm.org/D130244
Convert arith.cmpi to the canonical form with constants on the right side
to simplify further optimizations and open more opportunities for CSE.
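Moving a constant to the right-hand side requires swapping the predicate as well as the operands. A minimal Python model of that rewrite, assuming constants are represented as Python ints and other values as strings (the helper name and this representation are illustrative, not part of MLIR):

```python
# Predicate to use after the two operands have been exchanged.
SWAPPED_PREDICATE = {
    "eq": "eq", "ne": "ne",
    "slt": "sgt", "sgt": "slt", "sle": "sge", "sge": "sle",
    "ult": "ugt", "ugt": "ult", "ule": "uge", "uge": "ule",
}

def canonicalize_cmpi(predicate, lhs, rhs):
    """Return (predicate, lhs, rhs) with a lone constant moved to the right side."""
    lhs_is_constant = isinstance(lhs, int)
    rhs_is_constant = isinstance(rhs, int)
    if lhs_is_constant and not rhs_is_constant:
        return SWAPPED_PREDICATE[predicate], rhs, lhs
    return predicate, lhs, rhs

swapped = canonicalize_cmpi("slt", 5, "%x")     # 5 < %x  becomes  %x > 5
unchanged = canonicalize_cmpi("ule", "%x", 5)   # already canonical
```

After this rewrite, `5 < %x` and `%x > 5` have the same form, which is what gives CSE more opportunities.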
Differential Revision: https://reviews.llvm.org/D129929
Add affine.if canonicalization to compose affine.apply ops into its set
and operands. This eliminates affine.apply ops feeding into affine.if
ops.
Differential Revision: https://reviews.llvm.org/D130242
Missed previously and needed to flip the default. Most of these just
flipped to _Raw to retain existing state/keep this small except for TOSA
dialect which got flipped to _Both as no further change was needed.
First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS
builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as
`CMAKE_INSTALL_BINDIR` becomes an *absolute* path, and then when
downstream projects try to install there too this breaks because our
builds always install to fresh directories for isolation's sake.
Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the
other specially crafted `LLVM_CONFIG_*` variables substituted in
`llvm/cmake/modules/LLVMConfig.cmake.in`.
@beanz added it in d0e1c2a550 to fix a
dangling reference in `AddLLVM`, but I am suspicious of how this
variable doesn't follow the pattern.
Those other ones are carefully made to be build-time vs install-time
variables depending on which `LLVMConfig.cmake` is being generated, are
carefully made relative as appropriate, etc. etc. For my NixOS use-case
they are also fine because they are never used as downstream install
variables, only for reading not writing.
To avoid the problems I face, and restore symmetry, I deleted the
exported variable and arranged to have many `${project}_TOOLS_INSTALL_DIR`s.
`AddLLVM` now instead expects each project to define its own, and they
do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports
`LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in
the usual way, matching the other remaining exported variables.
For the `AddLLVM` changes, I tried to copy the existing pattern of
internal vs non-internal or for LLVM vs for downstream function/macro
names, but it would be good to confirm I did that correctly.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D117977
This is the same as the existing multiplier-1 variant of DepthwiseConv2D, but in PyTorch dim order.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D128575
This is to improve consistency within the SPIR-V dialect and make these ops a bit shorter.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D130280
This change modifies `structured.tile_to_foreach_thread_op` so that
it accepts either `tile_sizes` or `num_threads` parameters. If
`tile_sizes` are specified, then the number of threads required is
derived from the tile sizes rather than the other way around. In both cases,
more aggressive folding of loop parameters is enabled during the
transformation, allowing for the potential elimination of `affine.min`
and `affine.max` operations in the static shape case when calculating
the final adjusted tile size.
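The relationship between the two parameters can be sketched in Python. This is a model under assumptions, not the transformation's code: all helper names are hypothetical, tile sizes and thread counts are assumed to be positive, and the clamping formula is an illustrative stand-in for the `affine.min`/`affine.max` computation.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)

def num_threads_from_tile_sizes(shape, tile_sizes):
    """Derive the number of threads per dimension from the tile sizes."""
    return [ceil_div(dim, tile) for dim, tile in zip(shape, tile_sizes)]

def tile_sizes_from_num_threads(shape, num_threads):
    """Derive the tile size per dimension from the number of threads."""
    return [ceil_div(dim, n) for dim, n in zip(shape, num_threads)]

def adjusted_tile_size(dim, tile, thread_id):
    """Clamp the tile handled by thread_id to the part that lies inside the dimension."""
    offset = thread_id * tile
    return max(0, min(tile, dim - offset))

threads = num_threads_from_tile_sizes([128, 100], [32, 32])
tiles = tile_sizes_from_num_threads([128, 100], [4, 4])
```

When a static dimension is divisible by the tile size (128 and 32 here), every thread gets a full tile and the clamp is a no-op, which is the case where the min/max operations can fold away. For the dimension of size 100, the last of the four threads only gets 4 elements.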
Differential Revision: https://reviews.llvm.org/D130139
The type extraction helper function for block argument and op result
list objects was ignoring the slice entirely. So was the slice addition.
Both are caused by a misleading naming convention to implement slices
via CRTP. Make the convention more explicit and hide the helper
functions so users have a harder time calling them directly.
Closes #56540.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D130271
This operation is a NavigationOp that simplifies the writing of transform IR.
Since there is no way of referring to an interface by name, the current implementation uses
an EnumAttr and depends on the interfaces it supports.
In the future, it would be worthwhile to remove this dependence and generalize.
Differential Revision: https://reviews.llvm.org/D130267
The shape can be a memref of index type, so the memref::LoadOp result needs to be converted into an LLVM type.
Differential Revision: https://reviews.llvm.org/D129965
Replace iterators of the outermost loop with region arguments of the innermost
one. This prevents later `bufferization` passes from inserting allocations within
the body of the innermost loop.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D130083
The `tileAndFuseLinalgOps` is a legacy approach for tiling + fusion of
Linalg operations. Since it was also intended to work on operations
with buffer operands, this method had fairly complex logic to make
sure tile and fuse was correct even with side-effecting linalg ops.
While complex, it still wasn't robust enough. This patch deprecates
this method and thereby deprecates the tiling + fusion method for ops
with buffer semantics. Note that the core transformation to do fusion
of a producer with a tiled consumer still exists. The deprecation here
only removes methods that auto-magically tried to tile and fuse
correctly in presence of side-effects.
The `tileAndFuseLinalgOps` also works with operations with tensor
semantics. There are at least two other ways the same functionality
exists.
1) The `tileConsumerAndFuseProducers` method. This does a similar
transformation, but using a slightly different logic to
automatically figure out the legal tile + fuse code. Note that this
is also to be deprecated soon.
2) The preferred way uses the `TilingInterface` for tile + fuse, and
relies on the caller to set the tiling options correctly to ensure
that the generated code is correct.
As proof that (2) is equivalent to the functionality provided by
`tileAndFuseLinalgOps`, relevant tests have been moved to use the
interface, where the test driver sets the tile sizes appropriately to
generate the expected code.
Differential Revision: https://reviews.llvm.org/D129901
This patch adds a constant folder for LogOp. The folder supports only single- and double-precision floating-point values.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D130148
This allows for automatically inserting expected checks for parser and verifier
diagnostics, which simplifies the workflow when building new dialect
constructs or extending existing ones.
Differential Revision: https://reviews.llvm.org/D130152
This patch adds a new function mlirDenseElementsAttrFloat16Get(),
which accepts the shaped type, the number of Float16 values, and a
pointer to an array of Float16 values, each of which is a uint16_t
value.
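Because the function takes raw `uint16_t` bit patterns, callers have to encode their values as IEEE 754 binary16 first. The helpers below are a hypothetical sketch and not part of the MLIR C API: `floatToHalfBits` and `encodeHalfArray` are illustrative names. The conversion truncates the mantissa instead of rounding, flushes values too small for a normal half to signed zero, and maps overflow (and NaN) to infinity.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of the MLIR C API: encode a float as
// IEEE 754 binary16 bits by truncation (no rounding, no subnormals).
inline uint16_t floatToHalfBits(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  uint16_t sign = static_cast<uint16_t>((bits >> 16) & 0x8000u);
  // Re-bias the exponent from binary32 (bias 127) to binary16 (bias 15).
  int32_t exponent = static_cast<int32_t>((bits >> 23) & 0xFFu) - 127 + 15;
  uint16_t mantissa = static_cast<uint16_t>((bits >> 13) & 0x3FFu);
  if (exponent <= 0)
    return sign; // too small for a normal half: flush to signed zero
  if (exponent >= 31)
    return static_cast<uint16_t>(sign | 0x7C00u); // overflow or NaN: infinity
  return static_cast<uint16_t>(sign | (exponent << 10) | mantissa);
}

// Encode a whole array; the result is the kind of uint16_t buffer whose
// size and data pointer a caller would hand to the new function.
inline std::vector<uint16_t> encodeHalfArray(const std::vector<float> &values) {
  std::vector<uint16_t> encoded;
  encoded.reserve(values.size());
  for (float value : values)
    encoded.push_back(floatToHalfBits(value));
  return encoded;
}
```

The element count and the `data()` pointer of the encoded vector correspond to the "number of Float16 values" and "pointer to an array" arguments described above.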
This commit is repeating https://reviews.llvm.org/D123981 + #761 but for Float16
Differential Revision: https://reviews.llvm.org/D130069
This is to improve the consistency within the SPIR-V dialect and to make op names a bit shorter.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D130194
This is my very first contact with this dialect, so I am not very
confident with this commit, but it seems like the op returns a memref,
not a tensor, so that's what the comment about the result type should say.
[mlir][bufferization][doc] Improve typesetting of inline code. Fix Typo.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D130159
There is no benefit to making it public, and the code is much
cleaner and easier to follow when inlined. This also matches
the pattern within the PDLL lsp server.
This patch adds a constant folder for Log1pOp. The folder supports only single- and double-precision floating-point values.
Differential Revision: https://reviews.llvm.org/D129979
This file contains a huge number of tests that should really be in
different dialect/files. It is monolithic because of the legacy
surrounding the old standard dialect, affine operations, etc. Splitting
this up makes the tests much more maintainable given that they are now
grouped with other similar tests.
This one required more changes than ideal due to overlapping generated name
with different return types. Changed getIndexingMaps to getIndexingMapsArray to
move it out of the way/highlight that it returns (more expensively) a
SmallVector and uses the prefixed name for the Attribute.
Differential Revision: https://reviews.llvm.org/D129919
The code example for pass manager incorrectly uses nestedFunctionPM
instead of nestedAnyPm for adding CSE and Canonicalize Passes. This diff fixes
it by changing it to nestedAnyPm.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D130110
This warning was added because using attribute or type assembly formats
with `skipDefaultBuilders` set could cause compilation errors, since the
required builder prototype may not necessarily be generated and would
need to be checked by hand. This patch removes the warning because a
warning that the generated C++ "might" not compile is not particularly
useful. Attempting to address the TODO (i.e. detect whether a builder of
the correct prototype is provided) would be fragile since it would not
be possible to account for implicit conversions, etc.
In general, ODS should not be emitting warnings in cases like these.
For AttrDef declarations, place specified code in extraClassDefinition into the generated *.cpp.inc file.
Reviewed By: Mogball, rriddle
Differential Revision: https://reviews.llvm.org/D129574
This revision adds a new transformation to tile a TilingInterface `op` to a tiled `scf.foreach_thread`, applying
tiling by `num_threads`.
If non-empty, the `threadDimMapping` is added as an attribute to the resulting `scf.foreach_thread`.
0-tile sizes (i.e. tile by the full size of the data) are used to encode
that a dimension is not tiled.
Differential Revision: https://reviews.llvm.org/D129577
In the current state, this is only special cased for Allocation effects, but any effects on results allocated by the operation may be ignored when checking whether the op may be removed, as none of them are possible to be observed if the result is unused.
A use case for this is for IRs for languages which always initialize on allocation. To correctly model such operations, a Write as well as an Allocation effect should be placed on the result. This would prevent the Op from being deleted if unused however. This patch fixes that issue.
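The removability check described above can be modeled roughly as follows. This is a simplified sketch, not the MLIR implementation: `Effect`, `EffectKind`, and `isRemovableIfUnused` are hypothetical names, and treating plain reads as harmless is an assumption of the sketch.

```cpp
#include <cassert>
#include <vector>

enum class EffectKind { Allocate, Read, Write, Free };

// Hypothetical simplified model of one side effect of an op.
struct Effect {
  EffectKind kind;
  bool onOwnResult; // true if the effect applies to a result of the op itself
};

// An unused op is treated as removable when every effect either targets one
// of its own results or is a plain read (the latter is an assumption here).
inline bool isRemovableIfUnused(const std::vector<Effect> &effects) {
  for (const Effect &effect : effects) {
    if (effect.onOwnResult)
      continue; // cannot be observed if the result is unused
    if (effect.kind == EffectKind::Read)
      continue;
    return false;
  }
  return true;
}
```

Under this model, an op that allocates and writes its own result is removable when unused, while an op that writes to memory it does not own is not.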
Differential Revision: https://reviews.llvm.org/D129854
This op used to belong to the sparse dialect, but there are use cases for dense bufferization as well. (E.g., when a tensor alloc is returned from a function and should be deallocated at the call site.) This change moves the op to the bufferization dialect, which now has an `alloc_tensor` and a `dealloc_tensor` op.
Differential Revision: https://reviews.llvm.org/D129985
The refineReturnTypes method shares the same parameters as inferReturnTypes
but is also passed the return types of the op, if known, which can be used
during refinement passes or for more op-specific error reporting.
Currently the error reporting on failure is generic and doesn't allow
for specializing the returned result based on the failure. With this change,
what would previously have been a separate trait with specialized
verification can be handled as part of inference rather than
duplicated.
refineReturnTypes behaves like inferReturnTypes if no result types are fed in,
while the current verification is recast as the default implementation for
refineReturnTypes with it calling inferReturnTypes (and so the default type
verification now goes through refine and allows for more op specific inference
mismatch errors).
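A rough model of the default behavior described above, using plain strings as stand-ins for types. The function bodies, the `Types` alias, and the error text are hypothetical; only the control flow (infer, then adopt or compare) follows the description.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Types = std::vector<std::string>;

// Hypothetical op-specific inference: always infers a single "i32" result.
inline bool inferReturnTypes(Types &inferred) {
  inferred = Types{std::string("i32")};
  return true;
}

// Default refinement: run inference, then either adopt the inferred types
// (nothing was passed in) or check them against the provided ones.
inline bool refineReturnTypes(Types &returnTypes, std::string &error) {
  Types inferred;
  if (!inferReturnTypes(inferred)) {
    error = "inference failed";
    return false;
  }
  if (returnTypes.empty()) {
    returnTypes = inferred; // behaves like inferReturnTypes
    return true;
  }
  if (returnTypes != inferred) {
    error = "inferred types do not match the provided return types";
    return false;
  }
  return true;
}
```

An op that wants a more specific diagnostic would replace the comparison branch with its own check and message.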
Differential Revision: https://reviews.llvm.org/D129955
The rules in the linalg file were very specific to sparse tensors so will
find a better home under sparse tensor dialect than linalg dialect. Also
moved some rewriting from sparsification into this new "pre-rewriting" file.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D129910
SPIR-V specification does not require a function to have a name
if it is an entry point. Adjust deserializer to allow those kinds
of SPIR-V binaries.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D120181
When converted to the LLVM dialect, the memref.alloc and memref.free operations were generating calls to hardcoded 'malloc' and 'free' functions. This didn't leave any freedom to users to provide their custom implementation. Those operations now convert into calls to '_mlir_alloc' and '_mlir_free' functions, which have also been implemented into the runtime support library as wrappers to 'malloc' and 'free'. The same has been done for the 'aligned_alloc' function.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D128791
This patch adds a constant folder for Log10Op. The folder supports only single- and double-precision floating-point values.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D129740
After https://reviews.llvm.org/D128593 this is not needed (and not available). Was missed in original landing because integration tests do not run on pre-merge.
Since the very first commits, the Python and C MLIR APIs have had mis-placed registration/load functionality for dialects, extensions, etc. This was done pragmatically in order to get bootstrapped and then just grew in. Downstreams largely bypass and do their own thing by providing various APIs to register things they need. Meanwhile, the C++ APIs have stabilized around this and it would make sense to follow suit.
The thing we have observed in canonical usage by downstreams is that each downstream tends to have native entry points that configure its installation to its preferences with one-stop APIs. This patch leans in to this approach with `RegisterEverything.h` and `mlir._mlir_libs._mlirRegisterEverything` being the one-stop entry points for the "upstream packages". The `_mlir_libs.__init__.py` now allows customization of the environment and Context by adding "initialization modules" to the `_mlir_libs` package. If present, `_mlirRegisterEverything` is treated as such a module. Others can be added by downstreams by adding a `_site_initialize_{i}.py` module, where '{i}' is a number starting with zero. The number will be incremented and corresponding module loaded until one is not found. Initialization modules can:
* Perform load time customization to the global environment (i.e. registering passes, hooks, etc).
* Define a `register_dialects(registry: DialectRegistry)` function that can extend the `DialectRegistry` that will be used to bootstrap the `Context`.
* Define a `context_init_hook(context: Context)` function that will be added to a list of callbacks which will be invoked after dialect registration during `Context` initialization.
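The discovery rule for initialization modules can be modeled as below. This is a sketch of the numbering logic only: `discoverInitModules` is a hypothetical name, `available` stands in for whatever can actually be imported from the `_mlir_libs` package, and processing `_mlirRegisterEverything` before the numbered modules is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the discovery order for initialization modules.
inline std::vector<std::string>
discoverInitModules(const std::set<std::string> &available) {
  std::vector<std::string> loaded;
  // The upstream module is optional and treated like any other initializer.
  if (available.count("_mlirRegisterEverything"))
    loaded.push_back("_mlirRegisterEverything");
  // Numbered site modules are loaded from zero until one is missing.
  for (int i = 0;; ++i) {
    std::string name = "_site_initialize_" + std::to_string(i);
    if (!available.count(name))
      break; // stop at the first gap; later indices are never tried
    loaded.push_back(name);
  }
  return loaded;
}
```

One consequence worth noting: with modules 0, 1, and 3 present, module 3 is never loaded because index 2 is missing.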
Note that the `MLIRPythonExtension.RegisterEverything` is not included by default when building a downstream (its corresponding behavior was prior). For downstreams which need the default MLIR initialization to take place, they must add this back in to their Python CMake build just like they add their own components (i.e. to `add_mlir_python_common_capi_library` and `add_mlir_python_modules`). It is perfectly valid to not do this, in which case, only the things explicitly depended on and initialized by downstreams will be built/packaged. If the downstream has not been set up for this, it is recommended to simply add this back for the time being and pay the build time/package size cost.
CMake changes:
* `MLIRCAPIRegistration` -> `MLIRCAPIRegisterEverything` (renamed to signify what it does and force an evaluation: a number of places were incidentally linking this very expensive target)
* `MLIRPythonSources.Passes` removed (without replacement: just drop)
* `MLIRPythonExtension.AllPassesRegistration` removed (without replacement: just drop)
* `MLIRPythonExtension.Conversions` removed (without replacement: just drop)
* `MLIRPythonExtension.Transforms` removed (without replacement: just drop)
Header changes:
* `mlir-c/Registration.h` is deleted. Dialect registration functionality is now in `IR.h`. Registration of upstream features is in `mlir-c/RegisterEverything.h`. When updating MLIR and a couple of downstreams, I found that proper usage was commingled so required making a choice vs just blind S&R.
Python APIs removed:
* mlir.transforms and mlir.conversions (previously only had an __init__.py which indirectly triggered `mlirRegisterTransformsPasses()` and `mlirRegisterConversionPasses()` respectively). Downstream impact: Remove these imports if present (they now happen as part of default initialization).
* mlir._mlir_libs._all_passes_registration, mlir._mlir_libs._mlirTransforms, mlir._mlir_libs._mlirConversions. Downstream impact: None expected (these were internally used).
C-APIs changed:
* mlirRegisterAllDialects(MlirContext) now takes an MlirDialectRegistry instead. It also used to trigger loading of all dialects, which was already marked with a TODO to remove -- it no longer does, and for direct use, dialects must be explicitly loaded. Downstream impact: Direct C-API users must ensure that needed dialects are loaded or call `mlirContextLoadAllAvailableDialects(MlirContext)` to emulate the prior behavior. Also see the `ir.c` test case (e.g. ` mlirContextGetOrLoadDialect(ctx, mlirStringRefCreateFromCString("func"));`).
* mlirDialectHandle* APIs were moved from Registration.h (which now is restricted to just global/upstream registration) to IR.h, arguably where it should have been. Downstream impact: include correct header (likely already doing so).
C-APIs added:
* mlirContextLoadAllAvailableDialects(MlirContext): Corresponds to C++ API with the same purpose.
Python APIs added:
* mlir.ir.DialectRegistry: Mapping for an MlirDialectRegistry.
* mlir.ir.Context.append_dialect_registry(MlirDialectRegistry)
* mlir.ir.Context.load_all_available_dialects()
* mlir._mlir_libs._mlirAllRegistration: New native extension that exposes a `register_dialects(MlirDialectRegistry)` entry point and performs all upstream pass/conversion/transforms registration on init. In this first step, we eagerly load this as part of the __init__.py and use it to monkey patch the Context to emulate prior behavior.
* Type caster and capsule support for MlirDialectRegistry
This should make it possible to build downstream Python dialects that only depend on a subset of MLIR. See: https://github.com/llvm/llvm-project/issues/56037
Here is an example PR, minimally adapting IREE to these changes: https://github.com/iree-org/iree/pull/9638/files In this situation, IREE is opting to not link everything, since it is already configuring the Context to its liking. For projects that would just like to not think about it and pull in everything, add `MLIRPythonExtension.RegisterEverything` to the list of Python sources getting built, and the old behavior will continue.
Reviewed By: mehdi_amini, ftynse
Differential Revision: https://reviews.llvm.org/D128593
An OpBuilder already exists for GEPs that do not have any struct indices for existing typed pointers, but no such builder exists for GEPs utilizing opaque pointers that have an explicit `basePtrType`.
Differential Revision: https://reviews.llvm.org/D129376
Clean up checks for alloc-like ops in analysis. Use the analysis
utility to properly check for the desired kind of effects. The previous
locality utility worked for all practical purposes but wasn't sound and
was locally duplicate code. Instead, use mlir::hasSingleEffect.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D129439
```
// -----// IR Dump Before LowerLinalgMicrokernels (iree-vmvx-lower-linalg-microkernels) //----- //
```
I've been meaning to suggest this for a long time, and I think the only reason we don't have it is because we didn't used to have the `getArgument()` handy when printing these comments. When debugging or putting a pipeline together based on such dumps, I often find myself grepping for the argument name of the pass (which is often related but not universally).
This change allows the user of LivenessBlockInfo to specify an op within the block and get a set of all values that are live as of that op. Semantically it relies on having a dominance-based region that has ordered operations. For DFG regions, computing liveness statically this way doesn't really make sense, it likely needs to be done at runtime.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D129447
A previous commit (f2b94bd) added some unnecessary statements that
dereferenced operations only to get the operations back. This patch
removes the unnecessary statements.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D129913
This patch allows custom attribute and type builders to return
something other than the C++ type of the attribute or type.
This is useful for attributes or types that may perform extra work during
construction (e.g. canonicalization) that could result in a different
kind of attribute or type being returned.
Reviewed By: rriddle, lattner
Differential Revision: https://reviews.llvm.org/D129792
This patch adds a pattern to decompose `linalg.generic` operations
that
- have only parallel iterator types
- have more than 2 statements (including the yield)
into multiple `linalg.generic` operations such that each operation has
a single statement and a yield.
The pattern added here just splits the matching `linalg.generic` into
two `linalg.generic`s, one containing the first statement, and the
other containing the remaining. The same pattern can be applied
repeatedly on the second op to ultimately fully decompose the generic
op.
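The repeated application can be pictured with a toy model in which an op body is just a list of statement names ending in a yield. This is a hypothetical sketch (`Body`, `splitOnce`, and `decompose` are illustrative names); it ignores how results of the first op are fed into the second, which the real pattern has to handle.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Body = std::vector<std::string>;

// One application of the pattern: peel the first statement into its own op
// and leave the remaining statements (plus the yield) in a second op.
inline bool splitOnce(const Body &body, Body &first, Body &rest) {
  if (body.size() <= 2)
    return false; // a single statement and a yield: nothing to split
  first = Body{body.front(), body.back()};
  rest.assign(body.begin() + 1, body.end());
  return true;
}

// Apply the split repeatedly to the second op until it no longer matches.
inline std::vector<Body> decompose(Body body) {
  std::vector<Body> result;
  Body first, rest;
  while (splitOnce(body, first, rest)) {
    result.push_back(first);
    body = rest;
  }
  result.push_back(body);
  return result;
}
```

For a body of three statements and a yield, this produces three ops, each holding one statement and a yield.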
Differential Revision: https://reviews.llvm.org/D129704
The visitor functions for `Region` and `Block` types did not always
check the value returned by recursive calls. This caused the top-level
visitor invocation to return `WalkResult::advance()` even if one or more
recursive invocations returned `WalkResult::interrupt()`. This patch
fixes the problem by checking whether any recursive call was interrupted and, if
so, returning `WalkResult::interrupt()`.
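The shape of the fix can be illustrated on a plain tree. This sketch does not use the MLIR visitor API: `Node`, the local `WalkResult` enum, `walk`, and `interruptedAt` are hypothetical stand-ins, and the traversal is post-order by choice.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class WalkResult { Advance, Interrupt };

// Hypothetical nested node standing in for a region/block hierarchy.
struct Node {
  int value;
  std::vector<Node> children;
};

// Post-order walk that propagates an interrupt from any nested call.
inline WalkResult
walk(const Node &node, const std::function<WalkResult(const Node &)> &callback) {
  for (const Node &child : node.children) {
    // Check the recursive result instead of discarding it.
    if (walk(child, callback) == WalkResult::Interrupt)
      return WalkResult::Interrupt;
  }
  return callback(node);
}

// Returns true if the walk stopped at the node holding `target`;
// `visited` counts how many nodes the callback saw.
inline bool interruptedAt(const Node &root, int target, int &visited) {
  WalkResult result = walk(root, [&](const Node &node) {
    ++visited;
    return node.value == target ? WalkResult::Interrupt : WalkResult::Advance;
  });
  return result == WalkResult::Interrupt;
}
```

Dropping the `if` around the recursive call reproduces the bug: the nested interrupt is lost and the remaining nodes are still visited.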
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D129718
A new sparse_tensor operation allows for
custom reduction code to be injected during
linalg.generic lowering for sparse tensors.
An identity value is provided to indicate
the starting value of the reduction. A single
block region is required to contain the
custom reduce computation.
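Conceptually, the identity value and the custom region play the roles of the initial accumulator and the combiner in a fold. The sketch below is a hypothetical model in plain C++ (`customReduce` is an illustrative name), not the generated code.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model: fold the stored values with a user-supplied combiner,
// starting from the identity value.
inline double
customReduce(const std::vector<double> &stored, double identity,
             const std::function<double(double, double)> &combine) {
  double accumulator = identity;
  for (double value : stored)
    accumulator = combine(accumulator, value);
  return accumulator;
}
```

With no stored values, the result is the identity itself, which is why the identity has to be chosen to match the reduction (for example 1 for a product).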
Reviewed by: aartbik
Differential Revision: https://reviews.llvm.org/D128004
This is a NFC change to make it easier to update this canonicalization
for more use cases. The refactoring makes things easier to
understand/adapt.
Differential Revision: https://reviews.llvm.org/D129829
This diff adds an integration test which does element wise multiplication for two sparse 3-d tensors of size 3x3x5
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D129638
In `linalg::tileConsumerAndFuseProducers`, there are two levels of
tiling and fusion; we partition the tile sizes and only use one
half for each of them. The partition is using the first non-parallel
dimension *after* interchange as the boundary. However, concrete
tiling happens *together with* loop interchange, so we still need
to provide the partial tile sizes *before* the interchange.
Otherwise, there will be inconsistency, which is what this patch
is to fix.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D129804
This patch modifies the implementation of `RewritePatternSet::add` to perfectly forward its arguments to pattern constructors. Without this, code like the following compiles but, due to the limited lifetime of the temporary TypeConverter, can produce unexpected behavior:
```
RewritePatternSet patterns(context);
patterns.add<SomeOpConversion, OtherOpConversion>(TypeConverter(), context);
if (failed(applyPartialConversion(getOperation(), target, std::move(patterns))))
return signalPassFailure();
```
The patch also changes the linalg fusion pattern implementation to correctly fill the test pattern set given the new behavior.
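The effect of forwarding can be seen in a small standalone model. None of the types below are the MLIR classes: `Converter`, `Pattern`, `ConversionPattern`, and `PatternSet` are hypothetical stand-ins. The point of the sketch is that once arguments keep their value category, a constructor taking a non-const lvalue reference no longer accepts a temporary.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the MLIR classes.
struct Converter {
  int id = 0;
};

struct Pattern {
  virtual ~Pattern() = default;
  virtual int converterId() const = 0;
};

// Stores a reference, so the converter must outlive the pattern.
struct ConversionPattern : Pattern {
  explicit ConversionPattern(Converter &c) : converter(c) {}
  int converterId() const override { return converter.id; }
  Converter &converter;
};

struct PatternSet {
  // Forward each argument with its original value category.
  template <typename T, typename... Args>
  void add(Args &&...args) {
    patterns.push_back(std::make_unique<T>(std::forward<Args>(args)...));
  }
  std::vector<std::unique_ptr<Pattern>> patterns;
};

// An lvalue converter is accepted; a temporary is rejected at compile time.
static_assert(std::is_constructible<ConversionPattern, Converter &>::value,
              "lvalue converter should be accepted");
static_assert(!std::is_constructible<ConversionPattern, Converter &&>::value,
              "temporary converter should be rejected");

inline int addAndQuery() {
  Converter converter;
  converter.id = 42;
  PatternSet set;
  set.add<ConversionPattern>(converter); // converter outlives the set here
  return set.patterns.front()->converterId();
}
```

In this model, `set.add<ConversionPattern>(Converter())` fails to compile, so the dangling reference is caught at compile time instead of surfacing as unexpected behavior at run time.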
Author: Laszlo Kindrat <laszlokindrat@gmail.com>
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129601
This is useful because MPInt.h defines identically-named functions that
operate on MPInts, which would otherwise become the only candidates of
overload resolution when calling e.g. ceilDiv from the mlir::presburger
namespace (iff MPInt.h is included). So to access the 64-bit overloads, an
explicit call to mlir::ceilDiv would be required. This patch adds `using`
declarations allowing overload resolution to transparently call the right
function.
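The name-hiding issue and the fix can be reproduced with two nested namespaces. The names below (`outer`, `inner`, `BigInt`, `pagesNeeded`) are hypothetical stand-ins for `mlir`, `mlir::presburger`, and `MPInt`, and the `ceilDiv` here only handles a non-negative dividend with a positive divisor.

```cpp
#include <cassert>
#include <cstdint>

namespace outer {
// 64-bit overload in the enclosing namespace.
// Sketch only: assumes lhs >= 0 and rhs > 0.
inline int64_t ceilDiv(int64_t lhs, int64_t rhs) {
  return lhs / rhs + (lhs % rhs != 0 ? 1 : 0);
}

namespace inner {
struct BigInt { // hypothetical stand-in for an arbitrary-precision integer
  int64_t value;
};

// Identically named function in the nested namespace. On its own, it hides
// outer::ceilDiv for unqualified calls made inside `inner`.
inline BigInt ceilDiv(const BigInt &lhs, const BigInt &rhs) {
  return BigInt{outer::ceilDiv(lhs.value, rhs.value)};
}

// Bring the 64-bit overload back into the candidate set.
using outer::ceilDiv;

inline int64_t pagesNeeded(int64_t bytes, int64_t pageSize) {
  return ceilDiv(bytes, pageSize); // resolves to outer::ceilDiv
}
} // namespace inner
} // namespace outer
```

Without the `using` declaration, the unqualified call in `pagesNeeded` would only consider the `BigInt` overload, and the 64-bit version would have to be spelled `outer::ceilDiv`.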
Reviewed By: Groverkss
Differential Revision: https://reviews.llvm.org/D129820
This commit extends the `raise` statements on errors in user-provided
code with `from e` clauses that attach the original exception to the one
being raised. This makes it easier to debug the root cause of the
error.
Reviewed By: SaurabhJha
Differential Revision: https://reviews.llvm.org/D129762
is out of range. Both intrinsics return a poison value.
Consequently, mark the intrinsics speculatable.
Differential Revision: https://reviews.llvm.org/D129656
The benchmark currently fails to run because it cannot find the `func`
symbol when using a `FuncOp`. I suppose that the breakage was introduced
by the extraction of the func dialect from the builtin dialect that
wasn't reflected in the benchmark yet.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D129738
Fixed some new memory leaks after migration to new
bufferization. One is expected, the other may need
some more careful analysis.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D129805
After recent bufferization improvement, this test
started failing due to missed zero initialization.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D129800
The lds_barrier op allows workgroups to wait at a barrier for
operations to/from their local data store (LDS) to complete without
incurring the performance penalties of a full memory fence.
Reviewed By: nirvedhmeshram
Differential Revision: https://reviews.llvm.org/D129522
bufferization.writable is used in most cases instead. All remaining test cases are updated. Some code that is no longer needed is deleted.
Differential Revision: https://reviews.llvm.org/D129739
This revision removes the LinalgPromotion pattern and adds a `transform.structured.promotion` op.
Since the LinalgPromotion transform allows the injection of arbitrary C++ via lambdas, the current
transform op does not handle it.
It is left for future work to decide what the right transform op control is for those cases.
Note the underlying implementation remains unchanged and the mechanism is still controllable by
lambdas from the API.
During this refactoring it was also determined that the `dynamicBuffers` option does not actually
connect to a change of behavior in the algorithm.
This also exhibits that the related test is wrong (and dangerous).
Both the option and the test are therefore removed.
Lastly, a test that connects patterns using the filter-based mechanism is removed: all the independent
pieces are already tested separately.
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785
Differential Revision: https://reviews.llvm.org/D129649
PatternApplicator doesn't have a constructor that
accepts only a `RewritePatternSet`, as currently used in the example
code in PatternRewriter.md. Instead, one has to turn it into a
`FrozenRewritePatternSet`.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D125236
This change removes the partial bufferization passes from the sparse compilation pipeline and replaces them with One-Shot Bufferize. One-Shot Analysis (and TensorCopyInsertion) is used to resolve all out-of-place bufferizations, dense and sparse. Dense ops are then bufferized with BufferizableOpInterface. Sparse ops are still bufferized in the Sparsification pass.
Details:
* Dense allocations are automatically deallocated, unless they are yielded from a block. (In that case the alloc would leak.) All test cases are modified accordingly. E.g., some funcs now have an "out" tensor argument that is returned from the function. (That way, the allocation happens at the call site.)
* Sparse allocations are *not* automatically deallocated. They must be "released" manually. (No change, this will be addressed in a future change.)
* Sparse tensor copies are not supported yet. (Future change)
* Sparsification no longer has to consider inplacability. If necessary, allocations and/or copies are inserted during TensorCopyInsertion. All tensors are inplaceable by the time Sparsification is running. Instead of marking a tensor as "not inplaceable", it can be marked as "not writable", which will trigger an allocation and/or copy during TensorCopyInsertion.
Differential Revision: https://reviews.llvm.org/D129356
- Adds verification for `nvgpu.mma.sync` op
- Adds tests to `mlir/test/Dialect/NVGPU/invalid.mlir`
- `nvgpu.mma.sync` verifier caught a bug and triggered a failure in m16n8k4_tf32_f32 variant in `mlir/test/Conversion/NVGPUToNVVM/nvgpu-to-nvvm.mlir`
- The output shape of vector holding thread-level accumulators was inconsistent and fixed in this change
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D129400
This pass tests patterns that are already tested elsewhere by applying them in a semi-targeted
fashion using anchor function and op names.
From now on, targeted tests should use the transform dialect interpreter.
Differential Revision: https://reviews.llvm.org/D129627
Since vector.reduce supports an accumulator in all cases, remove the
assert that assumed the old definition.
Differential Revision: https://reviews.llvm.org/D129602
Avoids needing the two parallel functions as NamedAttrList already takes care
of caching DictionaryAttr and implicitly can convert from either.
Differential Revision: https://reviews.llvm.org/D129527