This reverts commit 13bdb7ab4a. The commit introduced/uncovered an unintended bug in models containing Conv2D.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D115079
A previous commit added support for converting elemental types contained in
LLVM dialect types in case they were not compatible with the LLVM dialect. It
was missing support for named structs as they could be recursive, which was not
supported by the conversion infra. Now that it is, add support for converting
such named structs.
Depends On D113579
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D113580
Allow ops that are not bufferizable in the input IR. (Deactivated by default.)
bufferization::ToMemrefOp and bufferization::ToTensorOp are generated at the bufferization boundaries.
Differential Revision: https://reviews.llvm.org/D114669
The implementation only allows to bit-cast between two 0-D vectors. We could
probably support casting from/to vectors like `vector<1xf32>`, but I wasn't
convinced that this would be important and it would require breaking the
invariant that `BitCastOp` works only on vectors with equal rank.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114854
This change provides `BufferizableOpInterface` implementations for ops from the Bufferization dialects. These ops are needed at the bufferization boundaries for partial bufferization.
Differential Revision: https://reviews.llvm.org/D114618
This is a lightweight operation, useful for writing unit tests. It will be utilized for testing in subsequent commits.
Differential Revision: https://reviews.llvm.org/D114693
This revision adds 0-d vector support to vector.transfer ops.
In the process, numerous cleanups are applied, in particular around normalizing
and reducing the number of builders.
Reviewed By: ThomasRaoux, springerm
Differential Revision: https://reviews.llvm.org/D114803
However, since CallOps have no aliasing OpResults, their OpOperands always bufferize out-of-place.
This change removes `bufferizesToMemoryWrite` from `CallOpInterface`. This method was called, but its return value did not matter.
Differential Revision: https://reviews.llvm.org/D114616
The new affine map generated by linearizeCollapsedDims should not drop
dimensions. We need to make sure we create a map with at least as many
dimensions as the source map. This prevents
FoldProducerReshapeOpByLinearization from generating invalid IR.
This solves regression in IREE due to e4e4da86af
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D114838
This reverts commit 9a844c2a9b.
The new affine map generated by linearizeCollapsedDims should not drop
dimensions. We need to make sure we create a map with at least as many
dimensions as the source map. This prevents
FoldProducerReshapeOpByLinearization from generating invalid IR.
This solves regression in IREE due to e4e4da86af
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D114838
This reverts commit 29a50c5864.
After LLVM lowering, the original patch incorrectly moved alignment
information across an unconstrained GEP operation. This is only correct
for some index offsets in the GEP. It seems that the best approach is,
in fact, to rely on LLVM to propagate information from the llvm.assume()
to users.
Thanks to Thomas Raoux for catching this.
Proper test for sparse tensor outputs is a single condition throughout
the whole tensor index expression (not a general conjunction, since this
may include other conditions that cause cancellation).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D114810
This revision reintroduces tensor.insert_slice verification which seems
to have vanished over time: a verifier was initially introduced in cf9503c1b7
but for some reason the invalid.mlir was not properly updated; as time passed the verifier was not called anymore and later the code was deleted.
As a consequence, a non-negligible portion of tests has run astray using invalid
tensor.insert_slice semantics and needed to be fixed.
Also, extract isRankReducedType from TensorOps for better reuse
Originally, this facility was used by both tensor and memref forms but
it got copied around as dialects were split.
Differential Revision: https://reviews.llvm.org/D114715
The canonical type of the result of the `memref.subview` needs to make
sure that the previously dropped unit-dimensions are the ones dropped
for the canonicalized type as well. This means the generic
`inferRankReducedResultType` cannot be used. Instead the current
dropped dimensions need to be querried and the same need to be dropped.
Reviewed By: nicolasvasilache, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D114751
For a 1x1 weight and stride of 1, the input/weight can be reshaped and passed into a fully connected op then reshaped back
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D114757
Add the decompose patterns that lower higher dimensional convolutions to lower dimensional ones to CodegenStrategy and use CodegenStrategy to test the decompose patterns. Additionally, remove the assertion that checks the anchor op name is set in the CodegenStrategyTest pass. Removing the assertion allows us to simplify the pipelines used in the interchange and decompose tests.
Depends On D114797
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114798
Add support for an empty anchor op string in vectorization. An empty anchor op string is useful after fusion when there are multiple different operations to vectorize.
Depends On D114689
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114690
Pad the operation using a top down traversal. The top down traversal unlocks folding opportunities and dim op canonicalizations due to the introduced extract slice operation after the padded operation.
Depends On D114585
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114689
Add CSE after every transformation. Transformations such as tiling introduce redundant computation, for example, one AffineMinOp for every operand dimension pair. Follow up transformations such as Padding and Hoisting benefit from CSE since comparing slice sizes simplifies to comparing SSA values instead of analyzing affine expressions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114585
This patch introduces a new conversion to convert bufferization.clone operations
into a memref.alloc and a memref.copy operation. This transformation is needed to
transform all remaining clones which "survive" all previous transformations, before
a given program is lowered further (to LLVM e.g.). Otherwise, these operations
cannot be handled anymore and lead to compile errors.
See: https://llvm.discourse.group/t/bufferization-error-related-to-memref-clone/4665
Differential Revision: https://reviews.llvm.org/D114233
Update the shapes of the convolution / pooling tests that where detected after enabling verification during printing (https://reviews.llvm.org/D114680). Also split the emit_structured_generic.py file that previously contained all tests into multiple separate files to simplify debugging.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D114731
* set_symbol_name, get_symbol_name, set_visibility, get_visibility, replace_all_symbol_uses, walk_symbol_tables
* In integrations I've been doing, I've been reaching for all of these to do both general IR manipulation and module merging.
* I don't love the replace_all_symbol_uses underlying APIs since they necessitate SYMBOL_COUNT walks and have various sharp edges. I'm hoping that whatever emerges eventually for this can still retain this simple API as a one-shot.
Differential Revision: https://reviews.llvm.org/D114687
Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D114399
The verifier computed an illegal type with negative dimension size when collapsing partially static memrefs.
Differential Revision: https://reviews.llvm.org/D114702
While working on an integration, I found a lot of inconsistencies on IR printing and verification. It turns out that we were:
* Only doing "soft fail" verification on IR printing of Operation, not of a Module.
* Failed verification was interacting badly with binary=True IR printing (causing a TypeError trying to pass an `str` to a `bytes` based handle).
* For systematic integrations, it is often desirable to control verification yourself so that you can explicitly handle errors.
This patch:
* Trues up the "soft fail" semantics by having `Module.__str__` delegate to `Operation.__str__` vs having a shortcut implementation.
* Fixes soft fail in the presence of binary=True (and adds an additional happy path test case to make sure the binary functionality works).
* Adds an `assume_verified` boolean flag to the `print`/`get_asm` methods which disables internal verification, presupposing that the caller has taken care of it.
It turns out that we had a number of tests which were generating illegal IR but it wasn't being caught because they were doing a print on the `Module` vs operation. All except two were trivially fixed:
* linalg/ops.py : Had two tests for direct constructing a Matmul incorrectly. Fixing them made them just like the next two tests so just deleted (no need to test the verifier only at this level).
* linalg/opdsl/emit_structured_generic.py : Hand coded conv and pooling tests appear to be using illegal shaped inputs/outputs, causing a verification failure. I just used the `assume_verified=` flag to restore the original behavior and left a TODO. Will get someone who owns that to fix it properly in a followup (would also be nice to break this file up into multiple test modules as it is hard to tell exactly what is failing).
Notes to downstreams:
* If, like some of our tests, you get verification failures after this patch, it is likely that your IR was always invalid and you will need to fix the root cause. To temporarily revert to prior (broken) behavior, replace calls like `print(module)` with `print(module.operation.get_asm(assume_verified=True))`.
Differential Revision: https://reviews.llvm.org/D114680
This changes the op to produce `AnyVectorOfAnyRank` following mostly the code for 1-D vectors.
Depends On D114598
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114550
This changes the op to produce `AnyVectorOfAnyRank` and implements this by just
inserting the element (skipping the shuffle that we do for the 1-D case).
Depends On D114549
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114598
This is commit 4 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).
This PR integrates the various components (root ordering algorithm, nondeterministic execution of PDL bytecode) to implement multi-root PDL matching. The main idea is for the pattern to specify mulitple candidate roots. The PDL-to-PDLInterp lowering selects one of these roots and "hangs" the pattern from this root, traversing the edges downwards (from operation to its operands) when possible and upwards (from values to its uses) when needed. The root is selected by invoking the optimal matching multiple times, once for each candidate root, and the connectors are determined form the optimal matching. The costs in the directed graph are equal to the number of upward edges that need to be traversed when connecting the given two candidate roots. It can be shown that, for this choice of the cost function, "hanging" the pattern an inner node is no better than from the optimal root.
The following three main additions were implemented as a part of this PR:
1. OperationPos predicate has been extended to allow tracing the operation accepting a value (the opposite of operation defining a value).
2. Predicate checking if two values are not equal - this is useful to ensure that we do not traverse the edge back downwards after we traversed it upwards.
3. Function for for building the cost graph among the candidate roots.
4. Updated buildPredicateList, building the predicates optimal branching has been determined.
Testing: unit tests (an integration test to follow once the stack of commits has landed)
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108550
This is commit 2 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).
This commit implements the features needed for the execution of the new operations pdl_interp.get_accepting_ops, pdl_interp.choose_op:
1. The implementation of the generation and execution of the two ops.
2. The addition of Stack of bytecode positions within the ByteCodeExecutor. This is needed because in pdl_interp.choose_op, we iterate over the values returned by pdl_interp.get_accepting_ops until we reach finalize. When we reach finalize, we need to return back to the position marked in the stack.
3. The functionality to extend the lifetime of values that cross the nondeterministic choice. The existing bytecode generator allocates the values to memory positions by representing the liveness of values as a collection of disjoint intervals over the matcher positions. This is akin to register allocation, and substantially reduces the footprint of the bytecode executor. However, because with iterative operation pdl_interp.choose_op, execution "returns" back, so any values whose original liveness cross the nondeterminstic choice must have their lifetime executed until finalize.
Testing: pdl-bytecode.mlir test
Reviewed By: rriddle, Mogball
Differential Revision: https://reviews.llvm.org/D108547
This is commit 1 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).
These operations are:
* pdl.get_accepting_ops: Returns a list of operations accepting the given value or a range of values at the specified position. Thus if there are two operations `%op1 = "foo"(%val)` and `%op2 = "bar"(%val)` accepting a value at position 0, `%ops = pdl_interp.get_accepting_ops of %val : !pdl.value at 0` will return both of them. This allows us to traverse upwards from a value to operations accepting the value.
* pdl.choose_op: Iteratively chooses one operation from a range of operations. Therefore, writing `%op = pdl_interp.choose_op from %ops` in the example above will select either `%op1`or `%op2`.
Testing: Added the corresponding test cases to mlir/test/Dialect/PDLInterp/ops.mlir.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D108543
Use primarily matvec instead of matmul to test hoist padding. Test the hoisting only starting from already padded IR. Use one-dimensional tiling only except for the tile_and_fuse test that exercises hoisting on a larger loop nest with fill and pad tensor operations in the backward slice.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114608
If `allowReturnMemref` is set to true, arbitrary memrefs may be returned from FuncOps. Also remove allocation hoisting code, which is only partly implemented at the moment.
The purpose of this commit is to untangle `bufferize` from `aliasInfo`. (Even with this change, they are not fully untangled yet.)
Differential Revision: https://reviews.llvm.org/D114507
Rename MLIR CAPI ExecutionEngine target for consistency:
MLIRCEXECUTIONENGINE -> MLIRCAPIExecutionEngine in line with other
targets.
Differential Revision: https://reviews.llvm.org/D114596
Rename the check prefixes to HOIST21 and HOIST32 to clarify the different flag configurations.
Depends On D114438
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114442
Instead of checking for unexpected operations (any operation with a region except for scf::For and `padTensorOp` or operations with a memory effect) while cloning the packing loop nest perform the checks early. Update `dropNonIndexDependencies` to check for unexpected operations. Additionally, check all of these operations have index type operands only.
Depends On D114428
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114438
Limit hoist padding to pad tensor ops that depend only on a constant value. Supporting arbitrary padding values that depend on computations part of the backward slice to hoist require complex analysis to ensure the computation can be hoisted.
Depends On D114420
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114428
Adapt hoist padding to filter the backward slice before cloning the packing loop nest. The filtering removes all operations that are not used to index the hoisted pad tensor op and its extract slice op. The filtering is needed to support the more complex loop nests created after fusion. For example, fusing the producer of an output operand can added linalg ops and pad tensor ops to the backward slice. These operations have regions and currently prevent hoisting.
The following example demonstrates the effect of the newly introduced `dropNonIndexDependencies` method that filters the backward slice:
```
%source = linalg.fill(%cst, %arg0)
scf.for %i
%unrelated = linalg.fill(%cst, %arg1) // not used to index %source!
scf.for %j (%arg2 = %unrelated)
scf.for %k // not used to index %source!
%ubi = affine.min #map(%i)
%ubj = affine.min #map(%j)
%slice = tensor.extract_slice %source [%i, %j] [%ubi, %ubj]
%padded_slice = linalg.pad_tensor %slice
```
dropNonIndexDependencies(%padded_slice, %slice)
removes [scf.for %k, linalg.fill(%cst, %arg1)] from backwardSlice.
Depends On D114175
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114420
Rename test/python/dialects/math.py -> math_dialect.py to avoid a
collision with a Python standard package of the same name. These test
scripts are run by path and are not part of a package. Python apparently
implicitly adds the containing directory to its PYTHONPATH. As such,
test scripts with common names run the risk of conflicting with global
names and resolution of an import for the latter happens to the former.
Differential Revision: https://reviews.llvm.org/D114568
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Taking into account constant EQ bounds of FlatAffineConstraint dims/symbols during canonicalization of the resulting affine map in `canonicalizeMinMaxOp`.
Differential Revision: https://reviews.llvm.org/D114138
For synthesizing an op's implementation of the generated interface
from {Min|Max}Version, we need to define an `initializer` and
`mergeAction`. The `initializer` specifies the initial version,
and `mergeAction` specifies how version specifications from
different parts of the op should be merged to generate a final
version requirements.
Previously we use the specified version enum as the type for both
the initializer and thus the final return type. This means we need
to perform `static_cast` over some hopefully not used number (`~0u`)
as the initializer. This is quite opaque and sort of not guaranteed
to work. Also, there are ops that have an enum attribute where some
values declare version requirements (e.g., enumerant `B` requires
v1.1+) but some not (e.g., enumerant `A` requires nothing). Then a
concrete op instance with `A` will still declare it implements the
version interface (because interface implementation is static for
an op) but actually theirs no requirements for version.
So this commit changes to use an more explicit `llvm::Optional`
to wrap around the returned version enum. This should make it
more clear.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D108312
The padding tests previously contained the tile loops. This revision removes the tile loops since padding itself does not consider the loops. Instead the induction variables are passed in as function arguments which promotes them to symbols in the affine expressions. Note that the pad-and-hoist.mlir test still exercises padding in the context of the full loop nest.
Depends On D114175
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114227
Add the makeComposedPadHighOp method which creates a new PadTensorOp if necessary. If the source to pad is actually the result of a sequence of padded LinalgOps, the method checks if padding is needed or if we can use the padded result of the padded LinalgOp sequence directly.
Example:
```
%0 = tensor.extract_slice %arg0 [%iv0, %iv1] [%sz0, %sz1]
%1 = linalg.pad_tensor %0 low[0, 0] high[...] { linalg.yield %cst }
%2 = linalg.matmul ins(...) outs(%1)
%3 = tensor.extract_slice %2 [0, 0] [%sz0, %sz1]
```
when padding %3 return %2 instead of introducing
```
%4 = linalg.pad_tensor %3 low[0, 0] high[...] { linalg.yield %cst }
```
Depends On D114161
Reviewed By: nicolasvasilache, pifon2a
Differential Revision: https://reviews.llvm.org/D114175
Change the failure condition of padOperandToSmallestStaticBoundingBox to never fail if the operand is already statically sized.
In particular:
- if the padding value computation fails -> return failure if the operand shape is dynamic and success if it is static.
- if there is no extract slice op -> return failure if the operand shape is dynamic and success if it is static.
The latter change prevents padding from failure if the output operand passed by iteration argument is statically sized since in this case the extract / insert slice pairs are removed by canonicalization.
Depends On D114153
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114161
Padding now can explicitly specify the padding value when non-zero is wanted.
This also includes bypassing pads when the pad does nothing.
Differential Revision: https://reviews.llvm.org/D113611
Transpose convolution decomposition is now performed in a separate pass. This
allows padding / constant propagation to be performed at the TOSA level. It
also adds support for striding when there is no dilation.
Differential Revision: https://reviews.llvm.org/D114409
This revision makes concrete use of 0-d vectors to extend the semantics of
InsertElementOp.
Reviewed By: dcaballe, pifon2a
Differential Revision: https://reviews.llvm.org/D114388
This revision starts making concrete use of 0-d vectors to extend the semantics of
ExtractElementOp.
In the process a new VectorOfAnyRank Tablegen OpBase.td is added to allow progressive transition to supporting 0-d vectors by gradually opting in.
Differential Revision: https://reviews.llvm.org/D114387
Refactored two new parser APIs parseGenericOperationAfterOperands and
parseCustomOperationName out of parseGenericOperation and parseCustomOperation.
Motivation: Sometimes an op can be printed in a special way if certain criteria
is met. While parsing, we need to handle all the versions.
`parseGenericOperationAfterOperands` is handy in situation where we already
parsed the operands and decide to fall back to default parsing.
`parseCustomOperationName` is useful when we need to know details (dialect,
operation name etc.) about a parsed token meant to be an mlir operation.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D113719
`memref.expand_shape` has verification logic to make sure
result dim must be static if all the collapsing src dims are static.
This can be relaxed once expand_shape supports more dynamism.
Differential Revision: https://reviews.llvm.org/D114391
This patch fixes a bug in loop fusion pass where the source loop was removed
even when the fused loop did not cover all iterations of the source loop.
This was because the fast hueristic check for checking if source loop and
fused loop have same iterations did not take into account steps in loop.
Reviewed By: dcaballe, bondhugula
Differential Revision: https://reviews.llvm.org/D114164
This reverts commit a9e236bed8.
This broke the Windows build:
mlir\include\mlir/Dialect/X86Vector/Transforms.h(28): error C2061: syntax error: identifier 'uint'
We cannot unconditionally generate memref.load ops for such cases;
need to check the source's type.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114376
MLIR supports recursive types but they could not be handled by the conversion
infrastructure directly as it would result in infinite recursion in
`convertType` for elemental types. Support this case by keeping the "call
stack" of nested type conversions in the TypeConverter class and by passing it
as an optional argument to the individual conversion callback. The callback can
then check if a specific type is present on the stack more than once to detect
and handle the recursive case.
This approach is preferred to the alternative approach of having a separate
callback dedicated to handling only the recursive case as the latter was
observed to introduce ~3% time overhead on a 50MB IR file even if it did not
contain recursive types.
This approach is also preferred to keeping a local stack in type converters
that need to handle recursive types as that would compose poorly in case of
out-of-tree or cross-project extensions.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D113579
Adapt tiling to always generate an extract/insert slice pair for output tensors even if the tensor is not tiled. Having an explicit extract/insert slice pair simplifies followup transformations such as padding and bufferization. In particular, it makes read and written iteration argument slices explicit.
Depends On D114067
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114085
Add a pattern to apply the new tile and fuse on tensors method. Integrate the pattern into the CodegenStrategy and use the CodegenStrategy to implement the tests.
Depends On D114012
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114067
Tile and fuse failed if the outermost tile loop is a reduction dimension. Add the necessary check to handle outermost reductions and introduce a test case to verify the change.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114012
Step towards removing the hard coded behavior for this trait and to instead use common interface.
Differential Revision: https://reviews.llvm.org/D114208
Add rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D112985
Instead of using shape_cast op in the pattern removing leading unit
dimensions we use extract/broadcast ops. This is part of the effort to
restrict ShapeCastOp fuirther in the future and only allow them to
convert to or from 1D vector.
This also adds extra canonicalization to fill the gaps in simplifying
broadcast/extract ops.
Differential Revision: https://reviews.llvm.org/D114205
- Move the #define s to the GPU Transform library from GPU Ops so that
SerializeToHsaco is non-trivially compiled
- Add required includes to SerializeToHsaco
- Move MCSubtargetInfo creation to the correct point in the
compilation process
- Change mlir in ROCM tests to account for renamed/moved ops
Differential Revision: https://reviews.llvm.org/D114184
`vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position`
operand changed to accept `AnySignlessIntegerOrIndex` for better operability with
operations that use `index`, such as affine loops.
LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering
directly to these operations without explicitly inserting casts is allowed. SPIRV's
equivalent ops can also accept `i64`.
Reviewed By: nicolasvasilache, jpienaar
Differential Revision: https://reviews.llvm.org/D114139
This patch makes it possible to use the newly added useDefaultAttributePrinterParser and useDefaultTypePrinterParser dialect options without any using namespace declarations. Two things had to be done to make this possible:
* Fully qualify any type usages or functions from the mlir namespace in the generated C++ code
* Makes sure to emit the printers and parsers inside the same namespace as the Dialect
Differential Revision: https://reviews.llvm.org/D114168
Returning failure when tile sizes are all zero prevents the change in
the marker. This makes pattern rewriter run the pattern multiple times
only to exit when it hits a limit. Instead just clone the operation
(since tiling is essentially cloning in this case). Then the
transformation filter kicks in to avoid the pattern rewriter to be
invoked many times.
Differential Revision: https://reviews.llvm.org/D113949
Compiling code for AMD GPUs requires knowledge of which chipset is
being targeted, especially if the code uses chipset-specific
intrinsics (which is the case in a downstream convolution generator).
This commit adds `target`, `chipset` and `features` arguments to the
SerializeToHsaco constructor to enable passing in this required
information.
It also amends the ROCm integration tests to pass in the target
chipset, which is set to the chipset of the first GPU on the system
executing the tests.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D114107
Previously, in case there was only one `Optional` operand/result within
the list, we would always return `None` from the accessor, e.g., for a
single optional result we would generate:
```
return self.operation.results[0] if len(self.operation.results) > 1 else None
```
But what we really want is to return `None` only if the length of
`results` is smaller than the total number of element groups (i.e.,
the optional operand/result is in fact missing).
This commit also renames a few local variables in the generator to make
the distinction between `isVariadic()` and `isVariableLength()` a bit
more clear.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D113855
`BufferizableOpInterface::bufferize` will only be called on ops that
have tensor operands and/or results.
Differential Revision: https://reviews.llvm.org/D113962
NamedAttribute is currently represented as an std::pair, but this
creates an extremely clunky .first/.second API. This commit
converts it to a class, with better accessors (getName/getValue)
and also opens the door for more convenient API in the future.
Differential Revision: https://reviews.llvm.org/D113956
First version was vectors only. With some clever "path" insertion,
we now support any d-dimensional tensor. Up next: reductions too
Reviewed By: bixia, wrengr
Differential Revision: https://reviews.llvm.org/D114024
Floating point optimization can produce incorrect numerical resutls for
-0.0 + 0.0 optimization as result needs to be -0.0.
Reviewed By: eric-k256
Differential Revision: https://reviews.llvm.org/D114127
Transpose conv2d shape inference was incorrect, tests did not properly validate
that the shape inference was executing. Corrected shape inference, and extended
tests to actually execute.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D114026
The name seems to have been left over from a renaming effort on an unexercised
codepaths that are difficult to catch in Python. Fix it and add a test that
exercises the codepath.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D114004
There seems to be a consensus that we should allow 0D vectors:
https://llvm.discourse.group/t/should-we-have-0-d-vectors/3097
This commit is only the first step: it changes the verifier and the parser to
allow vectors like `vector<f32>` (but does not allow explicit 0 dimensions,
i.e., `vector<0xf32>` is not allowed).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114086
LLVM switchop currently only permits i32. Both LLVM IR and MLIR Standard switch permit other integer types leading to an illegal state when lowering an i8 switch from MLIR standard
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D113955
Limit the backtracking along def-use chains when a prefix is encountered as it would generate incorrect foldings.
Differential Revision: https://reviews.llvm.org/D113975
* It works similar to scf.for coversion, but convert condition and yield ops as part of scf.whille pattern so it don't need to maintain external state
Differential Revision: https://reviews.llvm.org/D113007
For the semi affine expressions, whenever rhs of a floordiv, ceildiv, mod
or product expression is a symbolic expression, we introduce a local variable
representing the result, and store the floordiv/ceildiv, mod or product
affine expression in LocalExprs. In this way the expression is flattened,
and trivial addition and subtraction related simplifications are performed.
Also rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.
Differential Revision: https://reviews.llvm.org/D112808
This reverts commit 94992670fc.
Build is broken with:
tools/mlir/include/mlir/Dialect/LLVMIR/LLVMOps.cpp.inc:23996:3: error: no matching function for call to 'printSwitchOpCases'
printSwitchOpCases(_odsPrinter, *this, getValue().getType(), getCaseValuesAttr(), getCaseDestinations(), getCaseOperands(), getCaseOperands().getTypes());
^~~~~~~~~~~~~~~~~~
LLVM switchop currently only permits i32. Both LLVM IR and MLIR Standard switch permit other integer types leading to an illegal state when lowering an i8 switch from MLIR standard
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D113955
This revision contains all "sparsification" ops and rewriting necessary to support sparse output tensors when the kernel has no reduction (viz. insertions occur in lexicographic order and are "injective"). This will be later generalized to allow reductions too. Also, this first revision only supports sparse 1-d tensors (viz. vectors) as output in the runtime support library. This will be generalized to n-d tensors shortly. But this way, the revision is kept to a manageable size.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D113705
Split tosa.reshape into three individual lowerings: collapse, expand and a
combination of both. Add simple dynamic shape support.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D113936
Multiply by one can be removed during canonicalization. This optimizes away unneeded operations.
Differential Revision: https://reviews.llvm.org/D113807
This is in prevision of dropping them altogether and using insert/extract based patterns.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D113928
When trying to connect the vectorization of depthwise convolutions to e2e execution
a number of problems surfaced.
Fix an off-by-one error on the size of the input vector (similary to what was previously done for regular conv).
Rewrite the lowering to vector.fma instead of vector.contract: the KW reduction dimension has already been unrolled and vector.contract requires a reduction dimension to be valid.
Differential Revision: https://reviews.llvm.org/D113884
FMAOp -> LLVM conversion is done progressively by peeling off 1 dimension from FMAOp at each pattern iteration. Add the recursively bounded property declaration to the pattern so that the rewriter can apply it multiple times.
Without this, FMAOps with 3+D do not lower to LLVM.
Differential Revision: https://reviews.llvm.org/D113886
Names should be consistent across all operations otherwise painful bugs will surface.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D113762
Re-applies D111513:
* Adds a full-fledged Python example dialect and tests to the Standalone example (need to do a bit of tweaking in the top level CMake and lit tests to adapt better to if not building with Python enabled).
* Rips out remnants of custom extension building in favor of pybind11_add_module which does the right thing.
* Makes python and extension sources installable (outputs to src/python/${name} in the install tree): Both Python and C++ extension sources get installed as downstreams need all of this in order to build a derived version of the API.
* Exports sources targets (with our properties that make everything work) by converting them to INTERFACE libraries (which have export support), as recommended for the forseeable future by CMake devs. Renames custom properties to start with lower-case letter, as also recommended/required (groan).
* Adds a ROOT_DIR argument to declare_mlir_python_extension since now all C++ sources for an extension must be under the same directory (to line up at install time).
* Downstreams will need to adapt by:
* Remove absolute paths from any SOURCES for declare_mlir_python_extension (I believe all downstreams are just using ${CMAKE_CURRENT_SOURCE_DIR} here, which can just be ommitted). May need to set ROOT_DIR if not relative to the current source directory.
* To allow further downstreams to install/build, will need to make sure that all C++ extension headers are also listed under SOURCES for declare_mlir_python_extension.
This reverts commit 1a6c26d1f5.
Reviewed By: stephenneuendorffer
Differential Revision: https://reviews.llvm.org/D113732
At this time the 2 flavors of conv are a little too different to allow significant code sharing and other will likely come up.
so we go the easy route first by duplicating and adapting.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D113758
Per discussion on discord and various feature requests across bindings (Haskell and Rust bindings authors have asked me directly), we should be building a link-ready MLIR-C dylib which exports the C API and can be used without linking to anything else.
This patch:
* Adds a new MLIR-C aggregate shared library (libMLIR-C.so), which is similar in name and function to libLLVM-C.so.
* It is guarded by the new CMake option MLIR_BUILD_MLIR_C_DYLIB, which has a similar purpose/name to the LLVM_BUILD_LLVM_C_DYLIB option.
* On all platforms, this will work with both static, BUILD_SHARED_LIBS, and libMLIR builds, if supported:
* In static builds: libMLIR-C.so will export the CAPI symbols and statically link all dependencies into itself.
* In BUILD_SHARED_LIBS: libMLIR-C.so will export the CAPI symbols and have dynamic dependencies on implementation shared libraries.
* In libMLIR.so mode: same as static. libMLIR.so was not finished for actual linking use within the project. An eventual relayering so that libMLIR-C.so depends on libMLIR.so is possible but requires first re-engineering the latter to use the aggregate facility.
* On Linux, exported symbols are filtered to only the CAPI. On others (MacOS, Windows), all symbols are exported. A CMake status is printed unless if global visibility is hidden indicating that this has not yet been implemented. The library should still work, but it will be larger and more likely to conflict until fixed. Someone should look at lifting the corresponding support from libLLVM-C.so and adapting. Or, for special uses, just build with `-DCMAKE_CXX_VISIBILITY_PRESET=hidden -DCMAKE_C_VISIBILITY_PRESET=hidden`.
* Includes fixes to execution engine symbol export macros to enable default visibility. Without this, the advice to use hidden visibility would have resulted in test failures and unusable execution engine support libraries.
Differential Revision: https://reviews.llvm.org/D113731
* Depends on D111504, which provides the boilerplate for building aggregate shared libraries from installed MLIR.
* Adds a full-fledged Python example dialect and tests to the Standalone example (need to do a bit of tweaking in the top level CMake and lit tests to adapt better to if not building with Python enabled).
* Rips out remnants of custom extension building in favor of `pybind11_add_module` which does the right thing.
* Makes python and extension sources installable (outputs to src/python/${name} in the install tree): Both Python and C++ extension sources get installed as downstreams need all of this in order to build a derived version of the API.
* Exports sources targets (with our properties that make everything work) by converting them to INTERFACE libraries (which have export support), as recommended for the forseeable future by CMake devs. Renames custom properties to start with lower-case letter, as also recommended/required (groan).
* Adds a ROOT_DIR argument to `declare_mlir_python_extension` since now all C++ sources for an extension must be under the same directory (to line up at install time).
* Need to validate against a downstream or two and adjust, prior to submitting.
Downstreams will need to adapt by:
* Remove absolute paths from any SOURCES for `declare_mlir_python_extension` (I believe all downstreams are just using `${CMAKE_CURRENT_SOURCE_DIR}` here, which can just be ommitted). May need to set `ROOT_DIR` if not relative to the current source directory.
* To allow further downstreams to install/build, will need to make sure that all C++ extension headers are also listed under SOURCES for `declare_mlir_python_extension`.
Reviewed By: stephenneuendorffer, mikeurbach
Differential Revision: https://reviews.llvm.org/D111513
With `-Os` turned on, results in 2-5% binary size reduction
(depends on the original binary). Without it, the binary size
is essentially unchanged.
Depends on D113128
Differential Revision: https://reviews.llvm.org/D113331
Support load with broadcast, elementwise divf op and remove the
hardcoded restriction on the vector size. Picking the right size should
be enfored by user and will fail conversion to llvm/spirv if it is not
supported.
Differential Revision: https://reviews.llvm.org/D113618
Fusing into a reduction is only valid if doing so does not erase information on a reduction dimensions size.
Differential Revision: https://reviews.llvm.org/D113500
This revision adds an implementation of 2-D vector.transpose for 4x8 and 8x8 for
AVX2 and surfaces it to the Linalg level of control.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D113347
This decouples the printing/parsing from the "context" in which the parsing occurs.
This will allow to invoke these methods directly using an OpAsmParser/OpAsmPrinter.
Differential Revision: https://reviews.llvm.org/D113637
Identifier and StringAttr essentially serve the same purpose, i.e. to hold a string value. Keeping these seemingly identical pieces of functionality separate has caused problems in certain situations:
* Identifier has nice accessors that StringAttr doesn't
* Identifier can't be used as an Attribute, meaning strings are often duplicated between Identifier/StringAttr (e.g. in PDL)
The only thing that Identifier has that StringAttr doesn't is support for caching a dialect that is referenced by the string (e.g. dialect.foo). This functionality is added to StringAttr, as this is useful for StringAttr in generally the same ways it was useful for Identifier.
Differential Revision: https://reviews.llvm.org/D113536
The specific description is [[ https://llvm.discourse.group/t/adding-unsigned-integer-ceil-and-floor-in-std-dialect/4541 | Adding unsigned integer ceil in Std Dialect ]] .
When we lower ceilDivOp this will generate below code, sometimes we know m and n are unsigned intergal.Here are some redundant judgments about positive and negative.
So we need to add some unsigned operations to simplify the instructions.
```
ceilDiv(n, m)
x = (m > 0) ? -1 : 1
return (n*m>0) ? ((n+x) / m) + 1 : - (-n / m)
```
unsigned operations:
```
ceilDivU(n, m)
return n ==0 ? 0 : ((n - 1) / m) + 1
```
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D113363
* Store inplace bufferization decisions in `inplaceBufferized`.
* Remove `InPlaceSpec`. Use a bool instead.
* Use `BufferizableOpInterface::bufferizesToWritableMemory` and `bufferizesToWritableMemory` instead of `getInPlace(BlockArgument)`. The analysis does not care about inplacability of block arguments. It only cares whether the buffer can be written to or not.
* The `kInPlaceResultsAttrName` op attribute is for testing purposes only.
This commit further decouples BufferizationAliasInfo from other dialects such as SCF.
Differential Revision: https://reviews.llvm.org/D113375
After replacing then init_tensor with a new value, the new value must be inserted into the corresponding union/equivalence sets.
Differential Revision: https://reviews.llvm.org/D113374
Simply emit traits, interfaces & effects (with some minimal formatting) to the
generated docs to make this information easier to find in the docs.
Differential Revision: https://reviews.llvm.org/D113539
New TOSA pad operation can support explicitly specifying the pad value. Added
lowering to linalg that uses the explicit value.
Differential Revision: https://reviews.llvm.org/D113515
When doing topological sort we need to make sure an op is scheduled before any
of the ops within its regions.
Also change the algorithm to not be recursive in order to prevent potential
stack overflow.
Differential Revision: https://reviews.llvm.org/D113423
Use existing helper instead of handling only a subset of indices lowering
arithmetic. Also relax the restriction on the memref rank for the GPU mma ops
as we can now support any rank.
Differential Revision: https://reviews.llvm.org/D113383
Given that LLVM dialect types may now optionally contain types from other
dialects, which itself is motivated by dialect interoperability and progressive
lowering, the conversion should no longer assume that the outermost LLVM
dialect type can be left as is. Instead, it should inspect the types it
contains and attempt to convert them to the LLVM dialect. Introduce this
capability for LLVM array, pointer and structure types. Only literal structures
are currently supported as handling identified structures requires the
converison infrastructure to have a mechanism for avoiding infite recursion in
case of recursive types.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D112550
Use CodegenStrategy instead of a separate test pass to test iterator interchange.
Depends On D113409
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D113550
Remove padding test pass that was replaced by CodegenStrategy.
Depends On D113411
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D113412
Use CodegenStrategy instead of a separate test pass to test hoisting.
Depends On D113410
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D113411
Use CodegenStrategy instead of a separate test pass to test padding.
Depends On D113409
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D113410
Use AffineApplyOp instead of SubIOp to compute the padding width when creating a pad tensor operation.
Depends On D113382
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D113404
Add a flag to control if CodegenStrategy runs the EnablePass between the transformations.
Depends On D113382
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D113409
Remove the padding options from the tiling options since padding is now implemented by a separate pattern/pass introduced in https://reviews.llvm.org/D112412.
The revsion remove the tile-and-pad-tensors.mlir and replaces it with the pad.mlir that tests padding in isolation (without tiling). Similarly, hoist-padding.mlir is replaced by pad-and-hoist.mlir introduced in https://reviews.llvm.org/D112713.
Depends On D112838
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D113382
The existing PostOrder traversal with special rules for certain ops was complicated and had a bug. Switch to PreOrder traversal.
Differential Revision: https://reviews.llvm.org/D113338
A tensor.insert_slice write does not conflict with a subsequent read of the source if the source is originating from a matching tensor.extract_slice.
Differential Revision: https://reviews.llvm.org/D113446
This uses the buffer-deallocation pass or, in case the test case does not use bufferization, has added explicit deallocs.
Differential Revision: https://reviews.llvm.org/D111059
Enables using the same iterator interface to these even though underlying storage is different.
Differential Revision: https://reviews.llvm.org/D113512
This breaking change requires to remove printing the mnemonic in the print()
method on Type/Attribute classes.
This makes it consistent with the parsing code which alread handles the
mnemonic outside of the parsing method.
This likely won't break the build for anyone, but tests will start
failing for dialects downstream. The fix is trivial and look like
going from:
void emitc::OpaqueType::print(DialectAsmPrinter &printer) const {
printer << "opaque<\"";
to:
void emitc::OpaqueAttr::print(DialectAsmPrinter &printer) const {
printer << "<\"";
Reviewed By: rriddle, aartbik
Differential Revision: https://reviews.llvm.org/D113334
Add a new `useDefaultAttributePrinterParser` boolean settings on the dialect
(default to false for now) that emits the boilerplate to dispatch attribute
parsing/printing to the auto-generated method.
We will likely turn this on by default in the future.
Differential Revision: https://reviews.llvm.org/D113329
In preparation for implementation subrange lookup on attributes.
Depends on D113039
Reviewed By: jpienaar, Chia-hungDuan
Differential Revision: https://reviews.llvm.org/D113128
There are several aspects of the API that either aren't easy to use, or are
deceptively easy to do the wrong thing. The main change of this commit
is to remove all of the `getValue<T>`/`getFlatValue<T>` from ElementsAttr
and instead provide operator[] methods on the ranges returned by
`getValues<T>`. This provides a much more convenient API for the value
ranges. It also removes the easy-to-be-inefficient nature of
getValue/getFlatValue, which under the hood would construct a new range for
the type `T`. Constructing a range is not necessarily cheap in all cases, and
could lead to very poor performance if used within a loop; i.e. if you were to
naively write something like:
```
DenseElementsAttr attr = ...;
for (int i = 0; i < size; ++i) {
// We are internally rebuilding the APFloat value range on each iteration!!
APFloat it = attr.getFlatValue<APFloat>(i);
}
```
Differential Revision: https://reviews.llvm.org/D113229
Add a new directive `either` to specify the operands can be matched in either order
Reviewed By: jpienaar, Mogball
Differential Revision: https://reviews.llvm.org/D110666
Generate static function for matching the type/attribute to reduce the
memory footprint.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D110199
Add pad_const field to tosa.pad.
Add builders to enable optional construction of pad_const in pad op.
Update documentation of tosa.clamp to match spec wording.
Signed-off-by: Suraj Sudhir <suraj.sudhir@arm.com>
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D113322
Declarative attribute and type formats with assembly formats. Define an
`assemblyFormat` field in attribute and type defs with a `mnemonic` to
generate a parser and printer.
```tablegen
def MyAttr : AttrDef<MyDialect, "MyAttr"> {
let parameters = (ins "int64_t":$count, "AffineMap":$map);
let mnemonic = "my_attr";
let assemblyFormat = "`<` $count `,` $map `>`";
}
```
Use `struct` to define a comma-separated list of key-value pairs:
```tablegen
def MyType : TypeDef<MyDialect, "MyType"> {
let parameters = (ins "int":$one, "int":$two, "int":$three);
let mnemonic = "my_attr";
let assemblyFormat = "`<` $three `:` struct($one, $two) `>`";
}
```
Use `struct(*)` to capture all parameters.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D111594
Adapt the Fourier Motzkin elimination to take into account affine computations happening outside of the cloned loop nest.
Depends On D112713
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112838
The revision updates the packing loop search in hoist padding. Instead of considering all loops in the backward slice, we now compute a separate backward slice containing the index computations only. This modification ensures we do not add packing loops that are not used to index the packed buffer due to spurious dependencies. One instance where such spurious dependencies can appear is the extract slice operation introduced between the tile loops of a double tiling.
Depends On D112412
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112713
Added omp.sections and omp.section operation according to the
section 2.8.1 of OpenMP Standard 5.0.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D110844
Previously we didn't materialize conversions for arguments in certain
cases as the implicit type propagation was being heavily relied on
by many patterns. Now that those patterns have been fixed to
properly handle type conversions, we can drop the special behavior.
Differential Revision: https://reviews.llvm.org/D113233
The ODS-based Python op bindings generator has been generating incorrect
specification of the operand segment in presence if both optional and variadic
operand groups: optional groups were treated as variadic whereas they require
separate treatement. Make sure it is the case. Also harden the tests around
generated op constructors as they could hitherto accept the code for both
optional and variadic arguments.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D113259
The earlier reduction "scalarization" was only applied to a chain of
*innermost* and *for* loops. This revision generalizes this to any
nesting of for- and while-loops. This implies that reductions can be
implemented with a lot less load and store operations. The chaining
is implemented with a forest of yield statements (but not as bad as
when we would also include the while-induction).
Fixes https://bugs.llvm.org/show_bug.cgi?id=52311
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D113078
OpAdaptor::verify performs string lookups on an attribute dictionary. By
calling OpAdaptor::verify, Op::verify is not able to use cached attribute
identifiers for faster lookups.
Reviewed By: jpienaar, rriddle
Differential Revision: https://reviews.llvm.org/D113039
- Provide the operator overloads for constructing (semi-)affine expressions in
Python by combining existing expressions with constants.
- Make AffineExpr, AffineMap and IntegerSet hashable in Python.
- Expose the AffineExpr composition functionality.
Reviewed By: gysit, aoyal
Differential Revision: https://reviews.llvm.org/D113010
This better decouples transfer read/write from vector-only rewrite of conv.
This form is close to ready to plop into a new vector.conv op and the vector.transfer operations to be generalized as part of generic vectorization once the properties ConvolutionOpInterface are inferred from the indexing maps.
This also results in a nice perf boost in the dw == 1 cases.
Differential revision: https://reviews.llvm.org/D112822
This refactoring prepares conv1d vectorization for a future integration into
the generic codegen path.
Once transfer_read / transfer_write vectorization also supports sliding windows,
the special pattern for conv can disappear.
This will also likely need a vector.conv operation.
Differential Revision: https://reviews.llvm.org/D112797
The 2-D case can be rewritten to generate quite fewer instructions and a single vector.shuffle which seems to provide a nice performance boost.
Add this arrow to our quiver by exposing it with a new vector transform option.
Differential Revision: https://reviews.llvm.org/D113062
We'd like to take a progressive approach towards Fconvolution op
CodeGen, by 1) tiling it to fit compute hierarchy first, and then
2) tiling along window dimensions with size 1 to reduce the problem
to be matmul-like. After that, we can 3) downscale high-D convolution
ops to low-D by removing the size-1 window dimensions. The final
step would be 4) vectorizing the low-D convolution op directly.
We have patterns for 1), 2), and 4). This commit adds a pattern for
3) for `linalg.conv_2d_nhwc_hwcf` ops as a starter. Supporting other
high-D convolution ops should be similar and mechanical.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112928
Symbol tables are a largely useful top-level IR construct, for example, they
make it easy to access functions in a module by name instead of traversing the
list of module's operations to find the corresponding function.
Depends On D112886
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D112821
In order to support fusion with mma matrix type we need to be able to
execute elementwise operations on them. This add an op to be able to
support some basic elementwise operations. This is a is not a full
solution as it only supports a limited scope or operations. Ideally we would
want to be able to fuse with more kind of operations.
Differential Revision: https://reviews.llvm.org/D112857
wmma intrinsics have a large number of combinations, ideally we want to be able
to target all the different variants. To avoid a combinatorial explosion in the
number of mlir op we use attributes to represent the different variation of
load/store/mma ops. We also can generate with tablegen helpers to know which
combinations are available. Using this we can avoid having too hardcode a path
for specific shapes and can support more types.
This patch also adds boiler plates for tf32 op support.
Differential Revision: https://reviews.llvm.org/D112689
Add the shufflevector conversion. It only handles the static, i.e., IntegerAttr, index.
Co-authored: Xinyi Liu <xyliuhelen@gmail.com>
Reviewed by: antiagainst
Differential revision: https://reviews.llvm.org/D112161
Provide support for removing an operation from the block that contains it and
moving it back to detached state. This allows for the operation to be moved to
a different block, a common IR manipulation for, e.g., module merging.
Also fix a potential one-past-end iterator dereference in Operation::moveAfter
discovered in the process.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D112700
Added a type with different pointer/index bit width. Also
added some sanity CHECKs on the stored indices.
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D112778
When operand is a subview we don't infer in_bounds and some default cases (e.g case in the tests) will crash with `operand is NULL` when converting to LLVM
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D112772
Add a strategy pass that pads and hoists after tiling and fusion.
Depends On D112412
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112480
Adding a padding and hoisting pattern, a test pass, and tests. The patch prepares the split of tiling/fusion and padding.
Depends On D112255
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112412
Adapt hoistPaddingOnTensors to leave replacing and erasing the old pad tensor operation to the caller. This change makes the function pattern friendly.
Depends On D112003
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112255
This patch extends the SubElementAttr interface to allow replacing a contained sub attribute. The attribute that should be replaced is identified by an index which denotes the n-th element returned by the accompanying walkImmediateSubElements method.
Using this addition the patch implements replacing SymbolRefAttrs contained within any dialect attributes.
Differential Revision: https://reviews.llvm.org/D111357
Rationale:
The silent exit(1) gives little clues on where the error occurs on failure
and may even be confusing at first. The CHECK testing of all computed values
and indices may be a little bit more elaborate, but it directly pinpoints
where errors happen if they occur. This style is also consistent with
the other tests, which I actually prefer.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112688
Add llvm.mlir.global_ctors and global_dtors ops and their translation
support to LLVM global_ctors/global_dtors global variables.
Differential Revision: https://reviews.llvm.org/D112524
This patch adds the inclusive clause (which was missed in previous
reorganization - https://reviews.llvm.org/D110903) in omp.wsloop operation.
Added a test for validating it.
Also fixes the order clause, which was not accepting any values. It now accepts
"concurrent" as a value, as specified in the standard.
Reviewed By: kiranchandramohan, peixin, clementval
Differential Revision: https://reviews.llvm.org/D112198
Allow lowering of wmma ops with 64bits indexes. Change the default
version of the test to use default layout.
Differential Revision: https://reviews.llvm.org/D112479
Analyze ops in a pseudo-random order to see if any assertions are triggered. Randomizing the order of analysis likely worsens the quality of the bufferization result (more out-of-place bufferizations). However, assertions should never fail, as that would indicate a problem with our implementation.
Differential Revision: https://reviews.llvm.org/D112581
This patch supports the atomic construct (read and write) following
section 2.17.7 of OpenMP 5.0 standard. Also added tests and
verifier for the same.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D111992
This also fixes the vector.shuffle C++ builder which had an incorrect type assumption that triggers with this new rewrite.
The vector.shuffle semantics were correct though.
Differential revision: https://reviews.llvm.org/D112578
The current implementation invokes materializations
whenever an input operand does not have a mapping for the
desired type, i.e. it requires materialization at the earliest possible
point. This conflicts with goal of dialect conversion (and also the
current documentation) which states that a materialization is only
required if the materialization is supposed to persist after the
conversion process has finished.
This revision refactors this such that whenever a target
materialization "might" be necessary, we insert an
unrealized_conversion_cast to act as a temporary materialization.
This allows for deferring the invocation of the user
materialization hooks until the end of the conversion process,
where we actually have a better sense if it's actually
necessary. This has several benefits:
* In some cases a target materialization hook is no longer
necessary
When performing a full conversion, there are some situations
where a temporary materialization is necessary. Moving forward,
these users won't need to provide any target materializations,
as the temporary materializations do not require the user to
provide materialization hooks.
* getRemappedValue can now handle values that haven't been
converted yet
Before this commit, it wasn't well supported to get the remapped
value of a value that hadn't been converted yet (making it
difficult/impossible to convert multiple operations in many
situations). This commit updates getRemappedValue to properly
handle this case by inserting temporary materializations when
necessary.
Another code-health related benefit is that with this change we
can move a majority of the complexity related to materializations
to the end of the conversion process, instead of handling adhoc
while conversion is happening.
Differential Revision: https://reviews.llvm.org/D111620
Rationale:
The currently used trait was demanding that all types are the same
which is not true (since the sparse part may change and the dim sizes
may be relaxed). This revision uses the correct trait and makes the
rank match test explicit in the verify method.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D112576
Polynomial approximation can be extented to support N-d vectors.
N-dimensional vectors are useful when vectorizing operations on N-dimensional
tiles. Before lowering to LLVM these vectors are usually unrolled or flattened
to 1-dimensional vectors.
Differential Revision: https://reviews.llvm.org/D112566
1.Combining kind min/max of Vector reduction op has been changed to
minf/maxf, minsi/maxsi, and minui/maxui. Modify getVectorReductionOp
accordingly.
2.Add min/max to supported reductions.
Reviewed By: dcaballe, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112246
Fix AffineExpr `getLargestKnownDivisor` for ceil/floor div cases.
In these cases, nothing can be inferred on the divisor of the
result.
Add test case for `mod` as well.
Differential Revision: https://reviews.llvm.org/D112523
The current behavior is conveniently allowing to iterate on the regions of an operation
implicitly by exposing an operation as Iterable. However this is also error prone and
code that may intend to iterate on the results or the operands could end up "working"
apparently instead of throwing a runtime error.
The lack of static type checking in Python contributes to the ambiguity here, it seems
safer to not do this and require and explicit qualification to iterate (`op.results`, `op.regions`, ...).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111697
Specification specified the output type for quantized average pool should be
an i32. Only accumulator should be an i32, result type should match the input
type.
Caused in https://reviews.llvm.org/D111590
Reviewed By: sjarus, GMNGeoffrey
Differential Revision: https://reviews.llvm.org/D112484
Using callbacks for allocation/deallocation allows users to override
the default.
Also add an option to comprehensive bufferization pass to use `alloca`
instead of `alloc`s. Note that this option is just for testing. The
option to use `alloca` does not work well with the option to allow for
returning memrefs.
Even though tensor.cast is not part of the sparse tensor dialect,
it may be used to cast static dimension sizes to dynamic dimension
sizes for sparse tensors without changing the actual sparse tensor
itself. Those cases should be lowered properly when replacing sparse
tensor types with their opaque pointers. Likewise, no op sparse
conversions are handled by this revision in a similar manner.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112173
Using callbacks for allocation/deallocation allows users to override
the default.
Also add an option to comprehensive bufferization pass to use `alloca`
instead of `alloc`s. Note that this option is just for testing. The
option to use `alloca` does not work well with the option to allow for
returning memrefs.
Differential Revision: https://reviews.llvm.org/D112166
In several cases, operation result types can be unambiguously inferred from
operands and attributes at operation construction time. Stop requiring the user
to provide these types as arguments in the ODS-generated constructors in Python
bindings. In particular, handle the SameOperandAndResultTypes and
FirstAttrDerivedResultType traits as well as InferTypeOpInterface using the
recently added interface support. This is a significant usability improvement
for IR construction, similar to what C++ ODS provides.
Depends On D111656
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D111811
Introduce the initial support for operation interfaces in C API and Python
bindings. Interfaces are a key component of MLIR's extensibility and should be
available in bindings to make use of full potential of MLIR.
This initial implementation exposes InferTypeOpInterface all the way to the
Python bindings since it can be later used to simplify the operation
construction methods by inferring their return types instead of requiring the
user to do so. The general infrastructure for binding interfaces is defined and
InferTypeOpInterface can be used as an example for binding other interfaces.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D111656
Splitting the WsLoop tests they were getting harder to debug with the offsets over 100 for some of them.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D112407
This removes duplication and makes nesting more clear.
It also reduces the amount of changes necessary for exposing future options.
Differential revision: https://reviews.llvm.org/D112344
This removes duplication and makes nesting more clear.
It also reduces the amount of changes necessary for exposing future options.
Differential revision: https://reviews.llvm.org/D112344
This patch adds a polynomial approximation that matches the
approximation in Eigen.
Note that the approximation only applies to vectorized inputs;
the scalar rsqrt is left unmodified.
The approximation is protected with a flag since it emits an AVX2
intrinsic (generated via the X86Vector). This is the only reasonably
clean way that I could find to generate the exact approximation that
I wanted (i.e. an identical one to Eigen's).
I considered two alternatives:
1. Introduce a Rsqrt intrinsic in LLVM, which doesn't exist yet.
I believe this is because there is no definition of Rsqrt that
all backends could agree on, since hardware instructions that
implement it have widely varying degrees of precision.
This is something that the standard could mandate, but Rsqrt is
not part of IEEE754, so I don't think this option is feasible.
2. Emit fdiv(1.0, sqrt) with fast math flags to allow reciprocal
transformations. Although portable, this doesn't allow us
to generate exactly the code we want; it is the LLVM backend,
and not MLIR, who controls what code is generated based on the
target CPU.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D112192
Pass the modifiers from the Flang parser to FIR/MLIR workshare
loop operation.
Not yet supporting the SIMD modifier, which is a bit more work
than just adding it to the list of modifiers, so will go in a
separate patch.
This adds a new field to the WsLoopOp.
Also add test for dynamic WSLoop, checking that dynamic schedule calls
the init and next functions as expected.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111053
This commit adds support for scf::IfOp to comprehensive bufferization. Support is currently limited to cases where both branches yield tensors that bufferize to the same buffer.
To keep the analysis simple, scf::IfOp are treated as memory writes for analysis purposes, even if no op inside any branch is writing. (scf::ForOps are handled in the same way.)
Differential Revision: https://reviews.llvm.org/D111929
ConstantOp should be used instead of ConstantIntOp to be able to support index type.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D112191
Handle contraction op like all the other generic op reductions. This
simpifies the code. We now rely on contractionOp canonicalization to
keep the same code quality.
Differential Revision: https://reviews.llvm.org/D112171
add several patterns that will simplify contraction vectorization in the
future. With those canonicalizationns we will be able to remove the special
case for contration during vectorization and rely on those transformations to
avoid materizalizing broadcast ops.
Differential Revision: https://reviews.llvm.org/D112121
In the stride == 1 case, conv1d reads contiguous data along the input dimension. This can be advantageaously used to bulk memory transfers and compute while avoiding unrolling. Experimentally, this can yield speedups of up to 50%.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D112139
An InitTensorOp is replaced with an ExtractSliceOp on the InsertSliceOp's destination. This optimization is applied after analysis and only to InsertSliceOps that were decided to bufferize inplace. Another analysis on the new ExtractSliceOp is needed after the rewrite.
Differential Revision: https://reviews.llvm.org/D111955
This patch supports the ordered construct in OpenMP dialect following
Section 2.19.9 of the OpenMP 5.1 standard. Also lowering to LLVM IR
using OpenMP IRBduiler. Lowering to LLVM IR for ordered simd directive
is not supported yet since LLVM optimization passes do not support it
for now.
Reviewed By: kiranchandramohan, clementval, ftynse, shraiysh
Differential Revision: https://reviews.llvm.org/D110015
This is required for bufferization of scf::IfOp, which is added in a subsequent commit.
Some ops (scf::ForOp, TiledLoopOp) require PreOrder traversal to make sure that bbArgs are mapped before bufferizing the loop body.
Differential Revision: https://reviews.llvm.org/D111924
This patch supports the ordered construct in OpenMP dialect following
Section 2.19.9 of the OpenMP 5.1 standard. Also lowering to LLVM IR
using OpenMP IRBduiler. Lowering to LLVM IR for ordered simd directive
is not supported yet since LLVM optimization passes do not support it
for now.
Reviewed By: kiranchandramohan, clementval, ftynse, shraiysh
Differential Revision: https://reviews.llvm.org/D110015
The current implementation used explicit index->int64_t casts for some, but
not all instances of passing values of type "index" in and from the sparse
support library. This revision makes the situation more consistent by
using new "index_t" type at all such places (which allows for less trivial
casting in the generated MLIR code). Note that the current revision still
assumes that "index" is 64-bit wide. If we want to support targets with
alternative "index" bit widths, we need to build the support library different.
But the current revision is a step forward by making this requirement explicit
and more visible.
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D112122
Add a pattern to take a rank-reducing subview and drop inner most
contiguous unit dim.
This is useful when lowering vector to backends with 1d vector types.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D111561
According to the OpenMP 5.0 standard, names and hints of critical operation are
closely related. The following are the restrictions on them:
- Unless the effect is as if `hint(omp_sync_hint_none)` was specified, the
critical construct must specify a name.
- If the hint clause is specified, each of the critical constructs with the
same name must have a hint clause for which the hint-expression evaluates to
the same value.
These restrictions will be enforced by design if the hint expression is a part
of the `omp.critical.declare` operation.
- Any operation with no "name" will be considered to have
`hint(omp_sync_hint_none)`.
- All the operations with the same "name" will have the same hint value.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D112134
Follow up to also use the prefixed emitters in OpFormatGen (moved
getGetterName(s) and getSetterName(s) to Operator as that is most
convenient usage wise even though it just depends on Dialect). Prefix
accessors in Test dialect and follow up on missed changes in
OpDefinitionsGen.
Differential Revision: https://reviews.llvm.org/D112118
This revision uses the newly refactored StructuredGenerator to create a simple vectorization for conv1d_nwc_wcf.
Note that the pattern is not specific to the op and is technically not even specific to the ConvolutionOpInterface (modulo minor details related to dilations and strides).
The overall design follows the same ideas as the lowering of vector::ContractionOp -> vector::OuterProduct: it seeks to be minimally complex, composable and extensible while avoiding inference analysis. Instead, we metaprogram the maps/indexings we expect and we match against them.
This is just a first stab and still needs to be evaluated for performance.
Other tradeoffs are possible that should be explored.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111894
This canonicalizer replaces reshapes of constant tensors that contain the updated shape (skipping the reshape operation).
Differential Revision: https://reviews.llvm.org/D112038
Code reorganized in OpenMPDialect.cpp to have all functions corresponding to an operation together.
Added parseClauses function to avoid code duplication while parsing clauses in OpenMP operations. Also added printers and verifiers for clauses, which are being used for multiple operations.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D110903
The change is based on the proposal from the following discussion:
https://llvm.discourse.group/t/rfc-memreftype-affine-maps-list-vs-single-item/3968
* Introduce `MemRefLayoutAttr` interface to get `AffineMap` from an `Attribute`
(`AffineMapAttr` implements this interface).
* Store layout as a single generic `MemRefLayoutAttr`.
This change removes the affine map composition feature and related API.
Actually, while the `MemRefType` itself supported it, almost none of the upstream
can work with more than 1 affine map in `MemRefType`.
The introduced `MemRefLayoutAttr` allows to re-implement this feature
in a more stable way - via separate attribute class.
Also the interface allows to use different layout representations rather than affine maps.
For example, the described "stride + offset" form, which is currently supported in ASM parser only,
can now be expressed as separate attribute.
Reviewed By: ftynse, bondhugula
Differential Revision: https://reviews.llvm.org/D111553
This revison lifts the artificial restriction on having exact matches between
source and destination type shapes. A static size may become dynamic. We still
reject changing a dynamic size into a static size to avoid the need for a
runtime "assert" on the conversion. This revision also refactors some of the
conversion code to share same-content buffers.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111915
Use wider range for approximating Tanh to match results computed in Eigen with AVX.
Reviewed By: cota
Differential Revision: https://reviews.llvm.org/D112011
Starting with a mostly NFC change to be able to differentiate between
mechanical changes from ones that require more detailed review.
This will be used to flush out flow before flipping dialects used
outside local testing. As this dialect is not intended to be used
generally rather than in tests in core, I will not be following 2 week
staging approach here.
AnyAttrOf, similar to AnyTypeOf, expects the attribute to be one of the
given attributes.
For instance, `AnyAttrOf<[I32Attr, StrAttr]>` expects either a `I32Attr`,
or a `StrAttr`.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D111739
When folding A->B->C => A->C only accept A->C that is valid shape cast
Reviewed By: ThomasRaoux, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111473
The rules were too restrictive, causing out-of-place bufferization when the result of two ExtractSliceOp is fed into an InsertSliceOp.
Differential Revision: https://reviews.llvm.org/D111861
This revision also adds a few passes to the sparse compiler part to unify the transformation sequence with all other paths we currently use.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111900
This is the only lowering to Linalg Tosa has, so it's needlessly
verbose. Likely this was a carry over from IREE's usage where we
originally lowered to linalg on buffers (the only linalg that existed at
the time), so the everything on tensors needed the suffix. We're dropping
it in IREE also, having transitioned entirely to using Linalg on
tensors.
Reviewed By: sjarus
Differential Revision: https://reviews.llvm.org/D111911
Next step towards supporting sparse tensors outputs.
Also some minor refactoring of enum constants as well
as replacing tensor arguments with proper buffer arguments
(latter is required for more general sizes arguments for
the sparse_tensor.init operation, as well as more general
spares_tensor.convert operations later)
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D111771
`DefaultValuedAttr<StrAttr, "">` and `ConstantAttr<StrAttr, "">`
result in bugs in which TableGen will not recognize that the attribute
has a default value, because `""` is an empty TableGen string.
Strings no longer have special treatment. Instead, string values must be
wrapped in quotes: "\"foo\"". Two helpers, `DefaultValuedStrAttr` and
`ConstantStrAttr` have been added to keep code clean.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D111855
From the perspective of analysis, scf::ForOp is treated as a black box. Basic block arguments do not alias with their respective OpOperands on the ForOp, so they do not participate in conflict analysis with ops defined outside of the loop.
However, bufferizesToMemoryRead and bufferizesToMemoryWrite on the scf::ForOp itself are used to determine how the scf::ForOp interacts with its surrounding ops.
Differential Revision: https://reviews.llvm.org/D111775
For each memory read, follow SSA use-def chains to find the op that produces the data being read (i.e., the most recent write). A memory write to an alias is a conflict if it takes places after the "most recent write" but before the read.
This CL introduces two main changes:
* There is a concise definition of a conflict. Given a piece of IR with InPlaceSpec annotations and a computes alias set, it is easy to compute whether this program has a conflict. No need to consider multiple cases such as "read of operand after in-place write" etc.
* No need to check for clobbering.
Differential Revision: https://reviews.llvm.org/D111287
Allow emitting get & set prefix for accessors generated for ops. If
enabled, then the argument/return/region name gets converted from
snake_case to UpperCamel and prefix added. The attribute also allows
generating both the current "raw" method along with the prefix'd one to
make it easier to stage changes.
The option is added on the dialect and currently defaults to existing
raw behavior. The expectation is that the staging where both are
generated would be short lived and so optimized to keeping the changes
local/less invasive (it just generates two functions for each accessor
with the same body - most of these internally again call a helper
function). But generation can be optimized if needed.
I'm unsure about OpAdaptor classes as there it is all get methods (it is
a named view into raw data structures), so prefix doesn't add much.
This starts with emitting raw-only form (as current behavior) as
default, then one can opt-in to raw & prefixed, then just prefixed. The
default in OpBase will switch to prefixed-only to be consistent with
MLIR style guide. And the option potentially removed later (considered
enabling specifying prefix but current discussion more pro keeping it
limited and stuck with that).
Also add more explicit checking for pruned functions to avoid emitting
where no function was added (and so avoiding dereferencing nullptr)
during op def/decl generation.
See https://bugs.llvm.org/show_bug.cgi?id=51916 for further discussion.
Differential Revision: https://reviews.llvm.org/D111033
Emit reduction during op vectorization instead of doing it when creating the
transfer write. This allow us to not broadcast output arguments for reduction
initial value.
Differential Revision: https://reviews.llvm.org/D111825
Part of the arith update broke UiToFp32. Fixed the lowering and included a new
test to detect a regression.
Differential Revision: https://reviews.llvm.org/D111772
I am unclear this is reproducible with correct IR but atm the verifier for InsertSliceOp
is not powerful enough and this triggers an infinite loop that is worth fixing independently.
Differential Revision: https://reviews.llvm.org/D111812
Improve support for variadic regions in ODS-generated operation view classes.
In particular, make generated constructors take an extra argument that
specifies the number of variadic regions if the operation has them. Previously,
there was no mechanism to specify a non-zero number of variadic regions. Also
generate named accessors to regions.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D111783
MemRefType was using a wrong `isa` function in the bindings code, which
could lead to invalid IR being constructed. Also run the verifier in
memref dialect tests.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111784
Setting the nofold attribute enables packing an operand. At the moment, the attribute is set by default. The pack introduces a callback to control the flag.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111718
After removing the last LinalgOps that have no region attached we can verify there is a region. The patch performs the following changes:
- Move the SingleBlockImplicitTerminator trait further up the the structured op base class.
- Adapt the LinalgOp verification since the trait only check if there is 0 or 1 block.
- Introduce a getBlock method on the LinalgOp interface.
- Access the LinalgOp body using either getBlock() or getBody() if the concrete operation type is known.
This patch is a follow up to https://reviews.llvm.org/D111233.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111393
* Incorporates a reworked version of D106419 (which I have closed but has comments on it).
* Extends the standalone example to include a minimal CAPI (for registering its dialect) and a test which, from out of tree, creates an aggregate dylib and links a little sample program against it. This will likely only work today in *static* MLIR builds (until the TypeID fiasco is finally put to bed). It should work on all platforms, though (including Windows - albeit I haven't tried this exact incarnation there).
* This is the biggest pre-requisite to being able to build out of tree MLIR Python-based projects from an installed MLIR/LLVM.
* I am rather nauseated by the CMake shenanigans I had to endure to get this working. The primary complexity, above and beyond the previous patch is because (with no reason given), it is impossible to export target properties that contain generator expressions... because, of course it isn't. In this case, the primary reason we use generator expressions on the individual embedded libraries is to support arbitrary ordering. Since that need doesn't apply to out of tree (which import everything via FindPackage at the outset), we fall back to a more imperative way of doing the same thing if we detect that the target was imported. Gross, but I don't expect it to need a lot of maintenance.
* There should be a relatively straight-forward path from here to rebase libMLIR.so on top of this facility and also make it include the CAPI.
Differential Revision: https://reviews.llvm.org/D111504
This is the first step towards supporting general sparse tensors as output
of operations. The init sparse tensor is used to materialize an empty sparse
tensor of given shape and sparsity into a subsequent computation (similar to
the dense tensor init operation counterpart).
Example:
%c = sparse_tensor.init %d1, %d2 : tensor<?x?xf32, #SparseMatrix>
%0 = linalg.matmul
ins(%a, %b: tensor<?x?xf32>, tensor<?x?xf32>)
outs(%c: tensor<?x?xf32, #SparseMatrix>) -> tensor<?x?xf32, #SparseMatrix>
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111684
The type can be inferred trivially, but it is currently done as string
stitching between ODS and C++ and is not easily exposed to Python.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111712
When writing the user-facing documentation, I noticed several inconsistencies
and asymmetries in the Python API we provide. Fix them by adding:
- the `owner` property to regions, similarly to blocks;
- the `isinstance` method to any class derived from `PyConcreteAttr`,
`PyConcreteValue` and `PyConreteAffineExpr`, similar to `PyConcreteType` to
enable `isa`-like calls without having to handle exceptions;
- a mechanism to create the first block in the region as we could only create
blocks relative to other blocks, with is impossible in an empty region.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D111556
This exposes creating a CallSiteLoc with a callee & list of frames for
callers. Follows the creation approach in C++ side where a list of
frames may be provided.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111670
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
1. To avoid two ExecutionModeOp using the same name, adding the value of execution mode in name when converting to LLVM dialect.
2. To avoid syntax error in spv.OpLoad, add OpTypeSampledImage into SPV_Type.
Reviewed by:antiagainst
Differential revision:https://reviews.llvm.org/D111193
We shouldn't broadcast the original value when doing reduction. Instead
we compute the reduction and then combine it with the original value.
Differential Revision: https://reviews.llvm.org/D111666
This patch teaches `isProjectedPermutation` and `inverseAndBroadcastProjectedPermutation`
utilities to deal with maps representing an explicit broadcast, e.g., (d0, d1) -> (d0, 0).
This extension is needed to enable vectorization of such explicit broadcast in Linalg.
Reviewed By: pifon2a, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111563
Average pool assumed the same input/output type. Result type for integers
is always an i32, should be updated appropriately.
Reviewed By: GMNGeoffrey
Differential Revision: https://reviews.llvm.org/D111590
If I remember correctly this wasn't done previously because dim used to
be in the memref dialect.
Differential Revision: https://reviews.llvm.org/D111651
This revision takes advantage of the recently added support for 0-d transfers and vector.multi_reduction that return a scalar.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D111626
This revision updates the op semantics, printer, parser and verifier to allow 0-d transfers.
Until 0-d vectors are available, such transfers have a special form that transits through vector<1xt>.
This is a stepping stone towards the longer term work of adding 0-d vectors and will help significantly reduce corner cases in vectorization.
Transformations and lowerings do not yet support this form, extensions will follow.
Differential Revision: https://reviews.llvm.org/D111559
vector.multi_reduction currently does not allow reducing down to a scalar.
This creates corner cases that are hard to handle during vectorization.
This revision extends the semantics and adds the proper transforms, lowerings and canonicalizations to allow lowering out of vector.multi_reduction to other abstractions all the way to LLVM.
In a future, where we will also allow 0-d vectors, scalars will still be relevant: 0-d vector and scalars are not equivalent on all hardware.
In the process, splice out the implementation patterns related to vector.multi_reduce into a new file.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D111442
`hint-expression` is an IntegerAttr, because it can be a combination of multiple values from the enum `omp_sync_hint_t` (Section 2.17.12 of OpenMP 5.0)
Reviewed By: ftynse, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D111360
Call `printType(subElemType)` instead of `os << subElemType` for them.
It allows to handle type aliases inside complex types.
As a side effect, fixed `test.int` parsing.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D111536
* Call `llvm_canonicalize_cmake_booleans` for all CMake options,
which are propagated to `lit.local.cfg` files.
* Use Python native boolean values instead of strings for such options.
This fixes the cases, when CMake variables have values other than `ON` (like `TRUE`).
This might happen due to IDE integration or due to CMake preset usage.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D110073
Operations that have the InferTypeOpInterface trait can now omit the return
types in their custom assembly formats.
Differential Revision: https://reviews.llvm.org/D111326
This relaxes vectorization of dense memrefs a bit so that affine expressions
are allowed in more outer dimensions. Vectorization of non unit stride
references is disabled though, since this seems ineffective anyway.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111469
1. Add support to vectorize induction variables of loops that are
not mapped to any vector dimension in SuperVectorize pass.
2. Fix a bug in getForInductionVarOwner.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D111370
This test is crashing 9 out of 10 runs in CI, but I can't reproduce
locally right now. Disabling to get the CI back to green and avoid
backsliding with more ASAN issues that would go unnoticed.
Instead of hard-coding results for both Intel and AMD, let's relax
the checks to simplify the test while supporting both implementations.
Note that:
- If a new hardware implementation comes up in the future, it is likely
to pass the relaxed tests, i.e. no future maintenance burden for us.
- If something terribly wrong happens (e.g. instead of rsqrt we
execute 1/sqrt), the tests will probably catch it, since the relaxed
tests expect low precision (e.g. rsqrt(1) != 1.0).
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D111461
TensorLiteralParser::getHexAttr does a isIntOrIndexOrFloat check and properly handles index elements, but TensorLiteralParser::getAttr that calls into it has a mismatched check. This just makes the checks match so that index element attrs can parse when of type tensor.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D111374
The purpose of this revision is to make "write into non-writable memory" conflict detection easier to understand.
The main idea is that there is a conflict in the case of inplace bufferization if:
1. Someone writes to (an alias of) opOperand, opResult or the to-be-bufferized op writes itself.
2. And, opOperand or opResult aliases a non-writable buffer.
Differential Revision: https://reviews.llvm.org/D111379
This commit adds a pattern to perform constant folding on linalg
generic ops which are essentially transposes. We see real cases
where model importers may generate such patterns.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D110597
Introduce support for accepting ops instead of values when constructing ops. A
single-result op can be used instead of a value, including in lists of values,
and any op can be used instead of a list of values. This is similar to, but
more powerful, than the C++ API that allows for implicitly casting an OpType to
Value if it is statically known to have a single result - the cast in Python is
based on the op dynamically having a single result, and also handles the
multi-result case. This allows to build IR in a more concise way:
op = dialect.produce_multiple_results()
other = dialect.produce_single_result()
dialect.consume_multiple_results(other, op)
instead of having to access the results manually
op = dialect.produce.multiple_results()
other = dialect.produce_single_result()
dialect.consume_multiple_results(other.result, op.operation.results)
The dispatch is implemented directly in Python and is triggered automatically
for autogenerated OpView subclasses. Extension OpView classes should use the
functions provided in ods_common.py if they want to implement this behavior.
An alternative could be to implement the dispatch in the C++ bindings code, but
it would require to forward opaque types through all Python functions down to a
binding call, which makes it hard to inspect them in Python, e.g., to obtain
the types of values.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D111306
The convolution op is one of the remaining hard coded Linalg operations that have no region attached. It got obsolete due to the OpDSL convolution operations. Removing it allows us to delete specialized code and tests that are not needed for the OpDSL counterparts that rely on the standard code paths.
Test needed due to specialized implementations are removed. Tiling and fusion tests are replaced by variants using linalg.conv_2d.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111233
These kind of function can behave differently on these X86 chips, there
isn't really "one true answer" so we'll accept both.
Also remove spurious passes and use mattr="avx" to match the instruction
used here.
Differential Revision: https://reviews.llvm.org/D111373
Currently Affine LICM checks iterOperands and does not hoist out any
instruction containing iterOperands. We should check iterArgs instead.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D111090
* Need to investigate the proper solution to https://github.com/pybind/pybind11/issues/3336 or engineer something different.
* The attempt to produce an empty buffer_info as a workaround triggers asan/ubsan.
* Usage of this API does not arise naturally in practice yet, and it is more important to be asan/crash clean than have a solution right now.
* Switching back to raising an exception, even though that triggers terminate().
* This already half existed in terms of reading the raw buffer backing a DenseElementsAttr.
* Documented the precise expectations of the buffer layout.
* Extended the Python API to support construction from bitcasted buffers, allowing construction of all primitive element types (even those that lack a compatible representation in Python).
* Specifically, the Python API can now load all integer types at all bit widths and all floating point types (f16, f32, f64, bf16).
Differential Revision: https://reviews.llvm.org/D111284
It was bundling quite a lot of patterns that convert high-D
vector ops into low-D elementary ops. It might not be good
for all of the patterns to happen for a particular downstream
user. For example, `ShapeCastOpRewritePattern` rewrites
`vector.shape_cast` into data movement extract/insert ops.
Instead, split the entry point into multiple ones so users
can pull in patterns on demand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D111225
ConstShapeOp has a constant shape, so its type can always be static.
We still allow it to have ShapeType though.
Differential Revision: https://reviews.llvm.org/D111139
Update OpDSL to support unsigned integers by adding unsigned min/max/cast signatures. Add tests in OpDSL and on the C++ side to verify the proper signed and unsigned operations are emitted.
The patch addresses an issue brought up in https://reviews.llvm.org/D111170.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D111230
Add folding rule for std.or op when an operand has all bits set.
or(x, <all bits set>) -> <all bits set>
Differential Revision: https://reviews.llvm.org/D111206
This was causing a subsequent assert/crash when a type converter failed to convert a block argument.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D110985
For the type lattice, we (now) use the "less specialized or equal" partial
order, leading to the bottom representing the empty set, and the top
representing any type.
This naming is more in line with the generally used conventions, where the top
of the lattice is the full set, and the bottom of the lattice is the empty set.
A typical example is the powerset of a finite set: generally, meet would be the
intersection, and join would be the union.
```
top: {a,b,c}
/ | \
{a,b} {a,c} {b,c}
| X X |
{a} { b } {c}
\ | /
bottom: { }
```
This is in line with the examined lattice representations in LLVM:
* lattice for `BitTracker::BitValue` in `Hexagon/BitTracker.h`
* lattice for constant propagation in `HexagonConstPropagation.cpp`
* lattice in `VarLocBasedImpl.cpp`
* lattice for address space inference code in `InferAddressSpaces.cpp`
Reviewed By: silvas, jpienaar
Differential Revision: https://reviews.llvm.org/D110766
Implement min and max using the newly introduced std operations instead of relying on compare and select.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D111170
This fixes round-trip / ambiguity when an operation in the standard dialect would
have the same name as an operation in the default dialect.
Differential Revision: https://reviews.llvm.org/D111204
This patch extends Linalg core vectorization with support for min/max reductions
in linalg.generic ops. It enables the reduction detection for min/max combiner ops.
It also renames MIN/MAX combining kinds to MINS/MAXS to make the sign explicit for
floating point and signed integer types. MINU/MAXU should be introduce din the future
for unsigned integer types.
Reviewed By: pifon2a, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D110854
This avoids keeping references to passes that may be freed by
the time that the pass manager has finished executing (in the
non-crash case).
Fixes PR#52069
Differential Revision: https://reviews.llvm.org/D111106
We have several ways to materialize sparse tensors (new and convert) but no explicit operation to release the underlying sparse storage scheme at runtime (other than making an explicit delSparseTensor() library call). To simplify memory management, a sparse_tensor.release operation has been introduced that lowers to the runtime library call while keeping tensors, opague pointers, and memrefs transparent in the initial IR.
*Note* There is obviously some tension between the concept of immutable tensors and memory management methods. This tension is addressed by simply stating that after the "release" call, no further memref related operations are allowed on the tensor value. We expect the design to evolve over time, however, and arrive at a more satisfactory view of tensors and buffers eventually.
Bug:
http://llvm.org/pr52046
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111099
These are considered noops.
Buferization will still fail on scf.execute_region which yield values.
This is used to make comprehensive bufferization interoperate better with external clients.
Differential Revision: https://reviews.llvm.org/D111130
The discussion in https://reviews.llvm.org/D110425 demonstrated that "packing"
may be a confusing term to define the behavior of this op in presence of the
attribute. Instead, indicate the intended effect of preventing the folder from
being applied.
Reviewed By: nicolasvasilache, silvas
Differential Revision: https://reviews.llvm.org/D111046
This patch is mainly to propogate location attribute from spv.GlobalVariable to llvm.mlir.global.
It also contains three small changes.
1. Remove the restriction on UniformConstant In SPIRVToLLVM.cpp;
2. Remove the errorCheck on relaxedPrecision when deserializering SPIR-V in Deserializer.cpp
3. In SPIRVOps.cpp, let ConstantOp take signedInteger too.
Co-authered: Alan Liu <alanliu.yf@gmail.com> and Xinyi Liu <xyliuhelen@gmail.com>
Reviewed by:antiagainst
Differential revision: https://reviews.llvm.org/D110207
The new constructor relies on type-based dynamic dispatch and allows one to
construct call operations given an object representing a FuncOp or its name as
a string, as opposed to requiring an explicitly constructed attribute.
Depends On D110947
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110948
Constructing a ConstantOp using the default-generated API is verbose and
requires to specify the constant type twice: for the result type of the
operation and for the type of the attribute. It also requires to explicitly
construct the attribute. Provide custom constructors that take the type once
and accept a raw value instead of the attribute. This requires dynamic dispatch
based on type in the constructor. Also provide the corresponding accessors to
raw values.
In addition, provide a "refinement" class ConstantIndexOp similar to what
exists in C++. Unlike other "op view" Python classes, operations cannot be
automatically downcasted to this class since it does not correspond to a
specific operation name. It only exists to simplify construction of the
operation.
Depends On D110946
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110947
Provide a couple of quality-of-life usability improvements for Python bindings,
in particular:
* give access to the list of types for the list of op results or block
arguments, similarly to ValueRange->TypeRange,
* allow for constructing empty dictionary arrays,
* support construction of array attributes by concatenating an existing
attribute with a Python list of attributes.
All these are required for the upcoming customization of builtin and standard
ops.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110946
Add missing mlir-capi-*-test tool substitutions in order to fix CAPI
test failures when mlir is not installed yet.
Differential Revision: https://reviews.llvm.org/D110991
Include mlir_tools_dir in the PATH used in test environment,
as otherwise mlir-reduce is unable to find mlir-opt when building
standalone (and hence mlir_tools_dir != llvm_tools_dir).
Differential Revision: https://reviews.llvm.org/D110992
First the leak sanitizer has to be disabled, as even an empty script
leads to leak detection with Python.
Then we need to preload the ASAN runtime, as the main binary (python)
won't be linked against it. This will only work on Linux right now.
Differential Revision: https://reviews.llvm.org/D111004
Exposes mlir::TypeID to the C API as MlirTypeID along with various accessors
and helper functions.
Differential Revision: https://reviews.llvm.org/D110897
The pooling ops are among the last remaining hard coded Linalg operations that have no region attached. They got obsolete due to the OpDSL pooling operations. Removing them allows us to delete specialized code and tests that are not needed for the OpDSL counterparts that rely on the standard code paths.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110909
Add support for dynamic shared memory for GPU launch ops: add an
optional operand to gpu.launch and gpu.launch_func ops to specify the
amount of "dynamic" shared memory to use. Update lowerings to connect
this operand to the GPU runtime.
Differential Revision: https://reviews.llvm.org/D110800
For convolution, the input window dimension's access affine map
is of the form `(d0 * s0 + d1)`, where `d0`/`d1` is the output/
filter window dimension, and `s0` is the stride.
When tiling, https://reviews.llvm.org/D109267 changed how the
way dimensions are acquired. Instead of directly querying using
`*.dim` ops on the original convolution op, we now get it by
applying the access affine map to the loop upper bounds. This
is fine for dimensions having single-dimension affine maps,
like matmul, but not for convolution input. It will cause
incorrect compuation and out of bound. A concrete example, say
we have 1x225x225x3 (NHWC) input, 3x3x3x32 (HWCF) filter, and
1x112x112x3 (NHWC) output with stride 2, (112 * 2 + 3) would be
227, which is different from the correct input window dimension
size 225.
Instead, we should first calculate the max indices for each loop,
and apply the affine map to them, and then plus one to get the
dimension size. Note this makes no difference for matmul-like
ops given they will have `d0 - 1 + 1` effectively.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110849
* This could have been removed some time ago as it only had one op left in it, which is redundant with the new approach.
* `matmul_i8_i8_i32` (the remaining op) can be trivially replaced by `matmul`, which natively supports mixed precision.
Differential Revision: https://reviews.llvm.org/D110792
This is an important core dialect that has not been exposed previously. Set up
the default bindings generation and provide a nicer wrapper for the `for` loop
with access to the loop configuration and body.
Depends On D110758
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110759
Without this change, these attributes can only be accessed through the generic
operation attribute dictionary provided the caller knows the special operation
attribute names used for this purpose. Add some Python wrapping to support this
use case.
Also provide access to function arguments usable inside the function along with
a couple of quality-of-life improvements in using block arguments (function
arguments being the arguments of its entry block).
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110758
The former is redundant because the later carries it as part of
its builder. Add a getContext() helper method to DialectAsmParser
to make this more convenient, and stop passing the context around
explicitly. This simplifies ODS generated parser hooks for attrs
and types.
This resolves PR51985
Recommit 4b32f8bac4 after fixing a dependency.
Differential Revision: https://reviews.llvm.org/D110796
The former is redundant because the later carries it as part of
its builder. Add a getContext() helper method to DialectAsmParser
to make this more convenient, and stop passing the context around
explicitly. This simplifies ODS generated parser hooks for attrs
and types.
This resolves PR51985
Differential Revision: https://reviews.llvm.org/D110796
This revision retires a good portion of the complexity of the codegen strategy and puts the logic behind pass logic.
Differential revision: https://reviews.llvm.org/D110678
Unroll-and-jam currently doesn't work when the loop being unroll-and-jammed
or any of its inner loops has iter_args. This patch modifies the
unroll-and-jam utility to support loops with iter_args.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110085
Adapt the signature of the PaddingValueComputationFunction callback to either return the padding value or failure to signal padding is not desired.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110572
This integration tests runs a fused and non-fused version of
sampled matrix multiplication. Both should eventually have the
same performance!
NOTE: relies on pending tensor.init fix!
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110444
This revision makes sure that when the output buffer materializes locally
(in contrast with the passing in of output tensors either in-place or not
in-place), the zero initialization assumption is preserved. This also adds
a bit more documentation on our sparse kernel assumption (viz. TACO
assumptions).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110442
New mode option that allows for either running the default fusion kind that happens today or doing either of producer-consumer or sibling fusion. This will also be helpful to minimize the compile-time of the fusion tests.
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D110102
The sparse constant provides a constant tensor in coordinate format. We first split the sparse constant into a constant tensor for indices and a constant tensor for values. We then generate a loop to fill a sparse tensor in coordinate format using the tensors for the indices and the values. Finally, we convert the sparse tensor in coordinate format to the destination sparse tensor format.
Add tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D110373
This revision extracts padding hoisting in a new file and cleans it up in prevision of future improvements and extensions.
Differential Revision: https://reviews.llvm.org/D110414
When splitting with linalg.copy, cannot write into the destination alloc directly. Instead, write into a subview of the alloc.
Differential Revision: https://reviews.llvm.org/D110512
We currently, incorrectly, assume that a range always has at least
one element when building a contiguous range. This commit adds
a proper empty check to avoid crashing.
Differential Revision: https://reviews.llvm.org/D110457
For such cases, the type of the constant DenseElementsAttr is
different from the transpose op return type.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D110446
This patch introduces a generic reduction detection utility that works
across different dialecs. It is mostly a generalization of the reduction
detection algorithm in Affine. The reduction detection logic in Affine,
Linalg and SCFToOpenMP have been replaced with this new generic utility.
The utility takes some basic components of the potential reduction and
returns: 1) the reduced value, and 2) a list with the combiner operations.
The logic to match reductions involving multiple combiner operations disabled
until we can properly test it.
Reviewed By: ftynse, bondhugula, nicolasvasilache, pifon2a
Differential Revision: https://reviews.llvm.org/D110303
These are among the last operations still defined explicitly in C++. I've
tried to keep this commit as NFC as possible, but these ops
definitely need a non-NFC cleanup at some point.
Differential Revision: https://reviews.llvm.org/D110440
* If the input is a constant splat value, we just
need to reshape it.
* If the input is a general constant with one user,
we can also constant fold it, without bloating
the IR.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D110439
This commits updates the remaining usages of the ArrayRef<Value> based
matchAndRewrite/rewrite methods in favor of the new OpAdaptor
overload.
Differential Revision: https://reviews.llvm.org/D110360
Initially, the padding transformation and the related operation were only used
to guarantee static shapes of subtensors in tiled operations. The
transformation would not insert the padding operation if the shapes were
already static, and the overall code generation would actively remove such
"noop" pads. However, this transformation can be also used to pack data into
smaller tensors and marshall them into faster memory, regardless of the size
mismatches. In context of expert-driven transformation, we should assume that,
if padding is requested, a potentially padded tensor must be always created.
Update the transformation accordingly. To do this, introduce an optional
`packing` attribute to the `pad_tensor` op that serves as an indication that
the padding is an intentional choice (as opposed to side effect of type
normalization) and should be left alone by cleanups.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110425
This canonicalization pattern complements the tensor.cast(pad_tensor) one in
propagating constant type information when possible. It contributes to the
feasibility of pad hoisting.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110343
* Do not discard static result type information that cannot be inferred from lower/upper padding.
* Add optional argument to `PadTensorOp::inferResultType` for specifying known result dimensions.
Differential Revision: https://reviews.llvm.org/D110380
This is only noticeable when using an attribute across dialects I think.
Previously the namespace would be ommited, but it wouldn't matter as
long as the generated code stays within a single namespace.
Differential Revision: https://reviews.llvm.org/D110367
Enables putting types and attributes in sets and in dicts as keys.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D110301
This test makes sure kernels map to efficient sparse code, i.e. all
compressed for-loops, no co-iterating while loops. In addition, this
revision removes the special constant folding inside the sparse
compiler in favor of Mahesh' new generic linalg folding. Thanks!
NOTE: relies on Mahesh fix, which needs to be rebased first
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110001
When both a DefaultValuedAttr and a successor or variadic region was specified, this would generate invalid C++ declaration. There would be the parameter with a default value, followed by the successors/regions, which don't have a default, which is invalid.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110205
The current folder of constant -> generic op only handles splat
constants. The same logic holds for scalar constants. Teach the
pattern to handle such cases.
Differential Revision: https://reviews.llvm.org/D109982
This fixes a bug where we discover new information about the arguments of an
already executable edge, but don't visit the arguments. We only visit the arguments, and not the block itself, so this commit shouldn't really affect performance at all.
Fixes PR#51871
Differential Revision: https://reviews.llvm.org/D110197
Now not just SUM, but also PRODUCT, AND, OR, XOR. The reductions
MIN and MAX are still to be done (also depends on recognizing
these operations in cmp-select constructs).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110203
Previously, the translation to LLVM IR would emit IR that directly uses
a scope metadata node in case only one scope was in use in alias.scopes
or noalias metadata. It should always be a list of scopes. The verifier
change in 8700f2bd36 enforced this and
broke the test. Fix the translation to always create a list of scopes
using a new metadata node, update and reenable the respective test.
Fixes PR51919.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110140
Compute the tiled producer slice dimensions directly starting from the consumer not using the producer at all.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110147
It was previously assumed that tensor.insert_slice should be bufferized first in a greedy fashion to avoid out-of-place bufferization of the large tensor. This heuristic does not hold upon further inspection.
This CL removes the special handling of such ops and adds a test that exhibits better behavior and appears in real use cases.
The only test adversely affected is an artificial test which results in a returned memref: this pattern is not allowed by comprehensive bufferization in real scenarios anyway and the offending test is deleted.
Differential Revision: https://reviews.llvm.org/D110072
Previously, comprehensive bufferize would consider all aliasing reads and writes to
the result buffer and matching operand. This resulted in spurious dependences
being considered and resulted in too many unnecessary copies.
Instead, this revision revisits the gathering of read and write alias sets.
This results in fewer alloc and copies.
An exhaustive test cases is added that considers all possible permutations of
`matmul(extract_slice(fill), extract_slice(fill), ...)`.
This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and indctuion variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
Differential Revision: https://reviews.llvm.org/D108454
Lots of custom ops have hand-rolled comma-delimited parsing loops, as does
the MLIR parser itself. Provides a standard interface for doing this that
is less error prone and less boilerplate.
While here, extend Delimiter to support <> and {} delimited sequences as
well (I have a use for <> in CIRCT specifically).
Differential Revision: https://reviews.llvm.org/D110122
This revision refactors ElementsAttr into an Attribute Interface.
This enables a common interface with which to interact with
element attributes, without needing to modify the builtin
dialect. It also removes a majority (if not all?) of the need for
the current OpaqueElementsAttr, which was originally intended as
a way to opaquely represent data that was not representable by
the other builtin constructs.
The new ElementsAttr interface not only allows for users to
natively represent their data in the way that best suits them,
it also allows for efficient opaque access and iteration of the
underlying data. Attributes using the ElementsAttr interface
can directly expose support for interacting with the held
elements using any C++ data type they claim to support. For
example, DenseIntOrFpElementsAttr supports iteration using
various native C++ integer/float data types, as well as
APInt/APFloat, and more. ElementsAttr instances that refer to
DenseIntOrFpElementsAttr can use all of these data types for
iteration:
```c++
DenseIntOrFpElementsAttr intElementsAttr = ...;
ElementsAttr attr = intElementsAttr;
for (uint64_t value : attr.getValues<uint64_t>())
...;
for (APInt value : attr.getValues<APInt>())
...;
for (IntegerAttr value : attr.getValues<IntegerAttr>())
...;
```
ElementsAttr also supports failable range/iterator access,
allowing for selective code paths depending on data type
support:
```c++
ElementsAttr attr = ...;
if (auto range = attr.tryGetValues<uint64_t>()) {
for (uint64_t value : *range)
...;
}
```
Differential Revision: https://reviews.llvm.org/D109190
SparseElementsAttr currently does not perform any verfication on construction, with the only verification existing within the parser. This revision moves the parser verification to SparseElementsAttr, and also adds additional verification for when a sparse index is not valid.
Differential Revision: https://reviews.llvm.org/D109189
Some patterns may share the common DAG structures. Generate a static
function to do the match logic to reduce the binary size.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D105797
For `memref.subview` operations, when there are more than one
unit-dimensions, the strides need to be used to figure out which of
the unit-dims are actually dropped.
Differential Revision: https://reviews.llvm.org/D109418
Add an interface that allows grouping together all covolution and
pooling ops within Linalg named ops. The interface currently
- the indexing map used for input/image access is valid
- the filter and output are accessed using projected permutations
- that all loops are charecterizable as one iterating over
- batch dimension,
- output image dimensions,
- filter convolved dimensions,
- output channel dimensions,
- input channel dimensions,
- depth multiplier (for depthwise convolutions)
Differential Revision: https://reviews.llvm.org/D109793