Previously, OpDSL did not support rank polymorphism, which required a separate implementation of linalg.fill. This revision extends OpDSL to support rank polymorphism for a limited class of operations that access only scalars and tensors of rank zero. At operation instantiation time, it scales these scalar computations to multi-dimensional pointwise computations by replacing the empty indexing maps with identity indexing maps. The revision does not change the DSL itself; instead, it adapts the Python emitter and the YAML generator to generate different indexing maps and iterators depending on the rank of the first output.
Additionally, the revision introduces a `linalg.fill_tensor` operation that shall replace the current handwritten `linalg.fill` operation in a future revision. The `linalg.fill_tensor` name is thus only temporary; the operation will eventually be renamed to `linalg.fill`.
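For illustration, a minimal sketch of the rank-polymorphic fill under the structured-op assembly format (`%cst` and `%init` are placeholder values; the exact assembly is defined by the generated op):

```mlir
// Fill a rank-2 tensor from a scalar: the scalar input is broadcast
// to all output elements, and the rank-2 output receives an identity
// indexing map in place of the empty one.
%cst = arith.constant 0.0 : f32
%init = linalg.init_tensor [4, 8] : tensor<4x8xf32>
%0 = linalg.fill_tensor ins(%cst : f32) outs(%init : tensor<4x8xf32>) -> tensor<4x8xf32>
```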
Reviewed By: nicolasvasilache, stellaraccident
Differential Revision: https://reviews.llvm.org/D119003
Add new operations to the gpu dialect to represent device-side
asynchronous copies. This also adds the lowering of those operations to
the nvvm dialect.
These ops are meant to be low-level and map directly to LLVM-level
dialects like nvvm or rocdl.
We can add higher levels of abstraction later by building on top of
those operations.
This has been discussed here:
https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924
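A hypothetical sketch of what using such ops could look like (the op names, operand order, and the shared-memory address space `3` are illustrative assumptions, not the authoritative assembly; `%src`, `%dst`, and `%c0` are placeholders):

```mlir
// Asynchronously copy 4 elements from global to workgroup memory,
// then create a group token for the copy and wait on it.
%token = gpu.device_async_copy %src[%c0], %dst[%c0], 4
  : memref<16xf32> to memref<16xf32, 3>
%group = gpu.device_async_create_group %token
gpu.device_async_wait %group
```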
Differential Revision: https://reviews.llvm.org/D119191
If the result operand has a unit leading dim, that dim is removed from all operands.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119206
Fix fold-memref-subview-ops for affine.load/store: we need to expand out
the affine.apply on its operands.
Differential Revision: https://reviews.llvm.org/D119402
Reuse the higher-precision F32 approximation for the F16 one (by expanding to
F32 and truncating back). This is partly an RFC, as I'm not sure what the
expectations are here (e.g., that these approximations are only for F32 and
should not be expanded; that reusing higher-precision ones for lower precision
is undesirable due to increased compute cost, and only per-type approximations
are preferred; or that this is appropriate [at least as a fallback] and we need
to see how to make it more generic across all the patterns here).
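As a minimal sketch of the approach, assuming a math op that already has an F32 polynomial approximation (`math.erf` is used purely as an example):

```mlir
// Compute the F16 erf by extending to F32, applying the existing F32
// approximation, and truncating back to F16.
func @erf_f16(%arg0: f16) -> f16 {
  %0 = arith.extf %arg0 : f16 to f32
  %1 = math.erf %0 : f32   // rewritten by the existing F32 approximation
  %2 = arith.truncf %1 : f32 to f16
  return %2 : f16
}
```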
Differential Revision: https://reviews.llvm.org/D118968
For 0-D as well as 1-D vectors, both these patterns should
return a failure, as there is no need to collapse the shape
of the source. Previously, only 1-D vectors were handled; this
patch handles the 0-D case as well.
Reviewed By: Benoit, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119202
There are a few different test passes that check elementwise fusion in
Linalg. Consolidate them to a single pass controlled by different pass
options (in keeping with how `TestLinalgTransforms` exists).
Fix the verification function of spirv::ConstantOp to allow nested
array attributes.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D118939
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Take constant EQ bounds of FlatAffineConstraints dims/symbols into account when canonicalizing the resulting affine map in `canonicalizeMinMaxOp` (see the sketch below).
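A worked sketch of the effect, with an illustrative loop and map:

```mlir
// The loop runs exactly once (%iv is provably 0), so the injected EQ
// bound lets the affine.min fold to the constant 4.
%c0 = arith.constant 0 : index
%c4 = arith.constant 4 : index
scf.for %iv = %c0 to %c4 step %c4 {
  %0 = affine.min affine_map<(d0) -> (4 - d0, 4)>(%iv)
}
```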
Differential Revision: https://reviews.llvm.org/D119153
This is both more efficient and more ergonomic to use, as inverting a
bit vector is trivial while inverting a set is annoying.
Sadly, this leaks into a bunch of APIs downstream, so adapt them as well.
This would be NFC, but there is an ordering dependency in MemRefOps's
`computeMemRefRankReductionMask`: it is now deterministic, whereas
previously it depended on `SmallDenseSet`'s unspecified iteration order.
Differential Revision: https://reviews.llvm.org/D119076
Adapt `tileConsumerAndFuseProducers` to return failure if the generated tile loop nest is empty because all tile sizes are zero. Additionally, fix `LinalgTileAndFuseTensorOpsPattern` to return success if the pattern applied successfully.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D118878
The induction variable calculation was ignoring the scf.for step value. Fix it
to compute the correct induction variable value in the prologue.
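A minimal illustration of why the step matters (the constants are assumed defined with `arith.constant`):

```mlir
// Iteration i must see %iv = %lb + i * %step, not %lb + i.
scf.for %iv = %c0 to %c8 step %c2 {
  // %iv takes the values 0, 2, 4, 6; a prologue that peels the first
  // iterations must reproduce exactly these values.
}
```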
Differential Revision: https://reviews.llvm.org/D118932
-- This commit adds a canonicalization pattern on scf.while to remove
loop-invariant arguments.
-- An argument is considered loop-invariant if the iteration argument value is
the same as the corresponding one being yielded (at the same position) in both
the before/after blocks of scf.while.
-- For the removed arguments, their uses within scf.while and the corresponding
scf.while results are replaced with their corresponding initial values.
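A minimal sketch, with hypothetical values `%init0`, `%init1`, and `%limit` assumed defined:

```mlir
// %arg1 is forwarded unchanged by both scf.condition and scf.yield, so
// it is loop-invariant: the pattern removes it and replaces its uses
// (and those of %res#1) with %init1.
%res:2 = scf.while (%arg0 = %init0, %arg1 = %init1) : (i32, i32) -> (i32, i32) {
  %cond = arith.cmpi slt, %arg0, %limit : i32
  scf.condition(%cond) %arg0, %arg1 : i32, i32
} do {
^bb0(%a: i32, %b: i32):
  %next = arith.addi %a, %b : i32
  scf.yield %next, %b : i32, i32
}
```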
Signed-off-by: Abhishek Varma <abhishek.varma@polymagelabs.com>
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D116923
This is completely unused upstream and does not really have well-defined
semantics: it is unclear what this is supposed to do and how it fits into the
ecosystem. Given that, as part of splitting up the standard dialect, it's best
to just remove this behavior instead of trying to awkwardly fit it somewhere
upstream. Downstream users are encouraged to define their own operations that
can clearly define these semantics.
This also uncovered several lingering uses of ConstantOp that weren't
updated to use arith::ConstantOp, and that only worked during conversions
because the constant was removed/converted into something else before
verification.
See https://llvm.discourse.group/t/standard-dialect-the-final-chapter/ for more discussion.
Differential Revision: https://reviews.llvm.org/D118654
This is part of the larger effort to split the standard dialect. This will also allow for pruning some
additional dependencies on Standard (done in a follow-up).
Differential Revision: https://reviews.llvm.org/D118202
This revision avoids incorrect hoisting of alloca'd buffers across an AutomaticAllocationScope boundary.
In the more general case, we will probably need a ParallelScope-like interface.
Differential Revision: https://reviews.llvm.org/D118768
Use type inference when building the TransferWriteOp in the TransferWritePermutationLowering. Previously, the result type was set to `Type()`, which triggers an assertion if the pattern is used with tensors instead of memrefs.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D118758
Following the discussion in D118318, mark `arith.addf/mulf` commutative.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D118600
Support affine.load/store ops in the fold-memref-subview-ops pass. The
existing pass just "inlines" the subview operation on loads/stores by
inserting affine.apply ops in front of the memref load/store ops: this
is by design always consistent with the semantics of affine.load/store
ops, and the same approach works even more naturally/intuitively with the
latter (see the sketch below).
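As a minimal sketch of the folding, assuming `%A`, `%i`, `%j`, `%k`, and `%l` are defined and are valid affine operands (the layout map shown is illustrative):

```mlir
#map = affine_map<(d0, d1)[s0] -> (d0 * 16 + d1 + s0)>
// Before: load through the subview.
%sv = memref.subview %A[%i, %j] [4, 4] [1, 1]
    : memref<16x16xf32> to memref<4x4xf32, #map>
%v = affine.load %sv[%k, %l] : memref<4x4xf32, #map>
// After: the subview offsets are folded into the load indices.
%v2 = affine.load %A[%i + %k, %j + %l] : memref<16x16xf32>
```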
Differential Revision: https://reviews.llvm.org/D118565
Update SCF pass command-line names to use the `scf` prefix (e.g.,
`-parallel-loop-fusion` becomes `-scf-parallel-loop-fusion`). This is
consistent with the guidelines/convention on how to name dialect passes.
It also avoids ambiguity given the multiple `for` operations in the tree.
NFC.
Differential Revision: https://reviews.llvm.org/D118564
There was a bug where some of the OpOperands needed in the replacement op were not in scope.
It does not matter where the replacement op is inserted; any insertion point is OK as long as there are no dominance errors. In the worst case, the newly inserted op will bufferize out-of-place, which is no worse than not eliminating the InitTensorOp at all.
Differential Revision: https://reviews.llvm.org/D117685
The bufferization of arith.constant ops is also switched over to BufferizableOpInterface-based bufferization. The old implementation is deleted. Both implementations utilize GlobalCreator, now renamed to just `getGlobalFor`.
GlobalCreator no longer maintains a set of all created allocations to avoid duplicate allocations of the same constant. Instead, `getGlobalFor` scans the module to see if there is already a global allocation with the same constant value.
For compatibility reasons, it is still possible to create a pass that bufferizes only `arith.constant`. This pass (createConstantBufferizePass) can be deleted once all users have switched over to One-Shot bufferization.
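As a minimal sketch of the bufferization, with an illustrative global symbol name (the actual name is generated):

```mlir
// Before bufferization:
//   %cst = arith.constant dense<[1.0, 2.0]> : tensor<2xf32>
// After: the constant becomes a module-level global, and the use site
// reads it back; repeated constants with the same value reuse the
// global that `getGlobalFor` finds by scanning the module.
memref.global "private" constant @__constant_2xf32 : memref<2xf32> = dense<[1.0, 2.0]>
func @use() -> memref<2xf32> {
  %0 = memref.get_global @__constant_2xf32 : memref<2xf32>
  return %0 : memref<2xf32>
}
```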
Differential Revision: https://reviews.llvm.org/D118483
This patch adds the vector.scan op, which computes the
scan of a given n-D vector. It requires specifying the operator,
the identity element, and whether the scan is inclusive or
exclusive.
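A sketch of the usage for an additive scan along dimension 1, with placeholder values `%src` and `%acc` (see the ops.mlir test for the authoritative assembly):

```mlir
// Inclusive add-scan along dim 1; returns the scanned vector and the
// final accumulated reduction along that dim.
%res:2 = vector.scan <add>, %src, %acc {inclusive = true, reduction_dim = 1}
  : vector<4x8x16xf32>, vector<4x16xf32>
```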
TEST: Added test in ops.mlir
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D117171
This is in preparation for switching `-tensor-constant-bufferize` and `-arith-bufferize` to BufferizableOpInterface-based implementations.
Differential Revision: https://reviews.llvm.org/D118324
This commit switches the `tensor-bufferize` pass over to BufferizableOpInterface-based bufferization.
Differential Revision: https://reviews.llvm.org/D118246
The pass currently cannot handle `to_memref(to_tensor(x))` foldings where a cast is necessary. Handling them is required with the new unified bufferization. There is already a canonicalization pattern that handles such foldings, and it should be used during this pass.
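A minimal sketch of a folding that needs a cast, with an illustrative dynamic-offset layout and a placeholder `%m`:

```mlir
#map = affine_map<(d0)[s0] -> (d0 + s0)>
// to_memref(to_tensor(%m)) where the result type differs from the type
// of %m only by a castable layout:
%t = bufferization.to_tensor %m : memref<?xf32>
%r = bufferization.to_memref %t : memref<?xf32, #map>
// The canonicalization folds this to:
%r2 = memref.cast %m : memref<?xf32> to memref<?xf32, #map>
```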
Differential Revision: https://reviews.llvm.org/D117988