The current StandardToLLVM conversion patterns only really handle
the Func dialect. The pass itself adds patterns for Arithmetic/CFToLLVM, but
those should be (and will be) split out in a follow-up. This commit focuses solely
on being an NFC rename.
Aside from the directory change, the pattern and pass creation API have been renamed:
* populateStdToLLVMFuncOpConversionPattern -> populateFuncToLLVMFuncOpConversionPattern
* populateStdToLLVMConversionPatterns -> populateFuncToLLVMConversionPatterns
* createLowerToLLVMPass -> createConvertFuncToLLVMPass
Differential Revision: https://reviews.llvm.org/D120778
The default lowering of vector transpose operations generates a large sequence of
scalar extract/insert operations, one pair for each scalar element in the input vector.
In other words, the vector transpose is scalarized. However, there are transpose
patterns where one or more adjacent high-order dimensions are not transposed (for
example, in the transpose pattern [1, 0, 2, 3], dimensions 2 and 3 are not transposed).
This patch improves the lowering of those cases by not scalarizing them and extracting/
inserting a full n-D vector, where 'n' is the number of adjacent high-order dimensions
not being transposed. By doing so, we prevent the scalarization of the code and generate a
more performant vector version.
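The saving can be estimated with a small Python sketch. It is not the MLIR lowering: the helper names are hypothetical, and it assumes the "high-order" dimensions in the text are the trailing (innermost) ones, as in the [1, 0, 2, 3] example. It counts the trailing dimensions left in place, then the number of extract/insert pairs that remain once those dimensions are moved as whole n-D vectors.

```python
import math


def untransposed_trailing_dims(perm):
    """Count trailing dimensions that the permutation leaves in place."""
    count = 0
    # Walk from the innermost dimension outwards while perm[i] == i.
    for i in range(len(perm) - 1, -1, -1):
        if perm[i] != i:
            break
        count += 1
    return count


def num_extract_insert_pairs(shape, perm):
    """Extract/insert pairs needed when untouched trailing dims move as one n-D vector."""
    n = untransposed_trailing_dims(perm)
    # Only the leading (transposed) dimensions still have to be iterated.
    return math.prod(shape[: len(shape) - n])


# Fully scalarized, a 2x3x4x5 transpose needs 120 pairs; with [1, 0, 2, 3]
# only 2*3 pairs of 4x5 vectors remain.
print(num_extract_insert_pairs((2, 3, 4, 5), [1, 0, 2, 3]))  # 6
```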
Paradoxically, this patch shouldn't improve the performance of transpose operations if
we are using LLVM. The LLVM pipeline is able to optimize away some of the extract/insert
operations, and the SLP vectorizer converts the scalar operations back to their vector
form. However, scalarizing a vector version of the code in MLIR and relying on the SLP
vectorizer to reconstruct the vector code again is highly undesirable for several reasons.
Reviewed By: nicolasvasilache, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120601
This patch extends the existing if combining canonicalization to also handle the case where a value returned by the first if is used within the body of the second if.
This patch also extends if combining to support ifs whose conditions are logical negations of each other.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120924
We can simplify an extractvalue of an insertvalue to extract out of the base of the insertvalue if the insert and extract positions are distinct and neither is a prefix of the other.
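The index condition can be modeled in Python with nested lists standing in for LLVM aggregates. This is a sketch of the rule, not the MLIR folder; all function names are hypothetical.

```python
import copy


def insertvalue(agg, value, pos):
    """Return a copy of the nested list `agg` with `value` stored at `pos`."""
    result = copy.deepcopy(agg)
    target = result
    for idx in pos[:-1]:
        target = target[idx]
    target[pos[-1]] = value
    return result


def extractvalue(agg, pos):
    """Read the element of the nested list `agg` at `pos`."""
    for idx in pos:
        agg = agg[idx]
    return agg


def is_prefix(a, b):
    """True if index list `a` is a prefix of (or equal to) index list `b`."""
    return len(a) <= len(b) and list(b[: len(a)]) == list(a)


def can_extract_from_base(insert_pos, extract_pos):
    """True if the extract can read from the insertvalue's base instead."""
    return not is_prefix(insert_pos, extract_pos) and not is_prefix(extract_pos, insert_pos)


base = [[1, 2], [3, 4]]
updated = insertvalue(base, 9, [0, 1])
# [0, 1] and [1, 0] are disjoint, so reading [1, 0] can bypass the insert.
assert can_extract_from_base([0, 1], [1, 0])
assert extractvalue(updated, [1, 0]) == extractvalue(base, [1, 0])
```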
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120915
Extend isLoopMemoryParallel check to include locally allocated memrefs.
This strengthens and also speeds up the dependence check used by the
utility by excluding locally allocated memrefs where appropriate.
Additional memref dialect ops can be supported exhaustively via proper
interfaces.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D120617
This commit deletes the old dialect conversion-based bufferization patterns, which are now obsolete.
Differential Revision: https://reviews.llvm.org/D120883
Add support for integer and float types into the data layout subsystem with
default logic similar to LLVM IR. Given the flexibility of the subsystem, the
logic can easily be overridden by operations if necessary. This provides the
connection necessary, e.g., for the GPU target where alignment requirements for
integers and floats differ from those provided by default (although still
compatible with the LLVM IR model). Previously, it was impossible to use
non-default alignment requirements for integer and float types, which could
lead to incorrect address and size calculations when targeting GPUs.
Depends On D120737
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D120739
This patch adds assemblyFormat for `omp.critical.declare`, `omp.atomic.read`,
`omp.atomic.write`, `omp.atomic.update` and `omp.atomic.capture`.
It also removes the clauses from `parseClauses` that are no longer
needed, thanks to the new assemblyFormats.
Reviewed By: NimishMishra, rriddle
Differential Revision: https://reviews.llvm.org/D120248
The remnants of Standard were renamed to Func, but the test directory
was still named Standard. In addition to fixing the name, this commit
also moves the tests for operations not in the Func dialect to the proper
parent dialect test directory.
The last remaining operations in the standard dialect all revolve around
FuncOp/function related constructs. This patch simply handles the initial
renaming (which by itself is already huge), but there are a large number
of cleanups unlocked/necessary afterwards:
* Removing a bunch of unnecessary dependencies on Func
* Cleaning up the From/ToStandard conversion passes
* Preparing for the move of FuncOp to the Func dialect
See the discussion at https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061
Differential Revision: https://reviews.llvm.org/D120624
As discussed in https://reviews.llvm.org/D119743, scf.parallel would repeatedly stack-allocate since the alloca op was placed in the wsloop rather than the omp.parallel. This PR is the second stage of the fix for that problem. Specifically, we now introduce an alloca scope around the inlined body of the scf.parallel and enable a canonicalization to hoist the allocations to the surrounding allocation scope (e.g. omp.parallel).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120423
The llvm.mlir.global operation accepts a region as initializer. This region
corresponds to an LLVM IR constant expression and therefore should not accept
operations with side effects. Add a corresponding verifier.
Reviewed By: wsmoses, bondhugula
Differential Revision: https://reviews.llvm.org/D120632
The revision extends OpDSL with unary and binary function attributes. A function attribute makes the operations used in the body of a structured operation configurable. For example, a pooling operation may take an aggregation function attribute that specifies whether the op implements a min or a max pooling. The goal of this revision is to define fewer, more flexible operations.
We may thus for example define an element wise op:
```
linalg.elem(lhs, rhs, outs=[out], op=BinaryFn.mul)
```
If the op argument is not set, the default operation is used.
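A plain-Python model of what a binary function attribute buys: one elementwise op whose scalar body is picked by an argument. `BinaryFn` here is a hypothetical stand-in for the OpDSL enum, and treating `add` as the default is an assumption made only for this sketch.

```python
from enum import Enum


class BinaryFn(Enum):
    """Hypothetical stand-in for OpDSL's BinaryFn enum."""
    add = "add"
    mul = "mul"
    max = "max"
    min = "min"


# Scalar body selected by the function attribute.
_BINARY_IMPL = {
    BinaryFn.add: lambda a, b: a + b,
    BinaryFn.mul: lambda a, b: a * b,
    BinaryFn.max: max,
    BinaryFn.min: min,
}


def elem(lhs, rhs, op=BinaryFn.add):
    """Elementwise op whose body is chosen by `op` (default assumed to be add)."""
    fn = _BINARY_IMPL[op]
    return [fn(a, b) for a, b in zip(lhs, rhs)]


print(elem([1, 2, 3], [4, 5, 6], op=BinaryFn.mul))  # [4, 10, 18]
```

One definition then covers what would otherwise be separate add, mul, max, and min operations.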
Depends On D120109
Reviewed By: nicolasvasilache, aartbik
Differential Revision: https://reviews.llvm.org/D120110
Add a pattern matcher for ExtractSliceOp when its source is a constant.
The matching heuristics can be governed by the control function since
generating a new constant is not always beneficial.
Differential Revision: https://reviews.llvm.org/D119605
If we have a chain of `tensor.insert_slice` ops inserting some
`tensor.pad` op into a `linalg.fill` and ranges do not overlap,
we can also elide the `tensor.pad` later.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120446
Fold tensor.insert_slice(tensor.pad(<input>), linalg.fill) into
tensor.insert_slice(<input>, linalg.fill) if the padding value and
the filling value are the same.
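The 1-D Python sketch below checks the folding condition, with lists standing in for tensors. The helpers are hypothetical models of `linalg.fill`, `tensor.pad`, and `tensor.insert_slice`. The sketch also assumes the folded insertion's offset is shifted by the low padding amount, which the one-line description above leaves implicit.

```python
def fill(size, value):
    """1-D stand-in for linalg.fill on a fresh tensor."""
    return [value] * size


def pad(data, low, high, value):
    """1-D stand-in for tensor.pad with a constant padding value."""
    return [value] * low + list(data) + [value] * high


def insert_slice(dest, src, offset):
    """1-D stand-in for tensor.insert_slice with unit stride."""
    result = list(dest)
    result[offset:offset + len(src)] = src
    return result


dest = fill(10, 0)
x = [1, 2, 3]
# Padding value == fill value: inserting x directly (offset + low) is equivalent.
assert insert_slice(dest, pad(x, 2, 1, 0), 3) == insert_slice(dest, x, 3 + 2)
# Different padding value: the fold would change the result.
assert insert_slice(dest, pad(x, 2, 1, 7), 3) != insert_slice(dest, x, 3 + 2)
```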
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120410
Improve the LinalgOp verification to ensure the iterator types are known. Previously, unknown iterator types were ignored without warning, which can lead to confusing bugs.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120649
The AVX2 lowering for transpose operations is only applicable to f32 vector types.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D120427
The existing AVX2 lowering patterns for the transpose op only trigger if the
input vector is 2-D. This patch extends the patterns to trigger for n-D vectors
that are effectively 2-D vectors (e.g., a vector of shape 1x4x1x8x1). The main constraint
for the generalized AVX2 patterns to be applicable to these vectors is that the
dimensions that are greater than one must be transposed. Otherwise, the existing
patterns are not applicable.
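The shape condition can be sketched in Python. This is a hypothetical helper, not the pattern's actual code. It ignores the other requirements the AVX2 patterns have (such as the f32 element type and the supported sizes), and it assumes vector.transpose's convention that result dimension i comes from source dimension perm[i].

```python
def effective_2d_transpose(shape, perm):
    """Return (rows, cols) if transposing `shape` by `perm` is really a 2-D transpose."""
    non_unit = [d for d in range(len(shape)) if shape[d] > 1]
    if len(non_unit) != 2:
        return None
    # Order in which the two non-unit dims appear in the result.
    result_order = [d for d in perm if shape[d] > 1]
    if result_order != [non_unit[1], non_unit[0]]:
        return None  # the non-unit dims keep their order, so nothing is transposed
    return shape[non_unit[0]], shape[non_unit[1]]


print(effective_2d_transpose((1, 4, 1, 8, 1), (0, 3, 2, 1, 4)))  # (4, 8)
```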
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D119505
This change gives an explicit order of verifier execution and adds
`hasRegionVerifier` and `verifyWithRegions` to increase the granularity
of verifier classification. The order is as follows:
1. InternalOpTraits are verified first; they can be run independently.
2. `verifyInvariants`, which is constructed by ODS, verifies the types,
attributes, etc.
3. Other Traits/Interfaces that have marked their verifier as
`verifyTrait` or `verifyWithRegions=0`.
4. Custom verifier which is defined in the op and has marked
`hasVerifier=1`
If an operation has regions, there may be a second phase:
5. Traits/Interfaces that have marked their verifier as
`verifyRegionTrait` or
`verifyWithRegions=1`. This implies the verifier needs to access the
operations in its regions.
6. Custom verifier which is defined in the op and has marked
`hasRegionVerifier=1`
Note that the second phase runs after the operations in the regions
have been verified. Given this verification order, you can avoid
verifying the same things twice.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D116789
Previously, OpDSL operations used hardcoded type conversion operations (cast or cast_unsigned). Supporting signed and unsigned casts thus meant implementing two different operations. Type function attributes allow us to define a single operation that has a cast type function attribute which, at operation instantiation time, may be set to cast or cast_unsigned. For example, we may define a matmul operation with a cast argument:
```
@linalg_structured_op
def matmul(A=TensorDef(T1, S.M, S.K), B=TensorDef(T2, S.K, S.N), C=TensorDef(U, S.M, S.N, output=True),
cast=TypeFnAttrDef(default=TypeFn.cast)):
C[D.m, D.n] += cast(U, A[D.m, D.k]) * cast(U, B[D.k, D.n])
```
When instantiating the operation the attribute may be set to the desired cast function:
```
linalg.matmul(lhs, rhs, outs=[out], cast=TypeFn.cast_unsigned)
```
The revision introduces an enum in the Linalg dialect that maps one-to-one to the type functions defined by OpDSL.
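Here is a plain-Python model (not OpDSL) of the same matmul with the cast passed in as a parameter. `cast_signed` and `cast_unsigned` are hypothetical stand-ins for `TypeFn.cast` and `TypeFn.cast_unsigned` applied to 8-bit inputs, assuming `cast` sign-extends. The same bit pattern 0xFF reads as -1 or as 255.

```python
def cast_signed(value, bits=8):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def cast_unsigned(value, bits=8):
    """Interpret the low `bits` bits of `value` as an unsigned integer."""
    return value & ((1 << bits) - 1)


def matmul(A, B, cast=cast_signed):
    """C[m][n] += cast(A[m][k]) * cast(B[k][n]); the cast is a parameter, not a separate op."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for k in range(K):
                C[m][n] += cast(A[m][k]) * cast(B[k][n])
    return C


print(matmul([[0xFF]], [[2]]))                      # [[-2]]
print(matmul([[0xFF]], [[2]], cast=cast_unsigned))  # [[510]]
```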
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D119718
This transformation is useful for breaking dependencies between consecutive loop
iterations by increasing the size of a temporary buffer. This is usually
combined with heavy software pipelining.
Differential Revision: https://reviews.llvm.org/D119406
This adds a variable op, emitted as a C/C++ local variable, which can be
used if the `emitc.constant` op is not sufficient.
As an example, the canonicalization pass would transform
```mlir
%0 = "emitc.constant"() {value = 0 : i32} : () -> i32
%1 = "emitc.constant"() {value = 0 : i32} : () -> i32
%2 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
%3 = emitc.apply "&"(%1) : (i32) -> !emitc.ptr<i32>
emitc.call "write"(%2, %3) : (!emitc.ptr<i32>, !emitc.ptr<i32>) -> ()
```
into
```mlir
%0 = "emitc.constant"() {value = 0 : i32} : () -> i32
%1 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
%2 = emitc.apply "&"(%0) : (i32) -> !emitc.ptr<i32>
emitc.call "write"(%1, %2) : (!emitc.ptr<i32>, !emitc.ptr<i32>) -> ()
```
resulting in pointer aliasing, as %1 and %2 point to the same address.
In such a case, the `emitc.variable` operation can be used instead.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D120098
The current implementation of ShuffleVectorOp assumes all vectors are
fixed-length. LLVM IR allows shufflevector operations on scalable vectors,
and the current translation between the LLVM dialect and LLVM IR does the
right thing when the shuffle mask is all zeroes. This is required to
do a splat operation on a scalable vector, but it doesn't make sense
for scalable vectors outside of that operation, i.e., with masks that
are not all zeroes.
Differential Revision: https://reviews.llvm.org/D118371
In D115022, we introduced an optimization where OpResults of a `linalg.generic` may bufferize in-place with an "in" OpOperand if the corresponding "out" OpOperand is not used in the computation.
This optimization can lead to unexpected behavior if the newly chosen OpOperand is in the same alias set as another OpOperand (that is used in the computation). In that case, the newly chosen OpOperand must bufferize out-of-place. This can be confusing to users, as always choosing the "out" OpOperand (regardless of whether it is used) would be expected when having the notion of "destination-passing style" in mind.
With this change, we go back to always bufferizing in-place with "out" OpOperands by default, but letting users override the behavior with a bufferization option.
Differential Revision: https://reviews.llvm.org/D120182
Given a cmpf of either uitofp or sitofp and a constant, attempt to canonicalize it to a cmpi.
This PR ports the equivalent logic from LLVM to the MLIR arith dialect.
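The arithmetic behind the signed case can be sketched in Python. For an integer x and a finite constant c, `sitofp(x) < c` is equivalent to `x < ceil(c)`, and `sitofp(x) <= c` is equivalent to `x <= floor(c)`. This is an illustration, not the MLIR pattern. It assumes the integer-to-float conversion is exact. That holds for i8 values in f32 but not for every i32 value, so a real rewrite also has to check the integer width against the float's precision and handle NaN constants and `uitofp`.

```python
import math


def cmpf_olt_as_cmpi(c):
    """For integer x and finite c: float(x) < c  <=>  x < ceil(c)."""
    bound = math.ceil(c)
    return lambda x: x < bound


def cmpf_ole_as_cmpi(c):
    """For integer x and finite c: float(x) <= c  <=>  x <= floor(c)."""
    bound = math.floor(c)
    return lambda x: x <= bound


# Exhaustively check the rewrite over the i8 range for a few constants.
for c in [-2.5, 0.0, 3.0, 3.5, 100.25]:
    lt, le = cmpf_olt_as_cmpi(c), cmpf_ole_as_cmpi(c)
    for x in range(-128, 128):
        assert (float(x) < c) == lt(x)
        assert (float(x) <= c) == le(x)
```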
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117257
+ compare block size with the unrollable inner dimension
+ reduce nesting in the code and simplify IR building a bit
Reviewed By: cota
Differential Revision: https://reviews.llvm.org/D120075
The related functionality is moved over to the bufferization dialect. Test cases are cleaned up a bit.
Differential Revision: https://reviews.llvm.org/D120191
This commit adds a canonicalization pattern to the `linalg.generic` op
for static shape inference. If any of the inputs or outputs has a
static shape or is cast from a tensor of static shape, then the
shapes of all the inputs and outputs can be inferred by using the
affine map of the static shape input/output.
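A simplified Python model of the inference. It assumes every indexing map is a projected permutation written as a tuple of loop indices, and it uses None for dynamic sizes. It does not model the tensor.cast case or conflicting static sizes, and the helper name is hypothetical.

```python
def infer_static_shapes(indexing_maps, shapes):
    """Fill dynamic sizes (None) using sizes known through shared loop dimensions.

    indexing_maps[i] lists, for each dimension of operand i, the loop index
    that indexes it; shapes[i] is that operand's shape.
    """
    num_loops = max(d for dims in indexing_maps for d in dims) + 1
    loop_sizes = [None] * num_loops
    # Record every static size against the loop dimension that indexes it.
    for dims, shape in zip(indexing_maps, shapes):
        for d, size in zip(dims, shape):
            if size is not None:
                loop_sizes[d] = size
    # Rebuild each operand shape, taking dynamic sizes from the loop sizes.
    return [
        tuple(loop_sizes[d] if size is None else size for d, size in zip(dims, shape))
        for dims, shape in zip(indexing_maps, shapes)
    ]


# out(i, j) = in0(i, j) + in1(j, i), with in0 : 4x? and in1 : 8x?.
maps = [(0, 1), (1, 0), (0, 1)]
print(infer_static_shapes(maps, [(4, None), (8, None), (None, None)]))
# [(4, 8), (8, 4), (4, 8)]
```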
Signed-Off-By: Prateek Gupta <prateek@nod-labs.com>
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D118929
This patch adds assemblyFormat for omp.sections operation.
Some existing functions have been altered to fit the custom directive
in assemblyFormat. This required modifying their call sites as well,
but those will be removed in later patches, when other operations get
their assemblyFormat. For ease of review, not all operations were
changed in one patch.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D120176
Add `BufferizableOpInterface::verifyAnalysis`. Ops can implement this method to check for expected invariants and limitations.
The purpose of this change is to introduce a modular way of checking assertions such as `assertScfForAliasingProperties`.
Differential Revision: https://reviews.llvm.org/D120189
This patch adds assemblyFormat for omp.parallel operation.
Some existing functions have been altered to fit the custom directive
in assemblyFormat. This required modifying their call sites as well,
but those will be removed in later patches, when other operations get
their assemblyFormat. For ease of review, not all operations were
changed in one patch.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D120157
This patch removes the following clauses from OpenMP Dialect:
- private
- firstprivate
- lastprivate
- shared
- default
- copyin
- copyprivate
The privatization clauses are being handled in the flang frontend. The
data copying clauses are not being handled anywhere for now. Once
we have a better picture of how to handle these clauses in OpenMP
Dialect, we can add these. For the time being, removing unneeded
clauses.
For a detailed discussion, see the Discourse thread "Privatisation in OpenMP dialect": https://discourse.llvm.org/t/rfc-privatisation-in-openmp-dialect/3526
Reviewed By: kiranchandramohan, clementval
Differential Revision: https://reviews.llvm.org/D120029
This is a bit awkward since ExtractOp allows both `f32` and
`vector<1xf32>` results for a scalar extraction. Allow both, but make
inference return the scalar to make this as NFC as possible.
This change modifies the handling of trailing dimensions with unknown
extent. Users of the getReassociationIndicesForReshape helper should
see benefits when transforming reshape like operations into
expand/collapse pairs if the higher-rank type has trailing unknown
dimensions.
The motivating example is a reshape from tensor<16x1x?xi32> to
tensor<16xi32> that can be modeled as collapsing the three dimensions.
Differential Revision: https://reviews.llvm.org/D119730
Previously, NaNs would be dropped in favor of bounded values which was
strictly incorrect. Now the min/max operations propagate this
information. Not all uses of min/max need this, but the given change
will help protect future additions, and this prevents the need for an
additional cmpf and select operation to handle NaNs.
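In Python terms, the new semantics correspond to the helpers below. This is a sketch that ignores signed-zero handling, and the names are hypothetical. Python's builtin `max` shows the old hazard: its result depends on operand order when one operand is NaN.

```python
import math


def maxf(a, b):
    """NaN-propagating max: if either operand is NaN, the result is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


def minf(a, b):
    """NaN-propagating min: if either operand is NaN, the result is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return min(a, b)


# The builtin silently drops the NaN here and returns 1.0.
print(max(1.0, math.nan))
print(maxf(1.0, math.nan))  # nan
```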
Differential Revision: https://reviews.llvm.org/D120020
This op is added to allow MLIR code running on multi-GPU systems to
select the GPU they want to execute operations on when no GPU is
otherwise specified.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D119883
In SPIR-V, resources are represented as global variables that
are bound to a certain descriptor. SPIR-V requires those global
variables to be declared as aliased if multiple ones are bound
to the same slot. Such aliased decorations can cause issues
for transcompilers like SPIRV-Cross when converting to source
shading languages like MSL.
So this commit adds a pass to perform analysis of aliased
resources and see if we can unify them into one.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119872
This would create a double free when a memref is passed twice to the
same op. This wasn't a problem at the time the pass was written but is
common since the introduction of scf.while.
There's a latent non-determinism that's triggered by the test, but this
change is messy enough as-is so I'll leave that for later.
Differential Revision: https://reviews.llvm.org/D120044
Previously `gpu-kernel-outlining` pass was also doing index computation sinking into gpu.launch before actual outlining.
This patch splits the op sinking out of the `gpu-kernel-outlining` pass into a separate pass, so users can run their own sinking pass before outlining.
To achieve old behavior users will need to call both passes: `-gpu-launch-sink-index-computations -gpu-kernel-outlining`.
Differential Revision: https://reviews.llvm.org/D119932
A very small refactoring, but a big impact on tests that expect an exact order.
This revision fixes the tests, but also makes them less brittle for similar
minor changes in the future!
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119992
This commit adds a pattern to wrap a tensor.pad op with
an scf.if op to separate the cases where we don't need padding
(all pad sizes are actually zeros) and where we indeed need
padding.
This pattern is meant to handle padding inside tiled loops.
Under such cases the padding sizes typically depend on the
loop induction variables. Splitting them would allow treating
perfect tiles and edge tiles separately.
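The split can be sketched in Python on 1-D lists. The `if` mirrors the scf.if, and in a tiled loop only the edge tile takes the padding branch. The helpers are hypothetical illustrations, not the pattern itself.

```python
def pad_1d(data, low, high, value=0):
    """Pad `data`, taking a fast path when no padding is needed."""
    if low == 0 and high == 0:
        # "then" branch: all pad sizes are zero, reuse the input as-is.
        return data
    # "else" branch: materialize the padded result.
    return [value] * low + list(data) + [value] * high


def tile_with_padding(data, tile):
    """Split `data` into tiles of size `tile`, padding only the edge tile."""
    tiles = []
    for start in range(0, len(data), tile):
        chunk = data[start:start + tile]
        # The high pad size depends on the loop position (the induction variable).
        tiles.append(pad_1d(chunk, 0, tile - len(chunk)))
    return tiles


print(tile_with_padding([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5, 0]]
```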
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D117018
Fusion of `linalg.generic` with
`tensor.expand_shape/tensor.collapse_shape` currently handles fusion
with reshape by expanding the dimensionality of the `linalg.generic`
operation. This helps fuse elementwise operations better since they
are fused at the highest dimensionality while keeping all indexing
maps involved projected permutations. The intent of these is to push
the reshape to the boundaries of functions.
The presence of named ops (or other ops across which the reshape
cannot be propagated) stops the propagation to the edges of the
function. At this stage, the converse patterns that fold the reshapes
with generic ops by collapsing the dimensions of the generic op can
push the reshape towards edges. In particular it helps the case where
reshapes exist in between named ops and generic ops.
`linalg.named_op` -> `tensor.expand_shape` -> `linalg.generic`
Pushing the reshape down will help fusion of `linalg.named_op` ->
`linalg.generic` using tile + fuse transformations.
This pattern is intended to replace the following patterns
1) FoldReshapeByLinearization : These patterns create indexing maps
that are not projected permutations that affect future
transformations. They are only useful for folding unit-dimensions.
2) PushReshapeByExpansion : This pattern has the same functionality
but has some restrictions
a) It tries to avoid creating new reshapes that limits its
applicability. The pattern added here can achieve the same
functionality through use of the `controlFn` that allows clients
of the pattern freedom to make this decision.
b) It does not work for ops with indexing semantics.
These patterns will be deprecated in a future patch.
Differential Revision: https://reviews.llvm.org/D119365
This allows users to register a callback that can annotate operations
during software pipelining, so users can mark each op with the part of
the pipeline it corresponds to.
Differential Revision: https://reviews.llvm.org/D119866
Casting between scalable vectors and fixed-length vectors doesn't make
sense. If one of the operands is scalable, the other has to be scalable
to be able to guarantee they have the same shape at runtime.
Differential Revision: https://reviews.llvm.org/D119568
This patch changes the syntax of omp.atomic.update to allow the other
dialects to modify the variable with appropriate operations in the
region.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D119522
Add verifier for gpu.alloc op to verify if the dimension operand counts
and symbol operand counts are the same as their memref counterparts.
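The check amounts to two count comparisons, sketched here in Python with None marking dynamic sizes. The function name, its parameters, and the error strings are hypothetical, not the verifier's actual diagnostics.

```python
def verify_gpu_alloc(shape, num_dim_operands, num_layout_symbols, num_symbol_operands):
    """Return an error string, or None if the operand counts match the memref type.

    `shape` uses None for dynamic ('?') sizes; `num_layout_symbols` is the
    number of symbols in the memref's layout map.
    """
    num_dynamic = sum(1 for size in shape if size is None)
    if num_dim_operands != num_dynamic:
        return "dimension operand count does not equal memref dynamic dimension count"
    if num_symbol_operands != num_layout_symbols:
        return "symbol operand count does not equal memref symbol count"
    return None


# memref<?x4x?xf32> needs exactly two dynamic-size operands.
print(verify_gpu_alloc([None, 4, None], 2, 0, 0))  # None
```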
Differential Revision: https://reviews.llvm.org/D117427
Also, it seems Khronos has changed the HTML spec format, so a small adjustment to the script was needed.
Base op parsing is also probably broken.
Differential Revision: https://reviews.llvm.org/D119678
Adds a pointer type to EmitC. The emission of pointers is so far only
possible by using the `emitc.opaque` type.
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D119337
Index attributes had no default value, which means the attribute values had to be set on the operation. This revision adds a default parameter to `IndexAttrDef`. After the change, every index attribute has to define a default value. For example, we may define the following strides attribute:
```
```
When using the operation the default stride is used if the strides attribute is not set. The mechanism is implemented using `DefaultValuedAttr`.
Additionally, the revision uses the term "index attribute" instead of "attribute" more consistently, in preparation for follow-up revisions that will introduce function attributes.
Depends On D119125
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D119126
Previously, OpDSL did not support rank polymorphism, which required a separate implementation of linalg.fill. This revision extends OpDSL to support rank polymorphism for a limited class of operations that access only scalars and tensors of rank zero. At operation instantiation time, it scales these scalar computations to multi-dimensional pointwise computations by replacing the empty indexing maps with identity index maps. The revision does not change the DSL itself; instead, it adapts the Python emitter and the YAML generator to generate different indexing maps and iterators depending on the rank of the first output.
Additionally, the revision introduces a `linalg.fill_tensor` operation that in a future revision shall replace the current handwritten `linalg.fill` operation. `linalg.fill_tensor` is thus only temporarily available and will be renamed to `linalg.fill`.
Reviewed By: nicolasvasilache, stellaraccident
Differential Revision: https://reviews.llvm.org/D119003
Add new operations to the gpu dialect to represent device side
asynchronous copies. This also adds the lowering of those operations to
the nvvm dialect.
Those ops are meant to be low level and map directly to llvm dialects
like nvvm or rocdl.
We can further add higher level of abstraction by building on top of
those operations.
This has been discussed here:
https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924
Differential Revision: https://reviews.llvm.org/D119191
If the result operand has a unit leading dim it is removed from all operands.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119206
Fix fold-memref-subview-ops for affine.load/store. We need to expand out
the affine apply on its operands.
Differential Revision: https://reviews.llvm.org/D119402
Reuse the higher precision F32 approximation for the F16 one (by expanding and
truncating). This is partly RFC as I'm not sure what the expectations are here
(e.g., these are only for F32 and should not be expanded, that reusing
higher-precision ones for lower precision is undesirable due to increased
compute cost and only approximations per exact type is preferred, or this is
appropriate [at least as fallback] but we need to see how to make it more
generic across all the patterns here).
Differential Revision: https://reviews.llvm.org/D118968
For 0-D as well as 1-D vectors, both these patterns should
return a failure as there is no need to collapse the shape
of the source. Currently, only 1-D vectors were handled. This
patch handles the 0-D case as well.
Reviewed By: Benoit, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D119202
There are a few different test passes that check elementwise fusion in
Linalg. Consolidate them to a single pass controlled by different pass
options (in keeping with how `TestLinalgTransforms` exists).
Fix the verification function of spirv::ConstantOp to allow nesting
array attributes.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D118939
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Taking into account constant EQ bounds of FlatAffineConstraint dims/symbols during canonicalization of the resulting affine map in `canonicalizeMinMaxOp`.
Differential Revision: https://reviews.llvm.org/D119153
This is both more efficient and more ergonomic to use, as inverting a
bit vector is trivial while inverting a set is annoying.
Sadly this leaks into a bunch of APIs downstream, so adapt them as well.
This would be NFC, but there is an ordering dependency in MemRefOps's
computeMemRefRankReductionMask. This is now deterministic, previously it
was dependent on SmallDenseSet's unspecified iteration order.
Differential Revision: https://reviews.llvm.org/D119076
Adapt `tileConsumerAndFuseProducers` to return failure if the generated tile loop nest is empty since all tile sizes are zero. Additionally, fix `LinalgTileAndFuseTensorOpsPattern` to return success if the pattern applied successfully.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D118878
Induction variable calculation was ignoring scf.for step value. Fix it to get
the correct induction variable value in the prologue.
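The corrected computation is simply lb + i * step for the i-th peeled iteration. Here is a minimal Python sketch with a hypothetical helper name:

```python
def prologue_ivs(lb, step, num_peeled):
    """Induction-variable values for the first `num_peeled` iterations peeled into the prologue."""
    # Iteration i of an scf.for starts at lb and advances by `step`, so the
    # peeled copies must use lb + i * step rather than lb + i.
    return [lb + i * step for i in range(num_peeled)]


print(prologue_ivs(4, 3, 2))  # [4, 7]
```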
Differential Revision: https://reviews.llvm.org/D118932
-- This commit adds a canonicalization pattern on scf.while to remove
the loop invariant arguments.
-- An argument is considered loop invariant if the iteration argument value is
the same as the corresponding one being yielded (at the same position) in both
the before/after block of scf.while.
-- For each removed argument, its uses within scf.while and the corresponding
scf.while result are replaced with its initial value.
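Here is a Python sketch of the invariance test. It models SSA values as strings and, for simplicity, assumes the before and after regions have the same number of arguments, so positions line up one-to-one. The helper is hypothetical.

```python
def loop_invariant_positions(before_args, condition_operands, after_args, yield_operands):
    """Positions whose value is forwarded unchanged through both scf.while regions.

    Position i is invariant if the before region's scf.condition forwards its
    own i-th block argument and the after region's scf.yield yields its own
    i-th block argument, so the value always equals the initial operand.
    """
    invariant = []
    for i, (b, c, a, y) in enumerate(
        zip(before_args, condition_operands, after_args, yield_operands)
    ):
        if c == b and y == a:
            invariant.append(i)
    return invariant


# Position 0 is passed through untouched; position 1 is recomputed each iteration.
print(loop_invariant_positions(["%b0", "%b1"], ["%b0", "%next"],
                               ["%a0", "%a1"], ["%a0", "%sum"]))  # [0]
```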
Signed-off-by: Abhishek Varma <abhishek.varma@polymagelabs.com>
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D116923
This is completely unused upstream, and does not really have well defined semantics
on what this is supposed to do/how this fits into the ecosystem. Given that, as part of
splitting up the standard dialect, it's best to just remove this behavior instead of trying
to awkwardly fit it somewhere upstream. Downstream users are encouraged to
define their own operations that clearly can define the semantics of this.
This also uncovered several lingering uses of ConstantOp that weren't
updated to use arith::ConstantOp, and worked during conversions because
the constant was removed/converted into something else before
verification.
See https://llvm.discourse.group/t/standard-dialect-the-final-chapter/ for more discussion.
Differential Revision: https://reviews.llvm.org/D118654
This is part of the larger effort to split the standard dialect. This will also allow for pruning some
additional dependencies on Standard (done in a followup).
Differential Revision: https://reviews.llvm.org/D118202
This revision avoids incorrect hoisting of alloca'd buffers across an AutomaticAllocationScope boundary.
In the more general case, we will probably need a ParallelScope-like interface.
Differential Revision: https://reviews.llvm.org/D118768
Use type inference when building the TransferWriteOp in the TransferWritePermutationLowering. Previously, the result type was set to Type(), which triggers an assertion if the pattern is used with tensors instead of memrefs.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D118758
Following the discussion in D118318, mark `arith.addf/mulf` commutative.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D118600
Support affine.load/store ops in fold-memref-subview ops pass. The
existing pass just "inlines" the subview operation on load/stores by
inserting affine.apply ops in front of the memref load/store ops: this
is by design always consistent with the semantics on affine.load/store
ops and the same would work even more naturally/intuitively with the
latter.
Differential Revision: https://reviews.llvm.org/D118565
Update SCF pass cmd line names to prefix `scf`. This is consistent with
guidelines/convention on how to name dialect passes. This also avoids
ambiguity on the context given the multiple `for` operations in the
tree.
NFC.
Differential Revision: https://reviews.llvm.org/D118564
There was a bug where some of the OpOperands needed in the replacement op were not in scope.
It does not matter where the replacement op is inserted. Any insertion point is OK as long as there are no dominance errors. In the worst case, the newly inserted op will bufferize out-of-place. This is no worse than not eliminating the InitTensorOp at all.
Differential Revision: https://reviews.llvm.org/D117685
The bufferization of arith.constant ops is also switched over to BufferizableOpInterface-based bufferization. The old implementation is deleted. Both implementations utilize GlobalCreator, now renamed to just `getGlobalFor`.
GlobalCreator no longer maintains a set of all created allocations to avoid duplicate allocations of the same constant. Instead, `getGlobalFor` scans the module to see if there is already a global allocation with the same constant value.
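Here is a minimal Python model of that lookup. It assumes a module is a list of globals, each with a name and a constant value. `get_global_for` here and the `__constant_N` naming are hypothetical. The comparison uses Python equality and so ignores element types, unlike a real attribute comparison.

```python
def get_global_for(module, value):
    """Return the name of a global holding `value`, creating one only if none exists."""
    # Scan the existing globals first instead of keeping a separate cache.
    for glob in module:
        if glob["value"] == value:
            return glob["name"]
    name = f"__constant_{len(module)}"
    module.append({"name": name, "value": value})
    return name


module = []
first = get_global_for(module, [1, 2])
second = get_global_for(module, [1, 2])  # reuses the existing global
print(first == second, len(module))      # True 1
```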
For compatibility reasons, it is still possible to create a pass that bufferizes only `arith.constant`. This pass (createConstantBufferizePass) can be deleted once all users have switched over to One-Shot bufferization.
Differential Revision: https://reviews.llvm.org/D118483
This patch adds the vector.scan op which computes the
scan for a given n-d vector. It requires specifying the operator,
the identity element and whether the scan is inclusive or
exclusive.
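The inclusive/exclusive distinction is easiest to see in 1-D. This Python sketch is a hypothetical illustration of the semantics only. It does not model the op's operands, its results, or how it scans along one dimension of an n-D vector.

```python
import operator


def scan(values, op, identity, inclusive=True):
    """1-D prefix scan combining elements with `op`, starting from `identity`."""
    acc = identity
    out = []
    for v in values:
        if inclusive:
            acc = op(acc, v)  # element i includes values[i]
            out.append(acc)
        else:
            out.append(acc)  # element i covers values[:i] only
            acc = op(acc, v)
    return out


print(scan([1, 2, 3, 4], operator.add, 0))                   # [1, 3, 6, 10]
print(scan([1, 2, 3, 4], operator.add, 0, inclusive=False))  # [0, 1, 3, 6]
```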
TEST: Added test in ops.mlir
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D117171
This is in preparation of switching `-tensor-constant-bufferize` and `-arith-bufferize` to BufferizableOpInterface-based implementations.
Differential Revision: https://reviews.llvm.org/D118324
This commit switches the `tensor-bufferize` pass over to BufferizableOpInterface-based bufferization.
Differential Revision: https://reviews.llvm.org/D118246
The pass currently cannot handle to_memref(to_tensor(x)) folding where a cast is necessary. This is required with the new unified bufferization. There is already a canonicalization pattern that handles such foldings, and it should be used during this pass.
Differential Revision: https://reviews.llvm.org/D117988
Prefix "affine-" to affine transform passes that were missing it -- to
avoid ambiguity and for uniformity. Only two passes needed this.
Move misplaced affine coalescing test case file.
NFC.
Differential Revision: https://reviews.llvm.org/D118314
These transformations already operate on memref operations (as part of
splitting up the standard dialect). Now that the operations have moved,
it's time for these transformations to move as well.
Differential Revision: https://reviews.llvm.org/D118285
This is part of splitting up the standard dialect. The move makes sense anyway,
given that the memref dialect already holds memref.atomic_rmw, which is the non-region
sibling operation of std.generic_atomic_rmw (the relationship is even clearer given that
they have nearly the same description, modulo how they represent the inner computation).
Differential Revision: https://reviews.llvm.org/D118209
The GPU dialect currently contains an explicit reference to LLVMFuncOp
during verification to handle the situation where the kernel has already been
converted. This commit changes that reference to instead use FunctionOpInterface,
which has two main benefits:
* It allows for removing an otherwise unnecessary dependency on the LLVM dialect
* It removes hardcoded assumptions about the lowering path and use of the GPU dialect
Differential Revision: https://reviews.llvm.org/D118172
This is for compatibility with existing bufferization passes. Also clean up memref type generation a bit.
Differential Revision: https://reviews.llvm.org/D118243
When extracting, it is more useful to push the index_cast past the
extraction. This increases the chance that the tensor.extract can be evaluated at
compile time.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D118204
There is not much benefit to reshaping a tensor.from_elements result versus reloading it.
Updated to propagate shape manipulations into the output type of
tensor.from_elements.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D118201
Fusion of reshape ops by linearization incorrectly inverted the
indexing map before linearizing dimensions. This leads to incorrect
indexing maps used in the fused operation.
Differential Revision: https://reviews.llvm.org/D117908
This pattern is not written to handle operations with `linalg.index`
operations in its body, i.e. operations that have index semantics.
Differential Revision: https://reviews.llvm.org/D117856
This is mostly a copy of the existing tensor.from_elements bufferization. Once TensorInterfaceImpl.cpp is moved to the tensor dialect, the existing rewrite pattern can be deleted.
Differential Revision: https://reviews.llvm.org/D117775
This is mostly a copy of the existing tensor.generate bufferization. Once TensorInterfaceImpl.cpp is moved to the tensor dialect, the existing rewrite pattern can be deleted.
Differential Revision: https://reviews.llvm.org/D117770
This patch supports the atomic construct (capture) following section 2.17.7 of the OpenMP 5.0 standard. Tests for it are added as well.
Reviewed By: peixin, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D115851
Add a transpose option to hoist padding to transpose the padded tensor before storing it into the packed tensor. The early transpose improves the memory access patterns of the actual compute kernel. The patch introduces a transpose right after the hoisted pad tensor and a second transpose inside the compute loop. The second transpose can either be fused into the compute operation or will canonicalize away when lowering to vector instructions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D117893
Both insertion points are valid. This makes BufferizableOpInterface-based bufferization compatible with existing partial bufferization test cases, so fewer changes to unit tests are necessary.
Differential Revision: https://reviews.llvm.org/D117986
This is the only op that is not supported via BufferizableOpInterfaceImpl bufferization. Once this op is supported we can switch `tensor-bufferize` over to the new unified bufferization.
Differential Revision: https://reviews.llvm.org/D117985
When two clamp ops appear in a row, they can be canonicalized into a single clamp
that uses the most constrained range.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D117934
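The range arithmetic behind merging two consecutive clamps can be checked with a small Python model. This is a hypothetical sketch, not the TOSA pattern itself: `clamp` and `fold_clamp_clamp` are illustrative names. The sketch only folds when the two ranges overlap. For disjoint ranges the pair always yields a constant and the plain max/min rule would give the wrong bound (whether the actual pattern guards this case is not stated in this message).

```python
def clamp(x, lo, hi):
    # Clamp x into the closed interval [lo, hi] (assumes lo <= hi).
    return min(max(x, lo), hi)


def fold_clamp_clamp(inner, outer):
    """Merge clamp(clamp(x, *inner), *outer) into one (lo, hi) range.

    The merged range is the intersection of both ranges: the larger lower
    bound and the smaller upper bound. Returns None if they do not overlap.
    """
    lo = max(inner[0], outer[0])
    hi = min(inner[1], outer[1])
    if lo > hi:
        return None
    return (lo, hi)


merged = fold_clamp_clamp((0, 255), (0, 127))  # (0, 127)
```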
Rationale:
Although file I/O is a bit alien to MLIR itself, we provide two convenient ways
for sparse tensor I/O. The input part was already there (behind the swiss army
knife sparse_tensor.new). Now we have a sparse_tensor.out to write out data. As
before, the ops are kept vague and may change in the future. For now this
allows us to compare TACO vs MLIR very easily.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D117850
Implement a Taylor series approximation for atan and add an atan2 lowering
that uses the atan approximation. This includes tests for edge cases and tests
for each quadrant.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D115682
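The general approach (a truncated Taylor series for atan, plus quadrant handling for atan2) can be sketched in Python. This is a hypothetical illustration, not the MLIR lowering: the reciprocal and half-angle range reductions and the 20-term default are assumptions chosen so that a short series is accurate, and the actual lowering may use a different polynomial or reduction. Signed zeros are ignored.

```python
import math


def atan_approx(x: float, terms: int = 20) -> float:
    """atan(x) via a truncated Taylor series x - x^3/3 + x^5/5 - ...

    The series only converges for |x| <= 1 (and slowly near 1), so:
      * for |x| > 1 use atan(x) = sign(x) * pi/2 - atan(1/x);
      * then halve the angle: atan(x) = 2 * atan(x / (1 + sqrt(1 + x^2))),
        which leaves an argument with |y| <= tan(pi/8) ~= 0.414.
    """
    if abs(x) > 1.0:
        return math.copysign(math.pi / 2, x) - atan_approx(1.0 / x, terms)
    y = x / (1.0 + math.sqrt(1.0 + x * x))
    total = 0.0
    power = y          # holds y^(2n+1)
    y_sq = y * y
    for n in range(terms):
        total += (-1) ** n * power / (2 * n + 1)
        power *= y_sq
    return 2.0 * total


def atan2_approx(y: float, x: float, terms: int = 20) -> float:
    """atan2(y, x) built on atan_approx, with explicit quadrant handling."""
    if x > 0.0:
        return atan_approx(y / x, terms)                 # quadrants I and IV
    if x < 0.0:
        base = atan_approx(y / x, terms)
        return base + math.pi if y >= 0.0 else base - math.pi  # II / III
    # x == 0: the point lies on the y axis.
    if y > 0.0:
        return math.pi / 2
    if y < 0.0:
        return -math.pi / 2
    return 0.0  # atan2(0, 0) is taken to be 0 in this sketch.
```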
In some cases, fusion can produce illegal operations if, after fusion,
the range of some of the loops cannot be computed from the shapes of its
operands. Check for this case and abort the fusion if this happens.
Differential Revision: https://reviews.llvm.org/D117602
This allows piping sequences such as `mlir-opt -split-input-file | mlir-opt -split-input-file`.
Depends On D117750
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117756
PDLDialect, being a somewhat user-facing dialect whose ops contain exclusively other PDL ops in their regions, can take advantage of `OpAsmOpInterface` to provide nicer IR.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117828
An unbound OperationOp in the matcher (i.e. one with no uses) is already disallowed by the verifier. However, an OperationOp in the rewriter is not side-effect free -- it's creating an op!
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D117825
This commit explicitly states that negative values and values exceeding
vector dimensions are allowed in vector.create_mask (but not in
vector.constant_mask). These values are now truncated when
canonicalizing vector.create_mask to vector.constant_mask.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D116069
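Why truncation preserves the mask can be seen with a hypothetical Python model of mask semantics. `mask_elements` and `to_constant_mask_sizes` are illustrative names, not MLIR APIs, and the model only tracks which lanes are set; it does not model any verifier rules. An index i in dimension d is set iff i < size[d], so a negative size selects nothing (like 0) and a size beyond the dimension selects everything (like the dimension size).

```python
from itertools import product


def mask_elements(shape, sizes):
    """Indices set in a mask of `shape` with per-dimension `sizes`.

    An element is set iff, in every dimension, its index is below that
    dimension's size (the mask is a conjunction over dimensions).
    """
    return {idx for idx in product(*(range(n) for n in shape))
            if all(i < s for i, s in zip(idx, sizes))}


def to_constant_mask_sizes(shape, sizes):
    # Truncate each create_mask operand into [0, dimension size].
    return [min(max(s, 0), n) for s, n in zip(sizes, shape)]


shape = [4, 3]
clamped = to_constant_mask_sizes(shape, [6, -2])  # [4, 0]
```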