Without it, BufferDeallocationPass processes only CloneOps created during the pass itself and ignores all CloneOps that were already present in the IR.
For our specific use case:
```
func @dealloc_existing_clones(%arg0: memref<?x?xf64>, %arg1: memref<?x?xf64>) -> memref<?x?xf64> {
  return %arg0 : memref<?x?xf64>
}
```
Input arguments will be freed immediately after the function returns, and we want to prolong the lifetime of the returned argument.
To achieve this we explicitly add clones to all input memrefs and expect that BufferDeallocationPass will add the correct deallocs for them (unnecessary clone+dealloc pairs will be canonicalized away later).
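For illustration, a sketch of the IR after the clones are inserted; the dealloc shown in the comment is what BufferDeallocationPass is expected to add (op names assumed from the memref dialect of that time):
```
func @dealloc_existing_clones(%arg0: memref<?x?xf64>, %arg1: memref<?x?xf64>) -> memref<?x?xf64> {
  %0 = memref.clone %arg0 : memref<?x?xf64> to memref<?x?xf64>
  %1 = memref.clone %arg1 : memref<?x?xf64> to memref<?x?xf64>
  // Expected to be inserted by the pass, since %1 is never returned:
  //   memref.dealloc %1 : memref<?x?xf64>
  return %0 : memref<?x?xf64>
}
```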
Differential Revision: https://reviews.llvm.org/D104973
Adapt the StructuredOp verifier to ensure all operands are either in the input or the output group. The change is possible after adding support for scalar input operands (https://reviews.llvm.org/D104220).
Differential Revision: https://reviews.llvm.org/D104783
The current code does not preserve the order of the parallel
dimensions when doing multi-reductions and thus we can end
up in scenarios where the result shape does not match the
desired shape after reduction.
This patch fixes that by ensuring that the parallel indices
are in order and then concatenating them with the reduction dimensions
so that the reduction dimensions are innermost.
Differential Revision: https://reviews.llvm.org/D104884
Input/output types can be integers, which represent a quantized convolution.
Update verifier to expect this behavior.
Reviewed By: sjarus
Differential Revision: https://reviews.llvm.org/D104949
The execute_region op is used to allow multiple blocks within SCF constructs. If the containing region allows multiple blocks, inline the op's region into it.
Differential Revision: https://reviews.llvm.org/D104960
MemRefDataFlow performs mem2reg style operations for affine load/stores. Unfortunately, it is not presently correct in the presence of external operations such as memref.cast, or function calls. This diff extends the functionality of the pass to remain correct in the presence of such ops.
Differential Revision: https://reviews.llvm.org/D104053
A canonicalization will accidentally remove a memref allocation if it is only stored into. However, this is incorrect if the allocation is the value being stored, not the allocation being stored into.
Differential Revision: https://reviews.llvm.org/D104947
Given a select that returns the logical negation of the condition, replace it with a not of the condition.
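A rough sketch of the rewrite; materializing the negation with xor against a true constant is an assumption of this sketch:
```
%false = constant false
%true = constant true

// Before: the select yields the logical negation of %cond.
%res = select %cond, %false, %true : i1

// After: replaced with a not of the condition.
%res2 = xor %cond, %true : i1
```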
Differential Revision: https://reviews.llvm.org/D104966
Reduce code duplication: Move various helper functions, that are duplicated in TensorDialect, MemRefDialect, LinalgDialect, StandardDialect, into a new StaticValueUtils.cpp.
Differential Revision: https://reviews.llvm.org/D104687
Depends On D104780
Recursive work splitting instead of sequential async task submission gives a ~20%-30% speedup in microbenchmarks.
Algorithm outline:
1. Collapse scf.parallel dimensions into a single dimension
2. Compute the block size for the parallel operations from the 1d problem size
3. Launch parallel tasks
4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space
5. Each parallel task computes the original parallel operation body using scf.for loop nest
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D104850
Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. The `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group; for this reason it is required to know upfront how many tokens will be added to the group.
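A rough usage sketch, assuming the size is passed as an operand at construction (exact assembly format assumed):
```
%c2 = constant 2 : index
// The group is created knowing that exactly two tokens will be added.
%group = async.create_group %c2 : !async.group
%t0 = async.execute { async.yield }
%t1 = async.execute { async.yield }
%r0 = async.add_to_group %t0, %group : !async.token
%r1 = async.add_to_group %t1, %group : !async.token
// await_all no longer races with group updates.
async.await_all %group
```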
Reviewed By: ftynse, herhut
Differential Revision: https://reviews.llvm.org/D104780
Moves iteration lattice/merger code into new SparseTensor/Utils directory. A follow-up CL will add lattice/merger unit tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D104757
scf::ForOp bufferization analysis proceeds just like for any other op (including FuncOp) at its boundaries; i.e. if:
1. The tensor operand is inplaceable.
2. The matching result has no subsequent read (i.e. all reads dominate the scf::ForOp).
3. Bufferizing in place does not create a RAW interference.
then it can bufferize inplace.
Still, there are a few differences:
1. bbArgs for an scf::ForOp are always considered inplaceable when seen from ops inside the body. This is because either a) the matching tensor operand is not inplaceable and an alloc will be inserted (which makes the bbArg itself inplaceable); or b) the tensor operand and the bbArg are both already inplaceable.
2. Bufferization within the scf::ForOp body has implications for the outside world: the scf.yield terminator may well ping-pong values of the same type. This muddies the water for alias analysis and is not supported atm. Such cases result in a pass failure.
Differential revision: https://reviews.llvm.org/D104490
In cases where arithmetic (addi/muli) ops are performed on an scf.for loop's induction variable with a single use, we can fold those ops directly into the scf.for loop.
For example, in the following code:
```
scf.for %i = %c0 to %arg1 step %c1 {
  %0 = addi %arg2, %i : index
  %1 = muli %0, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}
```
we can lift `%0` up into the scf.for loop range, as it is the only user of %i:
```
%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %arg1 : index
scf.for %i = %lb to %ub step %c1 {
  %1 = muli %i, %c4 : index
  %2 = memref.load %arg0[%1] : memref<?xi32>
  %3 = muli %2, %2 : i32
  memref.store %3, %arg0[%1] : memref<?xi32>
}
```
Reviewed By: mehdi_amini, ftynse, Anthony
Differential Revision: https://reviews.llvm.org/D104289
The patch changes the pretty printed FillOp operand order from output, value to value, output. The change is a follow up to https://reviews.llvm.org/D104121 that passes the fill value using a scalar input instead of the former capture semantics.
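A sketch of the new printed order (operand names and the memref output are illustrative):
```
// Before: linalg.fill(%out, %cst)
// After:
linalg.fill(%cst, %out) : f64, memref<?x?xf64>
```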
Differential Revision: https://reviews.llvm.org/D104356
Slowly we are moving toward full support of sparse tensor *outputs*. First
step was support for all-dense annotated "sparse" tensors. This step adds
support for truly sparse tensors, but only for operations in which the values
of a tensor change, but not the nonzero structure (this was referred to as
"simply dynamic" in the [Bik96] thesis).
Some background text was posted on discourse:
https://llvm.discourse.group/t/sparse-tensors-in-mlir/3389/25
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D104577
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename SubTensorOp -> tensor.extract_slice, SubTensorInsertOp -> tensor.insert_slice.
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Note: This is a fixed version of https://reviews.llvm.org/D104499, which was reverted due to a missing update to two CMakeFile.txt.
Differential Revision: https://reviews.llvm.org/D104676
Adapt the FillOp definition to use a scalar operand instead of a capture. This patch is a follow-up to https://reviews.llvm.org/D104109. As the input operands are in front of the output operands, the patch changes the internal operand order of the FillOp. The pretty printed version of the operation remains unchanged though. The patch also adapts the linalg-to-standard lowering to ensure the C signature of the FillOp remains unchanged as well.
Differential Revision: https://reviews.llvm.org/D104121
TosaMakeBroadcastable needs to include tosa.div, which was added later in the
specification.
Reviewed By: sjarus, NatashaKnk
Differential Revision: https://reviews.llvm.org/D104157
The approximation relies on the range-reduced version y in [0, pi/2]. An input x has the
property that sin(x) equals sin(y), -sin(y), cos(y), or -cos(y), depending on which quadrant x
is in, where sin(y) and cos(y) are approximated with a 5th-degree polynomial (of x^2).
As a result, a single pattern can be used to compute approximations for both sine and cosine.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D104582
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename ops: SubTensorOp --> ExtractTensorOp, SubTensorInsertOp --> InsertTensorOp
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Differential Revision: https://reviews.llvm.org/D104499
* Remove dependency: Standard --> MemRef
* Add dependencies: GPUToNVVMTransforms --> MemRef, Linalg --> MemRef, MemRef --> Tensor
* Note: The `subtensor_insert_propagate_dest_cast` test case in MemRef/canonicalize.mlir will be moved to Tensor/canonicalize.mlir in a subsequent commit, which moves over the remaining Tensor ops from the Standard dialect to the Tensor dialect.
Differential Revision: https://reviews.llvm.org/D104506
This revision adds a BufferizationAliasInfo which maintains and updates information about which tensors will alias once bufferized, which bufferized tensors are equivalent to others and how to handle clobbers.
Bufferization greedily tries to bufferize inplace by:
1. first trying to bufferize SubTensorInsertOp inplace, in reverse order (these are deemed the most expensive).
2. then trying to bufferize all non SubTensorOp / SubTensorInsertOp, in reverse order.
3. lastly trying to bufferize all SubTensorOp in reverse order.
Reverse order is a heuristic that seems to work nicely because structured tensor codegen very often proceeds by:
1. take a subset of a tensor
2. compute on that subset
3. insert the result subset into the full tensor and yield a new tensor.
BufferizationAliasInfo + equivalence sets + clobber analysis allows bufferizing nested
subtensor/compute/subtensor_insert sequences inplace to a certain extent.
To fully realize inplace bufferization, additional container-containee analysis will be necessary and is left for a subsequent commit.
Differential revision: https://reviews.llvm.org/D104110
Introduce the execute_region op that is able to hold a region which it
executes exactly once. The op encapsulates a CFG within itself while
isolating it from the surrounding control flow. Proposal discussed here:
https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
execute_region enables one to inline a function without lowering out all
other higher level control flow constructs (affine.for/if, scf.for/if)
to the flat list of blocks / CFG form. It thus allows the benefit of
transforms on higher level control flow ops available in the presence of
the inlined calls. The inlined calls continue to benefit from
propagation of SSA values across their top boundary. Functions won’t
have to remain outlined until later than desired. Abstractions like
affine execute_regions, lambdas with implicit captures could be lowered
to this without first lowering out structured loops/ifs or outlining.
But two potential early use cases are: (1) an early inliner (which
can inline functions by introducing execute_region ops), (2) lowering of
an affine.execute_region, which cleanly maps to an scf.execute_region
when going from the affine dialect to the scf dialect.
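A minimal sketch of what the op might look like (illustrative):
```
%r = scf.execute_region -> i64 {
  %c1 = constant 1 : i64
  scf.yield %c1 : i64
}
```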
Differential Revision: https://reviews.llvm.org/D75837
Based on discussion in
[this](https://llvm.discourse.group/t/remove-canonicalizer-for-memref-dim-via-shapedtypeopinterface/3641)
thread the pattern to resolve the `memref.dim` of a value that is a
result of an operation that implements the
`InferShapedTypeOpInterface` is moved to a separate pass instead of
running it as a canonicalization pass. This allows shape resolution to
happen when explicitly required, instead of automatically through a
canonicalization.
Differential Revision: https://reviews.llvm.org/D104321
Make the store-to-load forwarding condition for -memref-dataflow-opt less
conservative. Post-dominance info is not really needed. Add an additional
check for common cases.
Differential Revision: https://reviews.llvm.org/D104174
To control the number of outer parallel loops, we need to process the
outer loops first; hence, a pre-order walk fixes the issue.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D104361
We have several ways of introducing a scalar invariant value into
linalg generic ops (should we limit this somewhat?). This revision
makes sure we handle all of them correctly in the sparse compiler.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D104335
This is a very careful start with allowing sparse tensors on the
left-hand side of tensor index expressions (viz. sparse output).
Note that there is a subtle difference between non-annotated tensors
(dense, remain n-dim, handled by classic bufferization) and all-dense
annotated "sparse" tensors (linearized to 1-dim without overhead
storage, bufferized by sparse compiler, backed by runtime support library).
This revision gently introduces some new IR to facilitate annotated outputs,
to be generalized to truly sparse tensors in the future.
Reviewed By: gussmith23, bixia
Differential Revision: https://reviews.llvm.org/D104074
This doesn't add any canonicalizations, but executes the same
simplification on bufferSemantic linalg.generic ops by using
linalg::ReshapeOp instead of linalg::TensorReshapeOp.
Differential Revision: https://reviews.llvm.org/D103513
The parser of the generic op did not recognize the output from mlir-opt when there
are multiple outputs: one form would wrap the result types with braces, and one would
not. The patch makes the behavior consistent.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D104256
Interface patterns are unique in that they get added to every operation that also implements that interface, given that they aren't tied to individual operations. When the same interface pattern gets added to multiple operations (such as the current behavior with Linalg), a reference to each of these patterns is added to every op (meaning that an operation will now have N references to effectively the same pattern). This revision fixes this problematic behavior in Linalg, and can bring upwards of a 25% reduction in compile time in Linalg based workloads.
Differential Revision: https://reviews.llvm.org/D104160
There's no need for `toSmallVector()` as `SmallVector.h` already provides a `to_vector` free function that takes a range.
Reviewed By: Quuxplusone
Differential Revision: https://reviews.llvm.org/D104024
Up to now all structured op operands are assumed to be shaped. The patch relaxes this assumption and allows scalar input operands. In contrast to shaped operands scalar operands are not indexed and directly forwarded to the body of the operation. As all other operands, scalar operands are associated to an indexing map that in case of a scalar or a 0D-operand has an empty range.
We will use scalar operands as a replacement for the capture mechanism. In contrast to captures, the approach ensures we can generate the function signature from the operand list and it prevents outdated capture values in case a transformation updates only the capture operand but not the hidden body of a named operation.
Removing captures and updating existing operations such as linalg.fill is left for a later patch.
The patch depends on https://reviews.llvm.org/D103891 and https://reviews.llvm.org/D103890.
Differential Revision: https://reviews.llvm.org/D104109
The padding of such ops is not generated in a vectorized way. Instead, emit a tensor::GenerateOp.
We may vectorize GenerateOps in the future.
Differential Revision: https://reviews.llvm.org/D103879
If the source operand of a linalg.pad_op operation has static shape, vectorize the copying of the source.
Differential Revision: https://reviews.llvm.org/D103747
Currently limited to constant pad values. Any combination of dynamic/static tensor sizes and padding sizes is supported.
Differential Revision: https://reviews.llvm.org/D103679
The generic vectorization pattern handles only those cases where
low and high padding are zero. This is already handled by a
canonicalization pattern.
Also add a new canonicalization test case to ensure that tensor cast ops
are properly inserted.
A more general vectorization pattern will be added in a subsequent commit.
Differential Revision: https://reviews.llvm.org/D103590
Vectorize linalg.pad_tensor without generating a linalg.init_tensor when consumed by a transfer_write.
Differential Revision: https://reviews.llvm.org/D103137
Vectorize linalg.pad_tensor without generating a linalg.init_tensor when consumed by a subtensor_insert.
Differential Revision: https://reviews.llvm.org/D103780
Vectorize linalg.pad_tensor without generating a linalg.init_tensor when consumed by a transfer_read.
Differential Revision: https://reviews.llvm.org/D103735
* Add a helper function that returns the constant padding value (if applicable).
* Remove existing getConstantYieldValueFromBlock function, which does almost the same.
* Adapted from D103243.
Differential Revision: https://reviews.llvm.org/D104004
Add the `tensor.insert` op so that `tensor.extract`/`tensor.insert` work in pairs
in the `scalar` domain, just like `subtensor`/`subtensor_insert` work in pairs in the
`tensor` domain, and `vector.transfer_read`/`vector.transfer_write` work in
pairs in the `vector` domain.
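A minimal usage sketch (values and types are illustrative):
```
// Read a scalar from a tensor and write it back at another position,
// producing a new tensor value.
%v = tensor.extract %t[%i] : tensor<8xf32>
%r = tensor.insert %v into %t[%j] : tensor<8xf32>
```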
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D104139
The commit simplifies affine.if ops:
the affine.if operation is removed if the condition is universally true or false, and the then/else block is merged into the parent block.
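A hypothetical sketch of a trivially-true condition that would be simplified away (names are made up):
```
#always_true = affine_set<(d0) : (0 >= 0)>

func @simplify(%i: index, %m: memref<10xf32>) {
  // The condition holds for every %i, so the affine.if is removed and its
  // then-block is merged into the parent block.
  affine.if #always_true(%i) {
    %c1 = constant 1.0 : f32
    affine.store %c1, %m[0] : memref<10xf32>
  }
  return
}
```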
Signed-off-by: Shashij Gupta shashij.gupta@polymagelabs.com
Reviewed By: bondhugula, pr4tgpt
Differential Revision: https://reviews.llvm.org/D104015
## Introduction
This proposal describes the new op to be added to the `std` (and later moved `memref`)
dialect called `alloca_scope`.
## Motivation
Alloca operations are easy to misuse, especially if one relies on it while doing
rewriting/conversion passes. For example let's consider a simple example of two
independent dialects, one defines an op that wants to allocate on-stack and
another defines a construct that corresponds to some form of looping:
```
dialect1.looping_op {
%x = dialect2.stack_allocating_op
}
```
Since the dialects might not know about each other they are going to define a
lowering to std/scf/etc independently:
```
scf.for … {
%x_temp = std.alloca …
… // do some domain-specific work using %x_temp buffer
… // and store the result into %result
%x = %result
}
```
Later on the scf and `std.alloca` are going to be lowered to llvm using a
combination of `llvm.alloca` and unstructured control flow.
At this point the use of `%x_temp` is bound to either be optimized by
llvm (for example using mem2reg) or in the worst case: perform an independent
stack allocation on each iteration of the loop. While the llvm optimizations are
likely to succeed they are not guaranteed to do so, and they provide
opportunities for surprising issues with unexpected use of stack size.
## Proposal
We propose a new operation that defines a finer-grain allocation scope for the
alloca-allocated memory called `alloca_scope`:
```
alloca_scope {
%x_temp = alloca …
...
}
```
Here the lifetime of `%x_temp` is going to be bound to the narrow annotated
region within `alloca_scope`. Moreover, one can also return values out of the
alloca_scope with an accompanying `alloca_scope.return` op (that behaves
similarly to `scf.yield`):
```
%result = alloca_scope {
%x_temp = alloca …
…
alloca_scope.return %myvalue
}
```
Under the hood the `alloca_scope` is going to be lowered to a combination of
`llvm.intr.stacksave` and `llvm.intr.stackrestore` that are going to be invoked
automatically as control-flow enters and leaves the body of the `alloca_scope`.
The key value of the new op is to allow deterministic guaranteed stack use
through an explicit annotation in the code which is finer-grain than the
function-level scope of `AutomaticAllocationScope` interface. `alloca_scope`
can be inserted at arbitrary locations and doesn’t require non-trivial
transformations such as outlining.
## Which dialect
Before the memref dialect is split, `alloca_scope` can temporarily reside in the `std`
dialect, and later on be moved to `memref` together with the rest of the
memory-related operations.
## Implementation
An implementation of the op is available [here](https://reviews.llvm.org/D97768).
Original commits:
* Add initial scaffolding for alloca_scope op
* Add alloca_scope.return op
* Add no region arguments and variadic results
* Add op descriptions
* Add failing test case
* Add another failing test
* Initial implementation of lowering for std.alloca_scope
* Fix backticks
* Fix getSuccessorRegions implementation
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D97768
This is a roll forward of D102679.
This patch simplifies the implementation of Sequence and makes it compatible with llvm::reverse.
It exposes the reverse iterators through rbegin/rend which prevents a dangling reference in std::reverse_iterator::operator++().
Note: Compared to D102679, this patch introduces a `asSmallVector()` member function and fixes compilation issue with GCC 5.
Differential Revision: https://reviews.llvm.org/D103948
This brings us closer to replacing the LLVM data layout string with a
first-class layout modeling in MLIR.
Depends On D103945
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D103946
Allow gpu ops implementing the async interface to already be async when running the GpuAsyncRegionPass.
That pass threads a 'current token' through a block with ops implementing the gpu async interface.
After this change, existing async ops (returning a !gpu.async.token) set the current token.
Existing synchronous `gpu.wait` ops reset the current token.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D103396
This allows us to remove the `spv.mlir.endmodule` op and
all the code associated with it.
Along the way, tightened the APIs for `spv.module` a bit
by removing some aliases. Now we use `getRegion` to get
the only region, and `getBody` to get the region's only
block.
Reviewed By: mravishankar, hanchung
Differential Revision: https://reviews.llvm.org/D103265
ArmSVE-specific memory operations are needed to generate end-to-end
code for as long as MLIR core doesn't support scalable vectors. These
instructions will eventually be unnecessary; for now they're required
for more complex testing.
Differential Revision: https://reviews.llvm.org/D103535
These `arm_sve.cmp` functions are needed to generate scalable vector
masks as long as scalable vectors are not part of the standard types.
Once in standard, these can be removed and `std.cmp` can be used
instead.
Differential Revision: https://reviews.llvm.org/D103473
This reverts commit e772216e70
(and fixup 7f6c878a2c).
The build is broken with gcc5 host compiler:
In file included from
from mlir/lib/Dialect/Utils/StructuredOpsUtils.cpp:9:
tools/mlir/include/mlir/IR/BuiltinAttributes.h.inc:424:57: error: type/value mismatch at argument 1 in template parameter list for 'template<class ItTy, class FuncTy, class FuncReturnTy> class llvm::mapped_iterator'
std::function<T(ptrdiff_t)>>;
^
tools/mlir/include/mlir/IR/BuiltinAttributes.h.inc:424:57: note: expected a type, got 'decltype (seq<ptrdiff_t>(0, 0))::const_iterator'
This is both more efficient and more ergonomic than going
through an std::string, e.g. when using llvm::utostr and
in string concat cases.
Unfortunately we can't just overload ::get(). This causes an
ambiguity because both Twine and StringRef implicitly convert
from std::string.
Differential Revision: https://reviews.llvm.org/D103754
Currently canonicalizations of a store and a cast try to fold all casts into the store.
In the case where the operand being stored is itself a cast, this is illegal as the type of the value being stored
will change. This PR fixes this by not checking the value for folding with a cast.
Depends on https://reviews.llvm.org/D103828
Differential Revision: https://reviews.llvm.org/D103829
This patch simplifies the implementation of Sequence and makes it compatible with llvm::reverse.
It exposes the reverse iterators through rbegin/rend which prevents a dangling reference in std::reverse_iterator::operator++().
Differential Revision: https://reviews.llvm.org/D102679
These `arm_sve.cmp` functions are needed to generate scalable vector
masks as long as scalable vectors are not part of the standard types.
Once in standard, these can be removed and `std.cmp` can be used
instead.
Differential Revision: https://reviews.llvm.org/D103473
For an operation in the true/false destination of a branch,
one can assume that the branch condition itself was true/false if
only that edge can reach the operation.
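An illustrative sketch (function and block names are made up):
```
func @fold_cond(%cond: i1) -> i1 {
  cond_br %cond, ^bb1, ^bb2
^bb1:  // reachable only through the true edge
  // %cond can be assumed true here, so this use folds to a constant true.
  return %cond : i1
^bb2:  // reachable only through the false edge
  // %cond can be assumed false here, so this use folds to a constant false.
  return %cond : i1
}
```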
Differential Revision: https://reviews.llvm.org/D101709
This patch adds a canonicalization for the standalone data operation with a constant if condition.
It is extracted from this patch D103325.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103712
Convert data operands from the acc.parallel operation using the same conversion pattern as in D102170.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103337
Implements better naming for results of spv.mlir.addressof ops by making it
inherit from OpAsmOpInterface and implementing the associated
getAsmResultName(...) hook.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D103594
* Add hasUnitStride and hasZeroOffset to OffsetSizeAndStrideOpInterface. These functions are useful for various patterns. E.g., some vectorization patterns apply only to tensor ops with zero offsets and/or unit stride.
* Add getConstantIntValue and isEqualConstantInt helper functions, which are useful for implementing the two above functions, as well as various patterns.
Differential Revision: https://reviews.llvm.org/D103763
Controlled by a compiler option, if 32-bit indices can be handled
with zero/sign-extension alike (viz. no concerns about non-negative
indices), scatter/gather operations can use the more efficient
32-bit SIMD version.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D103632
* Rename PadTensorOpVectorizationPattern to GenericPadTensorOpVectorizationPattern.
* Make GenericPadTensorOpVectorizationPattern a private pattern, to be instantiated via populatePadTensorOpVectorizationPatterns.
* Factor out parts of PadTensorOpVectorizationPattern into helper functions.
This commit prepares PadTensorOpVectorizationPattern for a series of subsequent commits that add more specialized PadTensorOp vectorization patterns.
Differential Revision: https://reviews.llvm.org/D103681
Convert data operands from the acc.data operation using the same conversion pattern as in D102170.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D103332
Introduces a test pass that rewrites PadTensorOps with static shapes as a sequence of:
```
linalg.init_tensor // to create output
linalg.fill // to initialize with padding value
linalg.generic // to copy the original contents to the padded tensor
```
The pass can be triggered with:
- `--test-linalg-transform-patterns="test-transform-pad-tensor"`
Differential Revision: https://reviews.llvm.org/D102804
Replace the uses of deprecated Structured Op Interface methods in TestLinalgElementwiseFusion.cpp, TestLinalgFusionTransforms.cpp, and Transforms.cpp. The patch is based on https://reviews.llvm.org/D103394.
Differential Revision: https://reviews.llvm.org/D103528
Adding methods to access operand properties via OpOperands and mark outdated methods as deprecated.
Differential Revision: https://reviews.llvm.org/D103394
Implements better naming for results of `spv.Constant` ops by making it
inherit from OpAsmOpInterface and implementing the associated
getAsmResultName(...) hook.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D103152
Depends On D103109
If any of the tokens/values added to the `!async.group` switches to the error state, then the group itself switches to the error state.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103203
Depends On D103102
Not yet implemented:
1. Error handling after synchronous await
2. Error handling for async groups
These will be addressed in follow-up PRs.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103109
Support reference counted values implicitly passed (live) only to some of the successors.
Example: if we branch to ^bb2, the token will leak unless a `drop_ref` operation is properly created:
```
^entry:
  %token = async.runtime.create : !async.token
  cond_br %cond, ^bb1, ^bb2
^bb1:
  async.runtime.await %token
  async.runtime.drop_ref %token
  br ^bb2
^bb2:
  return
```
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103102
In order to allow large matmul operations using the MMA ops, we need to chain
operations. This is not possible unless the "DOp" and "COp" types have matching
layouts, so remove the "DOp" layout and force the accumulator and result types to
match.
Added a test for the case where the MMA value is accumulated.
Differential Revision: https://reviews.llvm.org/D103023
This revision refactors and simplifies the pattern detection logic: thanks to SSA value properties, we can actually look at all the uses of a given value and avoid having to pattern-match specific chains of operations.
A bufferization pattern for subtensor is added and specific inplaceability analysis is implemented for the simple case of subtensor. More advanced use cases will follow.
Differential revision: https://reviews.llvm.org/D102512
Allow support for specifying empty IVs in an `affine.parallel`.
For example:
```
affine.parallel () = () to () {
  affine.yield
}
```
Reviewed By: bondhugula, jbruestle
Differential Revision: https://reviews.llvm.org/D102895
Prevent users of `iter_args` of an affine for loop from being hoisted
out of it. Otherwise, LICM leads to a violation of the SSA dominance
(as demonstrated in the added test case).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=50103
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D102984
This previously handled memref::SubviewOp, but this can be extended to
all ops implementing the interface.
Differential Revision: https://reviews.llvm.org/D103076
Fix inconsistent MLIR CMake variable names. Consistently name them as
MLIR_ENABLE_<feature>.
Eg: MLIR_CUDA_RUNNER_ENABLED -> MLIR_ENABLE_CUDA_RUNNER
MLIR follows (or has mostly followed) the convention of naming
cmake enabling variables in the form MLIR_ENABLE_... etc. Using a
convention here is easy and also important for convenience. A counter
pattern was started with variables named MLIR_..._ENABLED. This led to a
sequence of related counter patterns: MLIR_CUDA_RUNNER_ENABLED,
MLIR_ROCM_RUNNER_ENABLED, etc.. From a naming standpoint, the imperative
form is more meaningful. Additional discussion at:
https://llvm.discourse.group/t/mlir-cmake-enable-variable-naming-convention/3520
Switch all inconsistent ones to the ENABLE form. Keep the couple of old
mappings needed until buildbot config is migrated.
Differential Revision: https://reviews.llvm.org/D102976
This makes it possible for targets to define their own MCObjectFileInfo.
This MCObjectFileInfo is then used to determine things like section alignment.
This is a follow up to D101462 and prepares for the RISCV backend defining the
text section alignment depending on the enabled extensions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101921
This revision completes the "dimension ordering" feature
of sparse tensor types that enables the programmer to
define a preferred order on dimension access (other than
the default left-to-right order). This enables e.g. selection
of column-major over row-major storage for sparse matrices,
but generalized to any rank, as in:
dimOrdering = affine_map<(i,j,k,l,m,n,o,p) -> (p,o,j,k,i,l,m,n)>
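For example, a column-major (CSC-like) sparse matrix could be declared roughly as follows (a sketch; the encoding fields are assumed from the sparse tensor attribute of that time):
```
#CSC = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i, j) -> (j, i)>
}>

// A sparse matrix stored column-major:
func private @use(%arg0: tensor<?x?xf64, #CSC>)
```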
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102856
The previous implementation did not handle casting behavior properly and
did not consider aliases.
Differential Revision: https://reviews.llvm.org/D102785
This pattern inlines operands to a linalg.generic operation that use a constant
index and hence are loop-invariant scalars. This reduces the number of
linalg.generic operands and unlocks some canonicalizations that rely on seeing
an explicit tensor.extract.
Differential Revision: https://reviews.llvm.org/D102682
Skip the sparsification pass for Linalg ops without annotated tensors
(or cases that are not properly handled yet).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102787
The patch extends the yaml code generation to support the following new OpDSL constructs:
- captures
- constants
- iteration index accesses
- predefined types
These changes have been introduced by revision
https://reviews.llvm.org/D101364.
Differential Revision: https://reviews.llvm.org/D102075
VectorTransferPermutationMapLoweringPatterns can be enabled via a pass option. These additional patterns lower permutation maps to minor identity maps with broadcasting, if possible, allowing for more efficient vector load/stores. The option is deactivated by default.
Differential Revision: https://reviews.llvm.org/D102593
LinalgOps that are all parallel do not use the value of the `outs`
tensor. The semantics is that the `outs` tensor is fully
overwritten. Using anything other than `init_tensor` can add false
dependencies between operations when the use is just for the shape of
the tensor. Adding a canonicalization to always use `init_tensor` in
such cases breaks this dependence.
Differential Revision: https://reviews.llvm.org/D102561
Original interfaces are not safe to be called during dialect conversion.
This is because some ops (e.g. `dynamic_reshape(input, target_shape)`)
depend on the values of their operands to calculate the output shape.
However the operands may be out of reach during dialect conversion (e.g.
converting from tensor world to buffer world). This patch provides a new
kind of interface which accepts user-provided operands to solve this
problem.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D102317
- Enables inferring the return type for ConstShape, taking into account valid return types;
- The compatible return type function could be reused; leaving that for a later refactoring;
Differential Revision: https://reviews.llvm.org/D102182
The experimental flag for "inplace" bufferization in the sparse
compiler can be replaced with the new inplace attribute. This gives
a uniform way of expressing the more efficient way of bufferization.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102538
Broadcast dimensions of vector transfer ops are always in-bounds. This is consistent with the fact that the starting position of a transfer is always in-bounds.
Differential Revision: https://reviews.llvm.org/D102566
Splitting the memref dialect led to the introduction of several dependencies
to avoid compilation issues. The canonicalize pass also depends on the
memref dialect, but it shouldn't. This patch resolves the dependencies
and removes the unintuitive includes. However, the dependency moves
to the constructor of the std dialect.
Differential Revision: https://reviews.llvm.org/D102060
Replace the templated linalgLowerOpToLoops method by three specialized methods linalgOpToLoops, LinalgOpToParallelLoops, and linalgOpToAffineLoops.
Differential Revision: https://reviews.llvm.org/D102324
Add TransferWritePermutationLowering, which replaces permutation maps of TransferWriteOps with vector.transpose.
Differential Revision: https://reviews.llvm.org/D102548
We are moving from just dense/compressed to more general dim level
types, so we need more than just an "i1" array for annotations.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102520
This change allows the SRC and DST of dma_start operations to be located in the
same memory space. This applies to both the Affine dialect and Memref dialect
versions of these Ops. The documentation has been updated to reflect this by
explicitly stating overlapping memory locations are not supported (undefined
behavior).
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D102274
This covers the extremely common case of replacing all uses of a Value
with a new op that is itself a user of the original Value.
This should also be a little bit more efficient than the
`SmallPtrSet<Operation *, 1>{op}` idiom that was being used before.
Differential Revision: https://reviews.llvm.org/D102373
Support OpImageQuerySize in spirv dialect
co-authored-by: Alan Liu <alanliu.yf@gmail.com>
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D102029
Broadcast dimensions of a vector transfer op have no corresponding dimension in the mask vector. E.g., a 2-D TransferReadOp, where one dimension is a broadcast, can have a 1-D `mask` attribute.
This commit also adds a few additional transfer op integration tests for various combinations of broadcasts, masking, dim transposes, etc.
Differential Revision: https://reviews.llvm.org/D101745
Broadcast dimensions of a vector transfer op have no corresponding dimension in the mask vector. E.g., a 2-D TransferReadOp, where one dimension is a broadcast, can have a 1-D `mask` attribute.
This commit also adds a few additional transfer op integration tests for various combinations of broadcasts, masking, dim transposes, etc.
Differential Revision: https://reviews.llvm.org/D101745
The current static checker for linalg does not handle decreasing
index cases well. This patch updates the static bound checker
for linalg to cover decreasing index cases.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D102302
Add a conversion pass to convert higher-level type before translation.
This conversion extracts meaningful information and packs it into a struct that
the translation (D101504) will be able to understand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D102170
First step in adding alignment as an attribute to MLIR global definitions. Alignment can be specified for global objects in LLVM IR. It can also be specified as a named attribute in the LLVMIR dialect of MLIR. However, this attribute has no standing and is discarded during translation from MLIR to LLVM IR. This patch does two things: First, it adds the attribute to the syntax of the llvm.mlir.global operation, and by doing this it also adds accessors and verifications. The syntax is "align=XX" (with XX being an integer), placed right after the value of the operation. Second, it allows transforming this operation to and from LLVM IR. It is checked whether the value is an integer power of 2.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D101492
This is actually necessary for correctness, as memref.reinterpret_cast
doesn't verify if the output shape doesn't match the static sizes.
Differential Revision: https://reviews.llvm.org/D102232
VectorTransfer split previously only split read xfer ops. This adds
the same logic to write ops. The resulting code involves 2
conditionals for write ops while read ops only needed 1, but the created
ops are built upon the same patterns, so pattern matching/expectations
are all consistent other than in regards to the if/else ops.
Differential Revision: https://reviews.llvm.org/D102157
All glue and clutter in the linalg ops has been replaced by proper
sparse tensor type encoding. This code is no longer needed. Thanks
to ntv@ for giving us a temporary home in linalg.
So long, and thanks for all the fish.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102098
A very elaborate, but also very fun revision because all
puzzle pieces are finally "falling in place".
1. replaces linalg annotations + flags with proper sparse tensor types
2. adds rigorous verification on sparse tensor types and sparse primitives
3. removes glue and clutter on opaque pointers in favor of sparse tensor types
4. migrates all tests to use sparse tensor types
NOTE: next CL will remove *all* obsoleted sparse code in Linalg
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102095
According to the API contract, LinalgLoopDistributionOptions
expects to work on parallel iterators. When getting processor
information, only loop ranges for parallel dimensions should
be fed in. But right now after generating scf.for loop nests,
we feed in *all* loops, including the ones materialized for
reduction iterators. This can cause unexpected distribution
of reduction dimensions. This commit fixes it.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D102079
In the buffer deallocation pass, unranked memref types are not properly supported.
After investigating this issue, it turns out that the Clone and Dealloc operations
do not support unranked memref types in the current implementation.
This patch adds the missing feature and enables the transformation of any memref
type.
This patch solves this bug: https://bugs.llvm.org/show_bug.cgi?id=48385
Differential Revision: https://reviews.llvm.org/D101760
The current design uses a unique entry for each argument/result attribute, with the name of the entry being something like "arg0". This provides for a somewhat sparse design, but ends up being much more expensive (from a runtime perspective) in-practice. The design requires building a string every time we lookup the dictionary for a specific arg/result, and also requires N attribute lookups when collecting all of the arg/result attribute dictionaries.
This revision restructures the design to instead have an ArrayAttr that contains all of the attribute dictionaries for arguments and another for results. This design reduces the number of attribute name lookups to 1, and allows for O(1) lookup for individual element dictionaries. The major downside is that we can end up with larger memory usage, as the ArrayAttr contains an entry for each element even if that element has no attributes. If the memory usage becomes too problematic, we can experiment with a more sparse structure that still provides a lot of the wins in this revision.
This dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.
Differential Revision: https://reviews.llvm.org/D102035
Replace all `linalg.indexed_generic` ops by `linalg.generic` ops that access the iteration indices using the `linalg.index` op.
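A minimal sketch of how the iteration indices are now obtained inside a `linalg.generic` body (fragment; previously they were leading block arguments of the `linalg.indexed_generic` region):
```
^bb0(%out: f32):
  // Recover the iteration indices via linalg.index instead of block arguments.
  %i = linalg.index 0 : index
  %j = linalg.index 1 : index
  linalg.yield %out : f32
```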
Differential Revision: https://reviews.llvm.org/D101612
The pattern to convert subtensor ops to their rank-reduced versions
(by dropping unit-dims in the result) can also convert to a zero-rank
tensor. Handle that case.
This also fixes an OOB access bug in the existing pattern for such
cases.
Differential Revision: https://reviews.llvm.org/D101949
This exposes a lambda control instead of just a boolean to control unit
dimension folding.
This gives the user more control to pick a good heuristic.
Folding reshapes helps fusion opportunities but may generate sub-optimal
generic ops.
Differential Revision: https://reviews.llvm.org/D101917
Fixing a minor bug which led to the element type of the output being
modified when folding reshapes with a generic op.
Differential Revision: https://reviews.llvm.org/D101942
This untangles the MCContext and the MCObjectFileInfo. There is a circular
dependency between MCContext and MCObjectFileInfo. Currently this dependency
also exists during construction: You can't construct a MOFI without a MCContext
without constructing the MCContext with a dummy version of that MOFI first.
This removes this dependency during construction. In a perfect world,
MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the
MCContext, like other MC information. This is future work.
This also shifts/adds more information to the MCContext making it more
available to the different targets. Namely:
- TargetTriple
- ObjectFileType
- SubtargetInfo
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101462
These instructions map to SVE-specific intrinsics that accept a
predicate operand to support control flow in vector code.
Differential Revision: https://reviews.llvm.org/D100982
This patch adds support for vectorizing loops with 'iter_args'
implementing known reductions along the vector dimension. Compared to
the non-vector-dimension case, two additional things are done during
vectorization of such loops:
- The resulting vector returned from the loop is reduced to a scalar
using `vector.reduce`.
- In some cases a mask is applied to the vector yielded at the end of
the loop to prevent garbage values from being written to the
accumulator.
Vectorization of reduction loops is disabled by default. To enable it, a
map from loops to array of reduction descriptors should be explicitly passed to
`vectorizeAffineLoops`, or `vectorize-reductions=true` should be passed
to the SuperVectorize pass.
Current limitations:
- Loops with a non-unit step size are not supported.
- n-D vectorization with n > 1 is not supported.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100694
The old index op handling let the new index operations point back to the
producer block. As a result, after fusion some index operations in the
fused block had back references to the old producer block resulting in
illegal IR. The patch now relies on a block and value mapping to avoid
such back references.
Differential Revision: https://reviews.llvm.org/D101887
While we figure out how to best add Standard support for scalable
vectors, these instructions provide a workaround for basic arithmetic
between scalable vectors.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100837
This revision migrates more code from Linalg into the new permanent home of
SparseTensor. It replaces the test passes with proper compiler passes.
NOTE: the actual removal of the last glue and clutter in Linalg will follow
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101811
TransferReadOps that are a scalar read + broadcast are handled by TransferReadToVectorLoadLowering.
Differential Revision: https://reviews.llvm.org/D101808