Some rewriters take more iterations to converge, add a parameter to overwrite
the built-in maximum iteration count.
Fix PR48073.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D91553
This replaces the old type decomposition logic that was previously mixed
into bufferization, and makes it easily accessible.
This also deletes TestFinalizingBufferize, because after we remove the type
decomposition, it doesn't do anything that is not already provided by
func-bufferize.
Differential Revision: https://reviews.llvm.org/D90899
The current code allows strided layouts, but the number of elements allocated is ambiguous. It could be either the number of elements in the shape (the current implementation), or the amount of elements required to not index out-of-bounds with the given maps (which would require evaluating the layout map).
If we require the canonical layouts, the two will be the same.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D91523
This adds a simple definition of a "workshare loop" operation for
the OpenMP MLIR dialect, excluding the "reduction" and "allocate"
clauses and without a custom parser and pretty printer.
The schedule clause also does not yet accept the modifiers that are
permitted in OpenMP 5.0.
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Reviewed By: ftynse, clementval
Differential Revision: https://reviews.llvm.org/D86071
The logic of vector on boolean was missed. This patch adds the logic and test on
it.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D91403
scf.parallel is currently not a good fit for tiling on tensors.
Instead provide a path to parallelism directly through scf.for.
For now, this transformation ignores the distribution scheme and always does a block-cyclic mapping (where block is the tile size).
Differential revision: https://reviews.llvm.org/D90475
motivated by a refactoring in the new sparse code (yet to be merged), this avoids some lengthy code dup
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D91465
Support multi-dimension vector for InsertMap/ExtractMap op and update the
transformations. Currently the relation between IDs and dimension is implicitly
deduced from the types. We can then calculate an AffineMap based on it. In the
future the AffineMap could be part of the operation itself.
Differential Revision: https://reviews.llvm.org/D90995
Depends On D89958
1. Adds `async.group`/`async.awaitall` to group together multiple async tokens/values
2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop.
Example:
```
scf.for (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
"do_some_compute"(%i, %j): () -> ()
}
```
Converted to:
```
%c0 = constant 0 : index
%c1 = constant 1 : index
// Compute blocks sizes for each induction variable.
%num_blocks_i = ... : index
%num_blocks_j = ... : index
%block_size_i = ... : index
%block_size_j = ... : index
// Create an async group to track async execute ops.
%group = async.create_group
scf.for %bi = %c0 to %num_blocks_i step %c1 {
%block_start_i = ... : index
%block_end_i = ... : index
scf.for %bj = %c0 t0 %num_blocks_j step %c1 {
%block_start_j = ... : index
%block_end_j = ... : index
// Execute the body of original parallel operation for the current
// block.
%token = async.execute {
scf.for %i = %block_start_i to %block_end_i step %si {
scf.for %j = %block_start_j to %block_end_j step %sj {
"do_some_compute"(%i, %j): () -> ()
}
}
}
// Add produced async token to the group.
async.add_to_group %token, %group
}
}
// Await completion of all async.execute operations.
async.await_all %group
```
In this example outer loop launches inner block level loops as separate async
execute operations which will be executed concurrently.
At the end it waits for the completiom of all async execute operations.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89963
The index type does not have a bitsize and hence the size of corresponding allocations cannot be computed. Instead, the promotion pass now has an explicit option to specify the size of index.
Differential Revision: https://reviews.llvm.org/D91360
This exposes a hook to configure legality of operations such that only
`scf.parallel` operations that have mapping attributes are marked as
illegal. Consequently, the transformation can now also be applied to
mixed forms.
Differential Revision: https://reviews.llvm.org/D91340
A recent refactoring removed the need to interleave verifier passes and instead opted to verify during the normal execution of passes instead. As such, the old verify pass is no longer necessary and can be removed.
Differential Revision: https://reviews.llvm.org/D91212
This revision adds support in the parser/printer for "deferrable" aliases, i.e. those that can be resolved after printing has finished. This allows for printing aliases for operation locations after the module instead of before, i.e. this is now supported:
```
"foo.op"() : () -> () loc(#loc)
#loc = loc("some_location")
```
Differential Revision: https://reviews.llvm.org/D91227
This removes the need to have an explicit `cast<>` given that we always know it `isa` instance of the interface.
Differential Revision: https://reviews.llvm.org/D91304
Some users have native c++ data types that correspond to floating point values stored within a DenseElementsAttr that do not have a corresponding native C++ data type(e.g. bfloat16/half/etc). This revision allows for such users to use those native types directly, and removes the need to go through APFloat when the much faster native value path is available.
Differential Revision: https://reviews.llvm.org/D91402
07f1047f41 changed the CMake detection to use find_package(Python3 ...
but didn't update the lit configuration to use the expected Python3_EXECUTABLE
cmake variable to point to the interpreter path.
This resulted in an empty path on MacOS.
- Move isSupportedMemRefType() to ConvertToLLVMPatterns and check if the
memref element type is supported there.
Differential Revision: https://reviews.llvm.org/D91374
The previous code defined it as allocating a new memref for its result.
However, this is not how it is treated by the dialect conversion framework,
that does the equivalent of inserting and folding it away internally
(even independent of any canonicalization patterns that we have
defined).
The semantics as they were previously written were also very
constraining: Nontrivial analysis is needed to prove that the new
allocation isn't needed for correctness (e.g. to avoid aliasing).
By removing those semantics, we avoid losing that information.
Differential Revision: https://reviews.llvm.org/D91382
We lower them to a std.global_memref (uniqued by constant value) + a
std.get_global_memref to produce the corresponding memref value.
This allows removing Linalg's somewhat hacky lowering of tensor
constants, now that std properly supports this.
Differential Revision: https://reviews.llvm.org/D91306
It was incorrect in the presence of a tensor argument with multiple
uses.
The bufferization of subtensor_insert was writing into a converted
memref operand, but there is no guarantee that the converted memref for
that operand is safe to write into. In this case, the same converted
memref is written to in-place by the subtensor_insert bufferization,
violating the tensor-level semantics.
I left some comments in a TODO about ways forward on this. I will be
working actively on this problem in the coming days.
Differential Revision: https://reviews.llvm.org/D91371
- Remove the default valued arguments from these functions.
- Besides FuncOp, looks like no other in-tree op is using these functions.
Differential Revision: https://reviews.llvm.org/D91369
The tokens are already handled by the lexer. This revision exposes them
through the parser interface.
This revision also adds missing functions for question mark parsing and
completes the list of valid punctuation tokens in the documentation.
Differential Revision: https://reviews.llvm.org/D90907
Add an ODS-backed generator of default builders. This currently does not
support operation with attribute arguments, for which the builder is
just ignored. Attribute support will be introduced separately for
builders and accessors.
Default builders are always generated with the same number of result and
operand groups as the ODS specification, i.e. one group per each operand
or result. Optional elements accept None but cannot be omitted. Variadic
groups accept iterable objects and cannot be replaced with a single
object.
For some operations, it is possible to infer the result type given the
traits, but most traits rely on inline pieces of C++ that we cannot
(yet) forward to Python bindings. Since the Ops where the inference is
possible (having the `SameOperandAndResultTypes` trait or
`TypeMatchesWith` without transform field) are a small minority, they
also require the result type to make the builder syntax more consistent.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D91190
Added documentation about the bufferization features.
Furthermore, the usage of pre- and post-processing is described.
This also includes information about optimization functionalities.
Differential Revision: https://reviews.llvm.org/D90675
This change does two main things
1) An operation might have multiple dependences to the same
producer. Not tracking them correctly can result in incorrect code
generation with fusion. To rectify this the dependence tracking
needs to also have the operand number in the consumer.
2) Improve the logic used to find the fused loops making it easier to
follow. The only constraint for fusion is that linalg ops (on
buffers) have update semantics for the result. Fusion should be
such that only one iteration of the fused loop (which is also a
tiled loop) must touch only one (disjoint) tile of the output. This
could be relaxed by allowing for recomputation that is the default
when oeprands are tensors, or can be made legal with promotion of
the fused view (in future).
Differential Revision: https://reviews.llvm.org/D90579
Exposing the C versions of the methods of the sparse runtime support lib
through header files will enable using the same methods in an MLIR program
as well as a C++ program, which will simplify future benchmarking comparisons
(e.g. comparing MLIR generated code with eigen for Matrix Market sparse matrices).
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91316
This CL integrates the new sparse annotations (hereto merely added as fully
transparent attributes) more tightly to the generic linalg op in order to add
verification of the annotations' consistency as well as to make make other
passes more aware of their presence (in the long run, rewriting rules must
preserve the integrity of the annotations).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D91224
Previous the textual form of the pass pipeline would implicitly nest,
instead we opt for the explicit form here: this has less surprise.
This also avoids asserting in the bindings when passing a pass pipeline
with incorrect nesting.
Differential Revision: https://reviews.llvm.org/D91233
If block A and B are in different regions and region of A is not an ancestor of
B, either A is included in region of B or the two regions are disjoint. In both
case A doesn't post-dominate B.
Differential Revision: https://reviews.llvm.org/D91225
The MLIR_ASYNCRUNTIME_EXPORT macro was being defined to be either
__declspec(dllexport) or __declspec(dllimport), depending on whether
mlir_c_runner_utils_EXPORTS is defined. The latter was a copy/paste
error and should have been mlir_async_runtime_EXPORTS.
Additionally, the uses of that macro in the .cpp file were unnecessary,
as only function declarations need to be exported, not their definitions.
Differential Revision: https://reviews.llvm.org/D91196
The previous logic for inlining a region A with N blocks into region B
would produce incorrect results on rollback for N greater than 1. This
rollback logic would leave blocks 1..N in region B and only move block 0
to region A.
The new inlining action recording stores the block move actions from N-1
to 0. Now on roll back, block 0 is moved to region A and then 1..N is
appended to the list of blocks in region A.
Differential Revision: https://reviews.llvm.org/D91185
This adds getters for `llvm.align` and `llvm.noalias` strings that are used
as attribute names in the llvm dialect.
Differential Revision: https://reviews.llvm.org/D91166