This test is crashing 9 out of 10 runs in CI, but I can't reproduce
locally right now. Disabling to get the CI back to green and avoid
backsliding with more ASAN issues that would go unnoticed.
mlir-cpu-runner/math_polynomial_approx.mlir
This test case is currently failing on SystemZ, but it does not appear to
necessarily be a target specific problem. See discussion at
https://bugs.llvm.org/show_bug.cgi?id=51204.
Conversion to the LLVM dialect is being refactored to be more progressive and
is now performed as a series of independent passes converting different
dialects. These passes may produce `unrealized_conversion_cast` operations that
represent pending conversions between built-in and LLVM dialect types.
Historically, a more monolithic Standard-to-LLVM conversion pass did not need
these casts as all operations were converted in one shot. Previous refactorings
have led to the requirement of running the Standard-to-LLVM conversion pass to
clean up `unrealized_conversion_cast`s even though the IR had no standard
operations in it. The pass must have been also run the last among all to-LLVM
passes, in contradiction with the partial conversion logic. Additionally, the
way it was set up could produce invalid operations by removing casts between
LLVM and built-in types even when the consumer did not accept the uncasted
type, or could lead to cryptic conversion errors (recursive application of the
rewrite pattern on `unrealized_conversion_cast` as a means to indicate failure
to eliminate casts).
In fact, the need to eliminate A->B->A `unrealized_conversion_cast`s is not
specific to to-LLVM conversions and can be factored out into a separate type
reconciliation pass, which is achieved in this commit. While the cast operation
itself has a folder pattern, it is insufficient in most conversion passes as
the folder only applies to the second cast. Without complex legality setup in
the conversion target, the conversion infra will either consider the cast
operations valid and not fold them (a separate canonicalization would be
necessary to trigger the folding), or consider the first cast invalid upon
generation and stop with error. The pattern provided by the reconciliation pass
applies to the first cast operation instead. Furthermore, having a separate
pass makes it clear when `unrealized_conversion_cast`s could not have been
eliminated since it is the only reason why this pass can fail.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109507
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: herhut, silvas
Differential Revision: https://reviews.llvm.org/D105625
This was missing and also there was a bug in the lowering itself, which went unnoticed due to it.
Differential Revision: https://reviews.llvm.org/D105122
This reverts commit 652f4b5140.
Re-enable MLLIR JIT tests.
The MLIR Bot was updated to export LD_LIBRARY_PATH=/usr/lib64, which
seem to fix this issue.
Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group.
Reviewed By: ftynse, herhut
Differential Revision: https://reviews.llvm.org/D104780
The patch changes the pretty printed FillOp operand order from output, value to value, output. The change is a follow up to https://reviews.llvm.org/D104121 that passes the fill value using a scalar input instead of the former capture semantics.
Differential Revision: https://reviews.llvm.org/D104356
The approximation relays on range reduced version y \in [0, pi/2]. An input x will have
the property that sin(x) = sin(y), -sin(y), cos(y), -cos(y) depends on which quadrable x
is in, where sin(y) and cos(y) are approximated with 5th degree polynomial (of x^2).
As a result a single pattern can be used to compute approximation for both sine and cosine.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D104582
Depends On D103109
If any of the tokens/values added to the `!async.group` switches to the error state, than the group itself switches to the error state.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103203
Depends On D103102
Not yet implemented:
1. Error handling after synchronous await
2. Error handling for async groups
Will be addressed in the followup PRs
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103109
Depends On D95311
Previous automatic-ref-counting pass worked with high level async operations (e.g. async.execute), however async values reference counting is a runtime implementation detail.
New pass mostly relies on the save liveness analysis to place drop_ref operations, and does better verification of CFG with different liveIn sets in block successors.
This is almost NFC change. No new reference counting ideas, just a cleanup of the previous version.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D95390
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h
Working on a fix.
This reverts commit 8aa6c3765b.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.
Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp
The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
Differential Revision: https://reviews.llvm.org/D96425
Historically, Linalg To LLVM conversion subsumed numerous other conversions,
including (affine) loop lowerings to CFG and conversions from the Standard and
Vector dialects to the LLVM dialect. This was due to the insufficient support
for partial conversions in the infrastructure that essentially required
conversions that involve type change (in this case, !linalg.range to
!llvm.struct) to be performed in a single conversion sweep. This is no longer
the case so remove the subsumed conversions and run them as separate passes
when necessary.
Depends On D95317
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D96008
This revision improves the usage of the codegen strategy by adding a few flags that
make it easier to control for the CLI.
Usage of ModuleOp is replaced by FuncOp as this created issues in multi-threaded mode.
A simple benchmarking capability is added for linalg.matmul as well as linalg.matmul_column_major.
This latter op is also added to linalg.
Now obsolete linalg integration tests that also take too long are deleted.
Correctness checks are still missing at this point.
Differential revision: https://reviews.llvm.org/D95531
Depends On D95000
Move async.execute outlining and async -> async.runtime lowering into the separate Async transformation pass
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D95311
Continue the convergence between LLVM dialect and built-in types by replacing
the bfloat, half, float and double LLVM dialect types with their built-in
counterparts. At the API level, this is a direct replacement. At the syntax
level, we change the keywords to `bf16`, `f16`, `f32` and `f64`, respectively,
to be compatible with the built-in type syntax. The old keywords can still be
parsed but produce a deprecation warning and will be eventually removed.
Depends On D94178
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94179
The LLVM dialect type system has been closed until now, i.e. did not support
types from other dialects inside containers. While this has had obvious
benefits of deriving from a common base class, it has led to some simple types
being almost identical with the built-in types, namely integer and floating
point types. This in turn has led to a lot of larger-scale complexity: simple
types must still be converted, numerous operations that correspond to LLVM IR
intrinsics are replicated to produce versions operating on either LLVM dialect
or built-in types leading to quasi-duplicate dialects, lowering to the LLVM
dialect is essentially required to be one-shot because of type conversion, etc.
In this light, it is reasonable to trade off some local complexity in the
internal implementation of LLVM dialect types for removing larger-scale system
complexity. Previous commits to the LLVM dialect type system have adapted the
API to support types from other dialects.
Replace LLVMIntegerType with the built-in IntegerType plus additional checks
that such types are signless (these are isolated in a utility function that
replaced `isa<LLVMType>` and in the parser). Temporarily keep the possibility
to parse `!llvm.i32` as a synonym for `i32`, but add a deprecation notice.
Reviewed By: mehdi_amini, silvas, antiagainst
Differential Revision: https://reviews.llvm.org/D94178
1. Add new methods to Async runtime API to support yielding async values
2. Add lowering from `async.yield` with value payload to the new runtime API calls
`async.value` lowering requires that payload type is convertible to LLVM and supported by `llvm.mlir.cast` (DialectCast) operation.
Reviewed By: csigg
Differential Revision: https://reviews.llvm.org/D93592
Test flakiness was fixed by: 9edcedf7f2
Runs these tests to verify that all parts of the lowering work correctly.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D93384
Depends On D89963
**Automatic reference counting algorithm outline:**
1. `ReturnLike` operations forward the reference counted values without
modifying the reference count.
2. Use liveness analysis to find blocks in the CFG where the lifetime of
reference counted values ends, and insert `drop_ref` operations after
the last use of the value.
3. Insert `add_ref` before the `async.execute` operation capturing the
value, and pairing `drop_ref` before the async body region terminator,
to release the captured reference counted value when execution
completes.
4. If the reference counted value is passed only to some of the block
successors, insert `drop_ref` operations in the beginning of the blocks
that do not have reference coutned value uses.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D90716
Depends On D89958
1. Adds `async.group`/`async.awaitall` to group together multiple async tokens/values
2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop.
Example:
```
scf.for (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
"do_some_compute"(%i, %j): () -> ()
}
```
Converted to:
```
%c0 = constant 0 : index
%c1 = constant 1 : index
// Compute blocks sizes for each induction variable.
%num_blocks_i = ... : index
%num_blocks_j = ... : index
%block_size_i = ... : index
%block_size_j = ... : index
// Create an async group to track async execute ops.
%group = async.create_group
scf.for %bi = %c0 to %num_blocks_i step %c1 {
%block_start_i = ... : index
%block_end_i = ... : index
scf.for %bj = %c0 t0 %num_blocks_j step %c1 {
%block_start_j = ... : index
%block_end_j = ... : index
// Execute the body of original parallel operation for the current
// block.
%token = async.execute {
scf.for %i = %block_start_i to %block_end_i step %si {
scf.for %j = %block_start_j to %block_end_j step %sj {
"do_some_compute"(%i, %j): () -> ()
}
}
}
// Add produced async token to the group.
async.add_to_group %token, %group
}
}
// Await completion of all async.execute operations.
async.await_all %group
```
In this example outer loop launches inner block level loops as separate async
execute operations which will be executed concurrently.
At the end it waits for the completiom of all async execute operations.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89963
The pass combines patterns of ExpandAtomic, ExpandMemRefReshape,
StdExpandDivs passes. The pass is meant to legalize STD for conversion to LLVM.
Differential Revision: https://reviews.llvm.org/D91082
- Convert `global_memref` to LLVM::GlobalOp.
- Convert `get_global_memref` to a memref descriptor with a pointer to the first element
of the global stashed in it.
- Extend unit test and a mlir-cpu-runner test to validate the generated LLVM IR.
Differential Revision: https://reviews.llvm.org/D90803
This reverts commit 4986d5eaff with
proper patches to CMakeLists.txt:
- Add MLIRAsync as a dependency to MLIRAsyncToLLVM
- Add Coroutines as a dependency to MLIRExecutionEngine
Lower from Async dialect to LLVM by converting async regions attached to `async.execute` operations into LLVM coroutines (https://llvm.org/docs/Coroutines.html):
1. Outline all async regions to functions
2. Add LLVM coro intrinsics to mark coroutine begin/end
3. Use MLIR conversion framework to convert all remaining async types and ops to LLVM + Async runtime function calls
All `async.await` operations inside async regions converted to coroutine suspension points. Await operation outside of a coroutine converted to the blocking wait operations.
Implement simple runtime to support concurrent execution of coroutines.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89292
(1) simplify integer printing logic by always using 64-bit print
(2) add index support (since vector<16xindex> is planned to be added)
(3) adjust naming convention print_x -> printX
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D88436