ExecutionEngine/LLJIT does not run the destructors of globals in loaded dynamic libraries when it is destroyed, and threads managed by ThreadPool can race with program termination, which leads to segfaults.
TODO: Re-enable threading after fixing the problem with destructors, or after removing static globals from the dynamic library.
Differential Revision: https://reviews.llvm.org/D92368
1. Move ThreadPool ownership to the runtime, and wait for the async tasks completion in the destructor.
2. Remove MLIR_ASYNCRUNTIME_EXPORT from method definitions because they are unnecessary in .cpp files, as only function declarations need to be exported, not their definitions.
3. Fix concurrency bugs in group emplace and potential use-after-free in token emplace.
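A minimal sketch of item 1, assuming a hand-rolled worker pool rather than `llvm::ThreadPool`. The class `SketchRuntime` and the helper `runAndCount` are hypothetical names used only for illustration; the point is that the runtime owns its threads and its destructor drains all pending tasks before returning.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the async runtime: it owns its worker threads,
// so destroying the runtime drains all pending tasks first.
class SketchRuntime {
public:
  explicit SketchRuntime(unsigned numThreads) {
    for (unsigned i = 0; i < numThreads; ++i)
      workers.emplace_back([this] { workerLoop(); });
  }

  // Request shutdown, let the workers finish every queued task, then join.
  ~SketchRuntime() {
    {
      std::lock_guard<std::mutex> lock(mu);
      shuttingDown = true;
    }
    cv.notify_all();
    for (std::thread &worker : workers)
      worker.join();
  }

  // Enqueue a task for asynchronous execution.
  void async(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu);
      tasks.push(std::move(task));
    }
    cv.notify_one();
  }

private:
  void workerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [this] { return shuttingDown || !tasks.empty(); });
        if (tasks.empty())
          return; // Only reachable once shutdown was requested.
        task = std::move(tasks.front());
        tasks.pop();
      }
      task();
    }
  }

  std::mutex mu;
  std::condition_variable cv;
  std::queue<std::function<void()>> tasks;
  bool shuttingDown = false;
  std::vector<std::thread> workers;
};

// Runs `numTasks` increments through a scoped runtime and returns the count
// observed after the runtime has been destroyed.
int runAndCount(int numTasks) {
  std::atomic<int> counter{0};
  {
    SketchRuntime runtime(4);
    for (int i = 0; i < numTasks; ++i)
      runtime.async([&counter] { counter.fetch_add(1); });
  } // The destructor waits for all tasks here.
  return counter.load();
}
```

Because the wait happens in the destructor, no task can still be running once the owning object is gone, which is the property the change relies on.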
Tested internally with 10k runs of `async.mlir` and `async-group.mlir`.
Fixed: https://bugs.llvm.org/show_bug.cgi?id=48267
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D91988
Depends On D89963
**Automatic reference counting algorithm outline:**
1. `ReturnLike` operations forward the reference counted values without
modifying the reference count.
2. Use liveness analysis to find blocks in the CFG where the lifetime of
reference counted values ends, and insert `drop_ref` operations after
the last use of the value.
3. Insert `add_ref` before the `async.execute` operation capturing the
value, and pairing `drop_ref` before the async body region terminator,
to release the captured reference counted value when execution
completes.
4. If the reference counted value is passed only to some of the block
successors, insert `drop_ref` operations at the beginning of the blocks
that do not have reference counted value uses.
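Step 3 of the outline can be illustrated on the runtime side with a small sketch. It assumes an intrusive atomic counter; `RefCounted`, `addRef`, `dropRef`, and `captureInAsyncBody` are hypothetical names, and a plain `std::thread` stands in for the `async.execute` body.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical reference counted object. It starts with one reference owned
// by its creator and deletes itself when the count reaches zero.
struct RefCounted {
  explicit RefCounted(bool *destroyedFlag) : destroyed(destroyedFlag) {}
  ~RefCounted() { *destroyed = true; }

  void addRef(int32_t count = 1) { refCount.fetch_add(count); }

  void dropRef(int32_t count = 1) {
    // fetch_sub returns the previous value, so this was the last reference
    // when the previous value equals the number of references dropped.
    if (refCount.fetch_sub(count) == count)
      delete this;
  }

  std::atomic<int32_t> refCount{1};
  bool *destroyed;
};

// Mirrors step 3: add_ref before launching the async body, a paired drop_ref
// at the end of the body, and a drop_ref after the last use in the parent.
bool captureInAsyncBody() {
  bool destroyed = false;
  auto *value = new RefCounted(&destroyed);

  value->addRef(); // add_ref before "async.execute" captures the value.
  std::thread body([value] {
    // ... the async body would use `value` here ...
    value->dropRef(); // Paired drop_ref before the region terminator.
  });

  value->dropRef(); // drop_ref after the last use in the parent.
  body.join();
  return destroyed; // True: both references were released exactly once.
}
```

Whichever side drops last destroys the value, so neither the parent nor the async body needs to know which of them finishes first.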
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D90716
Depends On D89958
1. Add `async.group`/`async.await_all` to group together multiple async tokens/values.
2. Rewrite the scf.parallel operation into multiple concurrent async.execute operations over non-overlapping subranges of the original loop.
Example:
```
scf.parallel (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
  "do_some_compute"(%i, %j): () -> ()
}
```
Converted to:
```
%c0 = constant 0 : index
%c1 = constant 1 : index
// Compute blocks sizes for each induction variable.
%num_blocks_i = ... : index
%num_blocks_j = ... : index
%block_size_i = ... : index
%block_size_j = ... : index
// Create an async group to track async execute ops.
%group = async.create_group
scf.for %bi = %c0 to %num_blocks_i step %c1 {
  %block_start_i = ... : index
  %block_end_i = ... : index
  scf.for %bj = %c0 to %num_blocks_j step %c1 {
    %block_start_j = ... : index
    %block_end_j = ... : index
    // Execute the body of original parallel operation for the current
    // block.
    %token = async.execute {
      scf.for %i = %block_start_i to %block_end_i step %si {
        scf.for %j = %block_start_j to %block_end_j step %sj {
          "do_some_compute"(%i, %j): () -> ()
        }
      }
    }
    // Add produced async token to the group.
    async.add_to_group %token, %group
  }
}
// Await completion of all async.execute operations.
async.await_all %group
```
In this example the outer loops launch the inner block-level loops as separate
async execute operations, which are executed concurrently.
At the end it waits for the completion of all async execute operations.
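The same blocking scheme can be sketched in C++ for a one-dimensional loop, assuming `std::async` tasks in place of `async.execute` and a vector of futures in place of the async group. `parallelFor` and `squares` are hypothetical helpers, not part of MLIR.

```cpp
#include <algorithm>
#include <functional>
#include <future>
#include <vector>

// Hypothetical helper: runs body(i) for i = lb, lb + step, ... < ub, split
// into at most `numBlocks` non-overlapping blocks that execute concurrently.
void parallelFor(long lb, long ub, long step, long numBlocks,
                 const std::function<void(long)> &body) {
  if (step <= 0 || numBlocks <= 0 || ub <= lb)
    return;

  // Compute the block size from the trip count (ceiling divisions).
  const long tripCount = (ub - lb + step - 1) / step;
  const long blockSize = (tripCount + numBlocks - 1) / numBlocks;

  // The "group" tracks one future per launched block.
  std::vector<std::future<void>> group;
  for (long start = 0; start < tripCount; start += blockSize) {
    const long end = std::min(start + blockSize, tripCount);
    group.push_back(std::async(std::launch::async, [=, &body] {
      // Execute the original loop body for the current block.
      for (long k = start; k < end; ++k)
        body(lb + k * step);
    }));
  }

  // Await completion of all blocks.
  for (std::future<void> &token : group)
    token.get();
}

// Usage example: each iteration writes a distinct element, so the blocks
// never touch the same memory.
std::vector<int> squares(long n, long numBlocks) {
  std::vector<int> out(n, 0);
  parallelFor(0, n, 1, numBlocks,
              [&out](long i) { out[i] = static_cast<int>(i * i); });
  return out;
}
```

As in the MLIR example, correctness depends on the blocks covering disjoint subranges, so the body needs no synchronization of its own.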
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D89963
The MLIR_ASYNCRUNTIME_EXPORT macro was being defined to be either
__declspec(dllexport) or __declspec(dllimport), depending on whether
mlir_c_runner_utils_EXPORTS is defined. The latter was a copy/paste
error and should have been mlir_async_runtime_EXPORTS.
Additionally, the uses of that macro in the .cpp file were unnecessary,
as only function declarations need to be exported, not their definitions.
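A sketch of the intended pattern, assuming the usual Windows-only guard around the macro. The function `sketchAddOne` is hypothetical, and the header and .cpp parts are shown in one listing only for brevity; in a real build they live in separate files.

```cpp
// --- Header part (sketch) ---
#ifdef _WIN32
#ifndef MLIR_ASYNCRUNTIME_EXPORT
#ifdef mlir_async_runtime_EXPORTS
// We are building the async runtime library itself.
#define MLIR_ASYNCRUNTIME_EXPORT __declspec(dllexport)
#else
// We are using the async runtime library.
#define MLIR_ASYNCRUNTIME_EXPORT __declspec(dllimport)
#endif
#endif
#else
// Non-Windows platforms need no annotation.
#define MLIR_ASYNCRUNTIME_EXPORT
#endif

// Only the declaration carries the export macro.
extern "C" MLIR_ASYNCRUNTIME_EXPORT int sketchAddOne(int value);

// --- .cpp part (sketch) ---
// The definition repeats the signature without the macro.
extern "C" int sketchAddOne(int value) { return value + 1; }
```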
Differential Revision: https://reviews.llvm.org/D91196
This reverts commit 4986d5eaff with
proper patches to CMakeLists.txt:
- Add MLIRAsync as a dependency to MLIRAsyncToLLVM
- Add Coroutines as a dependency to MLIRExecutionEngine
Lower from Async dialect to LLVM by converting async regions attached to `async.execute` operations into LLVM coroutines (https://llvm.org/docs/Coroutines.html):
1. Outline all async regions to functions
2. Add LLVM coro intrinsics to mark coroutine begin/end
3. Use MLIR conversion framework to convert all remaining async types and ops to LLVM + Async runtime function calls
All `async.await` operations inside async regions are converted to coroutine suspension points. Await operations outside of a coroutine are converted to blocking wait operations.
Implement a simple runtime to support concurrent execution of coroutines.
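The two kinds of await can be sketched without real coroutines by modeling the resumed part of a coroutine as a callback. `SketchToken`, `awaitAndExecute`, and `chainTwoStages` are hypothetical names for illustration and do not reproduce the runtime's actual API.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical async token supporting both kinds of await.
class SketchToken {
public:
  // Mark the token ready and resume every suspended awaiter.
  void emplace() {
    std::vector<std::function<void()>> toResume;
    {
      std::lock_guard<std::mutex> lock(mu);
      ready = true;
      toResume.swap(awaiters);
    }
    cv.notify_all();
    for (std::function<void()> &resume : toResume)
      resume();
  }

  // Await outside of a coroutine: block the calling thread.
  void await() {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [this] { return ready; });
  }

  // Await inside a coroutine: register the continuation and return at once,
  // which models a suspension point.
  void awaitAndExecute(std::function<void()> resume) {
    {
      std::lock_guard<std::mutex> lock(mu);
      if (!ready) {
        awaiters.push_back(std::move(resume));
        return;
      }
    }
    resume(); // Already ready: continue immediately.
  }

private:
  std::mutex mu;
  std::condition_variable cv;
  bool ready = false;
  std::vector<std::function<void()>> awaiters;
};

// A producer makes `first` ready, a suspended continuation then completes
// `second`, and the caller blocks on `second` from outside any coroutine.
int chainTwoStages() {
  SketchToken first, second;
  int result = 0;

  first.awaitAndExecute([&] {
    result += 41;
    second.emplace();
  });

  std::thread producer([&] {
    result += 1;
    first.emplace();
  });

  second.await();
  producer.join(); // Join before the tokens go out of scope.
  return result;
}
```

The continuation runs on the thread that calls `emplace`, so no thread is parked while the awaited token is pending; only the await outside the "coroutine" blocks.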
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D89292