Commit Graph

37 Commits

Author SHA1 Message Date
bakhtiyar ec0e4545ca Make AsyncParallelForRewrite parameterizable with a cost model which drives deciding the parallelization granularity.
Reviewed By: ezhulenev, mehdi_amini

Differential Revision: https://reviews.llvm.org/D115423
2021-12-19 08:41:01 -08:00
Eugene Zhulenev 49ce40e9ab [mlir] AsyncParallelFor: align block size to be a multiple of inner loops iterations
Depends On D115263

By aligning block size to inner loop iterations parallel_compute_fn LLVM can later unroll and vectorize some of the inner loops with small number of trip counts. Up to 2x speedup in multiple benchmarks.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115436
2021-12-09 06:50:50 -08:00
Eugene Zhulenev 9f151b784b [mlir] AsyncParallelFor: sink constants into the parallel compute function
With complex recursive structure of async dispatch function LLVM can't always propagate constants to the parallel_compute_fn and it often prevents optimizations like loop unrolling and vectorization. We help LLVM by pushing known constants into the parallel_compute_fn explicitly.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115263
2021-12-09 06:48:23 -08:00
Mehdi Amini ee0908703d Change the printing/parsing behavior for Attributes used in declarative assembly format
The new form of printing attribute in the declarative assembly is eliding the `#dialect.mnemonic` prefix to only keep the `<....>` part.

Differential Revision: https://reviews.llvm.org/D113873
2021-12-08 02:02:37 +00:00
Mogball a54f4eae0e [MLIR] Replace std ops with arith dialect ops
Precursor: https://reviews.llvm.org/D110200

Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.

Renamed all instances of operations in the codebase and in tests.

Reviewed By: rriddle, jpienaar

Differential Revision: https://reviews.llvm.org/D110797
2021-10-13 03:07:03 +00:00
bakhtiyar bdde959533 Remove unnecessary async group creates and awaits.
Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D110605
2021-09-28 14:52:08 -07:00
Eugene Zhulenev 92db09cde0 [mlir] AsyncRuntime: use int64_t for ref counting operations
Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D110550
2021-09-27 07:55:01 -07:00
Eugene Zhulenev fd52b4357a [mlir] Async: check awaited operand error state after sync await
Previously only await inside the async function (coroutine after lowering to async runtime) would check the error state

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D109229
2021-09-04 05:00:17 -07:00
Eugene Zhulenev b537c5b414 [mlir] Async: clone constants into async.execute functions and parallel compute functions
Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D107007
2021-08-02 12:17:41 -07:00
bakhtiyar 1c144410e7 Refactor AsyncToAsyncRuntime pass to boost understandability.
Depends On D106730

Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D106731
2021-07-29 12:01:07 -07:00
bakhtiyar 9a5bc83660 Add an escape-hatch for conversion of funcs with blocking awaits to coroutines.
Currently TFRT does not support top-level coroutines, so this functionality will allow to have a single blocking await at the top level until TFRT implements the necessary functionality.

Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D106730
2021-07-29 08:52:28 -07:00
bakhtiyar 6ea22d4626 Optionally eliminate blocking runtime.await calls by converting functions to coroutines.
Interop parallelism requires needs awaiting on results. Blocking awaits are bad for performance. TFRT supports lightweight resumption on threads, and coroutines are an abstraction than can be used to lower the kernels onto TFRT threads.

Reviewed By: ezhulenev

Differential Revision: https://reviews.llvm.org/D106508
2021-07-28 12:37:05 -07:00
Eugene Zhulenev de7a4e53a2 [mlir] Async: lower SCF operations into CFG inside coroutines
Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D106747
2021-07-24 14:36:26 -07:00
Eugene Zhulenev 6c1f655818 [mlir] Async: special handling for parallel loops with zero iterations
Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D106590
2021-07-23 01:22:59 -07:00
Eugene Zhulenev f57b2420b2 [mlir:Async] Add an async reference counting pass based on the user defined policy
Depends On D104999

Automatic reference counting based on the liveness analysis can add a lot of reference counting overhead at runtime. If the IR is known to be constrained to few particular "shapes", it's much more efficient to provide a custom reference counting policy that will specify where it is required to update the async value reference count.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D105037
2021-06-29 12:53:09 -07:00
Eugene Zhulenev 9ccdaac8f9 [mlir:Async] Fix a bug in automatic refence counting around function calls
Depends On D104998

Function calls "transfer ownership" to the callee and it puts additional constraints on the reference counting optimization pass

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D104999
2021-06-29 09:35:43 -07:00
Eugene Zhulenev a8f819c6d8 [mlir:Async] Remove async operations if it is statically known that the parallel operation has a single compute block
Depends On D104850

Add a test that verifies that canonicalization removes all async overheads if it is statically known that the scf.parallel operation will be computed using a single block.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D104891
2021-06-29 09:26:28 -07:00
Eugene Zhulenev 34a164c938 [mlir:Async] Submit accidentally omitted changes
Accidentally pushed old branches that did not include all the changes discussed in the PRs.

https://reviews.llvm.org/rGd43b23608ad664f02f56e965ca78916bde220950
https://reviews.llvm.org/rG86ad0af87054c3cccd68d32e103a6f1f6c6194c7

Differential Revision: https://reviews.llvm.org/D104943
2021-06-25 12:23:02 -07:00
Eugene Zhulenev 86ad0af870 [mlir:Async] Implement recursive async work splitting for scf.parallel operation (async-parallel-for pass)
Depends On D104780

Recursive work splitting instead of sequential async tasks submission gives ~20%-30% speedup in microbenchmarks.

Algorithm outline:
1. Collapse scf.parallel dimensions into a single dimension
2. Compute the block size for the parallel operations from the 1d problem size
3. Launch parallel tasks
4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space
5. Each parallel task computes the original parallel operation body using scf.for loop nest

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D104850
2021-06-25 10:34:39 -07:00
Eugene Zhulenev d43b23608a [mlir:Async] Add the size parameter to the async.group
Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group.

Reviewed By: ftynse, herhut

Differential Revision: https://reviews.llvm.org/D104780
2021-06-25 10:26:50 -07:00
Eugene Zhulenev d8c84d2a4e [mlir] Async: Add error propagation support to async groups
Depends On D103109

If any of the tokens/values added to the `!async.group` switches to the error state, than the group itself switches to the error state.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D103203
2021-05-27 09:35:11 -07:00
Eugene Zhulenev 39957aa424 [mlir] Add error state and error propagation to async runtime values
Depends On D103102

Not yet implemented:
1. Error handling after synchronous await
2. Error handling for async groups

Will be addressed in the followup PRs

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D103109
2021-05-27 09:28:47 -07:00
Eugene Zhulenev c412979cde [mlir] Async reference counting for block successors with divergent reference counted liveness
Support reference counted values implicitly passed (live) only to some of the successors.

Example: if branched to ^bb2 token will leak, unless `drop_ref` operation is properly created

```
^entry:
  %token = async.runtime.create : !async.token
   cond_br %cond, ^bb1, ^bb2
^bb1:
  async.runtime.await %token
  async.runtime.drop_ref %token
  br ^bb2
^bb2:
  return
```

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D103102
2021-05-27 09:21:59 -07:00
Eugene Zhulenev a6628e596e [mlir] Async: add automatic reference counting at async.runtime operations level
Depends On D95311

Previous automatic-ref-counting pass worked with high level async operations (e.g. async.execute), however async values reference counting is a runtime implementation detail.

New pass mostly relies on the save liveness analysis to place drop_ref operations, and does better verification of CFG with different liveIn sets in block successors.

This is almost NFC change. No new reference counting ideas, just a cleanup of the previous version.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D95390
2021-04-12 18:54:55 -07:00
Julian Gross e2310704d8 [MLIR] Create memref dialect and move dialect-specific ops from std.
Create the memref dialect and move dialect-specific ops
from std dialect to this dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
AssumeAlignmentOp -> MemRef_AssumeAlignmentOp
DeallocOp -> MemRef_DeallocOp
DimOp -> MemRef_DimOp
MemRefCastOp -> MemRef_CastOp
MemRefReinterpretCastOp -> MemRef_ReinterpretCastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
LoadOp -> MemRef_LoadOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
SubViewOp -> MemRef_SubViewOp
TransposeOp -> MemRef_TransposeOp
TensorLoadOp -> MemRef_TensorLoadOp
TensorStoreOp -> MemRef_TensorStoreOp
TensorToMemRefOp -> MemRef_BufferCastOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667

Differential Revision: https://reviews.llvm.org/D98041
2021-03-15 11:14:09 +01:00
Alexander Belyaev a89035d750 Revert "[MLIR] Create memref dialect and move several dialect-specific ops from std."
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h

Working on a fix.

This reverts commit 8aa6c3765b.
2021-02-18 12:49:52 +01:00
Julian Gross 8aa6c3765b [MLIR] Create memref dialect and move several dialect-specific ops from std.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667

Differential Revision: https://reviews.llvm.org/D96425
2021-02-18 11:29:39 +01:00
Eugene Zhulenev 25f80e16d1 [mlir] Async: add a separate pass to lower from async to async.coro and async.runtime
Depends On D95000

Move async.execute outlining and async -> async.runtime lowering into the separate Async transformation pass

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D95311
2021-01-26 03:33:20 -08:00
Eugene Zhulenev 2f7baffdc1 [mlir:async] Use ODS to define async types
Depends On D94923

Migrate Async dialect to ODS `TypeDef`

Reviewed By: ftynse, rriddle

Differential Revision: https://reviews.llvm.org/D95000
2021-01-26 02:37:50 -08:00
Eugene Zhulenev 9c53b8e52e [mlir:Async] Add intermediate async.coro and async.runtime operations to simplify Async to LLVM lowering
[NFC] No new functionality, mostly a cleanup and one more abstraction level between Async and LLVM IR.

Instead of lowering from Async to LLVM coroutines and Async Runtime API in one shot, do it progressively via async.coro and async.runtime operations.

1. Lower from async to async.runtime/coro (e.g. async.execute to function with coro setup and runtime calls)
2. Lower from async.runtime/coro to LLVM intrinsics and runtime API calls

Intermediate coro/runtime operations will allow to run transformations on a higher level IR and do not try to match IR based on the LLVM::CallOp properties.

Although async.coro is very close to LLVM coroutines, it is not exactly the same API, instead it is optimized for usability in async lowering, and misses a lot of details that are present in @llvm.coro intrinsic.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D94923
2021-01-25 14:04:33 -08:00
Kazuaki Ishizaki 2b638ed5a1 [mlir] NFC: fix trivial typos
fix typos under docs, test, and tools directories

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D94158
2021-01-07 02:36:02 +09:00
Eugene Zhulenev a86a9b5ef7 [mlir] Automatic reference counting for Async values + runtime support for ref counted objects
Depends On D89963

**Automatic reference counting algorithm outline:**

1. `ReturnLike` operations forward the reference counted values without
    modifying the reference count.
2. Use liveness analysis to find blocks in the CFG where the lifetime of
   reference counted values ends, and insert `drop_ref` operations after
   the last use of the value.
3. Insert `add_ref` before the `async.execute` operation capturing the
   value, and pairing `drop_ref` before the async body region terminator,
   to release the captured reference counted value when execution
   completes.
4. If the reference counted value is passed only to some of the block
   successors, insert `drop_ref` operations in the beginning of the blocks
   that do not have reference coutned value uses.

Reviewed By: silvas

Differential Revision: https://reviews.llvm.org/D90716
2020-11-20 03:08:44 -08:00
Eugene Zhulenev c30ab6c2a3 [mlir] Transform scf.parallel to scf.for + async.execute
Depends On D89958

1. Adds `async.group`/`async.awaitall` to group together multiple async tokens/values
2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop.

Example:

```
   scf.for (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) {
     "do_some_compute"(%i, %j): () -> ()
   }
```

Converted to:

```
   %c0 = constant 0 : index
   %c1 = constant 1 : index

   // Compute blocks sizes for each induction variable.
   %num_blocks_i = ... : index
   %num_blocks_j = ... : index
   %block_size_i = ... : index
   %block_size_j = ... : index

   // Create an async group to track async execute ops.
   %group = async.create_group

   scf.for %bi = %c0 to %num_blocks_i step %c1 {
     %block_start_i = ... : index
     %block_end_i   = ... : index

     scf.for %bj = %c0 t0 %num_blocks_j step %c1 {
       %block_start_j = ... : index
       %block_end_j   = ... : index

       // Execute the body of original parallel operation for the current
       // block.
       %token = async.execute {
         scf.for %i = %block_start_i to %block_end_i step %si {
           scf.for %j = %block_start_j to %block_end_j step %sj {
             "do_some_compute"(%i, %j): () -> ()
           }
         }
       }

       // Add produced async token to the group.
       async.add_to_group %token, %group
     }
   }

   // Await completion of all async.execute operations.
   async.await_all %group
```
In this example outer loop launches inner block level loops as separate async
execute operations which will be executed concurrently.

At the end it waits for the completiom of all async execute operations.

Reviewed By: ftynse, mehdi_amini

Differential Revision: https://reviews.llvm.org/D89963
2020-11-13 04:02:56 -08:00
Eugene Zhulenev 61dce0f308 [mlir] Add async.await operation to async dialect
Add async.await operation to "unwrap" async.values

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D89137
2020-10-12 21:05:36 -07:00
Eugene Zhulenev 4e69a52952 [MLIR] Add async token/value arguments to async.execute op
Async execute operation can take async arguments as dependencies.

Change `async.execute` custom parser/printer format to use `%value as %unwrapped: !async.value<!type>` sytax.

Reviewed By: mehdi_amini, herhut

Differential Revision: https://reviews.llvm.org/D88601
2020-10-09 08:52:27 -07:00
Eugene Zhulenev 655af658c9 [MLIR] Add async.value type to Async dialect
Return values from async regions as !async.value<...>.

Reviewed By: mehdi_amini, csigg

Differential Revision: https://reviews.llvm.org/D88510
2020-09-30 11:30:06 -07:00
Eugene Zhulenev 05a3b4fe30 [MLIR] Add Async dialect with trivial async.region operation
Start Async dialect for modeling asynchronous execution.

Reviewed By: mehdi_amini, herhut

Differential Revision: https://reviews.llvm.org/D88459
2020-09-29 11:11:08 -07:00