llvm-project

Commit Graph

Author	SHA1	Message	Date
Eugene Zhulenev	b537c5b414	[mlir] Async: clone constants into async.execute functions and parallel compute functions Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D107007	2021-08-02 12:17:41 -07:00
bakhtiyar	1c144410e7	Refactor AsyncToAsyncRuntime pass to boost understandability. Depends On D106730 Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D106731	2021-07-29 12:01:07 -07:00
bakhtiyar	9a5bc83660	Add an escape-hatch for conversion of funcs with blocking awaits to coroutines. Currently TFRT does not support top-level coroutines, so this functionality will allow a single blocking await at the top level until TFRT implements the necessary functionality. Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D106730	2021-07-29 08:52:28 -07:00
bakhtiyar	6ea22d4626	Optionally eliminate blocking runtime.await calls by converting functions to coroutines. Interop parallelism requires awaiting on results. Blocking awaits are bad for performance. TFRT supports lightweight resumption on threads, and coroutines are an abstraction that can be used to lower the kernels onto TFRT threads. Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D106508	2021-07-28 12:37:05 -07:00
Eugene Zhulenev	de7a4e53a2	[mlir] Async: lower SCF operations into CFG inside coroutines Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D106747	2021-07-24 14:36:26 -07:00
Eugene Zhulenev	6c1f655818	[mlir] Async: special handling for parallel loops with zero iterations Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D106590	2021-07-23 01:22:59 -07:00
Benjamin Kramer	ce857d3cfd	[mlir][async] Remove unused variable. NFC.	2021-07-01 12:24:55 +02:00
Stella Laurenzo	485cc55edf	[mlir] Generate .cpp.inc files for dialects. * Previously, we were only generating .h.inc files. We foresee the need to also generate implementations and this is a step towards that. * Discussed in https://llvm.discourse.group/t/generating-cpp-inc-files-for-dialects/3732/2 * Deviates from the discussion above by generating a default constructor in the .cpp.inc file (and adding a tablegen bit that disables this in case this is user provided). * Generating the destructor started as a way to flush out the missing includes (produces a link error), but it is a strict improvement on its own that is worth doing (i.e. by emitting key methods in the .cpp file, we root vtables in one translation unit, which is a non-controversial improvement). Differential Revision: https://reviews.llvm.org/D105070	2021-06-29 20:10:30 +00:00
Eugene Zhulenev	c1194c2ec3	[mlir:Async] Change async-parallel-for block size/count calculation Depends On D105037 Avoid creating too many tasks when the number of workers is large. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D105126	2021-06-29 12:57:11 -07:00
Eugene Zhulenev	f57b2420b2	[mlir:Async] Add an async reference counting pass based on the user defined policy Depends On D104999 Automatic reference counting based on the liveness analysis can add a lot of reference counting overhead at runtime. If the IR is known to be constrained to a few particular "shapes", it's much more efficient to provide a custom reference counting policy that will specify where it is required to update the async value reference count. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D105037	2021-06-29 12:53:09 -07:00
Eugene Zhulenev	9ccdaac8f9	[mlir:Async] Fix a bug in automatic reference counting around function calls Depends On D104998 Function calls "transfer ownership" to the callee and it puts additional constraints on the reference counting optimization pass Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D104999	2021-06-29 09:35:43 -07:00
Eugene Zhulenev	a8f819c6d8	[mlir:Async] Remove async operations if it is statically known that the parallel operation has a single compute block Depends On D104850 Add a test that verifies that canonicalization removes all async overheads if it is statically known that the scf.parallel operation will be computed using a single block. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D104891	2021-06-29 09:26:28 -07:00
Eugene Zhulenev	34a164c938	[mlir:Async] Submit accidentally omitted changes Accidentally pushed old branches that did not include all the changes discussed in the PRs. https://reviews.llvm.org/rGd43b23608ad664f02f56e965ca78916bde220950 https://reviews.llvm.org/rG86ad0af87054c3cccd68d32e103a6f1f6c6194c7 Differential Revision: https://reviews.llvm.org/D104943	2021-06-25 12:23:02 -07:00
Eugene Zhulenev	86ad0af870	[mlir:Async] Implement recursive async work splitting for scf.parallel operation (async-parallel-for pass) Depends On D104780 Recursive work splitting instead of sequential async tasks submission gives ~20%-30% speedup in microbenchmarks. Algorithm outline: 1. Collapse scf.parallel dimensions into a single dimension 2. Compute the block size for the parallel operations from the 1d problem size 3. Launch parallel tasks 4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space 5. Each parallel task computes the original parallel operation body using scf.for loop nest Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D104850	2021-06-25 10:34:39 -07:00
Eugene Zhulenev	d43b23608a	[mlir:Async] Add the size parameter to the async.group Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group. Reviewed By: ftynse, herhut Differential Revision: https://reviews.llvm.org/D104780	2021-06-25 10:26:50 -07:00
Christian Sigg	674dd9d08e	[mlir] Fix body-less async.execute printing Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D103686	2021-06-09 08:07:11 +02:00
Eugene Zhulenev	8f23fac4da	[mlir:Async] Convert assertions to async errors only inside async functions Differential Revision: https://reviews.llvm.org/D103278	2021-05-27 12:49:00 -07:00
Eugene Zhulenev	9136b7d075	[mlir] AsyncRefCounting: check that LivenessBlockInfo is not nullptr Differential Revision: https://reviews.llvm.org/D103270	2021-05-27 10:54:21 -07:00
Eugene Zhulenev	d8c84d2a4e	[mlir] Async: Add error propagation support to async groups Depends On D103109 If any of the tokens/values added to the `!async.group` switches to the error state, then the group itself switches to the error state. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D103203	2021-05-27 09:35:11 -07:00
Eugene Zhulenev	39957aa424	[mlir] Add error state and error propagation to async runtime values Depends On D103102 Not yet implemented: 1. Error handling after synchronous await 2. Error handling for async groups Will be addressed in the followup PRs Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D103109	2021-05-27 09:28:47 -07:00
Eugene Zhulenev	c412979cde	[mlir] Async reference counting for block successors with divergent reference counted liveness Support reference counted values implicitly passed (live) only to some of the successors. Example: if branched to ^bb2 token will leak, unless `drop_ref` operation is properly created ``` ^entry: %token = async.runtime.create : !async.token cond_br %cond, ^bb1, ^bb2 ^bb1: async.runtime.await %token async.runtime.drop_ref %token br ^bb2 ^bb2: return ``` Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D103102	2021-05-27 09:21:59 -07:00
Nico Weber	297a5b7cbc	[mlir] hopefully final round of iwyu fixes after `ba7a92c01e`	2021-04-21 11:03:06 -04:00
River Riddle	4efb7754e0	[mlir][NFC] Add a using directive for llvm::SetVector Differential Revision: https://reviews.llvm.org/D100436	2021-04-15 16:09:34 -07:00
Eugene Zhulenev	8a316b00d6	[mlir] Convert async dialect passes from function passes to op agnostic passes Differential Revision: https://reviews.llvm.org/D100401	2021-04-13 11:46:00 -07:00
Eugene Zhulenev	a6628e596e	[mlir] Async: add automatic reference counting at async.runtime operations level Depends On D95311 Previous automatic-ref-counting pass worked with high level async operations (e.g. async.execute), however async values reference counting is a runtime implementation detail. New pass mostly relies on the same liveness analysis to place drop_ref operations, and does better verification of CFG with different liveIn sets in block successors. This is an almost NFC change. No new reference counting ideas, just a cleanup of the previous version. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D95390	2021-04-12 18:54:55 -07:00
Mehdi Amini	973ddb7d6e	Define a `NoTerminator` trait that allows operations with a single block region to not provide a terminator In particular for Graph Regions, the need for a terminator is just a historical artifact of the generalization of MLIR from CFG region. Operations like Module don't need a terminator, and before Module migrated to be an operation with region there wasn't any needed. To validate the feature, the ModuleOp is migrated to use this trait and the ModuleTerminator operation is deleted. This patch is likely to break clients, if you're in this case: - you may iterate on a ModuleOp with `getBody()->without_terminator()`, the solution is simple: just remove the ->without_terminator! - you created a builder with `Builder::atBlockTerminator(module_body)`, just use `Builder::atBlockEnd(module_body)` instead. - you were handling ModuleTerminator: it isn't needed anymore. - for generic code, a `Block::mayNotHaveTerminator()` may be used. Differential Revision: https://reviews.llvm.org/D98468	2021-03-25 03:59:03 +00:00
Chris Lattner	dc4e913be9	[PatternMatch] Big mechanical rename OwningRewritePatternList -> RewritePatternSet and insert -> add. NFC This doesn't change APIs, this just cleans up the many in-tree uses of these names to use the new preferred names. We'll keep the old names around for a couple weeks to help transitions. Differential Revision: https://reviews.llvm.org/D99127	2021-03-22 17:20:50 -07:00
Chris Lattner	3a506b31a3	Change OwningRewritePatternList to carry an MLIRContext with it. This updates the codebase to pass the context when creating an instance of OwningRewritePatternList, and starts removing extraneous MLIRContext parameters. There are many many more to be removed. Differential Revision: https://reviews.llvm.org/D99028	2021-03-21 10:06:31 -07:00
Mehdi Amini	79f736c150	Switch generatedTypeParser/generatedAttributeParser to return an OptionalParseResult This allows the caller to distinguish between a parse error or an unmatched keyword. It fixes the redundant error that was emitted by the caller when the generated parser would fail. Differential Revision: https://reviews.llvm.org/D98162	2021-03-09 19:43:45 +00:00
River Riddle	3dfa86149e	[mlir][IR] Refactor the internal implementation of Value The current implementation of Value involves a pointer int pair with several different kinds of owners, i.e. BlockArgumentImpl, Operation , TrailingOpResult. This design arose from the desire to save memory overhead for operations that have a very small number of results (generally 0-2). There are, unfortunately, many problematic aspects of the current implementation that make Values difficult to work with or just inefficient. Operation result types are stored as a separate array on the Operation. This is very inefficient for many reasons: we use TupleType for multiple results, which can lead to huge amounts of memory usage if multi-result operations change types frequently(they do). It also means that simple methods like Value::getType/Value::setType now require complex logic to get to the desired type. Value only has one pointer bit free, severely limiting the ability to use it in things like PointerUnion/PointerIntPair. Given that we store the kind of a Value along with the "owner" pointer, we only leave one bit free for users of Value. This creates situations where we end up nesting PointerUnions to be able to use Value in one. As noted above, most of the methods in Value need to branch on at least 3 different cases which is both inefficient, possibly error prone, and verbose. The current storage of results also creates problems for utilities like ValueRange/TypeRange, which want to efficiently store base pointers to ranges (of which Operation isn't really useful as one). This revision greatly simplifies the implementation of Value by the introduction of a new ValueImpl class. This class contains all of the state shared between all of the various derived value classes; i.e. the use list, the type, and the kind. This shared implementation class provides several large benefits: * Most of the methods on value are now branchless, and often one-liners. 
* The "kind" of the value is now stored in ValueImpl instead of Value This frees up all of Value's pointer bits, allowing for users to take full advantage of PointerUnion/PointerIntPair/etc. It also allows for storing more operation results as "inline", 6 now instead of 2, freeing up 1 word per new inline result. * Operation result types are now stored in the result, instead of a side array This drops the size of zero-result operations by 1 word. It also removes the memory crushing use of TupleType for operations results (which could lead up to hundreds of megabytes of "dead" TupleTypes in the context). This also allowed restructuring ValueRange, making it simpler and one word smaller. This revision does come with two conceptual downsides: * Operation::getResultTypes no longer returns an ArrayRef<Type> This conceptually makes some usages slower, as the iterator increment is slightly more complex. * OpResult::getOwner is slightly more expensive, as it now requires a little bit of arithmetic From profiling, neither of the conceptual downsides has resulted in any perceivable hit to performance. Given the advantages of the new design, most compiles are slightly faster. Differential Revision: https://reviews.llvm.org/D97804	2021-03-03 14:33:37 -08:00
Christian Sigg	8c074cb0b7	[mlir] Mark OpState::getAttrs() deprecated. Fix call sites. The method will be removed 2 weeks later. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D97464	2021-02-25 20:54:42 +01:00
Eugene Zhulenev	25f80e16d1	[mlir] Async: add a separate pass to lower from async to async.coro and async.runtime Depends On D95000 Move async.execute outlining and async -> async.runtime lowering into the separate Async transformation pass Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D95311	2021-01-26 03:33:20 -08:00
Eugene Zhulenev	2f7baffdc1	[mlir:async] Use ODS to define async types Depends On D94923 Migrate Async dialect to ODS `TypeDef` Reviewed By: ftynse, rriddle Differential Revision: https://reviews.llvm.org/D95000	2021-01-26 02:37:50 -08:00
Eugene Zhulenev	9c53b8e52e	[mlir:Async] Add intermediate async.coro and async.runtime operations to simplify Async to LLVM lowering [NFC] No new functionality, mostly a cleanup and one more abstraction level between Async and LLVM IR. Instead of lowering from Async to LLVM coroutines and Async Runtime API in one shot, do it progressively via async.coro and async.runtime operations. 1. Lower from async to async.runtime/coro (e.g. async.execute to function with coro setup and runtime calls) 2. Lower from async.runtime/coro to LLVM intrinsics and runtime API calls Intermediate coro/runtime operations will allow running transformations on a higher-level IR without trying to match IR based on the LLVM::CallOp properties. Although async.coro is very close to LLVM coroutines, it is not exactly the same API; instead it is optimized for usability in async lowering, and omits a lot of details that are present in the @llvm.coro intrinsics. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D94923	2021-01-25 14:04:33 -08:00
Kazuaki Ishizaki	f88fab5006	[mlir] NFC: fix trivial typos fix typo under include and lib directories Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D94220	2021-01-08 02:10:12 +09:00
River Riddle	1b97cdf885	[mlir][IR][NFC] Move context/location parameters of builtin Type::get methods to the start of the parameter list This better matches the rest of the infrastructure, is much simpler, and makes it easier to move these types to being declaratively specified. Differential Revision: https://reviews.llvm.org/D93432	2020-12-17 13:01:36 -08:00
Christian Sigg	0bf4a82a5a	[mlir] Use mlir::OpState::operator->() to get to methods of mlir::Operation. This is a preparation step to remove the corresponding methods from OpState. Reviewed By: silvas, rriddle Differential Revision: https://reviews.llvm.org/D92878	2020-12-09 12:11:32 +01:00
Eugene Zhulenev	94e645f9cc	[mlir] Async: Add numWorkerThreads argument to createAsyncParallelForPass Add an option to pass the number of worker threads to select the number of async regions for parallel for transformation. ``` std::unique_ptr<OperationPass<FuncOp>> createAsyncParallelForPass(int numWorkerThreads); ``` Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D92835	2020-12-08 10:30:14 -08:00
Christian Sigg	c4a0405902	Add `Operation* OpState::operator->()` to provide more convenient access to members of Operation. Given that OpState already implicitly converts to Operation*, this seems reasonable. The alternative would be to add more functions to OpState which forward to Operation. Reviewed By: rriddle, ftynse Differential Revision: https://reviews.llvm.org/D92266	2020-12-02 15:46:20 +01:00
Eugene Zhulenev	a86a9b5ef7	[mlir] Automatic reference counting for Async values + runtime support for ref counted objects Depends On D89963 Automatic reference counting algorithm outline: 1. `ReturnLike` operations forward the reference counted values without modifying the reference count. 2. Use liveness analysis to find blocks in the CFG where the lifetime of reference counted values ends, and insert `drop_ref` operations after the last use of the value. 3. Insert `add_ref` before the `async.execute` operation capturing the value, and pairing `drop_ref` before the async body region terminator, to release the captured reference counted value when execution completes. 4. If the reference counted value is passed only to some of the block successors, insert `drop_ref` operations in the beginning of the blocks that do not have reference counted value uses. Reviewed By: silvas Differential Revision: https://reviews.llvm.org/D90716	2020-11-20 03:08:44 -08:00
Eugene Zhulenev	c30ab6c2a3	[mlir] Transform scf.parallel to scf.for + async.execute Depends On D89958 1. Adds `async.group`/`async.await_all` to group together multiple async tokens/values 2. Rewrite scf.parallel operation into multiple concurrent async.execute operations over non overlapping subranges of the original loop. Example: ``` scf.parallel (%i, %j) = (%lbi, %lbj) to (%ubi, %ubj) step (%si, %sj) { "do_some_compute"(%i, %j): () -> () } ``` Converted to: ``` %c0 = constant 0 : index %c1 = constant 1 : index // Compute blocks sizes for each induction variable. %num_blocks_i = ... : index %num_blocks_j = ... : index %block_size_i = ... : index %block_size_j = ... : index // Create an async group to track async execute ops. %group = async.create_group scf.for %bi = %c0 to %num_blocks_i step %c1 { %block_start_i = ... : index %block_end_i = ... : index scf.for %bj = %c0 to %num_blocks_j step %c1 { %block_start_j = ... : index %block_end_j = ... : index // Execute the body of original parallel operation for the current // block. %token = async.execute { scf.for %i = %block_start_i to %block_end_i step %si { scf.for %j = %block_start_j to %block_end_j step %sj { "do_some_compute"(%i, %j): () -> () } } } // Add produced async token to the group. async.add_to_group %token, %group } } // Await completion of all async.execute operations. async.await_all %group ``` In this example outer loop launches inner block level loops as separate async execute operations which will be executed concurrently. At the end it waits for the completion of all async execute operations. Reviewed By: ftynse, mehdi_amini Differential Revision: https://reviews.llvm.org/D89963	2020-11-13 04:02:56 -08:00
Eugene Zhulenev	bb0d5f767d	[mlir] Add NumberOfExecutions analysis + update RegionBranchOpInterface interface to query number of region invocations Implements RFC discussed in: https://llvm.discourse.group/t/rfc-operationinstancesinterface-or-any-better-name/2158/10 Reviewed By: silvas, ftynse, rriddle Differential Revision: https://reviews.llvm.org/D90922	2020-11-11 01:43:17 -08:00
John Demme	035e12e664	[MLIR] [ODS] Allowing attr-dict in custom directive Enhance tblgen's declarative assembly format to allow `attr-dict` in custom directives. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D89772	2020-10-28 01:24:16 +00:00
Christian Sigg	8c176b6029	[mlir] Catch async.yield operands not matching the number of async.execute results. Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D90211	2020-10-27 19:39:34 +01:00
Eugene Zhulenev	61dce0f308	[mlir] Add async.await operation to async dialect Add async.await operation to "unwrap" async.values Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D89137	2020-10-12 21:05:36 -07:00
Eugene Zhulenev	4e69a52952	[MLIR] Add async token/value arguments to async.execute op Async execute operation can take async arguments as dependencies. Change `async.execute` custom parser/printer format to use `%value as %unwrapped: !async.value<!type>` syntax. Reviewed By: mehdi_amini, herhut Differential Revision: https://reviews.llvm.org/D88601	2020-10-09 08:52:27 -07:00
Eugene Zhulenev	655af658c9	[MLIR] Add async.value type to Async dialect Return values from async regions as !async.value<...>. Reviewed By: mehdi_amini, csigg Differential Revision: https://reviews.llvm.org/D88510	2020-09-30 11:30:06 -07:00
Eugene Zhulenev	05a3b4fe30	[MLIR] Add Async dialect with trivial async.region operation Start Async dialect for modeling asynchronous execution. Reviewed By: mehdi_amini, herhut Differential Revision: https://reviews.llvm.org/D88459	2020-09-29 11:11:08 -07:00

48 Commits