llvm-project

Commit Graph

Author	SHA1	Message	Date
Christian Sigg	8b2eb7c494	[mlir] Add in-dialect lowering of gpu.all_reduce. Reviewers: ftynse, nicolasvasilache, herhut Reviewed By: ftynse, herhut Subscribers: liufengdb, aartbik, herhut, merge_guards_bot, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72129	2020-01-20 13:43:43 +01:00
Benjamin Kramer	0133cc60e4	Revert "[mlir] Create a gpu.module operation for the GPU Dialect." This reverts commit `4624a1e8ac`. Causing problems downstream.	2020-01-15 17:52:17 +01:00
Tres Popp	4624a1e8ac	[mlir] Create a gpu.module operation for the GPU Dialect. Summary: This is based on the use of code constantly checking for an attribute on a model and instead represents the distinct operation with a different op. Instead, this op can be used to provide better filtering. Reviewers: herhut, mravishankar, antiagainst, rriddle Reviewed By: herhut, antiagainst, rriddle Subscribers: liufengdb, aartbik, jholewinski, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72336	2020-01-14 12:05:47 +01:00
Alex Zinenko	08778d8c4f	[mlir][GPU] introduce utilities for promotion to workgroup memory Introduce a set of functions that promote a memref argument of a `gpu.func` to workgroup memory using memory attribution. The promotion boils down to additional loops performing the copy from the original argument to the attributed memory in the beginning of the function, and back at the end of the function using all available threads. The loop bounds are specified so as to adapt to any size of the workgroup. These utilities are intended to compose with other existing utilities (loop coalescing and tiling) in cases where the distribution of work across threads is uneven, e.g. copying a 2D memref with only the threads along the "x" dimension. Similarly, specialization of the kernel to specific launch sizes should be implemented as a separate pass combining constant propagation and canonicalization. Introduce a simple attribute-driven pass to test the promotion transformation since we don't have a heuristic at the moment. Differential revision: https://reviews.llvm.org/D71904	2020-01-09 10:06:00 +01:00
Christian Sigg	42d46b4efa	Add gpu.shuffle op. This will allow us to lower most of gpu.all_reduce (when all_reduce doesn't exist in the target dialect) within the GPU dialect, and only do target-specific lowering for the shuffle op. PiperOrigin-RevId: 286548256	2019-12-20 02:52:52 -08:00
Alex Zinenko	40ef46fba4	Harden the requirements to memory attribution types in gpu.func When memory attributions are present in `gpu.func`, require that they are of memref type and live in memoryspaces 3 and 5 for workgroup and private memory attributions, respectively. Adapt the conversion from the GPU dialect to the NVVM dialect to drop the private memory space from attributions as NVVM is able to model them as local `llvm.alloca`s in the default memory space. PiperOrigin-RevId: 286161763	2019-12-18 03:38:55 -08:00
Alex Zinenko	6273fa0c6a	Plug gpu.func into the GPU lowering pipelines This updates the lowering pipelines from the GPU dialect to lower-level dialects (NVVM, SPIRV) to use the recently introduced gpu.func operation instead of a standard function annotated with an attribute. In particular, the kernel outlining is updated to produce gpu.func instead of std.func and the individual conversions are updated to consume gpu.funcs and disallow standard funcs after legalization, if necessary. The attribute "gpu.kernel" is preserved in the generic syntax, but can also be used with the custom syntax on gpu.funcs. The special kind of function for GPU allows one to use additional features such as memory attribution. PiperOrigin-RevId: 285822272	2019-12-16 12:12:48 -08:00
Alex Zinenko	d5e627f84b	Introduce Linkage attribute to the LLVM dialect LLVM IR supports linkage on global objects such as global variables and functions. Introduce the Linkage attribute into the LLVM dialect, backed by an integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory. Implement parsing/printing of the attribute and conversion to LLVM IR. See tensorflow/mlir#277. PiperOrigin-RevId: 283309328	2019-12-02 03:28:10 -08:00
Alex Zinenko	bf4692dc49	Introduce gpu.func Introduce a new function-like operation to the GPU dialect to provide a placeholder for the execution semantic description and to add support for GPU memory hierarchy. This aligns with the overall goal of the dialect to expose the common abstraction layer for GPU devices, in particular by providing an MLIR unit of semantics (i.e. an operation) for memory modeling. This proposal has been discussed in the mailing list: https://groups.google.com/a/tensorflow.org/d/msg/mlir/RfXNP7Hklsc/MBNN7KhjAgAJ As decided, the "convergence" aspect of the execution model will be factored out into a new discussion and therefore is not included in this commit. This commit only introduces the operation but does not hook it up with the remaining flow. The intention is to develop the new flow while keeping the old flow operational and do the switch in a simple, separately reversible commit. PiperOrigin-RevId: 282357599	2019-11-25 08:10:37 -08:00
Alex Zinenko	b5af3784a6	Don't force newline before function attributes Due to legacy reasons, a newline character followed by two spaces was always inserted before the attributes of the function Op in pretty form. This breaks formatting when functions are nested in some other operations. Don't print the newline and just put the attributes on the same line, which is also more consistent with module Op. Line breaking aware of indentation can be introduced separately into the parser if deemed useful. PiperOrigin-RevId: 281721793	2019-11-21 05:08:19 -08:00
Stephan Herhut	abb626686d	Extend kernel outlining to also consider dim worth inlining. PiperOrigin-RevId: 281483447	2019-11-20 02:59:35 -08:00
MLIR Team	9fbf52e330	Look for SymbolRefAttr in KernelOutlining instead of hard-coding CallOp This code should be exercised using the existing kernel outlining unit test, but let me know if I should add a dedicated unit test using a fake call instruction as well. PiperOrigin-RevId: 279436321	2019-11-08 19:13:13 -08:00
River Riddle	2b61b7979e	Convert the Canonicalize and CSE passes to generic Operation Passes. This allows for them to be used on other non-function, or even other function-like, operations. The algorithms are already generic, so this is simply changing the derived pass type. The majority of this change is just ensuring that the nesting of these passes remains the same, as the pass manager won't auto-nest them anymore. PiperOrigin-RevId: 276573038	2019-10-24 15:01:09 -07:00
Kazuaki Ishizaki	f28c5aca17	Fix minor spelling tweaks (NFC) Closes tensorflow/mlir#175 PiperOrigin-RevId: 275726876	2019-10-20 09:44:36 -07:00
Stephan Herhut	3622e1833f	Use StrEnumAttr for gpu.allreduce op instead of StringAttr to better encode constraints. PiperOrigin-RevId: 275448372	2019-10-18 04:44:48 -07:00
Christian Sigg	fe0ee32da5	Add gpu.barrier op to synchronize invocations of a local workgroup. Adding gen table for rewrite patterns from GPU to NVVM dialect. Copy missing op documentation from GPUOps.td to GPU.md. PiperOrigin-RevId: 275419588	2019-10-18 00:30:44 -07:00
Christian Sigg	d2f0f847af	Support custom accumulator provided as region to gpu.all_reduce. In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as arbitrary code region. Adds a gpu.yield op to specify the result of the accumulation. Also support more types (integers) and accumulations (mul). PiperOrigin-RevId: 275065447	2019-10-16 10:43:44 -07:00
Alex Zinenko	90d65d32d6	Use named modules for gpu.launch_func The kernel function called by gpu.launch_func is now placed into an isolated nested module during the outlining stage to simplify separate compilation. Until recently, modules did not have names and could not be referenced. This limitation was circumvented by introducing a stub kernel with the same name at the same nesting level as the module containing the actual kernel. This relation is only effective in one direction: from actual kernel function to its launch_func "caller". Leverage the recently introduced symbol name attributes on modules to refer to a specific nested module from `gpu.launch_func`. This removes the implicit connection between the identically named stub and kernel functions. It also enables support for `gpu.launch_func`s to call different kernels located in the same module. PiperOrigin-RevId: 273491891	2019-10-08 04:30:32 -07:00
Nicolas Vasilache	ddf737c5da	Promote MemRefDescriptor to a pointer to struct when passing function boundaries in LLVMLowering. The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio). Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues. Solving the ABI problem generally is a very complex problem and most likely involves taking a dependence on clang that we do not want atm. A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implements. PiperOrigin-RevId: 271591708	2019-09-27 09:57:36 -07:00
Christian Sigg	116dac00ba	Add AllReduceOp to GPU dialect with lowering to NVVM. The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup". The implementation is currently limited to sizes that are multiples of 32 (warp size) and no larger than 1024. PiperOrigin-RevId: 271290265	2019-09-26 00:17:50 -07:00
Christian Sigg	74cdbf5909	Clone called functions into nested GPU module. PiperOrigin-RevId: 270891190	2019-09-24 06:29:54 -07:00
Christian Sigg	b8676da1fc	Outline GPU kernel function into a nested module. Roll forward of commit `5684a12`. When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module. PiperOrigin-RevId: 270639748	2019-09-23 03:17:01 -07:00
George Karpenkov	2df646bef6	Automated rollback of commit `5684a12434` PiperOrigin-RevId: 270126672	2019-09-19 14:34:30 -07:00
MLIR Team	5684a12434	Outline GPU kernel function into a nested module. When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module. PiperOrigin-RevId: 269987720	2019-09-19 01:51:28 -07:00
Stephan Herhut	318ff019cf	Addressing some late review comments on kernel inlining. Just formatting and better lit tests, no functional change. PiperOrigin-RevId: 267942907	2019-09-09 01:15:47 -07:00
Stephan Herhut	7eb25cd367	Make GPU kernel outlining test independent of value names. PiperOrigin-RevId: 267323604	2019-09-05 01:46:26 -07:00
Stephan Herhut	dfd06af562	Make GPU kernel outlining inline constants. It is generally beneficial to pass fewer arguments to a kernel, so cloning constants into the kernel is beneficial. PiperOrigin-RevId: 267139084	2019-09-04 06:16:07 -07:00
Stephan Herhut	e90542c03b	Add verification for dimension attribute on GPUDialect index operations. PiperOrigin-RevId: 266073204	2019-08-28 23:39:57 -07:00
Alex Zinenko	60965b4612	Move GPU dialect to {lib,include/mlir}/Dialect Per tacit agreement, individual dialects should now live in lib/Dialect/Name with headers in include/mlir/Dialect/Name and tests in test/Dialect/Name. PiperOrigin-RevId: 259896851	2019-07-25 00:41:17 -07:00

29 Commits