llvm-project

Commit Graph

Author	SHA1	Message	Date
Christian Sigg	2c48e3629c	[MLIR] Adding gpu.host_register op and lower it to a runtime call. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D85631	2020-08-10 22:46:17 +02:00
Mehdi Amini	4091413c00	Remove debug flags from test (NFC)	2020-08-02 16:59:20 +00:00
Christian Sigg	2dd7a9cc2d	[MLIR] NFC: Rename mcuMemHostRegister* to mgpuMemHostRegister* to make it consistent with the other cuda-runner functions and ROCm. Summary: Rename mcuMemHostRegister* to mgpuMemHostRegister*. Reviewers: herhut Reviewed By: herhut Subscribers: yaxunl, mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, Kayjukh, jurahul, msifontes Tags: #mlir Differential Revision: https://reviews.llvm.org/D84583	2020-07-27 15:48:05 +02:00
Frederik Gossen	904f91db5f	[MLIR][Standard] Make the `dim` operation index an operand. Allow for dynamic indices in the `dim` operation. Rather than an attribute, the index is now an operand of type `index`. This allows to apply the operation to dynamically ranked tensors. The correct lowering of dynamic indices remains to be implemented. Differential Revision: https://reviews.llvm.org/D81551	2020-06-10 13:54:47 +00:00
Christian Sigg	62adfed30a	Unrank mcuMemHostRegister tensor argument. Reviewers: herhut Reviewed By: herhut Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80118	2020-05-19 13:58:54 +02:00
Stephan Herhut	69040d5b0b	[MLIR] Allow for multiple gpu modules during translation. This change makes the ModuleTranslation threadsafe by locking on the LLVMContext. Furthermore, we now clone the llvm module into a new context when compiling to PTX similar to what the OrcJit does. Differential Revision: https://reviews.llvm.org/D78207	2020-04-16 14:18:31 +02:00
Christian Sigg	b43ae21e60	Fix all-reduce int tests by host-registering memrefs. Reduce amount of boiler plate to register host memory. Summary: Fix all-reduce int tests by host-registering memrefs. Reviewers: herhut Reviewed By: herhut Subscribers: clementval, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76563	2020-03-23 11:48:13 +01:00
Valentin Clement	d4d62fcab6	[MLIR] Add test for multiple gpu.all_reduce in the same kernel when lowering to NVVM Summary: This patch add tests when lowering multiple `gpu.all_reduce` operations in the same kernel. This was previously failing. Differential Revision: https://reviews.llvm.org/D75930	2020-03-19 16:36:38 +01:00
Valentin Clement	c7380995f8	[MLIR] Add `and`, `or`, `xor`, `min`, `max` too gpu.all_reduce and the nvvm lowering Summary: This patch add some builtin operation for the gpu.all_reduce ops. - for Integer only: `and`, `or`, `xor` - for Float and Integer: `min`, `max` This is useful for higher level dialect like OpenACC or OpenMP that can lower to the GPU dialect. Differential Revision: https://reviews.llvm.org/D75766	2020-03-11 14:07:04 +01:00
Stephan Herhut	f6790a1c63	Revert "[MLIR] Add `and`, `or`, `xor`, `min`, `max` too gpu.all_reduce and the nvvm lowering" Attribution to original author got lost.	2020-03-11 14:07:04 +01:00
Stephan Herhut	2eff566b07	[MLIR] Add `and`, `or`, `xor`, `min`, `max` too gpu.all_reduce and the nvvm lowering Summary: This patch add some builtin operation for the gpu.all_reduce ops. - for Integer only: `and`, `or`, `xor` - for Float and Integer: `min`, `max` This is useful for higher level dialect like OpenACC or OpenMP that can lower to the GPU dialect. Differential Revision: https://reviews.llvm.org/D75766	2020-03-10 21:09:06 +01:00
Alex Zinenko	5a1778057f	[mlir] use unpacked memref descriptors at function boundaries The existing (default) calling convention for memrefs in standard-to-LLVM conversion was motivated by interfacing with LLVM IR produced from C sources. In particular, it passes a pointer to the memref descriptor structure when calling the function. Therefore, the descriptor is allocated on stack before the call. This convention leads to several problems. PR44644 indicates a problem with stack exhaustion when calling functions with memref-typed arguments in a loop. Allocating outside of the loop may lead to concurrent access problems in case the loop is parallel. When targeting GPUs, the contents of the stack-allocated memory for the descriptor (passed by pointer) needs to be explicitly copied to the device. Using an aggregate type makes it impossible to attach pointer-specific argument attributes pertaining to alignment and aliasing in the LLVM dialect. Change the default calling convention for memrefs in standard-to-LLVM conversion to transform a memref into a list of arguments, each of primitive type, that are comprised in the memref descriptor. This avoids stack allocation for ranked memrefs (and thus stack exhaustion and potential concurrent access problems) and simplifies the device function invocation on GPUs. Provide an option in the standard-to-LLVM conversion to generate auxiliary wrapper function with the same interface as the previous calling convention, compatible with LLVM IR porduced from C sources. These auxiliary functions pack the individual values into a descriptor structure or unpack it. They also handle descriptor stack allocation if necessary, serving as an allocation scope: the memory reserved by `alloca` will be freed on exiting the auxiliary function. The effect of this change on MLIR-generated only LLVM IR is minimal. When interfacing MLIR-generated LLVM IR with C-generated LLVM IR, the integration only needs to require auxiliary functions and change the function name to call the wrapper function instead of the original function. This also opens the door to forwarding aliasing and alignment information from memrefs to LLVM IR pointers in the standrd-to-LLVM conversion.	2020-02-10 15:03:43 +01:00
Stephan Herhut	283b5e733d	[MLIR] Make gpu.launch implicitly capture uses of values defined above. Summary: In the original design, gpu.launch required explicit capture of uses and passing them as operands to the gpu.launch operation. This was motivated by infrastructure restrictions rather than design. This change lifts the requirement and removes the concept of kernel arguments from gpu.launch. Instead, the kernel outlining transformation now does the explicit capturing. This is a breaking change for users of gpu.launch. Differential Revision: https://reviews.llvm.org/D73769	2020-02-03 10:08:48 +01:00
Stephan Herhut	2692751895	Add 'gpu.terminator' operation. Summary: The 'gpu.terminator' operation is used as the terminator for the regions of gpu.launch. This is to disambugaute them from the return operation on 'gpu.func' functions. This is a breaking change and users of the gpu dialect will need to adapt their code when producting 'gpu.launch' operations. Reviewers: nicolasvasilache Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73620	2020-01-30 12:41:41 +01:00
Christian Sigg	42d46b4efa	Add gpu.shuffle op. This will allow us to lower most of gpu.all_reduce (when all_reduce doesn't exist in the target dialect) within the GPU dialect, and only do target-specific lowering for the shuffle op. PiperOrigin-RevId: 286548256	2019-12-20 02:52:52 -08:00
nmostafa	daff60cd68	Add UnrankedMemRef Type Closes tensorflow/mlir#261 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/261 from nmostafa:nmostafa/unranked 96b6e918f6ed64496f7573b2db33c0b02658ca45 PiperOrigin-RevId: 284037040	2019-12-05 13:13:20 -08:00
Nicolas Vasilache	9059cf392d	Automated rollback of commit `d60133f89b` PiperOrigin-RevId: 282574110	2019-11-26 08:47:48 -08:00
Christian Sigg	d60133f89b	Changing directory shortcut for CPU/GPU runner utils. Moving cuda-runtime-wrappers.so into subdirectory to match libmlir_runner_utils.so. Provide parent directory when running test and load .so from subdirectory. PiperOrigin-RevId: 282410749	2019-11-25 12:30:54 -08:00
Christian Sigg	d7c17195a4	Change CUDA tests to use print_memref. Swap dimensions in all-reduce-op test. PiperOrigin-RevId: 281791744	2019-11-21 11:26:36 -08:00
Christian Sigg	f868adafee	Make type and rank explicit in mcuMemHostRegister function. Fix registered size of indirect MemRefType kernel arguments. PiperOrigin-RevId: 281362940	2019-11-19 13:13:02 -08:00
Christian Sigg	d2f0f847af	Support custom accumulator provided as region to gpu.all_reduce. In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as arbitrary code region. Adds a gpu.yield op to specify the result of the accumulation. Also support more types (integers) and accumulations (mul). PiperOrigin-RevId: 275065447	2019-10-16 10:43:44 -07:00
Christian Sigg	7c765d97f9	Support reduction of partial warps. gpu.all_reduce now supports block sizes that are not multiple of 32. PiperOrigin-RevId: 273255204	2019-10-07 03:31:00 -07:00
Christian Sigg	1129931a62	Change all_reduce lowering to support 2D and 3D blocks. Perform second reduce only with first warp. This requires an additional __sync_threads(), but doesn't need special handling when the last warp is small. This simplifies support for block sizes that are not multiple of 32. Supporting partial warp reduce will be done in a separate CL. PiperOrigin-RevId: 272168917	2019-10-01 02:51:15 -07:00
Christian Sigg	116dac00ba	Add AllReduceOp to GPU dialect with lowering to NVVM. The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup". The implementation is currently limited to sizes that are multiple 32 (warp size) and no larger than 1024. PiperOrigin-RevId: 271290265	2019-09-26 00:17:50 -07:00
Alex Zinenko	0d82a292b0	JitRunner: support entry functions returning void JitRunner can use as entry points functions that produce either a single '!llvm.f32' value or a list of memrefs. Memref support is legacy and was introduced before MLIR could lower memref allocation and deallocation to malloc/free calls so as to allocate the memory externally, and is likely to be dropped in the future since it unconditionally runs affine+standard-to-llvm lowering on the module instead of accepting the LLVM dialect. CUDA runner relies on memref-based flow in the runner without actually returning anything. Introduce a runner flow to use functions that return void as entry points. PiperOrigin-RevId: 264381686	2019-08-20 07:46:17 -07:00
Stephan Herhut	e8b21a75f8	Add an mlir-cuda-runner tool. This tool allows to execute MLIR IR snippets written in the GPU dialect on a CUDA capable GPU. For this to work, a working CUDA install is required and the build has to be configured with MLIR_CUDA_RUNNER_ENABLED set to 1. PiperOrigin-RevId: 256551415	2019-07-04 07:53:54 -07:00

26 Commits