Commit Graph

4 Commits

Author SHA1 Message Date
Christian Sigg 1129931a62 Change all_reduce lowering to support 2D and 3D blocks.
Perform second reduce only with first warp. This requires an additional __sync_threads(), but doesn't need special handling when the last warp is small. This simplifies support for block sizes that are not multiple of 32.

Supporting partial warp reduce will be done in a separate CL.

PiperOrigin-RevId: 272168917
2019-10-01 02:51:15 -07:00
Christian Sigg 116dac00ba Add AllReduceOp to GPU dialect with lowering to NVVM.
The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup".

The implementation is currently limited to sizes that are multiple 32 (warp size) and no larger than 1024.

PiperOrigin-RevId: 271290265
2019-09-26 00:17:50 -07:00
Alex Zinenko 0d82a292b0 JitRunner: support entry functions returning void
JitRunner can use as entry points functions that produce either a single
'!llvm.f32' value or a list of memrefs.  Memref support is legacy and was
introduced before MLIR could lower memref allocation and deallocation to
malloc/free calls so as to allocate the memory externally, and is likely to be
dropped in the future since it unconditionally runs affine+standard-to-llvm
lowering on the module instead of accepting the LLVM dialect.  CUDA runner
relies on memref-based flow in the runner without actually returning anything.
Introduce a runner flow to use functions that return void as entry points.

PiperOrigin-RevId: 264381686
2019-08-20 07:46:17 -07:00
Stephan Herhut e8b21a75f8 Add an mlir-cuda-runner tool.
This tool allows to execute MLIR IR snippets written in the GPU dialect
on a CUDA capable GPU. For this to work, a working CUDA install is required
and the build has to be configured with MLIR_CUDA_RUNNER_ENABLED set to 1.

PiperOrigin-RevId: 256551415
2019-07-04 07:53:54 -07:00