llvm-project

Commit Graph

Author	SHA1	Message	Date
Denis Khalikov	bfb2ce0256	[mlir][vulkan-runner] Use C-compatible wrapper emission. A memref argument is converted into a pointer-to-struct argument of type `{T, T, i64, i64[N], i64[N]}*` in the wrapper function, where T is the converted element type and N is the memref rank. Differential Revision: https://reviews.llvm.org/D76059	2020-03-17 07:54:41 -04:00
Lei Zhang	3148f10b17	[mlir][spirv] Use spv.vce in spv.module and wire up (de)serialization This commit changes the definition of spv.module to use the #spv.vce attribute for specifying (version, capabilities, extensions) triple so that we can have better API and custom assembly form. Since now we have proper modelling of the triple, (de)serialization is wired up to use them. With the new UpdateVCEPass, we don't need to manually specify the required extensions and capabilities anymore when creating a spv.module. One just needs to call UpdateVCEPass before serialization to get the needed version/extensions/capabilities. Differential Revision: https://reviews.llvm.org/D75872	2020-03-12 19:37:45 -04:00
Lei Zhang	e115a40f50	[mlir][spirv] Use separate attribute for (version, capabilities, extensions) We also need the (version, capabilities, extensions) triple on the spv.module op. Thus far we have been using separate 'extensions' and 'capabilities' attributes there and 'version' is missing. Creating a separate attribute for the triple allows us to reuse the assembly form and verification. Differential Revision: https://reviews.llvm.org/D75868	2020-03-12 19:37:45 -04:00
aartbik	078776a679	[mlir] [VectorOps] Progressively lower vector.outerproduct to LLVM Summary: This replaces the direct lowering of vector.outerproduct to LLVM with progressive lowering into elementary vectors ops to avoid having the similar lowering logic at several places. NOTE1: with the new progressive rule, the lowered llvm is slightly more elaborate than with the direct lowering, but the generated assembly is just as optimized; still if we want to stay closer to the original, we should add a "broadcast on extract" to shuffle rewrite (rather than special casing all the lowering steps) NOTE2: the original outerproduct lowering code should now be removed but some linalg tests work directly on vector and contain some dead code, so this requires another CL Reviewers: nicolasvasilache, andydavis1 Reviewed By: nicolasvasilache, andydavis1 Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75956	2020-03-12 13:45:42 -07:00
Christian Sigg	fc421d7ca3	[MLIR] Remove all-reduce lowering from GPU to NVVM. Use in-dialect lowering instead. Reviewers: herhut, mravishankar Reviewed By: herhut Subscribers: merge_guards_bot, jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73794	2020-03-11 15:17:54 +01:00
Denis Khalikov	1090a83069	[mlir][vulkan-runner] Update mlir-vulkan-runner execution driver. * Adds GpuLaunchFuncToVulkanLaunchFunc conversion pass. * Moves a serialization of the `spirv::Module` from LaunchFuncToVulkanCalls pass to newly created pass. * Updates LaunchFuncToVulkanCalls instrumentation pass, adds `initVulkan` and `deinitVulkan` runtime calls. * Adds `bindResource` call to bind specific resource by the given descriptor set and descriptor binding. * Eliminates static construction and destruction of `VulkanRuntimeManager`. Differential Revision: https://reviews.llvm.org/D75192	2020-03-10 15:58:31 -04:00
Nicolas Vasilache	63b683a816	[mlir][Vector] Add a vector.matrix_multiply op on 1-D vectors Summary: This op mirrors the llvm.intr counterpart and allows lowering + type conversions in a progressive fashion. Differential Revision: https://reviews.llvm.org/D75775	2020-03-09 13:34:03 -04:00
Alexander Belyaev	3147342ae7	[MLIR] Change custom printer/parser for loop.parallel and loop.reduce.	2020-03-09 15:11:48 +01:00
River Riddle	988249a506	[mlir] Refactor a few users to no longer rely on the successor operand API of Operation. The existing API for successor operands on operations is in the process of being removed. This revision simplifies a later one that completely removes the existing API. Differential Revision: https://reviews.llvm.org/D75316	2020-03-05 12:51:59 -08:00
River Riddle	01f7431b5b	[mlir][DeclarativeParser] Add support for formatting operations with AttrSizedOperandSegments. This attribute details the segment sizes for operand groups within the operation. This revision adds support for automatically populating this attribute in the declarative parser. Differential Revision: https://reviews.llvm.org/D75315	2020-03-05 12:51:28 -08:00
Alex Zinenko	aff6bf4ff8	[mlir] support conversion of parallel reduction loops to std Recently introduced support for converting sequential reduction loops to CFG of basic blocks in the Standard dialect makes it possible to perform a staged conversion of parallel reduction loops into a similar CFG by using sequential loops as an intermediate step. This is already the case for parallel loops without reduction, so extend the pattern to support an additional use case. Differential Revision: https://reviews.llvm.org/D75599	2020-03-04 16:37:17 +01:00
Adrian Kuegel	91acb5b3e1	Add rsqrt op to Standard dialect and lower it to LLVM dialect. Summary: This adds an rsqrt op to the standard dialect, and lowers it as 1 / sqrt to the LLVM dialect. Differential Revision: https://reviews.llvm.org/D75353	2020-03-04 13:13:31 +01:00
Alex Zinenko	8ba8ab8c95	[mlir] support reductions in loop to std conversion Summary: Introduce support for converting loop.for operations with loop-carried values to a CFG in the standard dialect. This is achieved by passing loop-carried values as block arguments to the loop condition block. This block dominates both the loop body and the block immediately following the loop, so the arguments of this block remain visible there. Differential Revision: https://reviews.llvm.org/D75513	2020-03-03 18:21:13 +01:00
Stephan Herhut	10ec1860a8	[MLIR][GPU] Add error checking to loop.parallel to gpu transform. Summary: Instead of crashing on malformed input, the pass now produces error messages. Differential Revision: https://reviews.llvm.org/D75468	2020-03-03 13:29:09 +01:00
Stephan Herhut	d17428d951	[MLIR][GPU] fix loop trip count computation in LoopsToGPU Summary: Added brackets to fix the loop trip count computation. The brackets ensure the bounds are subtracted before we divide the result by the step of the loop. Differential Revision: https://reviews.llvm.org/D75449	2020-03-02 15:53:33 +01:00
Tim Shen	67c1615440	[MLIR] Add vector support for fpext and fptrunc. Differential Revision: https://reviews.llvm.org/D75150	2020-02-28 12:24:45 -08:00
Tim Shen	0d65000e11	[MLIR] Add llvm.mlir.cast op for semantic preserving cast between dialect types. Summary: See discussion here: https://llvm.discourse.group/t/rfc-dialect-type-cast-op/538/11 Reviewers: ftynse Subscribers: bixia, sanjoy.google, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits Differential Revision: https://reviews.llvm.org/D75141	2020-02-28 12:20:23 -08:00
Adrian Kuegel	39e1c1fa9e	Add GPU lowerings for the different log ops. Summary: This adds GPU lowerings for log, log10 and log2. Reviewers: mravishankar, herhut Subscribers: jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75239	2020-02-27 15:25:02 +01:00
Stephan Herhut	5e6d724633	[MLIR][GPU] Properly model step in parallel loop to gpu conversion. Summary: The original patch had TODOs to add support for step computations, which this commit addresses. The computations are expressed using affine expressions so that the affine canonicalizers can simplify the full bound and index computations. Also cleans up the code a little and exposes the pass in the header file. Differential Revision: https://reviews.llvm.org/D75052	2020-02-25 14:22:50 +01:00
Frank Laub	fe210a1ff2	[MLIR] Add std.atomic_rmw op Summary: The RFC for this op is here: https://llvm.discourse.group/t/rfc-add-std-atomic-rmw-op/489 The std.atomic_rmw op provides a way to support read-modify-write sequences with data race freedom. It is intended to be used in the lowering of an upcoming affine.atomic_rmw op which can be used for reductions. A lowering to LLVM is provided with 2 paths: - Simple patterns: llvm.atomicrmw - Everything else: llvm.cmpxchg Differential Revision: https://reviews.llvm.org/D74401	2020-02-24 16:54:21 -08:00
Hanhan Wang	29ad9d6b26	[mlir][spirv] Add lowering for load/store zero-rank memref from std to SPIR-V. Differential Revision: https://reviews.llvm.org/D74874	2020-02-21 14:41:12 -05:00
Nagy Mostafa	bc7b26c333	[MLIR] Allow Loop dialect IfOp and ForOp to define values This patch implements the RFCs proposed here: https://llvm.discourse.group/t/rfc-modify-ifop-in-loop-dialect-to-yield-values/463 https://llvm.discourse.group/t/rfc-adding-operands-and-results-to-loop-for/459/19. It introduces the following changes: - All Loop Ops regions, except for ReduceOp, terminate with a YieldOp. - YieldOp can have variadic operands that are used to return values out of IfOp and ForOp regions. - Change IfOp and ForOp syntax and representation to define values. - Add unit-tests and update .td documentation. - YieldOp is a terminator to loop.for/if/parallel - YieldOp custom parser and printer Lowering is not supported at the moment, and will be in a follow-up PR. Thanks. Reviewed By: bondhugula, nicolasvasilache, rriddle Differential Revision: https://reviews.llvm.org/D74174	2020-02-21 10:05:32 -08:00
Tim Shen	f581e655ec	[MLIR] Add std.assume_alignment op. Reviewers: ftynse, nicolasvasilache, andydavis1 Subscribers: bixia, sanjoy.google, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74378	2020-02-18 17:55:07 -08:00
Alex Zinenko	39cb2a8fc7	[mlir] Fix argument attribute reassignment in ConvertStandardToLLVM The commit switching the calling convention for memrefs (`5a1778057`) inadvertently introduced a bug in the function argument attribute conversion: due to incorrect indexing of function arguments it was not assigning the attributes to the arguments beyond those generated from the first original argument. This was not caught in the commit since the test suite does not have a test for converting multi-argument functions with argument attributes. Fix the bug and add relevant tests.	2020-02-14 10:22:33 +01:00
Denis Khalikov	a062a3ed7f	[mlir][spirv] Add ConvertGpuLaunchFuncToVulkanCallsPass Implement a pass to convert gpu.launch_func op into a sequence of Vulkan runtime calls. The Vulkan runtime API surface is huge so currently we don't expose separate external functions in IR for each of them, instead we expose a few external functions to wrapper libraries which manages Vulkan runtime. Differential Revision: https://reviews.llvm.org/D74549	2020-02-13 14:10:07 -05:00
Stephan Herhut	715783d415	[MLIR][GPU] Implement initial mapping from loop.parallel to gpu.launch. Summary: To unblock other work, this implements basic lowering based on mapping attributes that have to be provided on all loop.parallel. The lowering does not yet support reduce. Differential Revision: https://reviews.llvm.org/D73893	2020-02-13 16:54:16 +01:00
Tobias Gysi	4f865b7794	[mlir] support creating memref descriptors from static shape with non-zero offset This patch adapts the method MemRefDescriptor::fromStaticShape to support static non-zero offsets. The updated method uses the getStridesAndOffset method to extract strides and offset. The patch also adapts the test cases since sizes and strides are now set in forward instead of reverse order. Differential Revision: https://reviews.llvm.org/D74474	2020-02-12 22:40:49 +01:00
Pierre Oechsel	fd11cda251	[mlir] StdToLLVM: Add error when the sourceMemRef of a subview is not a llvm type. A memref_cast casting to a memref with a non identity map can't be lowered to llvm. Take the following case: ``` func @invalid_memref_cast(%arg0: memref<?x?xf64>) { %c1 = constant 1 : index %c0 = constant 0 : index %5 = memref_cast %arg0 : memref<?x?xf64> to memref<?x?xf64, #map1> %25 = std.subview %5[%c0, %c0][%c1, %c1][] : memref<?x?xf64, #map1> to memref<?x?xf64, #map1> return } ``` When lowering the subview mlir was assuming `%5` to have an llvm type (which is not the case as mlir failed to lower the memref_cast). Differential Revision: https://reviews.llvm.org/D74466	2020-02-12 15:13:18 +01:00
Lei Zhang	d3e7816d85	[mlir][spirv] Introduce spv.func Thus far we have been using builtin func op to model SPIR-V functions. It was because builtin func op used to have special treatment in various parts of the core codebase (e.g., pass pipelines, etc.) and it's easy to bootstrap the development of the SPIR-V dialect. But nowadays with general op concepts and region support we don't have such limitations and it's time to tighten the SPIR-V dialect for completeness. This commit introduces a spv.func op to properly model SPIR-V functions. Compared to builtin func op, it can provide the following benefits: * We can control the full op so we can integrate SPIR-V information bits (e.g., function control) in a more integrated way and define our own assembly form and enforcing better verification. * We can have a better dialect and library boundary. At the current moment only functions are modelled with an external op. With this change, all ops modelling SPIR-V concepts will be spv.* ops and registered to the SPIR-V dialect. * We don't need to special-case func op anymore when creating ConversionTarget declaring SPIR-V dialect as legal. This is quite important given we'll see more and more conversions in the future. In the process, bumps a few FuncOp methods to the FunctionLike trait. Differential Revision: https://reviews.llvm.org/D74226	2020-02-12 07:46:43 -05:00
aartbik	e83b7b99da	[mlir] [VectorOps] Implement vector.reduce operation Summary: This new operation operates on 1-D vectors and forms the bridge between vector.contract and llvm intrinsics for vector reductions. Reviewers: nicolasvasilache, andydavis1, ftynse Reviewed By: nicolasvasilache Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74370	2020-02-11 11:31:59 -08:00
Diego Caballero	696f80736b	[mlir] Turn flags in ConvertStandardToLLVM into pass flags Follow-up on D72802. Turn -convert-std-to-llvm-use-alloca and -convert-std-to-llvm-bare-ptr-memref-call-conv into pass flags of LLVMLoweringPass. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D73912	2020-02-11 10:28:30 -08:00
Alex Zinenko	ea3a25e4f5	[mlir] StdToLLVM: add a separate test for the new memref calling convention	2020-02-11 13:56:25 +01:00
Lei Zhang	50aeeed8a2	[mlir][spirv] Use spv.entry_point_abi in GPU to SPIR-V conversions We have spv.entry_point_abi for specifying the local workgroup size. It should be decorated onto input gpu.func ops to drive the SPIR-V CodeGen to generate the proper SPIR-V module execution mode. Compared to using command-line options for specifying the configuration, using attributes also has the benefits that 1) we are now able to use different local workgroup for different entry points and 2) the tests contain the configuration directly. Differential Revision: https://reviews.llvm.org/D74012	2020-02-10 16:24:48 -05:00
Tobias Gysi	1555d7f729	[mlir] subview op lowering for target memrefs with const offset The current standard to llvm conversion pass lowers subview ops only if dynamic offsets are provided. This commit extends the lowering with a code path that uses the constant offset of the target memref for the subview op lowering (see Example 3 of the subview op definition for an example) if no dynamic offsets are provided. Differential Revision: https://reviews.llvm.org/D74280	2020-02-10 17:35:17 +01:00
Alex Zinenko	5a1778057f	[mlir] use unpacked memref descriptors at function boundaries The existing (default) calling convention for memrefs in standard-to-LLVM conversion was motivated by interfacing with LLVM IR produced from C sources. In particular, it passes a pointer to the memref descriptor structure when calling the function. Therefore, the descriptor is allocated on stack before the call. This convention leads to several problems. PR44644 indicates a problem with stack exhaustion when calling functions with memref-typed arguments in a loop. Allocating outside of the loop may lead to concurrent access problems in case the loop is parallel. When targeting GPUs, the contents of the stack-allocated memory for the descriptor (passed by pointer) need to be explicitly copied to the device. Using an aggregate type makes it impossible to attach pointer-specific argument attributes pertaining to alignment and aliasing in the LLVM dialect. Change the default calling convention for memrefs in standard-to-LLVM conversion to transform a memref into a list of arguments, each of primitive type, that are comprised in the memref descriptor. This avoids stack allocation for ranked memrefs (and thus stack exhaustion and potential concurrent access problems) and simplifies the device function invocation on GPUs. Provide an option in the standard-to-LLVM conversion to generate auxiliary wrapper function with the same interface as the previous calling convention, compatible with LLVM IR produced from C sources. These auxiliary functions pack the individual values into a descriptor structure or unpack it. They also handle descriptor stack allocation if necessary, serving as an allocation scope: the memory reserved by `alloca` will be freed on exiting the auxiliary function. The effect of this change on MLIR-generated only LLVM IR is minimal. When interfacing MLIR-generated LLVM IR with C-generated LLVM IR, the integration only needs to require auxiliary functions and change the function name to call the wrapper function instead of the original function. This also opens the door to forwarding aliasing and alignment information from memrefs to LLVM IR pointers in the standard-to-LLVM conversion.	2020-02-10 15:03:43 +01:00
MaheshRavishankar	aaddca1efd	[mlir][GPUToSPIRV] Modify the lowering of gpu.block_dim to be consistent with Vulkan SPEC The existing lowering of gpu.block_dim added a global variable with the WorkGroupSize decoration. This raises an error within Vulkan/SPIR-V validation since Vulkan requires this to have a constant initializer. This is not yet supported in SPIR-V dialect. Changing the lowering to return the workgroup size as a constant value instead, obtained from spv.entry_point_abi attribute gets around the issue for now. The validation goes through since the workgroup size is specified using spv.execution_mode operation.	2020-02-08 22:30:03 -08:00
Nicolas Vasilache	681f929f59	[mlir][VectorOps] Introduce a `vector.fma` op that works on n-D vectors and lowers to `llvm.intrin.fmuladd` Summary: The `vector.fma` operation is portable enough across targets that we do not want to keep it wrapped under `vector.outerproduct` and `llvm.intrin.fmuladd`. This revision lifts the op into the vector dialect and implements the lowering to LLVM by using two patterns: 1. a pattern that lowers from n-D to (n-1)-D by unrolling when n > 2 2. a pattern that converts from 1-D to the proper LLVM representation Reviewers: ftynse, stellaraccident, aartbik, dcaballe, jsetoain, tetuante Reviewed By: aartbik Subscribers: fhahn, dcaballe, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74075	2020-02-07 15:44:53 -05:00
Nicolas Vasilache	499ad45877	[mlir][VectorOps] Expose and use llvm.intrin.fma* Summary: This revision exposes the portable `llvm.fma` intrinsic in LLVMOps and uses it in lieu of `llvm.fmuladd` when lowering the `vector.outerproduct` op to LLVM. This guarantees proper `fma` instructions will be emitted if the target ISA supports it. `llvm.fmuladd` does not have this guarantee in its semantics, despite evidence that the proper x86 instructions are emitted. For more details, see https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic. Reviewers: ftynse, aartbik, dcaballe, fhahn Reviewed By: aartbik Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74219	2020-02-07 15:38:40 -05:00
aartbik	e52414b1ae	[mlir][VectorOps] Generalized vector.print to i32/i64 Summary: Lowering to LLVM IR was restricted to float/double. This CL also adds the integral values. Reviewers: andydavis1, nicolasvasilache, ftynse Reviewed By: nicolasvasilache, ftynse Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74179	2020-02-07 09:25:30 -08:00
Lei Zhang	13b197c7d1	[mlir][spirv] Add dialect-specific attribute for target environment We were using normal dictionary attribute for target environment specification. It becomes cumbersome with more and more fields. This commit changes the modelling to a dialect-specific attribute, where we can have control over its storage and assembly form. Differential Revision: https://reviews.llvm.org/D73959	2020-02-04 21:33:13 -05:00
Lei Zhang	399887c9e4	[mlir][spirv] Add resource limits into target environment This commit adds two resource limits, max_compute_workgroup_size and max_compute_workgroup_invocations as resource limits to the target environment. They are not used at the current moment, but they will affect the SPIR-V CodeGen. Adding for now to have a proper target environment modelling. Differential Revision: https://reviews.llvm.org/D73905	2020-02-04 08:35:19 -05:00
Stephan Herhut	283b5e733d	[MLIR] Make gpu.launch implicitly capture uses of values defined above. Summary: In the original design, gpu.launch required explicit capture of uses and passing them as operands to the gpu.launch operation. This was motivated by infrastructure restrictions rather than design. This change lifts the requirement and removes the concept of kernel arguments from gpu.launch. Instead, the kernel outlining transformation now does the explicit capturing. This is a breaking change for users of gpu.launch. Differential Revision: https://reviews.llvm.org/D73769	2020-02-03 10:08:48 +01:00
Diego Caballero	e5aaf30cf1	[mlir] Introduce bare ptr calling convention for MemRefs in LLVM dialect Summary: This patch introduces an alternative calling convention for MemRef function arguments in LLVM dialect. It converts MemRef function arguments to LLVM bare pointers to the MemRef element type instead of creating a MemRef descriptor. Bare pointers are then promoted to MemRef descriptors at the beginning of the function. This calling convention is only enabled with a flag. Reviewers: ftynse, bondhugula, nicolasvasilache, rriddle, mehdi_amini Reviewed By: ftynse, rriddle, mehdi_amini Subscribers: Joonsoo, flaub, merge_guards_bot, jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, aartbik, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72802	2020-01-31 15:19:38 -08:00
aartbik	c8fc76a99b	[mlir] [VectorOps] fixed bug in vector.insert_strided_slice lowering Summary: Rationale: When lowering to LLVM for different rank insert (n vs k), the offset array needs to drop one dimension (becomes n-1), but the strides array needs to be preserved (remains k). With regression test. Note that this example was actually in the documentation, so extra important to do it right :-) Reviewers: nicolasvasilache, andydavis1, ftynse Reviewed By: nicolasvasilache, ftynse Subscribers: Joonsoo, merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73733	2020-01-31 11:29:46 -08:00
Lei Zhang	df71000d7d	[mlir][spirv] Convert linalg.generic for reduction to SPIR-V ops This commit adds a pattern to lower linalg.generic for reduction to spv.GroupNonUniform* ops. Right now this only supports integer reduction on 1-D input memref. Shader entry point ABI is queried to make sure that the input memref's shape matches the local workgroup's invocation configuration. This makes sure that the workload fits in one local workgroup so that we can leverage SPIR-V group non-uniform operations. linalg.generic is a structured op that preserves the right level of information. It is easier to recognize reduction at this level than performing analysis on loops. This commit also exposes `getElementPtr` in SPIRVLowering.h given that it's a generally useful utility function. Differential Revision: https://reviews.llvm.org/D73437	2020-01-31 09:37:04 -05:00
Stephan Herhut	84695dd4d7	Fix conversion of loops to GPU with no block/thread dimensions. Summary: The current code assumes that one always maps at least one loop to block dimensions and at least one loop to thread dimensions. If either is not the case, a loop would get mapped twice. Differential Revision: https://reviews.llvm.org/D73685	2020-01-31 11:00:28 +01:00
Lubomir Litchev	fcabccd3d9	[MLIR] Add the sqrt operation to mlir. Summary: Add and pipe through the sqrt operation for Standard and LLVM dialects. Reviewers: nicolasvasilache, ftynse Reviewed By: ftynse Subscribers: frej, ftynse, merge_guards_bot, flaub, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73571	2020-01-30 08:07:38 -08:00
Julian Gross	addc27bc43	Changed wrong ROCDL instructions in GPU lowering. Summary: In the scope of the lowering phase from GPU to ROCDL, the instructions for the conversion patterns seem to be wrong. According to https://github.com/ROCm-Developer-Tools/HIP/blob/master/include/hip/hcc_detail/math_fwd.h the instructions need two underscores in the beginning instead of one. Reviewers: nicolasvasilache, herhut, rriddle Reviewed By: herhut, rriddle Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73535	2020-01-30 15:37:00 +01:00
Stephan Herhut	2692751895	Add 'gpu.terminator' operation. Summary: The 'gpu.terminator' operation is used as the terminator for the regions of gpu.launch. This is to disambiguate them from the return operation on 'gpu.func' functions. This is a breaking change and users of the gpu dialect will need to adapt their code when producing 'gpu.launch' operations. Reviewers: nicolasvasilache Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73620	2020-01-30 12:41:41 +01:00
Stephan Herhut	fdcecefe30	Add lowering for loop.parallel to cfg. Summary: This also removes the explicit pattern for loop.terminator to ensure that the terminator is only erased if the parent op is rewritten. Reductions are not yet supported. Reviewers: nicolasvasilache Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73348	2020-01-28 11:55:51 +01:00
Julian Gross	664d2f5bad	Add tanh lowering from Standard dialect to NVVM and ROCDL. Summary: The tanh lowering from Standard dialect to NVVM and ROCDL was not working. The conversion patterns are inserted in the lowering files. The test cases for the lowerings were added in the test files. Reviewers: nicolasvasilache, ftynse, herhut Reviewed By: ftynse, herhut Subscribers: merge_guards_bot, ftynse, jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73471	2020-01-28 11:01:10 +01:00
aartbik	459cf6e500	[mlir] [VectorOps] Lowering of vector.extract/insert_slices to LLVM IR Summary: Uses progressive lowering to convert vector.extract_slices and vector_insert_slices to equivalent vector operations that can be subsequently lowered into LLVM. Reviewers: nicolasvasilache, andydavis1, rriddle Reviewed By: nicolasvasilache, rriddle Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72808	2020-01-27 10:35:48 -08:00
Denis Khalikov	4460cb5bcd	[mlir][spirv] Add lowering for composite std.constant. Add lowering for constant operation with ranked tensor type to spv.constant with spv.array type. Differential Revision: https://reviews.llvm.org/D73022	2020-01-22 08:25:00 -05:00
Denis Khalikov	3023352a7d	[mlir][spirv] Simplify scalar type size calculation. Simplify scalar type size calculation and reject boolean memrefs. Differential Revision: https://reviews.llvm.org/D72999	2020-01-21 12:15:37 -05:00
Tres Popp	9a52ea5cf9	Create a gpu.module operation for the GPU Dialect. Summary: This is based on the use of code constantly checking for an attribute on a model and instead represents the distinct operation with a different op. Instead, this op can be used to provide better filtering. Reverts "Revert "[mlir] Create a gpu.module operation for the GPU Dialect."" This reverts commit ac446302ca4145cdc89f377c0c364c29ee303be5 after fixing internal Google issues. This additionally updates ROCDL lowering to use the new gpu.module. Reviewers: herhut, mravishankar, antiagainst, nicolasvasilache Subscribers: jholewinski, mgorny, mehdi_amini, jpienaar, burmako, shauheen, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits, mravishankar, rriddle, antiagainst, bkramer Tags: #llvm Differential Revision: https://reviews.llvm.org/D72921	2020-01-21 14:05:03 +01:00
Denis Khalikov	29779894af	[mlir][spirv] Add lowering from `loop.if` to `spv.selection` When lowering `loop.if` to `spv.selection` we explicitly create a selection header block before the control flow diverges and a merge block where control flow subsequently converges. Differential Revision: https://reviews.llvm.org/D72836	2020-01-17 12:04:12 -05:00
Benjamin Kramer	0133cc60e4	Revert "[mlir] Create a gpu.module operation for the GPU Dialect." This reverts commit `4624a1e8ac`. Causing problems downstream.	2020-01-15 17:52:17 +01:00
Tres Popp	4624a1e8ac	[mlir] Create a gpu.module operation for the GPU Dialect. Summary: This is based on the use of code constantly checking for an attribute on a model and instead represents the distinct operation with a different op. Instead, this op can be used to provide better filtering. Reviewers: herhut, mravishankar, antiagainst, rriddle Reviewed By: herhut, antiagainst, rriddle Subscribers: liufengdb, aartbik, jholewinski, mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72336	2020-01-14 12:05:47 +01:00
River Riddle	4268e4f4b8	[mlir] Change the syntax of AffineMapAttr and IntegerSetAttr to avoid conflicts with function types. Summary: The current syntax for AffineMapAttr and IntegerSetAttr conflicts with function types, making it currently impossible to round-trip function types (and e.g. FuncOp) in the IR. This revision changes the syntax for the attributes by wrapping them in a keyword. AffineMapAttr is wrapped with `affine_map<>` and IntegerSetAttr is wrapped with `affine_set<>`. Reviewed By: nicolasvasilache, ftynse Differential Revision: https://reviews.llvm.org/D72429	2020-01-13 13:24:39 -08:00
Julian Gross	202ab273e6	[mlir] Added missing GPU lowering ops. Summary: This diff adds missing GPU lowering ops to MLIR. Reviewers: herhut, pifon2a, ftynse Tags: #pre-merge_beta_testing, #llvm Differential Revision: https://reviews.llvm.org/D72439	2020-01-13 17:10:54 +01:00
Nicolas Vasilache	2d515e49d8	[mlir][VectorOps] Implement insert_strided_slice conversion Summary: This diff implements the progressive lowering of insert_strided_slice. Two cases appear: 1. when the source and dest vectors have different ranks, extract the dest subvector at the proper offset and reduce to case 2. 2. when they have the same rank N: a. if the source and dest type are the same, the insertion is trivial: just forward the source b. otherwise, iterate over all N-1 D subvectors and create an extract/insert_strided_slice/insert replacement, reducing the problem to vectors of the same N-1 rank. This combines properly with the other conversion patterns to lower all the way to LLVM. Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante, nicolasvasilache Reviewed By: andydavis1 Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72317	2020-01-09 03:13:01 -05:00
Nicolas Vasilache	65678d9384	[mlir][VectorOps] Implement strided_slice conversion Summary: This diff implements the progressive lowering of strided_slice to either: 1. extractelement + insertelement for the 1-D case 2. extract + optional strided_slice + insert for the n-D case. This combines properly with the other conversion patterns to lower all the way to LLVM. Appropriate tests are added. Reviewers: ftynse, rriddle, AlexEichenberger, andydavis1, tetuante Reviewed By: andydavis1 Subscribers: merge_guards_bot, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72310	2020-01-09 03:03:51 -05:00
Denis Khalikov	eac01f63a6	[mlir][spirv] Add lowering for std.fpext, std.fptrunc, std.sitofp. Differential Revision: https://reviews.llvm.org/D72137	2020-01-07 22:13:07 -05:00
Lei Zhang	dab2921f77	Revert "[mlir][spirv] Add lowering for std.fpext, std.fptrunc, std.sitofp." This reverts commit `7e7f849a6d` because it recorded the wrong commit author.	2020-01-07 22:11:17 -05:00
Denis Khalikov	dd495e8a87	[mlir][spirv] Add lowering for std cmp ops. Differential Revision: https://reviews.llvm.org/D72296	2020-01-07 21:51:51 -05:00
Denis Khalikov	9883b14cd1	[mlir][spirv] Add lowering for standard bit ops Differential Revision: https://reviews.llvm.org/D72205	2020-01-07 21:45:54 -05:00
Lei Zhang	7e7f849a6d	[mlir][spirv] Add lowering for std.fpext, std.fptrunc, std.sitofp. Differential Revision: https://reviews.llvm.org/D72137	2020-01-07 21:28:49 -05:00
Lei Zhang	b3d2867769	[mlir][spirv] Fix shader ABI attribute prefix and add verification This commit fixes shader ABI attributes to use `spv.` as the prefix so that they match the dialect's namespace. This enables us to add verification hooks in the SPIR-V dialect to verify them. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D72062	2020-01-03 07:44:27 -05:00
Tung Le Duc	e5957ac3d7	[mlir] Fix the wrong computation of dynamic strides for lowering AllocOp to LLVM Leftover change from before the MLIR merge, reviewed and accepted at https://github.com/tensorflow/mlir/pull/338.	2019-12-28 23:33:28 +01:00
MaheshRavishankar	c3d3569d4c	[mlir] Convert std.and/std.or ops to spv.LogicalAnd/spv.LogicalOr The conversion from std.and/std.or to spv.LogicalAnd/spv.LogicalOr is only valid for boolean (i1) types. Modify BinaryOpPattern in StandardToSPIRV.td to allow limiting the type of the operands for which the pattern is applied. Differential Revision: https://reviews.llvm.org/D71881	2019-12-27 11:33:17 -08:00
Manuel Freiberger	22954a0e40	Add integer bit-shift operations to the standard dialect. Rename the 'shlis' operation in the standard dialect to 'shift_left'. Add tests for this operation (these have been missing so far) and add a lowering to the 'shl' operation in the LLVM dialect. Also add 'shift_right_signed' (lowered to LLVM's 'ashr') and 'shift_right_unsigned' (lowered to 'lshr'). The original plan was to name these operations 'shift.left', 'shift.right.signed' and 'shift.right.unsigned'. This works if the operations are prefixed with 'std.' in MLIR assembly. Unfortunately during import the short form is ambiguous with operations from a hypothetical 'shift' dialect. The best solution seems to be to omit dots in standard operations for now. Closes tensorflow/mlir#226 PiperOrigin-RevId: 286803388	2019-12-22 10:02:13 -08:00
Aart Bik	1d47564a53	[VectorOps] unify vector dialect "subscripts" PiperOrigin-RevId: 286650682	2019-12-20 15:33:04 -08:00
Christian Sigg	42d46b4efa	Add gpu.shuffle op. This will allow us to lower most of gpu.all_reduce (when all_reduce doesn't exist in the target dialect) within the GPU dialect, and only do target-specific lowering for the shuffle op. PiperOrigin-RevId: 286548256	2019-12-20 02:52:52 -08:00
Aart Bik	15f800f4bc	[VectorOps] minor cleanup: vector dialect "subscripts" are i32 Introduces some centralized methods to move towards consistent use of i32 as vector subscripts. Note: sizes/strides/offsets attributes are still i64 PiperOrigin-RevId: 286434133	2019-12-19 11:51:08 -08:00
Aart Bik	d9b500d3bb	[VectorOps] Add vector.print definition, with lowering support Examples: vector.print %f : f32 vector.print %x : vector<4xf32> vector.print %y : vector<3x4xf32> vector.print %z : vector<2x3x4xf32> LLVM lowering replaces these with fully unrolled calls into a small runtime support library that provides some basic printing operations (single value, opening closing bracket, comma, newline). PiperOrigin-RevId: 286230325	2019-12-18 11:31:34 -08:00
Uday Bondhugula	47034c4bc5	Introduce prefetch op: affine -> std -> llvm intrinsic Introduce affine.prefetch: op to prefetch using a multi-dimensional subscript on a memref; similar to affine.load but has no effect on semantics, but only on performance. Provide lowering through std.prefetch, llvm.prefetch and map to llvm's prefetch intrinsic. All attributes reflected through the lowering - locality hint, rw, and instr/data cache. affine.prefetch %0[%i, %j + 5], false, 3, true : memref<400x400xi32> Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Closes tensorflow/mlir#225 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/225 from bondhugula:prefetch 4c3b4e93bc64d9a5719504e6d6e1657818a2ead0 PiperOrigin-RevId: 286212997	2019-12-18 10:00:04 -08:00
Alex Zinenko	40ef46fba4	Harden the requirements to memory attribution types in gpu.func When memory attributions are present in `gpu.func`, require that they are of memref type and live in memoryspaces 3 and 5 for workgroup and private memory attributions, respectively. Adapt the conversion from the GPU dialect to the NVVM dialect to drop the private memory space from attributions as NVVM is able to model them as local `llvm.alloca`s in the default memory space. PiperOrigin-RevId: 286161763	2019-12-18 03:38:55 -08:00
Alex Zinenko	6273fa0c6a	Plug gpu.func into the GPU lowering pipelines This updates the lowering pipelines from the GPU dialect to lower-level dialects (NVVM, SPIRV) to use the recently introduced gpu.func operation instead of a standard function annotated with an attribute. In particular, the kernel outlining is updated to produce gpu.func instead of std.func and the individual conversions are updated to consume gpu.funcs and disallow standard funcs after legalization, if necessary. The attribute "gpu.kernel" is preserved in the generic syntax, but can also be used with the custom syntax on gpu.funcs. The special kind of function for GPU allows one to use additional features such as memory attribution. PiperOrigin-RevId: 285822272	2019-12-16 12:12:48 -08:00
Aart Bik	cd5dab8ad7	[VectorOps] Add [insert/extract]element definition together with lowering to LLVM Similar to insert/extract vector instructions but (1) work on 1-D vectors only (2) allow for a dynamic index %c3 = constant 3 : index %0 = vector.insertelement %arg0, %arg1[%c : index] : vector<4xf32> %1 = vector.extractelement %arg0[%c3 : index] : vector<4xf32> PiperOrigin-RevId: 285792205	2019-12-16 09:52:46 -08:00
Alex Zinenko	0684aa9a8b	Make memref promotion during std->LLVM lowering the default calling convention During the conversion from the standard dialect to the LLVM dialect, memref-typed arguments are promoted from registers to memory and passed into functions by pointer. This had been introduced into the lowering to work around the absence of calling convention modeling in MLIR to enable better interoperability with LLVM IR generated from C, and has been exercised for several months. Make this promotion the default calling convention when converting to the LLVM dialect. This adds the documentation, simplifies the code and makes the conversion consistent across function operations and function types used in other places, e.g. in high-order functions or attributes, which would not follow the same rule previously. PiperOrigin-RevId: 285751280	2019-12-16 05:17:14 -08:00
River Riddle	b030e4a4ec	Try to fold operations in DialectConversion when trying to legalize. This change allows for DialectConversion to attempt folding as a mechanism to legalize illegal operations. This also expands folding support in OpBuilder::createOrFold to generate new constants when folding, and also enables it to work in the context of a PatternRewriter. PiperOrigin-RevId: 285448440	2019-12-13 16:47:26 -08:00
Aart Bik	1c81adf362	[VectorOps] Add lowering of vector.shuffle to LLVM IR For example, a shuffle %1 = vector.shuffle %arg0, %arg1 [0 : i32, 1 : i32] : vector<2xf32>, vector<2xf32> becomes a direct LLVM shuffle %0 = llvm.shufflevector %arg0, %arg1 [0 : i32, 1 : i32] : !llvm<"<2 x float>">, !llvm<"<2 x float>"> but %1 = vector.shuffle %a, %b[1 : i32, 0 : i32, 2: i32] : vector<1x4xf32>, vector<2x4xf32> becomes the more elaborate (note the index permutation that drives argument selection for the extract operations) %0 = llvm.mlir.undef : !llvm<"[3 x <4 x float>]"> %1 = llvm.extractvalue %arg1[0] : !llvm<"[2 x <4 x float>]"> %2 = llvm.insertvalue %1, %0[0] : !llvm<"[3 x <4 x float>]"> %3 = llvm.extractvalue %arg0[0] : !llvm<"[1 x <4 x float>]"> %4 = llvm.insertvalue %3, %2[1] : !llvm<"[3 x <4 x float>]"> %5 = llvm.extractvalue %arg1[1] : !llvm<"[2 x <4 x float>]"> %6 = llvm.insertvalue %5, %4[2] : !llvm<"[3 x <4 x float>]"> PiperOrigin-RevId: 285268164	2019-12-12 14:11:56 -08:00
Nicolas Vasilache	782ae29678	Retire !linalg.buffer type - NFC This type is not used anymore now that Linalg view and subview have graduated to std and that alignment is supported on alloc. PiperOrigin-RevId: 285213424	2019-12-12 10:03:57 -08:00
Ehsan Toosi	f7bffad5a7	Added lowering of `std.tanh` to llvm function call to `tanh` and `tanhf`. Closes tensorflow/mlir#312 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/312 from dfki-ehna:tanh 9e89b072ff91ff390ad739501745114feb3ac856 PiperOrigin-RevId: 285205674	2019-12-12 09:25:15 -08:00
Christian Sigg	9b85582682	Automated rollback of commit `f68ac464d8` PiperOrigin-RevId: 285162061	2019-12-12 03:48:38 -08:00
Christian Sigg	f68ac464d8	Switch from shfl.bfly to shfl.down. Both work for the current use case, but the latter allows implementing prefix sums and is a little easier to understand for partial warps. PiperOrigin-RevId: 285145287	2019-12-12 01:28:01 -08:00
Denis Khalikov	d968f9696d	[spirv] Add lowering for std.fdiv, std.frem, std.fsub Closes tensorflow/mlir#313 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/313 from denis0x0D:sandbox/lowering_std_farith 41715070a74d13bfa9401957478978c1bb8006c0 PiperOrigin-RevId: 285023586	2019-12-11 11:17:35 -08:00
Aart Bik	9826fe5c9f	[VectorOps] Add lowering of vector.insert to LLVM IR For example, an insert %0 = vector.insert %arg0, %arg1[3 : i32] : f32 into vector<4xf32> becomes %0 = llvm.mlir.constant(3 : i32) : !llvm.i32 %1 = llvm.insertelement %arg0, %arg1[%0 : !llvm.i32] : !llvm<"<4 x float>"> A more elaborate example, inserting an element in a higher dimension vector %0 = vector.insert %arg0, %arg1[3 : i32, 7 : i32, 15 : i32] : f32 into vector<4x8x16xf32> becomes %0 = llvm.extractvalue %arg1[3 : i32, 7 : i32] : !llvm<"[4 x [8 x <16 x float>]]"> %1 = llvm.mlir.constant(15 : i32) : !llvm.i32 %2 = llvm.insertelement %arg0, %0[%1 : !llvm.i32] : !llvm<"<16 x float>"> %3 = llvm.insertvalue %2, %arg1[3 : i32, 7 : i32] : !llvm<"[4 x [8 x <16 x float>]]"> PiperOrigin-RevId: 284882443	2019-12-10 17:12:49 -08:00
Nicolas Vasilache	ad38e49806	Uniformize Vector transforms as patterns on the model of Linalg - NFC This reorganizes the vector transformations to be more easily testable as patterns and more easily composable into fused passes in the future. PiperOrigin-RevId: 284817474	2019-12-10 11:54:33 -08:00
Aart Bik	1fe65688d4	[VectorOps] Add a ShuffleOp to the VectorOps dialect For example %0 = vector.shuffle %x, %y [3 : i32, 2 : i32, 1 : i32, 0 : i32] : vector<2xf32>, vector<2xf32> yields a vector<4xf32> result with a permutation of the elements of %x and %y PiperOrigin-RevId: 284657191	2019-12-09 16:15:41 -08:00
Mahesh Ravishankar	4a62019eb8	Add lowering for module with gpu.kernel_module attribute. The existing GPU to SPIR-V lowering created a spv.module for every function with gpu.kernel attribute. A better approach is to lower the module that the function lives in (which has the attribute gpu.kernel_module) to a spv.module operation. This better captures the host-device separation modeled by GPU dialect and simplifies the lowering as well. PiperOrigin-RevId: 284574688	2019-12-09 09:52:21 -08:00
Andy Davis	312ccb1c0f	Unify vector op unrolling transformation. Unifies vector op unrolling transformation, by using the same unrolling implementation for contraction and elementwise operations. Removes fake fork/join operations which are no longer needed now that we have the InsertStridedSlice operation. PiperOrigin-RevId: 284570784	2019-12-09 09:35:15 -08:00
Kazuaki Ishizaki	ae05cf27c6	Minor spelling tweaks Closes tensorflow/mlir#304 PiperOrigin-RevId: 284568358	2019-12-09 09:23:48 -08:00
Aart Bik	d37f27251f	[VecOps] Rename vector.[insert|extract]element to just vector.[insert|extract] Since these operations lower to [insert|extract][element|value] at LLVM dialect level, neither element nor value would correctly reflect the meaning. PiperOrigin-RevId: 284240727	2019-12-06 12:39:25 -08:00
Aart Bik	b36aaeafb1	[VectorOps] Add lowering of vector.broadcast to LLVM IR For example, a scalar broadcast %0 = vector.broadcast %x : f32 to vector<2xf32> return %0 : vector<2xf32> which expands scalar x into vector [x,x] by lowering to the following LLVM IR dialect to implement the duplication over the leading dimension. %0 = llvm.mlir.undef : !llvm<"<2 x float>"> %1 = llvm.mlir.constant(0 : index) : !llvm.i64 %2 = llvm.insertelement %x, %0[%1 : !llvm.i64] : !llvm<"<2 x float>"> %3 = llvm.shufflevector %2, %0 [0 : i32, 0 : i32] : !llvm<"<2 x float>">, !llvm<"<2 x float>"> return %3 : vector<2xf32> In the trailing dimensions, the operand is simply "passed through", unless a more elaborate "stretch" is required. For example %0 = vector.broadcast %arg0 : vector<1xf32> to vector<4xf32> return %0 : vector<4xf32> becomes %0 = llvm.mlir.undef : !llvm<"<4 x float>"> %1 = llvm.mlir.constant(0 : index) : !llvm.i64 %2 = llvm.extractelement %arg0[%1 : !llvm.i64] : !llvm<"<1 x float>"> %3 = llvm.mlir.constant(0 : index) : !llvm.i64 %4 = llvm.insertelement %2, %0[%3 : !llvm.i64] : !llvm<"<4 x float>"> %5 = llvm.shufflevector %4, %0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : !llvm<"<4 x float>">, !llvm<"<4 x float>"> llvm.return %5 : !llvm<"<4 x float>"> PiperOrigin-RevId: 284219926	2019-12-06 11:02:29 -08:00
Alex Zinenko	e216a72ab8	Add conversions of GPU func with memory attributions to LLVM/NVVM GPU functions use memory attributions, a combination of Op attributes and region arguments, to specify function-wide buffers placed in workgroup or private memory spaces. Introduce a lowering pattern for GPU functions to be converted to LLVM functions taking into account memory attributions. Workgroup attributions get transformed into module-level globals with unique names derived from function names. Private attributions get converted into llvm.allocas inside the function body. In both cases, we inject at the beginning of the function the IR that obtains the raw pointer to the data and populates a MemRef descriptor based on the MemRef type of buffer, making attributions compose with the rest of the MemRef lowering and transparent for use with std.load and std.store. While using raw pointers instead of descriptors might have been more efficient, it is better implemented as a canonicalization or a separate transformation so that non-attribution memrefs could also benefit from it. PiperOrigin-RevId: 284208396	2019-12-06 10:08:43 -08:00
Andy Davis	41f8e105fa	Unroll vector masks along with their associated vector arguments. Updates vector ContractionOp to use proper vector masks (produced by CreateMaskOp/ConstantMaskOp). Leverages the following canonicalizations in unrolling unit test: CreateMaskOp -> ConstantMaskOp, StridedSliceOp(ConstantMaskOp) -> ConstantMaskOp Removes IndexTupleOp (no longer needed now that we have vector mask ops). Updates all unit tests. PiperOrigin-RevId: 284182168	2019-12-06 07:37:28 -08:00
Uday Bondhugula	3ade6a7d15	DimOp folding for alloc/view dynamic dimensions Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Closes tensorflow/mlir#253 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/253 from bondhugula:dimop a4b464f24ae63fd259114558d87e11b8ee4dae86 PiperOrigin-RevId: 284169689	2019-12-06 06:00:54 -08:00
nmostafa	daff60cd68	Add UnrankedMemRef Type Closes tensorflow/mlir#261 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/261 from nmostafa:nmostafa/unranked 96b6e918f6ed64496f7573b2db33c0b02658ca45 PiperOrigin-RevId: 284037040	2019-12-05 13:13:20 -08:00
Mahesh Ravishankar	4d61a79db4	Allow specification of the workgroup size for GPUToSPIRV lowering. SPIR-V/Vulkan spec requires the workgroup size to be specified with the spv.ExecutionMode operation. This was hard-wired to be set to a particular value. It is now changed to be configurable by clients of the pass or of the patterns that implement the lowering from GPU to SPIRV. PiperOrigin-RevId: 284017482	2019-12-05 11:31:57 -08:00
Nicolas Vasilache	b3f7cf80a7	Add a CL option to Standard to LLVM lowering to use alloca instead of malloc/free. In the future, a more configurable malloc and free interface should be used and exposed via extra parameters to the `createLowerToLLVMPass`. Until requirements are gathered, a simple CL flag allows generating code that runs successfully on hardware that cannot use the stdlib. PiperOrigin-RevId: 283833424	2019-12-04 14:16:00 -08:00
Nicolas Vasilache	edfaf925cf	Drop MaterializeVectorTransfers in favor of simpler declarative unrolling Now that we have unrolling as a declarative pattern, we can drop a full pass that has gone stale. In the future we may want to add specific unrolling patterns for VectorTransferReadOp. PiperOrigin-RevId: 283806880	2019-12-04 12:11:42 -08:00
Andy Davis	34e1f4aa51	Adds support for unrolling single-result vector operations with iterator type lists and indexing maps to a target vector size. Adds unit tests for unrolling the vector ContractionOp with different iteration orders. PiperOrigin-RevId: 283747503	2019-12-04 06:53:37 -08:00
Nicolas Vasilache	5c0c51a997	Refactor dependencies to expose Vector transformations as patterns - NFC This CL refactors some of the MLIR vector dependencies to allow decoupling VectorOps, vector analysis, vector transformations and vector conversions from each other. This makes the system more modular and allows extracting VectorToVector into VectorTransforms that do not depend on vector conversions. This refactoring exhibited a bunch of cyclic library dependencies that have been cleaned up. PiperOrigin-RevId: 283660308	2019-12-03 17:52:10 -08:00
Mahesh Ravishankar	c5ba37b6ae	Add a pass to legalize operations before lowering to SPIR-V. Not all StandardOps can be lowered to SPIR-V. For example, subview op implementation requires use of pointer bitcasts which is not valid according to SPIR-V spec (or at least is ambiguous about it). Such ops need to be removed/transformed before lowering to SPIR-V. The SPIRVLegalizationPass is added as a place where such legalizations can be added. Current implementation folds the subview ops with load/stores so that the lowering itself does not have to convert a subview op. PiperOrigin-RevId: 283642981	2019-12-03 16:06:17 -08:00
Mahesh Ravishankar	353fb2bd38	Convert MemRefType to a linearized array in SPIR-V lowering. The SPIR-V lowering used nested !spv.arrays to represent multi-dimensional arrays, with the hope that in conjunction with the layout annotations, the shape and layout of memref can be represented directly. It is unclear though how portable this representation will end up being. It will rely on driver compilers implementing complex index computations faithfully. A more portable approach is to use linearized arrays to represent memrefs and explicitly instantiate all the index computation in SPIR-V. This gives added benefit that we can further optimize the generated code in MLIR before generating the SPIR-V binary. PiperOrigin-RevId: 283571167	2019-12-03 10:21:16 -08:00
Alex Zinenko	993e79e9bd	Fix ViewOp to have at most one offset operand As described in the documentation, ViewOp is expected to take an optional dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser did not include a check for the offset being a single value and accepted a list of values instead. Furthermore, several tests have been exercising the wrong syntax of a ViewOp, passing multiple values to the dynamic stride list, which was not caught by the parser. The trailing values could have been erroneously interpreted as dynamic sizes. This is likely due to resyntaxing of the ViewOp, with the previous syntax taking the list of sizes before the offset. Update the tests to use the syntax with the offset preceding the sizes. Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of operands with offset in the trailing position, and erroneously relied on the permissive parsing that interpreted trailing dynamic offset values as leading dynamic sizes. Fix the lowering to use the correct order of operands. PiperOrigin-RevId: 283532506	2019-12-03 06:23:04 -08:00
Stephan Herhut	2125c0e3a8	Extend conversion of SubViewOp to llvm to also support cases where size and stride are constant (i.e., there are no size and stride operands). We recently added canonicalization that rewrites constant size and stride operands to SubViewOp into static information in the type, so these patterns now occur during code generation. PiperOrigin-RevId: 283524688	2019-12-03 05:11:49 -08:00
Alex Zinenko	d5e627f84b	Introduce Linkage attribute to the LLVM dialect LLVM IR supports linkage on global objects such as global variables and functions. Introduce the Linkage attribute into the LLVM dialect, backed by an integer storage. Use this attribute on LLVM::GlobalOp and make it mandatory. Implement parsing/printing of the attribute and conversion to LLVM IR. See tensorflow/mlir#277. PiperOrigin-RevId: 283309328	2019-12-02 03:28:10 -08:00
Mahesh Ravishankar	03620fa70a	Misc changes to lowering to SPIR-V. These changes were made to the SPIR-V lowering while adding support for lowering SubViewOp, but are not directly related to it. - Change the lowering of MemRefType to !spv.ptr<!spv.struct<!spv.array<...>[offset]>, ..> This is consistent with the Vulkan spec. - To enable testing a simple pattern of lowering functions is added to ConvertStandardToSPIRVPass. This is just used to convert the type of the arguments of the function. The added function lowering itself is not meant to be the way functions are eventually lowered into SPIR-V dialect. PiperOrigin-RevId: 282589644	2019-11-26 10:11:34 -08:00
Nicolas Vasilache	174076a157	Use vector.InsertStridedSlice in Vector -> Vector unrolling This CL uses the recently added op to finish the implementation of Vector -> Vector unrolling by replacing the "fake join op" by a series of InsertStridedSliceOp. Test is updated accordingly PiperOrigin-RevId: 282451126	2019-11-25 15:56:37 -08:00
MLIR Team	1012c492f0	Allow LLVM::ExtractElementOp to have non-i32 indices. Also change the text format a bit, so that indices are braced by squares. PiperOrigin-RevId: 282437095	2019-11-25 14:44:52 -08:00
Mahesh Ravishankar	bd485afda0	Introduce attributes that specify the final ABI for a spirv::ModuleOp. To simplify the lowering into SPIR-V, while still respecting the ABI requirements of SPIR-V/Vulkan, split the process into two 1) While lowering a function to SPIR-V (when the function is an entry point function), allow specifying attributes on arguments and function itself that describe the ABI of the function. 2) Add a pass that materializes the ABI described in the function. Two attributes are needed. 1) Attribute on arguments of the entry point function that describe the descriptor_set, binding, storage class, etc, of the spv.globalVariable this argument will be replaced by 2) Attribute on function that specifies workgroup size, etc. (for now only workgroup size). Add the pass -spirv-lower-abi-attrs to materialize the ABI described by the attributes. This change makes the SPIRVBasicTypeConverter class unnecessary and is removed, further simplifying the SPIR-V lowering path. PiperOrigin-RevId: 282387587	2019-11-25 11:19:56 -08:00
Lei Zhang	aaafeac89b	[spirv] NFC: rename test files and sort tests inside PiperOrigin-RevId: 282132339	2019-11-23 06:58:38 -08:00
Mahesh Ravishankar	19212105dd	Changes to SubViewOp to make it more amenable to canonicalization. The current SubViewOp specification allows for either all offsets, shape and stride to be dynamic or all of them to be static. There are opportunities for more fine-grained canonicalization based on which of these are static. For example, if the sizes are static, the result memref is of static shape. The specification of SubViewOp is modified to allow one or more of offsets, shapes and strides to be statically specified. The verification is updated to ensure that the result type of the subview op is consistent with which of these are static and which are dynamic. PiperOrigin-RevId: 281560457	2019-11-20 12:32:51 -08:00
Nicolas Vasilache	fa14d4f6ab	Implement unrolling of vector ops to finer-grained vector ops as a pattern. This CL uses the pattern rewrite infrastructure to implement a simple VectorOps -> VectorOps legalization strategy to unroll coarse-grained vector operations into finer grained ones. The transformation is written using local pattern rewrites to allow composition with other rewrites. It proceeds by iteratively introducing fake cast ops and cleaning canonicalizing or lowering them away where appropriate. This is an example of writing transformations as compositions of local pattern rewrites that should enable us to make them significantly more declarative. PiperOrigin-RevId: 281555100	2019-11-20 11:49:36 -08:00
Alex Zinenko	8961d8e32f	Change conversion CLI flag from -lower-to-llvm to -convert-std-to-llvm The command-line flag name `lower-to-llvm` for the pass performing dialect conversion from the Standard dialect to the LLVM dialect is misleading and inconsistent with most of the conversion passes. It leads the user to believe that there are no restrictions on what can be converted, while in fact only a subset of the Standard dialect can be converted (with operations from other dialects converted by separate passes). Use `convert-std-to-llvm` that better reflects what the pass does and is consistent with most other conversions. PiperOrigin-RevId: 281238797	2019-11-19 00:34:51 -08:00
Alex Zinenko	b8dc3fd812	Rename CLI flags -lower-gpu-ops-to--ops to -convert-gpu-to- This makes the flags consistent with the naming scheme used elsewhere in the codebase for dialect conversions. PiperOrigin-RevId: 281027517	2019-11-18 02:43:10 -08:00
Nicolas Vasilache	0b271b7dfe	Refactor the LowerVectorTransfers pass to use the RewritePattern infra - NFC This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering. PiperOrigin-RevId: 280529784	2019-11-14 15:40:07 -08:00
Alex Zinenko	bf5916e7a4	Use MemRefDescriptor in Vector-to-LLVM conversion Following up on the consolidation of MemRef descriptor conversion, update Vector-to-LLVM conversion to use the helper class that abstracts away the implementation details of the MemRef descriptor. This also makes the types of the attributes in emitted llvm.insert/extractelement operations consistently i64 instead of a mix of index and i64. PiperOrigin-RevId: 280441451	2019-11-14 09:05:42 -08:00
Nicolas Vasilache	f2b6ae9991	Move VectorOps to Tablegen - (almost) NFC This CL moves VectorOps to Tablegen and cleans up the implementation. This is almost NFC but 2 changes occur: 1. an interface change occurs in the padding value specification in vector_transfer_read: the value becomes non-optional. As a shortcut we currently use %f0 for all paddings. This should become an OpInterface for vectorization in the future. 2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>` Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect. The op documentation is moved to the .td file. PiperOrigin-RevId: 280430869	2019-11-14 08:15:23 -08:00
Nicolas Vasilache	0bd6390b54	Deprecate linalg.subview in favor of std.subview This CL uses the now standard std.subview in linalg. Two shortcuts are currently taken to allow this port: 1. the type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier. 2. indexing into SubViewOp may access out of bounds since lowering to LLVM does not currently enforce it by construction. These will be fixed in subsequent commits after discussions. PiperOrigin-RevId: 280250129	2019-11-13 12:10:09 -08:00
Mahesh Ravishankar	2be53603e9	Add operations needed to support lowering of AffineExpr to SPIR-V. Lowering of CmpIOp, DivISOp, RemISOp, SubIOp and SelectOp to SPIR-V dialect enables the lowering of operations generated by AffineExpr -> StandardOps conversion into the SPIR-V dialect. PiperOrigin-RevId: 280039204	2019-11-12 13:20:06 -08:00
Mahesh Ravishankar	104af84f4c	Add Conversion to lower loop::ForOp to spirv::LoopOp. loop::ForOp can be lowered to the structured control flow represented by spirv::LoopOp by making the continue block of the spirv::LoopOp the loop latch and the merge block the exit block. The resulting spirv::LoopOp has a single back edge from the continue to header block, and a single exit from header to merge. PiperOrigin-RevId: 280015614	2019-11-12 11:33:27 -08:00
Nicolas Vasilache	51de3f688e	Add LLVM lowering of std.subview A followup CL will replace usage of linalg.subview by std.subview. PiperOrigin-RevId: 279961981	2019-11-12 07:23:18 -08:00
Nicolas Vasilache	f51a155337	Add support for alignment attribute in std.alloc. This CL adds an extra pointer to the memref descriptor to allow specifying alignment. In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing. After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers. This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ). PiperOrigin-RevId: 279959463	2019-11-12 07:06:54 -08:00
Nicolas Vasilache	ffebc8ce1d	Drop spurious test file PiperOrigin-RevId: 278959717	2019-11-06 16:00:57 -08:00
Nicolas Vasilache	7f6c6084b5	Add lowering of std.view to LLVM This CL ports the lowering of linalg.view to the newly introduced std.view. Differences in implementation relate to std.view having slightly different semantics: 1. a static or dynamic offset can be specified. 2. the size of the (contiguous) shape is passed instead of a range. 3. static size and stride information is extracted from the memref type rather than the range. Besides these differences, lowering behaves the same. A future CL will update Linalg to use this unified infrastructure. PiperOrigin-RevId: 278948853	2019-11-06 15:06:16 -08:00
Mahesh Ravishankar	9cbbd8f4df	Support lowering of imperfectly nested loops into GPU dialect. The current lowering of loops to GPU only supports lowering of loop nests where the loops mapped to workgroups and workitems are perfectly nested. Here a new lowering is added to handle lowering of imperfectly nested loop body with the following properties 1) The loops partitioned to workgroups are perfectly nested. 2) The loop body of the innermost loop partitioned to workgroups can contain one or more loop nests that are to be partitioned across workitems. Each individual loop nest partitioned to workitems should also be perfectly nested. 3) The number of workgroups and workitems are not deduced from the loop bounds but are passed in by the caller of the lowering as values. 4) For statements within the perfectly nested loop nest partitioned across workgroups that are not loops, it is valid to have all threads execute that statement. This is NOT verified. PiperOrigin-RevId: 277958868	2019-11-01 10:52:06 -07:00
Mehdi Amini	ce9477934a	Add a test for lowering GPU ops that covers cases where the symbol table isn't held by a ModuleOp (NFC) PiperOrigin-RevId: 277752004	2019-10-31 10:35:15 -07:00
Alexander Belyaev	780a108d31	Fix include guards and add tests for OpToFuncCallLowering. PiperOrigin-RevId: 276859463	2019-10-26 08:21:36 -07:00
Alex Zinenko	5f867d26b4	Use LLVM_Type instead of AnyType in the definition of LLVM_CallOp The type constraint had to be relaxed due to the order of lowering passes in the examples, that since has been fixed. The relaxed version was still used by the CUDA lowering for launch sizes of `index` type. This is not necessary since the GPU dialect does not restrict the type of the launch size operands. Use an LLVM type instead and restore the check in the LLVM_CallOp definition. PiperOrigin-RevId: 275920109	2019-10-21 14:12:19 -07:00
Nicolas Vasilache	2823b68580	Implement lowering of VectorTypeCastOp to LLVM A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine. A relevant test is added. This is a step towards solving tensorflow/mlir#189. PiperOrigin-RevId: 275538981	2019-10-18 14:00:06 -07:00
Christian Sigg	fe0ee32da5	Add gpu.barrier op to synchronize invocations of a local workgroup. Adding gen table for rewrite patterns from GPU to NVVM dialect. Copy missing op documentation from GPUOps.td to GPU.md. PiperOrigin-RevId: 275419588	2019-10-18 00:30:44 -07:00
Mahesh Ravishankar	54a8473470	Makes spv.module generated by GPU->SPIRV conversion spec compliant Makes the spv.module generated by the GPU to SPIR-V conversion SPIR-V spec compliant (validated using spirv-val from Vulkan tools). 1) Separate out the VulkanLayoutUtils from DecorateSPIRVCompositeTypeLayoutPass to make it reusable within the Type converter in SPIR-V lowering infrastructure. This is used to compute the layout of the !spv.struct used in global variable type description. 2) Set the capabilities of the spv.module to Shader (needed for use of Logical Memory Model, and the extensions to SPV_KHR_storage_buffer_storage_class for use of Storage Buffer) PiperOrigin-RevId: 275081486	2019-10-16 11:53:07 -07:00
Christian Sigg	d2f0f847af	Support custom accumulator provided as region to gpu.all_reduce. In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as arbitrary code region. Adds a gpu.yield op to specify the result of the accumulation. Also support more types (integers) and accumulations (mul). PiperOrigin-RevId: 275065447	2019-10-16 10:43:44 -07:00
Alex Zinenko	8c2ea32072	Emit LLVM IR equivalent of sizeof when lowering alloc operations Originally, the lowering of `alloc` operations has been computing the number of bytes to allocate when lowering based on the properties of MLIR type. This does not take into account type legalization that happens when compiling LLVM IR down to target assembly. This legalization can widen the type, potentially leading to out-of-bounds accesses to `alloc`ed data due to mismatches between address computation that takes the widening into account and allocation that does not. Use the LLVM IR's equivalent of `sizeof` to compute the number of bytes to be allocated: %0 = getelementptr %type* null, %indexType 0 %1 = ptrtoint %type* %0 to %indexType adapted from http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt PiperOrigin-RevId: 274159900	2019-10-11 06:33:26 -07:00
Alexander Belyaev	00d2a37e32	Add unary ops and ExpOp to Standard Dialect. PiperOrigin-RevId: 274152154	2019-10-11 05:13:55 -07:00
Alex Zinenko	304e44a6b0	LLVM conversion: harden a test to check for LLVM funcs rather than any funcs This test was not updated in the original commit that switched to using LLVM functions since it wasn't broken by that change. FileCheck was able to match the `func` part of `llvm.func` to the expected pattern and continue as usual. Make sure the `llvm.` dialect prefix is included in the expected output. PiperOrigin-RevId: 274127281	2019-10-11 01:36:38 -07:00
Alex Zinenko	08a2ce8a14	Standard-to-LLVM conversion: check that operands have LLVM types In Standard to LLVM dialect conversion, the binary op conversion pattern implicitly assumed some operands were of LLVM IR dialect type. This is not necessarily true, for example if the Ops that produce those operands did not match the existing conversion patterns. Check if all operands are of LLVM IR dialect type and if not, fail to match the binary op pattern. Closes tensorflow/mlir#168 PiperOrigin-RevId: 274063207	2019-10-10 17:19:57 -07:00
Mahesh Ravishankar	28d7f9c052	Add lowering of constant ops to SPIR-V. The lowering is specified as a pattern and is done only if the result is a SPIR-V scalar type or vector type. Handling ConstantOp with index return type needs special handling since SPIR-V dialect does not have index types. Based on the bitwidth of the attribute value, either i32 or i64 is chosen. Other constant lowerings are left as a TODO. PiperOrigin-RevId: 274056805	2019-10-10 17:19:57 -07:00
Alex Zinenko	5e7959a353	Use llvm.func to define functions with wrapped LLVM IR function type This function-like operation allows one to define functions that have wrapped LLVM IR function type, in particular variadic functions. The operation was added in parallel to the existing lowering flow, this commit only switches the flow to use it. Using a custom function type makes the LLVM IR dialect type system more consistent and avoids complex conversion rules for functions that previously had to use the built-in function type instead of a wrapped LLVM IR dialect type and perform conversions during the analysis. PiperOrigin-RevId: 273910855	2019-10-10 01:34:06 -07:00
Mahesh Ravishankar	e2ed25bc43	Make SPIR-V lowering infrastructure follow Vulkan SPIR-V validation. The lowering infrastructure needs to be enhanced to lower into a spv.Module that is consistent with the SPIR-V spec. The following changes are needed 1) The Vulkan/SPIR-V validation rules dictate entry functions to have signature of void(void). This requires changes to the function signature conversion infrastructure within the dialect conversion framework. When an argument is dropped from the original function signature, a function can be specified that when invoked will return the value to use as a replacement for the argument from the original function. 2) Some changes to the type converter to make the converted type consistent with the Vulkan/SPIR-V validation rules, a) Add support for converting dynamically shaped tensors to spv.rtarray type. b) Make the global variable of type !spv.ptr<!spv.struct<...>> 3) Generate the entry point operation for the kernel functions and automatically compute all the interface variables needed PiperOrigin-RevId: 273784229	2019-10-09 11:25:58 -07:00
Alex Zinenko	11d12670da	GPUToCUDA: attach CUBIN to the nested module rather than to the function Originally, we were attaching attributes containing CUBIN blobs to the kernel function called by `gpu.launch_func`. This kernel is now contained in a nested module that is used as a compilation unit. Attach compiled CUBIN blobs to the module rather than to the function since we were compiling the module. This also avoids duplication of the attribute on multiple kernels within the same module. PiperOrigin-RevId: 273497303	2019-10-08 05:11:26 -07:00
Alex Zinenko	52e082b6ed	GPUToCUDA: emit addressof directly instead of wrapping it into a getter function Originally, the CUBIN getter function was introduced as a mechanism to circumvent the absence of globals in the LLVM dialect. It would allocate memory and populate it with the CUBIN data. LLVM dialect now supports globals and they are already used to store CUBIN data, making the getter function a trivial address computation of a global. Emit the address computation directly at the place of `gpu.launch_func` instead of putting it in a function and calling it. This simplifies the conversion flow and prepares it for using the DialectConversion infrastructure. PiperOrigin-RevId: 273496221	2019-10-08 05:03:42 -07:00
Alex Zinenko	16af5924cb	Fuse GenerateCubinAccessors pass into LaunchFunctToCuda Now that the accessor function is a trivial getter of the global variable, it makes less sense to have the getter generation as a separate pass. Move the getter generation into the lowering of `gpu.launch_func` to CUDA calls. This change is mostly code motion, but the process can be simplified further by generating the addressof inplace instead of using a call. This will be done in a follow-up. PiperOrigin-RevId: 273492517	2019-10-08 04:35:33 -07:00
Alex Zinenko	90d65d32d6	Use named modules for gpu.launch_func The kernel function called by gpu.launch_func is now placed into an isolated nested module during the outlining stage to simplify separate compilation. Until recently, modules did not have names and could not be referenced. This limitation was circumvented by introducing a stub kernel with the same name at the same nesting level as the module containing the actual kernel. This relation is only effective in one direction: from actual kernel function to its launch_func "caller". Leverage the recently introduced symbol name attributes on modules to refer to a specific nested module from `gpu.launch_func`. This removes the implicit connection between the identically named stub and kernel functions. It also enables support for `gpu.launch_func`s to call different kernels located in the same module. PiperOrigin-RevId: 273491891	2019-10-08 04:30:32 -07:00
MLIR Team	0dfa7fc908	Add fpext and fptrunc to the Standard dialect and include conversion to LLVM PiperOrigin-RevId: 272768027	2019-10-03 16:37:24 -07:00
Alex Zinenko	e0d78eac23	NFC: rename Conversion/ControlFlowToCFG to Conversion/LoopToStandard This makes the name of the conversion pass more consistent with the naming scheme, since it actually converts from the Loop dialect to the Standard dialect rather than working with arbitrary control flow operations. PiperOrigin-RevId: 272612112	2019-10-03 01:35:03 -07:00
Nicolas Vasilache	9604bb6269	Extract MemRefType::getStridesAndOffset as a free function and fix dynamic offset determination. This also adds coverage with a missing test, which uncovered a bug in the conditional for testing whether an offset is dynamic or not. PiperOrigin-RevId: 272505798	2019-10-02 13:25:05 -07:00


287 Commits