Commit Graph

269 Commits

Author SHA1 Message Date
Nicolas Vasilache 0b271b7dfe Refactor the LowerVectorTransfers pass to use the RewritePattern infra - NFC
This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering.

PiperOrigin-RevId: 280529784
2019-11-14 15:40:07 -08:00
Alex Zinenko bf5916e7a4 Use MemRefDescriptor in Vector-to-LLVM convresion
Following up on the consolidation of MemRef descriptor conversion, update
Vector-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This also makes the types of
the attributes in emitted llvm.insert/extractelement operations consistently
i64 instead of a mix of index and i64.

PiperOrigin-RevId: 280441451
2019-11-14 09:05:42 -08:00
Nicolas Vasilache f2b6ae9991 Move VectorOps to Tablegen - (almost) NFC
This CL moves VectorOps to Tablegen and cleans up the implementation.

This is almost NFC but 2 changes occur:
  1. an interface change occurs in the padding value specification in vector_transfer_read:
     the value becomes non-optional. As a shortcut we currently use %f0 for all paddings.
     This should become an OpInterface for vectorization in the future.
  2. the return type of vector.type_cast is trivial and simplified to `memref<vector<...>>`

Relevant roundtrip and invalid tests that used to sit in core are moved to the vector dialect.

The op documentation is moved to the .td file.

PiperOrigin-RevId: 280430869
2019-11-14 08:15:23 -08:00
Nicolas Vasilache 0bd6390b54 Deprecate linalg.subview in favor of std.subview
This CL uses the now standard std.subview in linalg.
Two shortcuts are currently taken to allow this port:
1. the type resulting from a view is currently degraded to fully dynamic to pass the SubViewOp verifier.
2. indexing into SubViewOp may access out of bounds since lowering to LLVM does not currently enforce it by construction.

These will be fixed in subsequent commits after discussions.

PiperOrigin-RevId: 280250129
2019-11-13 12:10:09 -08:00
Mahesh Ravishankar 2be53603e9 Add operations needed to support lowering of AffineExpr to SPIR-V.
Lowering of CmpIOp, DivISOp, RemISOp, SubIOp and SelectOp to SPIR-V
dialect enables the lowering of operations generated by AffineExpr ->
StandardOps conversion into the SPIR-V dialect.

PiperOrigin-RevId: 280039204
2019-11-12 13:20:06 -08:00
Mahesh Ravishankar 104af84f4c Add Conversion to lower loop::ForOp to spirv::LoopOp.
loop::ForOp can be lowered to the structured control flow represented
by spirv::LoopOp by making the continue block of the spirv::LoopOp the
loop latch and the merge block the exit block. The resulting
spirv::LoopOp has a single back edge from the continue to header
block, and a single exit from header to merge.
PiperOrigin-RevId: 280015614
2019-11-12 11:33:27 -08:00
Nicolas Vasilache 51de3f688e Add LLVM lowering of std.subview
A followup CL will replace usage of linalg.subview by std.subview.

PiperOrigin-RevId: 279961981
2019-11-12 07:23:18 -08:00
Nicolas Vasilache f51a155337 Add support for alignment attribute in std.alloc.
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.

In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.

After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.

This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).

PiperOrigin-RevId: 279959463
2019-11-12 07:06:54 -08:00
Nicolas Vasilache ffebc8ce1d Drop spurious test file
PiperOrigin-RevId: 278959717
2019-11-06 16:00:57 -08:00
Nicolas Vasilache 7f6c6084b5 Add lowering of std.view to LLVM
This CL ports the lowering of linalg.view to the newly introduced std.view.
Differences in implementation relate to std.view having slightly different semantics:
1. a static or dynamic offset can be specified.
2. the size of the (contiguous) shape is passed instead of a range.
3. static size and stride information is extracted from the memref type rather than the range.

Besides these differences, lowering behaves the same.
A future CL will update Linalg to use this unified infrastructure.

PiperOrigin-RevId: 278948853
2019-11-06 15:06:16 -08:00
Mahesh Ravishankar 9cbbd8f4df Support lowering of imperfectly nested loops into GPU dialect.
The current lowering of loops to GPU only supports lowering of loop
nests where the loops mapped to workgroups and workitems are perfectly
nested. Here a new lowering is added to handle lowering of imperfectly
nested loop body with the following properties
1) The loops partitioned to workgroups are perfectly nested.
2) The loop body of the inner most loop partitioned to workgroups can
contain one or more loop nests that are to be partitioned across
workitems. Each individual loops nests partitioned to workitems should
also be perfectly nested.
3) The number of workgroups and workitems are not deduced from the
loop bounds but are passed in by the caller of the lowering as values.
4) For statements within the perfectly nested loop nest partitioned
across workgroups that are not loops, it is valid to have all threads
execute that statement. This is NOT verified.

PiperOrigin-RevId: 277958868
2019-11-01 10:52:06 -07:00
Mehdi Amini ce9477934a Add a test for lowering GPU ops that cover cases where the symbol table isn't held by a ModuleOp (NFC)
PiperOrigin-RevId: 277752004
2019-10-31 10:35:15 -07:00
Alexander Belyaev 780a108d31 Fix include guards and add tests for OpToFuncCallLowering.
PiperOrigin-RevId: 276859463
2019-10-26 08:21:36 -07:00
Alex Zinenko 5f867d26b4 Use LLVM_Type instead of AnyType in the definition of LLVM_CallOp
The type constraint had to be relaxed due to the order of lowering passes in
the examples, that since has been fixed. The relaxed version was still used by
the CUDA lowering for launch sizes of `index` type. This is not necessary since
the GPU dialect does not restrict the type of the launch size operands. Use an
LLVM type instead and restore the check in the LLVM_CallOp definition.

PiperOrigin-RevId: 275920109
2019-10-21 14:12:19 -07:00
Nicolas Vasilache 2823b68580 Implement lowering of VectorTypeCastOp to LLVM
A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine.

A relevant test is added.

This is a step towards solving tensorflow/mlir#189.

PiperOrigin-RevId: 275538981
2019-10-18 14:00:06 -07:00
Christian Sigg fe0ee32da5 Add gpu.barrier op to synchronize invocations of a local workgroup.
Adding gen table for rewrite patterns from GPU to NVVM dialect.

Copy missing op documentation from GPUOps.td to GPU.md.

PiperOrigin-RevId: 275419588
2019-10-18 00:30:44 -07:00
Mahesh Ravishankar 54a8473470 Makes spv.module generated by GPU->SPIRV conversion spec compliant
Makes the spv.module generated by the GPU to SPIR-V conversion SPIR-V
spec compliant (validated using spirv-val from Vulkan tools).

1) Separate out the VulkanLayoutUtils from
DecorateSPIRVCompositeTypeLayoutPass to make it reusable within the
Type converter in SPIR-V lowering infrastructure. This is used to
compute the layout of the !spv.struct used in global variable type
description.
2) Set the capabilities of the spv.module to Shader (needed for use of
Logical Memory Model, and the extensions to
SPV_KHR_storage_buffer_storage_class for use of Storage Buffer)

PiperOrigin-RevId: 275081486
2019-10-16 11:53:07 -07:00
Christian Sigg d2f0f847af Support custom accumulator provided as region to gpu.all_reduce.
In addition to specifying the type of accumulation through the 'op' attribute, the accumulation can now also be specified as arbitrary code region.

Adds a gpu.yield op to specify the result of the accumulation.

Also support more types (integers) and accumulations (mul).

PiperOrigin-RevId: 275065447
2019-10-16 10:43:44 -07:00
Alex Zinenko 8c2ea32072 Emit LLVM IR equivalent of sizeof when lowering alloc operations
Originally, the lowering of `alloc` operations has been computing the number of
bytes to allocate when lowering based on the properties of MLIR type. This does
not take into account type legalization that happens when compiling LLVM IR
down to target assembly. This legalization can widen the type, potentially
leading to out-of-bounds accesses to `alloc`ed data due to mismatches between
address computation that takes the widening into account and allocation that
does not. Use the LLVM IR's equivalent of `sizeof` to compute the number of
bytes to be allocated:
  %0 = getelementptr %type* null, %indexType 0
  %1 = ptrtoint %type* %0 to %indexType
adapted from
http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt

PiperOrigin-RevId: 274159900
2019-10-11 06:33:26 -07:00
Alexander Belyaev 00d2a37e32 Add unary ops and ExpOp to Standard Dialect.
PiperOrigin-RevId: 274152154
2019-10-11 05:13:55 -07:00
Alex Zinenko 304e44a6b0 LLVM conversion: harden a test to check for LLVM funcs rather than any funcs
This test was not updated in the original commit that switched to using LLVM
functions since it wasn't broken by that change. FileCheck was able to match
the `func` part of `llvm.func` to the expected pattern and continue as usual.
Make sure the `llvm.` dialect prefix is included in the expected output.

PiperOrigin-RevId: 274127281
2019-10-11 01:36:38 -07:00
Alex Zinenko 08a2ce8a14 Standard-to-LLVM conversion: check that operands have LLVM types
In Standard to LLVM dialect conversion, the binary op conversion pattern
implicitly assumed some operands were of LLVM IR dialect type. This is not
necessarily true, for example if the Ops that produce those operands did not
match the existing convresion patterns. Check if all operands are of LLVM IR
dialect type and if not, fail to patch the binary op pattern.

Closes tensorflow/mlir#168

PiperOrigin-RevId: 274063207
2019-10-10 17:19:57 -07:00
Mahesh Ravishankar 28d7f9c052 Add lowering of constant ops to SPIR-V.
The lowering is specified as a pattern and is done only if the result
is a SPIR-V scalar type or vector type.
Handling ConstantOp with index return type needs special handling
since SPIR-V dialect does not have index types. Based on the bitwidth
of the attribute value, either i32 or i64 is chosen.
Other constant lowerings are left as a TODO.

PiperOrigin-RevId: 274056805
2019-10-10 17:19:57 -07:00
Alex Zinenko 5e7959a353 Use llvm.func to define functions with wrapped LLVM IR function type
This function-like operation allows one to define functions that have wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow, this commit only switches the
flow to use it.

Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.

PiperOrigin-RevId: 273910855
2019-10-10 01:34:06 -07:00
Mahesh Ravishankar e2ed25bc43 Make SPIR-V lowering infrastructure follow Vulkan SPIR-V validation.
The lowering infrastructure needs to be enhanced to lower into a
spv.Module that is consistent with the SPIR-V spec. The following
changes are needed
1) The Vulkan/SPIR-V validation rules dictates entry functions to have
signature of void(void). This requires changes to the function
signature conversion infrastructure within the dialect conversion
framework. When an argument is dropped from the original function
signature, a function can be specified that when invoked will return
the value to use as a replacement for the argument from the original
function.
2) Some changes to the type converter to make the converted type
consistent with the Vulkan/SPIR-V validation rules,
   a) Add support for converting dynamically shaped tensors to
   spv.rtarray type.
   b) Make the global variable of type !spv.ptr<!spv.struct<...>>
3) Generate the entry point operation for the kernel functions and
automatically compute all the interface variables needed

PiperOrigin-RevId: 273784229
2019-10-09 11:25:58 -07:00
Alex Zinenko 11d12670da GPUToCUDA: attach CUBIN to the nested module rather than to the function
Originally, we were attaching attributes containing CUBIN blobs to the kernel
function called by `gpu.launch_func`. This kernel is now contained in a nested
module that is used as a compilation unit. Attach compiled CUBIN blobs to the
module rather than to the function since we were compiling the module. This
also avoids duplication of the attribute on multiple kernels within the same
module.

PiperOrigin-RevId: 273497303
2019-10-08 05:11:26 -07:00
Alex Zinenko 52e082b6ed GPUToCUDA: emit addressof directly instead of wrapping it into a getter function
Originally, the CUBIN getter function was introduced as a mechanism to
circumvent the absence of globals in the LLVM dialect. It would allocate memory
and populate it with the CUBIN data. LLVM dialect now supports globals and they
are already used to store CUBIN data, making the getter function a trivial
address computation of a global. Emit the address computation directly at the
place of `gpu.launch_func` instead of putting it in a function and calling it.
This simplifies the conversion flow and prepares it for using the
DialectConversion infrastructure.

PiperOrigin-RevId: 273496221
2019-10-08 05:03:42 -07:00
Alex Zinenko 16af5924cb Fuse GenerateCubinAccessors pass into LaunchFunctToCuda
Now that the accessor function is a trivial getter of the global variable, it
makes less sense to have the getter generation as a separate pass. Move the
getter generation into the lowering of `gpu.launch_func` to CUDA calls. This
change is mostly code motion, but the process can be simplified further by
generating the addressof inplace instead of using a call. This is will be done
in a follow-up.

PiperOrigin-RevId: 273492517
2019-10-08 04:35:33 -07:00
Alex Zinenko 90d65d32d6 Use named modules for gpu.launch_func
The kernel function called by gpu.launch_func is now placed into an isolated
nested module during the outlining stage to simplify separate compilation.
Until recently, modules did not have names and could not be referenced. This
limitation was circumvented by introducing a stub kernel at the same name at
the same nesting level as the module containing the actual kernel. This
relation is only effective in one direction: from actual kernel function to its
launch_func "caller".

Leverage the recently introduced symbol name attributes on modules to refer to
a specific nested module from `gpu.launch_func`. This removes the implicit
connection between the identically named stub and kernel functions. It also
enables support for `gpu.launch_func`s to call different kernels located in the
same module.

PiperOrigin-RevId: 273491891
2019-10-08 04:30:32 -07:00
MLIR Team 0dfa7fc908 Add fpext and fptrunc to the Standard dialect and includes conversion to LLVM
PiperOrigin-RevId: 272768027
2019-10-03 16:37:24 -07:00
Alex Zinenko e0d78eac23 NFC: rename Conversion/ControlFlowToCFG to Conversion/LoopToStandard
This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.

PiperOrigin-RevId: 272612112
2019-10-03 01:35:03 -07:00
Nicolas Vasilache 9604bb6269 Extract MemRefType::getStridesAndOffset as a free function and fix dynamic offset determination.
This also adds coverage with a missing test, which uncovered a bug in the conditional for testing whether an offset is dynamic or not.

PiperOrigin-RevId: 272505798
2019-10-02 13:25:05 -07:00
Deven Desai e81b3129b4 [ROCm] Adding pass to lower GPU Dialect to ROCDL Dialect.
This is a follow-up to the PRtensorflow/mlir#146 which introduced the ROCDL Dialect. This PR introduces a pass to lower GPU Dialect to the ROCDL Dialect. As with the previous PR, this one builds on the work done by @whchung, and addresses most of the review comments in the original PR.

Closes tensorflow/mlir#154

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/154 from deven-amd:deven-lower-gpu-to-rocdl 809893e08236da5ab6a38e3459692fa04247773d
PiperOrigin-RevId: 272390729
2019-10-02 01:50:30 -07:00
Alex Zinenko c760f233b3 Fix and simplify CallOp/CallIndirectOp to LLVM::CallOp conversion
A recent ABI compatibility change affected the conversion from standard
CallOp/CallIndirectOp to LLVM::CallOp by changing its signature. In order to
analyze the signature, the code was looking up the callee symbol in the module.
This is incorrect since, during the conversion, the module may contain both the
original and the converted function op that have the same symbol name. There is
no strict guarantee on which of the two symbols will be found by the lookup.
The conversion was not failing because the type legalizer converts the LLVM
types to themselves making the original and the converted function signatures
ultimately produce the same type.

Instead of looking up the function signature to get the list of result types,
use the types of the CallOp/CallIndirectOp results which must match those of
the function in valid IR. These types are guaranteed to be the original,
unconverted types when converting the operation. Furthermore, this avoids the
need to perform a lookup of a symbol name in the module which may be expensive.

Finally, propagate attributes as-is from the original op to the converted op
since they share the attribute name for the callee of direct calls and the rest
of attributes are not affected by the conversion. This removes the need for
additional contorsions between direct and indirect calls to extract the name of
the optional callee attribute only to insert it back. This also prevents the
conversion from unintentionally dropping the other attributes of the op.

PiperOrigin-RevId: 272218871
2019-10-01 08:41:50 -07:00
Nicolas Vasilache 923b33ea16 Normalize MemRefType lowering to LLVM as strided MemRef descriptor
This CL finishes the implementation of the lowering part of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).

Strided memrefs correspond conceptually to the following templated C++ struct:
```
template <typename Elem, size_t Rank>
struct {
  Elem *ptr;
  int64_t offset;
  int64_t sizes[Rank];
  int64_t strides[Rank];
};
```
The linearization procedure for address calculation for strided memrefs is the same as for linalg views:
`base_offset + SUM_i index_i * stride_i`.

The following CL will unify Linalg and Standard by removing !linalg.view in favor of strided memrefs.

PiperOrigin-RevId: 272033399
2019-09-30 11:58:54 -07:00
Nicolas Vasilache b569c969f0 Remove spurious debug spew in tests
PiperOrigin-RevId: 271624731
2019-09-27 12:31:43 -07:00
Nicolas Vasilache ddf737c5da Promote MemRefDescriptor to a pointer to struct when passing function boundaries in LLVMLowering.
The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues.
Solving the ABI problem generally is a very complex problem and most likely involves taking
a dependence on clang that we do not want atm.

A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implement.

PiperOrigin-RevId: 271591708
2019-09-27 09:57:36 -07:00
Christian Sigg 116dac00ba Add AllReduceOp to GPU dialect with lowering to NVVM.
The reduction operation is currently fixed to "add", and the scope is fixed to "workgroup".

The implementation is currently limited to sizes that are multiple 32 (warp size) and no larger than 1024.

PiperOrigin-RevId: 271290265
2019-09-26 00:17:50 -07:00
Uday Bondhugula 458ede8775 Introduce splat op + provide its LLVM lowering
- introduce splat op in standard dialect (currently for int/float/index input
  type, output type can be vector or statically shaped tensor)
- implement LLVM lowering (when result type is 1-d vector)
- add constant folding hook for it
- while on Ops.cpp, fix some stale names

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Closes tensorflow/mlir#141

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/141 from bondhugula:splat 48976a6aa0a75be6d91187db6418de989e03eb51
PiperOrigin-RevId: 270965304
2019-09-24 12:44:58 -07:00
Nicolas Vasilache 42d8fa667b Normalize lowering of MemRef types
The RFC for unifying Linalg and Affine compilation passes into an end-to-end flow with a predictable ABI and linkage to external function calls raised the question of why we have variable sized descriptors for memrefs depending on whether they have static or dynamic dimensions  (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).

This CL standardizes the ABI on the rank of the memrefs.
The LLVM struct for a memref becomes equivalent to:
```
template <typename Elem, size_t Rank>
struct {
  Elem *ptr;
  int64_t sizes[Rank];
};
```

PiperOrigin-RevId: 270947276
2019-09-24 11:21:49 -07:00
Christian Sigg b8676da1fc Outline GPU kernel function into a nested module.
Roll forward of commit 5684a12.

When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.

PiperOrigin-RevId: 270639748
2019-09-23 03:17:01 -07:00
Manuel Freiberger 2c11997d48 Add integer sign- and zero-extension and truncation to standard.
This adds sign- and zero-extension and truncation of integer types to the
standard dialects. This allows to perform integer type conversions without
having to go to the LLVM dialect and introduce custom type casts (between
standard and LLVM integer types).

Closes tensorflow/mlir#134

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/134 from ombre5733:sext-zext-trunc-in-std c7657bc84c0ca66b304e53ec03797e09152e4d31
PiperOrigin-RevId: 270479722
2019-09-21 16:14:56 -07:00
George Karpenkov 2df646bef6 Automated rollback of commit 5684a12434
PiperOrigin-RevId: 270126672
2019-09-19 14:34:30 -07:00
MLIR Team 5684a12434 Outline GPU kernel function into a nested module.
When outlining GPU kernels, put the kernel function inside a nested module. Then use a nested pipeline to generate the cubins, independently per kernel. In a final pass, move the cubins back to the parent module.

PiperOrigin-RevId: 269987720
2019-09-19 01:51:28 -07:00
Lei Zhang 113aadddf9 Update SPIR-V symbols and use GLSL450 instead of VulkanKHR
SPIR-V recently publishes v1.5, which brings a bunch of symbols
into core. So the suffix "KHR"/"EXT"/etc. is removed from the
symbols. We use a script to pull information from the spec
directly.

Also changed conversion and tests to use GLSL450 instead of
VulkanKHR memory model. GLSL450 is still the main memory model
supported by Vulkan shaders and it does not require extra
capability to enable.

PiperOrigin-RevId: 268992661
2019-09-13 15:26:32 -07:00
MLIR Team b5652720c1 Retain address space during MLIR > LLVM conversion.
PiperOrigin-RevId: 267206460
2019-09-04 12:26:52 -07:00
Alex Zinenko c6f8adad8e Move LLVMIR dialect tests from test/LLVMIR to test/Dialect and test/Conversion
This follows up on the recent restructuring that moved the dialects under
lib/Dialect and inter-dialect conversions to lib/Conversion. Originally, the
tests for both the LLVMIR dialect itself and the conversion from Standard to
LLVMIR dialect lived under test/LLVMIR.  This no longer reflects the code
structure.  Move the tests to either test/Dialect/LLVMIR or
test/Conversion/StandardToLLVM depending on the features they exercise.

PiperOrigin-RevId: 267159219
2019-09-04 08:38:18 -07:00
Alex Zinenko c335d9d313 LLVM dialect: prefix auxiliary operations with "mlir."
Some of the operations in the LLVM dialect are required to model the LLVM IR in
MLIR, for example "constant" operations are needed to declare a constant value
since MLIR, unlike LLVM, does not support immediate values as operands.  To
avoid confusion with actual LLVM operations, we prefix such axuiliary
operations with "mlir.".

PiperOrigin-RevId: 266942838
2019-09-03 09:10:56 -07:00
Mahesh Ravishankar 4ced99c085 Enhance GPU To SPIR-V conversion to support builtins and load/store ops.
To support a conversion of a simple load-compute-store kernel from GPU
dialect to SPIR-V dialect, the conversion of operations like
"gpu.block_dim", "gpu.thread_id" which allow threads to get the launch
conversion is needed. In SPIR-V these are specified as global
variables with builin attributes. This CL adds support to specify
builtin variables in SPIR-V conversion framework. This is used to
convert the relevant operations from GPU dialect to SPIR-V dialect.
Also add support for conversion of load/store operation in Standard
dialect to SPIR-V dialect.
To simplify the conversion add a method to build a spv.AccessChain
operation that automatically determines the return type based on the
base pointer type and the indices provided.

PiperOrigin-RevId: 265718525
2019-08-27 10:50:23 -07:00
Alex Zinenko 006fcce44a ConvertLaunchFuncToCudaCalls: use LLVM dialect globals
This conversion has been using a stack-allocated array of i8 to store the
null-terminated kernel name in order to pass it to the CUDA wrappers expecting
a C string because the LLVM dialect was missing support for globals.  Now that
the suport is introduced, use a global instead.

Refactor global string construction from GenerateCubinAccessors into a common
utility function living in the LLVM namespace.

PiperOrigin-RevId: 264382489
2019-08-20 07:52:01 -07:00
Alex Zinenko 0f974817b5 LLVM dialect: prefix operations that correspond to intrinsics with "intr."
LLVM intrinsics have an open name space and their names can potentially overlap
with names of LLVM instructions (LLVM intrinsics are functions, not
instructions).  In MLIR, LLVM intrinsics are modeled as operations, so it needs
to make sure their names cannot clash with the instructions.  Use the "intr."
prefix for intrinsics in the LLVM dialect.

PiperOrigin-RevId: 264372173
2019-08-20 06:38:52 -07:00
Mahesh Ravishankar 377bfb3a14 Fix parsing/printing of spv.globalVariable and spv._address_of
Change the prining/parsing of spv.globalVariable to print the type of
the variable after the ':' to be consistent with MLIR convention.
The spv._address_of should print the variable type after the ':'. It was
mistakenly printing the address of the return value. Add a (missing)
test that should have caught that.
Also move spv.globalVariable and spv._address_of tests to
structure-ops.mlir.

PiperOrigin-RevId: 264204686
2019-08-19 11:39:25 -07:00
Mahesh Ravishankar d745101339 Add spirv::GlobalVariableOp that allows module level definition of variables
FuncOps in MLIR use explicit capture. So global variables defined in
module scope need to have a symbol name and this should be used to
refer to the variable within the function. This deviates from SPIR-V
spec, which assigns an SSA value to variables at all scopes that can
be used to refer to the variable, which requires SPIR-V functions to
allow implicit capture. To handle this add a new op,
spirv::GlobalVariableOp that can be used to define module scope
variables.
Since instructions need an SSA value, an new spirv::AddressOfOp is
added to convert a symbol reference to an SSA value for use with other
instructions.
This also means the spirv::EntryPointOp instruction needs to change to
allow initializers to be specified using symbol reference instead of
SSA value
The current spirv::VariableOp which returns an SSA value (as defined
by SPIR-V spec) can still be used to define function-scope variables.
PiperOrigin-RevId: 263951109
2019-08-17 10:20:13 -07:00
Nicolas Vasilache f826ceef3c Extend vector.outerproduct with an optional 3rd argument
This CL adds an optional third argument to the vector.outerproduct instruction.
When such a third argument is specified, it is added to the result of the outerproduct and  is lowered to FMA intrinsic when the lowering supports it.

In the future, we can add an attribute on the `vector.outerproduct` instruction to modify the operations for which to emit code (e.g. "+/*", "max/+", "min/+", "log/exp" ...).

This CL additionally performs minor cleanups in the vector lowering and adds tests to improve coverage.

This has been independently verified to result in proper fma instructions for haswell as follows.

Input:
```
func @outerproduct_add(%arg0: vector<17xf32>, %arg1: vector<8xf32>, %arg2: vector<17x8xf32>) -> vector<17x8xf32> {
  %2 = vector.outerproduct %arg0, %arg1, %arg2 : vector<17xf32>, vector<8xf32>
  return %2 : vector<17x8xf32>
}
}
```

Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```

Output:
```
outerproduct_add:                       # @outerproduct_add
# %bb.0:
        ...
        vmovaps 112(%rbp), %ymm8
        vbroadcastss    %xmm0, %ymm0
        ...
        vbroadcastss    64(%rbp), %ymm15
        vfmadd213ps     144(%rbp), %ymm8, %ymm0 # ymm0 = (ymm8 * ymm0) + mem
        ...
        vfmadd213ps     400(%rbp), %ymm8, %ymm9 # ymm9 = (ymm8 * ymm9) + mem
        ...
```
PiperOrigin-RevId: 263743359
2019-08-16 03:53:26 -07:00
Alex Zinenko 88de8b2a2b GenerateCubinAccessors: use LLVM dialect constants
The GenerateCubinAccessors was generating functions that fill
dynamically-allocated memory with the binary constant of a CUBIN attached as a
stirng attribute to the GPU kernel.  This approach was taken to circumvent the
missing support for global constants in the LLVM dialect (and MLIR in general).
Global constants were recently added to the LLVM dialect.  Change the
GenerateCubinAccessors pass to emit a global constant array of characters and a
function that returns a pointer to the first character in the array.

PiperOrigin-RevId: 263092052
2019-08-13 01:39:21 -07:00
Nicolas Vasilache 252ada4932 Add lowering of vector dialect to LLVM dialect.
This CL is step 3/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.

This CL adds support for converting MLIR n-D vector types to (n-1)-D arrays of 1-D LLVM vectors and a conversion VectorToLLVM that lowers the `vector.extractelement` and `vector.outerproduct` instructions to the proper mix of `llvm.vectorshuffle`, `llvm.extractelement` and `llvm.mulf`.

This has been independently verified to produce proper avx2 code.

Input:
```
func @vec_1d(%arg0: vector<4xf32>, %arg1: vector<8xf32>) -> vector<8xf32> {
  %2 = vector.outerproduct %arg0, %arg1 : vector<4xf32>, vector<8xf32>
  %3 = vector.extractelement %2[0 : i32]: vector<4x8xf32>
  return %3 : vector<8xf32>
}
```

Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```

Output:
```
vec_1d:                                 # @vec_1d
# %bb.0:
        vbroadcastss    %xmm0, %ymm0
        vmulps  %ymm1, %ymm0, %ymm0
        retq
```
PiperOrigin-RevId: 262895929
2019-08-12 04:08:57 -07:00
Mahesh Ravishankar ea56025f1e Initial implementation to translate kernel fn in GPU Dialect to SPIR-V Dialect
This CL adds an initial implementation for translation of kernel
function in GPU Dialect (used with a gpu.launch_kernel) op to a
spv.Module. The original function is translated into an entry
function.
Most of the heavy lifting is done by adding TypeConversion and other
utility functions/classes that provide most of the functionality to
translate from Standard Dialect to SPIR-V Dialect. These are intended
to be reusable in implementation of different dialect conversion
pipelines.
Note : Some of the files for have been renamed to be consistent with
the norm used by the other Conversion frameworks.
PiperOrigin-RevId: 260759165
2019-07-30 11:55:55 -07:00
Nicolas Vasilache e78ea03b24 Replace linalg.for by loop.for
With the introduction of the Loop dialect, uses of the `linalg.for` operation can now be subsumed 1-to-1 by `loop.for`.
This CL performs the replacement and tests are updated accordingly.

PiperOrigin-RevId: 258322565
2019-07-16 13:44:57 -07:00
Nicolas Vasilache cca53e8527 Extract std.for std.if and std.terminator in their own dialect
These ops should not belong to the std dialect.
This CL extracts them in their own dialect and updates the corresponding conversions and tests.

PiperOrigin-RevId: 258123853
2019-07-16 13:43:18 -07:00
Nicolas Vasilache cab671d166 Lower affine control flow to std control flow to LLVM dialect
This CL splits the lowering of affine to LLVM into 2 parts:
1. affine -> std
2. std -> LLVM

The conversions mostly consists of splitting concerns between the affine and non-affine worlds from existing conversions.
Short-circuiting of affine `if` conditions was never tested or exercised and is removed in the process, it can be reintroduced later if needed.

LoopParametricTiling.cpp is updated to reflect the newly added ForOp::build.

PiperOrigin-RevId: 257794436
2019-07-12 08:44:28 -07:00
River Riddle 89bc449cee Standardize the value numbering in the AsmPrinter.
Change the AsmPrinter to number values breadth-first so that values in adjacent regions can have the same name. This allows for ModuleOp to contain operations that produce results. This also standardizes the special name of region entry arguments to "arg[0-9+]" now that Functions are also operations.

PiperOrigin-RevId: 257225069
2019-07-09 10:41:00 -07:00
Alex Zinenko 80e2871087 Extend AffineToGPU to support Linalg loops
Extend the utility that converts affine loop nests to support other types of
loops by abstracting away common behavior through templates.  This also
slightly simplifies the existing Affine to GPU conversion by always passing in
the loop step as an additional kernel argument even though it is a known
constant.  If it is used, it will be propagated into the loop body by the
existing canonicalization pattern and can be further constant-folded, otherwise
it will be dropped by canonicalization.

This prepares for the common loop abstraction that will be used for converting
to GPU kernels, which is conceptually close to Linalg loops, while maintaining
the existing conversion operational.

PiperOrigin-RevId: 257172216
2019-07-09 05:26:50 -07:00
Stephan Herhut e8b21a75f8 Add an mlir-cuda-runner tool.
This tool allows to execute MLIR IR snippets written in the GPU dialect
on a CUDA capable GPU. For this to work, a working CUDA install is required
and the build has to be configured with MLIR_CUDA_RUNNER_ENABLED set to 1.

PiperOrigin-RevId: 256551415
2019-07-04 07:53:54 -07:00
Stephan Herhut 630119f84f Add a pass that inserts getters for all cubins found via nvvm.cubin
annotations.

Getters are required as there are currently no global constants in MLIR and this
is an easy way to unblock CUDA execution while waiting for those.

PiperOrigin-RevId: 255169002
2019-06-26 05:33:11 -07:00
Stephan Herhut c72c6c3907 Make GPU to CUDA transformations independent of CUDA runtime.
The actual transformation from PTX source to a CUDA binary is now factored out,
enabling compiling and testing the transformations independently of a CUDA
runtime.

MLIR has still to be built with NVPTX target support for the conversions to be
built and tested.

PiperOrigin-RevId: 255167139
2019-06-26 05:16:37 -07:00
River Riddle 679a3b4191 Change the attribute dictionary syntax to separate name and value with '='.
The current syntax separates the name and value with ':', but ':' is already overloaded by several other things(e.g. trailing types). This makes the syntax difficult to parse in some situtations:

Old:
  "foo: 10 : i32"

New:
  "foo = 10 : i32"
PiperOrigin-RevId: 255097928
2019-06-25 19:06:34 -07:00
Alex Zinenko 2628641b23 GPUtoNVVM: adjust integer bitwidth when lowering special register ops
GPU dialect operations (launch and launch_func) use `index` type for thread and
block index values inside the kernel, for compatibility with affine loops.
NVVM dialect operations, following the NVVM intrinsics, use `!llvm.i32` type,
which does not necessarily have the same bit width as the lowered `index` type.
Optionally sign-extend (indices are signed) or truncate the result of the NVVM
dialect operation to the bit width of the lowered `index` type before passing
it to other operations.  This behavior is consistent with `std.index_cast`.  We
cannot use the latter since we are targeting LLVM dialect types directly,
rather than standard integer types.

PiperOrigin-RevId: 254980868
2019-06-25 09:21:26 -07:00
Stephan Herhut a14eeacf2c Add lowering pass from GPU dialect operations to LLVM/NVVM intrinsics.
PiperOrigin-RevId: 253551452
2019-06-19 23:03:30 -07:00
Alex Zinenko ee6f84aebd Convert a nest affine loops to a GPU kernel
This converts entire loops into threads/blocks.  No check on the size of the
block or grid, or on the validity of parallelization is performed, it is under
the responsibility of the caller to strip-mine the loops and to perform the
dependence analysis before calling the conversion.

PiperOrigin-RevId: 253189268
2019-06-19 23:02:02 -07:00