Commit Graph

33 Commits

Author SHA1 Message Date
Wen-Heng (Jack) Chung 2fd6403a6d [mlir][gpu] Introduce mlir-rocm-runner.
Summary:
`mlir-rocm-runner` is introduced in this commit to execute GPU modules on ROCm
platform. A small wrapper to encapsulate ROCm's HIP runtime API is also inside
the commit.

Due to behavior of ROCm, raw pointers inside memrefs passed to `gpu.launch`
must be modified on the host side to properly capture the pointer values
addressable on the GPU.

LLVM MC is used to assemble AMD GCN ISA coming out from
`ConvertGPUKernelToBlobPass` to binary form, and LLD is used to produce a shared
ELF object which could be loaded by ROCm HIP runtime.

gfx900 is the default target be used right now, although it could be altered via
an option in `mlir-rocm-runner`. Future revisions may consider using ROCm Agent
Enumerator to detect the right target on the system.

Notice AMDGPU Code Object V2 is used in this revision. Future enhancements may
upgrade to AMDGPU Code Object V3.

Bitcode libraries in ROCm-Device-Libs, which implements math routines exposed in
`rocdl` dialect are not yet linked, and is left as a TODO in the logic.

Reviewers: herhut

Subscribers: mgorny, tpr, dexonsmith, mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, frgossen, Kayjukh, jurahul, llvm-commits

Tags: #mlir, #llvm

Differential Revision: https://reviews.llvm.org/D80676
2020-06-05 09:46:39 -05:00
Nicolas Vasilache 882ba48474 [mlir][Linalg] Create a tool to generate named Linalg ops from a Tensor Comprehensions-like specification.
Summary:

This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the [RFC discussion](https://llvm.discourse.group/t/rfc-declarative-named-ops-in-the-linalg-dialect/745).

While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end-to-end from a mathematical-like specification.

Some implementation details and short-term decisions taken for the purpose of bootstrapping and that are not set in stone include:

    1. using a "[Tensor Comprehension](https://arxiv.org/abs/1802.04730)-inspired" syntax
    2. implicit and eager discovery of dims and symbols when parsing
    3. using EDSC ops to specify the computation (e.g. std_addf, std_mul_f, ...)

A followup revision will connect this tool to tablegen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.

For the following "Tensor Comprehension-inspired" string:

```
    def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
      C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
    }
```

With -gen-ods-decl=1, this emits (modulo formatting):

```
      def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
        NInputs<2>,
        NOutputs<1>,
        NamedStructuredOpTraits]> {
          let arguments = (ins Variadic<LinalgOperand>:$views);
          let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
          let extraClassDeclaration = [{
            llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
            llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
            void regionBuilder(ArrayRef<BlockArgument> args);
          }];
          let hasFolder = 1;
      }
```

With -gen-ods-impl, this emits (modulo formatting):

```
      llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
          return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
                                            getParallelIteratorTypeName(),
                                            getParallelIteratorTypeName(),
                                            getReductionIteratorTypeName() };
      }
      llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps()
      {
        MLIRContext *context = getContext();
        AffineExpr d0, d1, d2, d3;
        bindDims(context, d0, d1, d2, d3);
        return SmallVector<AffineMap, 8>{
            AffineMap::get(4, 0, {d0, d1, d3}),
            AffineMap::get(4, 0, {d3, d2}),
            AffineMap::get(4, 0, {d0, d1, d2}) };
      }
      void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
        using namespace edsc;
        using namespace intrinsics;
        ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);

        ValueHandle _4 = std_mulf(_0, _1);
        ValueHandle _5 = std_addf(_2, _4);
        (linalg_yield(ValueRange{ _5 }));
      }
```

Differential Revision: https://reviews.llvm.org/D77067
2020-04-10 13:59:25 -04:00
Nicolas Vasilache 4408e6a96a [mlir][test] NFC - Rename cblas to mlir_test_cblas
The "cblas" lib under mlir/test is meant as a simple integration demonstration.
However it is installed and ends up conflicting with external projects who want to
define the real cblas.
Rename to avoid conflicts.

Differential revision: https://reviews.llvm.org/D76615
2020-04-09 16:13:33 -04:00
Uday Bondhugula 7fca0e9797 [MLIR] Add simple runner utilities for timing
Add utilities print_flops, rtclock for timing / benchmarking. Add
mlir_runner_utils_dir test conf variable.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76912
2020-03-31 23:08:29 +05:30
Nicolas Vasilache 4a966e5dd7 [mlir] NFC - Split out RunnerUtils that don't require a C++ runtime
Summary:
This revision split out a new CRunnerUtils library that supports
MLIR execution on targets without a C++ runtime.

Differential Revision: https://reviews.llvm.org/D75257
2020-02-27 14:14:11 -05:00
Nicolas Vasilache 512f345a5d [mlir] Hotfix - Rename MLIRRuntimeUtils to mlir_runtime_utils 2020-02-27 12:58:41 -05:00
Nicolas Vasilache fcfd3a281c [mlir] NFC - Move runner utils from mlir-cpu-runner to ExecutionEngine
Runner utils are useful beyond just CPU and hiding them within the test directory
makes it unnecessarily harder to reuse in other projects.
2020-02-27 10:02:24 -05:00
Lei Zhang 8358ddbe5d [mlir][spirv] NFC: Move test passes to test/lib
Previously C++ test passes for SPIR-V were put under
test/Dialect/SPIRV. Move them to test/lib/Dialect/SPIRV
to create a better structure.

Also fixed one of the test pass to use new
PassRegistration mechanism.

Differential Revision: https://reviews.llvm.org/D75066
2020-02-24 14:17:02 -05:00
Denis Khalikov 896ee361a6 [mlir][spirv] Add mlir-vulkan-runner
Add an initial version of mlir-vulkan-runner execution driver.
A command line utility that executes a MLIR file on the Vulkan by
translating MLIR GPU module to SPIR-V and host part to LLVM IR before
JIT-compiling and executing the latter.

Differential Revision: https://reviews.llvm.org/D72696
2020-02-19 11:37:26 -05:00
Lei Zhang b30d87a90b [mlir][spirv] Add basic definitions for supporting availability
SPIR-V has a few mechanisms to control op availability: version,
extension, and capabilities. These mechanisms are considered as
different availability classes.

This commit introduces basic definitions for modelling SPIR-V
availability classes. Specifically, an `Availability` class is
added to SPIRVBase.td, along with two subclasses: MinVersion
and MaxVersion for versioning. SPV_Op is extended to take a
list of `Availability`. Each `Availability` instance carries
information for generating op interfaces for the corresponding
availability class and also the concrete availability
requirements.

With the availability spec on ops, we can now auto-generate the
op interfaces of all SPIR-V availability classes and also
synthesize the op's implementations of these interfaces. The
interface generation is done via new TableGen backends
-gen-avail-interface-{decls|defs}. The op's implementation is
done via -gen-spirv-avail-impls.

Differential Revision: https://reviews.llvm.org/D71930
2019-12-27 16:25:09 -05:00
Nicolas Vasilache 9059cf392d Automated rollback of commit d60133f89b
PiperOrigin-RevId: 282574110
2019-11-26 08:47:48 -08:00
Christian Sigg d60133f89b Changing directory shortcut for CPU/GPU runner utils.
Moving cuda-runtime-wrappers.so into subdirectory to match libmlir_runner_utils.so.
Provide parent directory when running test and load .so from subdirectory.

PiperOrigin-RevId: 282410749
2019-11-25 12:30:54 -08:00
River Riddle 6b4e30b7c8 Add Ch-7 of the toy tutorial detailing how to define new types.
This chapter adds a new composite type to Toy, and shows the process of adding a new type to the IR, adding and updating operations to use it, and constant folding operations producing it.

PiperOrigin-RevId: 279107885
2019-11-07 09:54:04 -08:00
River Riddle dae0ae6879 NFC: Delete the Linalg tutorial.
This part of the tutorial is now covered by a new flow in Toy. This also removes a point of confusion as there is also a proper Linalg dialect.

PiperOrigin-RevId: 275338933
2019-10-17 14:27:37 -07:00
River Riddle 0372eb413f Add Ch.6 of the Toy tutorial.
This chapters introduces the notion of a full conversion, and adds support for lowering down to the LLVM dialect, LLVM IR, and thus code generation.

PiperOrigin-RevId: 275337786
2019-10-17 14:22:13 -07:00
Nicolas Vasilache 3b4f133fb7 Start a minimal mlir_utils runtime library for testing debugging purposes
Now that MLIR has a standardized StridedMemRef descriptor, it becomes very easy to interact with external library functions and build utilities directly in C++.
This CL introduces basic printing support in a libmlir_utils.so.
Unit tests are rewritten using this feature and also to improve coverage.

For now, C mandates that we have a unique function for each MemRef element type and rank.
In a future a simple unranked descriptor can be introduced to only require uniqu'ing by element type.

PiperOrigin-RevId: 273304741
2019-10-07 09:06:55 -07:00
Nicolas Vasilache b628194013 Move Linalg and VectorOps dialects to the Dialect subdir - NFC
PiperOrigin-RevId: 264277760
2019-08-19 17:11:38 -07:00
Stephan Herhut e8b21a75f8 Add an mlir-cuda-runner tool.
This tool allows to execute MLIR IR snippets written in the GPU dialect
on a CUDA capable GPU. For this to work, a working CUDA install is required
and the build has to be configured with MLIR_CUDA_RUNNER_ENABLED set to 1.

PiperOrigin-RevId: 256551415
2019-07-04 07:53:54 -07:00
Nicolas Vasilache dac75ae5ff Split test-specific passes out of mlir-opt
Instead put their impl in test/lib and link them into mlir-test-opt

PiperOrigin-RevId: 254837439
2019-06-24 17:47:12 -07:00
Nicolas Vasilache a8a4d35d3f Add a lowering for Linalg matmul to LLVM
This CL adds a lowering to LLVM for MamulOp and a corresponding integration test.

View descriptor manipulation is moved from MLIR's LLVM dialect to C++ code compiled on the side. To this end a separation is introduced between `cblas.cpp` and `cblas_interface.cpp`, the latter operating on view types whose ABI correspond to the LLVM signature generated by MLIR.

An intermediary step is introduced that allocates a new descriptor on the MLIR side for the purpose of passing it to LLVM. The reason for this extra step is that the ABI for by-value ViewType objects wants aligned descriptors, e.g.:
```
extern "C" void linalg_dot_impl(ViewType<float, 1> X, ViewType<float, 1> Y,
                                BaseViewType<float> Z) {
   ...
}
```
produces LLVM IR with the signature:
```
%struct.ViewType = type { %struct.BaseViewType, [1 x i64], [1 x i64] }
%struct.BaseViewType = type { float*, i64 }

define void @linalg_dot_impl(%struct.ViewType* byval align 8, %struct.ViewType* byval align 8, float*, i64) tensorflow/mlir#0 {
...
}
```

We don't seem to be able to make such aligned  allocations in the MLIR -> LLVM converter atm.
Going through a level of indirection allows the test to pass.
The temporary tradeoff is that the MLIR shims have to be written by hand.
They will disappear in the future.

PiperOrigin-RevId: 252670672
2019-06-19 22:58:46 -07:00
River Riddle 5bfe37691c Add a new TestDialect directory in tests/. This directory defines a fake 'TestDialect' that allows for the use of FileCheck to test things that aren't currently used anywhere else in tree. As a first order, this should simplify the tests used for tablegen components revolving around operation constraints/patterns.
--

PiperOrigin-RevId: 249724328
2019-06-01 19:59:04 -07:00
Alex Zinenko 29c7929b13 Make EDSC builder test more robust to the order of evaluation
EDSC builder test uses FileCheck to match the IR produced by EDSC in the
    textual order.  For mathematical operations, EDSC relies on overloaded
    operators.  Since they are essentially function calls, the order of evaluation
    of their operands is unspecified and differs between compilers.  Do not rely on
    a specific order of operands and just check they are all emitted before the
    last operation.  Give names to matched SSA values in order to make sure the
    right operands are used in relevant places.

--

PiperOrigin-RevId: 249494995
2019-06-01 19:56:34 -07:00
Alex Zinenko 6804cf2429 Move SDBM infrastructure into a new SDBM dialect
We now have sufficient extensibility in dialects to move attribute components
    such as SDBM out of the core IR into a dedicated dialect and make them
    optional.  Introduce an SDBM dialect and move the code.  This is a mostly
    non-functional change.

--

PiperOrigin-RevId: 249244802
2019-06-01 19:54:33 -07:00
Alex Zinenko 3183394328 Enable EDSC API test running through lit
EDSC subsystem contains an API test which is a .cpp file calling the API in
    question and producing IR.  This IR is further checked using FileCheck and
    should plug into lit.  Provide a CMakeLists.txt to build the test and modify
    the lit configuration to process the source file.

--

PiperOrigin-RevId: 248794443
2019-05-20 13:46:09 -07:00
Nicolas Vasilache 6aa5cc8b06 Cleanup linalg integration test
This CL performs post-commit cleanups.
    It adds the ability to specify which shared libraries to load dynamically in ExecutionEngine. The linalg integration test is updated to use a shared library.
    Additional minor cleanups related to LLVM lowering of Linalg are also included.

--

PiperOrigin-RevId: 248346589
2019-05-20 13:43:13 -07:00
Nicolas Vasilache 5c64d2a6c4 Pipe Linalg to a cblas call via mlir-cpu-runner
This CL extends the execution engine to allow the additional resolution of symbols names
    that have been registered explicitly. This allows linking static library symbols that have not been explicitly exported with the -rdynamic linking flag (which is deemed too intrusive).

--

PiperOrigin-RevId: 247969504
2019-05-20 13:39:02 -07:00
Nicolas Vasilache 56c7a957bf Parsing support for Range, View and Slice operations
This CL implements the previously unsupported parsing for Range, View and Slice operations.
    A pass is introduced to lower to the LLVM.
    Tests are moved out of C++ land and into mlir/test/Examples.
    This allows better fitting within standard developer workflows.

--

PiperOrigin-RevId: 245796600
2019-05-06 08:20:55 -07:00
Mehdi Amini c39592b09c Toy tutorial Chapter 5: Lowering to Linalg and LLVM
--

PiperOrigin-RevId: 242606796
2019-04-08 23:26:54 -07:00
Mehdi Amini d33a9dcc73 Add Chapter 4 for the Toy tutorial: shape inference, function specialization, and basic combines
--

PiperOrigin-RevId: 242050514
2019-04-05 07:42:56 -07:00
Mehdi Amini 092f3facad Fix Toy Ch3 testing with CMake
Mainly a missing dependency caused the tests to pass if one already built
    the repo, but not from a clean (or incremental) build.

--

PiperOrigin-RevId: 241852313
2019-04-03 19:22:42 -07:00
Mehdi Amini 213dda687b Chapter 2 of the Toy tutorial
This introduces a basic MLIRGen through straight AST traversal,
    without dialect registration at this point.

--

PiperOrigin-RevId: 241588354
2019-04-02 13:41:00 -07:00
Mehdi Amini 38b71d6b84 Initial version for chapter 1 of the Toy tutorial
--

PiperOrigin-RevId: 241549247
2019-04-02 13:40:06 -07:00
Jacques Pienaar 1273af232c Add build files and update README.
* Add initial version of build files;
    * Update README with instructions to download and build MLIR from github;

--

PiperOrigin-RevId: 241102092
2019-03-30 11:23:22 -07:00