Start GPU Dialect

Define a new dialect related to GPU kernels.  Currently, it only contains a
    single operation for launching a kernel on a three-dimensional grid of thread
    blocks, following a model similar to that of CUDA.  In particular, the body of
    the kernel contains operations executed by each thread and uses region
    arguments to accept thread and block identifiers (similar to how the loop body
    region accepts the induction value).

--

PiperOrigin-RevId: 245713728
This commit is contained in:
Alex Zinenko 2019-04-29 03:00:25 -07:00 committed by Mehdi Amini
parent 4c74f1bf38
commit aae8a7446e
11 changed files with 492 additions and 0 deletions

View File

@ -0,0 +1,69 @@
# GPU Dialect
Note: this dialect is more likely to change than others in the near future; use
with caution.
This dialect provides middle-level abstractions for launching GPU kernels
following a programming model similar to that of CUDA or OpenCL. It provides
abstractions for kernel invocations (and may eventually provide those for device
management) that are not present at the lower level (e.g., as LLVM IR intrinsics
for GPUs). Its goal is to abstract away device- and driver-specific
manipulations to launch a GPU kernel and provide a simple path towards GPU
execution from MLIR. It may be targeted, for example, by DSLs using MLIR. The
dialect uses `gpu` as its canonical prefix.
## Operations
### `gpu.launch`
Launch a kernel on the specified grid of thread blocks. The body of the kernel
is defined by the single region that this operation contains. The operation
takes at least six operands, with first three operands being grid sizes along
x,y,z dimensions, the following three arguments being block sizes along x,y,z
dimension, and the remaining operands are arguments of the kernel. When a
lower-dimensional kernel is required, unused sizes must be explicitly set to
`1`.
The body region has at least _twelve_ arguments, grouped as follows:
- three arguments that contain block identifiers along x,y,z dimensions;
- three arguments that contain thread identifiers along x,y,z dimensions;
- operands of the `gpu.launch` operation as is, including six leading operands
for grid and block sizes.
Operations inside the body region, and any operations in the nested regions, are
_not_ allowed to use values defined outside the _body_ region, as if this region
was a function. If necessary, values must be passed as kernel arguments into the
body region. Nested regions inside the kernel body are allowed to use values
defined in their ancestor regions as long as they don't cross the kernel body
region boundary.
Custom syntax for this operation is currently not available.
Example:
```mlir {.mlir}
// Generic syntax explains how the pretty syntax maps to the IR structure.
"gpu.launch"(%cst, %cst, %c1, // Grid sizes.
%cst, %c1, %c1, // Block sizes.
%arg0, %arg1) // Actual arguments.
{/*attributes*/}
// All sizes and identifiers have "index" size.
: (index, index, index, index, index, index, f32, memref<?xf32, 1>) -> () {
// The operation passes block and thread identifiers, followed by grid and block
// sizes, followed by actual arguments to the entry block of the region.
^bb0(%bx : index, %by : index, %bz : index,
%tx : index, %ty : index, %tz : index,
%num_bx : index, %num_by : index, %num_bz : index,
%num_tx : index, %num_ty : index, %num_tz : index,
%arg0 : f32, %arg1 : memref<?xf32, 1>):
"some_op"(%bx, %tx) : (index, index) -> ()
%3 = "std.load"(%arg1, %bx) : (memref<?xf32, 1>, index) -> f32
}
```
Rationale: using operation/block arguments gives analyses a clear way of
understanding that a value has additional semantics (e.g., we will need to know
what value corresponds to threadIdx.x for coalescing). We can recover these
properties by analyzing the operations producing values, but it is easier just
to have that information by construction.

View File

@ -0,0 +1,72 @@
//===- GPUDialect.h - MLIR Dialect for GPU Kernels --------------*- C++ -*-===//
//
// Copyright 2019 The MLIR Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// =============================================================================
//
// This file defines the GPU kernel-related operations and puts them in the
// corresponding dialect.
//
//===----------------------------------------------------------------------===//
#ifndef MLIR_GPU_GPUDIALECT_H
#define MLIR_GPU_GPUDIALECT_H
#include "mlir/IR/Dialect.h"
#include "mlir/IR/OpDefinition.h"
namespace mlir {
/// The dialect containing GPU kernel launching operations and related
/// facilities.
class GPUDialect : public Dialect {
public:
/// Create the dialect in the given `context`.
GPUDialect(MLIRContext *context);
/// Get the canonical string name of the dialect.
static StringRef getDialectName();
};
struct KernelDim3 {
Value *x;
Value *y;
Value *z;
};
class LaunchOp : public Op<LaunchOp, OpTrait::AtLeastNOperands<6>::Impl,
OpTrait::ZeroResult,
OpTrait::NthRegionIsIsolatedAbove<0>::Impl> {
public:
using Op::Op;
static void build(Builder *builder, OperationState *result, Value *gridSizeX,
Value *gridSizeY, Value *gridSizeZ, Value *blockSizeX,
Value *blockSizeY, Value *blockSizeZ,
ArrayRef<Value *> operands);
Region &getBody();
KernelDim3 getBlockIds();
KernelDim3 getThreadIds();
KernelDim3 getGridSize();
KernelDim3 getBlockSize();
LogicalResult verify();
static StringRef getOperationName() { return "gpu.launch"; }
};
} // end namespace mlir
#endif // MLIR_GPUKERNEL_GPUDIALECT_H

View File

@ -395,6 +395,13 @@ public:
blocks.splice(blocks.end(), other.getBlocks());
}
/// Check that this does not use any value defined outside it.
/// Emit errors if `noteEmitter` is provided; this callback is used to point
/// to the operation containing the region, the actual error is reported at
/// the operation with an offending use.
bool
isIsolatedAbove(llvm::function_ref<void(const Twine &)> noteEmitter = {});
private:
RegionType blocks;

View File

@ -697,6 +697,22 @@ public:
}
};
/// This verifiers that all operands used in N-th region of the given operation
/// are defined within that region.
template <unsigned RegionIdx> class NthRegionIsIsolatedAbove {
public:
template <typename ConcreteType>
class Impl : public TraitBase<ConcreteType,
NthRegionIsIsolatedAbove<RegionIdx>::Impl> {
public:
static LogicalResult verifyTrait(Operation *op) {
auto noteEmitter = [op](const Twine &message) { op->emitNote(message); };
return op->getRegion(RegionIdx).isIsolatedAbove(noteEmitter) ? success()
: failure();
}
};
};
} // end namespace OpTrait
//===----------------------------------------------------------------------===//

View File

@ -4,6 +4,7 @@ add_subdirectory(Dialect)
add_subdirectory(EDSC)
add_subdirectory(ExecutionEngine)
add_subdirectory(FxpMathOps)
add_subdirectory(GPU)
add_subdirectory(IR)
add_subdirectory(LLVMIR)
add_subdirectory(Linalg)

View File

@ -0,0 +1,9 @@
add_llvm_library(MLIRGPU
IR/GPUDialect.cpp
IR/DialectRegistration.cpp
ADDITIONAL_HEADER_DIRS
${MLIR_MAIN_INCLUDE_DIR}/mlir/GPU
)
add_dependencies(MLIRGPU MLIRIR LLVMSupport)
target_link_libraries(MLIRGPU MLIRIR LLVMSupport)

View File

@ -0,0 +1,21 @@
//===- DialectRegistration.cpp - MLIR GPU dialect registration ------------===//
//
// Copyright 2019 The MLIR Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// =============================================================================
#include "mlir/GPU/GPUDialect.h"
// Static initialization for GPU dialect registration.
static mlir::DialectRegistration<mlir::GPUDialect> kernelDialect;

View File

@ -0,0 +1,99 @@
//===- GPUDialect.cpp - MLIR Dialect for GPU Kernels implementation -------===//
//
// Copyright 2019 The MLIR Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// =============================================================================
//
// This file implements the GPU kernel-related dialect and its operations.
//
//===----------------------------------------------------------------------===//
#include "mlir/GPU/GPUDialect.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/StandardTypes.h"
using namespace mlir;
StringRef GPUDialect::getDialectName() { return "gpu"; }
GPUDialect::GPUDialect(MLIRContext *context)
: Dialect(getDialectName(), context) {
addOperations<LaunchOp>();
}
//===----------------------------------------------------------------------===//
// LaunchOp
//===----------------------------------------------------------------------===//
static SmallVector<Type, 4> getValueTypes(ArrayRef<Value *> values) {
SmallVector<Type, 4> types;
types.reserve(values.size());
for (Value *v : values)
types.push_back(v->getType());
return types;
}
void LaunchOp::build(Builder *builder, OperationState *result, Value *gridSizeX,
Value *gridSizeY, Value *gridSizeZ, Value *blockSizeX,
Value *blockSizeY, Value *blockSizeZ,
ArrayRef<Value *> operands) {
// Add grid and block sizes as op operands, followed by the data operands.
result->addOperands(
{gridSizeX, gridSizeY, gridSizeZ, blockSizeX, blockSizeY, blockSizeZ});
result->addOperands(operands);
// Create a kernel body region with 12 + N arguments, where the first 12
// arguments have `index` type and the rest have the same types as the data
// operands.
Region *kernelRegion = result->addRegion();
Block *body = new Block();
body->addArguments(std::vector<Type>(12, builder->getIndexType()));
body->addArguments(getValueTypes(operands));
kernelRegion->push_back(body);
}
Region &LaunchOp::getBody() { return getOperation()->getRegion(0); }
KernelDim3 LaunchOp::getBlockIds() {
auto args = getBody().getBlocks().front().getArguments();
return KernelDim3{args[0], args[1], args[2]};
}
KernelDim3 LaunchOp::getThreadIds() {
auto args = getBody().getBlocks().front().getArguments();
return KernelDim3{args[3], args[4], args[5]};
}
KernelDim3 LaunchOp::getGridSize() {
auto args = getBody().getBlocks().front().getArguments();
return KernelDim3{args[6], args[7], args[8]};
}
KernelDim3 LaunchOp::getBlockSize() {
auto args = getBody().getBlocks().front().getArguments();
return KernelDim3{args[9], args[10], args[11]};
}
LogicalResult LaunchOp::verify() {
// Kernel launch takes 6 leading operands for grid/block sizes and transforms
// them into 12 region arguments for block/thread identifiers and grid/block
// sizes.
if (!getBody().empty()) {
Block &entryBlock = getBody().front();
if (entryBlock.getNumArguments() != 6 + getNumOperands())
return emitError("unexpected number of region arguments");
}
return success();
}

View File

@ -353,6 +353,55 @@ void Region::cloneInto(Region *dest, BlockAndValueMapping &mapper,
it->walk(remapOperands);
}
// Check that the given `region` does not use any value defined outside its
// ancestor region `limit`. That is, given `A{B{C{}}}` with limit `B`, `C` is
// allowed to use values defined in `B` but not those defined in `A`.
// Emit errors if `emitOpNote` is provided; this callback is used to point to
// the operation containing the region, the actual error is reported at the
// operation with an offending use.
static bool
isRegionIsolatedAbove(Region &region, Region &limit,
llvm::function_ref<void(const Twine &)> emitOpNote = {}) {
assert(limit.isAncestor(&region) &&
"expected isolation limit to be an ancestor of the given region");
// List of regions to analyze. Each region is processed independently, with
// respect to the common `limit` region, so we can look at them in any order.
// Therefore, use a simple vector and push/pop back the current region.
SmallVector<Region *, 8> pendingRegions;
pendingRegions.push_back(&region);
// Traverse all operations in the region.
while (!pendingRegions.empty()) {
for (Block &block : *pendingRegions.pop_back_val()) {
for (Operation &op : block) {
for (Value *operand : op.getOperands()) {
// Check that any value that is used by an operation is defined in the
// same region as either an operation result or a block argument.
if (operand->getContainingRegion()->isProperAncestor(&limit)) {
if (emitOpNote) {
op.emitOpError("using value defined outside the region");
emitOpNote("required by region isolation constraints");
}
return false;
}
}
// Schedule any regions the operations contain for further checking.
pendingRegions.reserve(pendingRegions.size() + op.getNumRegions());
for (Region &subRegion : op.getRegions())
pendingRegions.push_back(&subRegion);
}
}
}
return true;
}
bool Region::isIsolatedAbove(
llvm::function_ref<void(const Twine &)> noteEmitter) {
return isRegionIsolatedAbove(*this, *this, noteEmitter);
}
Region *llvm::ilist_traits<::mlir::Block>::getContainingRegion() {
size_t Offset(
size_t(&((Region *)nullptr->*Region::getSublistAccess(nullptr))));

View File

@ -0,0 +1,78 @@
// RUN: mlir-opt -split-input-file -verify %s
func @not_enough_sizes(%sz : index) {
// expected-error@+1 {{expected 6 or more operands}}
"gpu.launch"(%sz, %sz, %sz, %sz, %sz)
: (index, index, index, index, index) -> () {
return
}
return
}
// -----
func @no_region_attrs(%sz : index) {
// expected-error@+1 {{unexpected number of region arguments}}
"gpu.launch"(%sz, %sz, %sz, %sz, %sz, %sz)
: (index, index, index, index, index, index) -> () {
^bb1(%bx: index, %by: index, %bz: index,
%tx: index, %ty: index, %tz: index):
return
}
return
}
// -----
func @isolation_arg(%sz : index) {
// expected-note@+1 {{required by region isolation constraints}}
"gpu.launch"(%sz, %sz, %sz, %sz, %sz, %sz)
: (index, index, index, index, index, index) -> () {
^bb1(%bx: index, %by: index, %bz: index,
%tx: index, %ty: index, %tz: index,
%szbx: index, %szby: index, %szbz: index,
%sztx: index, %szty: index, %sztz: index):
// expected-error@+1 {{using value defined outside the region}}
"use"(%sz) : (index) -> ()
return
}
return
}
// -----
func @isolation_op(%sz : index) {
%val = "produce"() : () -> (index)
// expected-note@+1 {{required by region isolation constraints}}
"gpu.launch"(%sz, %sz, %sz, %sz, %sz, %sz)
: (index, index, index, index, index, index) -> () {
^bb1(%bx: index, %by: index, %bz: index,
%tx: index, %ty: index, %tz: index,
%szbx: index, %szby: index, %szbz: index,
%sztx: index, %szty: index, %sztz: index):
// expected-error@+1 {{using value defined outside the region}}
"use"(%val) : (index) -> ()
return
}
return
}
// -----
func @nested_isolation(%sz : index) {
// expected-note@+1 {{required by region isolation constraints}}
"gpu.launch"(%sz, %sz, %sz, %sz, %sz, %sz)
: (index, index, index, index, index, index) -> () {
^bb1(%bx: index, %by: index, %bz: index,
%tx: index, %ty: index, %tz: index,
%szbx: index, %szby: index, %szbz: index,
%sztx: index, %szty: index, %sztz: index):
"region"() : () -> () {
"region"() : () -> () {
// expected-error@+1 {{using value defined outside the region}}
"use"(%sz) : (index) -> ()
}
}
}
return
}

71
mlir/test/GPU/ops.mlir Normal file
View File

@ -0,0 +1,71 @@
// RUN: mlir-opt %s | FileCheck %s
// CHECK-LABEL:func @no_args(%arg0: index)
func @no_args(%sz : index) {
// CHECK: "gpu.launch"(%arg0, %arg0, %arg0, %arg0, %arg0, %arg0) : (index, index, index, index, index, index) -> () {
"gpu.launch"(%sz, %sz, %sz, %sz, %sz, %sz)
: (index, index, index, index, index, index) -> () {
^bb1(%bx: index, %by: index, %bz: index,
%tx: index, %ty: index, %tz: index,
%szbx: index, %szby: index, %szbz: index,
%sztx: index, %szty: index, %sztz: index):
return
}
return
}
// CHECK-LABEL:func @args(%arg0: index, %arg1: index, %arg2: f32, %arg3: memref<?xf32, 1>) {
func @args(%blk : index, %thrd : index, %float : f32, %data : memref<?xf32,1>) {
// CHECK: "gpu.launch"(%arg0, %arg0, %arg0, %arg1, %arg1, %arg1, %arg2, %arg3) : (index, index, index, index, index, index, f32, memref<?xf32, 1>) -> () {
"gpu.launch"(%blk, %blk, %blk, %thrd, %thrd, %thrd, %float, %data)
: (index, index, index, index, index, index, f32, memref<?xf32,1>) -> () {
^bb1(%bx: index, %by: index, %bz: index,
%tx: index, %ty: index, %tz: index,
%szbx: index, %szby: index, %szbz: index,
%sztx: index, %szty: index, %sztz: index,
%data0: f32, %data1: memref<?xf32,1>):
return
}
return
}
// It is possible to use values passed into the region as arguments.
// CHECK-LABEL: func @passing_values
func @passing_values(%blk : index, %thrd : index, %float : f32, %data : memref<?xf32,1>) {
// CHECK: "gpu.launch"(%arg0, %arg0, %arg0, %arg1, %arg1, %arg1, %arg2, %arg3) : (index, index, index, index, index, index, f32, memref<?xf32, 1>) -> () {
"gpu.launch"(%blk, %blk, %blk, %thrd, %thrd, %thrd, %float, %data)
: (index, index, index, index, index, index, f32, memref<?xf32,1>) -> () {
// CHECK: ^bb1(%i0: index, %i1: index, %i2: index, %i3: index, %i4: index, %i5: index, %i6: index, %i7: index, %i8: index, %i9: index, %i10: index, %i11: index, %i12: f32, %i13: memref<?xf32, 1>)
^bb1(%bx: index, %by: index, %bz: index,
%tx: index, %ty: index, %tz: index,
%szbx: index, %szby: index, %szbz: index,
%sztx: index, %szty: index, %sztz: index,
%data0: f32, %data1: memref<?xf32,1>):
// CHECK: "use"(%i12)
"use"(%data0): (f32) -> ()
return
}
return
}
// It is possible to use values defined in nested regions as long as they don't
// cross kernel launch region boundaries.
// CHECK-LABEL: func @nested_isolation
func @nested_isolation(%sz : index) {
"gpu.launch"(%sz, %sz, %sz, %sz, %sz, %sz)
: (index, index, index, index, index, index) -> () {
^bb1(%bx: index, %by: index, %bz: index,
%tx: index, %ty: index, %tz: index,
%szbx: index, %szby: index, %szbz: index,
%sztx: index, %szty: index, %sztz: index):
"region"() : () -> () {
// CHECK: %0 = "produce"()
%val = "produce"() : () -> (index)
"region"() : () -> () {
// CHECK: "use"(%0)
"use"(%val) : (index) -> ()
}
}
}
return
}