[mlir][Linalg] Create a tool to generate named Linalg ops from a Tensor Comprehensions-like specification.

Summary:

This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the [RFC discussion](https://llvm.discourse.group/t/rfc-declarative-named-ops-in-the-linalg-dialect/745).

While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end to end, starting from a math-like specification.

Some implementation details and short-term decisions, taken for bootstrapping purposes and not set in stone, include:

    1. using a "[Tensor Comprehension](https://arxiv.org/abs/1802.04730)-inspired" syntax
    2. implicit and eager discovery of dims and symbols when parsing
    3. using EDSC ops to specify the computation (e.g. std_addf, std_mulf, ...)

A follow-up revision will connect this tool to TableGen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.

For the following "Tensor Comprehension-inspired" string:

```
    def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
      C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
    }
```

With -gen-ods-decl=1, this emits (modulo formatting):

```
      def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
        NInputs<2>,
        NOutputs<1>,
        NamedStructuredOpTraits]> {
          let arguments = (ins Variadic<LinalgOperand>:$views);
          let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
          let extraClassDeclaration = [{
            llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
            llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
            void regionBuilder(ArrayRef<BlockArgument> args);
          }];
          let hasFolder = 1;
      }
```

With -gen-impl=1, this emits (modulo formatting):

```
      llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
          return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
                                            getParallelIteratorTypeName(),
                                            getParallelIteratorTypeName(),
                                            getReductionIteratorTypeName() };
      }
      llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps()
      {
        MLIRContext *context = getContext();
        AffineExpr d0, d1, d2, d3;
        bindDims(context, d0, d1, d2, d3);
        return SmallVector<AffineMap, 8>{
            AffineMap::get(4, 0, {d0, d1, d3}),
            AffineMap::get(4, 0, {d3, d2}),
            AffineMap::get(4, 0, {d0, d1, d2}) };
      }
      void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
        using namespace edsc;
        using namespace intrinsics;
        ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);

        ValueHandle _4 = std_mulf(_0, _1);
        ValueHandle _5 = std_addf(_2, _4);
        (linalg_yield(ValueRange{ _5 }));
      }
```

Differential Revision: https://reviews.llvm.org/D77067
Nicolas Vasilache 2020-04-10 13:54:08 -04:00
parent a04ab2ec08
commit 882ba48474
10 changed files with 1851 additions and 4 deletions


@@ -451,6 +451,93 @@ from a description in terms of only the generic op interface.
This is the main reason there are only a small number of ops today: we expect
them to be auto-generated from Tablegen soon.
### Named Payload Ops Specification
Linalg provides a declarative specification and a generation tool
(`mlir-linalg-ods-gen`) to automatically produce named ops from a notation that
is inspired by Einstein notation.
The syntax and semantics used in `mlir-linalg-ods-gen` are very much in flight
and borrow from Tensor Comprehensions (TC) but differ in a few dimensions, to
better adapt to Linalg:
1. The input and output tensor parameters are specified as `id :
type(symbolic-affine-expression-list)` (e.g. `A : f32(M, N + M)`) and each
new symbol is discovered eagerly. TC on the other hand does not allow
general symbolic affine expressions.
1. The output shapes are specified explicitly, whereas in TC they are always derived
from the input shapes.
1. The operations used to specify computations use EDSC intrinsics so that they
can easily be parsed and emitted into a simple region builder without
resorting to more general MLIR parsing.
1. Reduction dimensions are specified with angle bracket notation on the
operation they apply to (e.g. `std_add<k>` specifies that `k` is a reduction
dimension). In TC, a reduction is specified with the `op=` operator and the
reduction dimensions are inferred.
1. The parallel and reduction dimensions are ordered by textual program
order. For instance, in the comprehension `O(i, j) = std_add<k, l>(...)`,
`i` (resp. `j`) is a parallel iterator encoded by an affine dimension of
position `0` (resp. `1`); `k` (resp. `l`) is a reduction iterator encoded by
an affine dimension of position `2` (resp. `3`), as sketched below.
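For instance, for the simpler comprehension `C(m) = std_addf<k>(std_mulf(A(m, k), B(k)))`
(a matvec), textual order binds `m` to dimension `0` (parallel) and `k` to dimension
`1` (reduction). A minimal C++ sketch of the reference indexing maps this ordering
produces, mirroring the code the tool emits (the `maps` variable is illustrative and
`getContext()` is assumed to be available, as it is inside a generated op member):
```
// Dimensions in textual order: m -> d0 (parallel), k -> d1 (reduction).
MLIRContext *context = getContext();
AffineExpr d0, d1;
bindDims(context, d0, d1);
// Indexing maps for A(m, k), B(k) and C(m), in that order.
SmallVector<AffineMap, 8> maps{ AffineMap::get(2, 0, {d0, d1}),
                                AffineMap::get(2, 0, {d1}),
                                AffineMap::get(2, 0, {d0}) };
```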
These decisions and syntax are subject to evolution and change. In particular,
op-specific attributes, dynamic ranks, some form of templating, shape
calculation function specification, etc. may be added in the future.
At this time, the following restrictions are imposed on the syntax and
semantics:
1. Each def may only contain a single comprehension but each comprehension may
perform multiple updates.
2. Each tensor may only be used with a single indexing expression.
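As a purely hypothetical illustration of the second restriction, a spec such as the
following (the `transpose_add` name is made up for this example) is not accepted
because `A` appears with two different indexing expressions:
```
// Hypothetical, invalid spec: A is indexed both as A(i, j) and A(j, i),
// which violates the single-indexing-expression restriction above.
def transpose_add(A: f32(N, N)) -> (C: f32(N, N)) {
  C(i, j) = std_addf(A(i, j), A(j, i));
}
```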
The following specification may be used to define a named `batchmatmul` op:
```
def batchmatmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}
```
When `mlir-linalg-ods-gen -gen-ods-decl=1` is called, the following ODS is
produced:
```
def batchmatmulOp : LinalgNamedStructured_Op<"batchmatmul", [
NInputs<2>,
NOutputs<1>,
NamedStructuredOpTraits]> { ... }
```
When `mlir-linalg-ods-gen -gen-impl=1` is called, the following C++ is produced:
```
llvm::Optional<SmallVector<StringRef, 8>> batchmatmul::referenceIterators() {
return SmallVector<StringRef, 8>{
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getReductionIteratorTypeName() };
}
llvm::Optional<SmallVector<AffineMap, 8>> batchmatmul::referenceIndexingMaps() {
MLIRContext *context = getContext();
AffineExpr d0, d1, d2, d3;
bindDims(context, d0, d1, d2, d3);
return SmallVector<AffineMap, 8>{
AffineMap::get(4, 0, {d0, d1, d3}),
AffineMap::get(4, 0, {d3, d2}),
AffineMap::get(4, 0, {d0, d1, d2}) };
}
void batchmatmul::regionBuilder(ArrayRef<BlockArgument> args) {
using namespace edsc;
using namespace intrinsics;
ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);
ValueHandle _4 = std_mulf(_0, _1);
ValueHandle _5 = std_addf(_2, _4);
(linalg_yield(ValueRange{ _5 }));
}
```
## Open Issues and Design Alternatives<a name="open_issues"></a>
Multiple open issues and design alternatives are in flight and it is time to
lay them out for the community to discuss and pick apart:


@@ -256,7 +256,7 @@ def MatmulOp : LinalgStructured_Op<"matmul", [NInputs<2>, NOutputs<1>]> {
/// OptionalAttr<I64ArrayAttr>:$strides
/// OptionalAttr<I64ArrayAttr>:$dilations
/// OptionalAttr<I64ElementsAttr>:$padding
/// `strides` denotes the step of each window along the dimension.
/// `stirdes` denotes the step of each window along the dimension.
class PoolingBase_Op<string mnemonic, list<OpTrait> props>
: LinalgStructured_Op<mnemonic, props> {
let description = [{
@@ -821,4 +821,18 @@ def IndexedGenericOp : GenericOpBase<"indexed_generic"> {
let hasFolder = 1;
}
//===----------------------------------------------------------------------===//
// Named Linalg ops, implemented as declarative configurations of generic ops.
//===----------------------------------------------------------------------===//
def NamedStructuredOpTraits : NativeOpTrait<"linalg::NamedStructuredOpTraits">;
class LinalgNamedStructured_Op<string mnemonic, list<OpTrait> props>
: Op<Linalg_Dialect, mnemonic,
!listconcat(props, [StructuredOpTraits, LinalgStructuredInterface])> {
string spec = ?;
let assemblyFormat = "`(` operands `)` attr-dict `:` "
"functional-type(operands, results)";
}
#endif // LINALG_STRUCTURED_OPS


@@ -219,7 +219,7 @@ AffineExpr getAffineExprFromFlatForm(ArrayRef<int64_t> flatExprs,
ArrayRef<AffineExpr> localExprs,
MLIRContext *context);
raw_ostream &operator<<(raw_ostream &os, AffineExpr &expr);
raw_ostream &operator<<(raw_ostream &os, AffineExpr expr);
template <typename U> bool AffineExpr::isa() const {
if (std::is_same<U, AffineBinaryOpExpr>::value)


@@ -613,7 +613,7 @@ AffineExpr AffineExpr::compose(AffineMap map) const {
map.getResults().end());
return replaceDimsAndSymbols(dimReplacements, {});
}
raw_ostream &mlir::operator<<(raw_ostream &os, AffineExpr &expr) {
raw_ostream &mlir::operator<<(raw_ostream &os, AffineExpr expr) {
expr.print(os);
return os;
}


@@ -35,6 +35,7 @@ set(MLIR_TEST_DEPENDS
MLIRUnitTests
mlir-cpu-runner
mlir-edsc-builder-api-test
mlir-linalg-ods-gen
mlir-opt
mlir-sdbm-api-test
mlir-tblgen


@@ -21,7 +21,7 @@ config.name = 'MLIR'
config.test_format = lit.formats.ShTest(not llvm_config.use_lit_shell)
# suffixes: A list of file extensions to treat as test files.
config.suffixes = ['.td', '.mlir', '.toy', '.ll']
config.suffixes = ['.td', '.mlir', '.toy', '.ll', '.tc']
# test_source_root: The root path where tests are located.
config.test_source_root = os.path.dirname(__file__)


@@ -0,0 +1,75 @@
// RUN: mlir-linalg-ods-gen %s -gen-ods-decl=1 | FileCheck %s --check-prefix=ODS
// RUN: mlir-linalg-ods-gen %s -gen-impl=1 | FileCheck %s --check-prefix=IMPL
// RUN: mlir-linalg-ods-gen %s -gen-ods-decl=1 -test-emit-include-td-header \
// RUN: | mlir-tblgen -gen-op-decls -I %S/../../include
// ODS-LABEL: def matvecOp : LinalgNamedStructured_Op<"matvec", [
// ODS-NEXT: NInputs<2>,
// ODS-NEXT: NOutputs<1>,
// ODS-NEXT: NamedStructuredOpTraits]>
//
// IMPL-LABEL: matvec::referenceIterators() {
// IMPL-NEXT: { {{.*}}Parallel{{.*}}, {{.*}}Reduction{{.*}} }
//
// IMPL: matvec::referenceIndexingMaps() {
// IMPL: AffineMap::get(2, 0, {d0, d1}),
// IMPL-NEXT: AffineMap::get(2, 0, {d1}),
// IMPL-NEXT: AffineMap::get(2, 0, {d0}) };
//
// IMPL: matvec::regionBuilder(ArrayRef<BlockArgument> args) {
// IMPL: ValueHandle [[a:.*]](args[0]), [[b:.*]](args[1]), [[c:.*]](args[2]);
// IMPL: ValueHandle [[d:.*]] = std_mulf([[a]], [[b]]);
// IMPL: ValueHandle [[e:.*]] = std_addf([[c]], [[d]]);
// IMPL: (linalg_yield(ValueRange{ [[e]] }));
//
def matvec(A: f32(M, K), B: f32(K)) -> (C: f32(M)) {
C(m) = std_addf<k>(std_mulf(A(m, k), B(k)));
}
// ODS-LABEL: def matmulOp : LinalgNamedStructured_Op<"matmul", [
// ODS-NEXT: NInputs<2>,
// ODS-NEXT: NOutputs<1>,
// ODS-NEXT: NamedStructuredOpTraits]>
//
// IMPL-LABEL: matmul::referenceIterators() {
// IMPL-NEXT: { {{.*}}Parallel{{.*}}, {{.*}}Parallel{{.*}}, {{.*}}Reduction{{.*}} }
//
// IMPL: matmul::referenceIndexingMaps() {
// IMPL: AffineMap::get(3, 0, {d0, d2}),
// IMPL-NEXT: AffineMap::get(3, 0, {d2, d1}),
// IMPL-NEXT: AffineMap::get(3, 0, {d0, d1}) };
//
// IMPL: matmul::regionBuilder(ArrayRef<BlockArgument> args) {
// IMPL: ValueHandle [[a:.*]](args[0]), [[b:.*]](args[1]), [[c:.*]](args[2]);
// IMPL: ValueHandle [[d:.*]] = std_mulf([[a]], [[b]]);
// IMPL: ValueHandle [[e:.*]] = std_addf([[c]], [[d]]);
// IMPL: (linalg_yield(ValueRange{ [[e]] }));
//
def matmul(A: f32(M, K), B: f32(K, N)) -> (C: f32(M, N)) {
C(m, n) = std_addf<k>(std_mulf(A(m, k), B(k, n)));
}
// ODS-LABEL: def batchmatmulOp : LinalgNamedStructured_Op<"batchmatmul", [
// ODS-NEXT: NInputs<2>,
// ODS-NEXT: NOutputs<1>,
// ODS-NEXT: NamedStructuredOpTraits]>
//
// IMPL-LABEL: batchmatmul::referenceIterators() {
// IMPL-NEXT: { {{.*}}Parallel{{.*}}, {{.*}}Parallel{{.*}}, {{.*}}Reduction{{.*}} }
//
// IMPL: batchmatmul::referenceIndexingMaps() {
// IMPL: AffineMap::get(4, 0, {d0, d1, d3}),
// IMPL-NEXT: AffineMap::get(4, 0, {d3, d2}),
// IMPL-NEXT: AffineMap::get(4, 0, {d0, d1, d2}) };
//
// IMPL: batchmatmul::regionBuilder(ArrayRef<BlockArgument> args) {
// IMPL: ValueHandle [[a:.*]](args[0]), [[b:.*]](args[1]), [[c:.*]](args[2]);
// IMPL: ValueHandle [[d:.*]] = std_mulf([[a]], [[b]]);
// IMPL: ValueHandle [[e:.*]] = std_addf([[c]], [[d]]);
// IMPL: (linalg_yield(ValueRange{ [[e]] }));
//
// TBLGEN: batchmatmulOp
def batchmatmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}


@@ -1,5 +1,6 @@
add_subdirectory(mlir-cuda-runner)
add_subdirectory(mlir-cpu-runner)
add_subdirectory(mlir-linalg-ods-gen)
add_subdirectory(mlir-opt)
add_subdirectory(mlir-translate)
add_subdirectory(mlir-vulkan-runner)


@@ -0,0 +1,10 @@
add_llvm_tool(mlir-linalg-ods-gen
mlir-linalg-ods-gen.cpp
)
llvm_update_compile_flags(mlir-linalg-ods-gen)
target_link_libraries(mlir-linalg-ods-gen PRIVATE
MLIRParser
MLIRSupport
LLVMCore
LLVMSupport
)

File diff suppressed because it is too large.