[mlir][Linalg] Create a tool to generate named Linalg ops from a Tensor Comprehensions-like specification.

Summary:

This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the [RFC discussion](https://llvm.discourse.group/t/rfc-declarative-named-ops-in-the-linalg-dialect/745).

While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end to end, starting from a math-like specification.

Some implementation details and short-term decisions, taken for bootstrapping purposes and not set in stone, include:

    1. using a "[Tensor Comprehension](https://arxiv.org/abs/1802.04730)-inspired" syntax
    2. implicit and eager discovery of dims and symbols when parsing
    3. using EDSC ops to specify the computation (e.g. std_addf, std_mulf, ...)

A follow-up revision will connect this tool to TableGen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.

For the following "Tensor Comprehension-inspired" string:

```
    def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
      C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
    }
```

With -gen-ods-decl=1, this emits (modulo formatting):

```
      def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
        NInputs<2>,
        NOutputs<1>,
        NamedStructuredOpTraits]> {
          let arguments = (ins Variadic<LinalgOperand>:$views);
          let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
          let extraClassDeclaration = [{
            llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
            llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
            void regionBuilder(ArrayRef<BlockArgument> args);
          }];
          let hasFolder = 1;
      }
```

With -gen-impl=1, this emits (modulo formatting):

```
      llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
          return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
                                            getParallelIteratorTypeName(),
                                            getParallelIteratorTypeName(),
                                            getReductionIteratorTypeName() };
      }
      llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps()
      {
        MLIRContext *context = getContext();
        AffineExpr d0, d1, d2, d3;
        bindDims(context, d0, d1, d2, d3);
        return SmallVector<AffineMap, 8>{
            AffineMap::get(4, 0, {d0, d1, d3}),
            AffineMap::get(4, 0, {d3, d2}),
            AffineMap::get(4, 0, {d0, d1, d2}) };
      }
      void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
        using namespace edsc;
        using namespace intrinsics;
        ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);

        ValueHandle _4 = std_mulf(_0, _1);
        ValueHandle _5 = std_addf(_2, _4);
        (linalg_yield(ValueRange{ _5 }));
      }
```

Differential Revision: https://reviews.llvm.org/D77067
Nicolas Vasilache 2020-04-10 13:54:08 -04:00
parent a04ab2ec08
commit 882ba48474
10 changed files with 1851 additions and 4 deletions


@@ -451,6 +451,93 @@ from a description in terms of only the generic op interface.
This is the main reason there are only a small number of ops today: we expect
them to be auto-generated from Tablegen soon.
### Named Payload Ops Specification
Linalg provides a declarative specification and a generation tool
(`mlir-linalg-ods-gen`) to automatically produce named ops from a notation that
is inspired by Einstein notation.
The syntax and semantics used in `mlir-linalg-ods-gen` are very much in flight
and borrow from Tensor Comprehensions (TC) but differ in a few dimensions, to
better adapt to Linalg:
1. The input and output tensor parameters are specified as `id :
type(symbolic-affine-expression-list)` (e.g. `A : f32(M, N + M)`) and each
new symbol is discovered eagerly. TC on the other hand does not allow
general symbolic affine expressions.
1. The output shapes are specified explicitly, whereas in TC they are always derived
from the input shapes.
1. The operations used to specify computations use EDSC intrinsics so that they
can easily be parsed and emitted into a simple region builder without
resorting to more general MLIR parsing.
1. Reduction dimensions are specified with angle bracket notation on the
operation they apply to (e.g. `std_add<k>` specifies that `k` is a reduction
dimension). In TC, a reduction is specified with the `op=` operator and the
reduction dimensions are inferred.
1. The parallel and reduction dimensions are ordered by textual program
order. For instance, in the comprehension `O(i, j) = std_add<k, l>(...)`,
`i` (resp. `j`) is a parallel iterator encoded by an affine dimension of
position `0` (resp. `1`); `k` (resp. `l`) is a reduction iterator encoded by
an affine dimension of position `2` (resp. `3`), as sketched below.
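For instance, for the simpler comprehension `C(m) = std_addf<k>(std_mulf(A(m, k), B(k)))`
(a matvec), textual order binds `m` to dimension `0` (parallel) and `k` to dimension
`1` (reduction). A minimal C++ sketch of the reference indexing maps this ordering
produces, mirroring the code the tool emits (the `maps` variable is illustrative and
`getContext()` is assumed to be available, as it is inside a generated op member):
```
// Dimensions in textual order: m -> d0 (parallel), k -> d1 (reduction).
MLIRContext *context = getContext();
AffineExpr d0, d1;
bindDims(context, d0, d1);
// Indexing maps for A(m, k), B(k) and C(m), in that order.
SmallVector<AffineMap, 8> maps{ AffineMap::get(2, 0, {d0, d1}),
                                AffineMap::get(2, 0, {d1}),
                                AffineMap::get(2, 0, {d0}) };
```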
These decisions and syntax are subject to evolution and change. In particular,
op-specific attributes, dynamic ranks, some form of templating, shape
calculation function specification, etc. may be added in the future.
At this time, the following restrictions are imposed on the syntax and
semantics:
1. Each def may only contain a single comprehension but each comprehension may
perform multiple updates.
2. Each tensor may only be used with a single indexing expression.
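As a purely hypothetical illustration of the second restriction, a spec such as the
following (the `transpose_add` name is made up for this example) is not accepted
because `A` appears with two different indexing expressions:
```
// Hypothetical, invalid spec: A is indexed both as A(i, j) and A(j, i),
// which violates the single-indexing-expression restriction above.
def transpose_add(A: f32(N, N)) -> (C: f32(N, N)) {
  C(i, j) = std_addf(A(i, j), A(j, i));
}
```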
The following specification may be used to define a named `batchmatmul` op:
```
def batchmatmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}
```
When `mlir-linalg-ods-gen -gen-ods-decl=1` is called, the following ODS is
produced:
```
def batchmatmulOp : LinalgNamedStructured_Op<"batchmatmul", [
NInputs<2>,
NOutputs<1>,
NamedStructuredOpTraits]> { ... }
```
When `mlir-linalg-ods-gen -gen-impl=1` is called, the following C++ is produced:
```
llvm::Optional<SmallVector<StringRef, 8>> batchmatmul::referenceIterators() {
return SmallVector<StringRef, 8>{
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getReductionIteratorTypeName() };
}
llvm::Optional<SmallVector<AffineMap, 8>> batchmatmul::referenceIndexingMaps() {
MLIRContext *context = getContext();
AffineExpr d0, d1, d2, d3;
bindDims(context, d0, d1, d2, d3);
return SmallVector<AffineMap, 8>{
AffineMap::get(4, 0, {d0, d1, d3}),
AffineMap::get(4, 0, {d3, d2}),
AffineMap::get(4, 0, {d0, d1, d2}) };
}
void batchmatmul::regionBuilder(ArrayRef<BlockArgument> args) {
using namespace edsc;
using namespace intrinsics;
ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);
ValueHandle _4 = std_mulf(_0, _1);
ValueHandle _5 = std_addf(_2, _4);
(linalg_yield(ValueRange{ _5 }));
}
```
## Open Issues and Design Alternatives<a name="open_issues"></a>
Multiple open issues and design alternatives are in flight and it is time to
lay them out for the community to discuss and pick apart:


@@ -256,7 +256,7 @@ def MatmulOp : LinalgStructured_Op<"matmul", [NInputs<2>, NOutputs<1>]> {
/// OptionalAttr<I64ArrayAttr>:$strides
/// OptionalAttr<I64ArrayAttr>:$dilations
/// OptionalAttr<I64ElementsAttr>:$padding
/// `strides` denotes the step of each window along the dimension.
/// `stirdes` denotes the step of each window along the dimension.
class PoolingBase_Op<string mnemonic, list<OpTrait> props>
: LinalgStructured_Op<mnemonic, props> {
let description = [{
@@ -821,4 +821,18 @@ def IndexedGenericOp : GenericOpBase<"indexed_generic"> {
let hasFolder = 1;
}
//===----------------------------------------------------------------------===//
// Named Linalg ops, implemented as declarative configurations of generic ops.
//===----------------------------------------------------------------------===//
def NamedStructuredOpTraits : NativeOpTrait<"linalg::NamedStructuredOpTraits">;
class LinalgNamedStructured_Op<string mnemonic, list<OpTrait> props>
: Op<Linalg_Dialect, mnemonic,
!listconcat(props, [StructuredOpTraits, LinalgStructuredInterface])> {
string spec = ?;
let assemblyFormat = "`(` operands `)` attr-dict `:` "
"functional-type(operands, results)";
}
#endif // LINALG_STRUCTURED_OPS


@@ -219,7 +219,7 @@ AffineExpr getAffineExprFromFlatForm(ArrayRef<int64_t> flatExprs,
ArrayRef<AffineExpr> localExprs,
MLIRContext *context);
raw_ostream &operator<<(raw_ostream &os, AffineExpr &expr);
raw_ostream &operator<<(raw_ostream &os, AffineExpr expr);
template <typename U> bool AffineExpr::isa() const {
if (std::is_same<U, AffineBinaryOpExpr>::value)


@@ -613,7 +613,7 @@ AffineExpr AffineExpr::compose(AffineMap map) const {
map.getResults().end());
return replaceDimsAndSymbols(dimReplacements, {});
}
raw_ostream &mlir::operator<<(raw_ostream &os, AffineExpr &expr) {
raw_ostream &mlir::operator<<(raw_ostream &os, AffineExpr expr) {
expr.print(os);
return os;
}


@@ -35,6 +35,7 @@ set(MLIR_TEST_DEPENDS
MLIRUnitTests
mlir-cpu-runner
mlir-edsc-builder-api-test
mlir-linalg-ods-gen
mlir-opt
mlir-sdbm-api-test
mlir-tblgen


@@ -21,7 +21,7 @@ config.name = 'MLIR'
config.test_format = lit.formats.ShTest(not llvm_config.use_lit_shell)
# suffixes: A list of file extensions to treat as test files.
config.suffixes = ['.td', '.mlir', '.toy', '.ll']
config.suffixes = ['.td', '.mlir', '.toy', '.ll', '.tc']
# test_source_root: The root path where tests are located.
config.test_source_root = os.path.dirname(__file__)


@@ -0,0 +1,75 @@
// RUN: mlir-linalg-ods-gen %s -gen-ods-decl=1 | FileCheck %s --check-prefix=ODS
// RUN: mlir-linalg-ods-gen %s -gen-impl=1 | FileCheck %s --check-prefix=IMPL
// RUN: mlir-linalg-ods-gen %s -gen-ods-decl=1 -test-emit-include-td-header \
// RUN: | mlir-tblgen -gen-op-decls -I %S/../../include
// ODS-LABEL: def matvecOp : LinalgNamedStructured_Op<"matvec", [
// ODS-NEXT: NInputs<2>,
// ODS-NEXT: NOutputs<1>,
// ODS-NEXT: NamedStructuredOpTraits]>
//
// IMPL-LABEL: matvec::referenceIterators() {
// IMPL-NEXT: { {{.*}}Parallel{{.*}}, {{.*}}Reduction{{.*}} }
//
// IMPL: matvec::referenceIndexingMaps() {
// IMPL: AffineMap::get(2, 0, {d0, d1}),
// IMPL-NEXT: AffineMap::get(2, 0, {d1}),
// IMPL-NEXT: AffineMap::get(2, 0, {d0}) };
//
// IMPL: matvec::regionBuilder(ArrayRef<BlockArgument> args) {
// IMPL: ValueHandle [[a:.*]](args[0]), [[b:.*]](args[1]), [[c:.*]](args[2]);
// IMPL: ValueHandle [[d:.*]] = std_mulf([[a]], [[b]]);
// IMPL: ValueHandle [[e:.*]] = std_addf([[c]], [[d]]);
// IMPL: (linalg_yield(ValueRange{ [[e]] }));
//
def matvec(A: f32(M, K), B: f32(K)) -> (C: f32(M)) {
C(m) = std_addf<k>(std_mulf(A(m, k), B(k)));
}
// ODS-LABEL: def matmulOp : LinalgNamedStructured_Op<"matmul", [
// ODS-NEXT: NInputs<2>,
// ODS-NEXT: NOutputs<1>,
// ODS-NEXT: NamedStructuredOpTraits]>
//
// IMPL-LABEL: matmul::referenceIterators() {
// IMPL-NEXT: { {{.*}}Parallel{{.*}}, {{.*}}Parallel{{.*}}, {{.*}}Reduction{{.*}} }
//
// IMPL: matmul::referenceIndexingMaps() {
// IMPL: AffineMap::get(3, 0, {d0, d2}),
// IMPL-NEXT: AffineMap::get(3, 0, {d2, d1}),
// IMPL-NEXT: AffineMap::get(3, 0, {d0, d1}) };
//
// IMPL: matmul::regionBuilder(ArrayRef<BlockArgument> args) {
// IMPL: ValueHandle [[a:.*]](args[0]), [[b:.*]](args[1]), [[c:.*]](args[2]);
// IMPL: ValueHandle [[d:.*]] = std_mulf([[a]], [[b]]);
// IMPL: ValueHandle [[e:.*]] = std_addf([[c]], [[d]]);
// IMPL: (linalg_yield(ValueRange{ [[e]] }));
//
def matmul(A: f32(M, K), B: f32(K, N)) -> (C: f32(M, N)) {
C(m, n) = std_addf<k>(std_mulf(A(m, k), B(k, n)));
}
// ODS-LABEL: def batchmatmulOp : LinalgNamedStructured_Op<"batchmatmul", [
// ODS-NEXT: NInputs<2>,
// ODS-NEXT: NOutputs<1>,
// ODS-NEXT: NamedStructuredOpTraits]>
//
// IMPL-LABEL: batchmatmul::referenceIterators() {
// IMPL-NEXT: { {{.*}}Parallel{{.*}}, {{.*}}Parallel{{.*}}, {{.*}}Reduction{{.*}} }
//
// IMPL: batchmatmul::referenceIndexingMaps() {
// IMPL: AffineMap::get(4, 0, {d0, d1, d3}),
// IMPL-NEXT: AffineMap::get(4, 0, {d3, d2}),
// IMPL-NEXT: AffineMap::get(4, 0, {d0, d1, d2}) };
//
// IMPL: batchmatmul::regionBuilder(ArrayRef<BlockArgument> args) {
// IMPL: ValueHandle [[a:.*]](args[0]), [[b:.*]](args[1]), [[c:.*]](args[2]);
// IMPL: ValueHandle [[d:.*]] = std_mulf([[a]], [[b]]);
// IMPL: ValueHandle [[e:.*]] = std_addf([[c]], [[d]]);
// IMPL: (linalg_yield(ValueRange{ [[e]] }));
//
// TBLGEN: batchmatmulOp
def batchmatmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}


@@ -1,5 +1,6 @@
add_subdirectory(mlir-cuda-runner)
add_subdirectory(mlir-cpu-runner)
add_subdirectory(mlir-linalg-ods-gen)
add_subdirectory(mlir-opt)
add_subdirectory(mlir-translate)
add_subdirectory(mlir-vulkan-runner)


@@ -0,0 +1,10 @@
add_llvm_tool(mlir-linalg-ods-gen
mlir-linalg-ods-gen.cpp
)
llvm_update_compile_flags(mlir-linalg-ods-gen)
target_link_libraries(mlir-linalg-ods-gen PRIVATE
MLIRParser
MLIRSupport
LLVMCore
LLVMSupport
)

File diff suppressed because it is too large.