[mlir:docs] Add proper documentation for defining dialects

We don't actually have any documentation today for how to declaratively define a dialect. This commit rectifies that and properly documents how to define a Dialect in tablegen, and details all of the possible fields.

Differential Revision: https://reviews.llvm.org/D123258
2022-04-06 14:44:15 -07:00 · 2022-04-06 14:44:15 -07:00 · 73c4f9d4d3
parent a19fe7b640
commit 73c4f9d4d3
2 changed files with 335 additions and 2 deletions
--- a/mlir/docs/Canonicalization.md
+++ b/mlir/docs/Canonicalization.md
@@ -200,11 +200,11 @@ implement the `materializeConstant` hook. This hook takes in an `Attribute`
 value, generally returned by `fold`, and produces a "constant-like" operation
 that materializes that value.

-In [ODS](OpDefinitions.md), a dialect can set the `hasConstantMaterializer` bit
+In [ODS](DefiningDialects.md), a dialect can set the `hasConstantMaterializer` bit
 to generate a declaration for the `materializeConstant` method.

 ```tablegen
-def MyDialect_Dialect : ... {
+def MyDialect : ... {
  let hasConstantMaterializer = 1;
 }
 ```
--- a/mlir/docs/DefiningDialects.md
+++ b/mlir/docs/DefiningDialects.md
@@ -0,0 +1,333 @@
+# Defining Dialects
+
+This document describes how to define [Dialects](LangRef.md/#dialects).
+
+[TOC]
+
+## LangRef Refresher
+
+Before diving into how to define these constructs, below is a quick refresher
+from the [MLIR LangRef](LangRef.md).
+
+Dialects are the mechanism by which to engage with and extend the MLIR
+ecosystem. They allow for defining new [attributes](LangRef.md#attributes),
+[operations](LangRef.md#operations), and [types](LangRef.md#type-system).
+Dialects are used to model a variety of different abstractions, from traditional
+[arithmetic](Dialects/ArithmeticOps.md) to
+[pattern rewrites](Dialects/PDLOps.md), and are one of the most fundamental
+aspects of MLIR.
+
+## Defining a Dialect
+
+At the most fundamental level, defining a dialect in MLIR is as simple as
+specializing the
+[C++ `Dialect` class](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/IR/Dialect.h).
+That being said, MLIR provides a powerful declarative specification mechanism via
+[TableGen](https://llvm.org/docs/TableGen/index.html), a generic language with
+tooling to maintain records of domain-specific information. It simplifies the
+definition process by automatically generating all of the necessary boilerplate
+C++ code, significantly reduces the maintenance burden when changing aspects of dialect
+definitions, and also provides additional tools on top (such as
+documentation generation). Given the above, the declarative specification is the
+expected mechanism for defining new dialects, and is the method detailed within
+this document. Before continuing, it is highly recommended that users review the
+[TableGen Programmer's Reference](https://llvm.org/docs/TableGen/ProgRef.html)
+for an introduction to its syntax and constructs.
+
+Below is an example of a simple Dialect definition. We generally recommend defining
+the Dialect class in a different `.td` file from the attributes, operations, types,
+and other sub-components of the dialect to establish a proper layering between
+the various dialect components. It also prevents situations where you may
+inadvertently generate multiple definitions for some constructs. This recommendation
+extends to all of the MLIR constructs, including [Interfaces](Interfaces.md) for example.
+
+```tablegen
+// Include the definition of the necessary tablegen constructs for defining
+// our dialect. 
+include "mlir/IR/DialectBase.td"
+
+// Here is a simple definition of a dialect.
+def MyDialect : Dialect {
+  let summary = "A short one line description of my dialect.";
+  let description = [{
+    My dialect is a very important dialect. This section contains a much more
+    detailed description that documents all of the important pieces of information
+    to know about the dialect.
+  }];
+
+  /// This is the namespace of the dialect. It is used to encapsulate the sub-components
+  /// of the dialect, such as operations ("my_dialect.foo").
+  let name = "my_dialect";
+
+  /// The C++ namespace that the dialect, and its sub-components, get placed in.
+  let cppNamespace = "::my_dialect";
+}
+```
+
+The above showcases a very simple description of a dialect, but dialects have lots
+of other capabilities that you may or may not need to utilize.
+
+### Initialization
+
+Every dialect must implement an initialization hook to add attributes, operations, types,
+attach any desired interfaces, or perform any other necessary initialization for the
+dialect that should happen on construction. This hook is declared for every dialect, which
+must provide the definition, and has the form:
+
+```c++
+void MyDialect::initialize() {
+  // Dialect initialization logic should be defined in here.
+}
+```
+
+### Documentation
+
+The `summary` and `description` fields allow for providing user documentation
+for the dialect. The `summary` field expects a simple single-line string, with the
+`description` field used for long and extensive documentation. This documentation can be 
+used to generate markdown documentation for the dialect and is used by upstream
+[MLIR dialects](https://mlir.llvm.org/docs/Dialects/).
+
+### Class Name
+
+The name of the C++ class which gets generated is the same as the name of our TableGen
+dialect definition, but with any `_` characters stripped out. This means that if you name
+your dialect `Foo_Dialect`, the generated C++ class would be `FooDialect`. In the example
+above, we would get a C++ dialect named `MyDialect`.
+
+### C++ Namespace
+
+The namespace that the C++ class for our dialect, and all of its sub-components, is placed
+under is specified by the `cppNamespace` field. By default, the name of the dialect is used as
+the only namespace. To avoid placing the classes in any namespace, use `""`. To specify nested
+namespaces, use `"::"` as the delimiter between namespaces, e.g., given `"A::B"`, C++ classes
+will be placed within: `namespace A { namespace B { <classes> } }`.
+
+Note that this works in conjunction with the dialect's C++ code. Depending on how the generated files
+are included, you may want to specify a full namespace path or a partial one. In general, it's best
+to use full namespaces whenever you can. This makes it easier for dialects within different namespaces,
+and projects, to interact with each other.
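+
+As a minimal sketch of the nested, fully qualified case, the definition below places the
+generated classes in a two-level namespace. The names `my_project` and `my_dialect` are
+placeholders chosen for illustration:
+
+```tablegen
+include "mlir/IR/DialectBase.td"
+
+def MyDialect : Dialect {
+  let name = "my_dialect";
+
+  // Hypothetical namespace path; the leading `::` makes it fully qualified.
+  // The generated class would be referred to as `::my_project::my_dialect::MyDialect`.
+  let cppNamespace = "::my_project::my_dialect";
+}
+```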
+
+### Dependent Dialects
+
+MLIR has a very large ecosystem, and contains dialects that serve many different purposes. It
+is quite common, given the above, that dialects may want to reuse certain components from other
+dialects. This may mean generating operations from those dialects during canonicalization, reusing
+attributes or types, etc. When a dialect has a dependency on another, i.e. when it constructs and/or
+generally relies on the components of another dialect, a dialect dependency should be explicitly
+recorded. An explicit dependency ensures that dependent dialects are loaded alongside the
+dialect. Dialect dependencies can be recorded using the `dependentDialects` dialect field:
+
+```tablegen
+def MyDialect : Dialect {
+  // Here we register the Arithmetic and Func dialect as dependencies of our `MyDialect`.
+  let dependentDialects = [
+    "arith::ArithmeticDialect",
+    "func::FuncDialect"
+  ];
+}
+```
+
+### Extra declarations
+
+The declarative Dialect definitions try to auto-generate as much logic and methods
+as possible. With that said, there will always be long-tail cases that won't be covered.
+For such cases, `extraClassDeclaration` can be used. Code within the `extraClassDeclaration`
+field will be copied literally to the generated C++ Dialect class.
+
+Note that `extraClassDeclaration` is a mechanism intended for long-tail cases by
+power users; for not-yet-implemented widely-applicable cases, improving the
+infrastructure is preferable.
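+
+As a sketch of how this might look, the definition below adds one extra method declaration
+to the generated class. The method name `getExampleHelperName` is hypothetical, and its
+definition would still need to be written by hand in the dialect's C++ source file:
+
+```tablegen
+include "mlir/IR/DialectBase.td"
+
+def MyDialect : Dialect {
+  let name = "my_dialect";
+  let cppNamespace = "::my_dialect";
+
+  // Copied literally into the body of the generated `MyDialect` class.
+  let extraClassDeclaration = [{
+    /// Hypothetical helper, shown only for illustration.
+    static ::llvm::StringRef getExampleHelperName();
+  }];
+}
+```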
+
+### `hasConstantMaterializer`: Materializing Constants from Attributes
+
+This field is utilized to materialize a constant operation from an `Attribute` value and
+a `Type`. This is generally used when an operation within this dialect has been folded,
+and a constant operation should be generated. `hasConstantMaterializer` is used to enable
+materialization, and the `materializeConstant` hook is declared on the dialect. This
+hook takes in an `Attribute` value, generally returned by `fold`, and produces a
+"constant-like" operation that materializes that value. See the
+[documentation for canonicalization](Canonicalization.md) for a more in-depth
+introduction to `folding` in MLIR.
+
+Constant materialization logic can then be defined in the source file:
+
+```c++
+/// Hook to materialize a single constant operation from a given attribute value
+/// with the desired resultant type. This method should use the provided builder
+/// to create the operation without changing the insertion position. The
+/// generated operation is expected to be constant-like. On success, this hook
+/// should return the operation generated to represent the constant value.
+/// Otherwise, it should return nullptr on failure.
+Operation *MyDialect::materializeConstant(OpBuilder &builder, Attribute value,
+                                          Type type, Location loc) {
+  ...
+}
+```
+
+### `hasNonDefaultDestructor`: Providing a custom destructor
+
+This field should be used when the Dialect class has a custom destructor, i.e.
+when the dialect has some special logic to be run in the `~MyDialect`. In this case,
+only the declaration of the destructor is generated for the Dialect class.
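+
+A minimal sketch of enabling this field, reusing the `MyDialect` example from earlier in this
+document, could look as follows. The body of `MyDialect::~MyDialect()` is then expected to be
+written by hand in the dialect's C++ source file:
+
+```tablegen
+include "mlir/IR/DialectBase.td"
+
+def MyDialect : Dialect {
+  let name = "my_dialect";
+  let cppNamespace = "::my_dialect";
+
+  // Only declare the destructor; the dialect supplies the definition itself.
+  let hasNonDefaultDestructor = 1;
+}
+```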
+
+### Discardable Attribute Verification
+
+As described by the [MLIR Language Reference](LangRef.md#attributes),
+*discardable attributes* are a type of attribute whose semantics are defined
+by the dialect whose name prefixes that of the attribute. For example, if an
+operation has an attribute named `gpu.contained_module`, the `gpu` dialect
+defines the semantics and invariants, such as when and where it is valid to use,
+of that attribute. To hook into this verification for attributes that are prefixed
+by our dialect, several hooks on the Dialect may be used:
+
+#### `hasOperationAttrVerify`
+
+This field generates the hook for verifying when a discardable attribute of this dialect
+has been used within the attribute dictionary of an operation. This hook has the form:
+
+```c++
+/// Verify the use of the given attribute, whose name is prefixed by the namespace of this
+/// dialect, that was used in `op`'s dictionary.
+LogicalResult MyDialect::verifyOperationAttribute(Operation *op, NamedAttribute attribute);
+```
+
+#### `hasRegionArgAttrVerify`
+
+This field generates the hook for verifying when a discardable attribute of this dialect
+has been used within the attribute dictionary of a region entry block argument. Note that
+the block arguments of a region entry block do not themselves have attribute dictionaries,
+but some operations may provide special dictionary attributes that correspond to the arguments
+of a region. For example, operations that implement `FunctionOpInterface` may have attribute
+dictionaries on the operation that correspond to the arguments of the entry block of the function.
+In these cases, those operations will invoke this hook on the dialect to ensure the attribute
+is verified. The hook necessary for the dialect to implement has the form:
+
+```c++
+/// Verify the use of the given attribute, whose name is prefixed by the namespace of this
+/// dialect, that was used on the attribute dictionary of a region entry block argument.
+/// Note: As described above, whether a region entry block has a dictionary is up to the individual
+/// operation to define.
+LogicalResult MyDialect::verifyRegionArgAttribute(Operation *op, unsigned regionIndex,
+                                                  unsigned argIndex, NamedAttribute attribute);
+```
+
+#### `hasRegionResultAttrVerify`
+
+This field generates the hook for verifying when a discardable attribute of this dialect
+has been used within the attribute dictionary of a region result. Note that the results of a
+region do not themselves have attribute dictionaries, but some operations may provide special
+dictionary attributes that correspond to the results of a region. For example, operations that
+implement `FunctionOpInterface` may have attribute dictionaries on the operation that correspond
+to the results of the function. In these cases, those operations will invoke this hook on the
+dialect to ensure the attribute is verified. The hook necessary for the dialect to implement
+has the form:
+
+```c++
+/// Verify the use of the given attribute, whose name is prefixed by the namespace
+/// of this dialect, that was used on the attribute dictionary of a region result.
+/// Note: As described above, whether a region result has a dictionary is up to the individual
+/// operation to define.
+LogicalResult MyDialect::verifyRegionResultAttribute(Operation *op, unsigned regionIndex,
+                                                     unsigned resultIndex, NamedAttribute attribute);
+```
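+
+To tie the three fields together, the sketch below enables all of the discardable attribute
+verification hooks on the `MyDialect` example used throughout this document. A real dialect
+would likely enable only the hooks it needs, and each enabled hook still has to be defined
+in the dialect's C++ source file:
+
+```tablegen
+include "mlir/IR/DialectBase.td"
+
+def MyDialect : Dialect {
+  let name = "my_dialect";
+  let cppNamespace = "::my_dialect";
+
+  // Declares `verifyOperationAttribute` on the generated dialect class.
+  let hasOperationAttrVerify = 1;
+
+  // Declares `verifyRegionArgAttribute` on the generated dialect class.
+  let hasRegionArgAttrVerify = 1;
+
+  // Declares `verifyRegionResultAttribute` on the generated dialect class.
+  let hasRegionResultAttrVerify = 1;
+}
+```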
+
+### Operation Interface Fallback
+
+Some dialects have an open ecosystem and don't register all of the possible operations. In such
+cases it is still possible to provide support for implementing an `OpInterface` for these
+operations. When an operation isn't registered or does not provide an implementation for an
+interface, the query will fall back to the dialect itself. The `hasOperationInterfaceFallback`
+field may be used to declare this fallback for operations:
+
+```c++
+/// Return an interface model for the interface with the given `typeId` for the operation
+/// with the given name.
+void *MyDialect::getRegisteredInterfaceForOp(TypeID typeID, StringAttr opName);
+```
+
+For a more detailed description of the expected usage of this hook, see the
+[interface documentation](Interfaces.md#dialect-fallback-for-opinterface).
+
+### Default Attribute/Type Parsers and Printers 
+
+When a dialect registers an Attribute or Type, it must also override the respective
+`Dialect::parseAttribute`/`Dialect::printAttribute` or
+`Dialect::parseType`/`Dialect::printType` methods. In these cases, the dialect must
+explicitly handle the parsing and printing of each individual attribute or type within
+the dialect. If all of the attributes and types of the dialect provide a mnemonic,
+however, these methods may be autogenerated by using the
+`useDefaultAttributePrinterParser` and `useDefaultTypePrinterParser` fields. By default,
+these fields are set to `1` (enabled), meaning that if a dialect needs to explicitly handle the
+parser and printer of its Attributes and Types it should set these to `0` as necessary.
+
+### Dialect-wide Canonicalization Patterns
+
+Generally, [canonicalization](Canonicalization.md) patterns are specific to individual 
+operations within a dialect. There are some cases, however, that prompt canonicalization
+patterns to be added at the dialect level. For example, if a dialect defines a canonicalization
+pattern that operates on an interface or trait, it can be beneficial to only add this pattern
+once, instead of duplicating it for every operation that implements that interface. To enable the
+generation of this hook, the `hasCanonicalizer` field may be used. This will declare
+the `getCanonicalizationPatterns` method on the dialect, which has the form:
+
+```c++
+/// Return the canonicalization patterns for this dialect:
+void MyDialect::getCanonicalizationPatterns(RewritePatternSet &results) const;
+```
+
+See the documentation for [Canonicalization in MLIR](Canonicalization.md) for a much more
+detailed description of canonicalization patterns.
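+
+As a short sketch, the dialect-level hook can be requested on the `MyDialect` example used
+throughout this document as shown below. The patterns themselves are not generated; they
+would be added by hand in the definition of `MyDialect::getCanonicalizationPatterns`:
+
+```tablegen
+include "mlir/IR/DialectBase.td"
+
+def MyDialect : Dialect {
+  let name = "my_dialect";
+  let cppNamespace = "::my_dialect";
+
+  // Declares `getCanonicalizationPatterns` on the generated dialect class.
+  let hasCanonicalizer = 1;
+}
+```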
+
+### C++ Accessor Prefix
+
+Historically, MLIR has generated accessors for operation components (such as attributes, operands,
+results) using the tablegen definition name verbatim. This means that if an operation was defined
+as:
+
+```tablegen
+def MyOp : MyDialect<"op"> {
+  let arguments = (ins StrAttr:$value, StrAttr:$other_value);
+}
+```
+
+It would have accessors generated for the `value` and `other_value` attributes as follows:
+
+```c++
+StringAttr MyOp::value();
+void MyOp::value(StringAttr newValue);
+
+StringAttr MyOp::other_value();
+void MyOp::other_value(StringAttr newValue);
+```
+
+Since then, we have decided to move accessors over to a style that matches the rest of the
+code base. More specifically, this means that we prefix accessors with `get` and `set`
+respectively, and transform `snake_style` names to camel case (`UpperCamel` when prefixed,
+and `lowerCamel` for individual variable names). If we look at the same example as above, this
+would produce:
+
+```c++
+StringAttr MyOp::getValue();
+void MyOp::setValue(StringAttr newValue);
+
+StringAttr MyOp::getOtherValue();
+void MyOp::setOtherValue(StringAttr newValue);
+```
+
+The form in which accessors are generated is controlled by the `emitAccessorPrefix` field.
+This field may take any of the following values:
+
+* `kEmitAccessorPrefix_Raw`
+  - Don't emit any `get`/`set` prefix.
+
+* `kEmitAccessorPrefix_Prefixed`
+  - Only emit with `get`/`set` prefix.
+
+* `kEmitAccessorPrefix_Both`
+  - Emit with **and** without prefix.
+
+All new dialects are strongly encouraged to use the `kEmitAccessorPrefix_Prefixed` value, as
+the `Raw` form is deprecated and in the process of being removed.
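+
+A minimal sketch of opting into the recommended form is shown below. It assumes that the
+`kEmitAccessorPrefix_*` values listed above are made available by including
+`mlir/IR/DialectBase.td`:
+
+```tablegen
+include "mlir/IR/DialectBase.td"
+
+def MyDialect : Dialect {
+  let name = "my_dialect";
+  let cppNamespace = "::my_dialect";
+
+  // Generate only `get`/`set` prefixed accessors (e.g. `getOtherValue`) for
+  // the operations of this dialect.
+  let emitAccessorPrefix = kEmitAccessorPrefix_Prefixed;
+}
+```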
+
+Note: Remove this section when all dialects have been switched to the new accessor form.