[mlir] Add a document detailing the design of the SymbolTable.

Summary: This document provides insight on the rationale and the design of Symbols in MLIR, and why they are necessary. Differential Revision: https://reviews.llvm.org/D73590
2020-02-08 10:40:00 -08:00 · 2020-02-08 10:40:00 -08:00 · 20344d3704
parent eeb63944e4
commit 20344d3704
3 changed files with 222 additions and 9 deletions
--- a/mlir/docs/LangRef.md
+++ b/mlir/docs/LangRef.md
@@ -1457,15 +1457,16 @@ This attribute can only be held internally by
 [array attributes](#array-attribute) and
 [dictionary attributes](#dictionary-attribute)(including the top-level operation
 attribute dictionary), i.e. no other attribute kinds such as Locations or
-extended attribute kinds. If a reference to a symbol is necessary from outside
-of the symbol table that the symbol is defined in, a
-[string attribute](#string-attribute) can be used to refer to the symbol name.
+extended attribute kinds.

 **Rationale:** Given that MLIR models global accesses with symbol references, to
 enable efficient multi-threading, it becomes difficult to effectively reason
 about their uses. By restricting the places that can legally hold a symbol
 reference, we can always opaquely reason about a symbol's usage characteristics.

+See [`Symbols And SymbolTables`](SymbolsAndSymbolTables.md) for more
+information.
+
 #### Type Attribute

 Syntax:
--- a/mlir/docs/SymbolsAndSymbolTables.md
+++ b/mlir/docs/SymbolsAndSymbolTables.md
@@ -0,0 +1,214 @@
+# Symbols and Symbol Tables
+
+[TOC]
+
+MLIR is a multi-level representation; with [Regions](LangRef.md#regions), the
+multi-level aspect is structural in the IR. A lot of infrastructure within the
+compiler is built around this nesting structure, including the processing of
+operations within the [pass manager](WritingAPass.md#pass-manager). One
+advantage of the MLIR design is that it is able to process operations in
+parallel, utilizing multiple threads. This is possible due to a property of the
+IR known as [`IsolatedFromAbove`](Traits.md#isolatedfromabove).
+
+Without this property, any operation could affect or mutate the use-list of
+operations defined above. Making this thread-safe requires expensive locking in
+some of the core IR data structures, which becomes quite inefficient. To enable
+multi-threaded compilation without this locking, MLIR uses local pools for
+constant values as well as `Symbol` accesses for global values and variables.
+This document details the design of `Symbol`s, what they are and how they fit
+into the system.
+
+The `Symbol` infrastructure essentially provides a non-SSA mechanism by which to
+refer to an operation symbolically with a name. This allows for safely
+referring to operations defined above regions that are `IsolatedFromAbove`. It
+also allows for symbolically referencing operations defined below other
+regions.
+
+## Symbol
+
+A `Symbol` is a named operation that resides immediately within a region that
+defines a [`SymbolTable`](#symbol-table). The name of a symbol *must* be unique
+within the parent `SymbolTable`. This name is semantically similar to an SSA
+result value, and may be referred to by other operations to provide a symbolic
+link, or use, to the symbol. An example of a `Symbol` operation is
+[`func`](LangRef.md#functions). `func` defines a symbol name, which is
+[referred to](#referencing-a-symbol) by operations like
+[`std.call`](Dialects/Standard.md#call).
+
+### Defining a Symbol
+
+A `Symbol` operation may use the `OpTrait::Symbol` trait, and must have the
+following properties:
+
+*   A `StringAttr` attribute named
+    `SymbolTable::getSymbolAttrName()` (`sym_name`).
+    -   This attribute defines the symbolic 'name' of the operation.
+*   An optional `StringAttr` attribute named
+    `SymbolTable::getVisibilityAttrName()` (`sym_visibility`).
+    -   This attribute defines the [visibility](#symbol-visibility) of the
+        symbol, or more specifically in which scopes it may be accessed.
+*   No SSA results
+    -   Intermixing the different ways to `use` an operation quickly becomes
+        unwieldy and difficult to analyze.
+
+## Symbol Table
+
+Described above are `Symbol`s, which reside within a region of an operation
+defining a `SymbolTable`. A `SymbolTable` operation provides the container for
+the [`Symbol`](#symbol) operations. It verifies that all `Symbol` operations
+have a unique name, and provides facilities for looking up symbols by name.
+Operations defining a `SymbolTable` may use the `OpTrait::SymbolTable` trait.
+
+### Referencing a Symbol
+
+`Symbol`s are referenced symbolically by name via the
+[`SymbolRefAttr`](LangRef.md#symbol-reference-attribute) attribute. A symbol
+reference attribute contains a named reference to an operation that is nested
+within a symbol table. It may optionally contain a set of nested references that
+further resolve to a symbol nested within a different symbol table. When
+resolving a nested reference, each non-leaf reference must refer to a symbol
+operation that is also a [symbol table](#symbol-table).
+
+Below is an example of how an operation may reference a symbol operation:
+
+```mlir
+// This `func` operation defines a symbol named `symbol`.
+func @symbol()
+
+// Our `foo.user` operation contains a SymbolRefAttr with the name of the
+// `symbol` func.
+"foo.user"() {uses = [@symbol]} : () -> ()
+
+// Symbol references resolve to the nearest parent operation that defines a
+// symbol table, so we can have references with arbitrary nesting levels.
+func @other_symbol() {
+  affine.for %i0 = 0 to 10 {
+    // Our `foo.user` operation resolves to the same `symbol` func as defined
+    // above.
+    "foo.user"() {uses = [@symbol]} : () -> ()
+  }
+  return
+}
+
+// Here we define a nested symbol table. References within this operation will
+// not resolve to any symbols defined above.
+module {
+  // Error. We resolve references with respect to the closest parent symbol
+  // table, so this reference can't be resolved.
+  "foo.user"() {uses = [@symbol]} : () -> ()
+}
+
+// Here we define another nested symbol table, except this time it also defines
+// a symbol.
+module @module_symbol {
+  // This `func` operation defines a symbol named `nested_symbol`.
+  func @nested_symbol()
+}
+
+// Our `foo.user` operation may refer to the nested symbol, by resolving through
+// the parent.
+"foo.user"() {uses = [@module_symbol::@nested_symbol]} : () -> ()
+```
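The resolution rules in the example above can be sketched with a small toy model. This is an illustration only, not MLIR's implementation or API: the `Op` class and the helpers `nearest_symbol_table`, `lookup`, and `resolve` are hypothetical names invented for this sketch, and a reference is modeled as a plain list of name pieces.

```python
from typing import List, Optional


class Op:
    """Toy stand-in for an operation; not an MLIR class."""

    def __init__(self, name: str, sym_name: Optional[str] = None,
                 is_symbol_table: bool = False) -> None:
        self.name = name
        self.sym_name = sym_name                # set when the op defines a symbol
        self.is_symbol_table = is_symbol_table  # set when the op defines a symbol table
        self.parent: Optional["Op"] = None
        self.children: List["Op"] = []

    def add(self, child: "Op") -> "Op":
        """Nest `child` directly inside this op and return it."""
        child.parent = self
        self.children.append(child)
        return child


def nearest_symbol_table(op: Op) -> Optional[Op]:
    """Walk up the parent chain to the closest op that defines a symbol table."""
    current = op.parent
    while current is not None and not current.is_symbol_table:
        current = current.parent
    return current


def lookup(table: Op, sym_name: str) -> Optional[Op]:
    """Find a symbol that resides immediately within `table`."""
    for child in table.children:
        if child.sym_name == sym_name:
            return child
    return None


def resolve(user: Op, path: List[str]) -> Optional[Op]:
    """Resolve a reference such as ["module_symbol", "nested_symbol"]."""
    table = nearest_symbol_table(user)
    found: Optional[Op] = None
    for index, piece in enumerate(path):
        if table is None:
            return None
        found = lookup(table, piece)
        if found is None:
            return None
        if index < len(path) - 1:
            # A non-leaf piece must itself be a symbol table.
            table = found if found.is_symbol_table else None
    return found


# Rebuild part of the IR from the example above (hypothetical toy objects).
top = Op("module", is_symbol_table=True)
symbol = top.add(Op("func", sym_name="symbol"))
module_symbol = top.add(Op("module", sym_name="module_symbol", is_symbol_table=True))
nested_symbol = module_symbol.add(Op("func", sym_name="nested_symbol"))
top_user = top.add(Op("foo.user"))
print(resolve(top_user, ["module_symbol", "nested_symbol"]) is nested_symbol)
```

In this sketch a lookup never continues past the nearest enclosing symbol table, which is why a reference placed inside an unnamed nested `module` cannot see symbols defined above it.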
+
+Using an attribute, as opposed to an SSA value, has several benefits:
+
+*   References may appear in more places than the operand list; including
+    [nested attribute dictionaries](LangRef.md#dictionary-attribute),
+    [array attributes](LangRef.md#array-attribute), etc.
+
+*   Handling of SSA dominance remains unchanged.
+
+    -   If we were to use SSA values, we would need to create some mechanism by
+        which to opt out of certain SSA properties, such as dominance.
+        Attributes allow for referencing the operations regardless of the
+        order in which they were defined.
+    -   Attributes simplify referencing operations within nested symbol tables,
+        which are traditionally not visible outside of the parent region.
+
+The impact of this choice to use attributes as opposed to SSA values is that we
+now have two mechanisms with which to reference operations. This means that some
+dialects must either support both `SymbolRefs` and SSA value references, or
+provide operations that materialize SSA values from a symbol reference. Each has
+different trade-offs depending on the situation. A function call may directly
+use a `SymbolRef` as the callee, whereas a reference to a global variable might
+use a materialization operation so that the variable can be used in other
+operations like `std.addi`.
+[`llvm.mlir.addressof`](Dialects/LLVM.md#llvmmliraddressof) is one example of
+such an operation.
+
+See the `LangRef` definition of the
+[`SymbolRefAttr`](LangRef.md#symbol-reference-attribute) for more information
+about the structure of this attribute.
+
+### Manipulating a Symbol
+
+As described above, `SymbolRefs` act as an auxiliary way of defining uses of
+operations, alongside the traditional SSA use-list. As such, it is imperative to
+provide similar functionality to manipulate and inspect the list of uses and the
+users. The following are a few of the utilities provided by the `SymbolTable`:
+
+*   `SymbolTable::getSymbolUses`
+
+    -   Access an iterator range over all of the uses on and nested within a
+        particular operation.
+
+*   `SymbolTable::symbolKnownUseEmpty`
+
+    -   Check if a particular symbol is known to be unused within a specific
+        section of the IR.
+
+*   `SymbolTable::replaceAllSymbolUses`
+
+    -   Replace all of the uses of one symbol with a new one within a specific
+        section of the IR.
+
+*   `SymbolTable::lookupNearestSymbolFrom`
+
+    -   Look up the definition of a symbol in the nearest symbol table from some
+        anchor operation.
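To make the idea of a symbol use-list concrete, here is a rough toy model of the first three utilities. It is a sketch under simplifying assumptions, not the `SymbolTable` API: the Python names `SymbolRef`, `Op`, `walk_refs`, `get_symbol_uses`, `symbol_known_use_empty`, and `replace_all_symbol_uses` are hypothetical, references are flat names only, and nested symbol tables are ignored.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


class SymbolRef:
    """Toy stand-in for a flat symbol reference attribute such as @symbol."""

    def __init__(self, name: str) -> None:
        self.name = name


class Op:
    """Toy stand-in for an operation with attributes and nested operations."""

    def __init__(self, name: str, attrs: Optional[Dict[str, Any]] = None,
                 children: Optional[List["Op"]] = None) -> None:
        self.name = name
        self.attrs = attrs if attrs is not None else {}
        self.children = children if children is not None else []


def walk_refs(attr: Any) -> Iterator[SymbolRef]:
    """Yield references held directly or inside nested arrays/dictionaries."""
    if isinstance(attr, SymbolRef):
        yield attr
    elif isinstance(attr, list):
        for item in attr:
            yield from walk_refs(item)
    elif isinstance(attr, dict):
        for value in attr.values():
            yield from walk_refs(value)


def get_symbol_uses(root: Op) -> List[Tuple[Op, SymbolRef]]:
    """Collect (user, reference) pairs on `root` and on everything nested in it."""
    uses = [(root, ref) for ref in walk_refs(root.attrs)]
    for child in root.children:
        uses.extend(get_symbol_uses(child))
    return uses


def symbol_known_use_empty(sym_name: str, root: Op) -> bool:
    """Return True when nothing under `root` refers to `sym_name`."""
    return all(ref.name != sym_name for _, ref in get_symbol_uses(root))


def replace_all_symbol_uses(old: str, new: str, root: Op) -> int:
    """Rename every reference to `old` under `root`; return how many changed."""
    changed = 0
    for _, ref in get_symbol_uses(root):
        if ref.name == old:
            ref.name = new
            changed += 1
    return changed
```

Because references live in attributes, the walk has to descend through array and dictionary attributes as well as nested operations; restricting where references may appear is what keeps such a walk complete.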
+
+## Symbol Visibility
+
+Along with a name, a `Symbol` also has a `visibility` attached to it. The
+`visibility` of a symbol defines its structural reachability within the IR. A
+symbol may have one of the following visibilities:
+
+*   Public
+
+    -   The symbol may be referenced from outside of the visible IR. We cannot
+        assume that all of the uses of this symbol are observable.
+
+*   Private
+
+    -   The symbol may only be referenced from within the current symbol table.
+
+*   Nested
+
+    -   The symbol may be referenced by operations outside of the current symbol
+        table, but not outside of the visible IR, as long as each symbol table
+        parent also defines a non-private symbol.
+
+A few examples of what this looks like in the IR are shown below:
+
+```mlir
+module @public_module {
+  // This function can be accessed by 'live.user', but cannot be referenced
+  // externally; all uses are known to reside within parent regions.
+  func @nested_function() attributes { sym_visibility = "nested" }
+
+  // This function cannot be accessed outside of 'public_module'
+  func @private_function() attributes { sym_visibility = "private" }
+}
+
+// This function can only be accessed from within the top-level module
+func @private_function() attributes { sym_visibility = "private" }
+
+// This function may be referenced externally
+func @public_function()
+
+"live.user"() {uses = [
+  @public_module::@nested_function,
+  @private_function,
+  @public_function
+]} : () -> ()
+```
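One possible reading of the visibility rules, written as a toy check. This is a sketch, not code from MLIR: `may_reference` and `all_uses_visible` are hypothetical helpers, a symbol is identified by its absolute path of `(name, visibility)` pairs from the top-level symbol table, and a user is identified by the names of the nested symbol tables that enclose it. It assumes that a private symbol may be referenced from anywhere inside the symbol table that holds it.

```python
from typing import List, Sequence, Tuple

PUBLIC, PRIVATE, NESTED = "public", "private", "nested"


def may_reference(user_tables: Sequence[str],
                  symbol_path: List[Tuple[str, str]]) -> bool:
    """Check whether a user may refer to the symbol at `symbol_path`.

    `user_tables` names the nested symbol tables enclosing the user, outermost
    first; an empty sequence means the user sits in the top-level table.
    """
    names = [name for name, _ in symbol_path]
    for depth, (_, visibility) in enumerate(symbol_path):
        # The symbol at this depth lives in the table named by names[:depth].
        enclosing = tuple(names[:depth])
        user_is_inside = tuple(user_tables[:depth]) == enclosing
        if visibility == PRIVATE and not user_is_inside:
            return False
    return True


def all_uses_visible(visibility: str) -> bool:
    """Only public symbols may have uses outside of the visible IR."""
    return visibility != PUBLIC


# The references made by 'live.user' in the example above.
nested_function = [("public_module", PUBLIC), ("nested_function", NESTED)]
top_private = [("private_function", PRIVATE)]
top_public = [("public_function", PUBLIC)]
for path in (nested_function, top_private, top_public):
    print(path[-1][0], may_reference((), path))
```

Under this reading, `@public_module::@private_function` would be rejected for `live.user`, since the user sits outside `public_module`.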
--- a/mlir/docs/Traits.md
+++ b/mlir/docs/Traits.md
@@ -226,17 +226,15 @@ single block that must terminate with `TerminatorOpType`.

 *   `OpTrait::Symbol` -- `Symbol`

-This trait is used for operations that define a `Symbol`.
-
-TODO(riverriddle) Link to the proper document detailing the design of symbols.
+This trait is used for operations that define a
+[`Symbol`](SymbolsAndSymbolTables.md#symbol).

 ### SymbolTable

 *   `OpTrait::SymbolTable` -- `SymbolTable`

-This trait is used for operations that define a `SymbolTable`.
-
-TODO(riverriddle) Link to the proper document detailing the design of symbols.
+This trait is used for operations that define a
+[`SymbolTable`](SymbolsAndSymbolTables.md#symbol-table).

 ### Terminator