# MLIR: Incremental Application to Graph Algorithms in ML Frameworks
The existing documentation about MLIR focuses on the long-term vision, how its
pieces fit together, and the benefits of modular and composable infrastructure
in the vast and distant future. While this viewpoint appeals to some, it worries
others who are more concerned about the "here and now": why does it make sense
to make a "revolutionary" change when any individual problem can be fixed in
place?

This document explains that adopting MLIR to solve graph-based problems
_isn't_ a revolutionary change: it is an incremental series of steps that build
on each other, each of which delivers local value. This document also addresses
some points of confusion that keep coming up.

One note: even though a major advantage of MLIR is that it can span the full
spectrum from graph algorithms down to low-level code generation, this document
focuses on the use of MLIR for **graph-level algorithms**. MLIR will also unlock
exciting code generation opportunities (particularly given its novel approach to
integrating state-of-the-art polyhedral techniques), but issues that touch on
MLIR's relationship to XLA, Eigen, etc., are out of scope for this particular
doc.

This document uses TensorFlow as the example, given that it is the focus of our
immediate work, but we believe that the same viewpoint could be useful for
people working in the context of other ML frameworks that may consider adopting
MLIR in the future.

### How is MLIR relevant?
MLIR is an overloaded acronym which unpacks as "Multi-Level Intermediate
Representation". Its high-level purpose is to provide mechanics for describing
and transforming programs and computations in a flexible way. It provides common
compiler infrastructure for things like constant folding, dead code elimination,
graph rewriting, and others, which are independent of the representational
choices picked by a given dialect (e.g. its concurrency semantics). It was built
with a specific focus on compile time and memory efficiency and on accurate
propagation of source location information (important for reporting high-quality
errors and warnings), and it is designed for testability.

TensorFlow has numerous subsystems (some of which are proprietary, e.g.
Tensor-RT, nGraph, CoreML, etc.) as well as translation layers between these
different subsystems, and these translation layers face similar challenges. (As
an aside, the internals of each of these subsystems could often benefit from
MLIR infrastructure, but that isn't a focus of this doc.)

A key observation that MLIR makes is that these subsystems often have two things
going on: they are both particular data structures and encodings (e.g. HLO
graphs, TF-Lite's FlatBuffer format, TensorFlow's Graph format, the ONNX
abstraction, etc.) and an abstraction of computation (a specific way of
modeling a convolution, a set of supported operations, etc.).

MLIR uses a standard IR (i.e., a set of data structures) for representing these
computations, which allows a huge amount of shared infrastructure across these
problem domains. MLIR then allows the definition of domain-specific "dialects"
that describe the set of operations that are legal and supported for a given
application. This means that the actual translations between data structures are
kept as simple as possible, and are thus relatively easy to make "correct".
This allows the common compiler infrastructure to handle the mapping problems
and the other issues within the domain.
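
As a rough illustration of that split, the following minimal C++17 sketch uses one shared, hypothetical `Operation` struct for every op and models a "dialect" as nothing more than the set of op names it accepts. All names here (`Operation`, `Dialect`, `findIllegalOps`, and the op names in the usage) are assumptions for illustration. MLIR's real operation and dialect classes are far richer.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: every operation, whatever its dialect, uses this one
// shared data structure (a name plus the ids of values it reads and defines).
struct Operation {
  std::string name;           // e.g. "tf.Add" or "tfl.add"
  std::vector<int> operands;  // ids of values this op reads
  std::vector<int> results;   // ids of values this op defines
};

// Hypothetical "dialect": only the set of operation names that are legal in it.
struct Dialect {
  std::set<std::string> legalOps;

  bool isLegal(const Operation &op) const { return legalOps.count(op.name) > 0; }
};

// Collects the names of ops the target dialect does not support, so a
// translator can report them instead of silently producing bad output.
std::vector<std::string> findIllegalOps(const std::vector<Operation> &ops,
                                        const Dialect &dialect) {
  std::vector<std::string> illegal;
  for (const Operation &op : ops)
    if (!dialect.isLegal(op)) illegal.push_back(op.name);
  return illegal;
}
```
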

MLIR's design is directly informed by the experience of building (and then
living with) intermediate representations like the LLVM IR, LLVM SelectionDAG,
the LLVM machine instruction representation, and Swift SIL IR, and it applies new
lessons from TensorFlow and XLA HLO, as well as from building countless
research and production systems on top of them. Our goal is to drag the state of
the art in compilers forward, not merely to apply a few well-known techniques to
the machine learning domain.
### What does adoption mean?
The point of this document is not to advocate for rewriting any particular
subsystem in TensorFlow - indeed, the burden required to justify a rewrite is
high, and often very specific to that subsystem. That said, there are several
subsystems that are about to get rewritten or substantially revised anyway, so
we use those as examples to concretely describe the benefits that MLIR provides
in these cases and what it will take. The subsystems discussed are:
1. the TF Lite TOCO translator, whose error reporting and reliability we need
   to improve and which we need to generalize to support more ops;
1. the TF/XLA bridge, which needs better usability (by merging some of its
   usage models), support for dynamic shapes, and generalized guest subsystem
   support for Tensor-RT and nGraph.
1. Grappler is another subsystem that is likely to get substantial revisions in
   the future, and would definitely benefit from the MLIR framework, but there
   are no known plans to do that work at this point, so we don't discuss it
   further.

Adopting MLIR works the same way for both of these; in fact, the work to support
TF Lite is mostly a subset of the larger work to support the functionality of
the TF/XLA bridge. TF Lite and the TF/XLA bridge include several compiler passes
(things like encapsulate, functionalize control flow, lowering of ops, fusion,
constant folding, shape inference, etc.).

MLIR supports converting from TensorFlow Graphs to MLIR and back, which means
that we can start by putting in a no-op translation to MLIR and back into the
pipeline, and verify that nothing breaks. Then we can work on replacing the
compiler transformations one by one by reimplementing them (with the improved
algorithms that we're planning).

This is a development plan: we wouldn't actually ship a TensorFlow that just
uses MLIR for a single pass. In practice, we'll gate MLIR behind an option,
build out a replacement for an entire subsystem (e.g. the TOCO translator),
and, when the time is right, do A/B comparisons and eventually
make a switch and phase out the old code over time.
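
The no-op round trip described earlier comes down to one invariant: importing a graph into the IR and exporting it straight back must reproduce the original graph. The minimal C++17 sketch below checks that invariant on two hypothetical types. `GraphNode` uses name-based edges, like a serialized graph. `IROp` uses index-based, SSA-style use-def edges. Neither is TensorFlow's or MLIR's actual API, and the sketch assumes node names are unique and nodes are topologically sorted.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical graph node: inputs refer to other nodes by name.
struct GraphNode {
  std::string name, op;
  std::vector<std::string> inputs;
  bool operator==(const GraphNode &o) const {
    return name == o.name && op == o.op && inputs == o.inputs;
  }
};

// Hypothetical IR op: operands refer to the index of the defining op
// (an SSA-style use-def edge) instead of a name.
struct IROp {
  std::string name, op;
  std::vector<int> operands;
};

// Graph -> IR. Assumes unique names and topologically sorted nodes.
std::vector<IROp> importGraph(const std::vector<GraphNode> &graph) {
  std::map<std::string, int> indexOf;
  std::vector<IROp> ir;
  for (const GraphNode &node : graph) {
    IROp op{node.name, node.op, {}};
    for (const std::string &in : node.inputs) {
      assert(indexOf.count(in) && "input must be defined by an earlier node");
      op.operands.push_back(indexOf.at(in));
    }
    indexOf[node.name] = static_cast<int>(ir.size());
    ir.push_back(op);
  }
  return ir;
}

// IR -> Graph: turn operand indices back into node names.
std::vector<GraphNode> exportGraph(const std::vector<IROp> &ir) {
  std::vector<GraphNode> graph;
  for (const IROp &op : ir) {
    GraphNode node{op.name, op.op, {}};
    for (int operand : op.operands) node.inputs.push_back(ir[operand].name);
    graph.push_back(node);
  }
  return graph;
}

// The check behind a no-op pipeline step: a round trip must change nothing.
bool roundTripsCleanly(const std::vector<GraphNode> &graph) {
  return exportGraph(importGraph(graph)) == graph;
}
```
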
## What benefit does MLIR provide?
The adoption plan above might sound like it only makes things worse in the
immediate term: we have two implementations of the same functionality, we are
dividing our efforts, etc. In order for this to be worth it, we should have a
good sense that we are building towards an improved future that will make
customers and TensorFlow engineers happier when it lands. Here we describe a few
of the benefits that MLIR provides, in no particular order:

### A Lossless Human Editable Textual Representation
The MLIR in-memory data structure has a human-readable and human-writable
format, as well as [a specification](LangRef.md) for that format, just as any
other programming language does. Important properties of this format are that it
is compact, easy to read, and lossless. You can dump an MLIR program out to disk
and munge around with it, then send it through a few more passes.

If you haven't worked with a system that works this way, it is hard to overstate
how big of a deal this is in practice: it means that you can call `foo->dump()`
on an IR object to see its full contents, diff the IR before and after a change,
delta-reduce IR files, and much more.

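
Losslessness can be stated as a round-trip property: parsing the printed form of an object yields an equal object. The following minimal C++17 sketch shows this for a single toy operation printed as `%2 = tf.Add(%0, %1)`. The `Op` struct and this mini-syntax are hypothetical simplifications, not MLIR's actual grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy op: one result value id, an op name, and operand value ids.
struct Op {
  int result;
  std::string name;
  std::vector<int> operands;
  bool operator==(const Op &o) const {
    return result == o.result && name == o.name && operands == o.operands;
  }
};

// Prints an op as, e.g., "%2 = tf.Add(%0, %1)".
std::string print(const Op &op) {
  std::ostringstream os;
  os << '%' << op.result << " = " << op.name << '(';
  for (std::size_t i = 0; i < op.operands.size(); ++i)
    os << (i ? ", " : "") << '%' << op.operands[i];
  os << ')';
  return os.str();
}

// Parses exactly the format produced by print(); nullopt on malformed input.
std::optional<Op> parse(const std::string &text) {
  std::istringstream is(text);
  Op op{};
  char percent = 0, equals = 0;
  if (!(is >> percent >> op.result >> equals) || percent != '%' || equals != '=')
    return std::nullopt;
  is >> std::ws;
  if (!std::getline(is, op.name, '(')) return std::nullopt;
  // Operand list: "%0, %1)" or just ")".
  while (true) {
    char c = 0;
    if (!(is >> c)) return std::nullopt;
    if (c == ')') break;
    int value = 0;
    if (c != '%' || !(is >> value)) return std::nullopt;
    op.operands.push_back(value);
    if (!(is >> c)) return std::nullopt;
    if (c == ')') break;
    if (c != ',') return std::nullopt;
  }
  return op;
}
```

With a format like this, "dumping" an object is just printing it, and a test can compare the printed text directly.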
### A Graph Verification Pass
Like many other popular compiler infrastructures, MLIR provides infrastructure
and implementation for a "verifier" which checks that the IR is well-formed. The
MLIR verifier is a simple framework that makes it easy to provide a single
source of truth for those correctness properties and is general across all
dialects (e.g. TF Graph, TF Lite flat buffer, XLA HLO, etc.).

A verifier pass is sort of like a 'super assertion' that catches mistakes in
program transformations early, making you as an engineer more productive, making
the product more reliable, and making it easier to track down bugs when they
appear, because the verifier can be run at any time, either as a compiler pass
or with a single function call.

While MLIR provides a well-considered infrastructure for IR verification, and
has simple checks for existing TensorFlow operations, there is a lot that should
be added here and lots of opportunity to get involved!
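
The following minimal C++17 sketch illustrates the verifier idea on a hypothetical straight-line block of `Op`s identified by integer value ids. Generic structural rules are checked for every op: each SSA value is defined exactly once, and before it is used. Op-specific rules plug in through a per-op-name hook table. This mirrors the split described above but is not MLIR's verifier API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy op in a straight-line block.
struct Op {
  std::string name;
  std::vector<int> operands;
  std::vector<int> results;
};

// Hypothetical per-op verification hook: an error message, or "" if valid.
using OpVerifier = std::function<std::string(const Op &)>;

// Generic checks for all ops (define-once, define-before-use) plus any
// op-specific hook registered for the op's name.
std::vector<std::string> verify(const std::vector<Op> &block,
                                const std::map<std::string, OpVerifier> &hooks) {
  std::vector<std::string> errors;
  std::set<int> defined;
  for (std::size_t i = 0; i < block.size(); ++i) {
    const Op &op = block[i];
    std::string where = "op #" + std::to_string(i) + " (" + op.name + "): ";
    for (int v : op.operands)
      if (!defined.count(v))
        errors.push_back(where + "use of undefined value %" + std::to_string(v));
    for (int v : op.results)
      if (!defined.insert(v).second)
        errors.push_back(where + "value %" + std::to_string(v) + " defined twice");
    auto hook = hooks.find(op.name);
    if (hook != hooks.end()) {
      std::string message = hook->second(op);
      if (!message.empty()) errors.push_back(where + message);
    }
  }
  return errors;
}
```
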
### Designed for Testability
There are many aspects of this in MLIR, but we'll focus on compiler
transformations since they are the easiest to understand. Compiler
transformations are modeled as subclasses of the `Pass` C++ class, which are
driven by an `mlir-opt` tool. Combined with a lossless textual representation,
this makes it really easy to write unit tests for compiler transformations. For
example, here is a simple test showing that `x-x` is turned into zero:
```mlir
// RUN: mlir-opt %s -canonicalize | FileCheck %s
func.func @test_subi_zero_cfg(%arg0: i32) -> i32 {
  %y = arith.subi %arg0, %arg0 : i32
  return %y : i32
}
// CHECK-LABEL: func.func @test_subi_zero_cfg(%arg0: i32)
// CHECK-NEXT: %c0_i32 = arith.constant 0 : i32
// CHECK-NEXT: return %c0
```
The "CHECK" comments are interpreted by the
[LLVM FileCheck tool](https://llvm.org/docs/CommandGuide/FileCheck.html), which
is sort of like a really advanced grep. This test is fully self-contained: it
feeds the input into the [canonicalize pass](Canonicalization.md), and checks
that the output matches the CHECK lines. See the `test/Transforms` directory for
more examples. In contrast, standard unit testing exposes the API of the
underlying framework to lots and lots of tests (making it harder to refactor and
move the API), typically requires a lot more code, and exacerbates issues with
link time. For examples, see
[the TEST_F functions in TensorFlow's testsuite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/arithmetic_optimizer_test.cc).

MLIR has been pervasively designed for testability in this way, allowing us to
put in place a culture that expects every behavior-changing commit to include a
test case, and for these test cases to be stable and reliable over time, since
they are testing exactly what they are supposed to. End-to-end integration tests
are still super useful for some things, of course!

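
For comparison with the FileCheck test above, here is a toy C++17 version of the rewrite it exercises, over a hypothetical `Inst` type. In MLIR itself this rewrite is part of the integer subtraction op's folding logic rather than a hand-written loop like this. The sketch also ignores the control-dependence subtleties that TensorFlow graphs add. Because the toy IR has a textual form, its check is a plain string comparison, the same property that keeps the FileCheck test self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction: "%result = opcode a, b" or "%result = constant imm".
struct Inst {
  std::string result, opcode;
  std::vector<std::string> operands;
  long imm = 0;
};

// Toy canonicalization: rewrite `subi %x, %x` into `constant 0` in place.
// The result name is kept, so existing uses of it stay valid.
int foldSelfSubtraction(std::vector<Inst> &body) {
  int changed = 0;
  for (Inst &inst : body) {
    if (inst.opcode == "subi" && inst.operands.size() == 2 &&
        inst.operands[0] == inst.operands[1]) {
      inst.opcode = "constant";
      inst.operands.clear();
      inst.imm = 0;
      ++changed;
    }
  }
  return changed;
}

// Prints one instruction per line so a test can compare the text directly.
std::string print(const std::vector<Inst> &body) {
  std::string out;
  for (const Inst &inst : body) {
    out += inst.result + " = " + inst.opcode + " ";
    if (inst.opcode == "constant") {
      out += std::to_string(inst.imm);
    } else {
      for (std::size_t i = 0; i < inst.operands.size(); ++i)
        out += (i ? ", " : "") + inst.operands[i];
    }
    out += "\n";
  }
  return out;
}
```
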
### Infrastructure for Warnings and Error Diagnostics and Location Tracking
MLIR benefits from the lessons learned from building other compilers, including
Clang, which
[set the standard](http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html)
for quality of implementation in C/C++ compiler diagnostics. Drawing from this
experience (and fixing mistakes in LLVM), MLIR requires that operations and
functions carry abstract location information and that transformations propagate
this information, and it provides standardized mechanisms to emit errors and
warnings, as well as for clients to hook into them to capture and report them in
custom ways.

Why is this important? In practice, many graph-to-graph translators can fail
(e.g. TF Lite when an unsupported op is used), and it is important to be able to
report the error up through to the user in the most precise way possible, in
order for it to be actionable. This includes tracking rewrites through fusions
and fissions of ops, mapping back into language- or API-specific domains, etc.

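
The following minimal C++17 sketch illustrates both ideas using simplified, hypothetical types. A `Location` can hold several source positions, so a fused op remembers every op it came from. A `DiagnosticEngine` forwards errors to client-registered handlers. MLIR has its own location and diagnostic classes; these are stand-ins, not that API. A test harness can use the same handler hook to capture diagnostics and compare them against expectations, which is the idea behind `-verify-diagnostics`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical location: one or more "file:line:col" source positions. More
// than one entry means the op was produced by fusing several original ops.
struct Location {
  std::vector<std::string> sources;

  std::string str() const {
    if (sources.size() == 1) return sources[0];
    std::string out = "fused[";
    for (std::size_t i = 0; i < sources.size(); ++i)
      out += (i ? ", " : "") + sources[i];
    return out + "]";
  }
};

// A fusion transform must keep the locations of everything it combined.
Location fuse(const Location &a, const Location &b) {
  Location result = a;
  result.sources.insert(result.sources.end(), b.sources.begin(), b.sources.end());
  return result;
}

// Hypothetical diagnostic engine: passes emit through it, and clients (a
// command-line tool, an IDE, a test harness) register handlers to receive them.
class DiagnosticEngine {
 public:
  using Handler = std::function<void(const Location &, const std::string &)>;

  void registerHandler(Handler handler) { handlers_.push_back(std::move(handler)); }

  void emitError(const Location &loc, const std::string &message) const {
    for (const Handler &handler : handlers_) handler(loc, message);
  }

 private:
  std::vector<Handler> handlers_;
};
```
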
More selfishly for infrastructure hackers, this is a huge boon because it means
that it is easy to write good tests for this: the testing tools for MLIR capture
the diagnostics produced by passes (using the standard diagnostic hooks) and
check that they match the expected diagnostics in the testcase. For example, to
test the dependence analysis infra in the code generator, Andy Davis wrote a
simple pass that checks dependencies and emits them as "notes", allowing him to
write tests like this:
```mlir
// RUN: mlir-opt %s -memref-dependence-check -verify-diagnostics
func @different_memrefs() {
  %m.a = alloc() : memref<100xf32>
  %m.b = alloc() : memref<100xf32>
  %c0 = constant 0 : index
  %c1 = constant 1.0 : f32
  store %c1, %m.a[%c0] : memref<100xf32>
  // expected-note@-1 {{dependence from memref access 0 to access 1 = false}}
  %v0 = load %m.b[%c0] : memref<100xf32>
  return
}
```
Note that a major limitation of this is that MLIR suffers from a problem of
"garbage in, garbage out": if the input locations to MLIR are imprecise, then
there is nothing that it can do to recover them. There is work underway in
TensorFlow/Python to improve the situation, and Swift for TensorFlow already has
perfect location tracking due to its design.
### Shape Information Captured in the IR
In TensorFlow Graphs, each op takes and returns values using a very simple type
system (TF_DataType) in which each value is a tensor of unknown rank and
dimensions. At the same time, many graphs have static shapes easily knowable for
wide swaths of the computation, and even dynamically shaped operations often
have statically knowable dimensions. Many analyses and transformations benefit
from and use this information when available, but because TensorFlow graphs don't
capture this (e.g. serialize it to proto), passes have to recompute it on demand
with ShapeRefiner.

The [MLIR Tensor Type](LangRef.md#tensor-type) directly captures shape
information, so you can have things like:
```mlir
%z = "tf.Add"(%x, %y) : (tensor<128x8x?xf32>, tensor<128x8x?xf32>) -> tensor<128x8x?xf32>
```
Capturing this in the IR is expected to speed up transformations (avoiding
recomputing the same info over and over again), which in turn makes it
practical to apply stronger shape analysis algorithms. It also makes it easier
to work with the IR, because on-the-side representations can get out of date,
and the API is easier to work with from an ergonomics perspective.
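
The following minimal C++17 sketch shows a type that carries partially known shapes. It assumes a hypothetical `TensorType` in which `kDynamic` (-1) marks a `?` dimension. The `refineElementwise` helper is likewise an illustrative assumption. It merges the shapes of two operands of an elementwise op such as `tf.Add`, ignoring broadcasting, so that any dimension known on either side becomes known in the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

constexpr std::int64_t kDynamic = -1;  // hypothetical marker for a '?' dimension

struct TensorType {
  std::vector<std::int64_t> dims;
  std::string elementType;

  bool hasStaticShape() const {
    for (std::int64_t d : dims)
      if (d == kDynamic) return false;
    return true;
  }

  // Prints in MLIR-like syntax, e.g. "tensor<128x8x?xf32>".
  std::string str() const {
    std::string out = "tensor<";
    for (std::int64_t d : dims)
      out += (d == kDynamic ? std::string("?") : std::to_string(d)) + "x";
    return out + elementType + ">";
  }
};

// Toy shape refinement for an elementwise op whose operands must have the
// same shape (no broadcasting). Returns nullopt on a provable mismatch.
std::optional<TensorType> refineElementwise(const TensorType &a, const TensorType &b) {
  if (a.elementType != b.elementType || a.dims.size() != b.dims.size())
    return std::nullopt;
  TensorType result{a.dims, a.elementType};
  for (std::size_t i = 0; i < a.dims.size(); ++i) {
    std::int64_t x = a.dims[i], y = b.dims[i];
    if (x == kDynamic) result.dims[i] = y;               // take whatever b knows
    else if (y != kDynamic && x != y) return std::nullopt;  // conflicting sizes
  }
  return result;
}
```
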
### Unified Graph Rewriting Infrastructure
This is still a work in progress, but we have a clear path towards a
[general rewriting infrastructure](GenericDAGRewriter.md) for transforming DAG
tiles into other DAG tiles, using a declarative pattern format. DAG-to-DAG
rewriting is a generalized solution for many common compiler optimizations,
lowerings, and other rewrites, and having a common IR enables us to invest in
building a single high-quality implementation.

Declarative pattern rules are preferable to imperative C++ code for a number of
reasons: they are more compact, easier to reason about, can have checkers
written against them, and new tools can be built that inspect and manipulate the
declarative patterns in interesting ways - e.g. applying theorem provers to
them. It will be exciting to see this ecosystem develop as the infrastructure
matures.
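
The following C++17 toy shows declarative DAG rewriting. Rules are plain data: a "from" template and a "to" template with `$`-prefixed variables. One generic matcher applies all of them. This is a hypothetical sketch of the idea only, not MLIR's pattern infrastructure. Its `x - x` rule deliberately ignores the TensorFlow control-dependence subtleties discussed below.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression DAG node. Inside patterns, names starting with '$'
// are pattern variables; elsewhere a name is an op name or a leaf value.
struct Node {
  std::string name;
  std::vector<std::shared_ptr<Node>> args;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr mk(std::string name, std::vector<NodePtr> args = {}) {
  return std::make_shared<Node>(Node{std::move(name), std::move(args)});
}

bool isVar(const NodePtr &n) { return !n->name.empty() && n->name[0] == '$'; }

// A declarative rule: "a DAG shaped like `from` becomes a DAG shaped like `to`".
struct Pattern {
  NodePtr from, to;
};
using Bindings = std::map<std::string, NodePtr>;

// Structural match that binds variables. A variable used twice must bind the
// same node both times (pointer identity, i.e. the same SSA value).
bool match(const NodePtr &pat, const NodePtr &n, Bindings &b) {
  if (isVar(pat)) {
    auto [it, inserted] = b.emplace(pat->name, n);
    return inserted || it->second == n;
  }
  if (pat->name != n->name || pat->args.size() != n->args.size()) return false;
  for (std::size_t i = 0; i < pat->args.size(); ++i)
    if (!match(pat->args[i], n->args[i], b)) return false;
  return true;
}

// Instantiates the `to` template using the bindings from a successful match.
NodePtr build(const NodePtr &tmpl, const Bindings &b) {
  if (isVar(tmpl)) {
    assert(b.count(tmpl->name) && "replacement uses an unbound variable");
    return b.at(tmpl->name);
  }
  std::vector<NodePtr> args;
  for (const NodePtr &a : tmpl->args) args.push_back(build(a, b));
  return mk(tmpl->name, std::move(args));
}

// One bottom-up sweep applying the first matching pattern at each node.
// `memo` keeps shared subexpressions shared after rewriting.
NodePtr rewrite(const NodePtr &n, const std::vector<Pattern> &patterns,
                std::map<const Node *, NodePtr> &memo) {
  if (auto it = memo.find(n.get()); it != memo.end()) return it->second;
  std::vector<NodePtr> args;
  bool changed = false;
  for (const NodePtr &a : n->args) {
    args.push_back(rewrite(a, patterns, memo));
    changed = changed || args.back() != a;
  }
  NodePtr cur = changed ? mk(n->name, std::move(args)) : n;
  for (const Pattern &p : patterns) {
    Bindings b;
    if (match(p.from, cur, b)) {
      cur = build(p.to, b);
      break;
    }
  }
  memo[n.get()] = cur;
  return cur;
}
```

Adding a rewrite then means adding one `Pattern` entry rather than writing new matching code. That is also what makes the rules easy to inspect with other tools.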
### Clarified Semantics for TensorFlow Operations
One of the challenging things about working with TensorFlow is that there are
many invariants and behaviors that need to be preserved and known about when
working with Graphs, and these can be difficult to reason about and lead to
bugs. Things like 'dead values', Switch and Merge nodes, concurrency semantics,
nodes that execute even when passed a dead value, multiple device program
representation, etc. all add complexities that can make it challenging to
reason about whether a transformation or analysis is correct in general. Even
something as simple as constant folding or transforming integer `x-x` into `0`
is non-trivial because you need to consider control dependence edges.

One of our major goals for the TensorFlow dialect of MLIR is to sort out these
situations and upgrade existing TensorFlow graphs to semantics that are easier
to reason about. The solutions to these problems are all still being debated,
but those discussions have already yielded a lot of potential answers:
introducing a `tf_dead_or<x>` type for switch/merge, modeling of TF operations
using futures/async semantics, etc. None of these particular battles are critical
or important for MLIR to succeed (because of its "meta" nature, the abstraction
decisions of any given dialect are up to that dialect), but each one that works
out will make it easier to work with and transform TensorFlow operations. We
expect these issues to get nailed down in the next couple of months when MLIR
effort moves beyond TF Lite / TOCO support. The discussions that are happening
now are super valuable and making progress.
### Ergonomics
A minor-in-theory, but important-in-practice point is that MLIR is designed to
make it easy, memory efficient, and less error prone to transform code than
other systems. `tensorflow::Graph` has implementation issues where the same
information is stored redundantly in different places (which must be manually
kept up to date), has a somewhat unusual representation of certain constructs
(e.g. the function library, which makes it very difficult to add or remove
functions, e.g. during interprocedural transformations), and stores information
in the graph that is used by the executor but isn't necessary for program
transformation.

TensorFlow has made a lot of progress in this area over the years, and there are
lots of ideas about further improvements in the future. We are happy that MLIR
addresses these needs (making it much easier to implement correct program
transformations) today, and we are committed to pushing hard to make it better.
### Compile Time Performance and Memory Use
MLIR has been designed to be memory and compile-time efficient in its algorithms
and data structures, using immutable and uniqued structures, low level
bit-packing, and other well-known techniques to avoid unnecessary heap
allocations, and allow simple and safe multithreaded optimization of MLIR
programs. There are other reasons to believe that the MLIR implementations of
common transformations will be more efficient than the Python and C++
`tensorflow::Graph` implementations of the same things, given the current
implementation details of TensorFlow.
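
One of the techniques mentioned above, uniquing immutable structures, can be sketched as an interning context. Each distinct type is allocated once, is never mutated, and is compared by pointer. The `Context` and `TypeStorage` names in this minimal C++17 sketch are hypothetical. MLIR's `MLIRContext` uniques types and attributes with considerably more machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Immutable, uniqued type storage: once created it is never modified.
struct TypeStorage {
  const std::vector<std::int64_t> dims;
  const std::string elementType;
};

// Hypothetical context that owns every type and hands out exactly one
// instance per distinct (dims, elementType) key.
class Context {
 public:
  const TypeStorage *getTensorType(const std::vector<std::int64_t> &dims,
                                   const std::string &elementType) {
    auto key = std::make_pair(dims, elementType);
    auto it = uniqued_.find(key);
    if (it != uniqued_.end()) return it->second.get();
    auto storage = std::make_unique<TypeStorage>(TypeStorage{dims, elementType});
    const TypeStorage *ptr = storage.get();
    uniqued_.emplace(std::move(key), std::move(storage));
    return ptr;
  }

  std::size_t numTypes() const { return uniqued_.size(); }

 private:
  std::map<std::pair<std::vector<std::int64_t>, std::string>,
           std::unique_ptr<TypeStorage>>
      uniqued_;
};
```

Because each distinct type exists only once, checking two types for equality is a single pointer comparison, and every use of the same type shares one allocation.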

That said, this is very much a theory at this point. When the new implementations
of various subsystems are available, we will see what happens in practice: there
will be no reason to speculate, because we can measure.
## Common Questions and Concerns
Here we address some frequently asked questions and concerns.
### Isn't MLIR a big dependency to take on?
We've heard that at least some people are concerned that MLIR is a "big"
dependency to take on, and could result in large code size. Here are some key
points about MLIR:

1. The entire MLIR codebase is a pretty small C++ code base in absolute terms
   compared to what goes into a modern ML framework.
1. Like LLVM, MLIR is designed as a set of libraries that clients can link in
   or ignore as they wish. For example, the transformations in MLIR are kept
   separate from the core IR abstractions, and dialect-specific code (e.g.
   TensorFlow, TF-Lite, XLA, etc.) is all independently selectable by the build
   system. Clients that don't care about XLA don't link in that code, whether
   they are a TF-Lite system or a client that is completely unrelated to
   TensorFlow.
1. MLIR's only third-party dependency is on LLVM, but it doesn't depend on LLVM
   IR or any other heavy dependency: it just depends on LLVM's support library,
   which provides efficient hash tables and other
   [memory efficient data structures that the STL does not](http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task).
   There have been discussions about splitting this set of libraries out to its
   own subproject in LLVM that the LLVM IR project depends on. This would be
   great for MLIR as well as other LLVM subprojects.
1. TensorFlow and many other frameworks already use LLVM; for them, MLIR would
   not be pulling in an additional dependency at all.

### How does MLIR represent {control flow, concurrency, …} semantics in TensorFlow?
MLIR provides a dialect that is an isomorphic 1-1 mapping between TensorFlow
graphs and MLIR, as well as a pretty complete translator back and forth (the
only known gap is that a few TF_DataType enums aren't handled yet). MLIR is a
"Multi-Level IR", which allows it to represent code at different abstraction
levels, so the ability to faithfully represent TensorFlow code in a completely
backwards-compatible way (even if there are some historical warts!) is critical.

In *addition* to the isomorphic mapping, we are actively working on efforts to
raise the abstraction level for working with TensorFlow graphs in MLIR. Doing so
would make it even easier to write TensorFlow transformations than it is today,
and would provide a path to migrating TF 1.x graphs forward into the TF 2.x
world. For example, because MLIR has an extensible type system, we can directly
model whether it is impossible for a Tensor value to be a "dead" value - similar
to the use of optional types in modern programming languages.
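
The following C++17 sketch makes the optional-type analogy concrete. It models a hypothetical `DeadOr<T>`, in the spirit of the `tf_dead_or<x>` idea mentioned earlier, as `std::optional<T>`, and adds toy `Switch` and `Merge` functions. Only values of type `DeadOr<T>` can be dead, so ordinary ops that take a plain `T` never have to consider deadness. This illustrates the idea only; it is not the TensorFlow dialect's actual design.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical stand-in for a `tf_dead_or<T>`-style type: an empty optional
// is a "dead" value. A plain T, by construction, is never dead.
template <typename T>
using DeadOr = std::optional<T>;

// Toy Switch: forwards `data` to exactly one output and makes the other dead.
// Output order follows TensorFlow's Switch: (output_false, output_true).
template <typename T>
std::pair<DeadOr<T>, DeadOr<T>> Switch(const T &data, bool pred) {
  if (pred) return {std::nullopt, data};
  return {data, std::nullopt};
}

// Toy Merge: forwards whichever input is live; dead only if all inputs are.
template <typename T>
DeadOr<T> Merge(const DeadOr<T> &a, const DeadOr<T> &b) {
  return a ? a : b;
}

// An ordinary op takes a plain value, so it never has to reason about deadness.
inline int AddOne(int x) { return x + 1; }
```
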

These discussions occasionally cause confusion because several issues are being
mixed up into one:

* What are the current semantics of TensorFlow graphs, and what invariants can
  we rely on?
* What should the semantics be in TensorFlow 2.0?
* What do programs rely on in practice, and if it is unfriendly, can we
  migrate it?
* Can we find a way to make it so transforms don't have to worry about the
  complexities of Switch/Merge, by using higher-level control flow
  representations? (tentative answer: yes)
* How should MLIR represent async vs. sync operations, what invariants are
  provided, and how does this dovetail with control flow?
* When is it safe and beneficial to perform optimizations that might reduce
  parallelism?

All of these questions have a "conservative/safe fallback": we can continue
providing exactly the same abstractions that TensorFlow always has. That said,
we are trying hard to level up the representation (taking advantage of the
"Multi-Level" part of MLIR) because doing so will make it much easier to
write analyses and transformations than it currently is in TensorFlow.

### Non Goals
It is important to point out things that MLIR does not aim to do. For example,
there is no runtime component to MLIR: the TensorFlow executor, the TF Lite
FlatBuffer interpreter, or other existing runtimes should be used as-is.

Another non-goal is that MLIR currently doesn't support a stable binary
encoding. We will certainly add this at some point, but existing formats should
be used for serialization and distribution in the meantime.