Several things were suggested in post-submission reviews. In particular, use
pointers in function interfaces instead of references (still use references
internally). Clarify the behavior of the pass in presence of MLFunctions.
PiperOrigin-RevId: 222556851
This CL adds tooling for computing slices as an independent CL.
The first consumer of this analysis will be super-vector materialization in a
followup CL.
In particular, this adds:
1. a getForwardStaticSlice function with documentation, example and a
standalone unit test;
2. a getBackwardStaticSlice function with documentation, example and a
standalone unit test;
3. a getStaticSlice function with documentation, example and a standalone unit
test;
4. a topologicalSort function that is exercised through the getStaticSlice
unit test.
The getXXXStaticSlice functions take an additional root (resp. terminators)
parameter which acts as a boundary that the transitive propagation algorithm
is not allowed to cross.
PiperOrigin-RevId: 222446208
Not having self-contained headers in LLVM is a constant pain. Don't make the
same mistake in mlir. The only interesting change here is moving setSuccessor
to Instructions.cpp, which breaks the cycle between Instructions.h and
BasicBlock.h.
PiperOrigin-RevId: 222440816
cases.
- fix bug in calculating index expressions for DMA buffers in certain cases
(affected tiled loop nests); add more test cases for better coverage.
- introduce an additional optional argument to replaceAllMemRefUsesWith;
additional operands to the index remap AffineMap can now be supplied by the
client.
- FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound,
::composeMap - fix position bug.
- Some clean up and more comments
PiperOrigin-RevId: 222434628
This function pass replaces affine_apply operations in CFG functions with
sequences of primitive arithmetic instructions that form the affine map.
The actual replacement functionality is located in LoweringUtils as a
standalone function operating on an individual affine_apply operation and
inserting the result at the location of the original operation. It is expected
to be useful for other, target-specific lowering passes that may start at
MLFunction level that Deaffinator does not support.
PiperOrigin-RevId: 222406692
Initial restricted implementaiton of the MLIR to LLVM IR translation.
Introduce a new flow into the mlir-translate tool taking an MLIR module
containing CFG functions only and producing and LLVM IR module. The MLIR
features supported by the translator are as follows:
- primitive and function types;
- integer constants;
- cfg and ext functions with 0 or 1 return values;
- calls to these functions;
- basic block conversion translation of arguments to phi nodes;
- conversion between arguments of the first basic block and function arguments;
- (conditional) branches;
- integer addition and comparison operations.
Are NOT supported:
- vector and tensor types and operations on them;
- memrefs and operations on them;
- allocations;
- functions returning multiple values;
- LLVM Module triple and data layout (index type is hardcoded to i64).
Create a new MLIR library and place it under lib/Target/LLVMIR. The "Target"
library group is similar to the one present in LLVM and is intended to contain
all future public MLIR translation targets.
The general flow of MLIR to LLVM IR convresion will include several lowering
and simplification passes on the MLIR itself in order to make the translation
as simple as possible. In particular, ML functions should be transformed to
CFG functions by the recently introduced pass, operations on structured types
will be converted to sequences of operations on primitive types, complex
operations such as affine_apply will be converted into sequence of primitive
operations, primitive operations themselves may eventually be converted to an
LLVM dialect that uses LLVM-like operations.
Introduce the first translation test so that further changes make sure the
basic translation functionality is not broken.
PiperOrigin-RevId: 222400112
This has been a long-standing TODO in the build system. Now that we need to
share the non-inlined implementation of file utilities for translators, create
a separate library for support functionality. Move Support/* headers to the
new library in the build system.
PiperOrigin-RevId: 222398880
Translations performed by mlir-translate only have MLIR on one end.
MLIR-to-MLIR conversions (including dialect changes) should be treated as
passes and run by mlir-opt. Individual translations should not care about
reading or writing MLIR and should work on in-memory representation of MLIR
modules instead. Split the TranslateFunction interface and the translate
registry into two parts: "from MLIR" and "to MLIR".
Update mlir-translate to handle both registries together by wrapping
translation functions into source-to-source convresions. Remove MLIR parsing
and writing from individual translations and make them operate on Modules
instead. This removes the need for individual translators to include
tools/mlir-translate/mlir-translate.h, which can now be safely removed.
Remove mlir-to-mlir translation that only existed as a registration example and
use mlir-opt instead for tests.
PiperOrigin-RevId: 222398707
The mlir-translate tool is expected to discover individual translations at link
time. These translations must register themselves and may need the utilities
that are currently defined in mlir-translate.cpp for their entry point
functions. Since mlir-translate is linking against individual translations,
the translations cannot link against mlir-translate themselves. Extract out
the utilities into a separate "Translation" library to avoid the potential
dependency cycle. Individual translations link to that library to access
TranslateRegistration. The mlir-translate tool links to individual translations
and to the "Translation" library because it needs the utilities as well.
The main header of the new library is located in include/mlir/Translation.h to
make it easily accessible by translators. The rationale for putting it to
include/mlir rather than to one of its subdirectories is that its purpose is
similar to that of include/mlir/Pass.h so it makes sense to put them at the
same level.
PiperOrigin-RevId: 222398617
Also, added iterators for VariadicResults class.
TESTED with unit tests
TODOs:
- Handle non-bool condition results (similar to the IfOp)
- Use PatternRewriter
PiperOrigin-RevId: 222340376
Existing default visitation function for dimension and symbols were called
"visitAffineDimExpr" and "visitAffineSymbolExpr". However, generic CRTP-based
visit and walk methods were calling "visitDimExpr" and "visitSymbolExpr",
respectively, on derived classes. This has not been discovered before because
all existing affine expression visitors (re)define functions for dimensions and
symbols. Change the names of the default empty visitation functions to the
latter form.
PiperOrigin-RevId: 222312114
This reverts the previous method which needs to create a new dialect with the
constant fold hook from TensorFlow. This new method uses a function object in
dialect to store the constant fold hook. Once a hook is registered to the
dialect, this function object will be assigned when the dialect is added to the
MLIRContext.
For the operations which are not registered, a new method getRegisteredDialects
is added to the MLIRContext to query the dialects which matches their op name
prefixes.
PiperOrigin-RevId: 222310149
This CL refactors a few things in Vectorize.cpp:
1. a clear distinction is made between:
a. the LoadOp are the roots of vectorization and must be vectorized
eagerly and propagate their value; and
b. the StoreOp which are the terminals of vectorization and must be
vectorized late (i.e. they do not produce values that need to be
propagated).
2. the StoreOp must be vectorized late because in general it can store a value
that is not reachable from the subset of loads defined in the
current pattern. One trivial such case is storing a constant defined at the
top-level of the MLFunction and that needs to be turned into a splat.
3. a description of the algorithm is given;
4. the implementation matches the algorithm;
5. the last example is made parametric, in practice it will fully rely on the
implementation of vector_transfer_read/write which will handle boundary
conditions and padding. This will happen by lowering to a lower-level
abstraction either:
a. directly in MLIR (whether DMA or just loops or any async tasks in the
future) (whiteboxing);
b. in LLO/LLVM-IR/whatever blackbox library call/ search + swizzle inventor
one may want to use;
c. a partial mix of a. and b. (grey-boxing)
5. minor cleanups are applied;
6. mistakenly disabled unit tests are re-enabled (oopsie).
With this CL, this MLIR snippet:
```
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
%A = alloc (%M, %N) : memref<?x?xf32>
%B = alloc (%M, %N) : memref<?x?xf32>
%C = alloc (%M, %N) : memref<?x?xf32>
%f1 = constant 1.0 : f32
%f2 = constant 2.0 : f32
for %i0 = 0 to %M {
for %i1 = 0 to %N {
// non-scoped %f1
store %f1, %A[%i0, %i1] : memref<?x?xf32>
}
}
for %i4 = 0 to %M {
for %i5 = 0 to %N {
%a5 = load %A[%i4, %i5] : memref<?x?xf32>
%b5 = load %B[%i4, %i5] : memref<?x?xf32>
%s5 = addf %a5, %b5 : f32
// non-scoped %f1
%s6 = addf %s5, %f1 : f32
store %s6, %C[%i4, %i5] : memref<?x?xf32>
}
}
return %C : memref<?x?xf32>
}
```
vectorized with these arguments:
```
-vectorize -virtual-vector-size 256 --test-fastest-varying=0
```
vectorization produces this standard innermost-loop vectorized code:
```
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
%0 = alloc(%arg0, %arg1) : memref<?x?xf32>
%1 = alloc(%arg0, %arg1) : memref<?x?xf32>
%2 = alloc(%arg0, %arg1) : memref<?x?xf32>
%cst = constant 1.000000e+00 : f32
%cst_0 = constant 2.000000e+00 : f32
for %i0 = 0 to %arg0 {
for %i1 = 0 to %arg1 step 256 {
%cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
"vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
}
}
for %i2 = 0 to %arg0 {
for %i3 = 0 to %arg1 step 256 {
%3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
%4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
%5 = addf %3, %4 : vector<256xf32>
%cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
%6 = addf %5, %cst_2 : vector<256xf32>
"vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
}
}
return %2 : memref<?x?xf32>
}
```
Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for now.
PiperOrigin-RevId: 222280209
Added TF::Conv2D op and TFL::Conv2D op, and converted TF::Conv2D to
TFL::Conv2D, which need to address the operand numberr mismatch
and attribute conversion.
PiperOrigin-RevId: 222277554
In the general case, loop bounds can be expressed as affine maps of the outer
loop iterators and function arguments. Relax the check for loop bounds to be
known integer constants and also accept one-dimensional affine bounds in
ConvertToCFG ForStmt lowering. Emit affine_apply operations for both the upper
and the lower bound. The semantics of MLFunctions guarantees that both bounds
can be computed before the loop starts iterating. Constant bounds are merely a
short-hand notation for zero-dimensional affine maps and get supported
transparently.
Multidimensional affine bounds are not yet supported because the target IR
dialect lacks min/max operations necessary to implement the corresponding
semantics.
PiperOrigin-RevId: 222275801
op-stats pass currently returns the number of occurrences of different operations in a Module. Useful for verifying transformation properties (e.g., 3 ops of specific dialect, 0 of another), but probably not useful outside of that so keeping it local to mlir-opt. This does not consider op attributes when counting.
PiperOrigin-RevId: 222259727
This CL adds some vector support in prevision of the upcoming vector
materialization pass. In particular this CL adds 2 functions to:
1. compute the multiplicity of a subvector shape in a supervector shape;
2. help match operations on strict super-vectors. This is defined for a given
subvector shape as an operation that manipulates a vector type that is an
integral multiple of the subtype, with multiplicity at least 2.
This CL also adds a TestUtil pass where we can dump arbitrary testing of
functions and analysis that operate at a much smaller granularity than a pass
(e.g. an analysis for which it is convenient to write a bit of artificial MLIR
and write some custom test). This is in order to keep using Filecheck for
things that essentially look and feel like C++ unit tests.
PiperOrigin-RevId: 222250910
This does create an inconsistency between the print formats (e.g., attributes are normally before operands) but fixes an invalid parsing & keeps constant uniform wrt itself (function or int attributes have type at same place). And specifying the specific type for a int/float attribute might get revised shortly.
Also add test to verify that output printed can be parsed again.
PiperOrigin-RevId: 221923893
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.
- add support for getMemRefRegion symbolic in outer loops - hence support for
DMAs symbolic in outer surrounding loops.
- add DMA generation support for outgoing DMAs (store op's to lower memory
space); extend getMemoryRegion to store op's. -memref-bound-check now works
with store op's as well.
- fix dma-generate (references to the old memref in the dma_start op were also
being replaced with the new buffer); we need replace all memref uses to work
only on a subset of the uses - add a new optional argument for
replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional
'operation' argument to serve as a filter - if provided, only those uses that
are dominated by the filter are replaced.
- Add missing print for attributes for dma_start, dma_wait op's.
- update the FlatAffineConstraints API
PiperOrigin-RevId: 221889223
This would also make the CallOp and ExtractElementOp invocations from eliminateIfOp function always valid and removes the need for error handling.
Also, verify TensorFlowOp trait.
PiperOrigin-RevId: 221737192
We do some limited renaming here but define an alias for OperationInst so that a follow up cl can solely perform the large scale renaming.
PiperOrigin-RevId: 221726963
* Optionally attach the type of integer and floating point attributes to the attributes, this allows restricting a int/float to specific width.
- Currently this allows suffixing int/float constant with type [this might be revised in future].
- Default to i64 and f32 if not specified.
* For index types the APInt width used is 64.
* Change callers to request a specific attribute type.
* Store iN type with APInt of width N.
* This change does not handle the folding of constants of different types (e.g., doing int type promotions to support constant folding i3 and i32), and instead restricts the constant folding to only operate on the same types.
PiperOrigin-RevId: 221722699
Unranked tensors used to return an empty list of dimensions as their shape. This is confusing since an empty list of dimensions is also returned for 0-D tensors. In particular, hasStaticShape() method used to check if any of the dimensions are -1, which held for unranked tensors even though they don't have static shape.
PiperOrigin-RevId: 221571138
Array attributes can nested and function attributes can appear anywhere at that
level. They should be remapped to point to the generated CFGFunction after
ML-to-CFG conversion, similarly to plain function attributes. Extract the
nested attribute remapping functionality from the Parser to Utils. Extract out
the remapping function for individual Functions from the module remapping
function. Use these new functions in the ML-to-CFG conversion pass and in the
parser.
PiperOrigin-RevId: 221510997
These functions are declared in Transforms/LoopUtils.h (included to the
Transforms/Utils library) but were defined in the loop unrolling pass in
Transforms/LoopUnroll.cpp. As a result, targets depending only on
TransformUtils library but not on Transforms could get link errors. Move the
definitions to Transforms/Utils/LoopUtils.cpp where they should actually live.
This does not modify any code.
PiperOrigin-RevId: 221508882