This CL starts implementing a Linalg dialect with the objective of supporting
optimizing compilation of loops and library calls for a subset of common linear
algebra operations.
This CL starts by simply adding a linalg.range type and an operation with the
proper roundtripping test.
--
PiperOrigin-RevId: 244189468
This is making up for some differences in standard library and linker flags.
It also get rid of the requirement to build with RTTI.
--
PiperOrigin-RevId: 241348845
* print-ir-before=(comma-separated-pass-list)
- Print the IR before each of the passes provided within the pass list.
* print-ir-before-all
- Print the IR before every pass in the pipeline.
* print-ir-after=(comma-separated-pass-list)
- Print the IR after each of the passes provided within the pass list.
* print-ir-after-all
- Print the IR after every pass in the pipeline.
* print-ir-module-scope
- Always print the Module IR, even for non module passes.
PiperOrigin-RevId: 238523649
Below shows the output for an example mlir-opt command line.
mlir-opt foo.mlir -verify-each=false -cse -canonicalize -cse -cse -pass-timing
list view (-pass-timing-display=list):
* In this mode the results are displayed in a list sorted by total time; with each pass/analysis instance aggregated into one unique result. This mode is similar to the output of 'time-passes' in llvm-opt.
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0097 seconds (0.0096 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0051 ( 58.3%) 0.0001 ( 12.2%) 0.0052 ( 53.8%) 0.0052 ( 53.8%) Canonicalizer
0.0025 ( 29.1%) 0.0005 ( 58.2%) 0.0031 ( 31.9%) 0.0031 ( 32.0%) CSE
0.0011 ( 12.6%) 0.0003 ( 29.7%) 0.0014 ( 14.3%) 0.0014 ( 14.2%) DominanceInfo
0.0087 (100.0%) 0.0009 (100.0%) 0.0097 (100.0%) 0.0096 (100.0%) Total
pipeline view (-pass-timing-display=pipeline):
* In this mode the results are displayed in a nested pipeline view that mirrors the internal pass pipeline that is being executed in the pass manager. This view is useful for understanding specifically which parts of the pipeline are taking the most time, and can also be used to identify when analyses are being invalidated and recomputed.
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0082 seconds (0.0081 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0042 (100.0%) 0.0039 (100.0%) 0.0082 (100.0%) 0.0081 (100.0%) Function Pipeline
0.0005 ( 11.6%) 0.0008 ( 21.1%) 0.0013 ( 16.1%) 0.0013 ( 16.2%) CSE
0.0002 ( 5.0%) 0.0004 ( 9.3%) 0.0006 ( 7.0%) 0.0006 ( 7.0%) (A) DominanceInfo
0.0026 ( 61.8%) 0.0018 ( 45.6%) 0.0044 ( 54.0%) 0.0044 ( 54.1%) Canonicalizer
0.0005 ( 11.7%) 0.0005 ( 13.0%) 0.0010 ( 12.3%) 0.0010 ( 12.4%) CSE
0.0003 ( 6.1%) 0.0003 ( 8.3%) 0.0006 ( 7.2%) 0.0006 ( 7.1%) (A) DominanceInfo
0.0002 ( 3.8%) 0.0001 ( 2.8%) 0.0003 ( 3.3%) 0.0003 ( 3.3%) CSE
0.0042 (100.0%) 0.0039 (100.0%) 0.0082 (100.0%) 0.0081 (100.0%) Total
PiperOrigin-RevId: 237825367
Multiple binaries have the needs to open input files. Use this function
to de-duplicate the code.
Also changed openOutputFile() to return errors using std::string since
it is a library call and accessing I/O in library call is not friendly.
PiperOrigin-RevId: 228878221
StmtSuccessorIterator/StmtSuccessorIterator, and rename and move the
CFGFunctionViewGraph pass to ViewFunctionGraph.
This is step 13/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227069438
This *only* changes the internal data structures, it does not affect the user visible syntax or structure of MLIR code. Function gets new "isCFG()" sorts of predicates as a transitional measure.
This patch is gross in a number of ways, largely in an effort to reduce the amount of mechanical churn in one go. It introduces a bunch of using decls to keep the old names alive for now, and a bunch of stuff needs to be renamed.
This is step 10/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227044402
Move PrintOpStatsPass out of tools and to other passes (moved to Analysis as it
doesn't modify the program but it is different than the other analysis passes
as it is only consumer at present is the user).
PiperOrigin-RevId: 227018996
op-stats pass currently returns the number of occurrences of different operations in a Module. Useful for verifying transformation properties (e.g., 3 ops of specific dialect, 0 of another), but probably not useful outside of that so keeping it local to mlir-opt. This does not consider op attributes when counting.
PiperOrigin-RevId: 222259727
This is to allow usage of comment blocks along with splits in test cases.
For example, "Function Control Flow Lowering" comment block in
raise-control-flow.mlir
TESTED with existing unit tests
PiperOrigin-RevId: 221214451
Value type abstraction for locations differ from others in that a Location can NOT be null. NOTE: dyn_cast returns an Optional<T>.
PiperOrigin-RevId: 220682078
Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage.
Change build targets to alwayslink to enforce registration.
PiperOrigin-RevId: 220390178
- simple perfectly nested band tiling with fixed tile sizes.
- only the hyper-rectangular case is handled, with other limitations of
getIndexSet applying (constant loop bounds, etc.); once
the latter utility is extended, tiled code generation should become more
general.
- Add FlatAffineConstraints::isHyperRectangular()
PiperOrigin-RevId: 220324933
Start of TFLite legalizer pass. Currently focussed on macro expanding ops, limited to what is registered directly in a separate pass (this should instead be a general pass), no querying of what gets produced, the matching is string based instead of using the ops proper (the matching TF ops should be defined) etc. This is a step to enable prototyping. In addition to the above shortcomings, the legalizer is very verbose in this form and should instead be driven by autogenerated patterns (same is true for the op builders too). But this starts from the explicit form and extracting out commonality in follow up.
Add definition for tfl.relu for basic selection of fused relu add.
PiperOrigin-RevId: 220287087
Adds equality constraints to dependence constraint system for accesses using dims/symbols where the defining operation of the dim/symbol is a constant.
PiperOrigin-RevId: 219814740
- Builds access functions and iterations domains for each access.
- Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds.
- Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses.
- Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops.
- The LoopFusion pass will be extended in subsequent CLs.
PiperOrigin-RevId: 219630898
Introduce analysis to check memref accesses (in MLFunctions) for out of bound
ones. It works as follows:
$ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
%y = load %B[%idy] : memref<128 x i32>
^
/tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
%y = load %B[%idy] : memref<128 x i32>
^
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0 * 128 - d1)
mlfunc @test() {
%0 = alloc() : memref<9x9xi32>
%1 = alloc() : memref<128xi32>
for %i0 = -1 to 9 {
for %i1 = -1 to 9 {
%2 = affine_apply #map0(%i0, %i1)
%3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32>
%4 = affine_apply #map1(%i0, %i1)
%5 = load %1[%4] : memref<128xi32>
}
}
return
}
- Improves productivity while manually / semi-automatically developing MLIR for
testing / prototyping; also provides an indirect way to catch errors in
transformations.
- This pass is an easy way to test the underlying affine analysis
machinery including low level routines.
Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256.
While on this:
- create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/
- fix a bug in AffineAnalysis.cpp::toAffineExpr
TODO: extend to non-constant loop bounds (straightforward). Will transparently
work for all accesses once floordiv, mod, ceildiv are supported in the
AffineMap -> FlatAffineConstraints conversion.
PiperOrigin-RevId: 219397961
- Introduce Fourier-Motzkin variable elimination to eliminate a dimension from
a system of linear equalities/inequalities. Update isEmpty to use this.
Since FM is only exact on rational/real spaces, an emptiness check based on
this is guaranteed to be exact whenever it says the underlying set is empty;
if it says, it's not empty, there may still be no integer points in it.
Also, supports a version that computes "dark shadows".
- Test this by checking for "always false" conditionals in if statements.
- Unique IntegerSet's that are small (few constraints, few variables). This
basically means the canonical empty set and other small sets that are
likely commonly used get uniqued; allows checking for the canonical empty set
by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint size
for uniqui'ing.
- rename simplify-affine-expr -> simplify-affine-structures
Other cleanup
- IntegerSet::numConstraints, AffineMap::numResults are no longer needed;
remove them.
- add copy assignment operators for AffineMap, IntegerSet.
- rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet
- Misc cleanup for FlatAffineConstraints API
PiperOrigin-RevId: 218690456
This CL implements a very simple loop vectorization **test** and the basic
infrastructure to support it.
The test simply consists in:
1. matching the loops in the MLFunction and all the Load/Store operations
nested under the loop;
2. testing whether all the Load/Store are contiguous along the innermost
memory dimension along that particular loop. If any reference is
non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then
the loop is not-vectorizable.
The simple test above can gradually be extended with more interesting
behaviors to account for the fact that a layout permutation may exist that
enables contiguity etc. All these will come in due time but it is worthwhile
noting that the test already supports detection of outer-vetorizable loops.
In implementing this test, I also added a recursive MLFunctionMatcher and some
sugar that can capture patterns
such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating
on the matched IR structures. For now it just uses in order traversal but
post-order DFS will be useful in the future once IR rewrites start occuring.
One may note that the memory management design decision follows a different
pattern from MLIR. After evaluating different designs and how they quickly
increase cognitive overhead, I decided to opt for the simplest solution in my
view: a class-wide (threadsafe) RAII context.
This way, a pass that needs MLFunctionMatcher can just have its own locally
scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed.
If passes are expected to have a longer lifetime, then the contexts can easily
be scoped inside the runOnMLFunction call and storage lifetime reduced.
Lastly, whatever the scope of threading (module, function, pass), this is
expected to also be future-proof wrt concurrency (but this is a detail atm).
PiperOrigin-RevId: 217622889
out canonicalization pass to drive it, and a simple (x-x) === 0 pattern match
as a test case.
There is a tremendous number of improvements that need to land, and the
matcher/rewriter and patterns will be split out of this file, but this is a
starting point.
PiperOrigin-RevId: 216788604
Add target independent standard DMA ops: dma.start, dma.wait. Update pipeline
data transfer to use these to detect DMA ops.
While on this
- return failure from mlir-opt::performActions if a pass generates invalid output
- improve error message for verify 'n' operand traits
PiperOrigin-RevId: 216429885
mode. We even diagnose mistakes nicely (aside from the a/an vowel confusion
which isn't worth worrying about):
test/IR/invalid.mlir split at line tensorflow/mlir#399:8:34: error: 'note' diagnostic emitted when expecting a 'error'
%x = "bar"() : () -> i32 // expected-error {{operand defined here}}
^
PiperOrigin-RevId: 214773208
Super thin slice that can convert a MLIR program (with addfs) to MLIR HLO dialect. Add this as translations to mlir-translate. Also add hlo::AddOp op and HLO op registration.
PiperOrigin-RevId: 214480409
Instead of linking in different initializeMLIRContext functions, add a registry mechanism and function to initialize all registered ops in a given MLIRContext. Initialize all registered ops along with the StandardOps when constructing a MLIRContext.
PiperOrigin-RevId: 214073842
optimization pass:
- Give the ability for operations to implement a constantFold hook (a simple
one for single-result ops as well as general support for multi-result ops).
- Implement folding support for constant and addf.
- Implement support in AbstractOperation and Operation to make this usable by
clients.
- Implement a very simple constant folding pass that does top down folding on
CFG and ML functions, with a testcase that exercises all the above stuff.
Random cleanups:
- Improve the build APIs for ConstantOp.
- Stop passing "-o -" to mlir-opt in the testsuite, since that is the default.
PiperOrigin-RevId: 213749809
- Compress the identifier/kind of a Function into a single word.
- Eliminate otherFailure from verifier now that we always have a location
- Eliminate the error string from the verifier now that we always have
locations.
- Simplify the parser's handling of fn forward references, using the location
tracked by the function.
PiperOrigin-RevId: 211985101
terminators. Improve mlir-opt to print better location info in the split-files
case.
Before:
error: unexpected error: branch has 2 operands, but target block has 1
br bb1(%0tensorflow/mlir#1, %0tensorflow/mlir#0 : i17, i1)
^
after:
invalid.mlir split at line tensorflow/mlir#305:6:3: error: unexpected error: branch has 2 operands, but target block has 1
br bb1(%0tensorflow/mlir#1, %0tensorflow/mlir#0 : i17, i1)
^
It still isn't optimal (it would be better to have just the original file and
line number but is a step forward, and doing the optimal thing would be a lot
more complicated.
PiperOrigin-RevId: 211917067
inserting shape_casts as necessary.
Along the way:
- Add some missing accessors to the AtLeastNOperands trait.
- Implement shape_cast / ShapeCastOp standard op.
- Improve handling of errors in mlir-opt, making it easier to understand
errors when invalid IR is rejected by the verifier.
PiperOrigin-RevId: 211897877
- Make the tf-lower-control flow handle error cases better. Add a testcase
that (currently) fails due to type mismatches.
- Factor more code in the verifier for basic block argument checking, and
check more invariants.
- Fix a crasher in the asmprinter on null instructions (which only occurs on
invalid code).
- Fix a bug handling conditional branches with no block operands, it would
access &operands[0] instead of using operands.data().
- Enhance the mlir-opt driver to use the verifier() in a non-crashing mode,
allowing issues to be reported as diagnostics.
PiperOrigin-RevId: 211818291
Enable using GraphWriter to dump graphviz in debug mode (kept to debug builds completely as this is only for debugging). Add option to mlir-opt to print CFGFunction after every transform in debug mode.
PiperOrigin-RevId: 211578699
- Add a new -verify mode to the mlir-opt tool that allows writing test cases
for optimization and other passes that produce diagnostics.
- Refactor existing the -check-parser-errors flag to mlir-opt into a new
-split-input-file option which is orthogonal to -verify.
- Eliminate the special error hook the parser maintained and use the standard
MLIRContext's one instead.
- Enhance the default MLIRContext error reporter to print file/line/col of
errors when it is available.
- Add new createChecked() methods to the builder that create ops and invoke
the verify hook on them, use this to detected unhandled code in the
RaiseControlFlow pass.
- Teach mlir-opt about expected-error @+, it previously only worked with @-
PiperOrigin-RevId: 211305770
Outside of IR/
- simplify a MutableAffineMap by flattening the affine expressions
- add a simplify affine expression pass that uses this analysis
- update the FlatAffineConstraints API (to be used in the next CL)
In IR:
- add isMultipleOf and getKnownGCD for AffineExpr, and make the in-IR
simplication of simplifyMod simpler and more powerful.
- rename the AffineExpr visitor methods to distinguish b/w visiting and
walking, and to simplify API names based on context.
The next CL will use some of these for the loop unrolling/unroll-jam to make
the detection for the need of cleanup loop powerful/non-trivial.
A future CL will finally move this simplification to FlatAffineConstraints to
make it more powerful. For eg., currently, even if a mod expr appearing in a
part of the expression tree can't be simplified, the whole thing won't be
simplified.
PiperOrigin-RevId: 211012256