llvm-project

Commit Graph

Author	SHA1	Message	Date
Uday Bondhugula	23ddd577ef	Complete migration to exclusive upper bound cl/220448963 had missed a part of the updates. - while on this, clean up some of the test cases to use ops' custom forms. PiperOrigin-RevId: 220675303	2019-03-29 13:52:17 -07:00
Nicolas Vasilache	cde8248753	[MLIR] Make upper bound implementation exclusive This CL implement exclusive upper bound behavior as per b/116854378. A followup CL will update the semantics of the for loop. PiperOrigin-RevId: 220448963	2019-03-29 13:49:49 -07:00
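With the exclusive upper bound described in these two commits, `for %i = lb to ub step s` visits lb, lb + s, ... while staying strictly below ub, so its trip count is ceil((ub - lb) / s), clamped at zero. A minimal Python sketch, assuming constant bounds and a positive step; `trip_count` and `iterations` are hypothetical helpers, not MLIR APIs:

```python
def trip_count(lb: int, ub: int, step: int = 1) -> int:
    """Iterations of `for i = lb to ub step step` with an exclusive upper bound."""
    assert step > 0, "this sketch assumes a positive step"
    # Ceiling division of (ub - lb) by step, clamped at zero for empty loops.
    return max(0, -((lb - ub) // step))


def iterations(lb: int, ub: int, step: int = 1) -> list:
    # Python's range() already treats its stop value as exclusive.
    return list(range(lb, ub, step))
```

Compared with the earlier inclusive semantics, a loop now runs one fewer iteration exactly when ub - lb is a non-negative multiple of the step; for example `0 to 9 step 3` no longer executes iteration 9.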
Uday Bondhugula	6cd5d5c544	Introduce loop tiling code generation (hyper-rectangular case) - simple perfectly nested band tiling with fixed tile sizes. - only the hyper-rectangular case is handled, with other limitations of getIndexSet applying (constant loop bounds, etc.); once the latter utility is extended, tiled code generation should become more general. - Add FlatAffineConstraints::isHyperRectangular() PiperOrigin-RevId: 220324933	2019-03-29 13:49:05 -07:00
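The hyper-rectangular case can be pictured as strip-mining each loop of the band by its tile size and moving the tile-space loops outward. The Python sketch below only enumerates the resulting iteration order for constant bounds; it is illustrative rather than the commit's code generator, and clamping partial tiles with `min(...)` is an assumption made here:

```python
from itertools import product


def tiled_iterations(bounds, tile_sizes):
    """Yield index tuples of a hyper-rectangular loop band in tiled order.

    bounds: (lb, ub) per loop, outermost first, upper bounds exclusive.
    tile_sizes: one positive constant tile size per loop.
    """
    # Tile-space loops: step over tile origins using the tile size as stride.
    origins = [range(lb, ub, t) for (lb, ub), t in zip(bounds, tile_sizes)]
    for origin in product(*origins):
        # Intra-tile loops: clamp with min() so partial tiles stay in bounds.
        intra = [range(o, min(o + t, ub))
                 for o, t, (_, ub) in zip(origin, tile_sizes, bounds)]
        yield from product(*intra)
```

Every original iteration is visited exactly once; only the order changes, which is why tiling needs a legality check in general.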
MLIR Team	239e328913	Adds MemRefDependenceCheck analysis pass, plus multiple dependence check tests. Adds equality constraints to dependence constraint system for accesses using dims/symbols where the defining operation of the dim/symbol is a constant. PiperOrigin-RevId: 219814740	2019-03-29 13:48:05 -07:00
Uday Bondhugula	74c62c8ce0	Complete memref bound checker for arbitrary affine expressions. Handle local variables from mod's and div's when converting to flat form. - propagate mod, floordiv, ceildiv / local variables constraint information when flattening affine expressions and converting them into flat affine constraints; resolve multiple TODOs. - enables memref bound checker to work with arbitrary affine expressions - update FlatAffineConstraints API with several new methods - test/exercise functionality mostly through -memref-bound-check - other analyses such as dependence tests, etc. should now be able to work in the presence of any affine composition of add, mul, floor, ceil, mod. PiperOrigin-RevId: 219711806	2019-03-29 13:47:29 -07:00
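One common way to flatten `e floordiv c` (with a positive constant c), consistent with the "local variables" mentioned above, is to introduce a local q constrained by c*q <= e <= c*q + c - 1; `e mod c` then becomes the linear term e - c*q. A small Python check of that encoding, with hypothetical helper names. Python's `//` and `%` round toward negative infinity for a positive divisor, which matches the affine floordiv/mod semantics:

```python
def satisfies_floordiv_constraints(e: int, c: int, q: int) -> bool:
    """The two inequalities attached to a local q standing in for `e floordiv c`."""
    assert c > 0, "the encoding assumes a positive constant divisor"
    return c * q <= e <= c * q + c - 1


def flattened_mod(e: int, c: int, q: int) -> int:
    # `e mod c` rewritten as a linear expression over e and the local q.
    return e - c * q
```

Because the inequalities pin q to a single integer, the flat system stays exact: for any e, the only admissible q is `e // c`, and `flattened_mod` then equals `e % c`.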
MLIR Team	f28e4df666	Adds a dependence check to test whether two accesses to the same memref access the same element. - Builds access functions and iteration domains for each access. - Builds a dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds. - Runs elimination on the dependence polyhedron to test whether any dependence exists between the accesses. - Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops. - The LoopFusion pass will be extended in subsequent CLs. PiperOrigin-RevId: 219630898	2019-03-29 13:47:13 -07:00
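The pass answers the dependence question symbolically, by building a dependence polyhedron and testing it for emptiness. For tiny constant loop bounds the same question can be answered by brute force, which makes a handy oracle when cross-checking the symbolic test. A hypothetical Python sketch (enumeration only, not the polyhedral method):

```python
from itertools import product


def accesses_may_overlap(src_index, src_bounds, dst_index, dst_bounds):
    """Brute-force check: can two affine accesses touch the same memref element?

    src_index / dst_index map a tuple of loop IVs to a tuple of memref indices.
    *_bounds are lists of (lb, ub) pairs with exclusive upper bounds.
    """
    src_iters = product(*(range(lb, ub) for lb, ub in src_bounds))
    touched = {src_index(ivs) for ivs in src_iters}
    dst_iters = product(*(range(lb, ub) for lb, ub in dst_bounds))
    return any(dst_index(ivs) in touched for ivs in dst_iters)
```

For example, `A[2*i]` and `A[2*j + 1]` never overlap, which the GCD test also proves without enumeration.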
Nicolas Vasilache	21638dcda9	[MLIR] Extend vectorization to 2+-D patterns This CL adds support for vectorization using more interesting 2-D and 3-D patterns. Note in particular the fact that we match some pretty complex imperfectly nested 2-D patterns with a quite minimal change to the implementation: we just add a bit of recursion to traverse the matched patterns and actually vectorize the loops. For instance, vectorizing the following loop by 128: ``` for %i3 = 0 to %0 { %7 = affine_apply (d0) -> (d0)(%i3) %8 = load %arg0[%c0_0, %7] : memref<?x?xf32> } ``` Currently generates: ``` #map0 = ()[s0] -> (s0 + 127) #map1 = (d0) -> (d0) for %i3 = 0 to #map0()[%0] step 128 { %9 = affine_apply #map1(%i3) %10 = alloc() : memref<1xvector<128xf32>> %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) : (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) -> (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) %12 = load %10[%c0] : memref<1xvector<128xf32>> } ``` The above is subject to evolution. PiperOrigin-RevId: 219629745	2019-03-29 13:46:58 -07:00
Uday Bondhugula	8201e19e3d	Introduce memref bound checking. Introduce analysis to check memref accesses (in MLFunctions) for out-of-bound ones. It works as follows: $ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension #1 %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension #1 %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension #2 %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension #2 %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension #1 %y = load %B[%idy] : memref<128 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension #1 %y = load %B[%idy] : memref<128 x i32> ^ #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0 * 128 - d1) mlfunc @test() { %0 = alloc() : memref<9x9xi32> %1 = alloc() : memref<128xi32> for %i0 = -1 to 9 { for %i1 = -1 to 9 { %2 = affine_apply #map0(%i0, %i1) %3 = load %0[%2#0, %2#1] : memref<9x9xi32> %4 = affine_apply #map1(%i0, %i1) %5 = load %1[%4] : memref<128xi32> } } return } - Improves productivity while manually / semi-automatically developing MLIR for testing / prototyping; also provides an indirect way to catch errors in transformations. - This pass is an easy way to test the underlying affine analysis machinery including low-level routines. 
Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256. While on this: - create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/ - fix a bug in AffineAnalysis.cpp::toAffineExpr TODO: extend to non-constant loop bounds (straightforward). Will transparently work for all accesses once floordiv, mod, ceildiv are supported in the AffineMap -> FlatAffineConstraints conversion. PiperOrigin-RevId: 219397961	2019-03-29 13:46:08 -07:00
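For constant loop bounds and an index that is affine in the loop IVs, the extreme index values over the hyper-rectangular iteration space are reached at interval endpoints, so a per-dimension check reduces to interval arithmetic. A simplified Python sketch, not the pass itself (which works on FlatAffineConstraints); the function name and argument layout are hypothetical, and IV ranges are given as closed intervals:

```python
def check_access_bounds(coeffs, const, iv_ranges, dim_size):
    """Return (may_underflow, may_overflow) for index = sum(coeffs[k] * iv_k) + const.

    iv_ranges: closed integer intervals (lo, hi), one per loop IV, assumed
    independent (constant loop bounds, hyper-rectangular iteration space).
    """
    lo = hi = const
    for a, (iv_lo, iv_hi) in zip(coeffs, iv_ranges):
        # A linear term attains its extremes at the interval endpoints.
        lo += min(a * iv_lo, a * iv_hi)
        hi += max(a * iv_lo, a * iv_hi)
    return lo < 0, hi > dim_size - 1
```

For the `d0 * 128 - d1` access into `memref<128xi32>` with both IVs in the closed range [-1, 9], the index range is [-137, 1153], so both a lower-bound and an upper-bound violation are flagged, as in the sample output.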
Nicolas Vasilache	af7f56fdf8	[MLIR] Implement 1-D vectorization for fastest varying load/stores This CL is the first in a series that implements early vectorization of increasingly complex patterns. In particular, early vectorization will support arbitrary loop nesting patterns (both perfectly and imperfectly nested), at arbitrary depths in the loop tree. This first CL builds the minimal support for applying 1-D patterns. It relies on an unaligned load/store op abstraction that can be implemented differently on different HW. Future CLs will support higher dimensional patterns, but 1-D patterns already exhibit interesting properties. In particular, we want to separate pattern matching (i.e. legality, based on both structural and dependency analysis) from profitability analysis, and both from application of the transformation. As a consequence, patterns may intersect and we need to verify that a pattern can still apply by the time we get to applying it. A non-greedy profitability analysis that takes pattern intersection into account is left for future work. Additionally the CL makes the following cleanups: 1. the matches method now returns a value, not a reference; 2. added comments about the MLFunctionMatcher and MLFunctionMatches usage by value; 3. added size and empty methods to matches; 4. added a negative vectorization test with a conditional, which exposed a bug in the iterators. Iterators now return nullptr if the underlying storage is nullptr. PiperOrigin-RevId: 219299489	2019-03-29 13:44:26 -07:00
Lei Zhang	582b0761c6	Use matcher sugars for canonicalization pattern matching - Added a mechanism for specifying pattern matching more concisely, as in LLVM. - Added support for canonicalization of addi/muli over vector/tensor splat - Added ValueType to Attribute class hierarchy - Allowed creating constant splat PiperOrigin-RevId: 219149621	2019-03-29 13:43:44 -07:00
Uday Bondhugula	1ec77cecf2	FourierMotzkinEliminate trivial bug fix PiperOrigin-RevId: 219148982	2019-03-29 13:43:30 -07:00
Lei Zhang	60b5184c8b	Canonicalize muli(x, 1) into x PiperOrigin-RevId: 218885877	2019-03-29 13:42:01 -07:00
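The canonicalization patterns named in this log (`muli(x, 1) -> x` here, plus `x + 0 -> x` in a03051b9c4, `x - x -> 0` in 9e3b928e32 and folding of `x * 0` in 6822c4e29c) are all local rewrites. A toy Python version over tuple expression trees, assuming constant operands have already been moved to the right-hand side; this is not MLIR's PatternRewriter API:

```python
import operator

# Constant folders for the binary integer ops handled by this sketch.
FOLDERS = {"addi": operator.add, "subi": operator.sub, "muli": operator.mul}


def canonicalize(expr):
    """Bottom-up rewrite of ("op", lhs, rhs) trees; leaves are SSA names (str) or ints."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr[0], canonicalize(expr[1]), canonicalize(expr[2])
    if isinstance(lhs, int) and isinstance(rhs, int):
        return FOLDERS[op](lhs, rhs)   # both operands constant: fold
    if op == "muli" and rhs == 1:
        return lhs                     # muli(x, 1) -> x
    if op == "muli" and rhs == 0:
        return 0                       # x * 0 -> 0
    if op == "addi" and rhs == 0:
        return lhs                     # x + 0 -> x
    if op == "subi" and lhs == rhs:
        return 0                       # x - x -> 0
    return (op, lhs, rhs)
```

The sketch never reassociates; as 967d934180 notes, operations such as `subi` are not reassociative.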
Alex Zinenko	aae372ecb8	Drop trivial identity affine mappings in MemRef construction. As per the MLIR spec, the absence of affine maps in MemRef type is interpreted as an implicit identity affine map. Therefore, MemRef types declared with explicit or implicit identity map should be considered equal at the MemRefType level. During MemRefType construction, drop trivial identity affine map compositions. A trivial identity composition consists of a single unbounded identity map. It is unclear whether affine maps should be composed in-place to a single map during MemRef type construction, so non-trivial compositions that could have been simplified to an identity are NOT removed. We chose to drop the trivial identity map rather than inject it in places that assume its presence implicitly because it makes the code simpler by reducing boilerplate; identity mappings are obvious defaults. Update tests that were checking for the presence of trivial identity map compositions in the outputs. PiperOrigin-RevId: 218862454	2019-03-29 13:41:47 -07:00
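The construction rule above can be sketched with toy Python classes (hypothetical, not MLIR's `MemRefType`/`AffineMap`): a composition made of exactly one identity map of matching rank is dropped, so explicit and implicit identity layouts compare equal, while longer compositions are left alone. The sketch does not model the bounded/unbounded distinction:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AffineMap:
    num_dims: int
    num_symbols: int
    results: Tuple[str, ...]  # toy result expressions such as ("d0", "d1")

    def is_identity(self) -> bool:
        return (self.num_symbols == 0 and
                self.results == tuple(f"d{i}" for i in range(self.num_dims)))


@dataclass(frozen=True)
class MemRefType:
    shape: Tuple[int, ...]
    maps: Tuple[AffineMap, ...]


def make_memref_type(shape, maps=()):
    """Drop a trivial identity composition: exactly one identity map of matching rank."""
    shape, maps = tuple(shape), tuple(maps)
    if len(maps) == 1 and maps[0].is_identity() and maps[0].num_dims == len(shape):
        maps = ()
    return MemRefType(shape, maps)
```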
Chris Lattner	967d934180	Fix two issues: 1) We incorrectly reassociated non-reassociative operations like subi, causing miscompilations. 2) When constant folding, we didn't add users of the new constant back to the worklist for reprocessing, causing us to miss some cases (pointed out by Uday). The code for #2 is gross, but I'll add the new APIs in a followup patch. PiperOrigin-RevId: 218803984	2019-03-29 13:40:35 -07:00
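The second fix concerns worklist-driven folding: once an op folds to a constant, its users may become foldable, so they must be pushed back onto the worklist. A hypothetical Python driver (not the canonicalizer's code) that pops from the end of the worklist, so users can be visited before their operands have folded:

```python
import operator

FOLDERS = {"addi": operator.add, "subi": operator.sub, "muli": operator.mul}


def fold_all(ops, requeue_users=True):
    """Fold ops to constants using a LIFO worklist.

    ops: name -> (opcode, lhs, rhs); operands are ints or names. Names that are
    not keys of `ops` (e.g. function arguments) are treated as unknown values.
    Returns name -> constant for every op that folded.
    """
    users = {}
    for name, (_, lhs, rhs) in ops.items():
        for operand in (lhs, rhs):
            if isinstance(operand, str):
                users.setdefault(operand, []).append(name)

    consts = {}

    def value(operand):
        return consts.get(operand) if isinstance(operand, str) else operand

    worklist = list(ops)
    while worklist:
        name = worklist.pop()
        if name in consts:
            continue
        opcode, lhs, rhs = ops[name]
        lhs_val, rhs_val = value(lhs), value(rhs)
        if lhs_val is None or rhs_val is None:
            continue
        consts[name] = FOLDERS[opcode](lhs_val, rhs_val)
        if requeue_users:
            # The fix: users of a freshly folded op may now be foldable too.
            worklist.extend(users.get(name, []))
    return consts
```

With `a = 1 + 2`, `b = a * 3` and `c = b - x`, the driver visits `b` before `a` has folded; without requeuing, `b` is never folded.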
Uday Bondhugula	988ce3387f	Change sigil for integer set: @@ -> # PiperOrigin-RevId: 218786684	2019-03-29 13:40:21 -07:00
MLIR Team	13f6cc0187	Run GCD test before elimination. Adds test case with rational solutions, but no integer solutions. PiperOrigin-RevId: 218772332	2019-03-29 13:39:34 -07:00
Uday Bondhugula	80610c2f49	Introduce Fourier-Motzkin variable elimination + other cleanup/support - Introduce Fourier-Motzkin variable elimination to eliminate a dimension from a system of linear equalities/inequalities. Update isEmpty to use this. Since FM is only exact on rational/real spaces, an emptiness check based on this is guaranteed to be exact whenever it says the underlying set is empty; if it says it's not empty, there may still be no integer points in it. Also, supports a version that computes "dark shadows". - Test this by checking for "always false" conditionals in if statements. - Unique IntegerSet's that are small (few constraints, few variables). This basically means the canonical empty set and other small sets that are likely commonly used get uniqued; allows checking for the canonical empty set by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint size for uniquing. - rename simplify-affine-expr -> simplify-affine-structures Other cleanup - IntegerSet::numConstraints, AffineMap::numResults are no longer needed; remove them. - add copy assignment operators for AffineMap, IntegerSet. - rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet - Misc cleanup for FlatAffineConstraints API PiperOrigin-RevId: 218690456	2019-03-29 13:38:24 -07:00
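Fourier-Motzkin elimination projects out a variable by pairing every lower bound on it with every upper bound. A compact Python sketch over integer coefficient rows (a toy encoding, not the FlatAffineConstraints API); it omits the "dark shadow" variant and any redundancy removal, so the number of constraints can grow quickly:

```python
from typing import List

# Each row [a_0, ..., a_{n-1}, c] encodes a_0*x_0 + ... + a_{n-1}*x_{n-1} + c >= 0.
Constraint = List[int]


def fm_eliminate(cons: List[Constraint], k: int) -> List[Constraint]:
    """Project out variable k with Fourier-Motzkin elimination."""
    lower = [c for c in cons if c[k] > 0]  # these bound x_k from below
    upper = [c for c in cons if c[k] < 0]  # these bound x_k from above
    result = [c[:k] + c[k + 1:] for c in cons if c[k] == 0]
    for lo in lower:
        for up in upper:
            # Positive multipliers chosen so the x_k coefficients cancel.
            combined = [-up[k] * a + lo[k] * b for a, b in zip(lo, up)]
            result.append(combined[:k] + combined[k + 1:])
    return result


def fm_is_empty(cons: List[Constraint], num_vars: int) -> bool:
    """Exact over the rationals; 'not empty' does not imply an integer point exists."""
    for _ in range(num_vars):
        cons = fm_eliminate(cons, 0)
    # Only constant rows [c] remain, each meaning c >= 0.
    return any(c[0] < 0 for c in cons)
```

The caveat above shows up with 2x - 1 >= 0 and -2x + 1 >= 0: the only rational solution is x = 1/2, so elimination reports a non-empty set even though it contains no integer point.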
MLIR Team	5413239350	Adds Gaussian Elimination to FlatAffineConstraints. - Adds FlatAffineConstraints::isEmpty method to test if there are no solutions to the system. - Adds a GCD test to check whether equality constraints have no solution. - Adds unit test cases. PiperOrigin-RevId: 218546319	2019-03-29 13:38:10 -07:00
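The GCD test rests on a classical fact: a linear equality a_1*x_1 + ... + a_n*x_n + c = 0 with integer coefficients has an integer solution if and only if gcd(a_1, ..., a_n) divides c. The hypothetical helper below applies it to a single equality; once variable bounds are taken into account, passing the test no longer proves that a solution exists, which is why elimination still follows:

```python
from functools import reduce
from math import gcd


def gcd_test_may_have_solution(coeffs, const):
    """GCD test for sum(coeffs[i] * x_i) + const == 0 over unbounded integers."""
    g = reduce(gcd, coeffs, 0)
    if g == 0:
        # All coefficients are zero: the equality reads const == 0.
        return const == 0
    return const % g == 0
```

For example, 2x + 4y - 3 = 0 has rational solutions but no integer ones, since gcd(2, 4) = 2 does not divide 3.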
Chris Lattner	bd01f9541f	Teach canonicalize pass to unique and hoist constants to the entry block. This is a straightforward change, but required adding missing moveBefore() methods on operations (requiring moving some traits around to make C++ happy). This also fixes a constness issue with the getBlock/getFunction() methods on Instruction, and adds a missing getFunction() method on MLFuncBuilder. PiperOrigin-RevId: 218523905	2019-03-29 13:36:59 -07:00
Chris Lattner	301f83f906	Implement shape folding in the canonicalization pass: - Add a few canonicalization patterns to fold memref_cast into load/store/dealloc. - Canonicalize alloc(constant) into an alloc with a constant shape followed by a cast. - Add a new PatternRewriter::updatedRootInPlace API to make this more convenient. SimplifyAllocConst and the testcase is heavily based on Uday's implementation work, just in a different framework. PiperOrigin-RevId: 218361237	2019-03-29 13:36:31 -07:00
Chris Lattner	a03051b9c4	Add a pattern (x+0) -> x, generalize Canonicalize to CFGFunc's, address a few TODOs, and add some casting support to Operation. PiperOrigin-RevId: 218219340	2019-03-29 13:35:33 -07:00
Chris Lattner	7850258c49	Introduce a new Operation::erase helper to generalize some code in the pattern matcher / canonicalizer, and rename existing eraseFromBlock methods to align with it. PiperOrigin-RevId: 218104455	2019-03-29 13:34:51 -07:00
Uday Bondhugula	18e666702c	Generalize / improve DMA transfer overlap; nested and multiple DMA support; resolve multiple TODOs. - replace the fake test pass (that worked on just the first loop in the MLFunction) to perform DMA pipelining on all suitable loops. - nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops) - fix bugs / assumptions: correctly copy memory space and element type of source memref for double buffering. - correctly identify matching start/finish statements, handle multiple DMAs per loop. - introduce dominates/properlyDominates utilities for MLFunction statements. - move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it getShiftValidity - refactor getContainingStmtPos -> findAncestorStmtInBlock - move into Analysis/Utils.h; has two users. - other improvements / cleanup for related API/utilities - add size argument to dma_wait - for nested DMAs or in general, it makes it easy to obtain the size to use when lowering the dma_wait since we wouldn't want to identify the matching dma_start, and more importantly, in general/in the future, there may not always be a dma_start dominating the dma_wait. - add debug information in the pass PiperOrigin-RevId: 217734892	2019-03-29 13:32:28 -07:00
Nicolas Vasilache	3013dadb7c	[MLIR] Basic infrastructure for vectorization test This CL implements a very simple loop vectorization test and the basic infrastructure to support it. The test simply consists of: 1. matching the loops in the MLFunction and all the Load/Store operations nested under the loop; 2. testing whether all the Load/Store are contiguous along the innermost memory dimension along that particular loop. If any reference is non-contiguous (i.e. the ForStmt SSAValue appears in the index expression of a dimension other than the innermost one), then the loop is not vectorizable. The simple test above can gradually be extended with more interesting behaviors to account for the fact that a layout permutation may exist that enables contiguity etc. All these will come in due time but it is worth noting that the test already supports detection of outer-vectorizable loops. In implementing this test, I also added a recursive MLFunctionMatcher and some sugar that can capture patterns such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating on the matched IR structures. For now it just uses in-order traversal but post-order DFS will be useful in the future once IR rewrites start occurring. One may note that the memory management design decision follows a different pattern from MLIR. After evaluating different designs and how they quickly increase cognitive overhead, I decided to opt for the simplest solution in my view: a class-wide (threadsafe) RAII context. This way, a pass that needs MLFunctionMatcher can just have its own locally scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed. If passes are expected to have a longer lifetime, then the contexts can easily be scoped inside the runOnMLFunction call and storage lifetime reduced. Lastly, whatever the scope of threading (module, function, pass), this is expected to also be future-proof wrt concurrency (but this is a detail atm). PiperOrigin-RevId: 217622889	2019-03-29 13:32:13 -07:00
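Read literally, the simple test checks that a loop's induction variable appears in no index of any load/store except the innermost (fastest-varying) one. A Python sketch under that reading, with a hypothetical encoding of each index as a `{iv_name: coefficient}` dict; like the commit, it ignores layout permutations:

```python
from typing import Dict, List

# Each memref index is a linear expression: {loop IV name: coefficient}.
Index = Dict[str, int]


def is_contiguous_along(access: List[Index], iv: str) -> bool:
    """True if `iv` appears in no index except the innermost one."""
    return all(index.get(iv, 0) == 0 for index in access[:-1])


def loop_is_vectorizable(accesses: List[List[Index]], iv: str) -> bool:
    # Every load/store nested under the loop must be contiguous along `iv`.
    return all(is_contiguous_along(access, iv) for access in accesses)
```

For `A[%i, %j]` the `%j` loop passes and the `%i` loop fails; an access that does not use the IV at all, such as `C[%k]`, does not block vectorization.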
Chris Lattner	80e884a9f8	Add constant folding and binary operator reassociation to the canonicalize pass, build up the worklist infra in anticipation of improving the pattern matcher to match more than one node. PiperOrigin-RevId: 217330579	2019-03-29 13:31:44 -07:00
MLIR Team	0114e232d8	Adds method to AffineApplyOp which forward substitutes its results into any of its users which are also AffineApplyOps. Updates ComposeAffineMaps test pass to use this method. Updates affine map composition test cases to handle the new pass, which can be reused when this method is used in a future instruction combine pass. PiperOrigin-RevId: 217163351	2019-03-29 13:30:49 -07:00
Uday Bondhugula	86eac4618c	Create private exclusive / single use affine computation slice for an op stmt. - add util to create a private / exclusive / single use affine computation slice for an op stmt (see method doc comment); a single multi-result affine_apply op is prepended to the op stmt to provide all results needed for its operands as a function of loop iterators and symbols. - use it for DMA pipelining (to create private slices for DMA start stmt's); resolve TODOs/feature request (b/117159533) - move createComposedAffineApplyOp to Transforms/Utils; free it from taking a memref as input / generalize it. PiperOrigin-RevId: 216926818	2019-03-29 13:29:21 -07:00
Chris Lattner	9e3b928e32	Implement a super sketched out pattern match/rewrite framework and a sketched out canonicalization pass to drive it, and a simple (x-x) === 0 pattern match as a test case. There is a tremendous number of improvements that need to land, and the matcher/rewriter and patterns will be split out of this file, but this is a starting point. PiperOrigin-RevId: 216788604	2019-03-29 13:29:07 -07:00
Uday Bondhugula	82e55750d2	Add target independent standard DMA ops: dma.start, dma.wait Add target independent standard DMA ops: dma.start, dma.wait. Update pipeline data transfer to use these to detect DMA ops. While on this - return failure from mlir-opt::performActions if a pass generates invalid output - improve error message for verify 'n' operand traits PiperOrigin-RevId: 216429885	2019-03-29 13:26:10 -07:00
MLIR Team	fe490043b0	Affine map composition. *) Implements AffineValueMap forward substitution for AffineApplyOps. *) Adds ComposeAffineMaps transformation pass, which composes affine maps for all loads/stores in an MLFunction. *) Adds multiple affine map composition tests. PiperOrigin-RevId: 216216446	2019-03-29 13:24:59 -07:00
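Forward-substituting one affine_apply into another composes their maps. For purely linear maps (no mod/floordiv/ceildiv), each result can be stored as a row of coefficients followed by a constant, and composition becomes a small matrix product. A Python sketch under that assumption; the row encoding and `compose` are hypothetical:

```python
def compose(f, g):
    """Return the rows of f(g(x)).

    Each row is [coeff_0, ..., coeff_{n-1}, const]; f must take as many
    inputs as g has results.
    """
    num_g_dims = len(g[0]) - 1
    out = []
    for row in f:
        if len(row) != len(g) + 1:
            raise ValueError("f's input count must equal g's result count")
        new = [0] * num_g_dims + [row[-1]]  # start from f's constant term
        for j, coeff in enumerate(row[:-1]):
            # Substitute g's j-th result (including its constant) for input j.
            for k, g_val in enumerate(g[j]):
                new[k] += coeff * g_val
        out.append(new)
    return out
```

For example, composing `(e0) -> (3 * e0 + 5)` with `(d0) -> (2 * d0 + 1)` yields `(d0) -> (6 * d0 + 8)`.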
Chris Lattner	d2d89cbc19	Rename affineint type to index type. The name 'index' may not be perfect, but is better than the old name. Here is some justification: 1) affineint (as it is named) is not a type suitable for general computation (e.g. the multiply/adds in an integer matmul). It has undefined width and is undefined on overflow. They are used as the indices for forstmt because they are intended to be used as indexes inside the loop. 2) It can be used in both cfg and ml functions, and in cfg functions. As you mention, “symbols” are not affine, and we use affineint values for symbols. 3) Integers aren’t affine, the algorithms applied to them can be. :) 4) The only suitable use for affineint in MLIR is for indexes and dimension sizes (i.e. the bounds of those indexes). PiperOrigin-RevId: 216057974	2019-03-29 13:24:16 -07:00
Uday Bondhugula	d18ae9e2c7	Constant folding for loop bounds. - Fold the lower/upper bound of a loop to a constant whenever the result of the application of the bound's affine map on the operand list yields a constant. - Update/complete 'for' stmt's API to set lower/upper bounds with operands. Resolve TODOs for ForStmt::set{Lower,Upper}Bound. - Moved AffineExprConstantFolder into AffineMap.cpp and added AffineMap::constantFold to be used by both AffineApplyOp and ForStmt::constantFoldBound. PiperOrigin-RevId: 215997346	2019-03-29 13:24:01 -07:00
Chris Lattner	6822c4e29c	Implement support for constant folding operations even when their operands are not all constant. Implement support for folding dim, x*0, and affine_apply. PiperOrigin-RevId: 215917432	2019-03-29 13:23:32 -07:00
Uday Bondhugula	6cfdb756b1	Introduce memref replacement/rewrite support: to replace an existing memref with a new one (of a potentially different rank/shape) with an optional index remapping. - introduce Utils::replaceAllMemRefUsesWith - use this for DMA double buffering (This CL also adds a few temporary utilities / code that will be done away with once: 1) abstract DMA op's are added, 2) a memref dereferencing side-effect / trait is available on op's, and 3) b/117159533 (memref index computation slices) is resolved.) PiperOrigin-RevId: 215831373	2019-03-29 13:23:19 -07:00
Feng Liu	7d016fd352	Add support to Add, Sub, Mul for both Integer and Float types. The new operations are registered and also the const folding of them are implemented. PiperOrigin-RevId: 215575999	2019-03-29 13:21:40 -07:00
Uday Bondhugula	041817a45e	Introduce loop body skewing / loop pipelining / loop shifting utility. - loopBodySkew shifts statements of a loop body by stmt-wise delays, and is typically meant to be used to: - allow overlap of non-blocking start/wait until completion operations with other computation - allow shifting of statements (for better register reuse/locality/parallelism) - software pipelining (when applied to the innermost loop) - an additional argument specifies whether to unroll the prologue and epilogue. - add method to check SSA dominance preservation. - add a fake loop pipeline pass to test this utility. Sample input/output are below. While on this, fix/add following: - fix minor bug in getAddMulPureAffineExpr - add additional builder methods for common affine map cases - fix const_operand_iterator's for ForStmt, etc. When there is no such thing as 'const MLValue', the iterator shouldn't be returning const MLValue's. Returning MLValue is const correct. Sample input/output examples: 1) Simplest case: shift second statement by one. Input: for %i = 0 to 7 { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint } Output: #map0 = (d0) -> (d0 - 1) mlfunc @loop_nest_simple1() { %c8 = constant 8 : affineint %c0 = constant 0 : affineint %0 = "foo"(%c0) : (affineint) -> affineint for %i0 = 1 to 7 { %1 = "foo"(%i0) : (affineint) -> affineint %2 = affine_apply #map0(%i0) %3 = "bar"(%2) : (affineint) -> affineint } %4 = affine_apply #map0(%c8) %5 = "bar"(%4) : (affineint) -> affineint return } 2) DMA overlap: shift dma.wait and compute by one. 
Input for %i = 0 to 7 { %pingpong = affine_apply (d0) -> (d0 mod 2) (%i) "dma.enqueue"(%pingpong) : (affineint) -> affineint %pongping = affine_apply (d0) -> (d0 mod 2) (%i) "dma.wait"(%pongping) : (affineint) -> affineint "compute1"(%pongping) : (affineint) -> affineint } Output #map0 = (d0) -> (d0 mod 2) #map1 = (d0) -> (d0 - 1) #map2 = ()[s0] -> (s0 + 7) mlfunc @loop_nest_dma() { %c8 = constant 8 : affineint %c0 = constant 0 : affineint %0 = affine_apply #map0(%c0) %1 = "dma.enqueue"(%0) : (affineint) -> affineint for %i0 = 1 to 7 { %2 = affine_apply #map0(%i0) %3 = "dma.enqueue"(%2) : (affineint) -> affineint %4 = affine_apply #map1(%i0) %5 = affine_apply #map0(%4) %6 = "dma.wait"(%5) : (affineint) -> affineint %7 = "compute1"(%5) : (affineint) -> affineint } %8 = affine_apply #map1(%c8) %9 = affine_apply #map0(%8) %10 = "dma.wait"(%9) : (affineint) -> affineint %11 = "compute1"(%9) : (affineint) -> affineint return } 3) With arbitrary affine bound maps: Shift last two statements by two. 
Input: for %i = %N to ()[s0] -> (s0 + 7)()[%N] { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint %z = "foo_bar"(%i) : (affineint) -> (affineint) "bar_foo"(%i) : (affineint) -> (affineint) } Output #map0 = ()[s0] -> (s0 + 1) #map1 = ()[s0] -> (s0 + 2) #map2 = ()[s0] -> (s0 + 7) #map3 = (d0) -> (d0 - 2) #map4 = ()[s0] -> (s0 + 8) #map5 = ()[s0] -> (s0 + 9) for %i0 = %arg0 to #map0()[%arg0] { %0 = "foo"(%i0) : (affineint) -> affineint %1 = "bar"(%i0) : (affineint) -> affineint } for %i1 = #map1()[%arg0] to #map2()[%arg0] { %2 = "foo"(%i1) : (affineint) -> affineint %3 = "bar"(%i1) : (affineint) -> affineint %4 = affine_apply #map3(%i1) %5 = "foo_bar"(%4) : (affineint) -> affineint %6 = "bar_foo"(%4) : (affineint) -> affineint } for %i2 = #map4()[%arg0] to #map5()[%arg0] { %7 = affine_apply #map3(%i2) %8 = "foo_bar"(%7) : (affineint) -> affineint %9 = "bar_foo"(%7) : (affineint) -> affineint } 4) Shift one by zero, second by one, third by two for %i = 0 to 7 { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint %z = "foobar"(%i) : (affineint) -> affineint } #map0 = (d0) -> (d0 - 1) #map1 = (d0) -> (d0 - 2) #map2 = ()[s0] -> (s0 + 7) %c9 = constant 9 : affineint %c8 = constant 8 : affineint %c1 = constant 1 : affineint %c0 = constant 0 : affineint %0 = "foo"(%c0) : (affineint) -> affineint %1 = "foo"(%c1) : (affineint) -> affineint %2 = affine_apply #map0(%c1) %3 = "bar"(%2) : (affineint) -> affineint for %i0 = 2 to 7 { %4 = "foo"(%i0) : (affineint) -> affineint %5 = affine_apply #map0(%i0) %6 = "bar"(%5) : (affineint) -> affineint %7 = affine_apply #map1(%i0) %8 = "foobar"(%7) : (affineint) -> affineint } %9 = affine_apply #map0(%c8) %10 = "bar"(%9) : (affineint) -> affineint %11 = affine_apply #map1(%c8) %12 = "foobar"(%11) : (affineint) -> affineint %13 = affine_apply #map1(%c9) %14 = "foobar"(%13) : (affineint) -> affineint 5) SSA dominance violated; no shifting if a shift is specified for the 
second statement. for %i = 0 to 7 { %x = "foo"(%i) : (affineint) -> affineint "bar"(%x) : (affineint) -> affineint } PiperOrigin-RevId: 214975731	2019-03-29 13:21:26 -07:00
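Viewed as a schedule, skewing by per-statement shifts means that statement s of iteration i runs at time step i + shift[s]; time steps missing some statements become the prologue and epilogue. A Python sketch that computes that schedule for a loop of `num_iters` iterations with non-negative shifts. The helper is hypothetical and, unlike the utility described in case 5, performs no SSA dominance check:

```python
def skew_schedule(num_iters, shifts):
    """Return one list per time step of (statement, original iteration) pairs.

    Statement s of iteration i runs at time step i + shifts[s]; within a
    time step, statements keep their original body order.
    """
    steps = []
    for t in range(num_iters + max(shifts)):
        step = []
        for s, shift in enumerate(shifts):
            i = t - shift
            if 0 <= i < num_iters:
                step.append((s, i))
        steps.append(step)
    return steps
```

For the samples, which run iterations 0 through 7, `skew_schedule(8, [0, 1])` gives a prologue with only `foo(0)`, a steady state running `foo(t)` and `bar(t - 1)`, and an epilogue with only `bar(7)`, matching sample 1.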
Uday Bondhugula	591fa9698e	Change behavior of loopUnrollFull with unroll factor 1 Using loopUnrollFull with unroll factor 1 should promote the loop body as opposed to doing nothing. PiperOrigin-RevId: 214812126	2019-03-29 13:20:59 -07:00
Nicolas Vasilache	54e5b4b4c0	[MLIR] Fix AsmPrinter for short-hand bound notation This CL restricts shorthand notation printing to only the bounds that can be roundtripped unambiguously; i.e.: 1. ()[]->(%some_cst) ()[] 2. ()[s0]->(s0) ()[%some_symbol] Upon inspection it turns out that the constant case was lossy so this CL also updates it. Note however that fixing this issue exposes a potential issue in unroll.mlir. L488 exhibits a map ()[s0] -> (1)()[%arg0] which could be simplified down to ()[]->(1)()[]. This does not seem like a bug but maybe an undesired complexity in the maps generated by unrolling. bondhugula@, care to take a look? PiperOrigin-RevId: 214531410	2019-03-29 13:19:04 -07:00
MLIR Team	99188b9d98	Adds constant folding hook for AffineApplyOp. PiperOrigin-RevId: 214287780	2019-03-29 13:18:19 -07:00
Chris Lattner	82eb284a53	Implement support for constant folding operations and a simple constant folding optimization pass: - Give the ability for operations to implement a constantFold hook (a simple one for single-result ops as well as general support for multi-result ops). - Implement folding support for constant and addf. - Implement support in AbstractOperation and Operation to make this usable by clients. - Implement a very simple constant folding pass that does top down folding on CFG and ML functions, with a testcase that exercises all the above stuff. Random cleanups: - Improve the build APIs for ConstantOp. - Stop passing "-o -" to mlir-opt in the testsuite, since that is the default. PiperOrigin-RevId: 213749809	2019-03-29 13:16:33 -07:00
Uday Bondhugula	ab4797229c	Extend loop unroll/unroll-and-jam to affine bounds + refactor related code. - extend loop unroll-jam similar to loop unroll for affine bounds - extend both loop unroll/unroll-jam to deal with cleanup loop for non multiple of unroll factor. - extend promotion of single iteration loops to work with affine bounds - fix typo bugs in loop unroll - refactor common code b/w loop unroll and loop unroll-jam - move prototypes of non-pass transforms to LoopUtils.h - add additional builder methods. - introduce loopUnrollUpTo(factor) to unroll by either factor or trip count, whichever is less. - remove Statement::isInnermost (not used for now - will come back at the right place/in right form later) PiperOrigin-RevId: 213471227	2019-03-29 13:15:06 -07:00
Uday Bondhugula	64812a56c7	Extend getConstantTripCount to deal with a larger subset of loop bounds; make loop unroll/unroll-and-jam more powerful; add additional affine expr builder methods - use previously added analysis/simplification to infer multiple of unroll factor trip counts, making loop unroll/unroll-and-jam more general. - for loop unroll, support bounds that are single result affine map's with the same set of operands. For unknown loop bounds, loop unroll will now work as long as trip count can be determined to be a multiple of unroll factor. - extend getConstantTripCount to deal with single result affine map's with the same operands. move it to mlir/Analysis/LoopAnalysis.cpp - add additional builder utility methods for affine expr arithmetic (difference, mod/floordiv/ceildiv w.r.t positive constant). simplify code to use the utility methods. - move affine analysis routines to AffineAnalysis.cpp/.h from AffineStructures.cpp/.h. - Rename LoopUnrollJam to LoopUnrollAndJam to match class name. - add an additional simplification for simplifyFloorDiv, simplifyCeilDiv - Rename AffineMap::getNumOperands() to getNumInputs: an affine map by itself does not have operands. Operands are passed to it through affine_apply, from loop bounds/if conditions, etc., operands are stored in the latter. This should be sufficiently powerful for now as far as unroll/unroll-and-jam go for TPU code generation, and can move to other analyses/transformations. Loop nests like these are now unrolled without any cleanup loop being generated. for %i = 1 to 100 { // unroll factor 4: no cleanup loop will be generated. for %j = (d0) -> (d0) (%i) to (d0) -> (5*d0 + 3) (%i) { %x = "foo"(%j) : (affineint) -> i32 } } for %i = 1 to 100 { // unroll factor 4: no cleanup loop will be generated. 
for %j = (d0) -> (d0) (%i) to (d0) -> (d0 - d mod 4 - 1) (%i) { %y = "foo"(%j) : (affineint) -> i32 } } for %i = 1 to 100 { for %j = (d0) -> (d0) (%i) to (d0) -> (d0 + 128) (%i) { %x = "foo"() : () -> i32 } } TODO(bondhugula): extend this to LoopUnrollAndJam as well in the next CL (with minor changes). PiperOrigin-RevId: 212661212	2019-03-29 13:13:00 -07:00
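Because the lower and upper bound maps share the operand d0, the trip count is itself an affine expression in d0, and it is a known multiple of the unroll factor when every coefficient and the constant are divisible by it (a sufficient, not necessary, condition). A Python sketch over flattened expressions `[coeff_d0, const]`, with hypothetical names. Upper bounds were still inclusive when this commit landed (the switch came in cde8248753), hence the `inclusive_ub` flag:

```python
from functools import reduce
from math import gcd


def trip_count_expr(lb, ub, inclusive_ub):
    """Flat trip-count expression ub - lb (+1 if inclusive): [coeffs..., const]."""
    expr = [u - l for u, l in zip(ub, lb)]
    if inclusive_ub:
        expr[-1] += 1
    return expr


def known_multiple_of(expr, factor):
    """Sufficient check: every coefficient and the constant are divisible by factor."""
    return reduce(gcd, expr, 0) % factor == 0
```

For the first nest above, (5*d0 + 3) - d0 + 1 = 4*d0 + 4, so no cleanup loop is needed for a factor of 4; under today's exclusive semantics the same bounds would give 4*d0 + 3.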
Uday Bondhugula	3bae041e5d	Add utility to promote single iteration loops. Add methods for getting constant loop counts. Improve / refactor loop unroll / loop unroll and jam. - add utility to remove single iteration loops. - use this utility to promote single iteration loops after unroll/unroll-and-jam - use loopUnrollByFactor for loopUnrollFull and remove most of the latter. - add methods for getting constant loop trip count PiperOrigin-RevId: 212039569	2019-03-29 13:11:21 -07:00
Uday Bondhugula	d5416f299e	Complete AffineExprFlattener based simplification for floordiv/ceildiv. - handle floordiv/ceildiv in AffineExprFlattener; update the simplification to work even if mod/floordiv/ceildiv expressions appearing in the tree can't be eliminated. - refactor the flattening / analysis to move it out of lib/Transforms/ - fix MutableAffineMap::isMultipleOf - add AffineBinaryOpExpr::getAdd/getMul/... utility methods PiperOrigin-RevId: 211540536	2019-03-29 13:09:18 -07:00
Uday Bondhugula	0122a99cbb	Affine expression analysis and simplification. Outside of IR/ - simplify a MutableAffineMap by flattening the affine expressions - add a simplify affine expression pass that uses this analysis - update the FlatAffineConstraints API (to be used in the next CL) In IR: - add isMultipleOf and getKnownGCD for AffineExpr, and make the in-IR simplification of simplifyMod simpler and more powerful. - rename the AffineExpr visitor methods to distinguish b/w visiting and walking, and to simplify API names based on context. The next CL will use some of these for the loop unrolling/unroll-jam to make the detection of the need for a cleanup loop powerful/non-trivial. A future CL will finally move this simplification to FlatAffineConstraints to make it more powerful. For example, currently, even if a mod expr appearing in a part of the expression tree can't be simplified, the whole thing won't be simplified. PiperOrigin-RevId: 211012256	2019-03-29 13:07:44 -07:00
Uday Bondhugula	e9fb4b492d	Introduce loop unroll-and-jam transformation. - for test purposes, the unroll-jam pass unroll-and-jams the first outermost loop. While on this: - fix StmtVisitor to allow overriding the function that walks over the children of a stmt. PiperOrigin-RevId: 210644813	2019-03-29 13:07:30 -07:00
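Unroll-and-jam is easiest to picture through the order in which it visits the iterations of a two-deep nest. The outer loop is unrolled by a factor, and the resulting copies of the inner loop are fused ("jammed") into a single inner loop. The Python sketch below only models that iteration order, using a hypothetical helper `unroll_and_jam_order`. It assumes constant bounds and ignores the dependence checks that legality requires.

```python
from typing import List, Tuple


def unroll_and_jam_order(n: int, m: int, factor: int) -> List[Tuple[int, int]]:
    """Iteration order of `for i in [0, n): for j in [0, m): body(i, j)`
    after unroll-and-jam of the outer loop by `factor` (constant bounds)."""
    order: List[Tuple[int, int]] = []
    main = n - n % factor  # outer iterations covered by the jammed nest
    for i in range(0, main, factor):
        for j in range(m):
            # The `factor` copies of the body share one inner loop.
            for k in range(factor):
                order.append((i + k, j))
    # Cleanup nest for the leftover outer iterations (not jammed).
    for i in range(main, n):
        for j in range(m):
            order.append((i, j))
    return order


print(unroll_and_jam_order(4, 2, 2))
```

Every (i, j) pair is still executed exactly once; only the order changes. This reordering is why the transformation needs a dependence check before it can be applied to real code.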
Uday Bondhugula	00bed4bd99	Extend loop unrolling to unroll by a given factor; add builder for affine apply op. - add builder for AffineApplyOp (first one for an operation that has non-zero operands) - add support for loop unrolling by a given factor; uses the affine apply op builder. While on this, change 'step' of ForStmt to be 'unsigned' instead of AffineConstantExpr *. Add setters for ForStmt lb, ub, step. Sample Input: // CHECK-LABEL: mlfunc @loop_nest_unroll_cleanup() { mlfunc @loop_nest_unroll_cleanup() { for %i = 1 to 100 { for %j = 0 to 17 { %x = "addi32"(%j, %j) : (affineint, affineint) -> i32 %y = "addi32"(%x, %x) : (i32, i32) -> i32 } } return } Output: $ mlir-opt -loop-unroll -unroll-factor=4 /tmp/single2.mlir #map0 = (d0) -> (d0 + 1) #map1 = (d0) -> (d0 + 2) #map2 = (d0) -> (d0 + 3) mlfunc @loop_nest_unroll_cleanup() { for %i0 = 1 to 100 { for %i1 = 0 to 17 step 4 { %0 = "addi32"(%i1, %i1) : (affineint, affineint) -> i32 %1 = "addi32"(%0, %0) : (i32, i32) -> i32 %2 = affine_apply #map0(%i1) %3 = "addi32"(%2, %2) : (affineint, affineint) -> i32 %4 = affine_apply #map1(%i1) %5 = "addi32"(%4, %4) : (affineint, affineint) -> i32 %6 = affine_apply #map2(%i1) %7 = "addi32"(%6, %6) : (affineint, affineint) -> i32 } for %i2 = 16 to 17 { %8 = "addi32"(%i2, %i2) : (affineint, affineint) -> i32 %9 = "addi32"(%8, %8) : (i32, i32) -> i32 } } return } PiperOrigin-RevId: 209676220	2019-03-29 13:03:38 -07:00
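The cleanup-loop bound in the sample above follows from simple trip-count arithmetic. The Python sketch below uses a hypothetical `unroll_by_factor` helper and assumes constant bounds and an inclusive upper bound. For `0 to 17` unrolled by 4, it yields a cleanup range of 16 and 17, matching the sample's `for %i2 = 16 to 17`. It also yields per-copy offsets 0, 1, 2, 3, corresponding to the IV itself plus `#map0`..`#map2`.

```python
from typing import List, Tuple


def unroll_by_factor(lb: int, ub: int, step: int,
                     factor: int) -> Tuple[List[int], List[int], List[int]]:
    """Plan unrolling of `for iv = lb to ub step step` (INCLUSIVE ub) by `factor`.

    Returns (main_ivs, offsets, cleanup_ivs): the unrolled loop visits main_ivs,
    copy k of the body uses iv + offsets[k], and the cleanup loop visits cleanup_ivs.
    """
    trip_count = max(0, (ub - lb) // step + 1)
    main_iters = trip_count - trip_count % factor  # iterations in unrolled copies
    cleanup_lb = lb + main_iters * step
    offsets = [k * step for k in range(factor)]    # copy 0 uses the IV itself
    main_ivs = list(range(lb, cleanup_lb, factor * step))
    cleanup_ivs = list(range(cleanup_lb, ub + 1, step))
    return main_ivs, offsets, cleanup_ivs


main_ivs, offsets, cleanup_ivs = unroll_by_factor(0, 17, 1, 4)
print(main_ivs, offsets, cleanup_ivs)  # [0, 4, 8, 12] [0, 1, 2, 3] [16, 17]
```

A cleanup loop is needed only when the trip count is not a multiple of the unroll factor. Here 18 iterations leave a remainder of 2.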
Uday Bondhugula	98a24881d3	ShortLoopUnroll - bug fix. Collect loops through a post-order walk instead of a pre-order one so that inner loops are collected before the outer loops surrounding them. Add a complex test case. PiperOrigin-RevId: 209041057	2019-03-29 13:01:22 -07:00
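The effect of the walk order can be seen on a toy loop tree. A post-order walk lists every inner loop before the loop that encloses it, so a pass processing the collected list handles inner loops first. The `Loop` class and the two collection functions below are hypothetical Python stand-ins for the statement walker, not MLIR's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Loop:
    """Hypothetical stand-in for a ForStmt: a name plus directly nested loops."""
    name: str
    children: List["Loop"] = field(default_factory=list)


def collect_post_order(loop: Loop, out: List[str]) -> List[str]:
    """Append nested loops first, then `loop` itself (post-order)."""
    for child in loop.children:
        collect_post_order(child, out)
    out.append(loop.name)
    return out


def collect_pre_order(loop: Loop, out: List[str]) -> List[str]:
    """Append `loop` first, then its nested loops (pre-order), for comparison."""
    out.append(loop.name)
    for child in loop.children:
        collect_pre_order(child, out)
    return out


nest = Loop("i", [Loop("j", [Loop("k")]), Loop("l")])
print(collect_post_order(nest, []))  # ['k', 'j', 'l', 'i']
print(collect_pre_order(nest, []))   # ['i', 'j', 'k', 'l']
```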
Chris Lattner	d6c4c748d7	Escape and unescape strings in the parser and printer so they can roundtrip, print floating point in a structured form that we know can round trip, enumerate attributes in the visitor so we print affine mapping attributes symbolically (the majority of the testcase updates). We still have an issue where the hexadecimal floating point syntax is reparsed as an integer, but that can evolve in subsequent patches. PiperOrigin-RevId: 208828876	2019-03-29 13:00:05 -07:00
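The round-trip requirement in the commit above can be illustrated in Python: printing a string or float must produce text that parses back to the same value. The escaping scheme here is an assumption chosen for the sketch. It puts a backslash before `"` and `\`, and writes every other non-printable or non-ASCII byte as a backslash plus two hex digits. `escape` and `unescape` are hypothetical names. Python's own `repr` and `float.hex` appear only to show that round-trippable float spellings exist.

```python
def escape(s: str) -> str:
    """Escape a string for printing inside double quotes (assumed scheme)."""
    out = []
    for ch in s:
        if ch in '"\\':
            out.append("\\" + ch)
        elif ch.isprintable() and ord(ch) < 128:
            out.append(ch)
        else:
            # Each UTF-8 byte becomes a backslash followed by two hex digits.
            out.extend(f"\\{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def unescape(s: str) -> str:
    """Invert `escape`."""
    data = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "\\":
            nxt = s[i + 1]
            if nxt in '"\\':
                data += nxt.encode("ascii")
                i += 2
            else:
                data.append(int(s[i + 1:i + 3], 16))
                i += 3
        else:
            data += s[i].encode("utf-8")
            i += 1
    return data.decode("utf-8")


text = 'say "hi"\\ then\ttab, caf\u00e9'
assert unescape(escape(text)) == text
print(escape('a"b\n'))  # a\"b\0A

# Floats: both the shortest decimal repr and the hex form parse back exactly.
x = 0.1
assert float(repr(x)) == x and float.fromhex(x.hex()) == x
```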
Uday Bondhugula	d8490d8d4f	Loop unrolling pass update - fix/complete forStmt cloning for unrolling to work for outer loops - create IV const's only when needed - test outer loop unrolling by creating a short trip count unroll pass for loops with trip counts <= <parameter> - add unrolling test cases for multiple op results, outer loop unrolling - fix/clean up StmtWalker class while on this - switch unroll loop iterator values from i32 to affineint PiperOrigin-RevId: 207645967	2019-03-29 12:56:16 -07:00
Tatiana Shpeisman	a0a6414ca2	Implement ML function arguments. Add representation for argument list in ML Function using TrailingObjects template. Implement argument iterators, parsing and printing. Unrelated minor change - remove OperationStmt::dropReferences(). Since MLFunction does not have cyclic operand references (it's an AST) destruction can be safely done w/o a special pass to drop references. PiperOrigin-RevId: 207583024	2019-03-29 12:55:47 -07:00
Uday Bondhugula	65b6e73245	Loop unrolling update. - deal with non-operation stmt's (if/for stmt's) in loops being unrolled (unrolling of non-innermost loops works). - update uses in unrolled bodies to use results of new operations that may be introduced in the unrolled bodies. Unrolling now works for all kinds of loop nests - perfect nests, imperfect nests, loops at any depth, and with any kind of operation in the body. (IfStmt support not done, hence untested there). Added missing dump/print method for StmtBlock. TODO: add test case for outer loop unrolling. PiperOrigin-RevId: 207314286	2019-03-29 12:55:19 -07:00
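Updating uses inside unrolled copies amounts to cloning the body with a value map. The induction variable maps to its shifted value, and each cloned operation's result replaces the original result for later operations in the same copy. The toy `(result, op, operands)` encoding and `clone_body` below are hypothetical Python stand-ins, not MLIR's cloning API. The example mirrors the `addi32` body from the unroll-by-factor sample in this log.

```python
from typing import Dict, List, Tuple

# Hypothetical toy IR: (result name, op name, operand names).
Op = Tuple[str, str, List[str]]


def clone_body(body: List[Op], value_map: Dict[str, str], suffix: str) -> List[Op]:
    """Clone `body`, remapping operands through `value_map`.

    Results of cloned ops get `suffix` appended, and the map is updated so that
    later ops in this copy use the new results instead of the originals.
    Operands not in the map (values defined outside the body) are kept as-is.
    """
    mapping = dict(value_map)  # don't mutate the caller's map
    cloned: List[Op] = []
    for result, name, operands in body:
        new_operands = [mapping.get(v, v) for v in operands]
        new_result = result + suffix
        mapping[result] = new_result
        cloned.append((new_result, name, new_operands))
    return cloned


body = [("%x", "addi32", ["%j", "%j"]), ("%y", "addi32", ["%x", "%x"])]
# Copy 1 of the body: the IV %j is replaced by a shifted value such as an
# affine_apply result, here named %j_plus_1 for illustration.
for op in clone_body(body, {"%j": "%j_plus_1"}, "_1"):
    print(op)
```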
Uday Bondhugula	2a003256ae	MLStmt cloning and IV replacement for loop unrolling, add constant pool to MLFunctions. - MLStmt cloning and IV replacement - While at this, fix the innermostLoopGatherer to actually gather all the innermost loops (it was stopping its walk at the first innermost loop it found) - Improve comments for MLFunction statement classes, fix inheritance order. - Fixed StmtBlock destructor. PiperOrigin-RevId: 207049173	2019-03-29 12:53:02 -07:00
Tatiana Shpeisman	c8b0273f19	Implement induction variables. Pretty print induction variable operands as %i<ssa value number>. Add support for future pretty printing of ML function arguments as %arg<ssa value number>. Induction variables are implemented by inheriting ForStmt from MLValue. ForStmt provides APIs that make this design decision invisible to the ForStmt users. This CL in combination with cl/206253643 resolves http://b/111769060. PiperOrigin-RevId: 206655937	2019-03-29 12:49:36 -07:00
Uday Bondhugula	a0abd666a7	Sketch out loop unrolling transformation. - Implement a full loop unroll for innermost loops. - Use it to implement a pass that unrolls all the innermost loops of all mlfunctions in a module. ForStmts parsed currently have constant trip counts (and constant loop bounds). - Implement StmtVisitor (Visitor pattern) Loop IVs aren't currently parsed and represented as SSA values. Replacing uses of loop IVs in unrolled bodies is thus a TODO. Class comments are sparse at some places - will add them after one round of comments. A cmd-line flag triggers this for now. Original: mlfunc @loops() { for x = 1 to 100 step 2 { for x = 1 to 4 { "Const"(){value: 1} : () -> () } } return } After unrolling: mlfunc @loops() { for x = 1 to 100 step 2 { "Const"(){value: 1} : () -> () "Const"(){value: 1} : () -> () "Const"(){value: 1} : () -> () "Const"(){value: 1} : () -> () } return } PiperOrigin-RevId: 205933235	2019-03-29 12:43:01 -07:00
Tatiana Shpeisman	1b24c48b91	Scaffolding for convertToCFG pass that replaces all instances of ML functions with equivalent CFG functions. Traverses module MLIR, generates CFG functions (empty for now) and removes ML functions. Adds Transforms library and tests. PiperOrigin-RevId: 205848367	2019-03-29 12:41:15 -07:00

306 Commits