llvm-project

Commit Graph

Author	SHA1	Message	Date
Alex Zinenko	0c4ee54198	Merge LowerAffineApplyPass into LowerIfAndForPass, rename to LowerAffinePass This change is mechanical and merges the LowerAffineApplyPass and LowerIfAndForPass into a single LowerAffinePass. It makes a step towards defining an "affine dialect" that would contain all polyhedral-related constructs. The motivation for merging these two passes is based on retiring MLFunctions and, eventually, transforming If and For statements into regular operations. After that happens, LowerAffinePass becomes yet another legalization. PiperOrigin-RevId: 227566113	2019-03-29 14:52:52 -07:00
Alex Zinenko	fa710c17f4	LowerForAndIf: expand affine_apply's inplace Existing implementation was created before ML/CFG unification refactoring and did not concern itself with further lowering to separate concerns. As a result, it emitted `affine_apply` instructions to implement `for` loop bounds and `if` conditions and required a follow-up function pass to lower those `affine_apply` to arithmetic primitives. In the unified function world, LowerForAndIf is mostly a lowering pass with low complexity. As we move towards a dialect for affine operations (including `for` and `if`), it makes sense to lower `for` and `if` conditions directly to arithmetic primitives instead of relying on `affine_apply`. Expose `expandAffineExpr` function in LoweringUtils. Use this function together with `expandAffineMaps` to emit primitives that implement loop and branch conditions directly. Also remove tests that become unnecessary after transforming LowerForAndIf into a function pass. PiperOrigin-RevId: 227563608	2019-03-29 14:52:22 -07:00
Alex Zinenko	d64db86f20	Refactor LowerAffineApply In LoweringUtils, extract out `expandAffineMap`. This function takes an affine map and a list of values the map should be applied to and emits a sequence of arithmetic instructions that implement the affine map. It is independent of the AffineApplyOp and can be used in places where we need to insert an evaluation of an affine map without relying on a (temporary) `affine_apply` instruction. This prepares for a merge between LowerAffineApply and LowerForAndIf passes. Move the `expandAffineApply` function to the LowerAffineApply pass since it is the only place that must be aware of the `affine_apply` instructions. PiperOrigin-RevId: 227563439	2019-03-29 14:52:07 -07:00
Chris Lattner	bbf362b784	Eliminate extfunc/cfgfunc/mlfunc as a concept, and just use 'func' instead. The entire compiler now looks at structural properties of the function (e.g. does it have one block, does it contain an if/for stmt, etc) so the only thing holding up this difference is round tripping through the parser/printer syntax. Removing this shrinks the compiler by ~140LOC. This is step 31/n towards merging instructions and statements. The last step is updating the docs, which I will do as a separate patch in order to split it from this mostly mechanical patch. PiperOrigin-RevId: 227540453	2019-03-29 14:51:37 -07:00
Nicolas Vasilache	73f5c9c380	[MLIR] Sketch a simple set of EDSCs to declaratively write MLIR This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs) in MLIR components: 1. a `Type` system of shell classes that closely matches the MLIR type system. These types are subdivided into `Bindable` leaf expressions and non-bindable `Expr` expressions; 2. an `MLIREmitter` class whose purpose is to: a. maintain a map of `Bindable` leaf expressions to concrete SSAValue; b. provide helper functionality to specify bindings of `Bindable` classes to SSAValue while verifying conformable types; c. traverse the `Expr` and emit the MLIR. This is used on a concrete example to implement MemRef load/store with clipping in the LowerVectorTransfer pass. More specifically, the following pseudo-C++ code: ```c++ MLFuncBuilder *b = ...; Location location = ...; Bindable zero, one, expr, size; // EDSL expression auto access = select(expr < zero, zero, select(expr < size, expr, size - one)); auto ssaValue = MLIREmitter(b) .bind(zero, ...) .bind(one, ...) .bind(expr, ...) .bind(size, ...) .emit(location, access); ``` is used to emit all the MLIR for a clipped MemRef access. This simple EDSL can easily be extended to more powerful patterns and should serve as the counterpart to pattern matchers (and could potentially be unified once we get enough experience). In the future, most of this code should be TableGen'd but for now it has concrete valuable uses: make MLIR programmable in a declarative fashion. This CL also adds Stmt, proper supporting free functions and rewrites VectorTransferLowering fully using EDSCs. 
The code for creating the EDSCs emitting a VectorTransferReadOp as loops with clipped loads is: ```c++ Stmt block = Block({ tmpAlloc = alloc(tmpMemRefType), vectorView = vector_type_cast(tmpAlloc, vectorMemRefType), ForNest(ivs, lbs, ubs, steps, { scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs), store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs), }), vectorValue = load(vectorView, zero), tmpDealloc = dealloc(tmpAlloc.getLHS())}); emitter.emitStmt(block); ``` where `accessInfo.clippedScalarAccessExprs` is created with: ```c++ select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one)); ``` The generated MLIR resembles: ```mlir %1 = dim %0, 0 : memref<?x?x?x?xf32> %2 = dim %0, 1 : memref<?x?x?x?xf32> %3 = dim %0, 2 : memref<?x?x?x?xf32> %4 = dim %0, 3 : memref<?x?x?x?xf32> %5 = alloc() : memref<5x4x3xf32> %6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>> for %i4 = 0 to 3 { for %i5 = 0 to 4 { for %i6 = 0 to 5 { %7 = affine_apply #map0(%i0, %i4) %8 = cmpi "slt", %7, %c0 : index %9 = affine_apply #map0(%i0, %i4) %10 = cmpi "slt", %9, %1 : index %11 = affine_apply #map0(%i0, %i4) %12 = affine_apply #map1(%1, %c1) %13 = select %10, %11, %12 : index %14 = select %8, %c0, %13 : index %15 = affine_apply #map0(%i3, %i6) %16 = cmpi "slt", %15, %c0 : index %17 = affine_apply #map0(%i3, %i6) %18 = cmpi "slt", %17, %4 : index %19 = affine_apply #map0(%i3, %i6) %20 = affine_apply #map1(%4, %c1) %21 = select %18, %19, %20 : index %22 = select %16, %c0, %21 : index %23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32> store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32> } } } %24 = load %6[%c0] : memref<1xvector<5x4x3xf32>> dealloc %5 : memref<5x4x3xf32> ``` In particular notice that only 3 out of the 4-d accesses are clipped: this corresponds indeed to the number of dimensions in the super-vector. 
This CL also addresses the cleanups resulting from the review of the previous CL and performs some refactoring to simplify the abstraction. PiperOrigin-RevId: 227367414	2019-03-29 14:50:23 -07:00
Chris Lattner	a250643ec8	Merge together the CFG/ML function paths in the CSE pass. I did a first pass on this to merge together the classes, but there may be other simplification possible. I'll leave that to riverriddle@ as future work. This is step 29/n towards merging instructions and statements. PiperOrigin-RevId: 227328680	2019-03-29 14:50:08 -07:00
Chris Lattner	7974889f54	Update and generalize various passes to work on both CFG and ML functions, simplifying them in minor ways. The only significant cleanup here is the constant folding pass. All the other changes are simple and easy, but this is still enough to shrink the compiler by 45LOC. The one pass left to merge is the CSE pass, which will be more involved, so I'm splitting it out to its own patch (which I'll tackle right after this). This is step 28/n towards merging instructions and statements. PiperOrigin-RevId: 227328115	2019-03-29 14:49:52 -07:00
Chris Lattner	3c8fc797de	Simplify the remapFunctionAttrs logic, merging CFG/ML function handling. Remove an unnecessary restriction in forward substitution. Slightly simplify LLVM IR lowering, which previously would crash if given an ML function, it should now produce a clean error if given a function with an if/for instruction in it, just like it does any other unsupported op. This is step 27/n towards merging instructions and statements. PiperOrigin-RevId: 227324542	2019-03-29 14:49:35 -07:00
Chris Lattner	4bd9f93606	Simplify GreedyPatternRewriteDriver now that functions are merged into one representation, shrinking by 70LOC. The PatternRewriter class can probably also be simplified as well, but one step at a time. This is step 26/n towards merging instructions and statements. NFC. PiperOrigin-RevId: 227324218	2019-03-29 14:49:20 -07:00
Uday Bondhugula	f12182157e	Introduce PostDominanceInfo, fix properlyDominates() for Instructions - introduce PostDominanceInfo in the right/complete way and use that for post dominance check in store-load forwarding - replace all uses of Analysis/Utils::dominates/properlyDominates with DominanceInfo::dominates/properlyDominates - drop all redundant copies of dominance methods in Analysis/Utils/ - in pipeline-data-transfer, replace dominates call with a much less expensive check; similarly, substitute dominates() in checkMemRefAccessDependence with a simpler check suitable for that context - fix a bug in properlyDominates - improve doc for 'for' instruction 'body' PiperOrigin-RevId: 227320507	2019-03-29 14:48:44 -07:00
Chris Lattner	ae618428f6	Greatly simplify the ConvertToCFG pass, converting it from a module pass to a function pass, and eliminating the need to copy over code and do interprocedural updates. While here, also improve it to make fewer empty blocks, and rename it to "LowerIfAndFor" since that is what it does. This is a net reduction of ~170 lines of code. As drive-bys, change the splitBlock method to not insert an unconditional branch, since that behavior is annoying for all clients. Also improve the AsmPrinter to not crash when a block is referenced that isn't linked into a function. PiperOrigin-RevId: 227308856	2019-03-29 14:48:13 -07:00
Uday Bondhugula	545f3ce430	Fix ASAN failure in memref-dataflow-opt - memrefsToErase had duplicates inserted into it; switch to SmallPtrSet. PiperOrigin-RevId: 227299306	2019-03-29 14:47:58 -07:00
Uday Bondhugula	b9fe6be6d4	Introduce memref store to load forwarding - a simple memref dataflow analysis - the load/store forwarding relies on memref dependence routines as well as SSA/dominance to identify the memref store instance uniquely supplying a value to a memref load, and replaces the result of that load with the value being stored. The memref is also deleted when possible if only stores remain. - add methods for post dominance for MLFunction blocks. - remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth, getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were earlier static) - add a helper method in FlatAffineConstraints - isRangeOneToOne. PiperOrigin-RevId: 227252907	2019-03-29 14:47:28 -07:00
Chris Lattner	dffc589ad2	Extend InstVisitor and Walker to handle arbitrary CFG functions, expand the Function::walk functionality into f->walkInsts/Ops which allows visiting all instructions, not just ops. Eliminate Function::getBody() and Function::getReturn() helpers which crash in CFG functions, and were only kept around as a bridge. This is step 25/n towards merging instructions and statements. PiperOrigin-RevId: 227243966	2019-03-29 14:46:58 -07:00
Chris Lattner	5b9c3f7cdb	Tidy up references to "basic blocks" that should refer to blocks now. NFC. PiperOrigin-RevId: 227196077	2019-03-29 14:44:59 -07:00
Chris Lattner	456ad6a8e0	Standardize naming of statements -> instructions, revisiting the code base to be consistent and moving the using declarations over. Hopefully this is the last truly massive patch in this refactoring. This is step 21/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227178245	2019-03-29 14:44:30 -07:00
Chris Lattner	315a466aed	Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed ones can be fixed up on demand. The last major renaming is Statement -> Instruction, which is why Statement and Stmt still appears in various places. This is step 19/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227163082	2019-03-29 14:43:58 -07:00
Chris Lattner	69d9e990fa	Eliminate the using decls for MLFunction and CFGFunction standardizing on Function. This is step 18/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227139399	2019-03-29 14:43:13 -07:00
Chris Lattner	d798f9bad5	Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(), StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names. This is step 17/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227121537	2019-03-29 14:42:40 -07:00
Chris Lattner	5187cfcf03	Merge Operation into OperationInst and standardize nomenclature around OperationInst. This is a big mechanical patch. This is step 16/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227093712	2019-03-29 14:42:23 -07:00
Chris Lattner	471c976413	Rework inheritance hierarchy: Operation now derives from Statement, and OperationInst derives from it. This allows eliminating some forwarding functions, other complex code handling multiple paths, and the 'isStatement' bit tracked by Operation. This is the last patch I think I can make before the big mechanical change merging Operation into OperationInst, coming next. This is step 15/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227077411	2019-03-29 14:41:49 -07:00
Chris Lattner	4fbcd1ac52	Minor renamings: Trim the "Stmt" prefix off StmtSuccessorIterator/StmtSuccessorIterator, and rename and move the CFGFunctionViewGraph pass to ViewFunctionGraph. This is step 13/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227069438	2019-03-29 14:40:51 -07:00
Chris Lattner	4c05f8cac6	Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new FuncBuilder class. Also rename SSAValue.cpp to Value.cpp This is step 12/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227067644	2019-03-29 14:40:22 -07:00
Chris Lattner	3f190312f8	Merge SSAValue, CFGValue, and MLValue together into a single Value class, which is the new base of the SSA value hierarchy. This CL also standardizes all the nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing. This is step 11/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227064624	2019-03-29 14:40:06 -07:00
Chris Lattner	776b035646	Eliminate the Instruction, BasicBlock, CFGFunction, MLFunction, and ExtFunction classes, using the Statement/StmtBlock hierarchy and Function instead. This only changes the internal data structures, it does not affect the user visible syntax or structure of MLIR code. Function gets new "isCFG()" sorts of predicates as a transitional measure. This patch is gross in a number of ways, largely in an effort to reduce the amount of mechanical churn in one go. It introduces a bunch of using decls to keep the old names alive for now, and a bunch of stuff needs to be renamed. This is step 10/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227044402	2019-03-29 14:39:49 -07:00
Chris Lattner	abf72a8bb1	Rename findFunction from the ML side of the house to be named getFunction(), making it more similar to the CFG side of things. It is true that in a deeply nested case that this is not a guaranteed O(1) time operation, and that 'get' could lead compiler hackers to think this is cheap, but we need to merge these and we can look into solutions for this in the future if it becomes a problem in practice. This is step 9/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226983931	2019-03-29 14:38:49 -07:00
Chris Lattner	036f87b15f	Rename CFGFunctionGraphTraits.h -> FunctionGraphTraits.h and add graph specializations for doing CFG traversals of ML Functions, making the two sorts of functions have the same capabilities. This is step 8/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226968502	2019-03-29 14:38:19 -07:00
Alex Zinenko	eb0f9f37af	SuperVectorization: fix 'isa' assertion Supervectorization uses null pointers to SSA values as a means of communicating the failure to vectorize. In operation vectorization, all operations producing the values of operation arguments must be vectorized for the given operation to be vectorized. The existing check verified if any of the value "def" statements was vectorized instead, sometimes leading to assertions inside `isa` called on a null pointer. Fix this to check that all "def" statements were vectorized. PiperOrigin-RevId: 226941552	2019-03-29 14:37:20 -07:00
Jacques Pienaar	58d50a6325	Rename convenience methods to make type explicit. PiperOrigin-RevId: 226939383	2019-03-29 14:36:50 -07:00
Chris Lattner	d613f5ab65	Refactor MLFunction to contain a StmtBlock for its body instead of inheriting from it. This is necessary progress to squaring away the parent relationship that a StmtBlock has with its enclosing if/for/fn, and makes room for functions to have more than one block in the future. This also removes IfClause and ForStmtBody. This is step 5/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226936541	2019-03-29 14:36:35 -07:00
Chris Lattner	9a4060d3f5	Eliminate the ability to add operands to an instruction, used in a narrow case for SSA values in terminators, but easily worked around. At the same time, move the StmtOperand list in an OperationStmt to the end of its trailing objects list so we can reduce the number of operands, without affecting offsets to the other stuff in the allocation. This is important because we want OperationStmts to be consecutive, including their operands - we don't want to use an std::vector of operands like Instructions have. This is patch 4/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226865727	2019-03-29 14:36:20 -07:00
Chris Lattner	87ce4cc501	Per review on the previous CL, drop MLFuncBuilder::createOperation, changing clients to use OperationState instead. This makes MLFuncBuilder more similar to CFGFuncBuilder. This whole area will get tidied up more when cfg and ml worlds get unified. This patch is just gardening, NFC. PiperOrigin-RevId: 226701959	2019-03-29 14:35:49 -07:00
Chris Lattner	1301f907a1	Refactor ForStmt: having it contain a StmtBlock instead of subclassing StmtBlock. This is more consistent with IfStmt and also conceptually makes more sense - a forstmt "isn't" its body, it contains its body. This is step 1/N towards merging BasicBlock and StmtBlock. This is required because in the new regime StmtBlock will have a use list (just like BasicBlock does) of operands, and ForStmt already has a use list for its induction variable. This is a mechanical patch, NFC. PiperOrigin-RevId: 226684158	2019-03-29 14:35:19 -07:00
MLIR Team	4eef795a1d	Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the computation slice. Updates LoopFusion pass to take these parameters as command line flags for experimentation. PiperOrigin-RevId: 226514297	2019-03-29 14:35:03 -07:00
MLIR Team	6892ffb896	Improve loop fusion algorithm by using a memref dependence graph. Fixed TODO for reduction fusion unit test. PiperOrigin-RevId: 226277226	2019-03-29 14:33:02 -07:00
Uday Bondhugula	14d2618f63	Simplify memref-dependence-check's meta data structures / drop duplication and reuse existing ones. - drop IterationDomainContext, redundant since FlatAffineConstraints has MLValue information associated with its dimensions. - refactor to use existing support - leads to a reduction in LOC - as a result of these changes, non-constant loop bounds get naturally supported for dep analysis. - update test cases to include a couple with non-constant loop bounds - rename addBoundsFromForStmt -> addForStmtDomain - complete TODO for getLoopIVs (handle 'if' statements) PiperOrigin-RevId: 226082008	2019-03-29 14:32:46 -07:00
Alex Zinenko	4dbd94b543	Refactor LowerVectorTransfersPass using pattern rewriters This introduces a generic lowering pass for ML functions. The pass is parameterized by template arguments defining individual pattern rewriters. Concrete lowering passes define individual pattern rewriters and inherit from the generic class that takes care of allocating rewriters, traversing ML functions and performing the actual rewrite. While this is similar to the greedy pattern rewriter available in Transform/Utils, it requires adjustments due to the ML/CFG duality. In particular, ML function rewriters must be able to create statements, not only operations, and need access to an MLFuncBuilder. When we move to using the unified function type, the ML-specific rewriting will become unnecessary. Use LowerVectorTransfers as a testbed for the generic pass. PiperOrigin-RevId: 225887424	2019-03-29 14:31:43 -07:00
Alex Zinenko	51c8a095a3	Materialize vector_type_cast operation in the SuperVector dialect This operation is produced and used by the super-vectorization passes and has been emitted as an abstract unregistered operation until now. For end-to-end testing purposes, it has to be eventually lowered to LLVM IR. Matching abstract operation by name goes into the opposite direction of the generic lowering approach that is expected to be used for LLVM IR lowering in the future. Register vector_type_cast operation as a part of the SuperVector dialect. Arguably, this operation is a special case of the `view` operation from the Standard dialect. The semantics of `view` is not fully specified at this point so it is safer to rely on a custom operation. Additionally, using a custom operation may help to achieve clear dialect separation. PiperOrigin-RevId: 225887305	2019-03-29 14:31:13 -07:00
Uday Bondhugula	4a3e4e8ea7	loop-unroll - add function callback argument for outside targets to provide unroll factors, and a cmd line argument to specify number of innermost loop unroll repetitions. - add function callback parameter for outside targets to provide unroll factors - add a cmd line parameter to repeatedly apply innermost loop unroll a certain number of times (to avoid using -loop-unroll -loop-unroll ...; instead -unroll-num-reps=2). - implement the callback for a target - update test cases / usage PiperOrigin-RevId: 225843191	2019-03-29 14:30:28 -07:00
MLIR Team	3b69230b3a	Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion. *) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality. *) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nest's IVs and symbols using the dependence polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop. *) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destination memref access. *) Adds FlatAffineConstraints::toAffineMap function which returns an AffineMap which represents an equality constraint where one dimension identifier is represented as a function of all others in the equality constraint. *) Adds multiple fusion unit tests. PiperOrigin-RevId: 225842944	2019-03-29 14:30:13 -07:00
Uday Bondhugula	dced746bd1	Remove duplicate code / reuse right utilities from memref-dep-check / loop-tile - use addBoundsForForStmt - getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the returned things aren't owned / part of the stmt on which it's being called. - other minor API cleanup PiperOrigin-RevId: 225774301	2019-03-29 14:29:28 -07:00
Alex Zinenko	bc52a639f9	Extract vector_transfer_* Ops into a SuperVectorDialect. From the beginning, vector_transfer_read and vector_transfer_write operations were intended as a mid-level vectorization abstraction. In particular, they are lowered to the StandardOps dialect before further processing. As such, it does not make sense to keep them at the same level as StandardOps. Introduce the new SuperVectorOps dialect and move vector_transfer_* operations there. This will be used as a testbed for the generic lowering/legalization pass. PiperOrigin-RevId: 225554492	2019-03-29 14:28:58 -07:00
River Riddle	5c4f1fdd42	Check if the operation is already in the worklist before adding it. PiperOrigin-RevId: 225379496	2019-03-29 14:27:14 -07:00
Alex Zinenko	97d2f3cd3d	ConvertToCFG: use affine_apply to implement loop steps Originally, loop steps were implemented using `addi` and `constant` operations because `affine_apply` was not handled in the first implementation. The support for `affine_apply` has been added, use it to implement the update of the loop induction variable. This is more consistent with the lower and upper bounds of the loop that are also implemented as `affine_apply`, removes the dependence of the converted function on the StandardOps dialect and makes it clear from the CFG function that all operations on the loop induction variable are purely affine. PiperOrigin-RevId: 225165337	2019-03-29 14:26:22 -07:00
Uday Bondhugula	b9f53dc0bd	Update/Fix LoopUtils::stmtBodySkew to handle loop step. - loop step wasn't handled and there wasn't a TODO or an assertion; fix this. - rename 'delay' to shift for consistency/readability. - other readability changes. - remove duplicate attribute print for DmaStartOp; fix misplaced attribute print for DmaWaitOp - add build method for AddFOp (unrelated to this CL, but add it anyway) PiperOrigin-RevId: 224892958	2019-03-29 14:25:07 -07:00
Uday Bondhugula	d59a95a05c	Fix missing check for dependent DMAs in pipeline-data-transfer - adding a conservative check for now (TODO: use the dependence analysis pass once the latter is extended to deal with DMA ops). resolve an existing bug on a test case. - update test cases PiperOrigin-RevId: 224869526	2019-03-29 14:24:53 -07:00
Uday Bondhugula	6757fb151d	FlatAffineConstraints API cleanup; add normalizeConstraintsByGCD(). - add method normalizeConstraintsByGCD - call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of projectOut. - remove call to GCDTightenInequalities() from getMemRefRegion - change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each time an identifier is eliminated (to detect emptiness early). - make FourierMotzkinEliminate, gaussianEliminateId(s), GCDTightenInequalities() private - improve / update stale comments PiperOrigin-RevId: 224866741	2019-03-29 14:24:37 -07:00
Uday Bondhugula	2ef57806ba	Update/fix -pipeline-data-transfer; fix b/120770946 - fix replaceAllMemRefUsesWith call to replace only inside loop body. - handle the case where DMA buffers are dynamic; extend doubleBuffer() method to handle dynamically shaped DMA buffers (pass the right operands to AllocOp) - place alloc's for DMA buffers at the depth at which pipelining is being done (instead of at top-level) - add more test cases PiperOrigin-RevId: 224852231	2019-03-29 14:24:22 -07:00
Alex Zinenko	073c3ad997	Properly namespace createLowerAffineApply This was missing from the original commit. The implementation of createLowerAffineApply was defined in the default namespace but declared in the `mlir` namespace, which could lead to linking errors when it was used. Put the definition in `mlir` namespace. PiperOrigin-RevId: 224830894	2019-03-29 14:24:04 -07:00
Nicolas Vasilache	c28aeef901	[MLIR] Drop bug-prone global map indexed by MLFunction* PiperOrigin-RevId: 224610805	2019-03-29 14:23:49 -07:00
Uday Bondhugula	2d6478fa92	Extend loop tiling utility to handle non-constant loop bounds and bounds that are a max/min of several expressions. - Extend loop tiling to handle non-constant loop bounds and bounds that are a max/min of several expressions, i.e., bounds using multi-result affine maps - also fix b/120630124 as a result (the IR was in an invalid state when tiled loop generation failed; SSA uses were created that weren't plugged into the IR). PiperOrigin-RevId: 224604460	2019-03-29 14:23:34 -07:00
Uday Bondhugula	dfc752e42b	Generate strided DMAs from -dma-generate - generate DMAs correctly now using strided DMAs where needed - add support for multi-level/nested strides; op still supports one level of stride for now. Other things - add test case for symbolic lower/upper bound; cases where the DMA buffer size can't be bounded by a known constant - add test case for dynamic shapes where the DMA buffers are however bounded by constants - refactor some of the '-dma-generate' code PiperOrigin-RevId: 224584529	2019-03-29 14:23:19 -07:00
Nicolas Vasilache	d9b6420fc9	[MLIR] Add LowerVectorTransfersPass This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp to a simple loop nest via local buffer allocations. This is an MLIR->MLIR lowering based on builders. A few TODOs are left to address in particular: 1. invert the permutation map so the accesses to the remote memref are coalesced; 2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory); 3. support broadcast / avoid copies when permutation_map is not of full column rank 4. add a proper "element_cast" op One notable limitation is this does not plan on supporting boundary conditions. It should be significantly easier to use pre-baked MLIR functions to handle such paddings. This is left for future consideration. Therefore the current CL only works properly for full-tile cases atm. This CL also adds 2 simple tests: ```mlir for %i0 = 0 to %M step 3 { for %i1 = 0 to %N step 4 { for %i2 = 0 to %O { for %i3 = 0 to %P step 5 { vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index ``` lowers into: ```mlir for %i0 = 0 to %arg0 step 3 { for %i1 = 0 to %arg1 step 4 { for %i2 = 0 to %arg2 { for %i3 = 0 to %arg3 step 5 { %1 = alloc() : memref<5x4x3xf32> %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>> store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>> for %i4 = 0 to 5 { %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4) for %i5 = 0 to 4 { %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5) for %i6 = 0 to 3 { %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6) %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32> store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32> dealloc %1 : memref<5x4x3xf32> ``` and ```mlir for %i0 = 0 to %M step 3 { for %i1 = 0 to %N { for %i2 = 0 to %O { for %i3 = 0 to %P step 5 { %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, 
d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32> ``` lowers into: ```mlir for %i0 = 0 to %arg0 step 3 { for %i1 = 0 to %arg1 { for %i2 = 0 to %arg2 { for %i3 = 0 to %arg3 step 5 { %1 = alloc() : memref<5x4x3xf32> %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>> for %i4 = 0 to 5 { %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4) for %i5 = 0 to 4 { for %i6 = 0 to 3 { %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6) %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32> store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32> %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>> dealloc %1 : memref<5x4x3xf32> ``` PiperOrigin-RevId: 224552717	2019-03-29 14:23:05 -07:00
Nicolas Vasilache	879be718a0	[MLIR] Fix the name of the MaterializeVectorPass PiperOrigin-RevId: 224536381	2019-03-29 14:22:49 -07:00
Smit Hinsu	adca59e4f7	Return bool from all emitError methods similar to Operation::emitOpError This simplifies call-sites returning true after emitting an error. After the conversion, dropped braces around single statement blocks as that seems more common. Also, switched to emitError method instead of emitting Error kind using the emitDiagnostic method. TESTED with existing unit tests PiperOrigin-RevId: 224527868	2019-03-29 14:22:06 -07:00
Nicolas Vasilache	13bc77045e	[MLIR] Drop assert for NYI in Vectorize.cpp This CL adds proper error emission, removes NYI assertions and documents assumptions that are required in the relevant functions. PiperOrigin-RevId: 224377207	2019-03-29 14:21:37 -07:00
Nicolas Vasilache	5b610630b2	[MLIR] Error handling in MaterializeVectors This removes assertions as a means to capture NYI behavior and propagates errors up. PiperOrigin-RevId: 224376935	2019-03-29 14:20:37 -07:00
Nicolas Vasilache	4adc169bd0	[MLIR] Add AffineMap composition and use it in Materialization This CL adds the following free functions: ``` /// Returns the AffineExpr e o m. AffineExpr compose(AffineExpr e, AffineMap m); /// Returns the AffineMap f o g. AffineMap compose(AffineMap f, AffineMap g); ``` This addresses the issue that AffineMap composition is only available at a distance via AffineValueMap and is thus unusable on Attributes. This CL thus implements AffineMap composition in a more modular and composable way. This CL does not claim that it can be a good replacement for the implementation in AffineValueMap, in particular it does not support bounded maps atm. Standalone tests are added that replicate some of the logic of the AffineMap composition pass. Lastly, affine map composition is used properly inside MaterializeVectors and a standalone test is added that requires permutation_map composition with a projection map. PiperOrigin-RevId: 224376870	2019-03-29 14:20:22 -07:00
Nicolas Vasilache	df0a25efee	[MLIR] Add support for permutation_map This CL hooks up and uses permutation_map in vector_transfer ops. In particular, when going into the nuts and bolts of the implementation, it became clear that cases arose that required supporting broadcast semantics. Broadcast semantics are thus added to the general permutation_map. The verify methods and tests are updated accordingly. Examples of interest include: Example 1: The following MLIR snippet: ```mlir for %i3 = 0 to %M { for %i4 = 0 to %N { for %i5 = 0 to %P { %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32> }}} ``` may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into: ```mlir for %i3 = 0 to %0 step 32 { for %i4 = 0 to %1 { for %i5 = 0 to %2 step 256 { %4 = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: (d0, d1, d2) -> (d2, d1)} : (memref<?x?x?xf32>, index, index) -> vector<32x256xf32> }}} ``` Meaning that vector_transfer_read will be responsible for reading the 2-D slice: `%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into vector<32x256xf32>. This will require a transposition when vector_transfer_read is further lowered. Example 2: The following MLIR snippet: ```mlir %cst0 = constant 0 : index for %i0 = 0 to %M { %a0 = load %A[%cst0, %cst0] : memref<?x?xf32> } ``` may vectorize with {permutation_map: (d0) -> (0)} into: ```mlir for %i0 = 0 to %0 step 128 { %3 = vector_transfer_read %arg0, %c0_0, %c0_0 {permutation_map: (d0, d1) -> (0)} : (memref<?x?xf32>, index, index) -> vector<128xf32> } ``` Meaning that vector_transfer_read will be responsible for reading the 0-D slice `%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector broadcast when vector_transfer_read is further lowered. Additionally, some minor cleanups and refactorings are performed. One notable thing missing here is the composition with a projection map during materialization. 
This is because I could not find an AffineMap composition that operates on AffineMap directly: everything related to composition seems to require going through SSAValue and only operates on AffineMap at a distance via AffineValueMap. I have raised this concern a bunch of times already, the followup CL will actually do something about it. In the meantime, the projection is hacked at a minimum to pass verification and materialization tests are temporarily incorrect. PiperOrigin-RevId: 224376828	2019-03-29 14:20:07 -07:00
Alex Zinenko	7c89a225cf	ConvertToCFG: support min/max in loop bounds. The recently introduced `select` operation enables ConvertToCFG to support min(max) in loop bounds. Individual min(max) is implemented as `cmpi "lt"`(`cmpi "gt"`) followed by a `select` between the compared values. Multiple results of an `affine_apply` operation extracted from the loop bounds are reduced using min(max) in a sequential manner. While this may decrease the potential for instruction-level parallelism, it is easier to recognize for the following passes, in particular for the vectorizer. PiperOrigin-RevId: 224376233	2019-03-29 14:19:52 -07:00
Alex Zinenko	513d6d896c	OpPointer: replace conversion operator to Operation* to OpType. The implementation of OpPointer<OpType> provides an implicit conversion to Operation*, but not to the underlying OpType*. This has led to awkward-looking code when an OpPointer needs to be passed to a function accepting an OpType*. For example, if (auto someOp = genericOp.dyn_cast<OpType>()) someFunction(&someOp); where "&" makes it harder to read. Arguably, one does not want to spell out OpPointer<OpType> in the line with dyn_cast. More generally, OpPointer is now being used as an owning pointer to OpType rather than to operation. Replace the implicit conversion to Operation* with the conversion to OpType* taking into account const-ness of the type. An Operation* can be obtained from an OpType with a simple call. Since an instance of OpPointer owns the OpType value, the pointer to it is never null. However, the OpType value may not be associated with any Operation*. In this case, return nullptr when conversion is attempted to maintain consistency with the existing null checks. PiperOrigin-RevId: 224368103	2019-03-29 14:19:37 -07:00
Uday Bondhugula	73fc0223e4	Fix cases where unsigned / signed arithmetic was being mixed (following up on cl/224246657); eliminate repeated evaluation of exprs in loop upper bounds. - while on this, sweep through and fix potential repeated evaluation of expressions in loop upper bounds PiperOrigin-RevId: 224268918	2019-03-29 14:19:22 -07:00
Uday Bondhugula	a92130880e	Complete multiple unhandled cases for DmaGeneration / getMemRefRegion; update/improve/clean up API. - update FlatAffineConstraints::getConstBoundDifference; return constant differences between symbolic affine expressions, look at equalities as well. - fix buffer size computation when generating DMAs symbolic in outer loops, correctly handle symbols at various places (affine access maps, loop bounds, loop IVs outer to the depth at which DMA generation is being done) - bug fixes / complete some TODOs for getMemRefRegion - refactor common code b/w memref dependence check and getMemRefRegion - FlatAffineConstraints API update; added methods employ trivial checks / detection - sufficient to handle hyper-rectangular cases in a precise way while being fast / low complexity. Hyper-rectangular cases fall out as trivial cases for these methods while other cases still do not cause failure (either return conservative or return failure that is handled by the caller). PiperOrigin-RevId: 224229879	2019-03-29 14:18:22 -07:00
Alex Zinenko	7868abd9d8	ConvertToCFG: convert "if" statements. The condition of the "if" statement is an integer set, defined as a conjunction of affine constraints. An affine constraint consists of an affine expression and a flag indicating whether the expression is strictly equal to zero or is also allowed to be greater than zero. Affine maps, accepted by `affine_apply`, are also formed from affine expressions. Leverage this fact to implement the checking of "if" conditions. Each affine expression from the integer set is converted into an affine map. This map is applied to the arguments of the "if" statement. The result of the application is compared with zero given the equality flag to obtain the final boolean value. The conjunction of conditions is tested sequentially with short-circuit branching to the "else" branch if any of the conditions evaluates to false. Create an SESE region for the if statement (including its "then" and optional "else" statement blocks) and append it to the end of the current region. The conditional region consists of a sequence of condition-checking blocks that implement the short-circuit scheme, followed by a "then" SESE region and an "else" SESE region, and the continuation block that post-dominates all blocks of the "if" statement. The flow of blocks that correspond to the "then" and "else" clauses are constructed recursively, enabling easy nesting of "if" statements and if-then-else-if chains. Note that MLIR semantics neither requires nor prohibits short-circuit evaluation. Since affine expressions do not have side effects, there is no observable difference in the program behavior. We may trade off extra operations for operation-level parallelism opportunity by first performing all `affine_apply` and comparison operations independently, and then performing a tree pattern reduction of the resulting boolean values with the `muli i1` operations (in absence of the dedicated bit operations). 
The pros and cons are not clear, and since MLIR does not include parallel semantics, we prefer to minimize the number of sequentially executed operations. PiperOrigin-RevId: 223970248	2019-03-29 14:16:10 -07:00
Nicolas Vasilache	b39d1f0bdb	[MLIR] Add VectorTransferOps This CL implements and uses VectorTransferOps in lieu of the former custom call op. Tests are updated accordingly. VectorTransferOps come in 2 flavors: VectorTransferReadOp and VectorTransferWriteOp. VectorTransferOps can be thought of as a backend-independent pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before it can be lowered to backend-dependent IR. Note that the current implementation does not yet support a real permutation map. Proper support will come in a followup CL. VectorTransferReadOp ==================== VectorTransferReadOp performs a blocking read from a scalar memref location into a super-vector of the same elemental type. This operation is called 'read' by opposition to 'load' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile only code. A vector transfer read has semantics similar to a vector load, with additional support for: 1. an optional value of the elemental type of the MemRef. This value supports non-effecting padding and is inserted in places where the vector read exceeds the MemRef bounds. If the value is not specified, the access is statically guaranteed to be within bounds; 2. an attribute of type AffineMap to specify a slice of the original MemRef access and its transposition into the super-vector shape. The permutation_map is an unbounded AffineMap that must represent a permutation from the MemRef dim space projected onto the vector dim space. Example: ```mlir %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32> ... 
%val = `ssa-value` : f32 // let %i, %j, %k, %l be ssa-values of type index %v0 = vector_transfer_read %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index) -> vector<16x32x64xf32> %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index, f32) -> vector<16x32x64xf32> ``` VectorTransferWriteOp ===================== VectorTransferWriteOp performs a blocking write from a super-vector to a scalar memref of the same elemental type. This operation is called 'write' by opposition to 'store' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile only code. A vector transfer write has semantics similar to a vector store, with additional support for handling out-of-bounds situations. Example: ```mlir %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>. %val = `ssa-value` : vector<16x32x64xf32> // let %i, %j, %k, %l be ssa-values of type index vector_transfer_write %val, %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index) ``` PiperOrigin-RevId: 223873234	2019-03-29 14:15:25 -07:00
Uday Bondhugula	5f76245cfe	Minor fix for replaceAllMemRefUsesWith. The check for whether the memref was used in a non-dereferencing context had to be done inside, i.e., only for the op stmt's that the replacement was specified to be performed on (by the domStmtFilter arg if provided). As such, it is completely fine for example for a function to return a memref while the replacement is being performed only on a specific loop's body (as in the case of DMA generation). PiperOrigin-RevId: 223827753	2019-03-29 14:14:43 -07:00
River Riddle	7669a259c4	Add a simple common sub expression elimination pass. The algorithm collects defining operations within a scoped hash table. The scopes within the hash table correspond to nodes within the dominance tree for a function. This CL only adds support for simple operations, i.e. non-side-effecting ones. Side-effecting operations, e.g. load/store/call, will be handled in later patches. PiperOrigin-RevId: 223811328	2019-03-29 14:14:28 -07:00
Uday Bondhugula	a619b5c295	Debug output / logging memref sizes in DMA generation + related changes - Add method to get a memref's size in bytes - clean up a loop tiling pass helper (NFC) PiperOrigin-RevId: 223422077	2019-03-29 14:12:56 -07:00
Chris Lattner	3f2530cdf5	Split "rewrite" functionality out of Pattern into a new RewritePattern derived class. This change is NFC, but allows for new kinds of patterns, specifically LegalizationPatterns which will be allowed to change the types of things they rewrite. PiperOrigin-RevId: 223243783	2019-03-29 14:12:07 -07:00
Alex Zinenko	68e9721aa8	Rename Deaffinator to LowerAffineApply and patch it. Several things were suggested in post-submission reviews. In particular, use pointers in function interfaces instead of references (still use references internally). Clarify the behavior of the pass in presence of MLFunctions. PiperOrigin-RevId: 222556851	2019-03-29 14:08:59 -07:00
Nicolas Vasilache	63bc6d2f6a	[MLIR] Fix opt build PiperOrigin-RevId: 222491353	2019-03-29 14:08:45 -07:00
Nicolas Vasilache	a5782f0d40	[MLIR][MaterializeVectors] Add a MaterializeVector pass via unrolling. This CL adds an MLIR-MLIR pass which materializes super-vectors to hardware-dependent sized vectors. While the physical vector size is target-dependent, the pass is written in a target-independent way: the target vector size is specified as a parameter to the pass. This pass is thus a partial lowering that opens the "greybox" that is the super-vector abstraction. This first CL adds a first materialization pass that iterates over vector_transfer_write operations and: 1. computes the program slice including the current vector_transfer_write; 2. computes the multi-dimensional ratio of super-vector shape to hardware vector shape; 3. for each possible multi-dimensional value within the bounds of ratio, a new slice is instantiated (i.e. cloned and rewritten) so that all operations in this instance operate on the hardware vector type. As a simple example, given: ```mlir mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> { %A = alloc (%M, %N) : memref<?x?xf32> %B = alloc (%M, %N) : memref<?x?xf32> %C = alloc (%M, %N) : memref<?x?xf32> for %i0 = 0 to %M { for %i1 = 0 to %N { %a1 = load %A[%i0, %i1] : memref<?x?xf32> %b1 = load %B[%i0, %i1] : memref<?x?xf32> %s1 = addf %a1, %b1 : f32 store %s1, %C[%i0, %i1] : memref<?x?xf32> } } return %C : memref<?x?xf32> } ``` and the following options: ``` -vectorize -virtual-vector-size 32 --test-fastest-varying=0 -materialize-vectors -vector-size=8 ``` materialization emits: ```mlir #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0, d1 + 8) #map2 = (d0, d1) -> (d0, d1 + 16) #map3 = (d0, d1) -> (d0, d1 + 24) mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> { %0 = alloc(%arg0, %arg1) : memref<?x?xf32> %1 = alloc(%arg0, %arg1) : memref<?x?xf32> %2 = alloc(%arg0, %arg1) : memref<?x?xf32> for %i0 = 0 to %arg0 { for %i1 = 0 to %arg1 step 32 { %3 = affine_apply #map0(%i0, %i1) %4 = "vector_transfer_read"(%0, 
%3#0, %3#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %5 = affine_apply #map1(%i0, %i1) %6 = "vector_transfer_read"(%0, %5#0, %5#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %7 = affine_apply #map2(%i0, %i1) %8 = "vector_transfer_read"(%0, %7#0, %7#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %9 = affine_apply #map3(%i0, %i1) %10 = "vector_transfer_read"(%0, %9#0, %9#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %11 = affine_apply #map0(%i0, %i1) %12 = "vector_transfer_read"(%1, %11#0, %11#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %13 = affine_apply #map1(%i0, %i1) %14 = "vector_transfer_read"(%1, %13#0, %13#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %15 = affine_apply #map2(%i0, %i1) %16 = "vector_transfer_read"(%1, %15#0, %15#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %17 = affine_apply #map3(%i0, %i1) %18 = "vector_transfer_read"(%1, %17#0, %17#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %19 = addf %4, %12 : vector<8xf32> %20 = addf %6, %14 : vector<8xf32> %21 = addf %8, %16 : vector<8xf32> %22 = addf %10, %18 : vector<8xf32> %23 = affine_apply #map0(%i0, %i1) "vector_transfer_write"(%19, %2, %23#0, %23#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %24 = affine_apply #map1(%i0, %i1) "vector_transfer_write"(%20, %2, %24#0, %24#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %25 = affine_apply #map2(%i0, %i1) "vector_transfer_write"(%21, %2, %25#0, %25#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %26 = affine_apply #map3(%i0, %i1) "vector_transfer_write"(%22, %2, %26#0, %26#1) : (vector<8xf32>, 
memref<?x?xf32>, index, index) -> () } } return %2 : memref<?x?xf32> } ``` PiperOrigin-RevId: 222455351	2019-03-29 14:08:31 -07:00
Nicolas Vasilache	258dae5d73	[MLIR][Slicing] Apply cleanups This CL applies a few last cleanups from a previous CL that have been missed during the previous submit. PiperOrigin-RevId: 222454774	2019-03-29 14:08:17 -07:00
Nicolas Vasilache	5c16564bca	[MLIR][Slicing] Add utils for computing slices. This CL adds tooling for computing slices as an independent CL. The first consumer of this analysis will be super-vector materialization in a followup CL. In particular, this adds: 1. a getForwardStaticSlice function with documentation, example and a standalone unit test; 2. a getBackwardStaticSlice function with documentation, example and a standalone unit test; 3. a getStaticSlice function with documentation, example and a standalone unit test; 4. a topologicalSort function that is exercised through the getStaticSlice unit test. The getXXXStaticSlice functions take an additional root (resp. terminators) parameter which acts as a boundary that the transitive propagation algorithm is not allowed to cross. PiperOrigin-RevId: 222446208	2019-03-29 14:08:02 -07:00
Uday Bondhugula	2631b155a9	Fix bugs in DMA generation and FlatAffineConstraints; add more test cases. - fix bug in calculating index expressions for DMA buffers in certain cases (affected tiled loop nests); add more test cases for better coverage. - introduce an additional optional argument to replaceAllMemRefUsesWith; additional operands to the index remap AffineMap can now be supplied by the client. - FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound, ::composeMap - fix position bug. - Some clean up and more comments PiperOrigin-RevId: 222434628	2019-03-29 14:07:31 -07:00
Alex Zinenko	615c41c788	Introduce Deaffinator pass. This function pass replaces affine_apply operations in CFG functions with sequences of primitive arithmetic instructions that form the affine map. The actual replacement functionality is located in LoweringUtils as a standalone function operating on an individual affine_apply operation and inserting the result at the location of the original operation. It is expected to be useful for other, target-specific lowering passes that may start at MLFunction level that Deaffinator does not support. PiperOrigin-RevId: 222406692	2019-03-29 14:07:16 -07:00
Uday Bondhugula	b6c03917ad	Remove allocations for memref's that become dead as a result of double buffering in the auto DMA overlap pass. This is done online in the pass. PiperOrigin-RevId: 222313640	2019-03-29 14:05:19 -07:00
Nicolas Vasilache	87d46aaf4b	[MLIR][Vectorize] Refactor Vectorize use-def propagation. This CL refactors a few things in Vectorize.cpp: 1. a clear distinction is made between: a. the LoadOp, which are the roots of vectorization and must be vectorized eagerly and propagate their value; and b. the StoreOp, which are the terminals of vectorization and must be vectorized late (i.e. they do not produce values that need to be propagated). 2. the StoreOp must be vectorized late because in general it can store a value that is not reachable from the subset of loads defined in the current pattern. One trivial such case is storing a constant defined at the top-level of the MLFunction and that needs to be turned into a splat. 3. a description of the algorithm is given; 4. the implementation matches the algorithm; 5. the last example is made parametric, in practice it will fully rely on the implementation of vector_transfer_read/write which will handle boundary conditions and padding. This will happen by lowering to a lower-level abstraction either: a. directly in MLIR (whether DMA or just loops or any async tasks in the future) (whiteboxing); b. in LLO/LLVM-IR/whatever blackbox library call/ search + swizzle inventor one may want to use; c. a partial mix of a. and b. (grey-boxing) 6. minor cleanups are applied; 7. mistakenly disabled unit tests are re-enabled (oopsie). 
With this CL, this MLIR snippet: ``` mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> { %A = alloc (%M, %N) : memref<?x?xf32> %B = alloc (%M, %N) : memref<?x?xf32> %C = alloc (%M, %N) : memref<?x?xf32> %f1 = constant 1.0 : f32 %f2 = constant 2.0 : f32 for %i0 = 0 to %M { for %i1 = 0 to %N { // non-scoped %f1 store %f1, %A[%i0, %i1] : memref<?x?xf32> } } for %i4 = 0 to %M { for %i5 = 0 to %N { %a5 = load %A[%i4, %i5] : memref<?x?xf32> %b5 = load %B[%i4, %i5] : memref<?x?xf32> %s5 = addf %a5, %b5 : f32 // non-scoped %f1 %s6 = addf %s5, %f1 : f32 store %s6, %C[%i4, %i5] : memref<?x?xf32> } } return %C : memref<?x?xf32> } ``` vectorized with these arguments: ``` -vectorize -virtual-vector-size 256 --test-fastest-varying=0 ``` vectorization produces this standard innermost-loop vectorized code: ``` mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> { %0 = alloc(%arg0, %arg1) : memref<?x?xf32> %1 = alloc(%arg0, %arg1) : memref<?x?xf32> %2 = alloc(%arg0, %arg1) : memref<?x?xf32> %cst = constant 1.000000e+00 : f32 %cst_0 = constant 2.000000e+00 : f32 for %i0 = 0 to %arg0 { for %i1 = 0 to %arg1 step 256 { %cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32> "vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> () } } for %i2 = 0 to %arg0 { for %i3 = 0 to %arg1 step 256 { %3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32> %4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32> %5 = addf %3, %4 : vector<256xf32> %cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32> %6 = addf %5, %cst_2 : vector<256xf32> "vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> () } } return %2 : memref<?x?xf32> } ``` Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for 
now. PiperOrigin-RevId: 222280209	2019-03-29 14:03:50 -07:00
Alex Zinenko	f986d5920b	ConvertToCFG: handle 1D affine loop bounds. In the general case, loop bounds can be expressed as affine maps of the outer loop iterators and function arguments. Relax the check for loop bounds to be known integer constants and also accept one-dimensional affine bounds in ConvertToCFG ForStmt lowering. Emit affine_apply operations for both the upper and the lower bound. The semantics of MLFunctions guarantees that both bounds can be computed before the loop starts iterating. Constant bounds are merely a short-hand notation for zero-dimensional affine maps and get supported transparently. Multidimensional affine bounds are not yet supported because the target IR dialect lacks min/max operations necessary to implement the corresponding semantics. PiperOrigin-RevId: 222275801	2019-03-29 14:03:20 -07:00
Jacques Pienaar	d0590caa90	Add op stats pass to mlir-opt. op-stats pass currently returns the number of occurrences of different operations in a Module. Useful for verifying transformation properties (e.g., 3 ops of specific dialect, 0 of another), but probably not useful outside of that so keeping it local to mlir-opt. This does not consider op attributes when counting. PiperOrigin-RevId: 222259727	2019-03-29 14:02:46 -07:00
Nicolas Vasilache	89d9913a20	[MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests This CL adds some vector support in anticipation of the upcoming vector materialization pass. In particular this CL adds 2 functions to: 1. compute the multiplicity of a subvector shape in a supervector shape; 2. help match operations on strict super-vectors. This is defined for a given subvector shape as an operation that manipulates a vector type that is an integral multiple of the subtype, with multiplicity at least 2. This CL also adds a TestUtil pass where we can dump arbitrary testing of functions and analysis that operate at a much smaller granularity than a pass (e.g. an analysis for which it is convenient to write a bit of artificial MLIR and write some custom test). This is in order to keep using Filecheck for things that essentially look and feel like C++ unit tests. PiperOrigin-RevId: 222250910	2019-03-29 14:02:17 -07:00
Uday Bondhugula	fff1efbaf5	Updates to transformation/analysis passes/utilities. Update DMA generation pass and getMemRefRegion() to work with specified loop depths; add support for outgoing DMAs, store op's. - add support for getMemRefRegion symbolic in outer loops - hence support for DMAs symbolic in outer surrounding loops. - add DMA generation support for outgoing DMAs (store op's to lower memory space); extend getMemoryRegion to store op's. -memref-bound-check now works with store op's as well. - fix dma-generate (references to the old memref in the dma_start op were also being replaced with the new buffer); we need replace all memref uses to work only on a subset of the uses - add a new optional argument for replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional 'operation' argument to serve as a filter - if provided, only those uses that are dominated by the filter are replaced. - Add missing print for attributes for dma_start, dma_wait op's. - update the FlatAffineConstraints API PiperOrigin-RevId: 221889223	2019-03-29 14:00:51 -07:00
River Riddle	d34fcce2a7	[MLIR] Rename OperationInst to Instruction. PiperOrigin-RevId: 221795407	2019-03-29 14:00:09 -07:00
River Riddle	503caf0722	Replace TerminatorInst with builtin terminator operations. Note: Terminators will be merged into the operations list in a follow up patch. PiperOrigin-RevId: 221670037	2019-03-29 13:58:55 -07:00
Alex Zinenko	d030433443	ConvertToCFG: properly remap nested function attributes. Array attributes can be nested and function attributes can appear anywhere at that level. They should be remapped to point to the generated CFGFunction after ML-to-CFG conversion, similarly to plain function attributes. Extract the nested attribute remapping functionality from the Parser to Utils. Extract out the remapping function for individual Functions from the module remapping function. Use these new functions in the ML-to-CFG conversion pass and in the parser. PiperOrigin-RevId: 221510997	2019-03-29 13:57:58 -07:00
Alex Zinenko	cb40633969	Move definitions of loopUnroll* functions to LoopUtils.cpp. These functions are declared in Transforms/LoopUtils.h (included in the Transforms/Utils library) but were defined in the loop unrolling pass in Transforms/LoopUnroll.cpp. As a result, targets depending only on TransformUtils library but not on Transforms could get link errors. Move the definitions to Transforms/Utils/LoopUtils.cpp where they should actually live. This does not modify any code. PiperOrigin-RevId: 221508882	2019-03-29 13:57:44 -07:00
Nicolas Vasilache	fefbf91314	[MLIR] Support for vectorizing operations. This CL adds support for and a vectorization test to perform scalar 2-D addf. The support extension notably comprises: 1. extend vectorizable test to exclude vector_transfer operations and expose them to LoopAnalysis where they are needed. This is a temporary solution until a concrete MLIR Op exists; 2. add some more functional sugar mapKeys, apply and ScopeGuard (which became relevant again); 3. fix improper shifting during coarsening; 4. rename unaligned load/store to vector_transfer_read/write and simplify the design removing the unnecessary AllocOp that were introduced prematurely: vector_transfer_read currently has the form: (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32> vector_transfer_write currently has the form: (vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> () 5. add vectorizeOperations which traverses the operations in a ForStmt and rewrites them to their vector form; 6. add support for vector splat from a constant. The relevant tests are also updated. PiperOrigin-RevId: 221421426	2019-03-29 13:56:47 -07:00
Alex Zinenko	5a0d3d0204	Basic conversion of MLFunctions to CFGFunctions. Implement a pass converting a subset of MLFunctions to CFGFunctions. Currently supports arbitrarily complex imperfect loop nests with statically constant (i.e., not affine map) bounds filled with operations. Does NOT support branches and non-constant loop bounds. Conversion is performed per-function and the function names are preserved to avoid breaking any external references to the current module. In-memory IR is updated to point to the right functions in direct calls and constant loads. This behavior is tested via a really hidden flag that enables function renaming. Inside each function, the control flow conversion is based on single-entry single-exit regions, i.e. subgraphs of the CFG that have exactly one incoming and exactly one outgoing edge. Since an MLFunction must have a single "return" statement as per MLIR spec, it constitutes an SESE region. Individual operations are appended to this region. Control flow statements are recursively converted into such regions that are concatenated with the current region. Bodies of the compound statement also form SESE regions, which allows nesting control flow statements easily. Note that SESE regions are not materialized in the code. It is sufficient to keep track of the end of the region as the current instruction insertion point as long as all recursive calls update the insertion point in the end. The converter maintains a mapping between SSA values in ML functions and their CFG counterparts. The mapping is used to find the operands for each operation and is updated to contain the results of each operation as the conversion continues. PiperOrigin-RevId: 221162602	2019-03-29 13:55:22 -07:00
Jacques Pienaar	25e6b541cd	Switch IntegerAttr to use APInt. Change the storage type to APInt from int64_t for IntegerAttr (following the change to APFloat storage in FloatAttr). Effectively a direct change from int64_t to 64-bit APInt throughout (the bitwidth hardcoded). This change also adds a getInt convenience method to IntegerAttr and replaces previous getValue calls with getInt calls. While this change updates the storage type, it does not update all constant folding calls. PiperOrigin-RevId: 221082788	2019-03-29 13:55:08 -07:00
MLIR Team	b5424dd0cb	Adds support for returning the direction of the dependence between memref accesses (distance/direction vectors). Updates MemRefDependenceCheck to check and report on all memref access pairs at all loop nest depths. Updates old and adds new memref dependence check tests. Resolves multiple TODOs. PiperOrigin-RevId: 220816515	2019-03-29 13:53:28 -07:00
Uday Bondhugula	e0623d4b86	Automatic DMA generation for simple cases. - constant bounded memory regions, static shapes, no handling of overlapping/duplicate regions (through union) for now; also only, load memory op's. - add build methods for DmaStartOp, DmaWaitOp. - move getMemoryRegion() into Analysis/Utils and expose it. - fix addIndexSet, getMemoryRegion() post switch to exclusive upper bounds; update test cases for memref-bound-check and memref-dependence-check for exclusive bounds (missed in a previous CL) PiperOrigin-RevId: 220729810	2019-03-29 13:53:14 -07:00
River Riddle	2fa4bc9fc8	Implement value type abstraction for locations. Value type abstraction for locations differ from others in that a Location can NOT be null. NOTE: dyn_cast returns an Optional<T>. PiperOrigin-RevId: 220682078	2019-03-29 13:52:31 -07:00
Uday Bondhugula	23ddd577ef	Complete migration to exclusive upper bound cl/220448963 had missed a part of the updates. - while on this, clean up some of the test cases to use ops' custom forms. PiperOrigin-RevId: 220675303	2019-03-29 13:52:17 -07:00
Jacques Pienaar	cc9a6ed09d	Initialize Pass with PassID. The passID is not currently stored in Pass but this avoids the unused variable warning. The passID is used to uniquely identify passes, currently this is only stored/used in PassInfo. PiperOrigin-RevId: 220485662	2019-03-29 13:50:34 -07:00
Nicolas Vasilache	cde8248753	[MLIR] Make upper bound implementation exclusive This CL implements exclusive upper bound behavior as per b/116854378. A followup CL will update the semantics of the for loop. PiperOrigin-RevId: 220448963	2019-03-29 13:49:49 -07:00
Jacques Pienaar	6f0fb22723	Add static pass registration Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage. Change build targets to alwayslink to enforce registration. PiperOrigin-RevId: 220390178	2019-03-29 13:49:34 -07:00
Uday Bondhugula	6cd5d5c544	Introduce loop tiling code generation (hyper-rectangular case) - simple perfectly nested band tiling with fixed tile sizes. - only the hyper-rectangular case is handled, with other limitations of getIndexSet applying (constant loop bounds, etc.); once the latter utility is extended, tiled code generation should become more general. - Add FlatAffineConstraints::isHyperRectangular() PiperOrigin-RevId: 220324933	2019-03-29 13:49:05 -07:00
MLIR Team	f28e4df666	Adds a dependence check to test whether two accesses to the same memref access the same element. - Builds access functions and iterations domains for each access. - Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds. - Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses. - Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops. - The LoopFusion pass will be extended in subsequent CLs. PiperOrigin-RevId: 219630898	2019-03-29 13:47:13 -07:00
Nicolas Vasilache	21638dcda9	[MLIR] Extend vectorization to 2+-D patterns This CL adds support for vectorization using more interesting 2-D and 3-D patterns. Note in particular the fact that we match some pretty complex imperfectly nested 2-D patterns with a quite minimal change to the implementation: we just add a bit of recursion to traverse the matched patterns and actually vectorize the loops. For instance, vectorizing the following loop by 128: ``` for %i3 = 0 to %0 { %7 = affine_apply (d0) -> (d0)(%i3) %8 = load %arg0[%c0_0, %7] : memref<?x?xf32> } ``` Currently generates: ``` #map0 = ()[s0] -> (s0 + 127) #map1 = (d0) -> (d0) for %i3 = 0 to #map0()[%0] step 128 { %9 = affine_apply #map1(%i3) %10 = alloc() : memref<1xvector<128xf32>> %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) : (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) -> (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) %12 = load %10[%c0] : memref<1xvector<128xf32>> } ``` The above is subject to evolution. PiperOrigin-RevId: 219629745	2019-03-29 13:46:58 -07:00
Jacques Pienaar	e1f9e65b9a	Enable constructing a FuncBuilder using a Operation*. FuncBuilder is useful to build a operation to replace an existing operation, so change the constructor to allow constructing it with an existing operation. Change FuncBuilder to contain (effectively) a tagged union of CFGFuncBuilder and MLFuncBuilder (as these should be cheap to copy and avoid allocating/deletion when created via a operation). PiperOrigin-RevId: 219532952	2019-03-29 13:46:22 -07:00
Uday Bondhugula	8201e19e3d	Introduce memref bound checking. Introduce analysis to check memref accesses (in MLFunctions) for out of bound ones. It works as follows: $ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension #1 %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension #1 %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension #2 %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension #2 %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension #1 %y = load %B[%idy] : memref<128 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension #1 %y = load %B[%idy] : memref<128 x i32> ^ #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0 * 128 - d1) mlfunc @test() { %0 = alloc() : memref<9x9xi32> %1 = alloc() : memref<128xi32> for %i0 = -1 to 9 { for %i1 = -1 to 9 { %2 = affine_apply #map0(%i0, %i1) %3 = load %0[%2#0, %2#1] : memref<9x9xi32> %4 = affine_apply #map1(%i0, %i1) %5 = load %1[%4] : memref<128xi32> } } return } - Improves productivity while manually / semi-automatically developing MLIR for testing / prototyping; also provides an indirect way to catch errors in transformations. - This pass is an easy way to test the underlying affine analysis machinery including low level routines. 
Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256. While on this: - create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/ - fix a bug in AffineAnalysis.cpp::toAffineExpr TODO: extend to non-constant loop bounds (straightforward). Will transparently work for all accesses once floordiv, mod, ceildiv are supported in the AffineMap -> FlatAffineConstraints conversion. PiperOrigin-RevId: 219397961	2019-03-29 13:46:08 -07:00
River Riddle	4c465a181d	Implement value type abstraction for types. This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast. PiperOrigin-RevId: 219372163	2019-03-29 13:45:54 -07:00
Nicolas Vasilache	af7f56fdf8	[MLIR] Implement 1-D vectorization for fastest varying load/stores This CL is a first in a series that implements early vectorization of increasingly complex patterns. In particular, early vectorization will support arbitrary loop nesting patterns (both perfectly and imperfectly nested), at arbitrary depths in the loop tree. This first CL builds the minimal support for applying 1-D patterns. It relies on an unaligned load/store op abstraction that can be implemented differently on different HW. Future CLs will support higher dimensional patterns, but 1-D patterns already exhibit interesting properties. In particular, we want to separate pattern matching (i.e. legality both structural and dependency analysis based), from profitability analysis, from application of the transformation. As a consequence patterns may intersect and we need to verify that a pattern can still apply by the time we get to applying it. A non-greedy analysis on profitability that takes into account pattern intersection is left for future work. Additionally the CL makes the following cleanups: 1. the matches method now returns a value, not a reference; 2. added comments about the MLFunctionMatcher and MLFunctionMatches usage by value; 3. added size and empty methods to matches; 4. added a negative vectorization test with a conditional, this exhibited a bug in the iterators. Iterators now return nullptr if the underlying storage is nullptr. PiperOrigin-RevId: 219299489	2019-03-29 13:44:26 -07:00
Chris Lattner	085b687fbd	Add support for walking the use list of an SSAValue and converting owners to Operation*'s, simplifying some code in GreedyPatternRewriteDriver.cpp. Also add print/dump methods on Operation. PiperOrigin-RevId: 219045764	2019-03-29 13:43:01 -07:00
Chris Lattner	967d934180	Fix two issues: 1) We incorrectly reassociated non-reassociative operations like subi, causing miscompilations. 2) When constant folding, we didn't add users of the new constant back to the worklist for reprocessing, causing us to miss some cases (pointed out by Uday). The code for #2 is gross, but I'll add the new APIs in a followup patch. PiperOrigin-RevId: 218803984	2019-03-29 13:40:35 -07:00
Chris Lattner	adbba70d82	Simplify FunctionPass to eliminate the CFGFunctionPass/MLFunctionPass distinction. FunctionPasses can now choose to get called on all functions, or have the driver split CFG/ML Functions up for them. NFC. PiperOrigin-RevId: 218775885	2019-03-29 13:40:05 -07:00
Chris Lattner	7de0da9594	Refactor all of the canonicalization patterns out of the Canonicalize pass, and make operations provide a list of canonicalizations that can be applied to them. This allows canonicalization to be general to any IR definition. As part of this, sink PatternMatch.h/cpp down to the IR library to fix a layering problem. PiperOrigin-RevId: 218773981	2019-03-29 13:39:49 -07:00
River Riddle	792d1c25e4	Implement value type abstraction for attributes. This is done by changing Attribute to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast. PiperOrigin-RevId: 218764173	2019-03-29 13:39:19 -07:00
Chris Lattner	64d52014bd	Move transform utilities out to their own TransformUtils library, instead of just having the pattern matcher in its own library. At this point, lib/Transforms/*.cpp are all actually passes themselves (and will probably eventually be themselves moved to a new subdirectory as we accrete more). PiperOrigin-RevId: 218745193	2019-03-29 13:39:06 -07:00
Chris Lattner	92285814e2	Refactor the bulk of the worklist driver out of the canonicalizer into its own helper function, in preparation for it being used by other passes. There is still a lot of room for improvement in its design, this patch is intended as an NFC refactoring, and the improvements will continue after this lands. PiperOrigin-RevId: 218737116	2019-03-29 13:38:52 -07:00
Uday Bondhugula	80610c2f49	Introduce Fourier-Motzkin variable elimination + other cleanup/support - Introduce Fourier-Motzkin variable elimination to eliminate a dimension from a system of linear equalities/inequalities. Update isEmpty to use this. Since FM is only exact on rational/real spaces, an emptiness check based on this is guaranteed to be exact whenever it says the underlying set is empty; if it says it's not empty, there may still be no integer points in it. Also, supports a version that computes "dark shadows". - Test this by checking for "always false" conditionals in if statements. - Unique IntegerSet's that are small (few constraints, few variables). This basically means the canonical empty set and other small sets that are likely commonly used get uniqued; allows checking for the canonical empty set by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint size for uniquing. - rename simplify-affine-expr -> simplify-affine-structures Other cleanup - IntegerSet::numConstraints, AffineMap::numResults are no longer needed; remove them. - add copy assignment operators for AffineMap, IntegerSet. - rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet - Misc cleanup for FlatAffineConstraints API PiperOrigin-RevId: 218690456	2019-03-29 13:38:24 -07:00
MLIR Team	5413239350	Adds Gaussian Elimination to FlatAffineConstraints. - Adds FlatAffineConstraints::isEmpty method to test if there are no solutions to the system. - Adds GCD test check if equality constraints have no solution. - Adds unit test cases. PiperOrigin-RevId: 218546319	2019-03-29 13:38:10 -07:00
Lei Zhang	52a0e58bdb	Change typedef to using to be consistent across the codebase Google C++ style guide also prefers using to typedef. PiperOrigin-RevId: 218541849	2019-03-29 13:37:55 -07:00
Chris Lattner	bd01f9541f	Teach canonicalize pass to unique and hoist constants to the entry block. This is a straight-forward change, but required adding missing moveBefore() methods on operations (requiring moving some traits around to make C++ happy). This also fixes a constness issue with the getBlock/getFunction() methods on Instruction, and adds a missing getFunction() method on MLFuncBuilder. PiperOrigin-RevId: 218523905	2019-03-29 13:36:59 -07:00
Chris Lattner	301f83f906	Implement shape folding in the canonicalization pass: - Add a few canonicalization patterns to fold memref_cast into load/store/dealloc. - Canonicalize alloc(constant) into an alloc with a constant shape followed by a cast. - Add a new PatternRewriter::updatedRootInPlace API to make this more convenient. SimplifyAllocConst and the testcase is heavily based on Uday's implementation work, just in a different framework. PiperOrigin-RevId: 218361237	2019-03-29 13:36:31 -07:00
Uday Bondhugula	ccfe593715	PassResult return cleanup. - return success as long as IR is in a valid state. PiperOrigin-RevId: 218225317	2019-03-29 13:35:47 -07:00
Chris Lattner	a03051b9c4	Add a pattern (x+0) -> x, generalize Canonicalize to CFGFunc's, address a few TODOs, and add some casting support to Operation. PiperOrigin-RevId: 218219340	2019-03-29 13:35:33 -07:00
Chris Lattner	7850258c49	Introduce a new Operation::erase helper to generalize some code in the pattern matcher / canonicalizer, and rename existing eraseFromBlock methods to align with it. PiperOrigin-RevId: 218104455	2019-03-29 13:34:51 -07:00
Chris Lattner	73a802741e	Introduce a new PatternRewriter class to help keep the worklist in PatternMatcher clients up to date and provide a funnel point for newly added operations. This is also progress towards the canonicalizer supporting CFGFunctions. This paves the way for more complex patterns, but by itself doesn't do much useful, so no testcase. PiperOrigin-RevId: 218101737	2019-03-29 13:34:23 -07:00
Uday Bondhugula	2f1103bd93	Loop bound constant folding: follow-up / address comments from cl/215997346 - create a single function to fold both bounds - move bound constant folding into transforms PiperOrigin-RevId: 217954701	2019-03-29 13:33:55 -07:00
Feng Liu	34927e2474	Rename Operation::getAs to Operation::dyn_cast Also rename Operation::is to Operation::isa Introduce Operation::cast All of these are for consistency with global dyn_cast/cast/isa operators. PiperOrigin-RevId: 217878786	2019-03-29 13:33:41 -07:00
Uday Bondhugula	18e666702c	Generalize / improve DMA transfer overlap; nested and multiple DMA support; resolve multiple TODOs. - replace the fake test pass (that worked on just the first loop in the MLFunction) to perform DMA pipelining on all suitable loops. - nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops) - fix bugs / assumptions: correctly copy memory space and elemental type of source memref for double buffering. - correctly identify matching start/finish statements, handle multiple DMAs per loop. - introduce dominates/properlyDominates utilities for MLFunction statements. - move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it getShiftValidity - refactor getContainingStmtPos -> findAncestorStmtInBlock - move into Analysis/Utils.h; has two users. - other improvements / cleanup for related API/utilities - add size argument to dma_wait - for nested DMAs or in general, it makes it easy to obtain the size to use when lowering the dma_wait since we wouldn't want to identify the matching dma_start, and more importantly, in general/in the future, there may not always be a dma_start dominating the dma_wait. - add debug information in the pass PiperOrigin-RevId: 217734892	2019-03-29 13:32:28 -07:00
Nicolas Vasilache	3013dadb7c	[MLIR] Basic infrastructure for vectorization test This CL implements a very simple loop vectorization test and the basic infrastructure to support it. The test simply consists in: 1. matching the loops in the MLFunction and all the Load/Store operations nested under the loop; 2. testing whether all the Load/Store are contiguous along the innermost memory dimension along that particular loop. If any reference is non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then the loop is not-vectorizable. The simple test above can gradually be extended with more interesting behaviors to account for the fact that a layout permutation may exist that enables contiguity etc. All these will come in due time but it is worthwhile noting that the test already supports detection of outer-vectorizable loops. In implementing this test, I also added a recursive MLFunctionMatcher and some sugar that can capture patterns such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating on the matched IR structures. For now it just uses in order traversal but post-order DFS will be useful in the future once IR rewrites start occurring. One may note that the memory management design decision follows a different pattern from MLIR. After evaluating different designs and how they quickly increase cognitive overhead, I decided to opt for the simplest solution in my view: a class-wide (threadsafe) RAII context. This way, a pass that needs MLFunctionMatcher can just have its own locally scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed. If passes are expected to have a longer lifetime, then the contexts can easily be scoped inside the runOnMLFunction call and storage lifetime reduced. Lastly, whatever the scope of threading (module, function, pass), this is expected to also be future-proof wrt concurrency (but this is a detail atm). PiperOrigin-RevId: 217622889	2019-03-29 13:32:13 -07:00
Jacques Pienaar	47e7cd333e	Use FuncBuilder instead of MLFuncBuilder in pattern matcher. Use the general function builder wrapper instead of the CFG/ML specific one. PiperOrigin-RevId: 217335607	2019-03-29 13:31:59 -07:00
Chris Lattner	80e884a9f8	Add constant folding and binary operator reassociation to the canonicalize pass, build up the worklist infra in anticipation of improving the pattern matcher to match more than one node. PiperOrigin-RevId: 217330579	2019-03-29 13:31:44 -07:00
Feng Liu	0faf563383	Move Pattern and related classes to a different file So we can use it as a library. PiperOrigin-RevId: 217267049	2019-03-29 13:31:03 -07:00
MLIR Team	0114e232d8	Adds method to AffineApplyOp which forward substitutes its results into any of its users which are also AffineApplyOps. Updates ComposeAffineMaps test pass to use this method. Updates affine map composition test cases to handle the new pass, which can be reused when this method is used in a future instruction combine pass. PiperOrigin-RevId: 217163351	2019-03-29 13:30:49 -07:00
Chris Lattner	7e7157fd1d	Various improvements to pattern matching and other infra: - Make it so OpPointer implicitly converts to SSAValue* when the underlying op has a single value. This eliminates a lot more ->getResult() calls and makes the behavior more LLVM-like - Fill out PatternBenefit to be typed instead of just a typedef for int with magic numbers. - Simplify various code due to these changes. PiperOrigin-RevId: 217020717	2019-03-29 13:29:49 -07:00
Uday Bondhugula	86eac4618c	Create private exclusive / single use affine computation slice for an op stmt. - add util to create a private / exclusive / single use affine computation slice for an op stmt (see method doc comment); a single multi-result affine_apply op is prepended to the op stmt to provide all results needed for its operands as a function of loop iterators and symbols. - use it for DMA pipelining (to create private slices for DMA start stmt's); resolve TODOs/feature request (b/117159533) - move createComposedAffineApplyOp to Transforms/Utils; free it from taking a memref as input / generalize it. PiperOrigin-RevId: 216926818	2019-03-29 13:29:21 -07:00
Chris Lattner	9e3b928e32	Implement a super sketched out pattern match/rewrite framework and a sketched out canonicalization pass to drive it, and a simple (x-x) === 0 pattern match as a test case. There is a tremendous number of improvements that need to land, and the matcher/rewriter and patterns will be split out of this file, but this is a starting point. PiperOrigin-RevId: 216788604	2019-03-29 13:29:07 -07:00
Chris Lattner	8dda701a9c	Add MLFunction::walk/walkPostOrder methods for doing a simple traversal of operations. This is a simplified form for the existing walker API. PiperOrigin-RevId: 216754991	2019-03-29 13:28:26 -07:00
Jacques Pienaar	764fd035b0	Split BuiltinOps out of StandardOps. * Move Return, Constant and AffineApply out into BuiltinOps; * BuiltinOps are always registered, while StandardOps follow the same dynamic registration; * Kept isValidX in MLValue as we don't have a verify on AffineMap so need to keep it callable from Parser (I wanted to move it to be called in verify instead); PiperOrigin-RevId: 216592527	2019-03-29 13:28:12 -07:00
Nicolas Vasilache	1d3e7e2616	[MLIR] AffineMap value type This CL applies the same pattern as AffineExpr to AffineMap: a simple struct that acts as the storage is allocated in the bump pointer. The AffineMap is immutable and accessed everywhere by value. PiperOrigin-RevId: 216445930	2019-03-29 13:26:24 -07:00
Uday Bondhugula	82e55750d2	Add target independent standard DMA ops: dma.start, dma.wait Add target independent standard DMA ops: dma.start, dma.wait. Update pipeline data transfer to use these to detect DMA ops. While on this - return failure from mlir-opt::performActions if a pass generates invalid output - improve error message for verify 'n' operand traits PiperOrigin-RevId: 216429885	2019-03-29 13:26:10 -07:00
MLIR Team	c386143834	Address comments from previous CL/216216446 PiperOrigin-RevId: 216298139	2019-03-29 13:25:28 -07:00
Nicolas Vasilache	6707c7bea1	[MLIR] AffineExpr final cleanups This CL: 1. performs the global codemod AffineXExpr->AffineXExprClass and AffineXExprRef -> AffineXExpr; 2. simplifies function calls by removing the redundant MLIRContext parameter; 3. adds missing binary operator versions of scalar op AffineExpr where it makes sense. PiperOrigin-RevId: 216242674	2019-03-29 13:25:14 -07:00
MLIR Team	fe490043b0	Affine map composition. *) Implements AffineValueMap forward substitution for AffineApplyOps. *) Adds ComposeAffineMaps transformation pass, which composes affine maps for all loads/stores in an MLFunction. *) Adds multiple affine map composition tests. PiperOrigin-RevId: 216216446	2019-03-29 13:24:59 -07:00
Nicolas Vasilache	ce2edea135	[MLIR] Cleanup AffineExpr This CL introduces a series of cleanups for AffineExpr value types: 1. to make it clear that the value types should be used, the pointer AffineExpr types are put in the detail namespace. Unfortunately, since the value type operator-> only forwards to the underlying pointer type, we still need to expose this in the include file for now; 2. AffineExprKind is ok to use, it thus comes out of detail and thus of AffineExpr 3. getAffineDimExpr, getAffineSymbolExpr, getAffineConstantExpr are similarly extracted as free functions and their naming is made consistent across Builder, MLContext and AffineExpr 4. AffineBinaryOpEx::simplify functions are made into static free functions. In particular it is moved away from AffineMap.cpp where it does not belong 5. operator AffineExprType is made explicit 6. uses the binary operators everywhere possible 7. drops the pointer usage everywhere outside of AffineExpr.cpp, MLIRContext.cpp and AsmPrinter.cpp PiperOrigin-RevId: 216207212	2019-03-29 13:24:45 -07:00
Nicolas Vasilache	4911978f7e	[MLIR] Value types for AffineXXXExpr This CL makes AffineExprRef into a value type. Notably: 1. drops llvm isa, cast, dyn_cast on pointer type and uses member functions on the value type. It may be possible to still use classof (in a followup CL) 2. AffineBaseExprRef aggressively casts constness away: if we mean the type is immutable then let's jump in with both feet; 3. Drop implicit casts to the underlying pointer type because that always results in surprising behavior and is not needed in practice once enough cleanup has been applied. The remaining negative I see is that we still need to mix operator. and operator->. There is an ugly solution that forwards the methods but that ends up duplicating the class hierarchy which I tried to avoid as much as possible. But maybe it's not that bad anymore since AffineExpr.h would still contain a single class hierarchy (the duplication would be impl detail in .cpp) PiperOrigin-RevId: 216188003	2019-03-29 13:24:31 -07:00
Chris Lattner	d2d89cbc19	Rename affineint type to index type. The name 'index' may not be perfect, but is better than the old name. Here is some justification: 1) affineint (as it is named) is not a type suitable for general computation (e.g. the multiply/adds in an integer matmul). It has undefined width and is undefined on overflow. They are used as the indices for forstmt because they are intended to be used as indexes inside the loop. 2) It can be used in both cfg and ml functions, and in cfg functions. As you mention, “symbols” are not affine, and we use affineint values for symbols. 3) Integers aren’t affine, the algorithms applied to them can be. :) 4) The only suitable use for affineint in MLIR is for indexes and dimension sizes (i.e. the bounds of those indexes). PiperOrigin-RevId: 216057974	2019-03-29 13:24:16 -07:00
Uday Bondhugula	d18ae9e2c7	Constant folding for loop bounds. - Fold the lower/upper bound of a loop to a constant whenever the result of the application of the bound's affine map on the operand list yields a constant. - Update/complete 'for' stmt's API to set lower/upper bounds with operands. Resolve TODOs for ForStmt::set{Lower,Upper}Bound. - Moved AffineExprConstantFolder into AffineMap.cpp and added AffineMap::constantFold to be used by both AffineApplyOp and ForStmt::constantFoldBound. PiperOrigin-RevId: 215997346	2019-03-29 13:24:01 -07:00
Uday Bondhugula	f069d796f3	Fix opt build breakage - lib/Transforms/Utils.cpp PiperOrigin-RevId: 215924308	2019-03-29 13:23:46 -07:00
Chris Lattner	6822c4e29c	Implement support for constant folding operations even when their operands are not all constant. Implement support for folding dim, x*0, and affine_apply. PiperOrigin-RevId: 215917432	2019-03-29 13:23:32 -07:00
Uday Bondhugula	6cfdb756b1	Introduce memref replacement/rewrite support: to replace an existing memref with a new one (of a potentially different rank/shape) with an optional index remapping. - introduce Utils::replaceAllMemRefUsesWith - use this for DMA double buffering (This CL also adds a few temporary utilities / code that will be done away with once: 1) abstract DMA op's are added 2) memref dereferencing side-effect / trait is available on op's 3) b/117159533 is resolved (memref index computation slices). PiperOrigin-RevId: 215831373	2019-03-29 13:23:19 -07:00
Nicolas Vasilache	b55b407601	[RFC][MLIR] Use AffineExprRef in place of AffineExpr* in IR This CL starts by replacing AffineExpr* with value-type AffineExprRef in a few places in the IR. By a domino effect that is pretty telling of the inconsistencies in the codebase, const is removed where it makes sense. The rationale is that the decision was consciously made that unique'd types have pointer semantics without const specifier. This is fine but we should be consistent. In the end, the only logical invariant is that there should never be such a thing as a const AffineExpr, const AffineMap or const IntegerSet* in our codebase. This CL takes a number of shortcuts to killing const with fire, in particular forcing const AffineExprRef to return the underlying non-const AffineExpr. This will be removed once AffineExpr has disappeared in containers but for now such shortcuts allow a bit of sanity in this long quest for cleanups. The only places where const AffineExpr, const AffineMap or const IntegerSet* may still appear is by transitive needs from containers, comparison operators etc. There is still one major thing remaining here: figure out why cast/dyn_cast return me a const AffineXXX, which in turn requires a bunch of ugly const_casts. I suspect this is due to the classof taking const AffineXXXExpr. I wonder whether this is a side effect of 1., if it is coming from llvm itself (I'd doubt it) or something else (clattner@?) In light of this, the whole discussion about const makes total sense to me now and I would systematically apply the rule that in the end, we should never have any const XXX in our codebase for unique'd types (assuming we can remove them all in containers and no additional constness constraint is added on us from the outside world). PiperOrigin-RevId: 215811554	2019-03-29 13:23:05 -07:00
Nicolas Vasilache	5b8017db18	[MLIR] Templated AffineExprBaseRef This CL implements AffineExprBaseRef as a templated type to allow LLVM-style casts to work properly. This also allows making AffineExprBaseRef::expr private. To achieve this, it is necessary to use llvm::simplify_type and make AffineConstExpr derive from both AffineExpr and llvm::simplify<AffineExprRef>. Note that llvm::simplify_type is just an interface to enable the proper template resolution of isa/cast/dyn_cast but it otherwise holds no value. Lastly note that certain dyn_cast operations wanted the const AffineExpr* form of AffineExprBaseRef so I made the implicit constructor take that by default and documented the immutable behavior. I think this is consistent with the decision to make unique'd type immutable by convention and never use const on them. PiperOrigin-RevId: 215642247	2019-03-29 13:22:49 -07:00
Nicolas Vasilache	544f5e7a9b	[MLIR] Remove uses of AffineExpr* outside of IR This CL uniformizes the uses of AffineExprWrap outside of IR. The public API of AffineExpr builder is modified to only use AffineExprWrap. A few places access AffineExprWrap.expr, this is only while the API is in transition to easily keep track (i.e. make expr private and let the compiler track the errors). Parser.cpp exhibits patterns that are dependent on nullptr values so converting it is left for another CL. PiperOrigin-RevId: 215642005	2019-03-29 13:22:35 -07:00
Nicolas Vasilache	9ef87c4b6b	[MLIR] AffineExpr lightweight value type for operators This CL proposes adding MLIRContext* to AffineExpr as discussed previously. This allows the value class to not require the context in its constructor and makes it a POD that it makes sense to pass by value everywhere. A list of other RFC CLs will build on this. The RFC CLs are small incremental pushes of the API which would be a pretty big change otherwise. Pushing the thinking a little bit more it seems reasonable to use implicit cast/constructor to/from AffineExpr. As this thing evolves, it looks to me like IR (and probably Parser, for not so good reasons) want to operate on AffineExpr and the rest of the code wants to operate on the value type. For this reason I think AffineExprImpl/AffineExpr may also make sense but I do not have a particular naming preference. The jury is still out for naming decision between the above and AffineExprBase/AffineExpr or AffineExpr*/AffineExprRef. PiperOrigin-RevId: 215641596	2019-03-29 13:22:21 -07:00
Nicolas Vasilache	4805e629c5	[MLIR] Use chainable lightweight wrapper for AffineExpr This CL argues that the builder API for AffineExpr should be used with a lightweight wrapper that supports operators chaining. This CL takes the ill-named AffineExprWrap and proposes a simple set of operators with builtin constant simplifications. This allows: 1. removing the getAddMulPureAffineExpr function; 2. avoiding concerns about constant vs non-constant simplifications at every call site; 3. writing the mathematical expressions we want to write without unnecessary obfuscations. The points above represent pure technical debt that we don't want to carry on. It is important to realize that this is not a mere convenience or "just sugar" but reduction in cognitive overhead. This thinking can be pushed significantly further, I have added some comments with some basic ideas but we could make AffineMap, AffineApply and other objects that use map applications more functional and value-based. I am putting this out to get a first batch of reviews and see what people think. I think in my preferred design I would have the Builder directly return such AffineExprPtr objects by value everywhere and avoid the boilerplate explicit creations that I am doing by hand at this point. Yes this AffineExprPtr would implicitly convert to AffineExpr* because that is what it is. PiperOrigin-RevId: 215641317	2019-03-29 13:22:07 -07:00
Uday Bondhugula	041817a45e	Introduce loop body skewing / loop pipelining / loop shifting utility. - loopBodySkew shifts statements of a loop body by stmt-wise delays, and is typically meant to be used to: - allow overlap of non-blocking start/wait until completion operations with other computation - allow shifting of statements (for better register reuse/locality/parallelism) - software pipelining (when applied to the innermost loop) - an additional argument specifies whether to unroll the prologue and epilogue. - add method to check SSA dominance preservation. - add a fake loop pipeline pass to test this utility. Sample input/output are below. While on this, fix/add following: - fix minor bug in getAddMulPureAffineExpr - add additional builder methods for common affine map cases - fix const_operand_iterator's for ForStmt, etc. When there is no such thing as 'const MLValue', the iterator shouldn't be returning const MLValue's. Returning MLValue is const correct. Sample input/output examples: 1) Simplest case: shift second statement by one. Input: for %i = 0 to 7 { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint } Output: #map0 = (d0) -> (d0 - 1) mlfunc @loop_nest_simple1() { %c8 = constant 8 : affineint %c0 = constant 0 : affineint %0 = "foo"(%c0) : (affineint) -> affineint for %i0 = 1 to 7 { %1 = "foo"(%i0) : (affineint) -> affineint %2 = affine_apply #map0(%i0) %3 = "bar"(%2) : (affineint) -> affineint } %4 = affine_apply #map0(%c8) %5 = "bar"(%4) : (affineint) -> affineint return } 2) DMA overlap: shift dma.wait and compute by one. 
Input for %i = 0 to 7 { %pingpong = affine_apply (d0) -> (d0 mod 2) (%i) "dma.enqueue"(%pingpong) : (affineint) -> affineint %pongping = affine_apply (d0) -> (d0 mod 2) (%i) "dma.wait"(%pongping) : (affineint) -> affineint "compute1"(%pongping) : (affineint) -> affineint } Output #map0 = (d0) -> (d0 mod 2) #map1 = (d0) -> (d0 - 1) #map2 = ()[s0] -> (s0 + 7) mlfunc @loop_nest_dma() { %c8 = constant 8 : affineint %c0 = constant 0 : affineint %0 = affine_apply #map0(%c0) %1 = "dma.enqueue"(%0) : (affineint) -> affineint for %i0 = 1 to 7 { %2 = affine_apply #map0(%i0) %3 = "dma.enqueue"(%2) : (affineint) -> affineint %4 = affine_apply #map1(%i0) %5 = affine_apply #map0(%4) %6 = "dma.wait"(%5) : (affineint) -> affineint %7 = "compute1"(%5) : (affineint) -> affineint } %8 = affine_apply #map1(%c8) %9 = affine_apply #map0(%8) %10 = "dma.wait"(%9) : (affineint) -> affineint %11 = "compute1"(%9) : (affineint) -> affineint return } 3) With arbitrary affine bound maps: Shift last two statements by two. 
Input: for %i = %N to ()[s0] -> (s0 + 7)()[%N] { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint %z = "foo_bar"(%i) : (affineint) -> (affineint) "bar_foo"(%i) : (affineint) -> (affineint) } Output #map0 = ()[s0] -> (s0 + 1) #map1 = ()[s0] -> (s0 + 2) #map2 = ()[s0] -> (s0 + 7) #map3 = (d0) -> (d0 - 2) #map4 = ()[s0] -> (s0 + 8) #map5 = ()[s0] -> (s0 + 9) for %i0 = %arg0 to #map0()[%arg0] { %0 = "foo"(%i0) : (affineint) -> affineint %1 = "bar"(%i0) : (affineint) -> affineint } for %i1 = #map1()[%arg0] to #map2()[%arg0] { %2 = "foo"(%i1) : (affineint) -> affineint %3 = "bar"(%i1) : (affineint) -> affineint %4 = affine_apply #map3(%i1) %5 = "foo_bar"(%4) : (affineint) -> affineint %6 = "bar_foo"(%4) : (affineint) -> affineint } for %i2 = #map4()[%arg0] to #map5()[%arg0] { %7 = affine_apply #map3(%i2) %8 = "foo_bar"(%7) : (affineint) -> affineint %9 = "bar_foo"(%7) : (affineint) -> affineint } 4) Shift one by zero, second by one, third by two for %i = 0 to 7 { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint %z = "foobar"(%i) : (affineint) -> affineint } #map0 = (d0) -> (d0 - 1) #map1 = (d0) -> (d0 - 2) #map2 = ()[s0] -> (s0 + 7) %c9 = constant 9 : affineint %c8 = constant 8 : affineint %c1 = constant 1 : affineint %c0 = constant 0 : affineint %0 = "foo"(%c0) : (affineint) -> affineint %1 = "foo"(%c1) : (affineint) -> affineint %2 = affine_apply #map0(%c1) %3 = "bar"(%2) : (affineint) -> affineint for %i0 = 2 to 7 { %4 = "foo"(%i0) : (affineint) -> affineint %5 = affine_apply #map0(%i0) %6 = "bar"(%5) : (affineint) -> affineint %7 = affine_apply #map1(%i0) %8 = "foobar"(%7) : (affineint) -> affineint } %9 = affine_apply #map0(%c8) %10 = "bar"(%9) : (affineint) -> affineint %11 = affine_apply #map1(%c8) %12 = "foobar"(%11) : (affineint) -> affineint %13 = affine_apply #map1(%c9) %14 = "foobar"(%13) : (affineint) -> affineint 5) SSA dominance violated; no shifting if a shift is specified for the 
second statement. for %i = 0 to 7 { %x = "foo"(%i) : (affineint) -> affineint "bar"(%x) : (affineint) -> affineint } PiperOrigin-RevId: 214975731	2019-03-29 13:21:26 -07:00
Uday Bondhugula	591fa9698e	Change behavior of loopUnrollFull with unroll factor 1 Using loopUnrollFull with unroll factor 1 should promote the loop body as opposed to doing nothing. PiperOrigin-RevId: 214812126	2019-03-29 13:20:59 -07:00
Uday Bondhugula	501462ac47	Use statement walker for constant folding. - makes the code compact (gets rid of MLFunction walking logic) - makes it natural to extend to fold affine map loop bounds and if conditions (upcoming CL) PiperOrigin-RevId: 214668957	2019-03-29 13:19:32 -07:00
Chris Lattner	cdb9551aba	Move the GraphTraits implementations for CFGs out to their own header, consolidate the implementations in CFGFunctionViewGraph.cpp into it, and implement the missing const specializations for functions. NFC. PiperOrigin-RevId: 214048649	2019-03-29 13:17:35 -07:00
Chris Lattner	d6f8ec7bac	Introduce [post]dominator tree and related infrastructure, use it in CFG func verifier. We get most of this infrastructure directly from LLVM, we just need to adapt it to our CFG abstraction. This has a few unrelated changes entangled in it: - getFunction() in various classes was const incorrect, fix it. - This moves Verifier.cpp to the analysis library, since Verifier depends on dominance and these are both really analyses. - IndexedAccessorIterator::reference was defined wrong, leading to really exciting template errors that were fun to diagnose. - This flips the boolean sense of the foldOperation() function in constant folding pass in response to previous patch feedback. PiperOrigin-RevId: 214046593	2019-03-29 13:17:20 -07:00
Chris Lattner	82eb284a53	Implement support for constant folding operations and a simple constant folding optimization pass: - Give the ability for operations to implement a constantFold hook (a simple one for single-result ops as well as general support for multi-result ops). - Implement folding support for constant and addf. - Implement support in AbstractOperation and Operation to make this usable by clients. - Implement a very simple constant folding pass that does top down folding on CFG and ML functions, with a testcase that exercises all the above stuff. Random cleanups: - Improve the build APIs for ConstantOp. - Stop passing "-o -" to mlir-opt in the testsuite, since that is the default. PiperOrigin-RevId: 213749809	2019-03-29 13:16:33 -07:00
Feng Liu	7e004efae2	Add function attributes for ExtFunction, CFGFunction and MLFunction. PiperOrigin-RevId: 213540509	2019-03-29 13:15:35 -07:00
Uday Bondhugula	ab4797229c	Extend loop unroll/unroll-and-jam to affine bounds + refactor related code. - extend loop unroll-jam similar to loop unroll for affine bounds - extend both loop unroll/unroll-jam to deal with cleanup loop for non multiple of unroll factor. - extend promotion of single iteration loops to work with affine bounds - fix typo bugs in loop unroll - refactor common code b/w loop unroll and loop unroll-jam - move prototypes of non-pass transforms to LoopUtils.h - add additional builder methods. - introduce loopUnrollUpTo(factor) to unroll by either factor or trip count, whichever is less. - remove Statement::isInnermost (not used for now - will come back at the right place/in right form later) PiperOrigin-RevId: 213471227	2019-03-29 13:15:06 -07:00
Tatiana Shpeisman	52111cefc0	Store 'then' clause statements directly in the 'if' statement. Also a few minor changes. PiperOrigin-RevId: 213359024	2019-03-29 13:14:23 -07:00
Uday Bondhugula	37a3f638ea	Misc changes to builder's and Transforms/ API to allow code generation. - add builder method for ReturnOp - expose API from Transforms/ to work on specific ML statements (do this for LoopUnroll, LoopUnrollAndJam) - add MLFuncBuilder::getForStmtBodyBuilder, ::getBlock PiperOrigin-RevId: 213074178	2019-03-29 13:14:09 -07:00
Jacques Pienaar	fb3116f59e	Add PassResult and have passes return PassResult to indicate failure/success. For FunctionPass's for passes that want to stop upon error encountered. PiperOrigin-RevId: 213058651	2019-03-29 13:13:55 -07:00
Uday Bondhugula	64812a56c7	Extend getConstantTripCount to deal with a larger subset of loop bounds; make loop unroll/unroll-and-jam more powerful; add additional affine expr builder methods - use previously added analysis/simplification to infer multiple of unroll factor trip counts, making loop unroll/unroll-and-jam more general. - for loop unroll, support bounds that are single result affine map's with the same set of operands. For unknown loop bounds, loop unroll will now work as long as trip count can be determined to be a multiple of unroll factor. - extend getConstantTripCount to deal with single result affine map's with the same operands. move it to mlir/Analysis/LoopAnalysis.cpp - add additional builder utility methods for affine expr arithmetic (difference, mod/floordiv/ceildiv w.r.t positive constant). simplify code to use the utility methods. - move affine analysis routines to AffineAnalysis.cpp/.h from AffineStructures.cpp/.h. - Rename LoopUnrollJam to LoopUnrollAndJam to match class name. - add an additional simplification for simplifyFloorDiv, simplifyCeilDiv - Rename AffineMap::getNumOperands() getNumInputs: an affine map by itself does not have operands. Operands are passed to it through affine_apply, from loop bounds/if condition's, etc., operands are stored in the latter. This should be sufficiently powerful for now as far as unroll/unroll-and-jam go for TPU code generation, and can move to other analyses/transformations. Loop nests like these are now unrolled without any cleanup loop being generated. for %i = 1 to 100 { // unroll factor 4: no cleanup loop will be generated. for %j = (d0) -> (d0) (%i) to (d0) -> (5*d0 + 3) (%i) { %x = "foo"(%j) : (affineint) -> i32 } } for %i = 1 to 100 { // unroll factor 4: no cleanup loop will be generated. 
for %j = (d0) -> (d0) (%i) to (d0) -> (d0 - d mod 4 - 1) (%i) { %y = "foo"(%j) : (affineint) -> i32 } } for %i = 1 to 100 { for %j = (d0) -> (d0) (%i) to (d0) -> (d0 + 128) (%i) { %x = "foo"() : () -> i32 } } TODO(bondhugula): extend this to LoopUnrollAndJam as well in the next CL (with minor changes). PiperOrigin-RevId: 212661212	2019-03-29 13:13:00 -07:00
Uday Bondhugula	3bae041e5d	Add utility to promote single iteration loops. Add methods for getting constant loop counts. Improve / refactor loop unroll / loop unroll and jam. - add utility to remove single iteration loops. - use this utility to promote single iteration loops after unroll/unroll-and-jam - use loopUnrollByFactor for loopUnrollFull and remove most of the latter. - add methods for getting constant loop trip count PiperOrigin-RevId: 212039569	2019-03-29 13:11:21 -07:00
Chris Lattner	348f31a4fa	Add location specifier to MLIR Functions, and: - Compress the identifier/kind of a Function into a single word. - Eliminate otherFailure from verifier now that we always have a location - Eliminate the error string from the verifier now that we always have locations. - Simplify the parser's handling of fn forward references, using the location tracked by the function. PiperOrigin-RevId: 211985101	2019-03-29 13:10:55 -07:00
Jacques Pienaar	95f31d53d5	Add GraphTraits and DOTGraphTraits for CFGFunction in debug builds. Enable using GraphWriter to dump graphviz in debug mode (kept to debug builds completely as this is only for debugging). Add option to mlir-opt to print CFGFunction after every transform in debug mode. PiperOrigin-RevId: 211578699	2019-03-29 13:09:31 -07:00
Uday Bondhugula	d5416f299e	Complete AffineExprFlattener based simplification for floordiv/ceildiv. - handle floordiv/ceildiv in AffineExprFlattener; update the simplification to work even if mod/floordiv/ceildiv expressions appearing in the tree can't be eliminated. - refactor the flattening / analysis to move it out of lib/Transforms/ - fix MutableAffineMap::isMultipleOf - add AffineBinaryOpExpr:getAdd/getMul/... utility methods PiperOrigin-RevId: 211540536	2019-03-29 13:09:18 -07:00
Uday Bondhugula	0122a99cbb	Affine expression analysis and simplification. Outside of IR/ - simplify a MutableAffineMap by flattening the affine expressions - add a simplify affine expression pass that uses this analysis - update the FlatAffineConstraints API (to be used in the next CL) In IR: - add isMultipleOf and getKnownGCD for AffineExpr, and make the in-IR simplification of simplifyMod simpler and more powerful. - rename the AffineExpr visitor methods to distinguish b/w visiting and walking, and to simplify API names based on context. The next CL will use some of these for the loop unrolling/unroll-jam to make the detection for the need of cleanup loop powerful/non-trivial. A future CL will finally move this simplification to FlatAffineConstraints to make it more powerful. For example, currently, even if a mod expr appearing in a part of the expression tree can't be simplified, the whole thing won't be simplified. PiperOrigin-RevId: 211012256	2019-03-29 13:07:44 -07:00
Uday Bondhugula	e9fb4b492d	Introduce loop unroll jam transformation. - for test purposes, the unroll-jam pass unroll jams the first outermost loop. While on this: - fix StmtVisitor to allow overriding of function to iterate walk over children of a stmt. PiperOrigin-RevId: 210644813	2019-03-29 13:07:30 -07:00
Tatiana Shpeisman	d32a28c520	Implement operands for the lower and upper bounds of the for statement. This revamps implementation of the loop bounds in the ForStmt, using general representation that supports operands. The frequent case of constant bounds is supported via special access methods. This also includes: - Operand iterators for the Statement class. - OpPointer::is() method to query the class of the Operation. - Support for the bound shorthand notation parsing and printing. - Validity checks for the bound operands used as dim ids and symbols I didn't mean this CL to be so large. It just happened this way, as one thing led to another. PiperOrigin-RevId: 210204858	2019-03-29 13:05:16 -07:00
Chris Lattner	dfc58848e3	Two unrelated API cleanups: remove the location processing stuff from custom op parser hooks, as it has been subsumed by a simpler and cleaner mechanism. Second, remove the "Inst" suffixes from a few methods in CFGFuncBuilder since they are redundant and this is inconsistent with the other builders. NFC. PiperOrigin-RevId: 210006263	2019-03-29 13:04:47 -07:00
Chris Lattner	956e0f7e21	Push location information more tightly into the IR, providing space for every operation and statement to have a location, and make it so a location is required to be specified whenever you make one (though a null location is still allowed). This is to encourage compiler authors to propagate loc info properly, allowing our failability story to work well. This is still a WIP - it isn't clear if we want to continue abusing Attribute for location information, or whether we should introduce a new class hierarchy to do so. This is a good step along the way, and unblocks some of the tf/xla work that builds upon it. PiperOrigin-RevId: 210001406	2019-03-29 13:04:33 -07:00
Uday Bondhugula	00bed4bd99	Extend loop unrolling to unroll by a given factor; add builder for affine apply op. - add builder for AffineApplyOp (first one for an operation that has non-zero operands) - add support for loop unrolling by a given factor; uses the affine apply op builder. While on this, change 'step' of ForStmt to be 'unsigned' instead of AffineConstantExpr *. Add setters for ForStmt lb, ub, step. Sample Input: // CHECK-LABEL: mlfunc @loop_nest_unroll_cleanup() { mlfunc @loop_nest_unroll_cleanup() { for %i = 1 to 100 { for %j = 0 to 17 { %x = "addi32"(%j, %j) : (affineint, affineint) -> i32 %y = "addi32"(%x, %x) : (i32, i32) -> i32 } } return } Output: $ mlir-opt -loop-unroll -unroll-factor=4 /tmp/single2.mlir #map0 = (d0) -> (d0 + 1) #map1 = (d0) -> (d0 + 2) #map2 = (d0) -> (d0 + 3) mlfunc @loop_nest_unroll_cleanup() { for %i0 = 1 to 100 { for %i1 = 0 to 17 step 4 { %0 = "addi32"(%i1, %i1) : (affineint, affineint) -> i32 %1 = "addi32"(%0, %0) : (i32, i32) -> i32 %2 = affine_apply #map0(%i1) %3 = "addi32"(%2, %2) : (affineint, affineint) -> i32 %4 = affine_apply #map1(%i1) %5 = "addi32"(%4, %4) : (affineint, affineint) -> i32 %6 = affine_apply #map2(%i1) %7 = "addi32"(%6, %6) : (affineint, affineint) -> i32 } for %i2 = 16 to 17 { %8 = "addi32"(%i2, %i2) : (affineint, affineint) -> i32 %9 = "addi32"(%8, %8) : (i32, i32) -> i32 } } return } PiperOrigin-RevId: 209676220	2019-03-29 13:03:38 -07:00
Chris Lattner	ae79d69922	Implement a module-level symbol table for functions, enforcing uniqueness of names across the module and auto-renaming conflicts. Have the parser reject malformed modules that have redefinitions. PiperOrigin-RevId: 209227560	2019-03-29 13:02:30 -07:00
Uday Bondhugula	98a24881d3	ShortLoopUnroll - bug fix. Collect loops through a post order walk instead of a pre-order so that inner loops are collected before outer surrounding ones. Add a complex test case. PiperOrigin-RevId: 209041057	2019-03-29 13:01:22 -07:00
Uday Bondhugula	3e92be9c71	Move Pass.{h,cpp} from lib/IR/ to lib/Transforms/. PiperOrigin-RevId: 208571437	2019-03-29 12:59:07 -07:00
Chris Lattner	8159186f57	Rework the cloning infrastructure for statements to be able to take and update an operand mapping, which simplifies it a bit. Implement cloning for IfStmt, rename getThenClause() to getThen() which is unambiguous and less repetitive in use cases. PiperOrigin-RevId: 207915990	2019-03-29 12:57:38 -07:00
Uday Bondhugula	d8490d8d4f	Loop unrolling pass update - fix/complete forStmt cloning for unrolling to work for outer loops - create IV const's only when needed - test outer loop unrolling by creating a short trip count unroll pass for loops with trip counts <= <parameter> - add unrolling test cases for multiple op results, outer loop unrolling - fix/clean up StmtWalker class while on this - switch unroll loop iterator values from i32 to affineint PiperOrigin-RevId: 207645967	2019-03-29 12:56:16 -07:00
Uday Bondhugula	65b6e73245	Loop unrolling update. - deal with non-operation stmt's (if/for stmt's) in loops being unrolled (unrolling of non-innermost loops works). - update uses in unrolled bodies to use results of new operations that may be introduced in the unrolled bodies. Unrolling now works for all kinds of loop nests - perfect nests, imperfect nests, loops at any depth, and with any kind of operation in the body. (IfStmt support not done, hence untested there). Added missing dump/print method for StmtBlock. TODO: add test case for outer loop unrolling. PiperOrigin-RevId: 207314286	2019-03-29 12:55:19 -07:00
Uday Bondhugula	2a003256ae	MLStmt cloning and IV replacement for loop unrolling, add constant pool to MLFunctions. - MLStmt cloning and IV replacement - While at this, fix the innermostLoopGatherer to actually gather all the innermost loops (it was stopping its walk at the first innermost loop it found) - Improve comments for MLFunction statement classes, fix inheritance order. - Fixed StmtBlock destructor. PiperOrigin-RevId: 207049173	2019-03-29 12:53:02 -07:00
Tatiana Shpeisman	8189a12bce	Clean up and extend MLFuncBuilder to allow creating statements in the middle of a statement block. Rename Statement::getFunction() and StmtBlock::getFunction() to findFunction() to make it clear that this is not a constant time getter. Fix b/112039912 - we were recording 'i' instead of '%i' for loop induction variables causing "use of undefined SSA value" error. PiperOrigin-RevId: 206884644	2019-03-29 12:51:38 -07:00
Uday Bondhugula	dfd48dc24c	LoopUnroll post order walk: fix misleading naming PiperOrigin-RevId: 206609084	2019-03-29 12:48:44 -07:00
Chris Lattner	12adbeb872	Prepare for implementation of TensorFlow passes: - Sketch out a TensorFlow/IR directory that will hold op definitions and common TF support logic. We will eventually have TensorFlow/TF2HLO, TensorFlow/Grappler, TensorFlow/TFLite, etc. - Add sketches of a Switch/Merge op definition, including some missing stuff like the TwoResults trait. Add a skeleton of a pass to raise this form. - Beef up the Pass/FunctionPass definitions slightly, moving the common code out of LoopUnroll.cpp into a new IR/Pass.cpp file. - Switch ConvertToCFG.cpp to be a ModulePass. - Allow _ to start bare identifiers, since this is important for TF attributes. PiperOrigin-RevId: 206502517	2019-03-29 12:47:25 -07:00
Uday Bondhugula	0af97111d2	Stmt visitors and walkers. - Update InnermostLoopGatherer to use a post order traversal (linear time/single traversal). - Drop getNumNestedLoops(). - Update isInnermost() to use the StmtWalker. When using return values in conjunction with walkers, the StmtWalker CRTP pattern doesn't appear to be of any use. It just requires overriding nearly all of the methods, which is what InnermostLoopGatherer currently does. Please see FIXME/ENLIGHTENME comments. TODO: figure this out from this CL discussion. Note - Comments on visitor/walker base class are out of date; will update when this CL is finalized. PiperOrigin-RevId: 206340901	2019-03-29 12:46:17 -07:00
Tatiana Shpeisman	9ebd3c7df8	Implement MLValue, statement operands, operation statement operands and values. ML functions now have full support for expressing operations. Induction variables, function arguments and return values are still todo. PiperOrigin-RevId: 206253643	2019-03-29 12:46:04 -07:00
Chris Lattner	f964bad6d1	Implement a proper function list in module, which auto-maintains the parent pointer, and ensure that functions are deleted when the module is destroyed. This exposed the fact that MLFunction had no dtor, and that the dtor in CFGFunction was broken with cyclic references. Fix both of these problems. PiperOrigin-RevId: 206051666	2019-03-29 12:43:57 -07:00
Uday Bondhugula	a0abd666a7	Sketch out loop unrolling transformation. - Implement a full loop unroll for innermost loops. - Use it to implement a pass that unroll all the innermost loops of all mlfunction's in a module. ForStmt's parsed currently have constant trip counts (and constant loop bounds). - Implement StmtVisitor based (Visitor pattern) Loop IVs aren't currently parsed and represented as SSA values. Replacing uses of loop IVs in unrolled bodies is thus a TODO. Class comments are sparse at some places - will add them after one round of comments. A cmd-line flag triggers this for now. Original: mlfunc @loops() { for x = 1 to 100 step 2 { for x = 1 to 4 { "Const"(){value: 1} : () -> () } } return } After unrolling: mlfunc @loops() { for x = 1 to 100 step 2 { "Const"(){value: 1} : () -> () "Const"(){value: 1} : () -> () "Const"(){value: 1} : () -> () "Const"(){value: 1} : () -> () } return } PiperOrigin-RevId: 205933235	2019-03-29 12:43:01 -07:00
Tatiana Shpeisman	1b24c48b91	Scaffolding for convertToCFG pass that replaces all instances of ML functions with equivalent CFG functions. Traverses module MLIR, generates CFG functions (empty for now) and removes ML functions. Adds Transforms library and tests. PiperOrigin-RevId: 205848367	2019-03-29 12:41:15 -07:00


686 Commits