llvm-project

Commit Graph

Author	SHA1	Message	Date
River Riddle	10ddae6d88	Use Status instead of bool in DialectConversion. PiperOrigin-RevId: 237339277	2019-03-29 17:06:06 -07:00
River Riddle	ba6fdc8b01	Move UtilResult into the Support directory and rename it to Status. Status provides an unambiguous way to specify success/failure results. These can be generated by 'Status::success()' and Status::failure()'. Status provides no implicit conversion to bool and should be consumed by one of the following utility functions: * bool succeeded(Status) - Return if the status corresponds to a success value. * bool failed(Status) - Return if the status corresponds to a failure value. PiperOrigin-RevId: 237153884	2019-03-29 17:04:19 -07:00
River Riddle	d43f630de8	NFC: Remove 'Result' from the analysis manager api to better reflect the implementation. There is no distinction between analysis computation and result. PiperOrigin-RevId: 237093101	2019-03-29 17:02:12 -07:00
River Riddle	1d87b62afe	Add support for preserving specific analyses in the analysis manager. Passes can now preserve specific analyses via 'markAnalysesPreserved'. Example: markAnalysesPreserved<DominanceInfo>(); markAnalysesPreserved<DominanceInfo, PostDominanceInfo>(); PiperOrigin-RevId: 237081454	2019-03-29 17:01:41 -07:00
MLIR Team	c1ff9e866e	Use FlatAffineConstraints::unionBoundingBox to perform slice bounds union for loop fusion pass (WIP). Adds utility to convert slice bounds to a FlatAffineConstraints representation. Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers. PiperOrigin-RevId: 236973261	2019-03-29 16:59:21 -07:00
Uday Bondhugula	5836fae8a0	DMA generation CL flag update - allow mem capacity to be overridden by command-line flag - change default fast mem space to 2 PiperOrigin-RevId: 236951598	2019-03-29 16:59:05 -07:00
Uday Bondhugula	02af8c22df	Change Pass:getFunction() to return pointer instead of ref - NFC - change this for consistency - everything else similar takes/returns a Function pointer - the FuncBuilder ctor, Block/Value/Instruction::getFunction(), etc. - saves a whole bunch of &s everywhere PiperOrigin-RevId: 236928761	2019-03-29 16:58:35 -07:00
Nicolas Vasilache	069c818f40	Fix lower/upper bound mismatch in stripmineSink Also beef up the corresponding test case. PiperOrigin-RevId: 236878818	2019-03-29 16:57:21 -07:00
Dimitrios Vytiniotis	a60ba7d908	Supporting conversion of argument attributes along their types. This fixes a bug: previously, during conversion function argument attributes were neither beings passed through nor converted. This fix extends DialectConversion to allow for simultaneous conversion of the function type and the argument attributes. This was important when lowering MLIR to LLVM where attribute information (e.g. noalias) needs to be preserved in MLIR(LLVMDialect). Longer run it seems reasonable that we want to convert both the function attribute and its type and the argument attributes, but that requires a small refactoring in Function.h to aggregate these three fields in an inner struct, which will require some discussion. PiperOrigin-RevId: 236709409	2019-03-29 16:55:51 -07:00
MLIR Team	d42ef78a75	Handle MemRefRegion::compute return value in loop fusion pass (NFC). PiperOrigin-RevId: 236685849	2019-03-29 16:55:20 -07:00
River Riddle	485746f524	Implement the initial AnalysisManagement infrastructure, with the introduction of the FunctionAnalysisManager and ModuleAnalysisManager classes. These classes provide analysis computation, caching, and invalidation for a specific IR unit. The invalidation is currently limited to either all or none, i.e. you cannot yet preserve specific analyses. An analysis can be any class, but it must provide the following: * A constructor for a given IR unit. struct MyAnalysis { // Compute this analysis with the provided module. MyAnalysis(Module *module); }; Analyses can be accessed from a Pass by calling either the 'getAnalysisResult<AnalysisT>' or 'getCachedAnalysisResult<AnalysisT>' methods. A FunctionPass may query for a cached analysis on the parent module with 'getCachedModuleAnalysisResult'. Similary, a ModulePass may query an analysis, it doesn't need to be cached, on a child function with 'getFunctionAnalysisResult'. By default, when running a pass all cached analyses are set to be invalidated. If no transformation was performed, a pass can use the method 'markAllAnalysesPreserved' to preserve all analysis results. As noted above, preserving specific analyses is not yet supported. PiperOrigin-RevId: 236505642	2019-03-29 16:54:50 -07:00
Uday Bondhugula	eee85361bb	Remove hidden flag from fusion CL options PiperOrigin-RevId: 236409185	2019-03-29 16:54:05 -07:00
River Riddle	f37651c708	NFC. Move all of the remaining operations left in BuiltinOps to StandardOps. The only thing left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder. PiperOrigin-RevId: 236403665	2019-03-29 16:53:35 -07:00
Lei Zhang	85d9b6c8f7	Use consistent names for dialect op source files This CL changes dialect op source files (.h, .cpp, .td) to follow the following convention: <full-dialect-name>/<dialect-namespace>Ops.{h\|cpp\|td} Builtin and standard dialects are specially treated, though. Both of them do not have dialect namespace; the former is still named as BuiltinOps.* and the latter is named as Ops.*. Purely mechanical. NFC. PiperOrigin-RevId: 236371358	2019-03-29 16:53:19 -07:00
MLIR Team	d038e34735	Loop fusion for input reuse. ) Breaks fusion pass into multiple sub passes over nodes in data dependence graph: - first pass fuses single-use producers into their unique consumer. - second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges. - third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer). ) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved. ) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation. ) Adds a generic loop utilitiy for finding all sequential loops in a loop nest. *) Adds and updates unit tests. PiperOrigin-RevId: 236350987	2019-03-29 16:52:35 -07:00
River Riddle	ddc6788cc7	Provide a Builder::getNamedAttr and (Instruction\|Function)::setAttr(StringRef, Attribute) to simplify attribute manipulation. PiperOrigin-RevId: 236222504	2019-03-29 16:50:59 -07:00
River Riddle	ed5fe2098b	Remove PassResult and have the runOnFunction/runOnModule functions return void instead. To signal a pass failure, passes should now invoke the 'signalPassFailure' method. This provides the equivalent functionality when needed, but isn't an intrusive part of the API like PassResult. PiperOrigin-RevId: 236202029	2019-03-29 16:50:44 -07:00
Uday Bondhugula	58889884a2	Change some of the debug messages to use emitError / emitWarning / emitNote - NFC PiperOrigin-RevId: 236169676	2019-03-29 16:50:29 -07:00
River Riddle	c6c534493d	Port all of the existing passes over to the new pass manager infrastructure. This is largely NFC. PiperOrigin-RevId: 235952357	2019-03-29 16:47:14 -07:00
Uday Bondhugula	7aa60a383f	Temp change in FlatAffineConstraints::getSliceBounds() to deal with TODO in LoopFusion - getConstDifference in LoopFusion is pending a refactoring to handle bounds with min's and max's; it currently asserts on some useful test cases that we want to experiment with. This CL changes getSliceBounds to be more conservative so as to not trigger the assertion. Filed b/126426796 to track this. PiperOrigin-RevId: 235826538	2019-03-29 16:45:23 -07:00
Uday Bondhugula	d4b3ff1096	Loop fusion comand line options cleanup - clean up loop fusion CL options for promoting local buffers to fast memory space - add parameters to loop fusion pass instantiation PiperOrigin-RevId: 235813419	2019-03-29 16:44:38 -07:00
River Riddle	cdbfd48471	Rewrite the dominance info classes to allow for operating on arbitrary control flow within operation regions. The CSE pass is also updated to properly handle nested dominance. PiperOrigin-RevId: 235742627	2019-03-29 16:43:35 -07:00
Nicolas Vasilache	62c54a2ec4	Add a stripmineSink and imperfectly nested tiling primitives. This CL adds a primitive to perform stripmining of a loop by a given factor and sinking it under multiple target loops. In turn this is used to implement imperfectly nested loop tiling (with interchange) by repeatedly calling the stripmineSink primitive. The API returns the point loops and allows repeated invocations of tiling to achieve declarative, multi-level, imperfectly-nested tiling. Note that this CL is only concerned with the mechanical aspects and does not worry about analysis and legality. The API is demonstrated in an example which creates an EDSC block, emits the corresponding MLIR and applies imperfectly-nested tiling: ```cpp auto block = edsc::block({ For(ArrayRef<edsc::Expr>{i, j}, {zero, zero}, {M, N}, {one, one}, { For(k1, zero, O, one, { C({i, j, k1}) = A({i, j, k1}) + B({i, j, k1}) }), For(k2, zero, O, one, { C({i, j, k2}) = A({i, j, k2}) + B({i, j, k2}) }), }), }); // clang-format on emitter.emitStmts(block.getBody()); auto l_i = emitter.getAffineForOp(i), l_j = emitter.getAffineForOp(j), l_k1 = emitter.getAffineForOp(k1), l_k2 = emitter.getAffineForOp(k2); auto indicesL1 = mlir::tile({l_i, l_j}, {512, 1024}, {l_k1, l_k2}); auto l_ii1 = indicesL1[0][0], l_jj1 = indicesL1[1][0]; mlir::tile({l_jj1, l_ii1}, {32, 16}, l_jj1); ``` The edsc::Expr for the induction variables (i, j, k_1, k_2) provide the programmatic hooks from which tiling can be applied declaratively. PiperOrigin-RevId: 235548228	2019-03-29 16:41:20 -07:00
Uday Bondhugula	dfe07b7bf6	Refactor AffineExprFlattener and move FlatAffineConstraints out of IR into Analysis - NFC - refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints could be moved out of IR/; the simplification that the IR needs for AffineExpr's doesn't depend on FlatAffineConstraints - have AffineExprFlattener derive from SimpleAffineExprFlattener to use for all Analysis/Transforms purposes; override addLocalFloorDivId in the derived class - turn addAffineForOpDomain into a method on FlatAffineConstraints - turn AffineForOp::getAsValueMap into an AffineValueMap ctor PiperOrigin-RevId: 235283610	2019-03-29 16:39:32 -07:00
River Riddle	f48716146e	NFC: Make DialectConversion not directly inherit from ModulePass. It is now just a utility class that performs dialect conversion on a provided module. PiperOrigin-RevId: 235194067	2019-03-29 16:38:57 -07:00
River Riddle	5410dff790	Rewrite MLPatternLoweringPass to no longer inherit from FunctionPass and just provide a utility function that applies ML patterns. PiperOrigin-RevId: 235194034	2019-03-29 16:38:41 -07:00
MLIR Team	8564b274db	Internal change PiperOrigin-RevId: 235191129	2019-03-29 16:38:24 -07:00
River Riddle	3e656599f1	Define a PassID class to use when defining a pass. This allows for the type used for the ID field to be self documenting. It also allows for the compiler to know the set alignment of the ID object, which is useful for storing pointer identifiers within llvm data structures. PiperOrigin-RevId: 235107957	2019-03-29 16:37:12 -07:00
Uday Bondhugula	4d3af6be82	Print debug message better + switch a dma-generate cl opt to uint64_t PiperOrigin-RevId: 234840316	2019-03-29 16:35:41 -07:00
Uday Bondhugula	a1dad3a5d9	Extend/improve getSliceBounds() / complete TODO + update unionBoundingBox - compute slices precisely where the destination iteration depends on multiple source iterations (instead of over-approximating to the whole source loop extent) - update unionBoundingBox to deal with input with non-matching symbols - reenable disabled backend test case PiperOrigin-RevId: 234714069	2019-03-29 16:33:11 -07:00
River Riddle	48ccae2476	NFC: Refactor the files related to passes. * PassRegistry is split into its own source file. * Pass related files are moved to a new library 'Pass'. PiperOrigin-RevId: 234705771	2019-03-29 16:32:56 -07:00
Uday Bondhugula	5021dc4fa0	DMA placement update - hoist loops invariant DMAs - hoist DMAs past all loops immediately surrounding the region that the latter is invariant on - do this at DMA generation time itself PiperOrigin-RevId: 234628447	2019-03-29 16:32:41 -07:00
Uday Bondhugula	4ca6219099	Update pass documentation + improve/fix some comments - add documentation for passes - improve / fix outdated doc comments PiperOrigin-RevId: 234627076	2019-03-29 16:32:11 -07:00
River Riddle	da0ebe0670	Add a generic pattern matcher for matching constant values produced by an operation with zero operands and a single result. PiperOrigin-RevId: 234616691	2019-03-29 16:31:56 -07:00
Alex Zinenko	b4dba895a6	EDSC: make Expr typed and extensible Expose the result types of edsc::Expr, which are now stored for all types of Exprs and not only for the variadic ones. Require return types when an Expr is constructed, if it will ever have some. An empty return type list is interpreted as an Expr that does not create a value (e.g. `return` or `store`). Conceptually, all edss::Exprs are now typed, with the type being a (potentially empty) tuple of return types. Unbound expressions and Bindables must now be constructed with a specific type they will take. This makes EDSC less evidently type-polymorphic, but we can still write generic code such as Expr sumOfSquares(Expr lhs, Expr rhs) { return lhs * lhs + rhs * rhs; } and use it to construct different typed expressions as sumOfSquares(Bindable(IndexType::get(ctx)), Bindable(IndexType::get(ctx))); sumOfSquares(Bindable(FloatType::getF32(ctx)), Bindable(FloatType::getF32(ctx))); On the positive side, we get the following. 1. We can now perform type checking when constructing Exprs rather than during MLIR emission. Nevertheless, this is still duplicates the Op::verify() until we can factor out type checking from that. 2. MLIREmitter is significantly simplified. 3. ExprKind enum is only used for actual kinds of expressions. Data structures are converging with AbstractOperation, and the users can now create a VariadicExpr("canonical_op_name", {types}, {exprs}) for any operation, even an unregistered one without having to extend the enum and make pervasive changes to EDSCs. On the negative side, we get the following. 1. Typed bindables are more verbose, even in Python. 2. We lose the ability to do print debugging for higher-level EDSC abstractions that are implemented as multiple MLIR Ops, for example logical disjunction. This is the step 2/n towards making EDSC extensible. *** Move MLIR Op construction from MLIREmitter::emitExpr to Expr::build since Expr now has sufficient information to build itself. This is the step 3/n towards making EDSC extensible. Both of these strive to minimize the amount of irrelevant changes. In particular, this introduces more complex pretty-printing for affine and binary expression to make sure tests continue to pass. It also relies on string comparison to identify specific operations that an Expr produces. PiperOrigin-RevId: 234609882	2019-03-29 16:31:26 -07:00
Alex Zinenko	0a4c940c1b	EDSC: introduce support for blocks EDSC currently implement a block as a statement that is itself a list of statements. This suffers from two modeling problems: (1) these blocks are not addressable, i.e. one cannot create an instruction where thus constructed block is a successor; (2) they support block nesting, which is not supported by MLIR blocks. Furthermore, emitting such "compound statement" (misleadingly named `Block` in Python bindings) does not actually produce a new Block in the IR. Implement support for creating actual IR Blocks in EDSC. In particular, define a new StmtBlock EDSC class that is neither an Expr nor a Stmt but contains a list of Stmts. Additionally, StmtBlock may have (early-) typed arguments. These arguments are Bindable expressions that can be used inside the block. Provide two calls in the MLIREmitter, `emitBlock` that actually emits a new block and `emitBlockBody` that only emits the instructions contained in the block without creating a new block. In the latter case, the instructions must not use block arguments. Update Python bindings to make it clear when instruction emission happens without creating a new block. PiperOrigin-RevId: 234556474	2019-03-29 16:30:56 -07:00
Uday Bondhugula	f97c1c5b06	Misc. updates/fixes to analysis utils used for DMA generation; update DMA generation pass to make it drop certain assumptions, complete TODOs. - multiple fixes for getMemoryFootprintBytes - pass loopDepth correctly from getMemoryFootprintBytes() - use union while computing memory footprints - bug fixes for addAffineForOpDomain - take into account loop step - add domains of other loop IVs in turn that might have been used in the bounds - dma-generate: drop assumption of "non-unit stride loops being tile space loops and skipping those and recursing to inner depths"; DMA generation is now purely based on available fast mem capacity and memory footprint's calculated - handle memory region compute failures/bailouts correctly from dma-generate - loop tiling cleanup/NFC - update some debug and error messages to use emitNote/emitError in pipeline-data-transfer pass - NFC PiperOrigin-RevId: 234245969	2019-03-29 16:30:26 -07:00
MLIR Team	58aa383e60	Support fusing producer loop nests which write to a memref which is live out, provided that the write region of the consumer loop nest to the same memref is a super set of the producer's write region. PiperOrigin-RevId: 234240958	2019-03-29 16:30:11 -07:00
MLIR Team	8f5f2c765d	LoopFusion: perform a series of loop interchanges to increase the loop depth at which slices of producer loop nests can be fused into constumer loop nests. ) Adds utility to LoopUtils to perform loop interchange of two AffineForOps. ) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges. ) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential. ) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them. ) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation. ) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest. *) Adds and updates related unit tests. PiperOrigin-RevId: 234158370	2019-03-29 16:29:26 -07:00
Alex Zinenko	d7aa700ccb	Dialect conversion: decouple function signature conversion from type conversion Function types are built-in in MLIR and affect the validity of the IR itself. However, advanced target dialects such as the LLVM IR dialect may include custom function types. Until now, dialect conversion was expecting function types not to be converted to the custom type: although the signatures was allowed to change, the outer type must have been an mlir::FunctionType. This effectively prevented dialect conversion from creating instructions that operate on values of the custom function type. Dissociate function signature conversion from general type conversion. Function signature conversion must still produce an mlir::FunctionType and is used in places where built-in types are required to make IR valid. General type conversion is used for SSA values, including function and block arguments and function results. Exercise this behavior in the LLVM IR dialect conversion by converting function types to LLVM IR function pointer types. The pointer to a function is chosen to provide consistent lowering of higher-order functions: while it is possible to have a value of function type, it is not possible to create a function type accepting a returning another function type. PiperOrigin-RevId: 234124494	2019-03-29 16:28:41 -07:00
Uday Bondhugula	6b7a49dd6a	Add -tile-sizes command line option for loop tiling; clean up cl options for for dma-generate, loop-unroll. - add -tile-sizes command line option for loop tiling to specify different tile sizes for loops in a band - clean up command line options for loop-unroll, dma-generate (remove cl::hidden) PiperOrigin-RevId: 234006232	2019-03-29 16:28:10 -07:00
Uday Bondhugula	00860662a2	Generate dealloc's for alloc's of pipeline-data-transfer - for the DMA transfers being pipelined through double buffering, generate deallocs for the double buffers being alloc'ed This change is along the lines of cl/233502632. We initially wanted to experiment with scoped allocation - so the deallocation's were usually not necessary; however, they are needed even with scoped allocations in some situations - for eg. when the enclosing loop gets unrolled. The dealloc serves as an end of lifetime marker. PiperOrigin-RevId: 233653463	2019-03-29 16:25:53 -07:00
River Riddle	4755774d16	Make IndexType a standard type instead of a builtin. This also cleans up some unnecessary factory methods on the Type class. PiperOrigin-RevId: 233640730	2019-03-29 16:25:38 -07:00
Alex Zinenko	0e59e5c49b	EDSC: move Expr and Stmt construction operators to a namespace In the current state, edsc::Expr and edsc::Stmt overload operators to construct other Exprs and Stmts. This includes some unconventional overloads of the `operator==` to create a comparison expression and of the `operator!` to create a negation expression. This situation could lead to unpleasant surprises where the code does not behave like expected. Make all Expr and Stmt construction operators free functions and move them to the `edsc::op` namespace. Callers willing to use these operators must explicitly include them with the `using` declaration. This can be done in some local scope. Additionally, we currently emit signed comparisons for order-comparison operators. With namespaces, we can later introduce two sets of operators in different namespace, e.g. `edsc::op::sign` and `edsc::op::unsign` to clearly state which kind of comparison is implied. PiperOrigin-RevId: 233578674	2019-03-29 16:25:08 -07:00
Uday Bondhugula	8b3f841daf	Generate dealloc's for the alloc's of dma-generate. - for the DMA buffers being allocated (and their tags), generate corresponding deallocs - minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass Code generation for DMA transfers was being done with the initial simplifying assumption that the alloc's would map to scoped allocations, and so no deallocations would be necessary. Drop this assumption to generalize. Note that even with scoped allocations, unrolling loops that have scoped allocations could create a series of allocations and exhaustion of fast memory. Having a end of lifetime marker like a dealloc in fact allows creating new scopes if necessary when lowering to a backend and still utilize scoped allocation. DMA buffers created by -dma-generate are guaranteed to have either non-overlapping lifetimes or nested lifetimes. PiperOrigin-RevId: 233502632	2019-03-29 16:24:08 -07:00
River Riddle	366ebcf6aa	Remove the restriction that only registered terminator operations may terminate a block and have block operands. This allows for any operation to hold block operands. It also introduces the notion that unregistered operations may terminate a block. As such, the 'isTerminator' api on Instruction has been split into 'isKnownTerminator' and 'isKnownNonTerminator'. PiperOrigin-RevId: 233076831	2019-03-29 16:22:23 -07:00
Uday Bondhugula	c419accea3	Automated rollback of changelist 232728977. PiperOrigin-RevId: 232944889	2019-03-29 16:21:38 -07:00
River Riddle	a886625813	Modify the canonicalizations of select and muli to use the fold hook. This also extends the greedy pattern rewrite driver to add the operands of folded operations back to the worklist. PiperOrigin-RevId: 232878959	2019-03-29 16:20:06 -07:00
Uday Bondhugula	4ba8c9147d	Automated rollback of changelist 232717775. PiperOrigin-RevId: 232807986	2019-03-29 16:19:33 -07:00
River Riddle	99fee0b181	When canonicalizing only erase the operation after calling the 'fold' hook if replacement results were supplied. This fixes a bug where the operation would always get erased, even if it was modified in place. PiperOrigin-RevId: 232757964	2019-03-29 16:19:17 -07:00
River Riddle	fd2d7c857b	Rename the 'if' operation in the AffineOps dialect to 'affine.if' and namespace the AffineOps dialect with 'affine'. PiperOrigin-RevId: 232728977	2019-03-29 16:18:59 -07:00
River Riddle	90d10b4e00	NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. The is the second step to adding a namespace to the AffineOps dialect. PiperOrigin-RevId: 232717775	2019-03-29 16:17:59 -07:00
River Riddle	3227dee15d	NFC: Rename affine_apply to affine.apply. This is the first step to adding a namespace to the affine dialect. PiperOrigin-RevId: 232707862	2019-03-29 16:17:29 -07:00
MLIR Team	b9dde91ea6	Adds the ability to compute the MemRefRegion of a sliced loop nest. Utilizes this feature during loop fusion cost computation, to compute what the write region of a fusion candidate loop nest slice would be (without having to materialize the slice or change the IR). ) Adds parameter to public API of MemRefRegion::compute for passing in the slice loop bounds to compute the memref region of the loop nest slice. ) Exposes public method MemRefRegion::getRegionSize for computing the size of the memref region in bytes. PiperOrigin-RevId: 232706165	2019-03-29 16:17:15 -07:00
River Riddle	0c65cf283c	Move the AffineFor loop bound folding to a canonicalization pattern on the AffineForOp. PiperOrigin-RevId: 232610715	2019-03-29 16:16:11 -07:00
River Riddle	10237de8eb	Refactor the affine analysis by moving some functionality to IR and some to AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring: * AffineStructures has moved to IR. * simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR. * makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps. * ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted. PiperOrigin-RevId: 232586468	2019-03-29 16:15:41 -07:00
MLIR Team	a78edcda5b	Loop fusion improvements: ) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion. ) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check. PiperOrigin-RevId: 232477853	2019-03-29 16:13:50 -07:00
Uday Bondhugula	ed27b40085	Remove stray debug output - NFC PiperOrigin-RevId: 232390076	2019-03-29 16:13:17 -07:00
River Riddle	bf9c381d1d	Remove InstWalker and move all instruction walking to the api facilities on Function/Block/Instruction. PiperOrigin-RevId: 232388113	2019-03-29 16:12:59 -07:00
River Riddle	c9ad4621ce	NFC: Move AffineApplyOp to the AffineOps dialect. This also moves the isValidDim/isValidSymbol methods from Value to the AffineOps dialect. PiperOrigin-RevId: 232386632	2019-03-29 16:12:40 -07:00
Uday Bondhugula	0f50414fa4	Refactor common code getting memref access in getMemRefRegion - NFC - use getAccessMap() instead of repeating it - fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap allocation and unique_ptr where possible) - change extractForInductionVars - MutableArrayRef -> ArrayRef for the arguments. Since the method is just returning copies of 'Value *', the client can't mutate the pointers themselves; it's fine to mutate the 'Value''s themselves, but that doesn't mutate the pointers to those. - change the way extractForInductionVars returns (see b/123437690) PiperOrigin-RevId: 232359277	2019-03-29 16:12:25 -07:00
River Riddle	b499277fb6	Remove remaining usages of OperationInst in lib/Transforms. PiperOrigin-RevId: 232323671	2019-03-29 16:10:53 -07:00
River Riddle	a3d9ccaecb	Replace the walkOps/visitOperationInst variants from the InstWalkers with the Instruction variants. PiperOrigin-RevId: 232322030	2019-03-29 16:10:24 -07:00
Uday Bondhugula	b26900dce5	Update dma-generate pass to (1) work on blocks of instructions (instead of just loops), (2) take into account fast memory space capacity and lower 'dmaDepth' to fit, (3) add location information for debug info / errors - change dma-generate pass to work on blocks of instructions (start/end iterators) instead of 'for' loops; complete TODOs - allows DMA generation for straightline blocks of operation instructions interspersed b/w loops - take into account fast memory capacity: check whether memory footprint fits in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA generation is performed until it does fit in the provided memory - add location information to MemRefRegion; any insufficient fast memory capacity errors or debug info w.r.t dma generation shows location information - allow DMA generation pass to be instantiated with a fast memory capacity option (besides command line flag) - change getMemRefRegion to return unique_ptr's - change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst' - other helper methods; add postDomInstFilter option for replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods Eg. output $ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir /tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) { ^ $ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir /tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) { PiperOrigin-RevId: 232297044	2019-03-29 16:09:52 -07:00
River Riddle	de2d0dfbca	Fold the functionality of OperationInst into Instruction. OperationInst still exists as a forward declaration and will be removed incrementally in a set of followup cleanup patches. PiperOrigin-RevId: 232198540	2019-03-29 16:09:19 -07:00
River Riddle	126ec14e2d	Fix the handling of the resizable operands bit of OperationState in a few places. PiperOrigin-RevId: 232163738	2019-03-29 16:08:28 -07:00
Uday Bondhugula	8be2627436	Promote local buffers created post fusion to higher memory space - fusion already includes the necessary analysis to create small/local buffers post fusion; allocate these buffers in a higher memory space if the necessary pass parameters are provided (threshold size, memory space id) - although there will be a separate utility at some point to directly detect and promote small local buffers to higher memory spaces, doing it while fusion when possible is much less expensive, comes free with fusion analysis, and covers a key common case. PiperOrigin-RevId: 232063894	2019-03-29 16:07:23 -07:00
River Riddle	5052bd8582	Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body. PiperOrigin-RevId: 232060516	2019-03-29 16:06:49 -07:00
Nicolas Vasilache	0353ef99eb	Cleanup EDSCs and start a functional auto-generated library of custom Ops This CL applies the following simplifications to EDSCs: 1. Rename Block to StmtList because an MLIR Block is a different, not yet supported, notion; 2. Rework Bindable to drop specific storage and just use it as a simple wrapper around Expr. The only value of Bindable is to force a static cast when used by the user to bind into the emitter. For all intended purposes, Bindable is just a lightweight check that an Expr is Unbound. This simplifies usage and reduces the API footprint. After playing with it for some time, it wasn't worth the API cognition overhead; 3. Replace makeExprs and makeBindables by makeNewExprs and copyExprs which is more explicit and less easy to misuse; 4. Add generally useful functionality to MLIREmitter: a. expose zero and one for the ubiquitous common lower bounds and step; b. add support to create already bound Exprs for all function arguments as well as shapes and views for Exprs bound to memrefs. 5. Delete Stmt::operator= and replace by a `Stmt::set` method which is more explicit. 6. Make Stmt::operator Expr() explicit. 7. Indexed.indices assertions are removed to pave the way for expressing slices and views as well as to work with 0-D memrefs. The CL plugs those simplifications with TableGen and allows emitting a full MLIR function for pointwise add. This "x.add" op is both type and rank-agnostic (by allowing ArrayRef of Expr passed to For loops) and opens the door to spinning up a composable library of existing and custom ops that should automate a lot of the tedious work in TF/XLA -> MLIR. Testing needs to be significantly improved but can be done in a separate CL. PiperOrigin-RevId: 231982325	2019-03-29 16:05:23 -07:00
River Riddle	9f22a2391b	Define an detail::OperandStorage class to handle managing instruction operands. This class stores operands in a similar way to SmallVector except for two key differences. The first is the inline storage, which is a trailing objects array. The second is that being able to dynamically resize the operand list is optional. This means that we can enable the cases where operations need to change the number of operands after construction without losing the spatial locality benefits of the common case (operation instructions / non-control flow instructions with a lifetime fixed number of operands). PiperOrigin-RevId: 231910497	2019-03-29 16:05:08 -07:00
Nicolas Vasilache	d4921f4a96	Address Performance issue in NestedMatcher A performance issue was reported due to the usage of NestedMatcher in ComposeAffineMaps. The main culprit was the ubiquitous copies that were occuring when appending even a single element in `matchOne`. This CL generally simplifies the implementation and removes one level of indirection by getting rid of auxiliary storage as well as simplifying the API. The users of the API are updated accordingly. The implementation was tested on a heavily unrolled example with ComposeAffineMaps and is now close in performance with an implementation based on stateless InstWalker. As a reminder, the whole ComposeAffineMaps pass is slated to disappear but the bug report was very useful as a stress test for NestedMatchers. Lastly, the following cleanups reported by @aminim were addressed: 1. make NestedPatternContext scoped within runFunction rather than at the Pass level. This was caused by a previous misunderstanding of Pass lifetime; 2. use defensive assertions in the constructor of NestedPatternContext to make it clear a unique such locally scoped context is allowed to exist. PiperOrigin-RevId: 231781279	2019-03-29 16:04:07 -07:00
MLIR Team	1e85191d07	Fix ASAN issue: snapshot edge list before loop which can modify this list. PiperOrigin-RevId: 231686040	2019-03-29 16:03:38 -07:00
MLIR Team	d7c824451f	LoopFusion: insert the source loop nest slice at a depth in the destination loop nest which preserves dependences (above any loop carried or other dependences). This is accomplished by updating the maximum destination loop depth based on dependence checks between source loop nest loads and stores which access the memref on which the source loop nest has a store op. In addition, prevent fusing in source loop nests which write to memrefs which escape or are live out. PiperOrigin-RevId: 231684492	2019-03-29 16:03:23 -07:00
Uday Bondhugula	44064d5b3b	3000x speed improvement on compose-affine-maps by dropping NestedMatcher for a trivial inst walker :-) (reduces pass time from several minutes non-terminating to 120ms) - (fixes b/123541184) - use a simple 7-line inst walker to collect affine_apply op's instead of the nested matcher; -compose-affine-maps pass runs in 120ms now instead of 5 minutes + (non- terminating / out of memory) - on a realistic test case that is 20,000 lines 12-d loop nest - this CL is also pushing for simple existing/standard patterns unless there is a real efficiency issue (OTOH, fixing nested matcher to address this issue requires cl/231400521) - the improvement is from swapping out the nested walker as opposed to from a bug or anything else that this CL changes - update stale comment PiperOrigin-RevId: 231623619	2019-03-29 16:02:53 -07:00
River Riddle	b6928c945c	Standardize the spelling of debug info to "debuginfo" in opt flags. PiperOrigin-RevId: 231610337	2019-03-29 16:02:38 -07:00
Uday Bondhugula	c0e9e5eb07	Fix getFullMemRefAsRegion() and FlatAffineConstraints::reset PiperOrigin-RevId: 231426734	2019-03-29 16:00:39 -07:00
MLIR Team	a0f3db4024	Support fusing loop nests which require insertion into a new instruction Block position while preserving dependences, opening up additional fusion opportunities. - Adds SSA Value edges to the data dependence graph used in the loop fusion pass. PiperOrigin-RevId: 231417649	2019-03-29 16:00:04 -07:00
River Riddle	755538328b	Recommit: Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst. PiperOrigin-RevId: 231342063	2019-03-29 15:59:30 -07:00
Nicolas Vasilache	ae772b7965	Automated rollback of changelist 231318632. PiperOrigin-RevId: 231327161	2019-03-29 15:42:38 -07:00
River Riddle	5ecef2b3f6	Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst. PiperOrigin-RevId: 231318632	2019-03-29 15:42:08 -07:00
Nicolas Vasilache	1a5287d594	Replace too obscure usage of functional::map by declare + reserve + loop. Cleanup a usage of functional::map that is deemed too obscure in `reindexAffineIndices`. Also fix a stale comment in `reindexAffineIndices`. PiperOrigin-RevId: 231211184	2019-03-29 15:41:08 -07:00
Chris Lattner	b42bea215a	Change AffineApplyOp to produce a single result, simplifying the code that works with it, and updating the g3docs. PiperOrigin-RevId: 231120927	2019-03-29 15:40:38 -07:00
River Riddle	36babbd781	Change the ForInst induction variable to be a block argument of the body instead of the ForInst itself. This is a necessary step in converting ForInst into an operation. PiperOrigin-RevId: 231064139	2019-03-29 15:40:23 -07:00
Nicolas Vasilache	0e7a8a9027	Drop AffineMap::Null and IntegerSet::Null Addresses b/122486036 This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing the Null method and cleaning up the constructors. As the ::Null uses were tracked down, opportunities appeared to untangle some of the Parsing logic and make it explicit where AffineMap/IntegerSet have ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit pointer values of AffineMap* and IntegerSet* that were passed as function parameters. Depending the values of those pointers one of 3 behaviors could occur. This parsing logic convolution is one of the rare cases where I would advocate for code duplication. The more proper fix would be to make the syntax unambiguous or to allow some lookahead. PiperOrigin-RevId: 231058512	2019-03-29 15:40:08 -07:00
Nicolas Vasilache	81c7f2e2f3	Cleanup resource management and rename recursive matchers This CL follows up on a memory leak issue related to SmallVector growth that escapes the BumpPtrAllocator. The fix is to properly use ArrayRef and placement new to define away the issue. The following renaming is also applied: 1. MLFunctionMatcher -> NestedPattern 2. MLFunctionMatches -> NestedMatch As a consequence all allocations are now guaranteed to live on the BumpPtrAllocator. PiperOrigin-RevId: 231047766	2019-03-29 15:39:53 -07:00
River Riddle	75c21e1de0	Wrap cl::opt flags within passes in a category with the pass name. This improves the help output of tools like mlir-opt. Example: dma-generate options: -dma-fast-mem-capacity - Set fast memory space ... -dma-fast-mem-space=<uint> - Set fast memory space ... loop-fusion options: -fusion-compute-tolerance=<number> - Fractional increase in ... -fusion-maximal - Enables maximal loop fusion loop-tile options: -tile-size=<uint> - Use this tile size for ... loop-unroll options: -unroll-factor=<uint> - Use this unroll factor ... -unroll-full - Fully unroll loops -unroll-full-threshold=<uint> - Unroll all loops with ... -unroll-num-reps=<uint> - Unroll innermost loops ... loop-unroll-jam options: -unroll-jam-factor=<uint> - Use this unroll jam factor ... PiperOrigin-RevId: 231019363	2019-03-29 15:39:38 -07:00
Uday Bondhugula	b4a1443508	Update replaceAllMemRefUsesWith to generate single result affine_apply's for index remapping - generate a sequence of single result affine_apply's for the index remapping (instead of one multi result affine_apply) - update dma-generate and loop-fusion test cases; while on this, change test cases to use single result affine apply ops - some fusion comment fix/cleanup PiperOrigin-RevId: 230985830	2019-03-29 15:38:23 -07:00
Uday Bondhugula	b588d58c5f	Update createAffineComputationSlice to generate single result affine maps - Update createAffineComputationSlice to generate a sequence of single result affine apply ops instead of one multi-result affine apply - update pipeline-data-transfer test case; while on this, also update the test case to use only single result affine maps, and make it more robust to change. PiperOrigin-RevId: 230965478	2019-03-29 15:37:53 -07:00
River Riddle	c3424c3c75	Allow operations to hold a blocklist and add support for parsing/printing a block list for verbose printing. PiperOrigin-RevId: 230951462	2019-03-29 15:37:37 -07:00
Alex Zinenko	6d37a255e2	Generic dialect conversion pass exercised by LLVM IR lowering This commit introduces a generic dialect conversion/lowering/legalization pass and illustrates it on StandardOps->LLVMIR conversion. It partially reuses the PatternRewriter infrastructure and adds the following functionality: - an actual pass; - non-default pattern constructors; - one-to-many rewrites; - rewriting terminators with successors; - not applying patterns iteratively (unlike the existing greedy rewrite driver); - ability to change function signature; - ability to change basic block argument types. The latter two things required, given the existing API, to create new functions in the same module. Eventually, this should converge with the rest of PatternRewriter. However, we may want to keep two pass versions: "heavy" with function/block argument conversion and "light" that only touches operations. This pass creates new functions within a module as a means to change function signature, then creates new blocks with converted argument types in the new function. Then, it traverses the CFG in DFS-preorder to make sure defs are converted before uses in the dominated blocks. The generic pass has a minimal interface with two hooks: one to fill in the set of patterns, and another one to convert types for functions and blocks. The patterns are defined as separate classes that can be table-generated in the future. The LLVM IR lowering pass partially inherits from the existing LLVM IR translator, in particular for type conversion. It defines a conversion pattern template, instantiated for different operations, and is a good candidate for tablegen. The lowering does not yet support loads and stores and is not connected to the translator as it would have broken the existing flows. Future patches will add missing support before switching the translator in a single patch. PiperOrigin-RevId: 230951202	2019-03-29 15:37:23 -07:00
Uday Bondhugula	95f19d558c	Fix return value logic / error reporting in -dma-generate PiperOrigin-RevId: 230906158	2019-03-29 15:36:23 -07:00
MLIR Team	5c5739d42b	Change the dependence check in the loop fusion pass to use the MLIR instruction list ordering (instead of the dependence graph node id ordering). This breaks the overloading of dependence graph node ids as both edge endpoints and instruction list position. PiperOrigin-RevId: 230849232	2019-03-29 15:35:53 -07:00
Uday Bondhugula	f94b15c247	Update dma-generate: update for multiple load/store op's per memref - introduce a way to compute union using symbolic rectangular bounding boxes - handle multiple load/store op's to the same memref by taking a union of the regions - command-line argument to provide capacity of the fast memory space - minor change to replaceAllMemRefUsesWith to not generate affine_apply if the supplied index remap was identity PiperOrigin-RevId: 230848185	2019-03-29 15:35:38 -07:00
Uday Bondhugula	06d21d9f64	loop-fusion: debug info cleanup PiperOrigin-RevId: 230817383	2019-03-29 15:35:08 -07:00
Chris Lattner	934b6d125f	Introduce a new operation hook point for implementing simple local canonicalizations of operations. The ultimate important user of this is going to be a funcBuilder->foldOrCreate<YourOp>(...) API, but for now it is just a more convenient way to write certain classes of canonicalizations (see the change in StandardOps.cpp). NFC. PiperOrigin-RevId: 230770021	2019-03-29 15:34:35 -07:00
River Riddle	451869f394	Add cloning functionality to Block and Function, this also adds support for remapping successor block operands of terminator operations. We define a new BlockAndValueMapping class to simplify mapping between cloned values. PiperOrigin-RevId: 230768759	2019-03-29 15:34:20 -07:00
Uday Bondhugula	72e5c7f428	Minor updates + cleanup to dma-generate - switch some debug info to emitError - use a single constant op for zero index to make it easier to write/update test cases; avoid creating new constant op's for common zero index cases - test case cleanup This is in preparation for an upcoming major update to this pass. PiperOrigin-RevId: 230728379	2019-03-29 15:34:06 -07:00
River Riddle	f319bbbd28	Add a function pass to strip debug info from functions and instructions. PiperOrigin-RevId: 230654315	2019-03-29 15:33:50 -07:00
River Riddle	6859f33292	Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int. PiperOrigin-RevId: 230605756	2019-03-29 15:33:20 -07:00
MLIR Team	b28009b681	Fix single producer check in loop fusion pass. PiperOrigin-RevId: 230565482	2019-03-29 15:32:20 -07:00
Uday Bondhugula	864d9e02a1	Update fusion cost model + some additional infrastructure and debug information for -loop-fusion - update fusion cost model to fuse while tolerating a certain amount of redundant computation; add cl option -fusion-compute-tolerance evaluate memory footprint and intermediate memory reduction - emit debug info from -loop-fusion showing what was fused and why - introduce function to compute memory footprint for a loop nest - getMemRefRegion readability update - NFC PiperOrigin-RevId: 230541857	2019-03-29 15:32:06 -07:00
Uday Bondhugula	92e9d9484c	loop unroll update: unroll factor one for a single iteration loop - unrolling a single iteration loop by a factor of one should promote its body into its parent; this makes it consistent with the behavior/expectation that unrolling a loop by a factor equal to its trip count makes the loop go away. PiperOrigin-RevId: 230426499	2019-03-29 15:31:35 -07:00
Uday Bondhugula	1b735dfe27	Refactor -dma-generate walker - NFC - ForInst::walkOps will also be used in an upcoming CL (cl/229438679); better to have this instead of deriving from the InstWalker PiperOrigin-RevId: 230413820	2019-03-29 15:31:03 -07:00
Uday Bondhugula	94a03f864f	Allocate private/local buffers for slices accurately during fusion - the size of the private memref created for the slice should be based on the memref region accessed at the depth at which the slice is being materialized, i.e., symbolic in the outer IVs up until that depth, as opposed to the region accessed based on the entire domain. - leads to a significant contraction of the temporary / intermediate memref whenever the memref isn't reduced to a single scalar (through store fwd'ing). Other changes - update to promoteIfSingleIteration - avoid introducing unnecessary identity map affine_apply from IV; makes it much easier to write and read test cases and pass output for all passes that use promoteIfSingleIteration; loop-fusion test cases become much simpler - fix replaceAllMemrefUsesWith bug that was exposed by the above update - 'domInstFilter' could be one of the ops erased due to a memref replacement in it. - fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was missing (the latter need not always be 1); add lbFloorDivisors output argument - rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape PiperOrigin-RevId: 230405218	2019-03-29 15:30:31 -07:00
MLIR Team	71495d58a7	Handle escaping memrefs in loop fusion pass: ) Do not remove loop nests which write to memrefs which escape the function. ) Do not remove memrefs which escape the function (e.g. are used in the return instruction). PiperOrigin-RevId: 230398630	2019-03-29 15:30:14 -07:00
Nicolas Vasilache	9f3f39d61a	Cleanup EDSCs This CL performs a bunch of cleanups related to EDSCs that are generally useful in the context of using them with a simple wrapping C API (not in this CL) and with simple language bindings to Python and Swift. PiperOrigin-RevId: 230066505	2019-03-29 15:27:58 -07:00
Lei Zhang	1e484b5ef4	Mark (void)indexRemap to please compiler for unused variable check PiperOrigin-RevId: 229957023	2019-03-29 15:26:59 -07:00
MLIR Team	c4237ae990	LoopFusion: Creates private MemRefs which are used only by operations in the fused loop. ) Enables reduction of private memref size based on MemRef region accessed by fused slice. ) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence. *) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice). PiperOrigin-RevId: 229936698	2019-03-29 15:26:15 -07:00
Smit Hinsu	0eebe6ffd9	Update comment in the constant folding pass as constant folding is supported even when not all operands are constants PiperOrigin-RevId: 229670189	2019-03-29 15:24:28 -07:00
Nicolas Vasilache	4573a8da9a	Fix improperly indexed DimOp in LowerVectorTransfers.cpp This CL fixes a misunderstanding in how to build DimOp which triggered execution issues in the CPU path. The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to construct the dynamic dimensions should be: `dim %arg, 0 : memref<?x4x?x8x?xf32>` `dim %arg, 2 : memref<?x4x?x8x?xf32>` and `dim %arg, 4 : memref<?x4x?x8x?xf32>` Before this CL, we wold construct: `dim %arg, 0 : memref<?x4x?x8x?xf32>` `dim %arg, 1 : memref<?x4x?x8x?xf32>` `dim %arg, 2 : memref<?x4x?x8x?xf32>` and expect the other dimensions to be constants. This assumption seems consistent at first glance with the syntax of alloc: ``` %tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32> ``` But this was actuallyincorrect. This CL also makes the relevant functions available to EDSCs and removes duplication of the incorrect function. PiperOrigin-RevId: 229622766	2019-03-29 15:24:13 -07:00
Uday Bondhugula	c1ca23ef6e	Some loop fusion code cleanup/simplification post cl/229575126 - enforce the assumptions better / in a simpler way PiperOrigin-RevId: 229612424	2019-03-29 15:23:43 -07:00
MLIR Team	27d067e164	LoopFusion improvements: ) Adds support for fusing into consumer loop nests with multiple loads from the same memref. ) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth. *) Removes dependence on src loop depth and simplifies cost model computation. PiperOrigin-RevId: 229575126	2019-03-29 15:21:59 -07:00
Uday Bondhugula	f99a44a7cd	Address documentation/readability related comments from cl/227252907 on memref store forwarding - NFC. PiperOrigin-RevId: 229561933	2019-03-29 15:20:59 -07:00
Uday Bondhugula	03e15e1b9f	Minor code cleanup - NFC. - readability changes PiperOrigin-RevId: 229443430	2019-03-29 15:19:41 -07:00
Nicolas Vasilache	424041ad58	Add EDSC sugar This allows load, store and ForNest to be used with both Expr and Bindable. This simplifies writing generic pieces of MLIR snippet. For instance, a generic pointwise add can now be written: ```cpp // Different Bindable ivs, one per loop in the loop nest. auto ivs = makeBindables(shapeA.size()); Bindable zero, one; // Same bindable, all equal to `zero`. SmallVector<Bindable, 8> zeros(ivs.size(), zero); // Same bindable, all equal to `one`. SmallVector<Bindable, 8> ones(ivs.size(), one); // clang-format off Bindable A, B, C; Stmt scalarA, scalarB, tmp; Stmt block = edsc::Block({ ForNest(ivs, zeros, shapeA, ones, { scalarA = load(A, ivs), scalarB = load(B, ivs), tmp = scalarA + scalarB, store(tmp, C, ivs) }), }); // clang-format on ``` This CL also adds some extra support for pretty printing that will be used in a future CL when we introduce standalone testing of EDSCs. At the momen twe are lacking the basic infrastructure to write such tests. PiperOrigin-RevId: 229375850	2019-03-29 15:16:53 -07:00
Uday Bondhugula	6e4f3e40c7	Fix outdated comments PiperOrigin-RevId: 229300301	2019-03-29 15:16:08 -07:00
Lei Zhang	61ec6c0992	Swap the type and attribute parameter in ConstantOp::build() This is to keep consistent with other TableGen generated builders so that we can also use this builder in TableGen rules. PiperOrigin-RevId: 229244630	2019-03-29 15:14:52 -07:00
MLIR Team	38c2fe3158	LoopFusion: automate selection of source loop nest slice depth and destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time). ) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest, with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations for src/dst loop depths attempting find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality). ) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest. ) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests). ) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (ones fusion has calculated the min-cost src/dst loop depths). *) Test: Adds multiple unit tests to test the new functionality. PiperOrigin-RevId: 229219757	2019-03-29 15:13:53 -07:00
Nicolas Vasilache	d734c50c5f	[MLIR] Clip all access dimensions during LowerVectorTransfers This CL adds a short term remedy to an issue that was found during execution tests. Lowering of vector transfer ops uses the permutation map to determine which ForInst have been super-vectorized. During materialization to HW vector sizes however, some of those dimensions may be fully unrolled and do not appear in the permutation map. Such dimensions were then not clipped and may have accessed out of bounds. This CL conservatively clips all dimensions to ensure no out of bounds access. The longer term solution is still up for debate but will probably require either passing more information between Materialization and lowering, or just merging the 2 passes. PiperOrigin-RevId: 228980787	2019-03-29 15:12:26 -07:00
Nicolas Vasilache	362557e11c	Simplify compositions of AffineApply This CL is the 6th and last on the path to simplifying AffineMap composition. This removes `AffineValueMap::forwardSubstitutions` and replaces it by simple calls to `fullyComposeAffineMapAndOperands`. PiperOrigin-RevId: 228962580	2019-03-29 15:11:56 -07:00
Nicolas Vasilache	cfa5831960	Uniformize composition of AffineApplyOp by construction This CL is the 5th on the path to simplifying AffineMap composition. This removes the distinction between normalized single-result AffineMap and more general composed multi-result map. One nice byproduct of making the implementation driven by single-result is that the multi-result extension is a trivial change: the implementation is still single-result and we just use: ``` unsigned idx = getIndexOf(...); map.getResult(idx); ``` This CL also fixes an AffineNormalizer implementation issue related to symbols. Namely it stops performing substitutions on symbols in AffineNormalizer and instead concatenates them all to be consistent with the call to `AffineMap::compose(AffineMap)`. This latter call to `compose` cannot perform simplifications of symbols coming from different maps based on positions only: i.e. dims are applied and renumbered but symbols must be concatenated. The only way to determine whether symbols from different AffineApply are the same is to look at the concrete values. The canonicalizeMapAndOperands is thus extended with behavior to support replacing operands that appear multiple times. Lastly, this CL demonstrates that the implementation is correct by rewriting ComposeAffineMaps using only `makeComposedAffineApply`. The implementation uses a matcher because AffineApplyOp are introduced as composed operations on the fly instead of iteratively forwardSubstituting. For this purpose, a walker would revisit freshly introduced AffineApplyOp. Regardless, ComposeAffineMaps is scheduled to disappear, this CL replaces the implementation based on iterative `forwardSubstitute` by a composed-by-construction `makeComposedAffineApply`. Remaining calls to `forwardSubstitute` will be removed in the next CL. PiperOrigin-RevId: 228830443	2019-03-29 15:08:40 -07:00
Alex Zinenko	9003490287	Implement branch-free single-division lowering of affine division/remainder This implements the lowering of `floordiv`, `ceildiv` and `mod` operators from affine expressions to the arithmetic primitive operations. Integer division rules in affine expressions explicitly require rounding towards either negative or positive infinity unlike machine implementations that round towards zero. In the general case, implementing `floordiv` and `ceildiv` using machine signed division requires computing both the quotient and the remainder. When the divisor is positive, this can be simplified by adjusting the dividend and the quotient by one and switching signs. In the current use cases, we are unlikely to encounter affine expressions with negative divisors (affine divisions appear in loop transformations such as tiling that guarantee that divisors are positive by construction). Therefore, it is reasonable to use branch-free single-division implementation. In case of affine maps, divisors can only be literals so we can check the sign and implement the case for negative divisors when the need arises. The affine lowering pass can still fail when applied to semi-affine maps (division or modulo by a symbol). PiperOrigin-RevId: 228668181	2019-03-29 15:07:40 -07:00
Uday Bondhugula	742c37abc9	Fix DMA overlap pass buffer mapping - the double buffer should be indexed (iv floordiv step) % 2 and NOT (iv % 2); step wasn't being accounted for. - fix test cases, enable failing test cases PiperOrigin-RevId: 228635726	2019-03-29 15:07:10 -07:00
Nicolas Vasilache	1f78d63f05	[MLIR] Make SuperVectorization use normalized AffineApplyOp Supervectorization does not plan on handling multi-result AffineMaps and non-canonical chains of > 1 AffineApplyOp. This CL uses the simpler single-result unbounded AffineApplyOp in the MaterializeVectors pass. PiperOrigin-RevId: 228469085	2019-03-29 15:05:55 -07:00
Nicolas Vasilache	c6f798a976	Introduce AffineMap::compose(AffineMap) This CL is the 2nd on the path to simplifying AffineMap composition. This CL uses the now accepted `AffineExpr::compose(AffineMap)` to implement `AffineMap::compose(AffineMap)`. Implications of keeping the simplification function in Analysis are documented where relevant. PiperOrigin-RevId: 228276646	2019-03-29 15:04:20 -07:00
Uday Bondhugula	21baf86a2f	Extend loop-fusion's slicing utility + other fixes / updates - refactor toAffineFromEq and the code surrounding it; refactor code into FlatAffineConstraints::getSliceBounds - add FlatAffineConstraints methods to detect identifiers as mod's and div's of other identifiers - add FlatAffineConstraints::getConstantLower/UpperBound - Address b/122118218 (don't assert on invalid fusion depths cmdline flags - instead, don't do anything; change cmdline flags src-loop-depth -> fusion-src-loop-depth - AffineExpr/Map print method update: don't fail on null instances (since we have a wrapper around a pointer, it's avoidable); rationale: dump/print methods should never fail if possible. - Update memref-dataflow-opt to add an optimization to avoid a unnecessary call to IsRangeOneToOne when it's trivially going to be true. - Add additional test cases to exercise the new support - update a few existing test cases since the maps are now generated uniformly with all destination loop operands appearing for the backward slice - Fix projectOut - fix wrong range for getBestElimCandidate. - Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since we didn't have any non-hyperrectangular ones. PiperOrigin-RevId: 228265152	2019-03-29 15:03:20 -07:00
Uday Bondhugula	b934d75b8f	Convert expr - c * (expr floordiv c) to expr mod c in AffineExpr - Detect 'mod' to replace the combination of floordiv, mul, and subtract when possible at construction time; when 'c' is a power of two, this reduces the number of operations; also more compact and readable. Update simplifyAdd for this. On a side note: - with the affine expr flattening we have, a mod expression like d0 mod c would be flattened into d0 - c * q, c * q <= d0 <= cq + c - 1, with 'q' being added as the local variable (q = d0 floordiv c); as a result, a mod was turned into a floordiv whenever the expression was reconstructed back, i.e., as d0 - c (d0 floordiv c); as a result of this change, we recover the mod back. - rename SimplifyAffineExpr -> SimplifyAffineStructures (pass had been renamed but the file hadn't been). PiperOrigin-RevId: 228258120	2019-03-29 15:02:56 -07:00
Uday Bondhugula	56b3640b94	Misc readability and doc / code comment related improvements - NFC - when SSAValue/MLValue existed, code at several places was forced to create additional aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get rid of such redundant code - use filling ctors instead of explicit loops - for smallvectors, change insert(list.end(), ...) -> append(... - improve comments at various places - turn getMemRefAccess into MemRefAccess ctor and drop duplicated getMemRefAccess. In the next CL, provide getAccess() accessors for load, store, DMA op's to return a MemRefAccess. PiperOrigin-RevId: 228243638	2019-03-29 15:02:41 -07:00
Nicolas Vasilache	618c6a74c6	[MLIR] Introduce normalized single-result unbounded AffineApplyOp Supervectorization does not plan on handling multi-result AffineMaps and non-canonical chains of > 1 AffineApplyOp. This CL introduces a simpler abstraction and composition of single-result unbounded AffineApplyOp by using the existing unbound AffineMap composition. This CL adds a simple API call and relevant tests: ```c++ OpPointer<AffineApplyOp> makeNormalizedAffineApply( FuncBuilder b, Location loc, AffineMap map, ArrayRef<Value> operands); ``` which creates a single-result unbounded AffineApplyOp. The operands of AffineApplyOp are not themselves results of AffineApplyOp by consrtuction. This represent the simplest possible interface to complement the composition of (mathematical) AffineMap, for the cases when we are interested in applying it to Value*. In this CL the composed AffineMap is not compressed (i.e. there exist operands that are not part of the result). A followup commit will compress to normal form. The single-result unbounded AffineApplyOp abstraction will be used in a followup CL to support the MaterializeVectors pass. PiperOrigin-RevId: 227879021	2019-03-29 14:56:37 -07:00
Nicolas Vasilache	0ebc0ba72e	[MLIR] More graceful failure in MaterializeVectors Even though it is unexpected except in pathological cases, a nullptr clone may be returned. This CL handles the nullptr return gracefuly. PiperOrigin-RevId: 227764615	2019-03-29 14:55:05 -07:00
Nicolas Vasilache	5b87a5ef4b	[MLIR] Drop strict super-vector requirement in MaterializeVector The strict requirement (i.e. at least 2 HW vectors in a super-vector) was a premature optimization to avoid interfering with other vector code potentially introduced via other means. This CL avoids this premature optimization and the spurious errors it causes when super-vector size == HW vector size (which is a possible corner case). This may be revisited in the future. PiperOrigin-RevId: 227763966	2019-03-29 14:54:49 -07:00
Nicolas Vasilache	947e5f4a68	[MLIR] Handle corner case in MaterializeVectors This corner was found when stress testing with a functional end-to-end CPU path. In the case where the hardware vector size is 1x...x1 the `keep` vector is empty and would result a crash. While there is no reason to expect a 1x...x1 HW vector in practice, this case can just gracefully degrade to scalar, which is what this CL allows. PiperOrigin-RevId: 227761097	2019-03-29 14:54:22 -07:00
River Riddle	54948a4380	Split the standard types from builtin types and move them into separate source files(StandardTypes.cpp/h). After this cl only FunctionType and IndexType are builtin types, but IndexType will likely become a standard type when the ml/cfgfunc merger is done. Mechanical NFC. PiperOrigin-RevId: 227750918	2019-03-29 14:54:07 -07:00
Alex Zinenko	0c4ee54198	Merge LowerAffineApplyPass into LowerIfAndForPass, rename to LowerAffinePass This change is mechanical and merges the LowerAffineApplyPass and LowerIfAndForPass into a single LowerAffinePass. It makes a step towards defining an "affine dialect" that would contain all polyhedral-related constructs. The motivation for merging these two passes is based on retiring MLFunctions and, eventually, transforming If and For statements into regular operations. After that happens, LowerAffinePass becomes yet another legalization. PiperOrigin-RevId: 227566113	2019-03-29 14:52:52 -07:00
Alex Zinenko	fa710c17f4	LowerForAndIf: expand affine_apply's inplace Existing implementation was created before ML/CFG unification refactoring and did not concern itself with further lowering to separate concerns. As a result, it emitted `affine_apply` instructions to implement `for` loop bounds and `if` conditions and required a follow-up function pass to lower those `affine_apply` to arithmetic primitives. In the unified function world, LowerForAndIf is mostly a lowering pass with low complexity. As we move towards a dialect for affine operations (including `for` and `if`), it makes sense to lower `for` and `if` conditions directly to arithmetic primitives instead of relying on `affine_apply`. Expose `expandAffineExpr` function in LoweringUtils. Use this function together with `expandAffineMaps` to emit primitives that implement loop and branch conditions directly. Also remove tests that become unnecessary after transforming LowerForAndIf into a function pass. PiperOrigin-RevId: 227563608	2019-03-29 14:52:22 -07:00
Alex Zinenko	d64db86f20	Refactor LowerAffineApply In LoweringUtils, extract out `expandAffineMap`. This function takes an affine map and a list of values the map should be applied to and emits a sequence of arithmetic instructions that implement the affine map. It is independent of the AffineApplyOp and can be used in places where we need to insert an evaluation of an affine map without relying on a (temporary) `affine_apply` instruction. This prepares for a merge between LowerAffineApply and LowerForAndIf passes. Move the `expandAffineApply` function to the LowerAffineApply pass since it is the only place that must be aware of the `affine_apply` instructions. PiperOrigin-RevId: 227563439	2019-03-29 14:52:07 -07:00
Chris Lattner	bbf362b784	Eliminate extfunc/cfgfunc/mlfunc as a concept, and just use 'func' instead. The entire compiler now looks at structural properties of the function (e.g. does it have one block, does it contain an if/for stmt, etc) so the only thing holding up this difference is round tripping through the parser/printer syntax. Removing this shrinks the compile by ~140LOC. This is step 31/n towards merging instructions and statements. The last step is updating the docs, which I will do as a separate patch in order to split it from this mostly mechanical patch. PiperOrigin-RevId: 227540453	2019-03-29 14:51:37 -07:00
Nicolas Vasilache	73f5c9c380	[MLIR] Sketch a simple set of EDSCs to declaratively write MLIR This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs) in MLIR components: 1. a `Type` system of shell classes that closely matches the MLIR type system. These types are subdivided into `Bindable` leaf expressions and non-bindable `Expr` expressions; 2. an `MLIREmitter` class whose purpose is to: a. maintain a map of `Bindable` leaf expressions to concrete SSAValue; b. provide helper functionality to specify bindings of `Bindable` classes to SSAValue while verifying comformable types; c. traverse the `Expr` and emit the MLIR. This is used on a concrete example to implement MemRef load/store with clipping in the LowerVectorTransfer pass. More specifically, the following pseudo-C++ code: ```c++ MLFuncBuilder *b = ...; Location location = ...; Bindable zero, one, expr, size; // EDSL expression auto access = select(expr < zero, zero, select(expr < size, expr, size - one)); auto ssaValue = MLIREmitter(b) .bind(zero, ...) .bind(one, ...) .bind(expr, ...) .bind(size, ...) .emit(location, access); ``` is used to emit all the MLIR for a clipped MemRef access. This simple EDSL can easily be extended to more powerful patterns and should serve as the counterpart to pattern matchers (and could potentially be unified once we get enough experience). In the future, most of this code should be TableGen'd but for now it has concrete valuable uses: make MLIR programmable in a declarative fashion. This CL also adds Stmt, proper supporting free functions and rewrites VectorTransferLowering fully using EDSCs. The code for creating the EDSCs emitting a VectorTransferReadOp as loops with clipped loads is: ```c++ Stmt block = Block({ tmpAlloc = alloc(tmpMemRefType), vectorView = vector_type_cast(tmpAlloc, vectorMemRefType), ForNest(ivs, lbs, ubs, steps, { scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs), store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs), }), vectorValue = load(vectorView, zero), tmpDealloc = dealloc(tmpAlloc.getLHS())}); emitter.emitStmt(block); ``` where `accessInfo.clippedScalarAccessExprs)` is created with: ```c++ select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one)); ``` The generated MLIR resembles: ```mlir %1 = dim %0, 0 : memref<?x?x?x?xf32> %2 = dim %0, 1 : memref<?x?x?x?xf32> %3 = dim %0, 2 : memref<?x?x?x?xf32> %4 = dim %0, 3 : memref<?x?x?x?xf32> %5 = alloc() : memref<5x4x3xf32> %6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>> for %i4 = 0 to 3 { for %i5 = 0 to 4 { for %i6 = 0 to 5 { %7 = affine_apply #map0(%i0, %i4) %8 = cmpi "slt", %7, %c0 : index %9 = affine_apply #map0(%i0, %i4) %10 = cmpi "slt", %9, %1 : index %11 = affine_apply #map0(%i0, %i4) %12 = affine_apply #map1(%1, %c1) %13 = select %10, %11, %12 : index %14 = select %8, %c0, %13 : index %15 = affine_apply #map0(%i3, %i6) %16 = cmpi "slt", %15, %c0 : index %17 = affine_apply #map0(%i3, %i6) %18 = cmpi "slt", %17, %4 : index %19 = affine_apply #map0(%i3, %i6) %20 = affine_apply #map1(%4, %c1) %21 = select %18, %19, %20 : index %22 = select %16, %c0, %21 : index %23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32> store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32> } } } %24 = load %6[%c0] : memref<1xvector<5x4x3xf32>> dealloc %5 : memref<5x4x3xf32> ``` In particular notice that only 3 out of the 4-d accesses are clipped: this corresponds indeed to the number of dimensions in the super-vector. This CL also addresses the cleanups resulting from the review of the prevous CL and performs some refactoring to simplify the abstraction. PiperOrigin-RevId: 227367414	2019-03-29 14:50:23 -07:00
Chris Lattner	a250643ec8	Merge together the CFG/ML function paths in the CSE pass. I did a first pass on this to merge together the classes, but there may be other simplification possible. I'll leave that to riverriddle@ as future work. This is step 29/n towards merging instructions and statements. PiperOrigin-RevId: 227328680	2019-03-29 14:50:08 -07:00
Chris Lattner	7974889f54	Update and generalize various passes to work on both CFG and ML functions, simplifying them in minor ways. The only significant cleanup here is the constant folding pass. All the other changes are simple and easy, but this is still enough to shrink the compiler by 45LOC. The one pass left to merge is the CSE pass, which will be move involved, so I'm splitting it out to its own patch (which I'll tackle right after this). This is step 28/n towards merging instructions and statements. PiperOrigin-RevId: 227328115	2019-03-29 14:49:52 -07:00
Chris Lattner	3c8fc797de	Simplify the remapFunctionAttrs logic, merging CFG/ML function handling. Remove an unnecessary restriction in forward substitution. Slightly simplify LLVM IR lowering, which previously would crash if given an ML function, it should now produce a clean error if given a function with an if/for instruction in it, just like it does any other unsupported op. This is step 27/n towards merging instructions and statements. PiperOrigin-RevId: 227324542	2019-03-29 14:49:35 -07:00
Chris Lattner	4bd9f93606	Simplify GreedyPatternRewriteDriver now that functions are merged into one representation, shrinking by 70LOC. The PatternRewriter class can probably also be simplified as well, but one step at a time. This is step 26/n towards merging instructions and statements. NFC. PiperOrigin-RevId: 227324218	2019-03-29 14:49:20 -07:00
Uday Bondhugula	f12182157e	Introduce PostDominanceInfo, fix properlyDominates() for Instructions - introduce PostDominanceInfo in the right/complete way and use that for post dominance check in store-load forwarding - replace all uses of Analysis/Utils::dominates/properlyDominates with DominanceInfo::dominates/properlyDominates - drop all redundant copies of dominance methods in Analysis/Utils/ - in pipeline-data-transfer, replace dominates call with a much less expensive check; similarly, substitute dominates() in checkMemRefAccessDependence with a simpler check suitable for that context - fix a bug in properlyDominates - improve doc for 'for' instruction 'body' PiperOrigin-RevId: 227320507	2019-03-29 14:48:44 -07:00
Chris Lattner	ae618428f6	Greatly simplify the ConvertToCFG pass, converting it from a module pass to a function pass, and eliminating the need to copy over code and do interprocedural updates. While here, also improve it to make fewer empty blocks, and rename it to "LowerIfAndFor" since that is what it does. This is a net reduction of ~170 lines of code. As drive-bys, change the splitBlock method to not insert an unconditional branch, since that behavior is annoying for all clients. Also improve the AsmPrinter to not crash when a block is referenced that isn't linked into a function. PiperOrigin-RevId: 227308856	2019-03-29 14:48:13 -07:00
Uday Bondhugula	545f3ce430	Fix ASAN failure in memref-dataflow-opt - memrefsToErase had duplicates inserted into it; switch to SmallPtrSet. PiperOrigin-RevId: 227299306	2019-03-29 14:47:58 -07:00
Uday Bondhugula	b9fe6be6d4	Introduce memref store to load forwarding - a simple memref dataflow analysis - the load/store forwarding relies on memref dependence routines as well as SSA/dominance to identify the memref store instance uniquely supplying a value to a memref load, and replaces the result of that load with the value being stored. The memref is also deleted when possible if only stores remain. - add methods for post dominance for MLFunction blocks. - remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth, getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were earlier static) - add a helper method in FlatAffineConstraints - isRangeOneToOne. PiperOrigin-RevId: 227252907	2019-03-29 14:47:28 -07:00
Chris Lattner	dffc589ad2	Extend InstVisitor and Walker to handle arbitrary CFG functions, expand the Function::walk functionality into f->walkInsts/Ops which allows visiting all instructions, not just ops. Eliminate Function::getBody() and Function::getReturn() helpers which crash in CFG functions, and were only kept around as a bridge. This is step 25/n towards merging instructions and statements. PiperOrigin-RevId: 227243966	2019-03-29 14:46:58 -07:00
Chris Lattner	5b9c3f7cdb	Tidy up references to "basic blocks" that should refer to blocks now. NFC. PiperOrigin-RevId: 227196077	2019-03-29 14:44:59 -07:00
Chris Lattner	456ad6a8e0	Standardize naming of statements -> instructions, revisting the code base to be consistent and moving the using declarations over. Hopefully this is the last truly massive patch in this refactoring. This is step 21/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227178245	2019-03-29 14:44:30 -07:00
Chris Lattner	315a466aed	Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed once can be fixed up on demand. The last major renaming is Statement -> Instruction, which is why Statement and Stmt still appears in various places. This is step 19/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227163082	2019-03-29 14:43:58 -07:00
Chris Lattner	69d9e990fa	Eliminate the using decls for MLFunction and CFGFunction standardizing on Function. This is step 18/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227139399	2019-03-29 14:43:13 -07:00
Chris Lattner	d798f9bad5	Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(), StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names. This is step 17/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227121537	2019-03-29 14:42:40 -07:00
Chris Lattner	5187cfcf03	Merge Operation into OperationInst and standardize nomenclature around OperationInst. This is a big mechanical patch. This is step 16/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227093712	2019-03-29 14:42:23 -07:00
Chris Lattner	471c976413	Rework inherentance hierarchy: Operation now derives from Statement, and OperationInst derives from it. This allows eliminating some forwarding functions, other complex code handling multiple paths, and the 'isStatement' bit tracked by Operation. This is the last patch I think I can make before the big mechanical change merging Operation into OperationInst, coming next. This is step 15/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227077411	2019-03-29 14:41:49 -07:00
Chris Lattner	4fbcd1ac52	Minor renamings: Trim the "Stmt" prefix off StmtSuccessorIterator/StmtSuccessorIterator, and rename and move the CFGFunctionViewGraph pass to ViewFunctionGraph. This is step 13/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227069438	2019-03-29 14:40:51 -07:00
Chris Lattner	4c05f8cac6	Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new FuncBuilder class. Also rename SSAValue.cpp to Value.cpp This is step 12/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227067644	2019-03-29 14:40:22 -07:00
Chris Lattner	3f190312f8	Merge SSAValue, CFGValue, and MLValue together into a single Value class, which is the new base of the SSA value hierarchy. This CL also standardizes all the nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing. This is step 11/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227064624	2019-03-29 14:40:06 -07:00
Chris Lattner	776b035646	Eliminate the Instruction, BasicBlock, CFGFunction, MLFunction, and ExtFunction classes, using the Statement/StmtBlock hierarchy and Function instead. This only changes the internal data structures, it does not affect the user visible syntax or structure of MLIR code. Function gets new "isCFG()" sorts of predicates as a transitional measure. This patch is gross in a number of ways, largely in an effort to reduce the amount of mechanical churn in one go. It introduces a bunch of using decls to keep the old names alive for now, and a bunch of stuff needs to be renamed. This is step 10/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227044402	2019-03-29 14:39:49 -07:00
Chris Lattner	abf72a8bb1	Rename findFunction from the ML side of the house to be named getFunction(), making it more similar to the CFG side of things. It is true that in a deeply nested case that this is not a guaranteed O(1) time operation, and that 'get' could lead compiler hackers to think this is cheap, but we need to merge these and we can look into solutions for this in the future if it becomes a problem in practice. This is step 9/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226983931	2019-03-29 14:38:49 -07:00
Chris Lattner	036f87b15f	Rename CFGFunctionGraphTraits.h -> FunctionGraphTraits.h and add graph specializations for doing CFG traversals of ML Functions, making the two sorts of functions have the same capabilities. This is step 8/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226968502	2019-03-29 14:38:19 -07:00
Alex Zinenko	eb0f9f37af	SuperVectorization: fix 'isa' assertion Supervectorization uses null pointers to SSA values as a means of communicating the failure to vectorize. In operation vectorization, all operations producing the values of operation arguments must be vectorized for the given operation to be vectorized. The existing check verified if any of the value "def" statements was vectorized instead, sometimes leading to assertions inside `isa` called on a null pointer. Fix this to check that all "def" statements were vectorized. PiperOrigin-RevId: 226941552	2019-03-29 14:37:20 -07:00
Jacques Pienaar	58d50a6325	Rename convenience methods to make type explicit. PiperOrigin-RevId: 226939383	2019-03-29 14:36:50 -07:00
Chris Lattner	d613f5ab65	Refactor MLFunction to contain a StmtBlock for its body instead of inheriting from it. This is necessary progress to squaring away the parent relationship that a StmtBlock has with its enclosing if/for/fn, and makes room for functions to have more than one block in the future. This also removes IfClause and ForStmtBody. This is step 5/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226936541	2019-03-29 14:36:35 -07:00
Chris Lattner	9a4060d3f5	Eliminate the ability to add operands to an instruction, used in a narrow case for SSA values in terminators, but easily worked around. At the same time, move the StmtOperand list in a OperationStmt to the end of its trailing objects list so we can reduce the number of operands, without affecting offsets to the other stuff in the allocation. This is important because we want OperationStmts to be consequtive, including their operands - we don't want to use an std::vector of operands like Instructions have. This is patch 4/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226865727	2019-03-29 14:36:20 -07:00
Chris Lattner	87ce4cc501	Per review on the previous CL, drop MLFuncBuilder::createOperation, changing clients to use OperationState instead. This makes MLFuncBuilder more similiar to CFGFuncBuilder. This whole area will get tidied up more when cfg and ml worlds get unified. This patch is just gardening, NFC. PiperOrigin-RevId: 226701959	2019-03-29 14:35:49 -07:00
Chris Lattner	1301f907a1	Refactor ForStmt: having it contain a StmtBlock instead of subclassing StmtBlock. This is more consistent with IfStmt and also conceptually makes more sense - a forstmt "isn't" its body, it contains its body. This is step 1/N towards merging BasicBlock and StmtBlock. This is required because in the new regime StmtBlock will have a use list (just like BasicBlock does) of operands, and ForStmt already has a use list for its induction variable. This is a mechanical patch, NFC. PiperOrigin-RevId: 226684158	2019-03-29 14:35:19 -07:00
MLIR Team	4eef795a1d	Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the compution slice. Updates LoopFusion pass to take these parameters as command line flags for experimentation. PiperOrigin-RevId: 226514297	2019-03-29 14:35:03 -07:00
MLIR Team	6892ffb896	Improve loop fusion algorithm by using a memref dependence graph. Fixed TODO for reduction fusion unit test. PiperOrigin-RevId: 226277226	2019-03-29 14:33:02 -07:00
Uday Bondhugula	14d2618f63	Simplify memref-dependence-check's meta data structures / drop duplication and reuse existing ones. - drop IterationDomainContext, redundant since FlatAffineConstraints has MLValue information associated with its dimensions. - refactor to use existing support - leads to a reduction in LOC - as a result of these changes, non-constant loop bounds get naturally supported for dep analysis. - update test cases to include a couple with non-constant loop bounds - rename addBoundsFromForStmt -> addForStmtDomain - complete TODO for getLoopIVs (handle 'if' statements) PiperOrigin-RevId: 226082008	2019-03-29 14:32:46 -07:00
Alex Zinenko	4dbd94b543	Refactor LowerVectorTransfersPass using pattern rewriters This introduces a generic lowering pass for ML functions. The pass is parameterized by template arguments defining individual pattern rewriters. Concrete lowering passes define individual pattern rewriters and inherit from the generic class that takes care of allocating rewriters, traversing ML functions and performing the actual rewrite. While this is similar to the greedy pattern rewriter available in Transform/Utils, it requires adjustments due to the ML/CFG duality. In particular, ML function rewriters must be able to create statements, not only operations, and need access to an MLFuncBuilder. When we move to using the unified function type, the ML-specific rewriting will become unnecessary. Use LowerVectorTransfers as a testbed for the generic pass. PiperOrigin-RevId: 225887424	2019-03-29 14:31:43 -07:00
Alex Zinenko	51c8a095a3	Materialize vector_type_cast operation in the SuperVector dialect This operation is produced and used by the super-vectorization passes and has been emitted as an abstract unregistered operation until now. For end-to-end testing purposes, it has to be eventually lowered to LLVM IR. Matching abstract operation by name goes into the opposite direction of the generic lowering approach that is expected to be used for LLVM IR lowering in the future. Register vector_type_cast operation as a part of the SuperVector dialect. Arguably, this operation is a special case of the `view` operation from the Standard dialect. The semantics of `view` is not fully specified at this point so it is safer to rely on a custom operation. Additionally, using a custom operation may help to achieve clear dialect separation. PiperOrigin-RevId: 225887305	2019-03-29 14:31:13 -07:00
Uday Bondhugula	4a3e4e8ea7	loop-unroll - add function callback argument for outside targets to provide unroll factors, and a cmd line argument to specify number of innermost loop unroll repetitions. - add function callback parameter for outside targets to provide unroll factors - add a cmd line parameter to repeatedly apply innermost loop unroll a certain number of times (to avoid using -loop-unroll -loop-unroll ...; instead -unroll-num-reps=2). - implement the callback for a target - update test cases / usage PiperOrigin-RevId: 225843191	2019-03-29 14:30:28 -07:00
MLIR Team	3b69230b3a	Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion. ) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality. ) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nests IVs and symbols using the dependece polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop. ) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destingation memref access. ) Adds FlatAffineConstraints::toAffineMap function which returns and AffineMap which represents an equality contraint where one dimension identifier is represented as a function of all others in the equality constraint. *) Adds multiple fusion unit tests. PiperOrigin-RevId: 225842944	2019-03-29 14:30:13 -07:00
Uday Bondhugula	dced746bd1	Remove duplicate code / reuse right utilities from memref-dep-check / loop-tile - use addBoundsForForStmt - getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the returned things aren't owned / part of the stmt on which it's being called. - other minor API cleanup PiperOrigin-RevId: 225774301	2019-03-29 14:29:28 -07:00
Alex Zinenko	bc52a639f9	Extract vector_transfer_* Ops into a SuperVectorDialect. From the beginning, vector_transfer_read and vector_transfer_write opreations were intended as a mid-level vectorization abstraction. In particular, they are lowered to the StandardOps dialect before further processing. As such, it does not make sense to keep them at the same level as StandardOps. Introduce the new SuperVectorOps dialect and move vector_transfer_* operations there. This will be used as a testbed for the generic lowering/legalization pass. PiperOrigin-RevId: 225554492	2019-03-29 14:28:58 -07:00
River Riddle	5c4f1fdd42	Check if the operation is already in the worklist before adding it. PiperOrigin-RevId: 225379496	2019-03-29 14:27:14 -07:00
Alex Zinenko	97d2f3cd3d	ConvertToCFG: use affine_apply to implement loop steps Originally, loop steps were implemented using `addi` and `constant` operations because `affine_apply` was not handled in the first implementation. The support for `affine_apply` has been added, use it to implement the update of the loop induction variable. This is more consistent with the lower and upper bounds of the loop that are also implemented as `affine_apply`, removes the dependence of the converted function on the StandardOps dialect and makes it clear from the CFG function that all operations on the loop induction variable are purely affine. PiperOrigin-RevId: 225165337	2019-03-29 14:26:22 -07:00
Uday Bondhugula	b9f53dc0bd	Update/Fix LoopUtils::stmtBodySkew to handle loop step. - loop step wasn't handled and there wasn't a TODO or an assertion; fix this. - rename 'delay' to shift for consistency/readability. - other readability changes. - remove duplicate attribute print for DmaStartOp; fix misplaced attribute print for DmaWaitOp - add build method for AddFOp (unrelated to this CL, but add it anyway) PiperOrigin-RevId: 224892958	2019-03-29 14:25:07 -07:00
Uday Bondhugula	d59a95a05c	Fix missing check for dependent DMAs in pipeline-data-transfer - adding a conservative check for now (TODO: use the dependence analysis pass once the latter is extended to deal with DMA ops). resolve an existing bug on a test case. - update test cases PiperOrigin-RevId: 224869526	2019-03-29 14:24:53 -07:00
Uday Bondhugula	6757fb151d	FlatAffineConstraints API cleanup; add normalizeConstraintsByGCD(). - add method normalizeConstraintsByGCD - call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of projectOut. - remove call to GCDTightenInequalities() from getMemRefRegion - change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each time an identifier is eliminated (to detect emptiness early). - make FourierMotzkinEliminate, gaussianEliminateId(s), GCDTightenInequalities() private - improve / update stale comments PiperOrigin-RevId: 224866741	2019-03-29 14:24:37 -07:00
Uday Bondhugula	2ef57806ba	Update/fix -pipeline-data-transfer; fix b/120770946 - fix replaceAllMemRefUsesWith call to replace only inside loop body. - handle the case where DMA buffers are dynamic; extend doubleBuffer() method to handle dynamically shaped DMA buffers (pass the right operands to AllocOp) - place alloc's for DMA buffers at the depth at which pipelining is being done (instead of at top-level) - add more test cases PiperOrigin-RevId: 224852231	2019-03-29 14:24:22 -07:00
Alex Zinenko	073c3ad997	Properly namespace createLowerAffineApply This was missing from the original commit. The implementation of createLowerAffineApply was defined in the default namespace but declared in the `mlir` namespace, which could lead to linking errors when it was used. Put the definition in `mlir` namespace. PiperOrigin-RevId: 224830894	2019-03-29 14:24:04 -07:00
Nicolas Vasilache	c28aeef901	[MLIR] Drop bug-prone global map indexed by MLFunction* PiperOrigin-RevId: 224610805	2019-03-29 14:23:49 -07:00
Uday Bondhugula	2d6478fa92	Extend loop tiling utility to handle non-constant loop bounds and bounds that are a max/min of several expressions. - Extend loop tiling to handle non-constant loop bounds and bounds that are a max/min of several expressions, i.e., bounds using multi-result affine maps - also fix b/120630124 as a result (the IR was in an invalid state when tiled loop generation failed; SSA uses were created that weren't plugged into the IR). PiperOrigin-RevId: 224604460	2019-03-29 14:23:34 -07:00
Uday Bondhugula	dfc752e42b	Generate strided DMAs from -dma-generate - generate DMAs correctly now using strided DMAs where needed - add support for multi-level/nested strides; op still supports one level of stride for now. Other things - add test case for symbolic lower/upper bound; cases where the DMA buffer size can't be bounded by a known constant - add test case for dynamic shapes where the DMA buffers are however bounded by constants - refactor some of the '-dma-generate' code PiperOrigin-RevId: 224584529	2019-03-29 14:23:19 -07:00
Nicolas Vasilache	d9b6420fc9	[MLIR] Add LowerVectorTransfersPass This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp to a simple loop nest via local buffer allocations. This is an MLIR->MLIR lowering based on builders. A few TODOs are left to address in particular: 1. invert the permutation map so the accesses to the remote memref are coalesced; 2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory); 3. support broadcast / avoid copies when permutation_map is not of full column rank 4. add a proper "element_cast" op One notable limitation is this does not plan on supporting boundary conditions. It should be significantly easier to use pre-baked MLIR functions to handle such paddings. This is left for future consideration. Therefore the current CL only works properly for full-tile cases atm. This CL also adds 2 simple tests: ```mlir for %i0 = 0 to %M step 3 { for %i1 = 0 to %N step 4 { for %i2 = 0 to %O { for %i3 = 0 to %P step 5 { vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index ``` lowers into: ```mlir for %i0 = 0 to %arg0 step 3 { for %i1 = 0 to %arg1 step 4 { for %i2 = 0 to %arg2 { for %i3 = 0 to %arg3 step 5 { %1 = alloc() : memref<5x4x3xf32> %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>> store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>> for %i4 = 0 to 5 { %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4) for %i5 = 0 to 4 { %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5) for %i6 = 0 to 3 { %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6) %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32> store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32> dealloc %1 : memref<5x4x3xf32> ``` and ```mlir for %i0 = 0 to %M step 3 { for %i1 = 0 to %N { for %i2 = 0 to %O { for %i3 = 0 to %P step 5 { %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32> ``` lowers into: ```mlir for %i0 = 0 to %arg0 step 3 { for %i1 = 0 to %arg1 { for %i2 = 0 to %arg2 { for %i3 = 0 to %arg3 step 5 { %1 = alloc() : memref<5x4x3xf32> %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>> for %i4 = 0 to 5 { %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4) for %i5 = 0 to 4 { for %i6 = 0 to 3 { %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6) %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32> store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32> %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>> dealloc %1 : memref<5x4x3xf32> ``` PiperOrigin-RevId: 224552717	2019-03-29 14:23:05 -07:00
Nicolas Vasilache	879be718a0	[MLIR] Fix the name of the MaterializeVectorPass PiperOrigin-RevId: 224536381	2019-03-29 14:22:49 -07:00
Smit Hinsu	adca59e4f7	Return bool from all emitError methods similar to Operation::emitOpError This simplifies call-sites returning true after emitting an error. After the conversion, dropped braces around single statement blocks as that seems more common. Also, switched to emitError method instead of emitting Error kind using the emitDiagnostic method. TESTED with existing unit tests PiperOrigin-RevId: 224527868	2019-03-29 14:22:06 -07:00
Nicolas Vasilache	13bc77045e	[MLIR] Drop assert for NYI in Vectorize.cpp This CLs adds proper error emission, removes NYI assertions and documents assumptions that are required in the relevant functions. PiperOrigin-RevId: 224377207	2019-03-29 14:21:37 -07:00
Nicolas Vasilache	5b610630b2	[MLIR] Error handling in MaterializeVectors This removes assertions as a means to capture NYI behavior and propagates errors up. PiperOrigin-RevId: 224376935	2019-03-29 14:20:37 -07:00
Nicolas Vasilache	4adc169bd0	[MLIR] Add AffineMap composition and use it in Materialization This CL adds the following free functions: ``` /// Returns the AffineExpr e o m. AffineExpr compose(AffineExpr e, AffineMap m); /// Returns the AffineExpr f o g. AffineMap compose(AffineMap f, AffineMap g); ``` This addresses the issue that AffineMap composition is only available at a distance via AffineValueMap and is thus unusable on Attributes. This CL thus implements AffineMap composition in a more modular and composable way. This CL does not claim that it can be a good replacement for the implementation in AffineValueMap, in particular it does not support bounded maps atm. Standalone tests are added that replicate some of the logic of the AffineMap composition pass. Lastly, affine map composition is used properly inside MaterializeVectors and a standalone test is added that requires permutation_map composition with a projection map. PiperOrigin-RevId: 224376870	2019-03-29 14:20:22 -07:00
Nicolas Vasilache	df0a25efee	[MLIR] Add support for permutation_map This CL hooks up and uses permutation_map in vector_transfer ops. In particular, when going into the nuts and bolts of the implementation, it became clear that cases arose that required supporting broadcast semantics. Broadcast semantics are thus added to the general permutation_map. The verify methods and tests are updated accordingly. Examples of interest include. Example 1: The following MLIR snippet: ```mlir for %i3 = 0 to %M { for %i4 = 0 to %N { for %i5 = 0 to %P { %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32> }}} ``` may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into: ```mlir for %i3 = 0 to %0 step 32 { for %i4 = 0 to %1 { for %i5 = 0 to %2 step 256 { %4 = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: (d0, d1, d2) -> (d2, d1)} : (memref<?x?x?xf32>, index, index) -> vector<32x256xf32> }}} ```` Meaning that vector_transfer_read will be responsible for reading the 2-D slice: `%arg0[%i4, %i5:%15+256, %i3:%i3+32]` into vector<32x256xf32>. This will require a transposition when vector_transfer_read is further lowered. Example 2: The following MLIR snippet: ```mlir %cst0 = constant 0 : index for %i0 = 0 to %M { %a0 = load %A[%cst0, %cst0] : memref<?x?xf32> } ``` may vectorize with {permutation_map: (d0) -> (0)} into: ```mlir for %i0 = 0 to %0 step 128 { %3 = vector_transfer_read %arg0, %c0_0, %c0_0 {permutation_map: (d0, d1) -> (0)} : (memref<?x?xf32>, index, index) -> vector<128xf32> } ```` Meaning that vector_transfer_read will be responsible of reading the 0-D slice `%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector broadcast when vector_transfer_read is further lowered. Additionally, some minor cleanups and refactorings are performed. One notable thing missing here is the composition with a projection map during materialization. This is because I could not find an AffineMap composition that operates on AffineMap directly: everything related to composition seems to require going through SSAValue and only operates on AffinMap at a distance via AffineValueMap. I have raised this concern a bunch of times already, the followup CL will actually do something about it. In the meantime, the projection is hacked at a minimum to pass verification and materialiation tests are temporarily incorrect. PiperOrigin-RevId: 224376828	2019-03-29 14:20:07 -07:00
Alex Zinenko	7c89a225cf	ConvertToCFG: support min/max in loop bounds. The recently introduced `select` operation enables ConvertToCFG to support min(max) in loop bounds. Individual min(max) is implemented as `cmpi "lt"`(`cmpi "gt"`) followed by a `select` between the compared values. Multiple results of an `affine_apply` operation extracted from the loop bounds are reduced using min(max) in a sequential manner. While this may decrease the potential for instruction-level parallelism, it is easier to recognize for the following passes, in particular for the vectorizer. PiperOrigin-RevId: 224376233	2019-03-29 14:19:52 -07:00
Alex Zinenko	513d6d896c	OpPointer: replace conversion operator to Operation* to OpType. The implementation of OpPointer<OpType> provides an implicit conversion to Operation , but not to the underlying OpType . This has led to awkward-looking code when an OpPointer needs to be passed to a function accepting an OpType . For example, if (auto someOp = genericOp.dyn_cast<OpType>()) someFunction(&someOp); where "&" makes it harder to read. Arguably, one does not want to spell out OpPointer<OpType> in the line with dyn_cast. More generally, OpPointer is now being used as an owning pointer to OpType rather than to operation. Replace the implicit conversion to Operation* with the conversion to OpType* taking into account const-ness of the type. An Operation* can be obtained from an OpType with a simple call. Since an instance of OpPointer owns the OpType value, the pointer to it is never null. However, the OpType value may not be associated with any Operation*. In this case, return nullptr when conversion is attempted to maintain consistency with the existing null checks. PiperOrigin-RevId: 224368103	2019-03-29 14:19:37 -07:00
Uday Bondhugula	73fc0223e4	Fix cases where unsigned / signed arithmetic was being mixed (following up on cl/224246657); eliminate repeated evaluation of exprs in loop upper bounds. - while on this, sweep through and fix potential repeated evaluation of expressions in loop upper bounds PiperOrigin-RevId: 224268918	2019-03-29 14:19:22 -07:00
Uday Bondhugula	a92130880e	Complete multiple unhandled cases for DmaGeneration / getMemRefRegion; update/improve/clean up API. - update FlatAffineConstraints::getConstBoundDifference; return constant differences between symbolic affine expressions, look at equalities as well. - fix buffer size computation when generating DMAs symbolic in outer loops, correctly handle symbols at various places (affine access maps, loop bounds, loop IVs outer to the depth at which DMA generation is being done) - bug fixes / complete some TODOs for getMemRefRegion - refactor common code b/w memref dependence check and getMemRefRegion - FlatAffineConstraints API update; added methods employ trivial checks / detection - sufficient to handle hyper-rectangular cases in a precise way while being fast / low complexity. Hyper-rectangular cases fall out as trivial cases for these methods while other cases still do not cause failure (either return conservative or return failure that is handled by the caller). PiperOrigin-RevId: 224229879	2019-03-29 14:18:22 -07:00
Alex Zinenko	7868abd9d8	ConvertToCFG: convert "if" statements. The condition of the "if" statement is an integer set, defined as a conjunction of affine constraints. An affine constraints consists of an affine expression and a flag indicating whether the expression is strictly equal to zero or is also allowed to be greater than zero. Affine maps, accepted by `affine_apply` are also formed from affine expressions. Leverage this fact to implement the checking of "if" conditions. Each affine expression from the integer set is converted into an affine map. This map is applied to the arguments of the "if" statement. The result of the application is compared with zero given the equality flag to obtain the final boolean value. The conjunction of conditions is tested sequentially with short-circuit branching to the "else" branch if any of the condition evaluates to false. Create an SESE region for the if statement (including its "then" and optional "else" statement blocks) and append it to the end of the current region. The conditional region consists of a sequence of condition-checking blocks that implement the short-circuit scheme, followed by a "then" SESE region and an "else" SESE region, and the continuation block that post-dominates all blocks of the "if" statement. The flow of blocks that correspond to the "then" and "else" clauses are constructed recursively, enabling easy nesting of "if" statements and if-then-else-if chains. Note that MLIR semantics does not require nor prohibit short-circuit evaluation. Since affine expressions do not have side effects, there is no observable difference in the program behavior. We may trade off extra operations for operation-level parallelism opportunity by first performing all `affine_apply` and comparison operations independently, and then performing a tree pattern reduction of the resulting boolean values with the `muli i1` operations (in absence of the dedicated bit operations). The pros and cons are not clear, and since MLIR does not include parallel semantics, we prefer to minimize the number of sequentially executed operations. PiperOrigin-RevId: 223970248	2019-03-29 14:16:10 -07:00
Nicolas Vasilache	b39d1f0bdb	[MLIR] Add VectorTransferOps This CL implements and uses VectorTransferOps in lieu of the former custom call op. Tests are updated accordingly. VectorTransferOps come in 2 flavors: VectorTransferReadOp and VectorTransferWriteOp. VectorTransferOps can be thought of as a backend-independent pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before it can be lowered to backend-dependent IR. Note that the current implementation does not yet support a real permutation map. Proper support will come in a followup CL. VectorTransferReadOp ==================== VectorTransferReadOp performs a blocking read from a scalar memref location into a super-vector of the same elemental type. This operation is called 'read' by opposition to 'load' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile only code. A vector transfer read has semantics similar to a vector load, with additional support for: 1. an optional value of the elemental type of the MemRef. This value supports non-effecting padding and is inserted in places where the vector read exceeds the MemRef bounds. If the value is not specified, the access is statically guaranteed to be within bounds; 2. an attribute of type AffineMap to specify a slice of the original MemRef access and its transposition into the super-vector shape. The permutation_map is an unbounded AffineMap that must represent a permutation from the MemRef dim space projected onto the vector dim space. Example: ```mlir %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32> ... %val = `ssa-value` : f32 // let %i, %j, %k, %l be ssa-values of type index %v0 = vector_transfer_read %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index) -> vector<16x32x64xf32> %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index, f32) -> vector<16x32x64xf32> ``` VectorTransferWriteOp ===================== VectorTransferWriteOp performs a blocking write from a super-vector to a scalar memref of the same elemental type. This operation is called 'write' by opposition to 'store' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile only code. A vector transfer write has semantics similar to a vector store, with additional support for handling out-of-bounds situations. Example: ```mlir %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>. %val = `ssa-value` : vector<16x32x64xf32> // let %i, %j, %k, %l be ssa-values of type index vector_transfer_write %val, %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index) ``` PiperOrigin-RevId: 223873234	2019-03-29 14:15:25 -07:00
Uday Bondhugula	5f76245cfe	Minor fix for replaceAllMemRefUsesWith. The check for whether the memref was used in a non-derefencing context had to be done inside, i.e., only for the op stmt's that the replacement was specified to be performed on (by the domStmtFilter arg if provided). As such, it is completely fine for example for a function to return a memref while the replacement is being performed only a specific loop's body (as in the case of DMA generation). PiperOrigin-RevId: 223827753	2019-03-29 14:14:43 -07:00
River Riddle	7669a259c4	Add a simple common sub expression elimination pass. The algorithm collects defining operations within a scoped hash table. The scopes within the hash table correspond to nodes within the dominance tree for a function. This cl only adds support for simple operations, i.e non side-effecting. Such operations, e.g. load/store/call, will be handled in later patches. PiperOrigin-RevId: 223811328	2019-03-29 14:14:28 -07:00
Uday Bondhugula	a619b5c295	Debug output / logging memref sizes in DMA generation + related changes - Add method to get a memref's size in bytes - clean up a loop tiling pass helper (NFC) PiperOrigin-RevId: 223422077	2019-03-29 14:12:56 -07:00
Chris Lattner	3f2530cdf5	Split "rewrite" functionality out of Pattern into a new RewritePattern derived class. This change is NFC, but allows for new kinds of patterns, specifically LegalizationPatterns which will be allowed to change the types of things they rewrite. PiperOrigin-RevId: 223243783	2019-03-29 14:12:07 -07:00
Alex Zinenko	68e9721aa8	Rename Deaffinator to LowerAffineApply and patch it. Several things were suggested in post-submission reviews. In particular, use pointers in function interfaces instead of references (still use references internally). Clarify the behavior of the pass in presence of MLFunctions. PiperOrigin-RevId: 222556851	2019-03-29 14:08:59 -07:00
Nicolas Vasilache	63bc6d2f6a	[MLIR] Fix opt build PiperOrigin-RevId: 222491353	2019-03-29 14:08:45 -07:00
Nicolas Vasilache	a5782f0d40	[MLIR][MaterializeVectors] Add a MaterializeVector pass via unrolling. This CL adds an MLIR-MLIR pass which materializes super-vectors to hardware-dependent sized vectors. While the physical vector size is target-dependent, the pass is written in a target-independent way: the target vector size is specified as a parameter to the pass. This pass is thus a partial lowering that opens the "greybox" that is the super-vector abstraction. This first CL adds a first materilization pass iterates over vector_transfer_write operations and: 1. computes the program slice including the current vector_transfer_write; 2. computes the multi-dimensional ratio of super-vector shape to hardware vector shape; 3. for each possible multi-dimensional value within the bounds of ratio, a new slice is instantiated (i.e. cloned and rewritten) so that all operations in this instance operate on the hardware vector type. As a simple example, given: ```mlir mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> { %A = alloc (%M, %N) : memref<?x?xf32> %B = alloc (%M, %N) : memref<?x?xf32> %C = alloc (%M, %N) : memref<?x?xf32> for %i0 = 0 to %M { for %i1 = 0 to %N { %a1 = load %A[%i0, %i1] : memref<?x?xf32> %b1 = load %B[%i0, %i1] : memref<?x?xf32> %s1 = addf %a1, %b1 : f32 store %s1, %C[%i0, %i1] : memref<?x?xf32> } } return %C : memref<?x?xf32> } ``` and the following options: ``` -vectorize -virtual-vector-size 32 --test-fastest-varying=0 -materialize-vectors -vector-size=8 ``` materialization emits: ```mlir #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0, d1 + 8) #map2 = (d0, d1) -> (d0, d1 + 16) #map3 = (d0, d1) -> (d0, d1 + 24) mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> { %0 = alloc(%arg0, %arg1) : memref<?x?xf32> %1 = alloc(%arg0, %arg1) : memref<?x?xf32> %2 = alloc(%arg0, %arg1) : memref<?x?xf32> for %i0 = 0 to %arg0 { for %i1 = 0 to %arg1 step 32 { %3 = affine_apply #map0(%i0, %i1) %4 = "vector_transfer_read"(%0, %3tensorflow/mlir#0, %3tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %5 = affine_apply #map1(%i0, %i1) %6 = "vector_transfer_read"(%0, %5tensorflow/mlir#0, %5tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %7 = affine_apply #map2(%i0, %i1) %8 = "vector_transfer_read"(%0, %7tensorflow/mlir#0, %7tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %9 = affine_apply #map3(%i0, %i1) %10 = "vector_transfer_read"(%0, %9tensorflow/mlir#0, %9tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %11 = affine_apply #map0(%i0, %i1) %12 = "vector_transfer_read"(%1, %11tensorflow/mlir#0, %11tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %13 = affine_apply #map1(%i0, %i1) %14 = "vector_transfer_read"(%1, %13tensorflow/mlir#0, %13tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %15 = affine_apply #map2(%i0, %i1) %16 = "vector_transfer_read"(%1, %15tensorflow/mlir#0, %15tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %17 = affine_apply #map3(%i0, %i1) %18 = "vector_transfer_read"(%1, %17tensorflow/mlir#0, %17tensorflow/mlir#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %19 = addf %4, %12 : vector<8xf32> %20 = addf %6, %14 : vector<8xf32> %21 = addf %8, %16 : vector<8xf32> %22 = addf %10, %18 : vector<8xf32> %23 = affine_apply #map0(%i0, %i1) "vector_transfer_write"(%19, %2, %23tensorflow/mlir#0, %23tensorflow/mlir#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %24 = affine_apply #map1(%i0, %i1) "vector_transfer_write"(%20, %2, %24tensorflow/mlir#0, %24tensorflow/mlir#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %25 = affine_apply #map2(%i0, %i1) "vector_transfer_write"(%21, %2, %25tensorflow/mlir#0, %25tensorflow/mlir#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %26 = affine_apply #map3(%i0, %i1) "vector_transfer_write"(%22, %2, %26tensorflow/mlir#0, %26tensorflow/mlir#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () } } return %2 : memref<?x?xf32> } ``` PiperOrigin-RevId: 222455351	2019-03-29 14:08:31 -07:00
Nicolas Vasilache	258dae5d73	[MLIR][Slicing] Apply cleanups This CL applies a few last cleanups from a previous CL that have been missed during the previous submit. PiperOrigin-RevId: 222454774	2019-03-29 14:08:17 -07:00
Nicolas Vasilache	5c16564bca	[MLIR][Slicing] Add utils for computing slices. This CL adds tooling for computing slices as an independent CL. The first consumer of this analysis will be super-vector materialization in a followup CL. In particular, this adds: 1. a getForwardStaticSlice function with documentation, example and a standalone unit test; 2. a getBackwardStaticSlice function with documentation, example and a standalone unit test; 3. a getStaticSlice function with documentation, example and a standalone unit test; 4. a topologicalSort function that is exercised through the getStaticSlice unit test. The getXXXStaticSlice functions take an additional root (resp. terminators) parameter which acts as a boundary that the transitive propagation algorithm is not allowed to cross. PiperOrigin-RevId: 222446208	2019-03-29 14:08:02 -07:00
Uday Bondhugula	2631b155a9	Fix bugs in DMA generation and FlatAffineConstraints; add more test cases. - fix bug in calculating index expressions for DMA buffers in certain cases (affected tiled loop nests); add more test cases for better coverage. - introduce an additional optional argument to replaceAllMemRefUsesWith; additional operands to the index remap AffineMap can now be supplied by the client. - FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound, ::composeMap - fix position bug. - Some clean up and more comments PiperOrigin-RevId: 222434628	2019-03-29 14:07:31 -07:00
Alex Zinenko	615c41c788	Introduce Deaffinator pass. This function pass replaces affine_apply operations in CFG functions with sequences of primitive arithmetic instructions that form the affine map. The actual replacement functionality is located in LoweringUtils as a standalone function operating on an individual affine_apply operation and inserting the result at the location of the original operation. It is expected to be useful for other, target-specific lowering passes that may start at MLFunction level that Deaffinator does not support. PiperOrigin-RevId: 222406692	2019-03-29 14:07:16 -07:00
Uday Bondhugula	b6c03917ad	Remove allocations for memref's that become dead as a result of double buffering in the auto DMA overlap pass. This is done online in the pass. PiperOrigin-RevId: 222313640	2019-03-29 14:05:19 -07:00
Nicolas Vasilache	87d46aaf4b	[MLIR][Vectorize] Refactor Vectorize use-def propagation. This CL refactors a few things in Vectorize.cpp: 1. a clear distinction is made between: a. the LoadOp are the roots of vectorization and must be vectorized eagerly and propagate their value; and b. the StoreOp which are the terminals of vectorization and must be vectorized late (i.e. they do not produce values that need to be propagated). 2. the StoreOp must be vectorized late because in general it can store a value that is not reachable from the subset of loads defined in the current pattern. One trivial such case is storing a constant defined at the top-level of the MLFunction and that needs to be turned into a splat. 3. a description of the algorithm is given; 4. the implementation matches the algorithm; 5. the last example is made parametric, in practice it will fully rely on the implementation of vector_transfer_read/write which will handle boundary conditions and padding. This will happen by lowering to a lower-level abstraction either: a. directly in MLIR (whether DMA or just loops or any async tasks in the future) (whiteboxing); b. in LLO/LLVM-IR/whatever blackbox library call/ search + swizzle inventor one may want to use; c. a partial mix of a. and b. (grey-boxing) 5. minor cleanups are applied; 6. mistakenly disabled unit tests are re-enabled (oopsie). With this CL, this MLIR snippet: ``` mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> { %A = alloc (%M, %N) : memref<?x?xf32> %B = alloc (%M, %N) : memref<?x?xf32> %C = alloc (%M, %N) : memref<?x?xf32> %f1 = constant 1.0 : f32 %f2 = constant 2.0 : f32 for %i0 = 0 to %M { for %i1 = 0 to %N { // non-scoped %f1 store %f1, %A[%i0, %i1] : memref<?x?xf32> } } for %i4 = 0 to %M { for %i5 = 0 to %N { %a5 = load %A[%i4, %i5] : memref<?x?xf32> %b5 = load %B[%i4, %i5] : memref<?x?xf32> %s5 = addf %a5, %b5 : f32 // non-scoped %f1 %s6 = addf %s5, %f1 : f32 store %s6, %C[%i4, %i5] : memref<?x?xf32> } } return %C : memref<?x?xf32> } ``` vectorized with these arguments: ``` -vectorize -virtual-vector-size 256 --test-fastest-varying=0 ``` vectorization produces this standard innermost-loop vectorized code: ``` mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> { %0 = alloc(%arg0, %arg1) : memref<?x?xf32> %1 = alloc(%arg0, %arg1) : memref<?x?xf32> %2 = alloc(%arg0, %arg1) : memref<?x?xf32> %cst = constant 1.000000e+00 : f32 %cst_0 = constant 2.000000e+00 : f32 for %i0 = 0 to %arg0 { for %i1 = 0 to %arg1 step 256 { %cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32> "vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> () } } for %i2 = 0 to %arg0 { for %i3 = 0 to %arg1 step 256 { %3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32> %4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32> %5 = addf %3, %4 : vector<256xf32> %cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32> %6 = addf %5, %cst_2 : vector<256xf32> "vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> () } } return %2 : memref<?x?xf32> } ``` Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for now. PiperOrigin-RevId: 222280209	2019-03-29 14:03:50 -07:00
Alex Zinenko	f986d5920b	ConvertToCFG: handle loop 1D affine loop bounds. In the general case, loop bounds can be expressed as affine maps of the outer loop iterators and function arguments. Relax the check for loop bounds to be known integer constants and also accept one-dimensional affine bounds in ConvertToCFG ForStmt lowering. Emit affine_apply operations for both the upper and the lower bound. The semantics of MLFunctions guarantees that both bounds can be computed before the loop starts iterating. Constant bounds are merely a short-hand notation for zero-dimensional affine maps and get supported transparently. Multidimensional affine bounds are not yet supported because the target IR dialect lacks min/max operations necessary to implement the corresponding semantics. PiperOrigin-RevId: 222275801	2019-03-29 14:03:20 -07:00
Jacques Pienaar	d0590caa90	Add op stats pass to mlir-opt. op-stats pass currently returns the number of occurrences of different operations in a Module. Useful for verifying transformation properties (e.g., 3 ops of specific dialect, 0 of another), but probably not useful outside of that so keeping it local to mlir-opt. This does not consider op attributes when counting. PiperOrigin-RevId: 222259727	2019-03-29 14:02:46 -07:00
Nicolas Vasilache	89d9913a20	[MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests This CL adds some vector support in prevision of the upcoming vector materialization pass. In particular this CL adds 2 functions to: 1. compute the multiplicity of a subvector shape in a supervector shape; 2. help match operations on strict super-vectors. This is defined for a given subvector shape as an operation that manipulates a vector type that is an integral multiple of the subtype, with multiplicity at least 2. This CL also adds a TestUtil pass where we can dump arbitrary testing of functions and analysis that operate at a much smaller granularity than a pass (e.g. an analysis for which it is convenient to write a bit of artificial MLIR and write some custom test). This is in order to keep using Filecheck for things that essentially look and feel like C++ unit tests. PiperOrigin-RevId: 222250910	2019-03-29 14:02:17 -07:00
Uday Bondhugula	fff1efbaf5	Updates to transformation/analysis passes/utilities. Update DMA generation pass and getMemRefRegion() to work with specified loop depths; add support for outgoing DMAs, store op's. - add support for getMemRefRegion symbolic in outer loops - hence support for DMAs symbolic in outer surrounding loops. - add DMA generation support for outgoing DMAs (store op's to lower memory space); extend getMemoryRegion to store op's. -memref-bound-check now works with store op's as well. - fix dma-generate (references to the old memref in the dma_start op were also being replaced with the new buffer); we need replace all memref uses to work only on a subset of the uses - add a new optional argument for replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional 'operation' argument to serve as a filter - if provided, only those uses that are dominated by the filter are replaced. - Add missing print for attributes for dma_start, dma_wait op's. - update the FlatAffineConstraints API PiperOrigin-RevId: 221889223	2019-03-29 14:00:51 -07:00
River Riddle	d34fcce2a7	[MLIR] Rename OperationInst to Instruction. PiperOrigin-RevId: 221795407	2019-03-29 14:00:09 -07:00
River Riddle	503caf0722	Replace TerminatorInst with builtin terminator operations. Note: Terminators will be merged into the operations list in a follow up patch. PiperOrigin-RevId: 221670037	2019-03-29 13:58:55 -07:00
Alex Zinenko	d030433443	ConvertToCFG: properly remap nested function attributes. Array attributes can nested and function attributes can appear anywhere at that level. They should be remapped to point to the generated CFGFunction after ML-to-CFG conversion, similarly to plain function attributes. Extract the nested attribute remapping functionality from the Parser to Utils. Extract out the remapping function for individual Functions from the module remapping function. Use these new functions in the ML-to-CFG conversion pass and in the parser. PiperOrigin-RevId: 221510997	2019-03-29 13:57:58 -07:00
Alex Zinenko	cb40633969	Move definitions of lopoUnroll* functions to LoopUtils.cpp. These functions are declared in Transforms/LoopUtils.h (included to the Transforms/Utils library) but were defined in the loop unrolling pass in Transforms/LoopUnroll.cpp. As a result, targets depending only on TransformUtils library but not on Transforms could get link errors. Move the definitions to Transforms/Utils/LoopUtils.cpp where they should actually live. This does not modify any code. PiperOrigin-RevId: 221508882	2019-03-29 13:57:44 -07:00
Nicolas Vasilache	fefbf91314	[MLIR] Support for vectorizing operations. This CL adds support for and a vectorization test to perform scalar 2-D addf. The support extension notably comprises: 1. extend vectorizable test to exclude vector_transfer operations and expose them to LoopAnalysis where they are needed. This is a temporary solution a concrete MLIR Op exists; 2. add some more functional sugar mapKeys, apply and ScopeGuard (which became relevant again); 3. fix improper shifting during coarsening; 4. rename unaligned load/store to vector_transfer_read/write and simplify the design removing the unnecessary AllocOp that were introduced prematurely: vector_transfer_read currently has the form: (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32> vector_transfer_write currently has the form: (vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> () 5. adds vectorizeOperations which traverses the operations in a ForStmt and rewrites them to their vector form; 6. add support for vector splat from a constant. The relevant tests are also updated. PiperOrigin-RevId: 221421426	2019-03-29 13:56:47 -07:00
Alex Zinenko	5a0d3d0204	Basic conversion of MLFunctions to CFGFunctions. Implement a pass converting a subset of MLFunctions to CFGFunctions. Currently supports arbitrarily complex imperfect loop nests with statically constant (i.e., not affine map) bounds filled with operations. Does NOT support branches and non-constant loop bounds. Conversion is performed per-function and the function names are preserved to avoid breaking any external references to the current module. In-memory IR is updated to point to the right functions in direct calls and constant loads. This behavior is tested via a really hidden flag that enables function renaming. Inside each function, the control flow conversion is based on single-entry single-exit regions, i.e. subgraphs of the CFG that have exactly one incoming and exactly one outgoing edge. Since an MLFunction must have a single "return" statement as per MLIR spec, it constitutes an SESE region. Individual operations are appended to this region. Control flow statements are recursively converted into such regions that are concatenated with the current region. Bodies of the compound statement also form SESE regions, which allows to nest control flow statements easily. Note that SESE regions are not materialized in the code. It is sufficent to keep track of the end of the region as the current instruction insertion point as long as all recursive calls update the insertion point in the end. The converter maintains a mapping between SSA values in ML functions and their CFG counterparts. The mapping is used to find the operands for each operation and is updated to contain the results of each operation as the conversion continues. PiperOrigin-RevId: 221162602	2019-03-29 13:55:22 -07:00
Jacques Pienaar	25e6b541cd	Switch IntegerAttr to use APInt. Change the storage type to APInt from int64_t for IntegerAttr (following the change to APFloat storage in FloatAttr). Effectively a direct change from int64_t to 64-bit APInt throughout (the bitwidth hardcoded). This change also adds a getInt convenience method to IntegerAttr and replaces previous getValue calls with getInt calls. While this changes updates the storage type, it does not update all constant folding calls. PiperOrigin-RevId: 221082788	2019-03-29 13:55:08 -07:00
MLIR Team	b5424dd0cb	Adds support for returning the direction of the dependence between memref accesses (distance/direction vectors). Updates MemRefDependenceCheck to check and report on all memref access pairs at all loop nest depths. Updates old and adds new memref dependence check tests. Resolves multiple TODOs. PiperOrigin-RevId: 220816515	2019-03-29 13:53:28 -07:00
Uday Bondhugula	e0623d4b86	Automatic DMA generation for simple cases. - constant bounded memory regions, static shapes, no handling of overlapping/duplicate regions (through union) for now; also only, load memory op's. - add build methods for DmaStartOp, DmaWaitOp. - move getMemoryRegion() into Analysis/Utils and expose it. - fix addIndexSet, getMemoryRegion() post switch to exclusive upper bounds; update test cases for memref-bound-check and memref-dependence-check for exclusive bounds (missed in a previous CL) PiperOrigin-RevId: 220729810	2019-03-29 13:53:14 -07:00
River Riddle	2fa4bc9fc8	Implement value type abstraction for locations. Value type abstraction for locations differ from others in that a Location can NOT be null. NOTE: dyn_cast returns an Optional<T>. PiperOrigin-RevId: 220682078	2019-03-29 13:52:31 -07:00
Uday Bondhugula	23ddd577ef	Complete migration to exclusive upper bound cl/220448963 had missed a part of the updates. - while on this, clean up some of the test cases to use ops' custom forms. PiperOrigin-RevId: 220675303	2019-03-29 13:52:17 -07:00
Jacques Pienaar	cc9a6ed09d	Initialize Pass with PassID. The passID is not currently stored in Pass but this avoids the unused variable warning. The passID is used to uniquely identify passes, currently this is only stored/used in PassInfo. PiperOrigin-RevId: 220485662	2019-03-29 13:50:34 -07:00
Nicolas Vasilache	cde8248753	[MLIR] Make upper bound implementation exclusive This CL implement exclusive upper bound behavior as per b/116854378. A followup CL will update the semantics of the for loop. PiperOrigin-RevId: 220448963	2019-03-29 13:49:49 -07:00
Jacques Pienaar	6f0fb22723	Add static pass registration Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage. Change build targets to alwayslink to enforce registration. PiperOrigin-RevId: 220390178	2019-03-29 13:49:34 -07:00
Uday Bondhugula	6cd5d5c544	Introduce loop tiling code generation (hyper-rectangular case) - simple perfectly nested band tiling with fixed tile sizes. - only the hyper-rectangular case is handled, with other limitations of getIndexSet applying (constant loop bounds, etc.); once the latter utility is extended, tiled code generation should become more general. - Add FlatAffineConstraints::isHyperRectangular() PiperOrigin-RevId: 220324933	2019-03-29 13:49:05 -07:00
MLIR Team	f28e4df666	Adds a dependence check to test whether two accesses to the same memref access the same element. - Builds access functions and iterations domains for each access. - Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds. - Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses. - Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops. - The LoopFusion pass will be extended in subsequent CLs. PiperOrigin-RevId: 219630898	2019-03-29 13:47:13 -07:00
Nicolas Vasilache	21638dcda9	[MLIR] Extend vectorization to 2+-D patterns This CL adds support for vectorization using more interesting 2-D and 3-D patterns. Note in particular the fact that we match some pretty complex imperfectly nested 2-D patterns with a quite minimal change to the implementation: we just add a bit of recursion to traverse the matched patterns and actually vectorize the loops. For instance, vectorizing the following loop by 128: ``` for %i3 = 0 to %0 { %7 = affine_apply (d0) -> (d0)(%i3) %8 = load %arg0[%c0_0, %7] : memref<?x?xf32> } ``` Currently generates: ``` #map0 = ()[s0] -> (s0 + 127) #map1 = (d0) -> (d0) for %i3 = 0 to #map0()[%0] step 128 { %9 = affine_apply #map1(%i3) %10 = alloc() : memref<1xvector<128xf32>> %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) : (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) -> (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) %12 = load %10[%c0] : memref<1xvector<128xf32>> } ``` The above is subject to evolution. PiperOrigin-RevId: 219629745	2019-03-29 13:46:58 -07:00
Jacques Pienaar	e1f9e65b9a	Enable constructing a FuncBuilder using a Operation*. FuncBuilder is useful to build a operation to replace an existing operation, so change the constructor to allow constructing it with an existing operation. Change FuncBuilder to contain (effectively) a tagged union of CFGFuncBuilder and MLFuncBuilder (as these should be cheap to copy and avoid allocating/deletion when created via a operation). PiperOrigin-RevId: 219532952	2019-03-29 13:46:22 -07:00
Uday Bondhugula	8201e19e3d	Introduce memref bound checking. Introduce analysis to check memref accesses (in MLFunctions) for out of bound ones. It works as follows: $ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1 %y = load %B[%idy] : memref<128 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1 %y = load %B[%idy] : memref<128 x i32> ^ #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0 * 128 - d1) mlfunc @test() { %0 = alloc() : memref<9x9xi32> %1 = alloc() : memref<128xi32> for %i0 = -1 to 9 { for %i1 = -1 to 9 { %2 = affine_apply #map0(%i0, %i1) %3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32> %4 = affine_apply #map1(%i0, %i1) %5 = load %1[%4] : memref<128xi32> } } return } - Improves productivity while manually / semi-automatically developing MLIR for testing / prototyping; also provides an indirect way to catch errors in transformations. - This pass is an easy way to test the underlying affine analysis machinery including low level routines. Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256. While on this: - create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/ - fix a bug in AffineAnalysis.cpp::toAffineExpr TODO: extend to non-constant loop bounds (straightforward). Will transparently work for all accesses once floordiv, mod, ceildiv are supported in the AffineMap -> FlatAffineConstraints conversion. PiperOrigin-RevId: 219397961	2019-03-29 13:46:08 -07:00
River Riddle	4c465a181d	Implement value type abstraction for types. This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast. PiperOrigin-RevId: 219372163	2019-03-29 13:45:54 -07:00
Nicolas Vasilache	af7f56fdf8	[MLIR] Implement 1-D vectorization for fastest varying load/stores This CL is a first in a series that implements early vectorization of increasingly complex patterns. In particular, early vectorization will support arbitrary loop nesting patterns (both perfectly and imperfectly nested), at arbitrary depths in the loop tree. This first CL builds the minimal support for applying 1-D patterns. It relies on an unaligned load/store op abstraction that can be inplemented differently on different HW. Future CLs will support higher dimensional patterns, but 1-D patterns already exhibit interesting properties. In particular, we want to separate pattern matching (i.e. legality both structural and dependency analysis based), from profitability analysis, from application of the transformation. As a consequence patterns may intersect and we need to verify that a pattern can still apply by the time we get to applying it. A non-greedy analysis on profitability that takes into account pattern intersection is left for future work. Additionally the CL makes the following cleanups: 1. the matches method now returns a value, not a reference; 2. added comments about the MLFunctionMatcher and MLFunctionMatches usage by value; 3. added size and empty methods to matches; 4. added a negative vectorization test with a conditional, this exhibited a but in the iterators. Iterators now return nullptr if the underlying storage is nullpt. PiperOrigin-RevId: 219299489	2019-03-29 13:44:26 -07:00
Chris Lattner	085b687fbd	Add support for walking the use list of an SSAValue and converting owners to Operation*'s, simplifying some code in GreedyPatternRewriteDriver.cpp. Also add print/dump methods on Operation. PiperOrigin-RevId: 219045764	2019-03-29 13:43:01 -07:00
Chris Lattner	967d934180	Fix two issues: 1) We incorrectly reassociated non-reassociative operations like subi, causing miscompilations. 2) When constant folding, we didn't add users of the new constant back to the worklist for reprocessing, causing us to miss some cases (pointed out by Uday). The code for tensorflow/mlir#2 is gross, but I'll add the new APIs in a followup patch. PiperOrigin-RevId: 218803984	2019-03-29 13:40:35 -07:00
Chris Lattner	adbba70d82	Simplify FunctionPass to eliminate the CFGFunctionPass/MLFunctionPass distinction. FunctionPasses can now choose to get called on all functions, or have the driver split CFG/ML Functions up for them. NFC. PiperOrigin-RevId: 218775885	2019-03-29 13:40:05 -07:00
Chris Lattner	7de0da9594	Refactor all of the canonicalization patterns out of the Canonicalize pass, and make operations provide a list of canonicalizations that can be applied to them. This allows canonicalization to be general to any IR definition. As part of this, sink PatternMatch.h/cpp down to the IR library to fix a layering problem. PiperOrigin-RevId: 218773981	2019-03-29 13:39:49 -07:00
River Riddle	792d1c25e4	Implement value type abstraction for attributes. This is done by changing Attribute to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast. PiperOrigin-RevId: 218764173	2019-03-29 13:39:19 -07:00
Chris Lattner	64d52014bd	Move transform utilities out to their own TransformUtils library, instead of just having the pattern matcher in its own library. At this point, lib/Transforms/*.cpp are all actually passes themselves (and will probably eventually be themselves move to a new subdirectory as we accrete more). PiperOrigin-RevId: 218745193	2019-03-29 13:39:06 -07:00
Chris Lattner	92285814e2	Refactor the bulk of the worklist driver out of the canonicalizer into its own helper function, in preparation for it being used by other passes. There is still a lot of room for improvement in its design, this patch is intended as an NFC refactoring, and the improvements will continue after this lands. PiperOrigin-RevId: 218737116	2019-03-29 13:38:52 -07:00
Uday Bondhugula	80610c2f49	Introduce Fourier-Motzkin variable elimination + other cleanup/support - Introduce Fourier-Motzkin variable elimination to eliminate a dimension from a system of linear equalities/inequalities. Update isEmpty to use this. Since FM is only exact on rational/real spaces, an emptiness check based on this is guaranteed to be exact whenever it says the underlying set is empty; if it says, it's not empty, there may still be no integer points in it. Also, supports a version that computes "dark shadows". - Test this by checking for "always false" conditionals in if statements. - Unique IntegerSet's that are small (few constraints, few variables). This basically means the canonical empty set and other small sets that are likely commonly used get uniqued; allows checking for the canonical empty set by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint size for uniqui'ing. - rename simplify-affine-expr -> simplify-affine-structures Other cleanup - IntegerSet::numConstraints, AffineMap::numResults are no longer needed; remove them. - add copy assignment operators for AffineMap, IntegerSet. - rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet - Misc cleanup for FlatAffineConstraints API PiperOrigin-RevId: 218690456	2019-03-29 13:38:24 -07:00
MLIR Team	5413239350	Adds Gaussian Elimination to FlatAffineConstraints. - Adds FlatAffineConstraints::isEmpty method to test if there are no solutions to the system. - Adds GCD test check if equality constraints have no solution. - Adds unit test cases. PiperOrigin-RevId: 218546319	2019-03-29 13:38:10 -07:00
Lei Zhang	52a0e58bdb	Change typedef to using to be consistent across the codebase Google C++ style guide also prefers using to typedef. PiperOrigin-RevId: 218541849	2019-03-29 13:37:55 -07:00
Chris Lattner	bd01f9541f	Teach canonicalize pass to unique and hoist constants to the entry block. This is a straight-forward change, but required adding missing moveBefore() methods on operations (requiring moving some traits around to make C++ happy). This also fixes a constness issue with the getBlock/getFunction() methods on Instruction, and adds a missing getFunction() method on MLFuncBuilder. PiperOrigin-RevId: 218523905	2019-03-29 13:36:59 -07:00
Chris Lattner	301f83f906	Implement shape folding in the canonicalization pass: - Add a few canonicalization patterns to fold memref_cast into load/store/dealloc. - Canonicalize alloc(constant) into an alloc with a constant shape followed by a cast. - Add a new PatternRewriter::updatedRootInPlace API to make this more convenient. SimplifyAllocConst and the testcase is heavily based on Uday's implementation work, just in a different framework. PiperOrigin-RevId: 218361237	2019-03-29 13:36:31 -07:00
Uday Bondhugula	ccfe593715	PassResult return cleanup. - return success as long as IR is in a valid state. PiperOrigin-RevId: 218225317	2019-03-29 13:35:47 -07:00
Chris Lattner	a03051b9c4	Add a pattern (x+0) -> x, generalize Canonicalize to CFGFunc's, address a few TODOs, and add some casting support to Operation. PiperOrigin-RevId: 218219340	2019-03-29 13:35:33 -07:00

... 3 4 5 6 7 ...

519 Commits