llvm-project

Author	SHA1	Message	Date
River Riddle	66ed7d6d83	Update the OperationFolder to find a valid insertion point when materializing constants. The OperationFolder currently just inserts into the entry block of a Function, but regions may be isolated above, i.e. explicit capture only, and blindly inserting constants may break the invariants of these regions. PiperOrigin-RevId: 254987796	2019-06-25 09:43:21 -07:00
Nicolas Vasilache	dac75ae5ff	Split test-specific passes out of mlir-opt Instead put their impl in test/lib and link them into mlir-test-opt PiperOrigin-RevId: 254837439	2019-06-24 17:47:12 -07:00
River Riddle	b67cab4c44	Update CSE to respect nested regions that are isolated from above. This cl also removes the unused 'NthRegionIsIsolatedFromAbove' trait as it was replaced with a more general 'IsIsolatedFromAbove'. PiperOrigin-RevId: 254709704	2019-06-24 13:44:53 -07:00
River Riddle	704a7fb13e	Add support for 1->N type mappings in the dialect conversion infrastructure. To support these mappings a hook must be overridden on the type converter: 'materializeConversion', which generates a cast operation from the new types to the old type. This operation is automatically erased if all uses are removed, otherwise it remains in the IR for the user to handle. PiperOrigin-RevId: 254411383	2019-06-22 09:16:06 -07:00
River Riddle	9764ae3f24	Refactor the TypeConverter to support more robust type conversions: * Support for 1->0 type mappings, i.e. when the argument is being removed. * Reordering types when converting a type signature. * Adding new inputs when converting a type signature. This cl also lays down the initial foundation for supporting 1->N type mappings, but full support will come in a followup. Moving forward, function signature changes will be driven by populating a SignatureConversion instance. This class contains all of the necessary information for adding/removing/remapping function signatures; e.g. addInputs, addResults, remapInputs, etc. PiperOrigin-RevId: 254064665	2019-06-19 23:08:33 -07:00
Geoffrey Martin-Noble	fd99b6ce97	Remove unnecessary -verify-diagnostics These were likely added in error because of confusion about the flag when it was just called "-verify". The extra flag doesn't cause much harm, but it does make mlir-opt do more work and clutter the RUN line PiperOrigin-RevId: 254037016	2019-06-19 23:08:13 -07:00
Geoffrey Martin-Noble	d7d69569e7	Rename -verify mlir-opt flag to -verify-expected-diagnostics This name has caused some confusion because it suggests that it's running op verification (and that this verification isn't getting run by default). PiperOrigin-RevId: 254035268	2019-06-19 23:08:03 -07:00
Andy Davis	898cf0e968	LoopFusion: adds support for computing forward computation slices, which will enable fusion of consumer loop nests into their producers in subsequent CLs. PiperOrigin-RevId: 253601994	2019-06-19 23:03:42 -07:00
River Riddle	6a0555a875	Refactor SplatElementsAttr to inherit from DenseElementsAttr as opposed to being a separate Attribute type. DenseElementsAttr provides a better internal representation for splat values as well as better API for accessing elements. PiperOrigin-RevId: 253138287	2019-06-19 23:01:52 -07:00
River Riddle	5da741f671	Add basic cost modeling to the dialect conversion infrastructure. This initial cost model favors specific patterns based upon two criteria: 1) Lowest minimum pattern stack depth when legalizing. - This leads the system to favor patterns that have lower legalization stacks, i.e. represent a more direct mapping to the target. 2) Pattern benefit. - When considering multiple patterns with the same legalization depth, this favors patterns with a larger specified benefit. PiperOrigin-RevId: 252713470	2019-06-19 22:59:06 -07:00
Amit Sabne	7a43da6060	Loop invariant code motion - remove reliance on getForwardSlice. Add more tests. -- PiperOrigin-RevId: 250950703	2019-06-01 20:13:30 -07:00
Rasmus Munk Larsen	861c55e150	Add a rank op to MLIR. Example: %1 = rank %0 : index -- PiperOrigin-RevId: 250505411	2019-06-01 20:06:51 -07:00
Andy Davis	a560f2c646	Affine Loop Fusion Utility Module (1/n). *) Adds LoopFusionUtils which will expose a set of loop fusion utilities (e.g. dependence checks, fusion cost/storage reduction, loop fusion transformation) for use by loop fusion algorithms. Support for checking block-level fusion-preventing dependences is added in this CL (additional loop fusion utilities will be added in subsequent CLs). *) Adds TestLoopFusion test pass for testing LoopFusionUtils at a fine granularity. *) Adds unit test for testing dependence check for block-level fusion-preventing dependences. -- PiperOrigin-RevId: 249861071	2019-06-01 20:00:23 -07:00
River Riddle	1a100849c4	Add support for saving and restoring the insertion point of a FuncBuilder. This also updates the edsc::ScopedContext to use a single builder that saves/restores insertion points. This is necessary for using edscs within RewritePatterns. -- PiperOrigin-RevId: 248812645	2019-05-20 13:46:35 -07:00
Andy Davis	12e31761ce	Fixes a small bug in computing dependence direction vectors, where equality constraint can be created on the wrong loop IVs when source/sink of the dependence are at different loop depths. Adds unit tests for cases where source/sink of the dependence are at varying loop depths. -- PiperOrigin-RevId: 248627490	2019-05-20 13:45:00 -07:00
Tamas Berghammer	9cc5747a7b	Add test for affine-loop-tile pass with a loop of trip count 1 -- PiperOrigin-RevId: 247950156	2019-05-20 13:38:52 -07:00
Chris Lattner	81e478adca	rename -memref-dependence-check to -test-memref-dependence-check since it generates remarks for testing, it isn't itself a transformation. While there, upgrade its diagnostic emission to use the streaming interface. Prune some unnecessary #includes. -- PiperOrigin-RevId: 247768062	2019-05-20 13:36:38 -07:00
Andy Davis	0412bf6f09	Add memref dimension bounds as upper/lower bounds on MemRefRegion constraints, to guard against potential over-approximation from projection. -- PiperOrigin-RevId: 247431201	2019-05-10 19:25:53 -07:00
Andy Davis	6254a42d58	Fix bug in DmaGenerate pass where MemRefRegion union was not propagated to read region. Also cleaned up dma-generate.mlir a bit. -- PiperOrigin-RevId: 247417358	2019-05-10 19:25:44 -07:00
River Riddle	e088f93f0d	Simplify the parser/printer of ConstantOp now that all attributes have types. This has the added benefit of removing type redundancy from the pretty form. As a consequence, IntegerAttr/FloatAttr will now always print the type even if it is i64/f64. -- PiperOrigin-RevId: 247295828	2019-05-10 19:24:30 -07:00
Jacques Pienaar	a1b24a0e08	Verify that attribute type and constant op return type matches. -- PiperOrigin-RevId: 247263129	2019-05-10 19:24:14 -07:00
Geoffrey Martin-Noble	c34386e3e5	CmpFOp. Add float comparison op This closely mirrors the llvm fcmp instruction, defining 16 different predicates Constant folding is unsupported for NaN and Inf because there's no way to represent those as constants at the moment -- PiperOrigin-RevId: 246932358	2019-05-10 19:22:58 -07:00
Geoffrey Martin-Noble	c4891378e2	Add split-input-file to constant fold test Better to keep tests as separate as possible -- PiperOrigin-RevId: 246900564	2019-05-10 19:22:50 -07:00
Alex Zinenko	d3380a504f	Change syntax of regions in the generic form of operations The generic form of operations currently supports optional regions to be located after the operation type. As we are going to add a type to each region in a leading position in the region syntax, similarly to functions, it becomes ambiguous to have regions immediately after the operation type. Put regions between operands and the optional list of successors in the generic operation syntax and wrap them in parentheses. The effect on the existing IR syntax is minimal since only three operations (`affine.for`, `affine.if` and `gpu.kernel`) currently use regions. -- PiperOrigin-RevId: 246787087	2019-05-06 08:29:48 -07:00
Nicolas Vasilache	258e8d9ce2	Prepend an "affine-" prefix to Affine pass option names - NFC Trying to activate both LLVM and MLIR passes in mlir-cpu-runner showed name collisions when registering pass names. One possible way of disambiguating that should also work across dialects is to prepend the dialect name to the passes that specifically operate on that dialect. With this CL, mlir-cpu-runner tests still run when both LLVM and MLIR passes are registered -- PiperOrigin-RevId: 246539917	2019-05-06 08:26:44 -07:00
River Riddle	b14c4b4ca8	Add support for basic remark diagnostics. This is the minimal functionality needed to separate notes from remarks. It also provides a starting point to start building out better remark infrastructure. -- PiperOrigin-RevId: 246175216	2019-05-06 08:24:02 -07:00
Smit Hinsu	c9b0540b9c	Make identity cast operations with the same operand and result types legal Instead, fold such operations. This way callers don't need to conditionally create cast operations depending on if a value already has the target type. Also, introduce areCastCompatible to allow cast users to verify that the generated op will be valid before creating the operation. TESTED with unit tests -- PiperOrigin-RevId: 245606133	2019-05-06 08:19:37 -07:00
Feng Liu	5c757087c7	Apply patterns repeatedly if the function is modified During the pattern rewrite, if the function is changed, i.e. ops created, deleted or swapped, the pattern rewriter needs to re-scan the function entirely and apply the patterns again, so that patterns whose root ops have been popped from the working list and are not immediate users of the changed ops can be reconsidered. A command line flag is added to set the max number of iterations rescanning the function for pattern match. If the rewrite doesn't converge after this number of iterations, compilation will continue and the result can be sub-optimal. One unit test is updated because this change fixed the missing optimization opportunities. -- PiperOrigin-RevId: 244754190	2019-04-23 22:02:16 -07:00
Amit Sabne	7905da656e	Loop invariant code motion. -- PiperOrigin-RevId: 244043679	2019-04-18 11:49:31 -07:00
Smit Hinsu	074cb4292f	Fix CHECK-EMPTY directives without trailing colon There are no empty lines in output for three of these directives so removed them and replaced the remaining one with 'CHECK-NOT:' as otherwise it is failing with the following error. error: found 'CHECK-EMPTY' without previous 'CHECK: line TESTED = n/a PiperOrigin-RevId: 243288605	2019-04-18 11:48:01 -07:00
Stephan Herhut	af016ba7a4	Add xor bitwise operation to StandardOps. This adds parsing, printing and some folding/canonicalization. Also extends rewriting of subi %0, %0 to handle vectors and tensors. -- PiperOrigin-RevId: 242448164	2019-04-08 19:17:56 -07:00
Stephan Herhut	a8a5c06961	Add and and or bitwise operations to StandardOps. This adds parsing, printing and some folding/canonicalization. -- PiperOrigin-RevId: 242409840	2019-04-08 19:17:50 -07:00
River Riddle	a8f4b9eeeb	Iterate on the operations to fold in TestConstantFold in reverse to remove the need for ConstantFoldHelper to have a flag for insertion at the head of the entry block. This also fixes an asan bug in TestConstantFold due to the iteration order of operations and ConstantFoldHelper's constant insertion placement. Note: This now means that we cannot fold chains of operations, i.e. where constant foldable operations feed into each other. Given that this is a testing pass solely for constant folding, this isn't really something that we want anyways. Constant fold tests should be simple and direct, with more advanced folding/feeding being tested with the canonicalizer. -- PiperOrigin-RevId: 242011744	2019-04-05 07:41:52 -07:00
Lei Zhang	4e40c83291	Deduplicate constant folding logic in ConstantFold and GreedyPatternRewriteDriver There are two places containing constant folding logic right now: the ConstantFold pass and the GreedyPatternRewriteDriver. The logic was not shared and started to drift apart. We were testing constant folding logic using the ConstantFold pass, but lagged behind the GreedyPatternRewriteDriver, where we really want the constant folding to happen. This CL pulled the logic into utility functions and classes for sharing between these two places. A new ConstantFoldHelper class is created to help constant fold and de-duplication. Also, renamed the ConstantFold pass to TestConstantFold to make it clear that it is intended for testing purpose. -- PiperOrigin-RevId: 241971681	2019-04-05 07:41:32 -07:00
River Riddle	6fa3181329	Remove the non-postorder walk functions from Function/Block/Instruction and rename walkPostOrder to walk. -- PiperOrigin-RevId: 241965239	2019-04-05 07:41:23 -07:00
Andy Davis	d0d1b2a30d	Fix bug in LoopTiling where creation of tile-space loop upper bound did not handle symbol operands correctly. -- PiperOrigin-RevId: 241958502	2019-04-05 07:41:12 -07:00
Andy Davis	55014813e3	Adds dependence analysis support for iteration domains with local variables (enables dependence analysis of loops with non-unit step). -- PiperOrigin-RevId: 241957023	2019-04-05 07:41:02 -07:00
Nicolas Vasilache	f1b12f5a64	Fix test that fails on non-determinism in LowerVectorTransfers This CL fixes the non-determinism across compilers in an edsc::select expression used in LowerVectorTransfers. This is achieved by factoring the expression out of the function call to ensure a deterministic order of evaluation. Since the expression is now factored out, less IR is generated and the test is updated accordingly. -- PiperOrigin-RevId: 241679962	2019-04-03 01:09:13 -07:00
Andy Davis	7c1fc9e795	Enable producer-consumer fusion for liveout memrefs if consumer read region matches producer write region. -- PiperOrigin-RevId: 241517207	2019-04-02 13:39:50 -07:00
River Riddle	084669e005	Remove MLPatternLoweringPass and rewrite LowerVectorTransfers to use RewritePattern instead. -- PiperOrigin-RevId: 241455472	2019-04-02 13:39:17 -07:00
Nicolas Vasilache	c9d5f3418a	Cleanup SuperVectorization dialect printing and parsing. On the read side, ``` %3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: (d0, d1, d2)->(d2, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32> ``` becomes: ``` %3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: (d0, d1, d2)->(d2, d0)} : memref<?x?x?xf32>, vector<32x256xf32> ``` On the write side, ``` vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index ``` becomes ``` vector_transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32> ``` Documentation will be cleaned up in a followup commit that also extracts a proper .md from the top of the file comments. PiperOrigin-RevId: 241021879	2019-03-29 17:56:42 -07:00
Nicolas Vasilache	094ca64ab0	Refactor vectorization patterns This CL removes the reliance of the vectorize pass on the specification of a `fastestVaryingDim` parameter. This parameter is a restriction meant to more easily target a particular loop/memref combination for vectorization and is mainly used for testing. This also had the side-effect of restricting vectorization patterns to only the ones in which all memrefs were contiguous along the same loop dimension. This simple restriction prevented matmul to vectorize in 2-D. this CL removes the restriction and adds the matmul test which vectorizes in 2-D along the parallel loops. Support for reduction loops is left for future work. PiperOrigin-RevId: 240993827	2019-03-29 17:55:36 -07:00
MLIR Team	9d30b36aaf	Enable input-reuse fusion to search function arguments for fusion candidates (takes care of a TODO, enables another tutorial test case). PiperOrigin-RevId: 240979894	2019-03-29 17:54:36 -07:00
River Riddle	106dd08e99	Change the vectorizer test pass to output via diagnostics instead of llvm::outs. This allows for the output to be deterministic when multi-threading is enabled. PiperOrigin-RevId: 240905858	2019-03-29 17:54:21 -07:00
River Riddle	01140bd137	Change the multi-return syntax for operations. The name of the operation result now contains the number of results that it refers to if the number of results is greater than 1. Example: %call:2 = call @multi_return() : () -> (f32, i32) use(%call#0, %call#1) This cl also adds parser support for uniquely named result values. This means that a test writer can now write something like: %foo, %bar = call @multi_return() : () -> (f32, i32) use(%foo, %bar) Note: The printer will still print the collapsed form. PiperOrigin-RevId: 240860058	2019-03-29 17:51:32 -07:00
MLIR Team	9d9675fc8f	Remove overly conservative check in LoopFusion pass (enables fusion in tutorial example). PiperOrigin-RevId: 240859227	2019-03-29 17:51:16 -07:00
River Riddle	af9760fe18	Replace remaining usages of the Instruction class with Operation. PiperOrigin-RevId: 240777521	2019-03-29 17:50:04 -07:00
Nicolas Vasilache	31442a66ef	Cleanup vectorize_1d.mlir test - NFC This CL splits a large monolithic test function into smaller ones that are each CHECK-LABEL'd PiperOrigin-RevId: 240684979	2019-03-29 17:49:45 -07:00
Nicolas Vasilache	4dc7af9da8	Make vectorization aware of loop semantics Now that we have a dependence analysis, we can check that loops are indeed parallel and make vectorization correct. PiperOrigin-RevId: 240682727	2019-03-29 17:49:30 -07:00
River Riddle	832567b379	NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for' and set the namespace of the AffineOps dialect to 'affine'. PiperOrigin-RevId: 240165792	2019-03-29 17:39:03 -07:00
River Riddle	9c6e92360c	NFC: Rename the 'if' operation in the AffineOps dialect to 'affine.if'. PiperOrigin-RevId: 240071154	2019-03-29 17:36:53 -07:00
Nicolas Vasilache	071ca8da91	Support composition of symbols in AffineApplyOp This CL revisits the composition of AffineApplyOp for the special case where a symbol itself comes from an AffineApplyOp. This is achieved by rewriting such symbols into dims to allow composition to occur mathematically. The implementation is also refactored to improve readability. Rationale for locally rewriting symbols as dims: ================================================ The mathematical composition of AffineMap must always concatenate symbols because it does not have enough information to do otherwise. For example, composing `(d0)[s0] -> (d0 + s0)` with itself must produce `(d0)[s0, s1] -> (d0 + s0 + s1)`. The result is only equivalent to `(d0)[s0] -> (d0 + 2 * s0)` when applied to the same mlir::Value* for both s0 and s1. As a consequence mathematical composition of AffineMap always concatenates symbols. When AffineMaps are used in AffineApplyOp however, they may specify composition via symbols, which is ambiguous mathematically. This corner case is handled by locally rewriting such symbols that come from AffineApplyOp into dims and composing through dims. PiperOrigin-RevId: 239791597	2019-03-29 17:30:59 -07:00
Nicolas Vasilache	f43388e4ce	Port LowerVectorTransfers from EDSC + AST to declarative builders This CL removes the dependency of LowerVectorTransfers on the AST version of EDSCs which will be retired. This exhibited a pretty fundamental staging difference in AST-based vs declarative based emission. Since the delayed creation with an AST was staged, the loop order came into existence after the clipping expressions were computed. This now changes as the loops first need to be created declaratively in fixed order and then the clipping expressions are created. Also, due to lack of staging, coalescing cannot be done on the fly anymore and needs to be done either as a pre-pass (current implementation) or as a local transformation on the generated IR (future work). Tests are updated accordingly. PiperOrigin-RevId: 238971631	2019-03-29 17:22:06 -07:00
Uday Bondhugula	e1e455f7dd	Change parallelism detection test pass to emit a note - emit a note on the loop being parallel instead of setting a loop attribute - rename the pass -test-detect-parallel (from -detect-parallel) PiperOrigin-RevId: 238122847	2019-03-29 17:16:27 -07:00
Uday Bondhugula	9f2781e8dd	Fix misc bugs / TODOs / other improvements to analysis utils - fix for getConstantBoundOnDimSize: floordiv -> ceildiv for extent - make getConstantBoundOnDimSize also return the identifier upper bound - fix unionBoundingBox to correctly use the divisor and upper bound identified by getConstantBoundOnDimSize - deal with loop step correctly in addAffineForOpDomain (covers most cases now) - fully compose bound map / operands and simplify/canonicalize before adding dim/symbol to FlatAffineConstraints; fixes false positives in -memref-bound-check; add test case there - expose mlir::isTopLevelSymbol from AffineOps PiperOrigin-RevId: 238050395	2019-03-29 17:15:27 -07:00
Uday Bondhugula	075090f891	Extend loop unrolling and unroll-jamming to non-matching bound operands and multi-result upper bounds, complete TODOs, fix/improve test cases. - complete TODOs for loop unroll/unroll-and-jam. Something as simple as "for %i = 0 to %N" wasn't being unrolled earlier (unless it had been written as "for %i = ()[s0] -> (0)()[%N] to %N"); addressed now. - update/replace getTripCountExpr with buildTripCountMapAndOperands; makes it more powerful as it composes inputs into it - getCleanupLowerBound and getUnrolledLoopUpperBound actually needed the same code; refactor and remove one. - reorganize test cases, write previous ones better; most of these changes are "label replacements". - fix wrongly labeled test cases in unroll-jam.mlir PiperOrigin-RevId: 238014653	2019-03-29 17:14:12 -07:00
MLIR Team	8d62a6092f	Clean up some stray mlfunc/cfgfunc leftovers. PiperOrigin-RevId: 237936610	2019-03-29 17:13:26 -07:00
Uday Bondhugula	ce7e59536c	Add a basic model to set tile sizes + some cleanup - compute tile sizes based on a simple model that looks at memory footprints (instead of using the hardcoded default value) - adjust tile sizes to make them factors of trip counts based on an option - update loop fusion CL options to allow setting maximal fusion at pass creation - change an emitError to emitWarning (since it's not a hard error unless the client treats it that way, in which case, it can emit one) $ mlir-opt -debug-only=loop-tile -loop-tile test/Transforms/loop-tiling.mlir test/Transforms/loop-tiling.mlir:81:3: note: using tile sizes [4 4 5 ] for %i = 0 to 256 { for %i0 = 0 to 256 step 4 { for %i1 = 0 to 256 step 4 { for %i2 = 0 to 250 step 5 { for %i3 = #map4(%i0) to #map11(%i0) { for %i4 = #map4(%i1) to #map11(%i1) { for %i5 = #map4(%i2) to #map12(%i2) { %0 = load %arg0[%i3, %i5] : memref<8x8xvector<64xf32>> %1 = load %arg1[%i5, %i4] : memref<8x8xvector<64xf32>> %2 = load %arg2[%i3, %i4] : memref<8x8xvector<64xf32>> %3 = mulf %0, %1 : vector<64xf32> %4 = addf %2, %3 : vector<64xf32> store %4, %arg2[%i3, %i4] : memref<8x8xvector<64xf32>> } } } } } } PiperOrigin-RevId: 237461836	2019-03-29 17:06:51 -07:00
MLIR Team	c1ff9e866e	Use FlatAffineConstraints::unionBoundingBox to perform slice bounds union for loop fusion pass (WIP). Adds utility to convert slice bounds to a FlatAffineConstraints representation. Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers. PiperOrigin-RevId: 236973261	2019-03-29 16:59:21 -07:00
Uday Bondhugula	5836fae8a0	DMA generation CL flag update - allow mem capacity to be overridden by command-line flag - change default fast mem space to 2 PiperOrigin-RevId: 236951598	2019-03-29 16:59:05 -07:00
Uday Bondhugula	7e288e7c19	Add missing run command to fusion test cases - follow up to cl/236882988 PiperOrigin-RevId: 236947383	2019-03-29 16:58:50 -07:00
Uday Bondhugula	b34f8d3c83	Fix and improve detectAsMod - fix for the mod detection - simplify/avoid the mod at construction (if the dividend is already known to be less than the divisor), since the information is available at hand there PiperOrigin-RevId: 236882988	2019-03-29 16:57:36 -07:00
Uday Bondhugula	a77734e185	Make sure that fusion test cases don't have out of bounds accesses - fix out of bounds test case - -memref-bound-check on the test/Transforms/loop-fusion.mlir no longer reports any errors, before or after -loop-fusion is run PiperOrigin-RevId: 236757658	2019-03-29 16:56:35 -07:00
MLIR Team	39a1ddeb1c	Adds loop attribute as a temporary work around to prevent slice fusion of loop nests containing instructions with side effects (the proper solution will be to use memref read/write regions in the future). PiperOrigin-RevId: 236733739	2019-03-29 16:56:20 -07:00
Uday Bondhugula	12b9dece8d	Bug fix for getConstantBoundOnDimSize - this was detected when memref-bound-check was run on the output of the loop-fusion pass - the addition (to represent ceildiv as a floordiv) had to be performed only for the constant term of the constraint - update test cases - memref-bound-check no longer returns an error on the output of this test case PiperOrigin-RevId: 236731137	2019-03-29 16:56:06 -07:00
River Riddle	eeeef090ef	Set the namespace of the StandardOps dialect to "std", but add a special case to the parser to allow parsing standard operations without the "std" prefix. This will now allow for the standard dialect to be looked up dynamically by name. PiperOrigin-RevId: 236493865	2019-03-29 16:54:20 -07:00
Uday Bondhugula	8254aabd4a	A simple pass to detect and mark all parallel loops - detect all parallel loops based on dep information and mark them with a "parallel" attribute - add mlir::isLoopParallel(OpPointer<AffineForOp> ...), and refactor an existing method to use that (reuse some code from @andydavis (cl/236007073) for this) - a simple/meaningful way to test memref dep test as well Ex: $ mlir-opt -detect-parallel test/Transforms/parallelism-detection.mlir #map1 = ()[s0] -> (s0) func @foo(%arg0: index) { %0 = alloc() : memref<1024x1024xvector<64xf32>> %1 = alloc() : memref<1024x1024xvector<64xf32>> %2 = alloc() : memref<1024x1024xvector<64xf32>> for %i0 = 0 to %arg0 { for %i1 = 0 to %arg0 { for %i2 = 0 to %arg0 { %3 = load %0[%i0, %i2] : memref<1024x1024xvector<64xf32>> %4 = load %1[%i2, %i1] : memref<1024x1024xvector<64xf32>> %5 = load %2[%i0, %i1] : memref<1024x1024xvector<64xf32>> %6 = mulf %3, %4 : vector<64xf32> %7 = addf %5, %6 : vector<64xf32> store %7, %2[%i0, %i1] : memref<1024x1024xvector<64xf32>> } {parallel: false} } {parallel: true} } {parallel: true} return } PiperOrigin-RevId: 236367368	2019-03-29 16:53:03 -07:00
MLIR Team	d038e34735	Loop fusion for input reuse. *) Breaks fusion pass into multiple sub passes over nodes in data dependence graph: - first pass fuses single-use producers into their unique consumer. - second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges. - third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer). *) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved. *) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation. *) Adds a generic loop utility for finding all sequential loops in a loop nest. *) Adds and updates unit tests. PiperOrigin-RevId: 236350987	2019-03-29 16:52:35 -07:00
Uday Bondhugula	932e4fb29f	Analysis support for floordiv/mod's in loop bounds/ - handle floordiv/mod's in loop bounds for all analysis purposes - allows fusion slicing to be more powerful - add simple test cases based on -memref-bound-check - fusion based test cases in follow up CLs PiperOrigin-RevId: 236328551	2019-03-29 16:52:04 -07:00
Uday Bondhugula	58889884a2	Change some of the debug messages to use emitError / emitWarning / emitNote - NFC PiperOrigin-RevId: 236169676	2019-03-29 16:50:29 -07:00
Uday Bondhugula	a003179367	Detect more trivially redundant constraints better - detect more trivially redundant constraints in FlatAffineConstraints::removeTrivialRedundantConstraints. Redundancy due to constraints that only differ in the constant part (e.g., 32i + 64j - 3 >= 0, 32i + 64j - 8 >= 0) is now detected. The method is still linear-time and does a single scan over the FlatAffineConstraints buffer. This detection is useful and needed to eliminate redundant constraints generated after FM elimination. - update GCDTightenInequalities so that we also normalize by the GCD while at it. This way more constraints will show up as redundant (232i - 203 >= 0 becomes i - 1 >= 0 instead of 232i - 232 >= 0) without having to call normalizeConstraintsByGCD. - In FourierMotzkinEliminate, call GCDTightenInequalities and normalizeConstraintsByGCD before calling removeTrivialRedundantConstraints() - so that more redundant constraints are detected. As a result, redundancy due to constraints like i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >= 0 is now detected (here only i >= 7 is non-redundant). As a result of these, a -memref-bound-check on the added test case runs in 16ms instead of 1.35s (opt build) and no longer returns a conservative result. PiperOrigin-RevId: 235983550	2019-03-29 16:47:59 -07:00
MLIR Team	c2766f3760	Fix bug in memref region computation with slice loop bounds. Adds loop IV values to ComputationSliceState which are used in FlatAffineConstraints::addSliceBounds, to ensure that constraints are only added for loop IV values which are present in the constraint system. PiperOrigin-RevId: 235952912	2019-03-29 16:47:29 -07:00
River Riddle	cdbfd48471	Rewrite the dominance info classes to allow for operating on arbitrary control flow within operation regions. The CSE pass is also updated to properly handle nested dominance. PiperOrigin-RevId: 235742627	2019-03-29 16:43:35 -07:00
Uday Bondhugula	a1dad3a5d9	Extend/improve getSliceBounds() / complete TODO + update unionBoundingBox - compute slices precisely where the destination iteration depends on multiple source iterations (instead of over-approximating to the whole source loop extent) - update unionBoundingBox to deal with input with non-matching symbols - reenable disabled backend test case PiperOrigin-RevId: 234714069	2019-03-29 16:33:11 -07:00
Uday Bondhugula	5021dc4fa0	DMA placement update - hoist loop-invariant DMAs - hoist DMAs past all loops immediately surrounding the region that the latter is invariant on - do this at DMA generation time itself PiperOrigin-RevId: 234628447	2019-03-29 16:32:41 -07:00
Uday Bondhugula	f97c1c5b06	Misc. updates/fixes to analysis utils used for DMA generation; update DMA generation pass to make it drop certain assumptions, complete TODOs. - multiple fixes for getMemoryFootprintBytes - pass loopDepth correctly from getMemoryFootprintBytes() - use union while computing memory footprints - bug fixes for addAffineForOpDomain - take into account loop step - add domains of other loop IVs in turn that might have been used in the bounds - dma-generate: drop assumption of "non-unit stride loops being tile space loops and skipping those and recursing to inner depths"; DMA generation is now purely based on available fast mem capacity and memory footprint's calculated - handle memory region compute failures/bailouts correctly from dma-generate - loop tiling cleanup/NFC - update some debug and error messages to use emitNote/emitError in pipeline-data-transfer pass - NFC PiperOrigin-RevId: 234245969	2019-03-29 16:30:26 -07:00
MLIR Team	58aa383e60	Support fusing producer loop nests which write to a memref which is live out, provided that the write region of the consumer loop nest to the same memref is a super set of the producer's write region. PiperOrigin-RevId: 234240958	2019-03-29 16:30:11 -07:00
MLIR Team	8f5f2c765d	LoopFusion: perform a series of loop interchanges to increase the loop depth at which slices of producer loop nests can be fused into consumer loop nests. *) Adds utility to LoopUtils to perform loop interchange of two AffineForOps. *) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges. *) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential. *) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them. *) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation. *) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest. *) Adds and updates related unit tests. PiperOrigin-RevId: 234158370	2019-03-29 16:29:26 -07:00
MLIR Team	affb2193cc	Update direction vector computation to use FlatAffineConstraints::getLower/UpperBounds. Update FlatAffineConstraints::getLower/UpperBounds to project to the identifier for which bounds are being computed. This change enables computing bounds on an identifier which were previously dependent on the bounds of another identifier. PiperOrigin-RevId: 234017514	2019-03-29 16:28:25 -07:00
Uday Bondhugula	00860662a2	Generate dealloc's for alloc's of pipeline-data-transfer - for the DMA transfers being pipelined through double buffering, generate deallocs for the double buffers being alloc'ed This change is along the lines of cl/233502632. We initially wanted to experiment with scoped allocation - so the deallocation's were usually not necessary; however, they are needed even with scoped allocations in some situations - for eg. when the enclosing loop gets unrolled. The dealloc serves as an end of lifetime marker. PiperOrigin-RevId: 233653463	2019-03-29 16:25:53 -07:00
Uday Bondhugula	8b3f841daf	Generate dealloc's for the alloc's of dma-generate. - for the DMA buffers being allocated (and their tags), generate corresponding deallocs - minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass Code generation for DMA transfers was being done with the initial simplifying assumption that the alloc's would map to scoped allocations, and so no deallocations would be necessary. Drop this assumption to generalize. Note that even with scoped allocations, unrolling loops that have scoped allocations could create a series of allocations and exhaustion of fast memory. Having an end-of-lifetime marker like a dealloc in fact allows creating new scopes if necessary when lowering to a backend and still utilize scoped allocation. DMA buffers created by -dma-generate are guaranteed to have either non-overlapping lifetimes or nested lifetimes. PiperOrigin-RevId: 233502632	2019-03-29 16:24:08 -07:00
Uday Bondhugula	f5eed89df0	Fix + cleanup for getMemRefRegion() - determine symbols for the memref region correctly - this wasn't exposed earlier since we didn't have any test cases where the portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of one IV depending on other IVs within that part) PiperOrigin-RevId: 233493872	2019-03-29 16:23:53 -07:00
Uday Bondhugula	c419accea3	Automated rollback of changelist 232728977. PiperOrigin-RevId: 232944889	2019-03-29 16:21:38 -07:00
River Riddle	13a45c7194	Add verification for AffineApply/AffineFor/AffineIf dimension and symbol operands. This also allows a DimOp to be a valid dimension identifier if its operand is a valid dimension identifier. PiperOrigin-RevId: 232923468	2019-03-29 16:21:08 -07:00
River Riddle	a886625813	Modify the canonicalizations of select and muli to use the fold hook. This also extends the greedy pattern rewrite driver to add the operands of folded operations back to the worklist. PiperOrigin-RevId: 232878959	2019-03-29 16:20:06 -07:00
Uday Bondhugula	4ba8c9147d	Automated rollback of changelist 232717775. PiperOrigin-RevId: 232807986	2019-03-29 16:19:33 -07:00
River Riddle	fd2d7c857b	Rename the 'if' operation in the AffineOps dialect to 'affine.if' and namespace the AffineOps dialect with 'affine'. PiperOrigin-RevId: 232728977	2019-03-29 16:18:59 -07:00
River Riddle	90d10b4e00	NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. This is the second step to adding a namespace to the AffineOps dialect. PiperOrigin-RevId: 232717775	2019-03-29 16:17:59 -07:00
River Riddle	3227dee15d	NFC: Rename affine_apply to affine.apply. This is the first step to adding a namespace to the affine dialect. PiperOrigin-RevId: 232707862	2019-03-29 16:17:29 -07:00
River Riddle	0c65cf283c	Move the AffineFor loop bound folding to a canonicalization pattern on the AffineForOp. PiperOrigin-RevId: 232610715	2019-03-29 16:16:11 -07:00
River Riddle	10237de8eb	Refactor the affine analysis by moving some functionality to IR and some to AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring: * AffineStructures has moved to IR. * simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR. * makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps. * ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted. PiperOrigin-RevId: 232586468	2019-03-29 16:15:41 -07:00
MLIR Team	a78edcda5b	Loop fusion improvements: *) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion. *) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check. PiperOrigin-RevId: 232477853	2019-03-29 16:13:50 -07:00
Uday Bondhugula	b26900dce5	Update dma-generate pass to (1) work on blocks of instructions (instead of just loops), (2) take into account fast memory space capacity and lower 'dmaDepth' to fit, (3) add location information for debug info / errors - change dma-generate pass to work on blocks of instructions (start/end iterators) instead of 'for' loops; complete TODOs - allows DMA generation for straightline blocks of operation instructions interspersed b/w loops - take into account fast memory capacity: check whether memory footprint fits in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA generation is performed until it does fit in the provided memory - add location information to MemRefRegion; any insufficient fast memory capacity errors or debug info w.r.t dma generation shows location information - allow DMA generation pass to be instantiated with a fast memory capacity option (besides command line flag) - change getMemRefRegion to return unique_ptr's - change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst' - other helper methods; add postDomInstFilter option for replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods Eg. output $ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir /tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) { ^ $ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir /tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) { PiperOrigin-RevId: 232297044	2019-03-29 16:09:52 -07:00
River Riddle	5052bd8582	Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body. PiperOrigin-RevId: 232060516	2019-03-29 16:06:49 -07:00
MLIR Team	d7c824451f	LoopFusion: insert the source loop nest slice at a depth in the destination loop nest which preserves dependences (above any loop carried or other dependences). This is accomplished by updating the maximum destination loop depth based on dependence checks between source loop nest loads and stores which access the memref on which the source loop nest has a store op. In addition, prevent fusing in source loop nests which write to memrefs which escape or are live out. PiperOrigin-RevId: 231684492	2019-03-29 16:03:23 -07:00
River Riddle	a642bb1779	Update tests using affine maps to not rely on specific map numbers in the output IR. This is necessary to remove the dependency on ForInst not numbering the AffineMap bounds it has custom formatting for. PiperOrigin-RevId: 231634812	2019-03-29 16:03:08 -07:00
River Riddle	b6928c945c	Standardize the spelling of debug info to "debuginfo" in opt flags. PiperOrigin-RevId: 231610337	2019-03-29 16:02:38 -07:00
River Riddle	994111238b	Fold CallIndirectOp to CallOp when the callee operand is a known constant function. PiperOrigin-RevId: 231511697	2019-03-29 16:01:23 -07:00
MLIR Team	a0f3db4024	Support fusing loop nests which require insertion into a new instruction Block position while preserving dependences, opening up additional fusion opportunities. - Adds SSA Value edges to the data dependence graph used in the loop fusion pass. PiperOrigin-RevId: 231417649	2019-03-29 16:00:04 -07:00
River Riddle	755538328b	Recommit: Define an AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst. PiperOrigin-RevId: 231342063	2019-03-29 15:59:30 -07:00
Nicolas Vasilache	ae772b7965	Automated rollback of changelist 231318632. PiperOrigin-RevId: 231327161	2019-03-29 15:42:38 -07:00
River Riddle	5ecef2b3f6	Define an AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst. PiperOrigin-RevId: 231318632	2019-03-29 15:42:08 -07:00
Uday Bondhugula	fb679fc2b5	Drop unused result from affine map in test case - NFC PiperOrigin-RevId: 231008044	2019-03-29 15:38:53 -07:00
Chris Lattner	607d1c2ca7	More updates of tests to move towards single result affine maps. PiperOrigin-RevId: 230991929	2019-03-29 15:38:38 -07:00
Uday Bondhugula	b4a1443508	Update replaceAllMemRefUsesWith to generate single result affine_apply's for index remapping - generate a sequence of single result affine_apply's for the index remapping (instead of one multi result affine_apply) - update dma-generate and loop-fusion test cases; while on this, change test cases to use single result affine apply ops - some fusion comment fix/cleanup PiperOrigin-RevId: 230985830	2019-03-29 15:38:23 -07:00
Uday Bondhugula	b588d58c5f	Update createAffineComputationSlice to generate single result affine maps - Update createAffineComputationSlice to generate a sequence of single result affine apply ops instead of one multi-result affine apply - update pipeline-data-transfer test case; while on this, also update the test case to use only single result affine maps, and make it more robust to change. PiperOrigin-RevId: 230965478	2019-03-29 15:37:53 -07:00
Uday Bondhugula	f94b15c247	Update dma-generate: update for multiple load/store op's per memref - introduce a way to compute union using symbolic rectangular bounding boxes - handle multiple load/store op's to the same memref by taking a union of the regions - command-line argument to provide capacity of the fast memory space - minor change to replaceAllMemRefUsesWith to not generate affine_apply if the supplied index remap was identity PiperOrigin-RevId: 230848185	2019-03-29 15:35:38 -07:00
Chris Lattner	f60a0ba61c	Incremental progress to move the testsuite towards single-result affine_apply instructions. PiperOrigin-RevId: 230775607	2019-03-29 15:34:53 -07:00
Uday Bondhugula	72e5c7f428	Minor updates + cleanup to dma-generate - switch some debug info to emitError - use a single constant op for zero index to make it easier to write/update test cases; avoid creating new constant op's for common zero index cases - test case cleanup This is in preparation for an upcoming major update to this pass. PiperOrigin-RevId: 230728379	2019-03-29 15:34:06 -07:00
River Riddle	f319bbbd28	Add a function pass to strip debug info from functions and instructions. PiperOrigin-RevId: 230654315	2019-03-29 15:33:50 -07:00
MLIR Team	b28009b681	Fix single producer check in loop fusion pass. PiperOrigin-RevId: 230565482	2019-03-29 15:32:20 -07:00
Uday Bondhugula	864d9e02a1	Update fusion cost model + some additional infrastructure and debug information for -loop-fusion - update fusion cost model to fuse while tolerating a certain amount of redundant computation; add cl option -fusion-compute-tolerance evaluate memory footprint and intermediate memory reduction - emit debug info from -loop-fusion showing what was fused and why - introduce function to compute memory footprint for a loop nest - getMemRefRegion readability update - NFC PiperOrigin-RevId: 230541857	2019-03-29 15:32:06 -07:00
Uday Bondhugula	92e9d9484c	loop unroll update: unroll factor one for a single iteration loop - unrolling a single iteration loop by a factor of one should promote its body into its parent; this makes it consistent with the behavior/expectation that unrolling a loop by a factor equal to its trip count makes the loop go away. PiperOrigin-RevId: 230426499	2019-03-29 15:31:35 -07:00
Uday Bondhugula	94a03f864f	Allocate private/local buffers for slices accurately during fusion - the size of the private memref created for the slice should be based on the memref region accessed at the depth at which the slice is being materialized, i.e., symbolic in the outer IVs up until that depth, as opposed to the region accessed based on the entire domain. - leads to a significant contraction of the temporary / intermediate memref whenever the memref isn't reduced to a single scalar (through store fwd'ing). Other changes - update to promoteIfSingleIteration - avoid introducing unnecessary identity map affine_apply from IV; makes it much easier to write and read test cases and pass output for all passes that use promoteIfSingleIteration; loop-fusion test cases become much simpler - fix replaceAllMemrefUsesWith bug that was exposed by the above update - 'domInstFilter' could be one of the ops erased due to a memref replacement in it. - fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was missing (the latter need not always be 1); add lbFloorDivisors output argument - rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape PiperOrigin-RevId: 230405218	2019-03-29 15:30:31 -07:00
MLIR Team	71495d58a7	Handle escaping memrefs in loop fusion pass: *) Do not remove loop nests which write to memrefs which escape the function. *) Do not remove memrefs which escape the function (e.g. are used in the return instruction). PiperOrigin-RevId: 230398630	2019-03-29 15:30:14 -07:00
Uday Bondhugula	c1880a857d	AffineExpr pretty print - add missing handling to print expr * - 1 as -expr - print multiplication by -1 as unary negate; expressions like s0 * -1, d0 * -1 + d1 will now appear as -s0, -d0 + d1 resp. - a minor cleanup while on printAffineExprInternal PiperOrigin-RevId: 230222151	2019-03-29 15:28:44 -07:00
River Riddle	512d87cefc	Add a constant folding hook to ExtractElementOp to fold extracting the element of a constant. This also adds a 'getValue' function to DenseElementsAttr and SparseElementsAttr to get the element at a constant index. PiperOrigin-RevId: 230098938	2019-03-29 15:28:28 -07:00
Uday Bondhugula	d7522eb264	Fix test cases that were accessing out of bounds to start with (b/123072438) - detected with memref-bound-check - fixes b/123072438; while on this, fix another test case which was reported out of bounds PiperOrigin-RevId: 229978187	2019-03-29 15:27:29 -07:00
MLIR Team	c4237ae990	LoopFusion: Creates private MemRefs which are used only by operations in the fused loop. *) Enables reduction of private memref size based on MemRef region accessed by fused slice. *) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence. *) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice). PiperOrigin-RevId: 229936698	2019-03-29 15:26:15 -07:00
Nicolas Vasilache	24e5a72dac	Fix AffineApply corner case This CL adds a test reported by andydavis@ and fixes the corner case that appears when operands do not come from an AffineApply and no Dim composition is needed. In such cases, we would need to create an empty map which is disallowed. The composition in such cases becomes trivial: there is no composition. This CL also updates the name AffineNormalizer to AffineApplyNormalizer. PiperOrigin-RevId: 229819234	2019-03-29 15:25:59 -07:00
Uday Bondhugula	40f7535571	Update stale / target-specific information in comments - NFC PiperOrigin-RevId: 229800834	2019-03-29 15:25:29 -07:00
Nicolas Vasilache	4573a8da9a	Fix improperly indexed DimOp in LowerVectorTransfers.cpp This CL fixes a misunderstanding in how to build DimOp which triggered execution issues in the CPU path. The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to construct the dynamic dimensions should be: `dim %arg, 0 : memref<?x4x?x8x?xf32>` `dim %arg, 2 : memref<?x4x?x8x?xf32>` and `dim %arg, 4 : memref<?x4x?x8x?xf32>` Before this CL, we would construct: `dim %arg, 0 : memref<?x4x?x8x?xf32>` `dim %arg, 1 : memref<?x4x?x8x?xf32>` `dim %arg, 2 : memref<?x4x?x8x?xf32>` and expect the other dimensions to be constants. This assumption seems consistent at first glance with the syntax of alloc: ``` %tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32> ``` But this was actually incorrect. This CL also makes the relevant functions available to EDSCs and removes duplication of the incorrect function. PiperOrigin-RevId: 229622766	2019-03-29 15:24:13 -07:00
River Riddle	5843e5a7c0	Add a canonicalization pattern to remove Dealloc operations if the memref is an AllocOp that is only used by Dealloc operations. PiperOrigin-RevId: 229606558	2019-03-29 15:23:13 -07:00
River Riddle	ada685f352	Add canonicalization to remove AllocOps if there are no uses. AllocOp has side effects on the heap, but can still be deleted if it has zero uses. PiperOrigin-RevId: 229596556	2019-03-29 15:22:28 -07:00
MLIR Team	27d067e164	LoopFusion improvements: *) Adds support for fusing into consumer loop nests with multiple loads from the same memref. *) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth. *) Removes dependence on src loop depth and simplifies cost model computation. PiperOrigin-RevId: 229575126	2019-03-29 15:21:59 -07:00
River Riddle	ed26dd0421	Add a canonicalization pattern for conditional branch to fold constant branch conditions. PiperOrigin-RevId: 229242007	2019-03-29 15:14:37 -07:00
MLIR Team	38c2fe3158	LoopFusion: automate selection of source loop nest slice depth and destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time). *) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest, with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations for src/dst loop depths attempting to find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality). *) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest. *) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests). *) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (once fusion has calculated the min-cost src/dst loop depths). *) Test: Adds multiple unit tests to test the new functionality. PiperOrigin-RevId: 229219757	2019-03-29 15:13:53 -07:00
Nicolas Vasilache	0ab81776aa	Fix typo in lower_vector_transfers.mlir PiperOrigin-RevId: 229010160	2019-03-29 15:12:40 -07:00
Nicolas Vasilache	d734c50c5f	[MLIR] Clip all access dimensions during LowerVectorTransfers This CL adds a short term remedy to an issue that was found during execution tests. Lowering of vector transfer ops uses the permutation map to determine which ForInst have been super-vectorized. During materialization to HW vector sizes however, some of those dimensions may be fully unrolled and do not appear in the permutation map. Such dimensions were then not clipped and may have accessed out of bounds. This CL conservatively clips all dimensions to ensure no out of bounds access. The longer term solution is still up for debate but will probably require either passing more information between Materialization and lowering, or just merging the 2 passes. PiperOrigin-RevId: 228980787	2019-03-29 15:12:26 -07:00
Uday Bondhugula	c35d6b4f2d	Drop -canonicalize from -dma-generate test case cmd - should be testing on the output of -dma-generate and not '-dma-generate -canonicalize'; save trouble for those updating -canonicalize in the future! PiperOrigin-RevId: 228915192	2019-03-29 15:11:26 -07:00
Lei Zhang	311af4abf3	Const fold splat vectors/tensors in standard add, sub, and mul ops The const folding logic is structurally similar, so use a template to abstract the common part. Moved mul(x, 0) to a legalization pattern to be consistent with mul(x, 1). Also promoted getZeroAttr() to be a method on Builder since it is expected to be frequently used. PiperOrigin-RevId: 228891989	2019-03-29 15:09:55 -07:00
Nicolas Vasilache	cfa5831960	Uniformize composition of AffineApplyOp by construction This CL is the 5th on the path to simplifying AffineMap composition. This removes the distinction between normalized single-result AffineMap and more general composed multi-result map. One nice byproduct of making the implementation driven by single-result is that the multi-result extension is a trivial change: the implementation is still single-result and we just use: ``` unsigned idx = getIndexOf(...); map.getResult(idx); ``` This CL also fixes an AffineNormalizer implementation issue related to symbols. Namely it stops performing substitutions on symbols in AffineNormalizer and instead concatenates them all to be consistent with the call to `AffineMap::compose(AffineMap)`. This latter call to `compose` cannot perform simplifications of symbols coming from different maps based on positions only: i.e. dims are applied and renumbered but symbols must be concatenated. The only way to determine whether symbols from different AffineApply are the same is to look at the concrete values. The canonicalizeMapAndOperands is thus extended with behavior to support replacing operands that appear multiple times. Lastly, this CL demonstrates that the implementation is correct by rewriting ComposeAffineMaps using only `makeComposedAffineApply`. The implementation uses a matcher because AffineApplyOp are introduced as composed operations on the fly instead of iteratively forwardSubstituting. For this purpose, a walker would revisit freshly introduced AffineApplyOp. Regardless, ComposeAffineMaps is scheduled to disappear, this CL replaces the implementation based on iterative `forwardSubstitute` by a composed-by-construction `makeComposedAffineApply`. Remaining calls to `forwardSubstitute` will be removed in the next CL. PiperOrigin-RevId: 228830443	2019-03-29 15:08:40 -07:00
Uday Bondhugula	2370c601ba	Add safeguard against FM explosion - FM has a worst case exponential complexity. For our purposes, this worst case is rarely expected, but could still appear due to improperly constructed constraints (a logical/memory error in other methods for eg.) or artificially created arbitrarily complex integer sets (adversarial / fuzz tests). Add a check to detect such an explosion in the number of constraints and conservatively return false from isEmpty() (instead of running out of memory or running for too long). - Add an artificial virus test case. PiperOrigin-RevId: 228753496	2019-03-29 15:07:55 -07:00
Alex Zinenko	9003490287	Implement branch-free single-division lowering of affine division/remainder This implements the lowering of `floordiv`, `ceildiv` and `mod` operators from affine expressions to the arithmetic primitive operations. Integer division rules in affine expressions explicitly require rounding towards either negative or positive infinity unlike machine implementations that round towards zero. In the general case, implementing `floordiv` and `ceildiv` using machine signed division requires computing both the quotient and the remainder. When the divisor is positive, this can be simplified by adjusting the dividend and the quotient by one and switching signs. In the current use cases, we are unlikely to encounter affine expressions with negative divisors (affine divisions appear in loop transformations such as tiling that guarantee that divisors are positive by construction). Therefore, it is reasonable to use branch-free single-division implementation. In case of affine maps, divisors can only be literals so we can check the sign and implement the case for negative divisors when the need arises. The affine lowering pass can still fail when applied to semi-affine maps (division or modulo by a symbol). PiperOrigin-RevId: 228668181	2019-03-29 15:07:40 -07:00
Uday Bondhugula	742c37abc9	Fix DMA overlap pass buffer mapping - the double buffer should be indexed (iv floordiv step) % 2 and NOT (iv % 2); step wasn't being accounted for. - fix test cases, enable failing test cases PiperOrigin-RevId: 228635726	2019-03-29 15:07:10 -07:00
Uday Bondhugula	303c09299f	Fix affine expr flattener bug + improve simplification in a particular scenario - fix visitDivExpr: constraints constructed for localVarCst used the original divisor instead of the simplified divisor; fix this. Add a simple test case in memref-bound-check that reproduces this bug - although this was encountered in the context of slicing for fusion. - improve mod expr flattening: when flattening mod expressions, cancel out the GCD of the numerator and denominator so that we can get a simpler flattened form along with a simpler floordiv local var for it PiperOrigin-RevId: 228539928	2019-03-29 15:06:11 -07:00
Nicolas Vasilache	1f78d63f05	[MLIR] Make SuperVectorization use normalized AffineApplyOp Supervectorization does not plan on handling multi-result AffineMaps and non-canonical chains of > 1 AffineApplyOp. This CL uses the simpler single-result unbounded AffineApplyOp in the MaterializeVectors pass. PiperOrigin-RevId: 228469085	2019-03-29 15:05:55 -07:00
Nicolas Vasilache	c6f798a976	Introduce AffineMap::compose(AffineMap) This CL is the 2nd on the path to simplifying AffineMap composition. This CL uses the now accepted `AffineExpr::compose(AffineMap)` to implement `AffineMap::compose(AffineMap)`. Implications of keeping the simplification function in Analysis are documented where relevant. PiperOrigin-RevId: 228276646	2019-03-29 15:04:20 -07:00
Uday Bondhugula	e94ba6815a	Fix 0-d memref corner case for getMemRefRegion() - fix crash on test/Transforms/canonicalize.mlir with -memref-bound-check PiperOrigin-RevId: 228268486	2019-03-29 15:03:50 -07:00
Nicolas Vasilache	c449e46ceb	Introduce AffineExpr::compose(AffineMap) This CL is the 1st on the path to simplifying AffineMap composition. This CL uses the now accepted AffineExpr.replaceDimsAndSymbols to implement `AffineExpr::compose(AffineMap)`. Arguably, `simplifyAffineExpr` should be part of IR and not Analysis but this CL does not yet pull the trigger on that. PiperOrigin-RevId: 228265845	2019-03-29 15:03:36 -07:00
Uday Bondhugula	21baf86a2f	Extend loop-fusion's slicing utility + other fixes / updates - refactor toAffineFromEq and the code surrounding it; refactor code into FlatAffineConstraints::getSliceBounds - add FlatAffineConstraints methods to detect identifiers as mod's and div's of other identifiers - add FlatAffineConstraints::getConstantLower/UpperBound - Address b/122118218 (don't assert on invalid fusion depths cmdline flags - instead, don't do anything; change cmdline flags src-loop-depth -> fusion-src-loop-depth) - AffineExpr/Map print method update: don't fail on null instances (since we have a wrapper around a pointer, it's avoidable); rationale: dump/print methods should never fail if possible. - Update memref-dataflow-opt to add an optimization to avoid an unnecessary call to IsRangeOneToOne when it's trivially going to be true. - Add additional test cases to exercise the new support - update a few existing test cases since the maps are now generated uniformly with all destination loop operands appearing for the backward slice - Fix projectOut - fix wrong range for getBestElimCandidate. - Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since we didn't have any non-hyperrectangular ones. PiperOrigin-RevId: 228265152	2019-03-29 15:03:20 -07:00
Uday Bondhugula	b934d75b8f	Convert expr - c * (expr floordiv c) to expr mod c in AffineExpr - Detect 'mod' to replace the combination of floordiv, mul, and subtract when possible at construction time; when 'c' is a power of two, this reduces the number of operations; also more compact and readable. Update simplifyAdd for this. On a side note: - with the affine expr flattening we have, a mod expression like d0 mod c would be flattened into d0 - c * q, c * q <= d0 <= cq + c - 1, with 'q' being added as the local variable (q = d0 floordiv c); as a result, a mod was turned into a floordiv whenever the expression was reconstructed back, i.e., as d0 - c (d0 floordiv c); as a result of this change, we recover the mod back. - rename SimplifyAffineExpr -> SimplifyAffineStructures (pass had been renamed but the file hadn't been). PiperOrigin-RevId: 228258120	2019-03-29 15:02:56 -07:00
Nicolas Vasilache	7c0bbe0939	Iterate on vector rather than DenseMap during AffineMap normalization This CL removes a flakiness associated with a spurious iteration on DenseMap iterators when normalizing AffineMap. PiperOrigin-RevId: 228160074	2019-03-29 14:59:37 -07:00
Alex Zinenko	c47ed53211	Add simple constant folding hook for CmpIOp Integer comparisons can be constant folded if both of their arguments are known constants, which we can compare in the compiler. This requires implementing all comparison predicates, but thanks to consistency between LLVM and MLIR comparison predicates, we have a one-to-one correspondence between predicates and llvm::APInt comparison functions. Constant folding of comparisons with maximum/minimum values of the integer type is left for future work. This will be used to test the lowering of mod/floordiv/ceildiv in affine expressions at compile time. PiperOrigin-RevId: 228077580	2019-03-29 14:59:22 -07:00
Alex Zinenko	bc04556cf8	Introduce integer division and remainder operations This adds signed/unsigned integer division and remainder operations to the StandardOps dialect. Two versions are required because MLIR integers are signless, but the meaning of the leading bit is important in division and affects the results. LLVM IR made a similar choice. Define the operations in the tablegen file and add simple constant folding hooks in the C++ implementation. Handle signed division overflow and division by zero errors in constant folding. Canonicalization is left for future work. These operations are necessary to lower affine_apply's down to LLVM IR. PiperOrigin-RevId: 228077549	2019-03-29 14:58:52 -07:00
Uday Bondhugula	8496f2c30b	Complete TODOs / cleanup for loop-fusion utility - this is CL 1/2 that does a clean up and gets rid of one limitation in an underlying method - as a result, fusion works for more cases. - fix bugs/incomplete impl. in toAffineMapFromEq - fusing across rank changing reshapes for example now just works For eg. given a rank 1 memref to rank 2 memref reshape (64 -> 8 x 8) like this, -loop-fusion -memref-dataflow-opt now completely fuses and inlines/store-forward to get rid of the temporary: INPUT // Rank 1 -> Rank 2 reshape for %i0 = 0 to 64 { %v = load %A[%i0] store %v, %B[%i0 floordiv 8, i0 mod 8] } for %i1 = 0 to 8 for %i2 = 0 to 8 %w = load %B[%i1, i2] "foo"(%w) : (f32) -> () OUTPUT $ mlir-opt -loop-fusion -memref-dataflow-opt fuse_reshape.mlir #map0 = (d0, d1) -> (d0 * 8 + d1) mlfunc @fuse_reshape(%arg0: memref<64xf32>) { for %i0 = 0 to 8 { for %i1 = 0 to 8 { %0 = affine_apply #map0(%i0, %i1) %1 = load %arg0[%0] : memref<64xf32> "foo"(%1) : (f32) -> () } } } AFAIK, there is no polyhedral tool / compiler that can perform such fusion - because it's not really standard loop fusion, but possible through a generalized slicing-based approach such as ours. PiperOrigin-RevId: 227918338	2019-03-29 14:57:22 -07:00
Nicolas Vasilache	618c6a74c6	[MLIR] Introduce normalized single-result unbounded AffineApplyOp Supervectorization does not plan on handling multi-result AffineMaps and non-canonical chains of > 1 AffineApplyOp. This CL introduces a simpler abstraction and composition of single-result unbounded AffineApplyOp by using the existing unbound AffineMap composition. This CL adds a simple API call and relevant tests: ```c++ OpPointer<AffineApplyOp> makeNormalizedAffineApply( FuncBuilder b, Location loc, AffineMap map, ArrayRef<Value> operands); ``` which creates a single-result unbounded AffineApplyOp. The operands of AffineApplyOp are not themselves results of AffineApplyOp by construction. This represents the simplest possible interface to complement the composition of (mathematical) AffineMap, for the cases when we are interested in applying it to Value*. In this CL the composed AffineMap is not compressed (i.e. there exist operands that are not part of the result). A followup commit will compress to normal form. The single-result unbounded AffineApplyOp abstraction will be used in a followup CL to support the MaterializeVectors pass. PiperOrigin-RevId: 227879021	2019-03-29 14:56:37 -07:00
Chris Lattner	7983bbc251	Introduce a simple canonicalization of affine_apply that drops unused dims and symbols. Included with this is some other infra: - Testcases for other canonicalizations that I will implement next. - Some helpers in AffineMap/Expr for doing simple walks without defining whole visitor classes. - A 'replaceDimsAndSymbols' facility that I'll be using to simplify maps and exprs, e.g. to fold one constant into a mapping and to drop/renumber unused dims. - Allow index (and everything else) to work in memref's, as we previously discussed, to make the testcase easier to write. - A "getAffineBinaryExpr" helper to produce a binop when you know the kind as an enum. This line of work will eventually subsume the ComposeAffineApply pass, but it is nowhere close to that yet :-) PiperOrigin-RevId: 227852951	2019-03-29 14:56:07 -07:00
Alex Zinenko	0c4ee54198	Merge LowerAffineApplyPass into LowerIfAndForPass, rename to LowerAffinePass This change is mechanical and merges the LowerAffineApplyPass and LowerIfAndForPass into a single LowerAffinePass. It makes a step towards defining an "affine dialect" that would contain all polyhedral-related constructs. The motivation for merging these two passes is based on retiring MLFunctions and, eventually, transforming If and For statements into regular operations. After that happens, LowerAffinePass becomes yet another legalization. PiperOrigin-RevId: 227566113	2019-03-29 14:52:52 -07:00
Alex Zinenko	fa710c17f4	LowerForAndIf: expand affine_apply's inplace Existing implementation was created before ML/CFG unification refactoring and did not concern itself with further lowering to separate concerns. As a result, it emitted `affine_apply` instructions to implement `for` loop bounds and `if` conditions and required a follow-up function pass to lower those `affine_apply` to arithmetic primitives. In the unified function world, LowerForAndIf is mostly a lowering pass with low complexity. As we move towards a dialect for affine operations (including `for` and `if`), it makes sense to lower `for` and `if` conditions directly to arithmetic primitives instead of relying on `affine_apply`. Expose `expandAffineExpr` function in LoweringUtils. Use this function together with `expandAffineMaps` to emit primitives that implement loop and branch conditions directly. Also remove tests that become unnecessary after transforming LowerForAndIf into a function pass. PiperOrigin-RevId: 227563608	2019-03-29 14:52:22 -07:00


369 Commits