llvm-project

Commit Graph

Author	SHA1	Message	Date
Lei Zhang	590012772d	Promote broadcast logic from TensorFlowLite to Dialect/ directory We also need the broadcast logic in the TensorFlow dialect. Move it to a Dialect/ directory for a broader scope. This Dialect/ directory is intended for code not in core IR, but can potentially be shared by multiple dialects. Apart from fixing TensorFlow op TableGen to use this trait, this CL only contains mechanical code shuffling. PiperOrigin-RevId: 229563911	2019-03-29 15:21:14 -07:00
Uday Bondhugula	f99a44a7cd	Address documentation/readability related comments from cl/227252907 on memref store forwarding - NFC. PiperOrigin-RevId: 229561933	2019-03-29 15:20:59 -07:00
River Riddle	18fe1ffcd7	Move the storage of uniqued TypeStorage objects into TypeUniquer and give each context a unique TypeUniquer instance. PiperOrigin-RevId: 229460053	2019-03-29 15:19:56 -07:00
Uday Bondhugula	03e15e1b9f	Minor code cleanup - NFC. - readability changes PiperOrigin-RevId: 229443430	2019-03-29 15:19:41 -07:00
Lei Zhang	b7dbfd04eb	Const fold splat tensors for TFLite AddOp, SubOp, MulOp The constant folding rules assume value attributes of operands are already verified to be in good standing. For each of the above ops, the constant folding rules support both integer and floating point cases. Broadcast behavior is also supported as per the semantics of TFLite ops. This CL does not handle overflow/underflow cases yet. PiperOrigin-RevId: 229441221	2019-03-29 15:19:26 -07:00
River Riddle	f9d2eb1c8c	Change derived type storage objects to define an 'operator==(const KeyTy &)' instead of converting to the KeyTy. This allows for handling cases where the KeyTy does not provide an equality operator on itself. PiperOrigin-RevId: 229423249	2019-03-29 15:19:11 -07:00
River Riddle	f8341cfe06	Verify that the parsed predicate attribute of a cmpi operation is a string. PiperOrigin-RevId: 229419703	2019-03-29 15:18:53 -07:00
Alex Zinenko	0e58de70e7	Initial version of the LLVM IR dialect LLVM IR types are defined using MLIR's extendable type system. The dialect provides the only type kind, LLVMType, that wraps an llvm::Type*. Since LLVM IR types are pointer-unique, the MLIR type system relies on those pointers to perform its own type unique'ing. Type parsing and printing is delegated to LLVM libraries. Define MLIR operations for the LLVM IR instructions currently used by the translation to the LLVM IR Target to simplify eventual transition. Operation classes are defined using TableGen. LLVM IR instruction operands that are only allowed to take constant values are accepted as attributes instead. All operations are using verbose form for printing and parsing. PiperOrigin-RevId: 229400375	2019-03-29 15:18:37 -07:00
Alex Zinenko	44e9869f1a	TableGen: extract TypeConstraints from Type MLIR has support for type-polymorphic instructions, i.e. instructions that may take arguments of different types. For example, standard arithmetic operands take scalars, vectors or tensors. In order to express such instructions in TableGen, we need to be able to verify that a type object satisfies certain constraints, but we don't need to construct an instance of this type. The existing TableGen definition of Type requires both. Extract out a TypeConstraint TableGen class to define restrictions on types. Define the Type TableGen class as a subclass of TypeConstraint for consistency. Accept records of the TypeConstraint class instead of the Type class as values in the Arguments class when defining operators. Replace the predicate logic TableGen class based on conjunctive normal form with the predicate logic classes allowing for arbitrary combinations of predicates using Boolean operators (AND/OR/NOT). The combination is implemented using simple string rewriting of C++ expressions and, therefore, respects the short-circuit evaluation order. No logic simplification is performed at the TableGen level so all expressions must be valid C++. Maintaining CNF using TableGen only would have been complicated when one needed to introduce top-level disjunction. It is also unclear if it could lead to significantly simpler emitted C++ code. In the future, we may replace inplace predicate string combination with a tree structure that can be simplified in TableGen's C++ driver. Combined, these changes allow one to express traits like ArgumentsAreFloatLike directly in TableGen instead of relying on C++ trait classes. PiperOrigin-RevId: 229398247	2019-03-29 15:18:23 -07:00
Uday Bondhugula	4598dafa30	Parsing DmaStartOp: check if source, destination, and tag are of memref type. - fix along the lines of cl/229390720 by @riverriddle PiperOrigin-RevId: 229395218	2019-03-29 15:18:07 -07:00
River Riddle	d50dc4fd6d	When parsing DmaWait, check that the tag is a MemRef type. PiperOrigin-RevId: 229390720	2019-03-29 15:17:52 -07:00
Nicolas Vasilache	515ce1e68e	Add edsc::Indexed helper struct to act as syntactic sugar This CL adds edsc::Indexed. This helper class exists purely for sugaring purposes and allows writing expressions such as: ```mlir Indexed A(...), B(...), C(...); ForNest(ivs, zeros, shapeA, ones, { C[ivs] = A[ivs] + B[ivs] }); ``` PiperOrigin-RevId: 229388644	2019-03-29 15:17:37 -07:00
River Riddle	25d5b895fd	When parsing Select/Cmpi standard operations, emit an error if the type does not have a valid i1 shape instead of crashing. PiperOrigin-RevId: 229384794	2019-03-29 15:17:22 -07:00
Nicolas Vasilache	424041ad58	Add EDSC sugar This allows load, store and ForNest to be used with both Expr and Bindable. This simplifies writing generic pieces of MLIR snippet. For instance, a generic pointwise add can now be written: ```cpp // Different Bindable ivs, one per loop in the loop nest. auto ivs = makeBindables(shapeA.size()); Bindable zero, one; // Same bindable, all equal to `zero`. SmallVector<Bindable, 8> zeros(ivs.size(), zero); // Same bindable, all equal to `one`. SmallVector<Bindable, 8> ones(ivs.size(), one); // clang-format off Bindable A, B, C; Stmt scalarA, scalarB, tmp; Stmt block = edsc::Block({ ForNest(ivs, zeros, shapeA, ones, { scalarA = load(A, ivs), scalarB = load(B, ivs), tmp = scalarA + scalarB, store(tmp, C, ivs) }), }); // clang-format on ``` This CL also adds some extra support for pretty printing that will be used in a future CL when we introduce standalone testing of EDSCs. At the moment we are lacking the basic infrastructure to write such tests. PiperOrigin-RevId: 229375850	2019-03-29 15:16:53 -07:00
Uday Bondhugula	6e4f3e40c7	Fix outdated comments PiperOrigin-RevId: 229300301	2019-03-29 15:16:08 -07:00
River Riddle	3bb35ad0dc	Don't allocate a buffer for an empty ArrayRef in TypeStorageAllocator. PiperOrigin-RevId: 229290802	2019-03-29 15:15:52 -07:00
River Riddle	b9c791b96d	Change derived type storage objects to be constructed with an instance of the KeyTy. This will simplify the cases where a type can be constructed, and need to be verified, in multiple ways. PiperOrigin-RevId: 229279000	2019-03-29 15:15:37 -07:00
River Riddle	8b0ad6f579	If an instruction contains blocks, IfInst/ForInst, make sure to drop references held by those blocks when dropping references for the instruction. PiperOrigin-RevId: 229278667	2019-03-29 15:15:23 -07:00
River Riddle	6c1631b3f8	Check that at least one constraint is parsed when parsing an IntegerSet. PiperOrigin-RevId: 229248638	2019-03-29 15:15:08 -07:00
Lei Zhang	61ec6c0992	Swap the type and attribute parameter in ConstantOp::build() This is to keep consistent with other TableGen generated builders so that we can also use this builder in TableGen rules. PiperOrigin-RevId: 229244630	2019-03-29 15:14:52 -07:00
River Riddle	ed26dd0421	Add a canonicalization pattern for conditional branch to fold constant branch conditions. PiperOrigin-RevId: 229242007	2019-03-29 15:14:37 -07:00
River Riddle	06b0bd9651	Emit unsupported error when parsing a DenseElementAttr with an integer type of greater than 64 bits. DenseElementAttr currently does not support value bitwidths of > 64. This can result in asan failures and crashes when trying to invoke DenseElementsAttr::writeBits/DenseElementsAttr::readBits. PiperOrigin-RevId: 229241125	2019-03-29 15:14:23 -07:00
River Riddle	e0594ce732	Add missing return post parse failure for the indices of a sparse attribute. PiperOrigin-RevId: 229231462	2019-03-29 15:14:07 -07:00
MLIR Team	38c2fe3158	LoopFusion: automate selection of source loop nest slice depth and destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time). *) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations of src/dst loop depths, attempting to find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality). *) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest. *) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests). *) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (once fusion has calculated the min-cost src/dst loop depths). *) Test: Adds multiple unit tests to test the new functionality. PiperOrigin-RevId: 229219757	2019-03-29 15:13:53 -07:00
River Riddle	d6b71b0d57	Add a Block::dropAllReferences to drop all references from held instructions and call it when clearing the block. This fixes a bug where ForInst/IfInst instructions may still have references to values while being destroyed. PiperOrigin-RevId: 229207798	2019-03-29 15:13:39 -07:00
River Riddle	a674ae8bbd	Return an empty IntegerSet if the '(' is not parsed. PiperOrigin-RevId: 229198934	2019-03-29 15:13:25 -07:00
River Riddle	791049fb34	Add a FloatAttr::getChecked, and invoke it during Attribute parsing. PiperOrigin-RevId: 229167099	2019-03-29 15:13:10 -07:00
Nicolas Vasilache	1b171e9357	Add EDSC support for operator* PiperOrigin-RevId: 229097351	2019-03-29 15:12:55 -07:00
Nicolas Vasilache	d734c50c5f	[MLIR] Clip all access dimensions during LowerVectorTransfers This CL adds a short term remedy to an issue that was found during execution tests. Lowering of vector transfer ops uses the permutation map to determine which ForInst have been super-vectorized. During materialization to HW vector sizes however, some of those dimensions may be fully unrolled and do not appear in the permutation map. Such dimensions were then not clipped and may have accessed out of bounds. This CL conservatively clips all dimensions to ensure no out of bounds access. The longer term solution is still up for debate but will probably require either passing more information between Materialization and lowering, or just merging the 2 passes. PiperOrigin-RevId: 228980787	2019-03-29 15:12:26 -07:00
Nicolas Vasilache	b941dc8238	[MLIR] Make MLIREmitter emit composed single-result AffineMap by construction Arguably the dependence of EDSCs on Analysis is not great but on the other hand this is a strict improvement in the emitted IR and since EDSCs are an alternative to builders it makes sense that they have as much access to Analysis as Transforms. PiperOrigin-RevId: 228967624	2019-03-29 15:12:11 -07:00
Nicolas Vasilache	362557e11c	Simplify compositions of AffineApply This CL is the 6th and last on the path to simplifying AffineMap composition. This removes `AffineValueMap::forwardSubstitutions` and replaces it by simple calls to `fullyComposeAffineMapAndOperands`. PiperOrigin-RevId: 228962580	2019-03-29 15:11:56 -07:00
River Riddle	ba9a544615	Simplify Attribute constructor definitions. PiperOrigin-RevId: 228926113	2019-03-29 15:11:41 -07:00
River Riddle	3fe8eb3f22	Add check for '[' when parsing a tensor literal list. PiperOrigin-RevId: 228913908	2019-03-29 15:11:11 -07:00
River Riddle	6985dc62b5	Make sure that type construction arguments are forwarded. PiperOrigin-RevId: 228910216	2019-03-29 15:10:55 -07:00
Jacques Pienaar	58423ad1c1	Follow up from previous change to avoid setting tokStart 2x. PiperOrigin-RevId: 228903980	2019-03-29 15:10:40 -07:00
Jacques Pienaar	71ec869011	Fix omitted return post failed parse PiperOrigin-RevId: 228903905	2019-03-29 15:10:25 -07:00
Jacques Pienaar	4fd6db3e29	Skip over whitespace using loop. NFC. Else we can stack overflow on a long sequence of whitespace. PiperOrigin-RevId: 228893517	2019-03-29 15:10:10 -07:00
Lei Zhang	311af4abf3	Const fold splat vectors/tensors in standard add, sub, and mul ops The const folding logic is structurally similar, so use a template to abstract the common part. Moved mul(x, 0) to a legalization pattern to be consistent with mul(x, 1). Also promoted getZeroAttr() to be a method on Builder since it is expected to be frequently used. PiperOrigin-RevId: 228891989	2019-03-29 15:09:55 -07:00
Jacques Pienaar	78da6704b7	Verify string type token before attempting to get string value. Add repro that would have resulted in crash previously. PiperOrigin-RevId: 228890749	2019-03-29 15:09:40 -07:00
Jacques Pienaar	4c0faef943	Avoid redundant predicate checking in type matching. Expand type matcher template generator to consider a set of predicates that are known to hold. This avoids inserting redundant checking for trivially true predicates (for example predicates that hold according to the op definition). This only targets predicates that trivially hold and does not attempt any logic equivalence proof. PiperOrigin-RevId: 228880468	2019-03-29 15:09:25 -07:00
Lei Zhang	ac5a50e1e4	Extract openInputFile() into Support/FileUtilities Multiple binaries need to open input files. Use this function to de-duplicate the code. Also changed openOutputFile() to return errors using std::string since it is a library call and accessing I/O in library call is not friendly. PiperOrigin-RevId: 228878221	2019-03-29 15:09:11 -07:00
River Riddle	e8d0e1f72a	Provide dialect hooks for defining named aliases for AffineMap/IntegerSet/Type. The AsmPrinter will then query registered dialects for aliases of symbols used within the module and use them in place. PiperOrigin-RevId: 228831678	2019-03-29 15:08:55 -07:00
Nicolas Vasilache	cfa5831960	Uniformize composition of AffineApplyOp by construction This CL is the 5th on the path to simplifying AffineMap composition. This removes the distinction between normalized single-result AffineMap and more general composed multi-result map. One nice byproduct of making the implementation driven by single-result is that the multi-result extension is a trivial change: the implementation is still single-result and we just use: ``` unsigned idx = getIndexOf(...); map.getResult(idx); ``` This CL also fixes an AffineNormalizer implementation issue related to symbols. Namely it stops performing substitutions on symbols in AffineNormalizer and instead concatenates them all to be consistent with the call to `AffineMap::compose(AffineMap)`. This latter call to `compose` cannot perform simplifications of symbols coming from different maps based on positions only: i.e. dims are applied and renumbered but symbols must be concatenated. The only way to determine whether symbols from different AffineApply are the same is to look at the concrete values. The canonicalizeMapAndOperands is thus extended with behavior to support replacing operands that appear multiple times. Lastly, this CL demonstrates that the implementation is correct by rewriting ComposeAffineMaps using only `makeComposedAffineApply`. The implementation uses a matcher because AffineApplyOp are introduced as composed operations on the fly instead of iteratively forwardSubstituting. For this purpose, a walker would revisit freshly introduced AffineApplyOp. Regardless, ComposeAffineMaps is scheduled to disappear, this CL replaces the implementation based on iterative `forwardSubstitute` by a composed-by-construction `makeComposedAffineApply`. Remaining calls to `forwardSubstitute` will be removed in the next CL. PiperOrigin-RevId: 228830443	2019-03-29 15:08:40 -07:00
Uday Bondhugula	2370c601ba	Add safeguard against FM explosion - FM has a worst case exponential complexity. For our purposes, this worst case is rarely expected, but could still appear due to improperly constructed constraints (e.g., a logical/memory error in other methods) or artificially created arbitrarily complex integer sets (adversarial / fuzz tests). Add a check to detect such an explosion in the number of constraints and conservatively return false from isEmpty() (instead of running out of memory or running for too long). - Add an artificial virus test case. PiperOrigin-RevId: 228753496	2019-03-29 15:07:55 -07:00
Alex Zinenko	9003490287	Implement branch-free single-division lowering of affine division/remainder This implements the lowering of `floordiv`, `ceildiv` and `mod` operators from affine expressions to the arithmetic primitive operations. Integer division rules in affine expressions explicitly require rounding towards either negative or positive infinity unlike machine implementations that round towards zero. In the general case, implementing `floordiv` and `ceildiv` using machine signed division requires computing both the quotient and the remainder. When the divisor is positive, this can be simplified by adjusting the dividend and the quotient by one and switching signs. In the current use cases, we are unlikely to encounter affine expressions with negative divisors (affine divisions appear in loop transformations such as tiling that guarantee that divisors are positive by construction). Therefore, it is reasonable to use branch-free single-division implementation. In case of affine maps, divisors can only be literals so we can check the sign and implement the case for negative divisors when the need arises. The affine lowering pass can still fail when applied to semi-affine maps (division or modulo by a symbol). PiperOrigin-RevId: 228668181	2019-03-29 15:07:40 -07:00
River Riddle	56b99b4045	Add a few utilities for terminator management: * Get a specific successor operand. * Iterator support for non successor operands. * Fix bug when removing the last operand from the operand list of an Instruction. * Get the argument number for a BlockArgument. PiperOrigin-RevId: 228660898	2019-03-29 15:07:25 -07:00
Uday Bondhugula	742c37abc9	Fix DMA overlap pass buffer mapping - the double buffer should be indexed (iv floordiv step) % 2 and NOT (iv % 2); step wasn't being accounted for. - fix test cases, enable failing test cases PiperOrigin-RevId: 228635726	2019-03-29 15:07:10 -07:00
Lei Zhang	9b034f0bfd	Add tblgen::Attribute to wrap around TableGen Attr defs This CL added a tblgen::Attribute class to wrap around raw TableGen Record getValue*() calls on Attr defs, which will provide a nicer API for handling TableGen Record. PiperOrigin-RevId: 228581107	2019-03-29 15:06:41 -07:00
Alex Zinenko	6ce30becd7	Support verbose parsing and printing of terminator operations Originally, terminators were special kinds of operation and could not be extended by dialects. Only builtin terminators were supported and they had custom parsers and printers. Currently, "terminator" is a property of an operation, making it possible for dialects to define custom terminators. However, verbose forms of operation syntax were not designed to support terminators that may have a list of successors (each successor contains a block name and an optional operand list). Calling printDefaultOp on a terminator drops all successor information. Dialects are thus required to provide custom parsers and printers for their terminators. Introduce the syntax for the list of successors in the verbose form of the operation. Add support for printing and parsing verbose operations with successors. Note that this does not yet add support for unregistered terminators since "terminator" is a property stored in AbstractOperation and therefore is only available for registered operations that have an instance of AbstractOperation. Add tests for verbose parsing. It is currently impossible to test round-trip for verbose terminators because none of the known dialects use verbose syntax for printing terminators by default, however the printer was exercised on the LLVM IR dialect prototype. PiperOrigin-RevId: 228566453	2019-03-29 15:06:26 -07:00
Uday Bondhugula	303c09299f	Fix affine expr flattener bug + improve simplification in a particular scenario - fix visitDivExpr: constraints constructed for localVarCst used the original divisor instead of the simplified divisor; fix this. Add a simple test case in memref-bound-check that reproduces this bug - although this was encountered in the context of slicing for fusion. - improve mod expr flattening: when flattening mod expressions, cancel out the GCD of the numerator and denominator so that we can get a simpler flattened form along with a simpler floordiv local var for it PiperOrigin-RevId: 228539928	2019-03-29 15:06:11 -07:00
Nicolas Vasilache	1f78d63f05	[MLIR] Make SuperVectorization use normalized AffineApplyOp Supervectorization does not plan on handling multi-result AffineMaps and non-canonical chains of > 1 AffineApplyOp. This CL uses the simpler single-result unbounded AffineApplyOp in the MaterializeVectors pass. PiperOrigin-RevId: 228469085	2019-03-29 15:05:55 -07:00
Lei Zhang	3e5ee82b81	Put Operator and PredCNF into the tblgen namespace PiperOrigin-RevId: 228429130	2019-03-29 15:05:38 -07:00
Lei Zhang	b2cc2c344e	Add tblgen::Type to wrap around TableGen Type defs This CL added a tblgen::Type class to wrap around raw TableGen Record getValue*() calls on Type defs, which will provide a nicer API for handling TableGen Record. The PredCNF class is also updated to work together with tblgen::Type. PiperOrigin-RevId: 228429090	2019-03-29 15:05:23 -07:00
Chris Lattner	2b902f1288	Delete FuncBuilder::createChecked. It is perhaps still a good idea, but has no clients. Let's re-add it in the future if there is ever a reason to. NFC. Unrelatedly, add a use of a variable to unbreak the non-assert build. PiperOrigin-RevId: 228284026	2019-03-29 15:05:08 -07:00
Nicolas Vasilache	997415fa77	Extract BuiltinOps::canonicalizeMapAndOperands This CL is the 4th on the path to simplifying AffineMap composition. This CL extracts canonicalizeMapAndOperands so it can be reused by other functions; in particular, this will be used in `makeNormalizedAffineApply`. PiperOrigin-RevId: 228277890	2019-03-29 15:04:52 -07:00
Nicolas Vasilache	00aac70159	Move makeNormalizedAffineApply This CL is the 3rd on the path to simplifying AffineMap composition. This CL just moves `makeNormalizedAffineApply` from VectorAnalysis to AffineAnalysis where it more naturally belongs. PiperOrigin-RevId: 228277182	2019-03-29 15:04:38 -07:00
Nicolas Vasilache	c6f798a976	Introduce AffineMap::compose(AffineMap) This CL is the 2nd on the path to simplifying AffineMap composition. This CL uses the now accepted `AffineExpr::compose(AffineMap)` to implement `AffineMap::compose(AffineMap)`. Implications of keeping the simplification function in Analysis are documented where relevant. PiperOrigin-RevId: 228276646	2019-03-29 15:04:20 -07:00
River Riddle	8eccc429b7	Add parser support for named type aliases. Alias identifiers can be used in the place of the types that they alias, and are defined as: type-alias-def ::= '!' alias-name '=' 'type' type type-alias ::= '!' alias-name Example: !avx.m128 = type vector<4 x f32> ... "foo"(%x) : vector<4 x f32> -> () // becomes: "foo"(%x) : !avx.m128 -> () PiperOrigin-RevId: 228271372	2019-03-29 15:04:05 -07:00
Uday Bondhugula	e94ba6815a	Fix 0-d memref corner case for getMemRefRegion() - fix crash on test/Transforms/canonicalize.mlir with -memref-bound-check PiperOrigin-RevId: 228268486	2019-03-29 15:03:50 -07:00
Nicolas Vasilache	c449e46ceb	Introduce AffineExpr::compose(AffineMap) This CL is the 1st on the path to simplifying AffineMap composition. This CL uses the now accepted AffineExpr.replaceDimsAndSymbols to implement `AffineExpr::compose(AffineMap)`. Arguably, `simplifyAffineExpr` should be part of IR and not Analysis but this CL does not yet pull the trigger on that. PiperOrigin-RevId: 228265845	2019-03-29 15:03:36 -07:00
Uday Bondhugula	21baf86a2f	Extend loop-fusion's slicing utility + other fixes / updates - refactor toAffineFromEq and the code surrounding it; refactor code into FlatAffineConstraints::getSliceBounds - add FlatAffineConstraints methods to detect identifiers as mod's and div's of other identifiers - add FlatAffineConstraints::getConstantLower/UpperBound - Address b/122118218 (don't assert on invalid fusion depths cmdline flags - instead, don't do anything); change cmdline flags src-loop-depth -> fusion-src-loop-depth - AffineExpr/Map print method update: don't fail on null instances (since we have a wrapper around a pointer, it's avoidable); rationale: dump/print methods should never fail if possible. - Update memref-dataflow-opt to add an optimization to avoid an unnecessary call to IsRangeOneToOne when it's trivially going to be true. - Add additional test cases to exercise the new support - update a few existing test cases since the maps are now generated uniformly with all destination loop operands appearing for the backward slice - Fix projectOut - fix wrong range for getBestElimCandidate. - Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since we didn't have any non-hyperrectangular ones. PiperOrigin-RevId: 228265152	2019-03-29 15:03:20 -07:00
Uday Bondhugula	b934d75b8f	Convert expr - c * (expr floordiv c) to expr mod c in AffineExpr - Detect 'mod' to replace the combination of floordiv, mul, and subtract when possible at construction time; when 'c' is a power of two, this reduces the number of operations; also more compact and readable. Update simplifyAdd for this. On a side note: - with the affine expr flattening we have, a mod expression like d0 mod c would be flattened into d0 - c * q, c * q <= d0 <= cq + c - 1, with 'q' being added as the local variable (q = d0 floordiv c); as a result, a mod was turned into a floordiv whenever the expression was reconstructed back, i.e., as d0 - c (d0 floordiv c); as a result of this change, we recover the mod back. - rename SimplifyAffineExpr -> SimplifyAffineStructures (pass had been renamed but the file hadn't been). PiperOrigin-RevId: 228258120	2019-03-29 15:02:56 -07:00
Uday Bondhugula	56b3640b94	Misc readability and doc / code comment related improvements - NFC - when SSAValue/MLValue existed, code at several places was forced to create additional aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get rid of such redundant code - use filling ctors instead of explicit loops - for smallvectors, change insert(list.end(), ...) -> append(... - improve comments at various places - turn getMemRefAccess into MemRefAccess ctor and drop duplicated getMemRefAccess. In the next CL, provide getAccess() accessors for load, store, DMA op's to return a MemRefAccess. PiperOrigin-RevId: 228243638	2019-03-29 15:02:41 -07:00
Lei Zhang	f8bbe5deca	Various tiny refinements over TableGen Operator class Use "native" vs "derived" to differentiate attributes on ops: native ones are specified when creating the op as a part of defining the op, while derived ones are computed from properties of the op. PiperOrigin-RevId: 228186962	2019-03-29 15:01:56 -07:00
River Riddle	3b2c5600d9	Add support for types belonging to unknown dialects. This allows for types to be round tripped even if the dialect that defines them is not linked in. These types will be represented by a new "UnknownType" that uniques them based upon the dialect namespace and raw string type data. PiperOrigin-RevId: 228184629	2019-03-29 15:01:11 -07:00
Jacques Pienaar	aae85ddce1	Match attributes in input pattern. Bind attributes similar to operands. Use to rewrite leakyrelu and const rewrite pattern. The attribute type/attributes are not currently checked so should only be used where the attributes match due to the construction of the op. To support current attribute namespacing, convert __ in attribute name to "$" for matching purposes ('$' is not a valid character in a variable name in TableGen). Some simplification to make it simpler to specify indented ostream and avoid so many spaces. The goal is not to have perfectly formatted code generated but good enough so that it's still easy to read for a user. PiperOrigin-RevId: 228183639	2019-03-29 15:00:55 -07:00
Alex Zinenko	92a899f629	Drop all uses of the ForInst induction variable before deleting ForInst The `for` instruction defines the loop induction variable it uses. In the well-formed IR, the induction variable can only be used by the body of the `for` loop. Existing implementation was explicitly cleaning the body of the for loop to remove all uses of the induction variable before removing its definition. However, in ill-formed IR that may appear in some stages of parsing, there may be (invalid) users of the loop induction variable outside the loop body. In case of unsuccessful parsing, destructor of the ForInst-defined Value would assert because there are remaining though invalid users of this Value. Explicitly drop all uses of the loop induction Value when destroying a ForInst. It is no longer necessary to explicitly clean the body of the loop, destructor of the block will take care of this. PiperOrigin-RevId: 228168880	2019-03-29 15:00:26 -07:00
Alex Zinenko	3b7b0040ce	FunctionParser::~FunctionParser: avoid iterator invalidation When destroying a FunctionParser in case of parsing failure, we clean up all uses of undefined forward-declared references. This has been implemented as iteration over the list of uses. However, deleting one use from the list invalidates the iterator (`IROperand::drop` sets `nextUse` to `nullptr` while the iterator reads `nextUse` to advance; therefore only the first use was deleted from the list). Get a new iterator before calling drop to avoid invalidation. PiperOrigin-RevId: 228168849	2019-03-29 15:00:10 -07:00
Uday Bondhugula	94c2d969ce	Rename getAffineBinaryExpr -> getAffineBinaryOpExpr, getBinaryAffineOpExpr -> getAffineBinaryOpExpr for consistency (NFC) - this is consistent with the name of the class and getAffineDimExpr/ConstantExpr, etc. PiperOrigin-RevId: 228164959	2019-03-29 14:59:52 -07:00
Nicolas Vasilache	7c0bbe0939	Iterate on vector rather than DenseMap during AffineMap normalization This CL removes a flakiness associated with a spurious iteration on DenseMap iterators when normalizing AffineMap. PiperOrigin-RevId: 228160074	2019-03-29 14:59:37 -07:00
Alex Zinenko	c47ed53211	Add simple constant folding hook for CmpIOp Integer comparisons can be constant folded if both of their arguments are known constants, which we can compare in the compiler. This requires implementing all comparison predicates, but thanks to consistency between LLVM and MLIR comparison predicates, we have a one-to-one correspondence between predicates and llvm::APInt comparison functions. Constant folding of comparisons with maximum/minimum values of the integer type is left for future work. This will be used to test the lowering of mod/floordiv/ceildiv in affine expressions at compile time. PiperOrigin-RevId: 228077580	2019-03-29 14:59:22 -07:00
Alex Zinenko	caa7e70627	LLVM IR lowering: support integer division and remainder operations These operations trivially map to LLVM IR counterparts for operands of scalar and (one-dimensional) vector type. Multi-dimensional vector and tensor type operands would fail type conversion before the operation conversion takes place. Add tests for scalar and vector cases. Also add a test for vector `select` instruction for consistency with other tests. PiperOrigin-RevId: 228077564	2019-03-29 14:59:07 -07:00
Alex Zinenko	bc04556cf8	Introduce integer division and remainder operations This adds signed/unsigned integer division and remainder operations to the StandardOps dialect. Two versions are required because MLIR integers are signless, but the meaning of the leading bit is important in division and affects the results. LLVM IR made a similar choice. Define the operations in the tablegen file and add simple constant folding hooks in the C++ implementation. Handle signed division overflow and division by zero errors in constant folding. Canonicalization is left for future work. These operations are necessary to lower affine_apply's down to LLVM IR. PiperOrigin-RevId: 228077549	2019-03-29 14:58:52 -07:00
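Why two division operations are needed for signless integers, and which cases a constant folder has to refuse, can be sketched as follows. This is an illustration under assumptions, not the StandardOps implementation: the 32-bit width and the names `foldSignedDiv` and `foldUnsignedDiv` are hypothetical, and a `false` return stands for "do not fold".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The operands are plain 32-bit patterns; the operation decides how the
// leading bit is interpreted. Returns false when the fold must be refused.
inline bool foldSignedDiv(uint32_t lhs, uint32_t rhs, uint32_t &result) {
  const int32_t a = static_cast<int32_t>(lhs);
  const int32_t b = static_cast<int32_t>(rhs);
  if (b == 0)
    return false;  // division by zero
  if (a == std::numeric_limits<int32_t>::min() && b == -1)
    return false;  // signed overflow: the quotient is not representable
  result = static_cast<uint32_t>(a / b);
  return true;
}

inline bool foldUnsignedDiv(uint32_t lhs, uint32_t rhs, uint32_t &result) {
  if (rhs == 0)
    return false;  // division by zero
  result = lhs / rhs;
  return true;
}
```

The same bit pattern `0xFFFFFFF8` divided by 2 folds to `0xFFFFFFFC` under the signed reading (-8 / 2 = -4) and to `0x7FFFFFFC` under the unsigned one, so a single signless division could not express both.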
Nicolas Vasilache	28cf580555	Cleanup spurious DenseMap include PiperOrigin-RevId: 228059305	2019-03-29 14:58:38 -07:00
Jacques Pienaar	8f24943826	Verify type of operands match those specified in op registry. Expand type to include matcher predicates. Use CNF form to allow specifying combinations of constraints for type. The matching call for the type is used to verify the construction of the operation as well as in rewrite pattern generation. The matching initially includes redundant checks (e.g., even if the operand of the op is guaranteed to satisfy some requirement, it is still checked during matcher generation for now). In addition, some of the traits specified now check what the generated code already checks. Some of the traits can be removed in future as the verify method will include the relevant checks based on the op definition already. More work is needed for variadic operands. CNF form is used so that in the follow up redundant checks in the rewrite patterns could be omitted (e.g., when matching a F32Tensor, one does not need to verify that op X's operand 0 is a Tensor if that is guaranteed by op X's definition). The alternative was to have a single matcher function specified, but this would not allow for reasoning about what attributes already hold (at the level of PredAtoms). Use these new operand type restrictions to rewrite BiasAdd with floating point operands as a declarative pattern. PiperOrigin-RevId: 227991412	2019-03-29 14:58:23 -07:00
Nicolas Vasilache	62dabbfd09	Fix opt build failure PiperOrigin-RevId: 227938032	2019-03-29 14:57:36 -07:00
Uday Bondhugula	8496f2c30b	Complete TODOs / cleanup for loop-fusion utility - this is CL 1/2 that does a clean up and gets rid of one limitation in an underlying method - as a result, fusion works for more cases. - fix bugs/incomplete impl. in toAffineMapFromEq - fusing across rank changing reshapes for example now just works For example, given a rank 1 memref to rank 2 memref reshape (64 -> 8 x 8) like this, -loop-fusion -memref-dataflow-opt now completely fuses and inlines/store-forwards to get rid of the temporary: INPUT // Rank 1 -> Rank 2 reshape for %i0 = 0 to 64 { %v = load %A[%i0] store %v, %B[%i0 floordiv 8, %i0 mod 8] } for %i1 = 0 to 8 for %i2 = 0 to 8 %w = load %B[%i1, %i2] "foo"(%w) : (f32) -> () OUTPUT $ mlir-opt -loop-fusion -memref-dataflow-opt fuse_reshape.mlir #map0 = (d0, d1) -> (d0 * 8 + d1) mlfunc @fuse_reshape(%arg0: memref<64xf32>) { for %i0 = 0 to 8 { for %i1 = 0 to 8 { %0 = affine_apply #map0(%i0, %i1) %1 = load %arg0[%0] : memref<64xf32> "foo"(%1) : (f32) -> () } } } AFAIK, there is no polyhedral tool / compiler that can perform such fusion - because it's not really standard loop fusion, but possible through a generalized slicing-based approach such as ours. PiperOrigin-RevId: 227918338	2019-03-29 14:57:22 -07:00
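The fused output in the commit above is correct because `#map0` inverts the store index of the reshape loop. The sketch below checks that by brute force over the 64 elements; it says nothing about how the pass derives the map, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Store index used by the producer loop: i -> (i floordiv 8, i mod 8).
// For the non-negative indices used here, C++ '/' and '%' agree with
// floordiv and mod.
inline std::pair<int64_t, int64_t> reshapeIndex(int64_t i) {
  return {i / 8, i % 8};
}

// #map0 from the fused output: (d0, d1) -> (d0 * 8 + d1).
inline int64_t map0(int64_t d0, int64_t d1) { return d0 * 8 + d1; }

// True if every B[d0][d1] read by the consumer was written from A[map0(d0, d1)],
// and every producer iteration i is recovered from the position it wrote.
inline bool fusedAccessIsEquivalent() {
  for (int64_t d0 = 0; d0 < 8; ++d0)
    for (int64_t d1 = 0; d1 < 8; ++d1)
      if (reshapeIndex(map0(d0, d1)) != std::make_pair(d0, d1))
        return false;
  for (int64_t i = 0; i < 64; ++i) {
    const std::pair<int64_t, int64_t> p = reshapeIndex(i);
    if (map0(p.first, p.second) != i)
      return false;
  }
  return true;
}
```

Because the two maps are mutual inverses on this domain, the temporary `%B` can be bypassed and the load can read `%arg0` directly, as the OUTPUT listing shows.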
Smit Hinsu	d3339ea2b8	Handle parsing failure for splat elements attribute Currently, it emits the error but does not terminate parsing. TESTED with unit test PiperOrigin-RevId: 227886274	2019-03-29 14:56:52 -07:00
Nicolas Vasilache	618c6a74c6	[MLIR] Introduce normalized single-result unbounded AffineApplyOp Supervectorization does not plan on handling multi-result AffineMaps and non-canonical chains of > 1 AffineApplyOp. This CL introduces a simpler abstraction and composition of single-result unbounded AffineApplyOp by using the existing unbound AffineMap composition. This CL adds a simple API call and relevant tests: ```c++ OpPointer<AffineApplyOp> makeNormalizedAffineApply( FuncBuilder b, Location loc, AffineMap map, ArrayRef<Value> operands); ``` which creates a single-result unbounded AffineApplyOp. The operands of AffineApplyOp are not themselves results of AffineApplyOp by construction. This represents the simplest possible interface to complement the composition of (mathematical) AffineMap, for the cases when we are interested in applying it to Value*. In this CL the composed AffineMap is not compressed (i.e. there exist operands that are not part of the result). A followup commit will compress to normal form. The single-result unbounded AffineApplyOp abstraction will be used in a followup CL to support the MaterializeVectors pass. PiperOrigin-RevId: 227879021	2019-03-29 14:56:37 -07:00
River Riddle	d2cd083f79	Introduce CRTP TypeBase class to simplify type construction and validation. This impl class currently provides the following: * auto definition of the 'ImplType = StorageClass' * get/getChecked wrappers around TypeUniquer * 'verifyConstructionInvariants' hook - This hook verifies that the arguments passed into get/getChecked are valid to construct a type instance with. With this, all non-generic type uniquing has been moved out of MLIRContext.cpp PiperOrigin-RevId: 227871108	2019-03-29 14:56:22 -07:00
Chris Lattner	7983bbc251	Introduce a simple canonicalization of affine_apply that drops unused dims and symbols. Included with this is some other infra: - Testcases for other canonicalizations that I will implement next. - Some helpers in AffineMap/Expr for doing simple walks without defining whole visitor classes. - A 'replaceDimsAndSymbols' facility that I'll be using to simplify maps and exprs, e.g. to fold one constant into a mapping and to drop/renumber unused dims. - Allow index (and everything else) to work in memref's, as we previously discussed, to make the testcase easier to write. - A "getAffineBinaryExpr" helper to produce a binop when you know the kind as an enum. This line of work will eventually subsume the ComposeAffineApply pass, but it is nowhere close to that yet :-) PiperOrigin-RevId: 227852951	2019-03-29 14:56:07 -07:00
Alex Zinenko	8281151c2a	TableGen standard arithmetic ops Use tablegen to generate definitions of the standard binary arithmetic operations. These operations share a lot of boilerplate that is better off generated by a tool. Using tablegen for standard binary arithmetic operations requires the following modifications. 1. Add a bit field `hasConstantFolder` to the base Op tablegen class; generate the `constantFold` method signature if the bit is set. Differentiate between single-result and zero/multi-result functions that use different signatures. The implementation of the method remains in C++, similarly to canonicalization patterns, since it may be large and non-trivial. 2. Define the `AnyType` record of class `Type` since `BinaryOp` currently provided in op_base.td is supposed to operate on tensors and other tablegen users may rely on this behavior. Note that this drops the inline documentation on the operation classes that was copy-pasted around anyway. Since we don't generate g3doc from tablegen yet, keep LangRef.md as it is. Eventually, the user documentation can move to the tablegen definition file as well. PiperOrigin-RevId: 227820815	2019-03-29 14:55:37 -07:00
Jacques Pienaar	dde5bf234d	Use Operator class in OpDefinitionsGen. Cleanup NFC. PiperOrigin-RevId: 227764826	2019-03-29 14:55:22 -07:00
Nicolas Vasilache	0ebc0ba72e	[MLIR] More graceful failure in MaterializeVectors Even though it is unexpected except in pathological cases, a nullptr clone may be returned. This CL handles the nullptr return gracefully. PiperOrigin-RevId: 227764615	2019-03-29 14:55:05 -07:00
Nicolas Vasilache	5b87a5ef4b	[MLIR] Drop strict super-vector requirement in MaterializeVector The strict requirement (i.e. at least 2 HW vectors in a super-vector) was a premature optimization to avoid interfering with other vector code potentially introduced via other means. This CL avoids this premature optimization and the spurious errors it causes when super-vector size == HW vector size (which is a possible corner case). This may be revisited in the future. PiperOrigin-RevId: 227763966	2019-03-29 14:54:49 -07:00
Nicolas Vasilache	17f96ea3dd	[MLIR] Fix uninitialized value found with msan The omission of an early exit created opportunities for uninitialized memory reads. This CL fixes the issue. PiperOrigin-RevId: 227761814	2019-03-29 14:54:36 -07:00
Nicolas Vasilache	947e5f4a68	[MLIR] Handle corner case in MaterializeVectors This corner case was found when stress testing with a functional end-to-end CPU path. In the case where the hardware vector size is 1x...x1 the `keep` vector is empty and would result in a crash. While there is no reason to expect a 1x...x1 HW vector in practice, this case can just gracefully degrade to scalar, which is what this CL allows. PiperOrigin-RevId: 227761097	2019-03-29 14:54:22 -07:00
River Riddle	54948a4380	Split the standard types from builtin types and move them into separate source files(StandardTypes.cpp/h). After this cl only FunctionType and IndexType are builtin types, but IndexType will likely become a standard type when the ml/cfgfunc merger is done. Mechanical NFC. PiperOrigin-RevId: 227750918	2019-03-29 14:54:07 -07:00
Jacques Pienaar	c396c044e6	Match the op via isa instead of string compare. * Match using isa - This limits the rewrite pattern to ops defined in op registry but that is probably better end state (esp. for additional verification). PiperOrigin-RevId: 227598946	2019-03-29 14:53:37 -07:00
River Riddle	8abc06f3d5	Implement initial support for dialect specific types. Dialect specific types are registered similarly to operations, i.e. registerType<...> within the dialect. Unlike operations, there is no notion of a "verbose" type, that is all types must be registered to a dialect. Casting support(isa/dyn_cast/etc.) is implemented by reserving a range of type kinds in the top level Type class as opposed to string comparison like operations. To support derived types a few hooks need to be implemented: In the concrete type class: - static char typeID; * A unique identifier for the type used during registration. In the Dialect: - typeParseHook and typePrintHook must be implemented to provide parser support. The syntax for dialect extended types is as follows: dialect-type: '!' dialect-namespace '<' '"' type-specific-data '"' '>' The 'type-specific-data' is information used to identify different types within the dialect, e.g: - !tf<"variant"> // Tensor Flow Variant Type - !tf<"string"> // Tensor Flow String Type TensorFlow/TensorFlowControl types are now implemented as dialect specific types as a proof of concept. PiperOrigin-RevId: 227580052	2019-03-29 14:53:07 -07:00
Alex Zinenko	0c4ee54198	Merge LowerAffineApplyPass into LowerIfAndForPass, rename to LowerAffinePass This change is mechanical and merges the LowerAffineApplyPass and LowerIfAndForPass into a single LowerAffinePass. It makes a step towards defining an "affine dialect" that would contain all polyhedral-related constructs. The motivation for merging these two passes is based on retiring MLFunctions and, eventually, transforming If and For statements into regular operations. After that happens, LowerAffinePass becomes yet another legalization. PiperOrigin-RevId: 227566113	2019-03-29 14:52:52 -07:00
Alex Zinenko	fa710c17f4	LowerForAndIf: expand affine_apply's inplace Existing implementation was created before ML/CFG unification refactoring and did not concern itself with further lowering to separate concerns. As a result, it emitted `affine_apply` instructions to implement `for` loop bounds and `if` conditions and required a follow-up function pass to lower those `affine_apply` to arithmetic primitives. In the unified function world, LowerForAndIf is mostly a lowering pass with low complexity. As we move towards a dialect for affine operations (including `for` and `if`), it makes sense to lower `for` and `if` conditions directly to arithmetic primitives instead of relying on `affine_apply`. Expose `expandAffineExpr` function in LoweringUtils. Use this function together with `expandAffineMaps` to emit primitives that implement loop and branch conditions directly. Also remove tests that become unnecessary after transforming LowerForAndIf into a function pass. PiperOrigin-RevId: 227563608	2019-03-29 14:52:22 -07:00
Alex Zinenko	d64db86f20	Refactor LowerAffineApply In LoweringUtils, extract out `expandAffineMap`. This function takes an affine map and a list of values the map should be applied to and emits a sequence of arithmetic instructions that implement the affine map. It is independent of the AffineApplyOp and can be used in places where we need to insert an evaluation of an affine map without relying on a (temporary) `affine_apply` instruction. This prepares for a merge between LowerAffineApply and LowerForAndIf passes. Move the `expandAffineApply` function to the LowerAffineApply pass since it is the only place that must be aware of the `affine_apply` instructions. PiperOrigin-RevId: 227563439	2019-03-29 14:52:07 -07:00
Chris Lattner	8ebd64b32f	Update the g3docs to reflect the merging of CFG and ML functions. PiperOrigin-RevId: 227562943	2019-03-29 14:51:52 -07:00
Chris Lattner	bbf362b784	Eliminate extfunc/cfgfunc/mlfunc as a concept, and just use 'func' instead. The entire compiler now looks at structural properties of the function (e.g. does it have one block, does it contain an if/for stmt, etc) so the only thing holding up this difference is round tripping through the parser/printer syntax. Removing this shrinks the compiler by ~140LOC. This is step 31/n towards merging instructions and statements. The last step is updating the docs, which I will do as a separate patch in order to split it from this mostly mechanical patch. PiperOrigin-RevId: 227540453	2019-03-29 14:51:37 -07:00
River Riddle	ae3f8a79ae	Rename OperationPrefix to Namespace in Dialect. This is important as dialects will soon be able to define more than just operations. Moving forward dialect namespaces cannot contain '.' characters. This cl also standardizes that operation names must begin with the dialect namespace followed by a '.'. PiperOrigin-RevId: 227532193	2019-03-29 14:51:22 -07:00
Alex Zinenko	0565067495	LLVM IR Lowering: support "select" This commit adds support for the "select" operation that lowers directly into its LLVM IR counterpart. A simple test is included. PiperOrigin-RevId: 227527893	2019-03-29 14:51:08 -07:00
Chris Lattner	50a356d118	Simplify FunctionPass to only have a runOnFunction hook, instead of having a runOnCFG/MLFunction override locations. Passes that care can handle this filtering if they choose. Also, eliminate one needless difference between CFG/ML functions in the parser. This is step 30/n towards merging instructions and statements. PiperOrigin-RevId: 227515912	2019-03-29 14:50:53 -07:00
Nicolas Vasilache	73f5c9c380	[MLIR] Sketch a simple set of EDSCs to declaratively write MLIR This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs) in MLIR components: 1. a `Type` system of shell classes that closely matches the MLIR type system. These types are subdivided into `Bindable` leaf expressions and non-bindable `Expr` expressions; 2. an `MLIREmitter` class whose purpose is to: a. maintain a map of `Bindable` leaf expressions to concrete SSAValue; b. provide helper functionality to specify bindings of `Bindable` classes to SSAValue while verifying conformable types; c. traverse the `Expr` and emit the MLIR. This is used on a concrete example to implement MemRef load/store with clipping in the LowerVectorTransfer pass. More specifically, the following pseudo-C++ code: ```c++ MLFuncBuilder *b = ...; Location location = ...; Bindable zero, one, expr, size; // EDSL expression auto access = select(expr < zero, zero, select(expr < size, expr, size - one)); auto ssaValue = MLIREmitter(b) .bind(zero, ...) .bind(one, ...) .bind(expr, ...) .bind(size, ...) .emit(location, access); ``` is used to emit all the MLIR for a clipped MemRef access. This simple EDSL can easily be extended to more powerful patterns and should serve as the counterpart to pattern matchers (and could potentially be unified once we get enough experience). In the future, most of this code should be TableGen'd but for now it has concrete valuable uses: make MLIR programmable in a declarative fashion. This CL also adds Stmt, proper supporting free functions and rewrites VectorTransferLowering fully using EDSCs. 
The code for creating the EDSCs emitting a VectorTransferReadOp as loops with clipped loads is: ```c++ Stmt block = Block({ tmpAlloc = alloc(tmpMemRefType), vectorView = vector_type_cast(tmpAlloc, vectorMemRefType), ForNest(ivs, lbs, ubs, steps, { scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs), store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs), }), vectorValue = load(vectorView, zero), tmpDealloc = dealloc(tmpAlloc.getLHS())}); emitter.emitStmt(block); ``` where `accessInfo.clippedScalarAccessExprs` is created with: ```c++ select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one)); ``` The generated MLIR resembles: ```mlir %1 = dim %0, 0 : memref<?x?x?x?xf32> %2 = dim %0, 1 : memref<?x?x?x?xf32> %3 = dim %0, 2 : memref<?x?x?x?xf32> %4 = dim %0, 3 : memref<?x?x?x?xf32> %5 = alloc() : memref<5x4x3xf32> %6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>> for %i4 = 0 to 3 { for %i5 = 0 to 4 { for %i6 = 0 to 5 { %7 = affine_apply #map0(%i0, %i4) %8 = cmpi "slt", %7, %c0 : index %9 = affine_apply #map0(%i0, %i4) %10 = cmpi "slt", %9, %1 : index %11 = affine_apply #map0(%i0, %i4) %12 = affine_apply #map1(%1, %c1) %13 = select %10, %11, %12 : index %14 = select %8, %c0, %13 : index %15 = affine_apply #map0(%i3, %i6) %16 = cmpi "slt", %15, %c0 : index %17 = affine_apply #map0(%i3, %i6) %18 = cmpi "slt", %17, %4 : index %19 = affine_apply #map0(%i3, %i6) %20 = affine_apply #map1(%4, %c1) %21 = select %18, %19, %20 : index %22 = select %16, %c0, %21 : index %23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32> store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32> } } } %24 = load %6[%c0] : memref<1xvector<5x4x3xf32>> dealloc %5 : memref<5x4x3xf32> ``` In particular notice that only 3 out of the 4-d accesses are clipped: this corresponds indeed to the number of dimensions in the super-vector. 
This CL also addresses the cleanups resulting from the review of the previous CL and performs some refactoring to simplify the abstraction. PiperOrigin-RevId: 227367414	2019-03-29 14:50:23 -07:00
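The nested `select` expression quoted in the commit above computes a clamped index. As a rough scalar model in plain C++ (not EDSC code; `selectValue` and `clipIndex` are hypothetical names), it behaves as follows:

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the 'select' operation: picks one of two values.
inline int64_t selectValue(bool condition, int64_t ifTrue, int64_t ifFalse) {
  return condition ? ifTrue : ifFalse;
}

// Mirrors: select(expr < zero, zero, select(expr < size, expr, size - one)).
// Indices below 0 clamp to 0; indices at or past 'size' clamp to size - 1.
inline int64_t clipIndex(int64_t expr, int64_t size) {
  const int64_t zero = 0;
  const int64_t one = 1;
  return selectValue(expr < zero, zero,
                     selectValue(expr < size, expr, size - one));
}
```

For a dimension of size 5, `clipIndex` maps -3 to 0, leaves 2 unchanged, and maps both 5 and 9 to 4, so every clipped load stays inside the memref.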
Chris Lattner	a250643ec8	Merge together the CFG/ML function paths in the CSE pass. I did a first pass on this to merge together the classes, but there may be other simplification possible. I'll leave that to riverriddle@ as future work. This is step 29/n towards merging instructions and statements. PiperOrigin-RevId: 227328680	2019-03-29 14:50:08 -07:00
Chris Lattner	7974889f54	Update and generalize various passes to work on both CFG and ML functions, simplifying them in minor ways. The only significant cleanup here is the constant folding pass. All the other changes are simple and easy, but this is still enough to shrink the compiler by 45LOC. The one pass left to merge is the CSE pass, which will be more involved, so I'm splitting it out to its own patch (which I'll tackle right after this). This is step 28/n towards merging instructions and statements. PiperOrigin-RevId: 227328115	2019-03-29 14:49:52 -07:00
Chris Lattner	3c8fc797de	Simplify the remapFunctionAttrs logic, merging CFG/ML function handling. Remove an unnecessary restriction in forward substitution. Slightly simplify LLVM IR lowering, which previously would crash if given an ML function, it should now produce a clean error if given a function with an if/for instruction in it, just like it does any other unsupported op. This is step 27/n towards merging instructions and statements. PiperOrigin-RevId: 227324542	2019-03-29 14:49:35 -07:00
Chris Lattner	4bd9f93606	Simplify GreedyPatternRewriteDriver now that functions are merged into one representation, shrinking by 70LOC. The PatternRewriter class can probably also be simplified as well, but one step at a time. This is step 26/n towards merging instructions and statements. NFC. PiperOrigin-RevId: 227324218	2019-03-29 14:49:20 -07:00
Uday Bondhugula	18fbc3e170	Drop unused HyperRectangularSet.h/.cpp, given the new design being worked on. - drop these unused/incomplete sketches given the new design @albertcohen is working on, and given that FlatAffineConstraints is now stable and fast enough for all the analyses/transforms that depend on it. PiperOrigin-RevId: 227322739	2019-03-29 14:49:03 -07:00
Uday Bondhugula	f12182157e	Introduce PostDominanceInfo, fix properlyDominates() for Instructions - introduce PostDominanceInfo in the right/complete way and use that for post dominance check in store-load forwarding - replace all uses of Analysis/Utils::dominates/properlyDominates with DominanceInfo::dominates/properlyDominates - drop all redundant copies of dominance methods in Analysis/Utils/ - in pipeline-data-transfer, replace dominates call with a much less expensive check; similarly, substitute dominates() in checkMemRefAccessDependence with a simpler check suitable for that context - fix a bug in properlyDominates - improve doc for 'for' instruction 'body' PiperOrigin-RevId: 227320507	2019-03-29 14:48:44 -07:00
Uday Bondhugula	cea9f28a2c	Fix dominates() for block's. - dominates() for blocks was assuming that there was only a single block at the top level whenever there was a hierarchy of blocks (as in the case of 'for'/'if' instructions). - fix the comments as well PiperOrigin-RevId: 227319738	2019-03-29 14:48:28 -07:00
Chris Lattner	ae618428f6	Greatly simplify the ConvertToCFG pass, converting it from a module pass to a function pass, and eliminating the need to copy over code and do interprocedural updates. While here, also improve it to make fewer empty blocks, and rename it to "LowerIfAndFor" since that is what it does. This is a net reduction of ~170 lines of code. As drive-bys, change the splitBlock method to not insert an unconditional branch, since that behavior is annoying for all clients. Also improve the AsmPrinter to not crash when a block is referenced that isn't linked into a function. PiperOrigin-RevId: 227308856	2019-03-29 14:48:13 -07:00
Uday Bondhugula	545f3ce430	Fix ASAN failure in memref-dataflow-opt - memrefsToErase had duplicates inserted into it; switch to SmallPtrSet. PiperOrigin-RevId: 227299306	2019-03-29 14:47:58 -07:00
Feng Liu	dfee0a6e9b	Make PrintOpStatsPass a module pass PrintOpStatsPass is maintaining state (op stats ) across functions and doing per-module work - it should be a module pass. PiperOrigin-RevId: 227294151	2019-03-29 14:47:43 -07:00
Uday Bondhugula	b9fe6be6d4	Introduce memref store to load forwarding - a simple memref dataflow analysis - the load/store forwarding relies on memref dependence routines as well as SSA/dominance to identify the memref store instance uniquely supplying a value to a memref load, and replaces the result of that load with the value being stored. The memref is also deleted when possible if only stores remain. - add methods for post dominance for MLFunction blocks. - remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth, getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were earlier static) - add a helper method in FlatAffineConstraints - isRangeOneToOne. PiperOrigin-RevId: 227252907	2019-03-29 14:47:28 -07:00
Uday Bondhugula	6e3462d251	Fix b/122139732; update FlatAffineConstraints::isEmpty() to eliminate IDs in a better order. - update isEmpty() to eliminate IDs in a better order. Speed improvement for complex cases (for eg. high-d reshape's involving mod's/div's). - minor efficiency update to projectOut (was earlier making an extra albeit benign call to gaussianEliminateIds) (NFC). - move getBestIdToEliminate further up in the file (NFC). - add the failing test case. - add debug info to checkMemRefAccessDependence. PiperOrigin-RevId: 227244634	2019-03-29 14:47:13 -07:00
Chris Lattner	dffc589ad2	Extend InstVisitor and Walker to handle arbitrary CFG functions, expand the Function::walk functionality into f->walkInsts/Ops which allows visiting all instructions, not just ops. Eliminate Function::getBody() and Function::getReturn() helpers which crash in CFG functions, and were only kept around as a bridge. This is step 25/n towards merging instructions and statements. PiperOrigin-RevId: 227243966	2019-03-29 14:46:58 -07:00
Chris Lattner	8ef2552df7	Have the asmprinter take advantage of the new capabilities of the asmparser, by printing the entry block in a CFG function's argument line. Since I'm touching all of the testcases anyway, change the argument list from printing as "%arg : type" to "%arg: type" which is more consistent with bb arguments. In addition to being more consistent, this is a much nicer look for cfg functions. PiperOrigin-RevId: 227240069	2019-03-29 14:46:29 -07:00
Chris Lattner	aaa1d77e96	Clean up and improve the parser handling of basic block labels, now that we have a designator. This improves diagnostics and merges handling between CFG and ML functions more. This also eliminates hard coded parser knowledge of terminator keywords, allowing dialects to define their own terminators. PiperOrigin-RevId: 227239398	2019-03-29 14:46:13 -07:00
Chris Lattner	37579ae8c4	Introduce ^ as a basic block sigil, eliminating an ambiguity on the MLIR syntax. PiperOrigin-RevId: 227234174	2019-03-29 14:45:59 -07:00
Chris Lattner	56e2a6cc3b	Merge the verifier logic for all functions into a unified framework, this requires enhancing DominanceInfo to handle the structure of an ML function, which is required anyway. Along the way, this also fixes a const correctness problem with Instruction::getBlock(). This is step 24/n towards merging instructions and statements. PiperOrigin-RevId: 227228900	2019-03-29 14:45:43 -07:00
Chris Lattner	4a96a11d6d	Enhance parsing of CFG and Ext functions to optionally allow named arguments in the function signature, giving them common functionality to ml functions. This is a strictly additive patch that adds new capability without changing behavior in a significant way (other than a few diagnostic cleanups). A subsequent patch will change the printer to use this behavior, which will require updating a ton of testcases. :) This exposes the fact that we need to make a grammar change for block arguments, as is tracked by b/122119779 This is step 23/n towards merging instructions and statements, and one of the first steps towards eliminating the "cfg vs ml" distinction at a syntax and semantic level. PiperOrigin-RevId: 227228342	2019-03-29 14:45:28 -07:00
Chris Lattner	5b9c3f7cdb	Tidy up references to "basic blocks" that should refer to blocks now. NFC. PiperOrigin-RevId: 227196077	2019-03-29 14:44:59 -07:00
Chris Lattner	be9ee4a98e	Merge parser logic for CFG and ML functions, shrinking the code by ~80 lines. This causes a slight change to diagnostics, but is otherwise behavior preserving. This is step 22/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227187857	2019-03-29 14:44:44 -07:00
Chris Lattner	456ad6a8e0	Standardize naming of statements -> instructions, revisiting the code base to be consistent and moving the using declarations over. Hopefully this is the last truly massive patch in this refactoring. This is step 21/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227178245	2019-03-29 14:44:30 -07:00
Uday Bondhugula	b1d9cc4d1e	Extend/complete dependence tester to utilize local var info. - extend/complete dependence tester to utilize local var info while adding access function equality constraints; one more step closer to get slicing based fusion working in the general case of affine_apply's involving mod's/div's. - update test case to reflect more accurate dependence information; remove inaccurate comment on test case mod_deps. - fix a minor "bug" in equality addition in addMemRefAccessConstraints (doesn't affect correctness, but the fixed version is more intuitive). - some more surrounding code clean up - move simplifyAffineExpr out of anonymous AffineExprFlattener class - the latter has state, and the former should reside outside. PiperOrigin-RevId: 227175600	2019-03-29 14:44:14 -07:00
Chris Lattner	315a466aed	Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed ones can be fixed up on demand. The last major renaming is Statement -> Instruction, which is why Statement and Stmt still appear in various places. This is step 19/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227163082	2019-03-29 14:43:58 -07:00
Jacques Pienaar	2a463c36b1	Add convenience wrapper for operator in tblgen Add convenience wrapper to make it easier to iterate over attributes and operands of operator defined in TableGen file. Use this class in RewriterGen (not used in the op generator yet, will do shortly). Change the RewriterGen to pass the bound arguments explicitly, this is in preparation for multi-op matching. PiperOrigin-RevId: 227156748	2019-03-29 14:43:43 -07:00
Chris Lattner	69f9f6e21c	Merge ext/cfg/ml function printing logic in the AsmPrinter (shrinking it by about 100 LOC), without changing any existing behavior. This is step 20/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227155000	2019-03-29 14:43:29 -07:00
Chris Lattner	69d9e990fa	Eliminate the using decls for MLFunction and CFGFunction standardizing on Function. This is step 18/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227139399	2019-03-29 14:43:13 -07:00
Chris Lattner	d798f9bad5	Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(), StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names. This is step 17/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227121537	2019-03-29 14:42:40 -07:00
Chris Lattner	5187cfcf03	Merge Operation into OperationInst and standardize nomenclature around OperationInst. This is a big mechanical patch. This is step 16/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227093712	2019-03-29 14:42:23 -07:00
Chris Lattner	471c976413	Rework inheritance hierarchy: Operation now derives from Statement, and OperationInst derives from it. This allows eliminating some forwarding functions, other complex code handling multiple paths, and the 'isStatement' bit tracked by Operation. This is the last patch I think I can make before the big mechanical change merging Operation into OperationInst, coming next. This is step 15/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227077411	2019-03-29 14:41:49 -07:00
Feng Liu	9b20a4ccdf	add a method to get FloatAttr value as double Sometimes we have to get the raw value of the FloatAttr to invoke APIs from non-MLIR libraries (i.e. in the tpu_ops.inc and convert_tensor.cc files). Using `FloatAttr::getValue().convertToFloat()` and `FloatAttr::getValue().convertToDouble()` is not safe because internally they check the semantics of the APFloat in the attribute, and the semantics is not always specified (the default value is f64, so convertToFloat will fail) or is inferred incorrectly (for example, using 1.0 instead of 1.f for IEEEFloat). Calling these convert methods without knowing the semantics can crash the compiler. This new method converts the value of a FloatAttr to double even if it loses precision. Currently this method can be used to read in f32 data from arrays. PiperOrigin-RevId: 227076616	2019-03-29 14:41:34 -07:00
Chris Lattner	1b430f1d32	Delicately re-layer Operation, Statement, and OperationStmt, reworking #includes so Statements.h includes Operation.h but nothing else does. This is in preparation to eliminate the Operation class and the complexity it brings with it. I split this patch off because it is just moving stuff around, the next patch will be more complex. This is step 14/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227071777	2019-03-29 14:41:05 -07:00
Chris Lattner	4fbcd1ac52	Minor renamings: Trim the "Stmt" prefix off StmtSuccessorIterator/StmtSuccessorIterator, and rename and move the CFGFunctionViewGraph pass to ViewFunctionGraph. This is step 13/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227069438	2019-03-29 14:40:51 -07:00
Uday Bondhugula	294687ef59	Fix affine expr flattener bug introduced by cl/225452174. - inconsistent local var constraint size when repeatedly using the same flattener for all expressions in a map. PiperOrigin-RevId: 227067836	2019-03-29 14:40:37 -07:00
Chris Lattner	4c05f8cac6	Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new FuncBuilder class. Also rename SSAValue.cpp to Value.cpp This is step 12/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227067644	2019-03-29 14:40:22 -07:00
Chris Lattner	3f190312f8	Merge SSAValue, CFGValue, and MLValue together into a single Value class, which is the new base of the SSA value hierarchy. This CL also standardizes all the nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing. This is step 11/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227064624	2019-03-29 14:40:06 -07:00
Chris Lattner	776b035646	Eliminate the Instruction, BasicBlock, CFGFunction, MLFunction, and ExtFunction classes, using the Statement/StmtBlock hierarchy and Function instead. This only changes the internal data structures, it does not affect the user visible syntax or structure of MLIR code. Function gets new "isCFG()" sorts of predicates as a transitional measure. This patch is gross in a number of ways, largely in an effort to reduce the amount of mechanical churn in one go. It introduces a bunch of using decls to keep the old names alive for now, and a bunch of stuff needs to be renamed. This is step 10/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 227044402	2019-03-29 14:39:49 -07:00
Alex Zinenko	a63f440601	LoopAnalysis: isContiguousAccess fail gracefully Existing implementation of isContiguousAccess asserts that one of the function arguments is within certain range, depending on another parameter. However, the value of this argument may come from outside, in particular in the loop vectorization pass it may come from command line arguments. This leads to 'mlir-opt' crashing on an assertion depending on flags. Handle the error gracefully by reporting an error and returning a negative result instead. This negative result prevents any further transformation by the vectorizer so the IR remains valid. PiperOrigin-RevId: 227029496	2019-03-29 14:39:34 -07:00
Jacques Pienaar	057984d05d	Move print op stats pass to analysis. Move PrintOpStatsPass out of tools and to other passes (moved to Analysis as it doesn't modify the program, but it is different from the other analysis passes as its only consumer at present is the user). PiperOrigin-RevId: 227018996	2019-03-29 14:39:19 -07:00
Chris Lattner	abf72a8bb1	Rename findFunction from the ML side of the house to be named getFunction(), making it more similar to the CFG side of things. It is true that in a deeply nested case that this is not a guaranteed O(1) time operation, and that 'get' could lead compiler hackers to think this is cheap, but we need to merge these and we can look into solutions for this in the future if it becomes a problem in practice. This is step 9/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226983931	2019-03-29 14:38:49 -07:00
Mehdi Amini	4e5337601e	Inline Instruction's operands as TrailingObjects For performance/memory saving purpose, having the Instruction holding a std::vector for the operands isn't a really good tradeoff. The only reason for this was to support adding/removing easily BasicBlock arguments to Terminator. Since this isn't the most common operation, we instead force a pre-allocated list of operands on Instructions at creation time. PiperOrigin-RevId: 226981227	2019-03-29 14:38:34 -07:00
Chris Lattner	036f87b15f	Rename CFGFunctionGraphTraits.h -> FunctionGraphTraits.h and add graph specializations for doing CFG traversals of ML Functions, making the two sorts of functions have the same capabilities. This is step 8/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226968502	2019-03-29 14:38:19 -07:00
Chris Lattner	3bd8ff6699	Eliminate the MLFuncArgument class representing arguments to MLFunctions: use the BlockArgument arguments of the entry block instead. This makes MLFunctions and CFGFunctions work more similarly. This is step 7/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226966975	2019-03-29 14:38:04 -07:00
Chris Lattner	5ff0001dc7	Introduce a new StmtBlockList type to hold a list of StmtBlocks. Use it in MLFunction, IfStmt, ForStmt even though they currently only contain exactly one block in that list. This is step 6/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226960278	2019-03-29 14:37:49 -07:00
Feng Liu	63068da4d9	Support NameLoc and CallSiteLoc for mlir::Location The NameLoc can be used to represent a variable, node or method. The CallSiteLoc has two fields, one represents the concrete location and another one represents the caller's location. Multiple CallSiteLocs can be chained as a call stack. For example, the following call stack ``` AAA at file1:1 at file2:135 at file3:34 ``` can be formed by call0: ``` auto name = NameLoc::get("AAA"); auto file1 = FileLineColLoc::get("file1", 1); auto file2 = FileLineColLoc::get("file2", 135); auto file3 = FileLineColLoc::get("file3", 34); auto call2 = CallSiteLoc::get(file2, file3); auto call1 = CallSiteLoc::get(file1, call2); auto call0 = CallSiteLoc::get(name, call1); ``` PiperOrigin-RevId: 226941797	2019-03-29 14:37:34 -07:00
Alex Zinenko	eb0f9f37af	SuperVectorization: fix 'isa' assertion Supervectorization uses null pointers to SSA values as a means of communicating the failure to vectorize. In operation vectorization, all operations producing the values of operation arguments must be vectorized for the given operation to be vectorized. The existing check verified if any of the value "def" statements was vectorized instead, sometimes leading to assertions inside `isa` called on a null pointer. Fix this to check that all "def" statements were vectorized. PiperOrigin-RevId: 226941552	2019-03-29 14:37:20 -07:00
Alex Zinenko	9403f80dd3	LLVM IR lowering: support SubIOp and SubFOp The binary subtraction operations were not supported by the lowering because they were not essential for the testing flow. Add support for these operations. PiperOrigin-RevId: 226941463	2019-03-29 14:37:05 -07:00
Jacques Pienaar	58d50a6325	Rename convenience methods to make type explicit. PiperOrigin-RevId: 226939383	2019-03-29 14:36:50 -07:00
Chris Lattner	d613f5ab65	Refactor MLFunction to contain a StmtBlock for its body instead of inheriting from it. This is necessary progress to squaring away the parent relationship that a StmtBlock has with its enclosing if/for/fn, and makes room for functions to have more than one block in the future. This also removes IfClause and ForStmtBody. This is step 5/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226936541	2019-03-29 14:36:35 -07:00
Chris Lattner	9a4060d3f5	Eliminate the ability to add operands to an instruction, used in a narrow case for SSA values in terminators, but easily worked around. At the same time, move the StmtOperand list in an OperationStmt to the end of its trailing objects list so we can reduce the number of operands, without affecting offsets to the other stuff in the allocation. This is important because we want OperationStmts to be consecutive, including their operands - we don't want to use an std::vector of operands like Instructions have. This is patch 4/n towards merging instructions and statements, NFC. PiperOrigin-RevId: 226865727	2019-03-29 14:36:20 -07:00
Chris Lattner	eadaa1101c	Implement StmtBlocks support for arguments and pred/succ iteration. This isn't tested yet, but will be when stuff starts switching over to it. This is part 3/n of merging CFGFunctions and MLFunctions. PiperOrigin-RevId: 226794787	2019-03-29 14:36:05 -07:00
Chris Lattner	87ce4cc501	Per review on the previous CL, drop MLFuncBuilder::createOperation, changing clients to use OperationState instead. This makes MLFuncBuilder more similar to CFGFuncBuilder. This whole area will get tidied up more when cfg and ml worlds get unified. This patch is just gardening, NFC. PiperOrigin-RevId: 226701959	2019-03-29 14:35:49 -07:00
Chris Lattner	49315c6f6b	Give StmtBlocks a use-def list, and give OperationStmt's the ability to have optional successor operands when they are terminator operations. This isn't used yet, but is part 2/n towards merging BasicBlock into StmtBlock and Instruction into OperationStmt. PiperOrigin-RevId: 226684636	2019-03-29 14:35:34 -07:00
Chris Lattner	1301f907a1	Refactor ForStmt: having it contain a StmtBlock instead of subclassing StmtBlock. This is more consistent with IfStmt and also conceptually makes more sense - a forstmt "isn't" its body, it contains its body. This is step 1/N towards merging BasicBlock and StmtBlock. This is required because in the new regime StmtBlock will have a use list (just like BasicBlock does) of operands, and ForStmt already has a use list for its induction variable. This is a mechanical patch, NFC. PiperOrigin-RevId: 226684158	2019-03-29 14:35:19 -07:00
MLIR Team	4eef795a1d	Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the computation slice. Updates LoopFusion pass to take these parameters as command line flags for experimentation. PiperOrigin-RevId: 226514297	2019-03-29 14:35:03 -07:00
River Riddle	1e0ebabf66	Unify type uniquing and construction. This allows for us to decouple type uniquing/construction from MLIRContext and pave the way for dialect specific types. To accomplish this we add two new classes, TypeUniquer and TypeStorageAllocator. * TypeUniquer is now responsible for all construction and uniquing of types. * TypeStorageAllocator is a utility used by derived type storage objects to allocate memory within an MLIRContext. This cl also standardizes what a derived type storage class needs to provide: - Define a type alias, KeyTy, to a type that uniquely identifies the instance of the type within its kind. * The key type must be constructible from the values passed into the detail::TypeUniquer::get call after the type kind. * The key type must have a llvm::DenseMapInfo specialization for hashing. - Provide a method, 'KeyTy getKey() const', to construct the key type from an existing storage instance. - Provide a construction method: 'DerivedStorage *construct(TypeStorageAllocator &, ...)' that builds a unique instance of the derived storage. The arguments after the TypeStorageAllocator must correspond with the values passed into the detail::TypeUniquer::get call after the type kind. PiperOrigin-RevId: 226507184	2019-03-29 14:34:46 -07:00
MLIR Team	bcb7c4742d	Do proper indexing for local variables when building access function equality constraints (working on test cases). PiperOrigin-RevId: 226399089	2019-03-29 14:34:02 -07:00
MLIR Team	4f5ef1619e	Pass loop depth 1 to memref dependence check when constructing dependence constraints used to calculate computation slice for loop fusion. This is done so that the dominance check between ancestors of op statements from src/dst memref accesses will be run. PiperOrigin-RevId: 226350443	2019-03-29 14:33:46 -07:00
MLIR Team	2570fb5bb7	Address some issues from memref dependence check bug (b/121216762), adds test cases. PiperOrigin-RevId: 226277453	2019-03-29 14:33:17 -07:00
MLIR Team	6892ffb896	Improve loop fusion algorithm by using a memref dependence graph. Fixed TODO for reduction fusion unit test. PiperOrigin-RevId: 226277226	2019-03-29 14:33:02 -07:00
Uday Bondhugula	14d2618f63	Simplify memref-dependence-check's meta data structures / drop duplication and reuse existing ones. - drop IterationDomainContext, redundant since FlatAffineConstraints has MLValue information associated with its dimensions. - refactor to use existing support - leads to a reduction in LOC - as a result of these changes, non-constant loop bounds get naturally supported for dep analysis. - update test cases to include a couple with non-constant loop bounds - rename addBoundsFromForStmt -> addForStmtDomain - complete TODO for getLoopIVs (handle 'if' statements) PiperOrigin-RevId: 226082008	2019-03-29 14:32:46 -07:00
Uday Bondhugula	1d72f2e47e	Update / complete a TODO for addBoundsForForStmt - when adding constraints from a 'for' stmt into FlatAffineConstraints, correctly add bound operands of the 'for' stmt as a dimensional identifier or a symbolic identifier depending on whether the bound operand is a valid MLFunction symbol - update test case to exercise this. PiperOrigin-RevId: 225988511	2019-03-29 14:32:31 -07:00
Alex Zinenko	49c81ebcb0	Densify storage for f16, f32 and support f16 semantics in FloatAttrs Existing implementation always uses 64 bits to store floating point values in DenseElementsAttr. This was due to FloatAttrs always using a `double` for storage independently of the actual type. Recent commits added support for FloatAttrs with the proper f32 type and floating semantics and changed the bitwidth reporting on FloatType. Use the existing infrastructure for densely storing 16 and 32-bit values in DenseElementsAttr storage to store f16 and f32 values. Move floating semantics definition to the FloatType level. Properly support f16 / IEEEhalf semantics at the FloatAttr level and in the builder. Note that bf16 is still stored as a 64-bit value with IEEEdouble semantics because APFloat does not have first-class support for bf16 types. PiperOrigin-RevId: 225981289	2019-03-29 14:32:14 -07:00
Uday Bondhugula	20531932f4	Refactor/update memref-dep-check's addMemRefAccessConstraints and addDomainConstraints; add support for mod/div for dependence testing. - add support for mod/div expressions in dependence analysis - refactor addMemRefAccessConstraints to use getFlattenedAffineExprs (instead of getFlattenedAffineExpr); update addDomainConstraints. - rename AffineExprFlattener::cst -> localVarCst PiperOrigin-RevId: 225933306	2019-03-29 14:31:58 -07:00
Alex Zinenko	4dbd94b543	Refactor LowerVectorTransfersPass using pattern rewriters This introduces a generic lowering pass for ML functions. The pass is parameterized by template arguments defining individual pattern rewriters. Concrete lowering passes define individual pattern rewriters and inherit from the generic class that takes care of allocating rewriters, traversing ML functions and performing the actual rewrite. While this is similar to the greedy pattern rewriter available in Transform/Utils, it requires adjustments due to the ML/CFG duality. In particular, ML function rewriters must be able to create statements, not only operations, and need access to an MLFuncBuilder. When we move to using the unified function type, the ML-specific rewriting will become unnecessary. Use LowerVectorTransfers as a testbed for the generic pass. PiperOrigin-RevId: 225887424	2019-03-29 14:31:43 -07:00
Alex Zinenko	699a2f5373	LLVM IR lowering: support vector_type_cast Introduce support for lowering vector_type_cast to LLVM IR. It consists in creating a new MemRef descriptor with the base pointer with the type that corresponds to the lowered element type of the target memref. Since `vector_type_cast` does not support dynamic shapes in the target type, no dynamic size conversion is necessary. This commit goes in the opposite direction of what is expected of LLVM IR lowering: it should not be aware of all the other dialects. Instead, we should have separate definitions for conversions in a global lowering framework. However, this requires LLVM dialect to be implemented, which is currently blocked by the absence of user-defined types. Implement the lowering anyway to unblock end-to-end vectorization experiments. PiperOrigin-RevId: 225887368	2019-03-29 14:31:28 -07:00
Alex Zinenko	51c8a095a3	Materialize vector_type_cast operation in the SuperVector dialect This operation is produced and used by the super-vectorization passes and has been emitted as an abstract unregistered operation until now. For end-to-end testing purposes, it has to be eventually lowered to LLVM IR. Matching abstract operation by name goes into the opposite direction of the generic lowering approach that is expected to be used for LLVM IR lowering in the future. Register vector_type_cast operation as a part of the SuperVector dialect. Arguably, this operation is a special case of the `view` operation from the Standard dialect. The semantics of `view` is not fully specified at this point so it is safer to rely on a custom operation. Additionally, using a custom operation may help to achieve clear dialect separation. PiperOrigin-RevId: 225887305	2019-03-29 14:31:13 -07:00
Uday Bondhugula	19b2ce23a5	Refactor / eliminate duplicate code in memref-dep-check / getIterationDomainContext PiperOrigin-RevId: 225857762	2019-03-29 14:30:58 -07:00
Alex Zinenko	df9bd857b1	Type system: replace Type::getBitWidth with getIntOrFloatBitWidth As MLIR moves towards dialect-specific types, a generic Type::getBitWidth does not make sense for all of them. Even with the current type system, the bit width is not defined (and causes the method in question to abort) for all TensorFlow types. This commit restricts the bit width definition to primitive standard types that have a number of bits appearing verbatim in their type, i.e., integers and floats. As a side effect, it delegates the decision on the bit width of the `index` to the backends. Existing backends currently hardcode it to 64 bits. The Type::getBitWidth method is replaced by Type::getIntOrFloatBitWidth that only applies to integers and floats. The call sites are updated to use the new method, where applicable, or rewritten so as not to rely on it. Incidentally, this fixes a utility method that did not account for memrefs being allowed to have vectors as element types in the size computation. As an observation, several places in the code use Type in places where a more specific type could be used instead. Some of those are fixed by this commit. PiperOrigin-RevId: 225844792	2019-03-29 14:30:43 -07:00
Uday Bondhugula	4a3e4e8ea7	loop-unroll - add function callback argument for outside targets to provide unroll factors, and a cmd line argument to specify number of innermost loop unroll repetitions. - add function callback parameter for outside targets to provide unroll factors - add a cmd line parameter to repeatedly apply innermost loop unroll a certain number of times (to avoid using -loop-unroll -loop-unroll ...; instead -unroll-num-reps=2). - implement the callback for a target - update test cases / usage PiperOrigin-RevId: 225843191	2019-03-29 14:30:28 -07:00
MLIR Team	3b69230b3a	Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion. *) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality. *) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nest's IVs and symbols using the dependence polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop. *) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destination memref access. *) Adds FlatAffineConstraints::toAffineMap function which returns an AffineMap which represents an equality constraint where one dimension identifier is represented as a function of all others in the equality constraint. *) Adds multiple fusion unit tests. PiperOrigin-RevId: 225842944	2019-03-29 14:30:13 -07:00
Jacques Pienaar	49c4d2a630	Fix builder getFloatAttr of double to use F64 type and use fltSemantics in FloatAttr. Store FloatAttr using more appropriate fltSemantics (mostly fixing up F32/F64 storage, F16/BF16 pending). Previously F32 type was used incorrectly for double (the storage was double). Also add query method that returns fltSemantics for IEEE fp types and use that to verify that the APFloat given matches the type: * FloatAttr created using APFloat is verified that the semantics of the type and APFloat matches; * FloatAttr created using double has the APFloat created to match the semantics of the type; Change parsing of tensor negative splat element to pass in the element type expected. Misc other changes to account for the storage type matching the attribute. PiperOrigin-RevId: 225821834	2019-03-29 14:29:58 -07:00
Uday Bondhugula	dced746bd1	Remove duplicate code / reuse right utilities from memref-dep-check / loop-tile - use addBoundsForForStmt - getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the returned things aren't owned / part of the stmt on which it's being called. - other minor API cleanup PiperOrigin-RevId: 225774301	2019-03-29 14:29:28 -07:00
Uday Bondhugula	c41ee60647	'memref-bound-check': extend to store op's as well - extend memref-bound-check to store op's - make the bound check an analysis util and move to lib/Analysis/Utils.cpp (so that one doesn't need to always create a pass to use it) PiperOrigin-RevId: 225564830	2019-03-29 14:29:13 -07:00
Alex Zinenko	bc52a639f9	Extract vector_transfer_* Ops into a SuperVectorDialect. From the beginning, vector_transfer_read and vector_transfer_write operations were intended as a mid-level vectorization abstraction. In particular, they are lowered to the StandardOps dialect before further processing. As such, it does not make sense to keep them at the same level as StandardOps. Introduce the new SuperVectorOps dialect and move vector_transfer_* operations there. This will be used as a testbed for the generic lowering/legalization pass. PiperOrigin-RevId: 225554492	2019-03-29 14:28:58 -07:00
Uday Bondhugula	45a0f52519	Expression flattening improvement - reuse local expressions. - if a local id was already added for a specific mod/div expression, just reuse it if the expression repeats (instead of adding a new one). - drastically reduces the number of local variables added during flattening for real use cases - since the same div's and mod expressions often repeat. - add getFlattenedAffineExprs for AffineMap, IntegerSet based on the above As a natural result of the above: - FlatAffineConstraints(IntegerSet) ctor now deals with integer sets that have mod and div constraints as well, and these get simplified as well from -simplify-affine-structures PiperOrigin-RevId: 225452174	2019-03-29 14:28:13 -07:00
Uday Bondhugula	8365bdc17f	FlatAffineConstraints - complete TODOs: add method to remove duplicate / trivially redundant constraints. Update projectOut to eliminate identifiers in a more efficient order. Fix b/120801118. - add method to remove duplicate / trivially redundant constraints from FlatAffineConstraints (use a hashing-based approach with DenseSet) - update projectOut to eliminate identifiers in a more efficient order (A sequence of affine_apply's like this (from a real use case) finally exposed the lack of the above trivial/low hanging simplifications). for %ii = 0 to 64 { for %jj = 0 to 9 { %a0 = affine_apply (d0, d1) -> (d0 * (9 * 1024) + d1 * 128) (%ii, %jj) %a1 = affine_apply (d0) -> (d0 floordiv (2 * 3 * 3 * 128 * 128), (d0 mod 294912) floordiv (3 * 3 * 128 * 128), (((d0 mod 294912) mod 147456) floordiv 1152) floordiv 8, (((d0 mod 294912) mod 147456) mod 1152) floordiv 384, ((((d0 mod 294912) mod 147456) mod 1152) mod 384) floordiv 128, (((((d0 mod 294912) mod 147456) mod 1152) mod 384) mod 128) floordiv 128) (%a0) %v0 = load %in[%a1#0, %a1#1, %a1#3, %a1#4, %a1#2, %a1#5] : memref<2x2x3x3x16x1xi32> } } - update FlatAffineConstraints::print to print number of constraints. PiperOrigin-RevId: 225397480	2019-03-29 14:27:29 -07:00
River Riddle	5c4f1fdd42	Check if the operation is already in the worklist before adding it. PiperOrigin-RevId: 225379496	2019-03-29 14:27:14 -07:00
Alex Zinenko	359835eb27	LLVM IR lowering: support 1D vector operations Introduce initial support for 1D vector operations. LLVM does not support higher-dimensional vectors so the caller must make sure they don't appear in the input MLIR. Handle the presence of higher-dimensional vectors by failing gracefully. Introduce the type conversion for 1D vector types and hook it up with the rest of the type conversion system. Support "splat" constants for vector types. As a side effect, this refactors constant operation emission by separating out scalar integer constants into a separate case and by extracting out the helper function for scalar float construction. Existing binary operations apply to vectors transparently. PiperOrigin-RevId: 225172349	2019-03-29 14:26:37 -07:00
Alex Zinenko	97d2f3cd3d	ConvertToCFG: use affine_apply to implement loop steps Originally, loop steps were implemented using `addi` and `constant` operations because `affine_apply` was not handled in the first implementation. The support for `affine_apply` has been added, use it to implement the update of the loop induction variable. This is more consistent with the lower and upper bounds of the loop that are also implemented as `affine_apply`, removes the dependence of the converted function on the StandardOps dialect and makes it clear from the CFG function that all operations on the loop induction variable are purely affine. PiperOrigin-RevId: 225165337	2019-03-29 14:26:22 -07:00
Uday Bondhugula	c86c414765	Remove dead code from FlatAffineConstraints - getDimensionBounds() was added initially for quick experimentation - no longer used (getConstantBoundOnDimSize is the more powerful/complete replacement). - FlatAffineConstraints::getConstantLower/UpperBound are incomplete, functionality/naming-wise misleading, and not used currently. Removing these; complete/fixed version will be added in an upcoming CL. PiperOrigin-RevId: 225075061	2019-03-29 14:25:52 -07:00
Alex Zinenko	63261aa9a8	Disallow index types as elements of vector, memref and tensor types An extensive discussion demonstrated that it is difficult to support `index` types as elements of compound (vector, memref, tensor) types. In particular, their size is unknown until the target-specific lowering takes place. MLIR may need to store constants of the fixed-shape compound types (e.g., vector<4 x index>) internally and must know the size of the element type and data layout constraints. The same information is necessary for target-specific lowering and translation to reliably support compound types with `index` elements, but MLIR does not have a dedicated target description mechanism yet. The use cases for compound types with `index` elements, should they appear, can be handled via an `index_cast` operation that converts between `index` and fixed-size integer types at the SSA value level instead of the type level. PiperOrigin-RevId: 225064373	2019-03-29 14:25:22 -07:00
Uday Bondhugula	b9f53dc0bd	Update/Fix LoopUtils::stmtBodySkew to handle loop step. - loop step wasn't handled and there wasn't a TODO or an assertion; fix this. - rename 'delay' to shift for consistency/readability. - other readability changes. - remove duplicate attribute print for DmaStartOp; fix misplaced attribute print for DmaWaitOp - add build method for AddFOp (unrelated to this CL, but add it anyway) PiperOrigin-RevId: 224892958	2019-03-29 14:25:07 -07:00
Uday Bondhugula	d59a95a05c	Fix missing check for dependent DMAs in pipeline-data-transfer - adding a conservative check for now (TODO: use the dependence analysis pass once the latter is extended to deal with DMA ops). resolve an existing bug on a test case. - update test cases PiperOrigin-RevId: 224869526	2019-03-29 14:24:53 -07:00
Uday Bondhugula	6757fb151d	FlatAffineConstraints API cleanup; add normalizeConstraintsByGCD(). - add method normalizeConstraintsByGCD - call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of projectOut. - remove call to GCDTightenInequalities() from getMemRefRegion - change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each time an identifier is eliminated (to detect emptiness early). - make FourierMotzkinEliminate, gaussianEliminateId(s), GCDTightenInequalities() private - improve / update stale comments PiperOrigin-RevId: 224866741	2019-03-29 14:24:37 -07:00
Uday Bondhugula	2ef57806ba	Update/fix -pipeline-data-transfer; fix b/120770946 - fix replaceAllMemRefUsesWith call to replace only inside loop body. - handle the case where DMA buffers are dynamic; extend doubleBuffer() method to handle dynamically shaped DMA buffers (pass the right operands to AllocOp) - place alloc's for DMA buffers at the depth at which pipelining is being done (instead of at top-level) - add more test cases PiperOrigin-RevId: 224852231	2019-03-29 14:24:22 -07:00
Alex Zinenko	073c3ad997	Properly namespace createLowerAffineApply This was missing from the original commit. The implementation of createLowerAffineApply was defined in the default namespace but declared in the `mlir` namespace, which could lead to linking errors when it was used. Put the definition in `mlir` namespace. PiperOrigin-RevId: 224830894	2019-03-29 14:24:04 -07:00
Nicolas Vasilache	c28aeef901	[MLIR] Drop bug-prone global map indexed by MLFunction* PiperOrigin-RevId: 224610805	2019-03-29 14:23:49 -07:00
Uday Bondhugula	2d6478fa92	Extend loop tiling utility to handle non-constant loop bounds and bounds that are a max/min of several expressions. - Extend loop tiling to handle non-constant loop bounds and bounds that are a max/min of several expressions, i.e., bounds using multi-result affine maps - also fix b/120630124 as a result (the IR was in an invalid state when tiled loop generation failed; SSA uses were created that weren't plugged into the IR). PiperOrigin-RevId: 224604460	2019-03-29 14:23:34 -07:00
Uday Bondhugula	dfc752e42b	Generate strided DMAs from -dma-generate - generate DMAs correctly now using strided DMAs where needed - add support for multi-level/nested strides; op still supports one level of stride for now. Other things - add test case for symbolic lower/upper bound; cases where the DMA buffer size can't be bounded by a known constant - add test case for dynamic shapes where the DMA buffers are however bounded by constants - refactor some of the '-dma-generate' code PiperOrigin-RevId: 224584529	2019-03-29 14:23:19 -07:00
Nicolas Vasilache	d9b6420fc9	[MLIR] Add LowerVectorTransfersPass This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp to a simple loop nest via local buffer allocations. This is an MLIR->MLIR lowering based on builders. A few TODOs are left to address in particular: 1. invert the permutation map so the accesses to the remote memref are coalesced; 2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory); 3. support broadcast / avoid copies when permutation_map is not of full column rank 4. add a proper "element_cast" op One notable limitation is this does not plan on supporting boundary conditions. It should be significantly easier to use pre-baked MLIR functions to handle such paddings. This is left for future consideration. Therefore the current CL only works properly for full-tile cases atm. This CL also adds 2 simple tests: ```mlir for %i0 = 0 to %M step 3 { for %i1 = 0 to %N step 4 { for %i2 = 0 to %O { for %i3 = 0 to %P step 5 { vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index ``` lowers into: ```mlir for %i0 = 0 to %arg0 step 3 { for %i1 = 0 to %arg1 step 4 { for %i2 = 0 to %arg2 { for %i3 = 0 to %arg3 step 5 { %1 = alloc() : memref<5x4x3xf32> %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>> store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>> for %i4 = 0 to 5 { %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4) for %i5 = 0 to 4 { %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5) for %i6 = 0 to 3 { %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6) %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32> store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32> dealloc %1 : memref<5x4x3xf32> ``` and ```mlir for %i0 = 0 to %M step 3 { for %i1 = 0 to %N { for %i2 = 0 to %O { for %i3 = 0 to %P step 5 { %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, 
d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32> ``` lowers into: ```mlir for %i0 = 0 to %arg0 step 3 { for %i1 = 0 to %arg1 { for %i2 = 0 to %arg2 { for %i3 = 0 to %arg3 step 5 { %1 = alloc() : memref<5x4x3xf32> %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>> for %i4 = 0 to 5 { %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4) for %i5 = 0 to 4 { for %i6 = 0 to 3 { %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6) %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32> store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32> %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>> dealloc %1 : memref<5x4x3xf32> ``` PiperOrigin-RevId: 224552717	2019-03-29 14:23:05 -07:00
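The index arithmetic in the `vector_transfer_read` lowering above can be modelled in plain Python. This is an illustrative sketch, not the pass itself: `lower_transfer_read`, the dict used as a stand-in for a memref, and the use of `None` for a broadcast (`0`) map result are all assumptions made for the example.

```python
from itertools import product

def lower_transfer_read(memref, base, vector_shape, perm):
    """Model the loop nest that fills a local buffer for a full-tile read.

    memref       -- dict mapping index tuples to values (stand-in for a memref)
    base         -- the indices passed to the transfer op (%i0, %i1, ...)
    vector_shape -- shape of the super-vector, e.g. (5, 4, 3)
    perm         -- one entry per vector dimension: the memref dimension it
                    walks, or None when the map result is the constant 0
                    (broadcast)
    """
    local = {}
    for vec_idx in product(*(range(n) for n in vector_shape)):
        idx = list(base)
        for vec_dim, mem_dim in enumerate(perm):
            if mem_dim is not None:
                # Mirrors affine_apply (d0, d1) -> (d0 + d1).
                idx[mem_dim] = base[mem_dim] + vec_idx[vec_dim]
        local[vec_idx] = memref[tuple(idx)]
    return local

# permutation_map (d0, d1, d2, d3) -> (d3, 0, d0) from the second test.
memref = {(a, 0, 0, d): 10 * a + d for a in range(3) for d in range(5)}
tile = lower_transfer_read(memref, (0, 0, 0, 0), (5, 4, 3), (3, None, 0))
```

Here `tile[(4, 2, 1)]` reads `memref[(1, 0, 0, 4)]`, and every position along the broadcast dimension holds the same value. Like the pass as described, the sketch only handles full tiles.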
Nicolas Vasilache	879be718a0	[MLIR] Fix the name of the MaterializeVectorPass PiperOrigin-RevId: 224536381	2019-03-29 14:22:49 -07:00
Nicolas Vasilache	db1b9f7381	[MLIR] Add composeWithUnboundedMap This CL adds a finer grain composition function between AffineExpr and an unbounded map. This will be used in the next CL. Also cleans up some comments remaining from a previous CL. PiperOrigin-RevId: 224536314	2019-03-29 14:22:34 -07:00
Smit Hinsu	adca59e4f7	Return bool from all emitError methods similar to Operation::emitOpError This simplifies call-sites returning true after emitting an error. After the conversion, dropped braces around single statement blocks as that seems more common. Also, switched to emitError method instead of emitting Error kind using the emitDiagnostic method. TESTED with existing unit tests PiperOrigin-RevId: 224527868	2019-03-29 14:22:06 -07:00
Nicolas Vasilache	13bc77045e	[MLIR] Drop assert for NYI in Vectorize.cpp This CL adds proper error emission, removes NYI assertions and documents assumptions that are required in the relevant functions. PiperOrigin-RevId: 224377207	2019-03-29 14:21:37 -07:00
Nicolas Vasilache	2408f0eba5	[MLIR] Drop assert for NYI in VectorAnalysis This CL adds proper error emission, removes NYI assertions and documents assumptions that are required in the relevant functions. PiperOrigin-RevId: 224377143	2019-03-29 14:21:22 -07:00
Nicolas Vasilache	48d22e83e3	[MLIR] Drop unnecessary mention of NYI. This CL also documents the `substExpr` helper function assumptions. The assumptions are properly propagated up already. PiperOrigin-RevId: 224377072	2019-03-29 14:21:07 -07:00
Nicolas Vasilache	a019379cdb	[MLIR] Remove NYI assertions in LoopAnalysis.cpp This CL also cleans up some loose ends and returns conservative answers while emitting errors in the NYI cases. PiperOrigin-RevId: 224377004	2019-03-29 14:20:52 -07:00
Nicolas Vasilache	5b610630b2	[MLIR] Error handling in MaterializeVectors This removes assertions as a means to capture NYI behavior and propagates errors up. PiperOrigin-RevId: 224376935	2019-03-29 14:20:37 -07:00
Nicolas Vasilache	4adc169bd0	[MLIR] Add AffineMap composition and use it in Materialization This CL adds the following free functions: ``` /// Returns the AffineExpr e o m. AffineExpr compose(AffineExpr e, AffineMap m); /// Returns the AffineMap f o g. AffineMap compose(AffineMap f, AffineMap g); ``` This addresses the issue that AffineMap composition is only available at a distance via AffineValueMap and is thus unusable on Attributes. This CL thus implements AffineMap composition in a more modular and composable way. This CL does not claim that it can be a good replacement for the implementation in AffineValueMap, in particular it does not support bounded maps atm. Standalone tests are added that replicate some of the logic of the AffineMap composition pass. Lastly, affine map composition is used properly inside MaterializeVectors and a standalone test is added that requires permutation_map composition with a projection map. PiperOrigin-RevId: 224376870	2019-03-29 14:20:22 -07:00
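To make the composition `f o g` concrete, here is a minimal Python sketch restricted to purely linear maps. The coefficient-row representation and the helper names `compose` and `apply_map` are assumptions for illustration; the real `AffineExpr` also covers `floordiv`, `ceildiv` and `mod`, which this sketch ignores.

```python
def compose(f, g):
    """Return f o g for maps stored as rows [c_0, ..., c_{n-1}, const].

    Each row is one result; f must take as many dims as g has results.
    """
    n_cols = len(g[0])  # g's input dims plus the constant column
    result = []
    for row in f:
        *coeffs, const = row
        assert len(coeffs) == len(g), "f's dims must match g's results"
        # Substitute each result of g for the matching dim of f.
        out = [sum(c * g_row[j] for c, g_row in zip(coeffs, g))
               for j in range(n_cols)]
        out[-1] += const
        result.append(out)
    return result

def apply_map(m, point):
    """Evaluate a map at an integer point."""
    return tuple(sum(c * x for c, x in zip(row[:-1], point)) + row[-1]
                 for row in m)

g = [[1, 0, 0], [0, 1, 8]]   # (d0, d1) -> (d0, d1 + 8)
f = [[0, 1, 0]]              # projection (d0, d1) -> (d1)
fg = compose(f, g)           # (d0, d1) -> (d1 + 8)
```

Composing a projection with a shifted identity map, as above, is the kind of case the commit message mentions for `permutation_map` during materialization.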
Nicolas Vasilache	df0a25efee	[MLIR] Add support for permutation_map This CL hooks up and uses permutation_map in vector_transfer ops. In particular, when going into the nuts and bolts of the implementation, it became clear that cases arose that required supporting broadcast semantics. Broadcast semantics are thus added to the general permutation_map. The verify methods and tests are updated accordingly. Examples of interest include: Example 1: The following MLIR snippet: ```mlir for %i3 = 0 to %M { for %i4 = 0 to %N { for %i5 = 0 to %P { %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32> }}} ``` may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into: ```mlir for %i3 = 0 to %0 step 32 { for %i4 = 0 to %1 { for %i5 = 0 to %2 step 256 { %4 = vector_transfer_read %arg0, %i4, %i5, %i3 {permutation_map: (d0, d1, d2) -> (d2, d1)} : (memref<?x?x?xf32>, index, index) -> vector<32x256xf32> }}} ``` Meaning that vector_transfer_read will be responsible for reading the 2-D slice: `%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into vector<32x256xf32>. This will require a transposition when vector_transfer_read is further lowered. Example 2: The following MLIR snippet: ```mlir %cst0 = constant 0 : index for %i0 = 0 to %M { %a0 = load %A[%cst0, %cst0] : memref<?x?xf32> } ``` may vectorize with {permutation_map: (d0) -> (0)} into: ```mlir for %i0 = 0 to %0 step 128 { %3 = vector_transfer_read %arg0, %c0_0, %c0_0 {permutation_map: (d0, d1) -> (0)} : (memref<?x?xf32>, index, index) -> vector<128xf32> } ``` Meaning that vector_transfer_read will be responsible for reading the 0-D slice `%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector broadcast when vector_transfer_read is further lowered. Additionally, some minor cleanups and refactorings are performed. One notable thing missing here is the composition with a projection map during materialization. 
This is because I could not find an AffineMap composition that operates on AffineMap directly: everything related to composition seems to require going through SSAValue and only operates on AffineMap at a distance via AffineValueMap. I have raised this concern a bunch of times already, the followup CL will actually do something about it. In the meantime, the projection is hacked at a minimum to pass verification and materialization tests are temporarily incorrect. PiperOrigin-RevId: 224376828	2019-03-29 14:20:07 -07:00
Alex Zinenko	7c89a225cf	ConvertToCFG: support min/max in loop bounds. The recently introduced `select` operation enables ConvertToCFG to support min(max) in loop bounds. Individual min(max) is implemented as `cmpi "lt"`(`cmpi "gt"`) followed by a `select` between the compared values. Multiple results of an `affine_apply` operation extracted from the loop bounds are reduced using min(max) in a sequential manner. While this may decrease the potential for instruction-level parallelism, it is easier to recognize for the following passes, in particular for the vectorizer. PiperOrigin-RevId: 224376233	2019-03-29 14:19:52 -07:00
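The sequential min/max reduction described above can be sketched as follows. The Python helpers `select`, `lower_min` and `lower_max` are hypothetical stand-ins for the emitted `cmpi`/`select` chain, not code from the pass.

```python
def select(cond, a, b):
    """Stand-in for the `select` operation: a if cond is true, else b."""
    return a if cond else b

def lower_min(values):
    """Reduce several bound values with a chain of cmpi "lt" + select."""
    acc = values[0]
    for v in values[1:]:
        acc = select(acc < v, acc, v)
    return acc

def lower_max(values):
    """Reduce several bound values with a chain of cmpi "gt" + select."""
    acc = values[0]
    for v in values[1:]:
        acc = select(acc > v, acc, v)
    return acc
```

Each step depends on the previous accumulator, which is the sequential shape the commit message trades against instruction-level parallelism.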
Alex Zinenko	513d6d896c	OpPointer: replace the conversion operator to Operation* with one to OpType*. The implementation of OpPointer<OpType> provides an implicit conversion to Operation*, but not to the underlying OpType*. This has led to awkward-looking code when an OpPointer needs to be passed to a function accepting an OpType*. For example, if (auto someOp = genericOp.dyn_cast<OpType>()) someFunction(&someOp); where "&" makes it harder to read. Arguably, one does not want to spell out OpPointer<OpType> in the line with dyn_cast. More generally, OpPointer is now being used as an owning pointer to OpType rather than to operation. Replace the implicit conversion to Operation* with the conversion to OpType* taking into account const-ness of the type. An Operation* can be obtained from an OpType with a simple call. Since an instance of OpPointer owns the OpType value, the pointer to it is never null. However, the OpType value may not be associated with any Operation*. In this case, return nullptr when conversion is attempted to maintain consistency with the existing null checks. PiperOrigin-RevId: 224368103	2019-03-29 14:19:37 -07:00
Uday Bondhugula	73fc0223e4	Fix cases where unsigned / signed arithmetic was being mixed (following up on cl/224246657); eliminate repeated evaluation of exprs in loop upper bounds. - while on this, sweep through and fix potential repeated evaluation of expressions in loop upper bounds PiperOrigin-RevId: 224268918	2019-03-29 14:19:22 -07:00
MLIR Team	a53ed1b767	Fix bug in GCD calculation when flattening AffineExpr (adds unit test which triggers the bug and tests the fix). PiperOrigin-RevId: 224246657	2019-03-29 14:19:07 -07:00
Uday Bondhugula	9f77faae87	Strided DMA support for DmaStartOp - add optional stride arguments for DmaStartOp - add DmaStartOp::verify(), and missing test cases for DMA op's in test/IR/memory-ops.mlir. PiperOrigin-RevId: 224232466	2019-03-29 14:18:37 -07:00
Uday Bondhugula	a92130880e	Complete multiple unhandled cases for DmaGeneration / getMemRefRegion; update/improve/clean up API. - update FlatAffineConstraints::getConstBoundDifference; return constant differences between symbolic affine expressions, look at equalities as well. - fix buffer size computation when generating DMAs symbolic in outer loops, correctly handle symbols at various places (affine access maps, loop bounds, loop IVs outer to the depth at which DMA generation is being done) - bug fixes / complete some TODOs for getMemRefRegion - refactor common code b/w memref dependence check and getMemRefRegion - FlatAffineConstraints API update; added methods employ trivial checks / detection - sufficient to handle hyper-rectangular cases in a precise way while being fast / low complexity. Hyper-rectangular cases fall out as trivial cases for these methods while other cases still do not cause failure (either return conservative or return failure that is handled by the caller). PiperOrigin-RevId: 224229879	2019-03-29 14:18:22 -07:00
Lei Zhang	b572322859	Add isIntOrIndex() and isIntOrIndexOrFloat() into Type The checks for `isa<IndexType>() \|\| isa<IntegerType>()` and `isa<IndexType>() \|\| isa<IntegerType>() \|\| isa<FloatType>()` are frequently used, so it's useful to have some helper methods for them. PiperOrigin-RevId: 224133596	2019-03-29 14:17:38 -07:00
Uday Bondhugula	f9af62998b	Remove duplicate FlatAffineConstraints::removeId - refactor to use removeColumnRange - remove functionally duplicate code in removeId. - rename removeColumnRange -> removeIdRange - restrict valid input to just the identifier columns (not the constant term column). PiperOrigin-RevId: 224054064	2019-03-29 14:17:24 -07:00
Uday Bondhugula	7c2347266d	FlatAffineConstraints::removeId() fix. This is an obvious bug, but none of the test cases exposed it since numIds was correctly updated, and the dimensional identifiers were always eliminated before the symbolic identifiers in all cases that removeId was getting called from. However, other work in progress exercises the other scenarios and exposes this bug. Add a hasConsistentState() private method to move common assertion checks, and call it from several base methods. Make hasInvalidConstraint() a private method as well (from a file static one). PiperOrigin-RevId: 224032721	2019-03-29 14:17:10 -07:00
MLIR Team	753109547d	During forward substitution, merge symbols from input AffineMap with the symbol list of the target AffineMap. Symbols can be used as dim identifiers and symbolic identifiers, and so we must preserve the symbolic identifiers from the input AffineMap during forward substitution, even if that same identifier is used as a dimension identifier in the target AffineMap. Test case added. Going forward, we may want to explore solutions where we do not maintain this split between dimensions and symbols, and instead verify the validity of each use of each AffineMap operand AffineMap in the context where the AffineMap operand usage is required to be a symbol: in the denominator of floordiv/ceildiv/mod for semi-affine maps, and in instructions that can capture symbols (i.e. alloc) PiperOrigin-RevId: 224017364	2019-03-29 14:16:40 -07:00
Alex Zinenko	7868abd9d8	ConvertToCFG: convert "if" statements. The condition of the "if" statement is an integer set, defined as a conjunction of affine constraints. An affine constraint consists of an affine expression and a flag indicating whether the expression is strictly equal to zero or is also allowed to be greater than zero. Affine maps, accepted by `affine_apply` are also formed from affine expressions. Leverage this fact to implement the checking of "if" conditions. Each affine expression from the integer set is converted into an affine map. This map is applied to the arguments of the "if" statement. The result of the application is compared with zero given the equality flag to obtain the final boolean value. The conjunction of conditions is tested sequentially with short-circuit branching to the "else" branch if any of the conditions evaluates to false. Create an SESE region for the if statement (including its "then" and optional "else" statement blocks) and append it to the end of the current region. The conditional region consists of a sequence of condition-checking blocks that implement the short-circuit scheme, followed by a "then" SESE region and an "else" SESE region, and the continuation block that post-dominates all blocks of the "if" statement. The flow of blocks that correspond to the "then" and "else" clauses are constructed recursively, enabling easy nesting of "if" statements and if-then-else-if chains. Note that MLIR semantics does not require nor prohibit short-circuit evaluation. Since affine expressions do not have side effects, there is no observable difference in the program behavior. We may trade off extra operations for operation-level parallelism opportunity by first performing all `affine_apply` and comparison operations independently, and then performing a tree pattern reduction of the resulting boolean values with the `muli i1` operations (in absence of the dedicated bit operations). 
The pros and cons are not clear, and since MLIR does not include parallel semantics, we prefer to minimize the number of sequentially executed operations. PiperOrigin-RevId: 223970248	2019-03-29 14:16:10 -07:00
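The short-circuit evaluation of an integer set described above can be sketched in Python. `holds` and the `(expression, is_equality)` pair encoding are assumptions for illustration; in the pass each check becomes its own basic block ending in a conditional branch.

```python
def holds(constraints, args):
    """Evaluate a conjunction of affine constraints with short-circuiting.

    constraints -- list of (expr, is_equality) pairs, where expr is a callable
                   standing in for one affine expression of the integer set
    args        -- the values the "if" statement is applied to
    """
    for expr, is_equality in constraints:
        value = expr(*args)
        ok = (value == 0) if is_equality else (value >= 0)
        if not ok:
            return False  # corresponds to branching to the "else" region
    return True           # corresponds to falling through to "then"

# Hypothetical set: (d0, d1) : (d0 - 1 >= 0, d1 - d0 == 0)
int_set = [
    (lambda d0, d1: d0 - 1, False),
    (lambda d0, d1: d1 - d0, True),
]
```

Because the expressions have no side effects, evaluating all of them eagerly and combining the results would give the same answer; the sketch follows the sequential scheme the commit chose.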
Alex Zinenko	dee51d0961	LLVM IR Lowering: support multi-value returns. Unlike MLIR, LLVM IR does not support functions that return multiple values. Simulate this by packing values into the LLVM structure type in the same order as they appear in the MLIR return. If the function returns only a single value, return it directly without packing. PiperOrigin-RevId: 223964886	2019-03-29 14:15:56 -07:00
Nicolas Vasilache	b39d1f0bdb	[MLIR] Add VectorTransferOps This CL implements and uses VectorTransferOps in lieu of the former custom call op. Tests are updated accordingly. VectorTransferOps come in 2 flavors: VectorTransferReadOp and VectorTransferWriteOp. VectorTransferOps can be thought of as a backend-independent pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before it can be lowered to backend-dependent IR. Note that the current implementation does not yet support a real permutation map. Proper support will come in a followup CL. VectorTransferReadOp ==================== VectorTransferReadOp performs a blocking read from a scalar memref location into a super-vector of the same elemental type. This operation is called 'read' by opposition to 'load' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile only code. A vector transfer read has semantics similar to a vector load, with additional support for: 1. an optional value of the elemental type of the MemRef. This value supports non-effecting padding and is inserted in places where the vector read exceeds the MemRef bounds. If the value is not specified, the access is statically guaranteed to be within bounds; 2. an attribute of type AffineMap to specify a slice of the original MemRef access and its transposition into the super-vector shape. The permutation_map is an unbounded AffineMap that must represent a permutation from the MemRef dim space projected onto the vector dim space. Example: ```mlir %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32> ... 
%val = `ssa-value` : f32 // let %i, %j, %k, %l be ssa-values of type index %v0 = vector_transfer_read %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index) -> vector<16x32x64xf32> %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (memref<?x?x?x?xf32>, index, index, index, index, f32) -> vector<16x32x64xf32> ``` VectorTransferWriteOp ===================== VectorTransferWriteOp performs a blocking write from a super-vector to a scalar memref of the same elemental type. This operation is called 'write' by opposition to 'store' because the super-vector granularity is generally not representable with a single hardware register. As a consequence, memory transfers will generally be required when lowering VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile only code. A vector transfer write has semantics similar to a vector store, with additional support for handling out-of-bounds situations. Example: ```mlir %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>. %val = `ssa-value` : vector<16x32x64xf32> // let %i, %j, %k, %l be ssa-values of type index vector_transfer_write %val, %src, %i, %j, %k, %l {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} : (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index) ``` PiperOrigin-RevId: 223873234	2019-03-29 14:15:25 -07:00
Jacques Pienaar	bb3ffc1c22	Fix two more getHashValues. These were still returning the hash of the pointers resulting in the two getHashValues being different. PiperOrigin-RevId: 223862743	2019-03-29 14:15:11 -07:00
Uday Bondhugula	89c41fdca1	FlatAffineConstraints::composeMap: return failure instead of asserting on semi-affine maps FlatAffineConstraints::composeMap: should return false instead of asserting on a semi-affine map. Make getMemRefRegion just propagate false when encountering semi-affine maps (instead of crashing!) PiperOrigin-RevId: 223828743	2019-03-29 14:14:56 -07:00
Uday Bondhugula	5f76245cfe	Minor fix for replaceAllMemRefUsesWith. The check for whether the memref was used in a non-dereferencing context had to be done inside, i.e., only for the op stmt's that the replacement was specified to be performed on (by the domStmtFilter arg if provided). As such, it is completely fine for example for a function to return a memref while the replacement is being performed only on a specific loop's body (as in the case of DMA generation). PiperOrigin-RevId: 223827753	2019-03-29 14:14:43 -07:00
River Riddle	7669a259c4	Add a simple common sub expression elimination pass. The algorithm collects defining operations within a scoped hash table. The scopes within the hash table correspond to nodes within the dominance tree for a function. This cl only adds support for simple operations, i.e. non-side-effecting ones. Side-effecting operations, e.g. load/store/call, will be handled in later patches. PiperOrigin-RevId: 223811328	2019-03-29 14:14:28 -07:00
Jacques Pienaar	3277f94bf4	Update getHashValue for ptr values stored in a DenseMap/Set to use getHashValue of KeyTy. Ensures both hash values returned are the same. Tested by triggering resize of map/set and verifying failure before change. PiperOrigin-RevId: 223651443	2019-03-29 14:13:58 -07:00
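A scoped hash table keyed on the dominance tree, as described for the CSE pass, can be sketched with Python's `collections.ChainMap`. The function `run_cse`, the tuple encoding of operations and the example blocks are hypothetical; only side-effect-free operations are modelled, matching the scope of the commit.

```python
from collections import ChainMap

def run_cse(root, dom_children, block_ops):
    """Return {redundant result: equivalent earlier result}.

    dom_children -- block -> blocks it immediately dominates
    block_ops    -- block -> list of (result, opcode, operands) tuples
    """
    replacements = {}

    def visit(block, scope):
        for result, opcode, operands in block_ops[block]:
            # Look operands up through earlier replacements first.
            key = (opcode, tuple(replacements.get(o, o) for o in operands))
            if key in scope:
                replacements[result] = scope[key]
            else:
                scope[key] = result
        for child in dom_children.get(block, []):
            # A child scope sees its dominators' entries; its own entries
            # vanish when the recursive call returns.
            visit(child, scope.new_child())

    visit(root, ChainMap())
    return replacements

dom_children = {"entry": ["then", "else"]}
block_ops = {
    "entry": [("%a", "addi", ("%x", "%y"))],
    "then": [("%b", "addi", ("%x", "%y")), ("%c", "muli", ("%b", "%x"))],
    "else": [("%d", "muli", ("%a", "%x"))],
}
found = run_cse("entry", dom_children, block_ops)
```

In this example `%b` is replaced by `%a` because `entry` dominates `then`, while `%d` is kept: the matching `muli` lives in a sibling scope that does not dominate `else`.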
Jacques Pienaar	45e3139bc8	RankedTensorType: Use getHashValue(KeyTy) when calling getHashValue(RankedTensorTypeStorage*). PiperOrigin-RevId: 223649958	2019-03-29 14:13:44 -07:00
Jacques Pienaar	21ed46abb8	Avoid failing when attempting to print null Attribute. This avoids segfaulting when dumping during debugging of failures. PiperOrigin-RevId: 223449494	2019-03-29 14:13:14 -07:00
Uday Bondhugula	a619b5c295	Debug output / logging memref sizes in DMA generation + related changes - Add method to get a memref's size in bytes - clean up a loop tiling pass helper (NFC) PiperOrigin-RevId: 223422077	2019-03-29 14:12:56 -07:00
Chris Lattner	3f2530cdf5	Split "rewrite" functionality out of Pattern into a new RewritePattern derived class. This change is NFC, but allows for new kinds of patterns, specifically LegalizationPatterns which will be allowed to change the types of things they rewrite. PiperOrigin-RevId: 223243783	2019-03-29 14:12:07 -07:00
Lei Zhang	1f5330ac90	Verify CmpIOp's result type to be bool-like This CL added two new traits, SameOperandsAndResultShape and ResultsAreBoolLike, and changed CmpIOp to embody these two traits. As a consequence, CmpIOp's result type now is verified to be bool-like. PiperOrigin-RevId: 223208438	2019-03-29 14:11:53 -07:00
Alex Zinenko	a3fb6d0da3	StandardOps: introduce 'select'. The semantics of 'select' is conventional: return the second operand if the first operand is true (1 : i1) and the third operand otherwise. It is applicable to vectors and tensors element-wise, similarly to LLVM instruction. This operation is necessary to implement min/max to lower 'for' loops with complex bounds to CFG functions and to support ternary operations in ML functions. It is preferred to first-class min/max because of its simplicity, e.g. it is not concerned with signedness. PiperOrigin-RevId: 223160860	2019-03-29 14:11:25 -07:00
Alex Zinenko	e7f43c8361	LLVM IR lowering: support 'dim' operation. Add support for translating 'dim' operation on MemRefs to LLVM IR. For a static size, this operation merely defines an LLVM IR constant value that may not appear in the output IR if not used (and had not been removed before by DCE). For a dynamic size, this operation is translated into an access to the MemRef descriptor that contains the dynamic size. PiperOrigin-RevId: 223160774	2019-03-29 14:11:10 -07:00
Alex Zinenko	90d1b6b5f2	LLVM IR lowering: support simple MemRef types Introduce initial support for MemRef types, including type conversion, allocation and deallocation, read and write element-wise access, passing MemRefs to and returning from functions. Affine map compositions and non-default memory spaces are NOT YET supported. Lowered code needs to handle potentially dynamic sizes of the MemRef. To do so, it replaces a MemRef-typed value with a special MemRef descriptor that carries the data and the dynamic sizes together. A MemRef type is converted to LLVM's first-class structure type with the first element being the pointer to the data buffer with data laid out linearly, followed by as many integer-typed elements as MemRef has dynamic sizes. The type of these elements is that of MLIR index lowered to LLVM. For example, `memref<?x42x?xf32>` is converted to `{ f32*, i64, i64 }` provided `index` is lowered to `i64`. While it is possible to convert MemRefs with fully static sizes to simple pointers to their elemental types, we opted for consistency and convert them to the single-element structure. This makes the conversion code simpler and the calling convention of the generated LLVM IR functions consistent. Loads from and stores to a MemRef element are lowered to a sequence of LLVM instructions that, first, computes the linearized index of the element in the data buffer using the access indices and combining the static sizes with the dynamic sizes stored in the descriptor, and then loads from or stores to the buffer element indexed by the linearized subscript. While some of the index computations may be redundant (i.e., consecutive load and store to the same location in the same scope could reuse the linearized index), we emit them for every operation. A subsequent optimization pass may eliminate them if necessary. 
MemRef allocation and deallocation is performed using external functions `__mlir_alloc(index) -> i8*` and `__mlir_free(i8*)` that must be implemented by the caller. These functions behave similarly to `malloc` and `free`, but can be extended to support different memory spaces in future. Allocation and deallocation instructions take care of casting the pointers. Prior to calling the allocation function, the emitted code creates an SSA Value for the descriptor and uses it to store the dynamic sizes of the MemRef passed to the allocation operation. It further emits instructions that compute the dynamic amount of memory to allocate in bytes. Finally, the allocation stores the result of calling the `__mlir_alloc` in the MemRef descriptor. Deallocation extracts the pointer to the allocated memory from the descriptor and calls `__mlir_free` on it. The descriptor itself is not modified and, being stack-allocated, ceases to exist when it goes out of scope. MLIR functions that access MemRef values as arguments or return them are converted to LLVM IR functions that accept MemRef descriptors as LLVM IR structure types by value. This significantly simplifies the calling convention at the LLVM IR level and avoids handling descriptors in the dynamic memory, but is not always compatible with LLVM IR functions emitted from C code with similar signatures. A separate LLVM pass may be introduced in the future to provide C-compatible calling conventions for LLVM IR functions generated from MLIR. PiperOrigin-RevId: 223134883	2019-03-29 14:10:55 -07:00
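The linearized index computation for a MemRef with mixed static and dynamic sizes can be sketched as below. Row-major order, the helper name `linearize` and the use of `None` for a `?` dimension are assumptions of this sketch; the commit message only states that the data is laid out linearly and that static sizes are combined with the dynamic sizes stored in the descriptor.

```python
def linearize(shape, dynamic_sizes, indices):
    """Compute the flat buffer offset of one element.

    shape         -- static sizes, with None for each '?' dimension
    dynamic_sizes -- the sizes stored in the descriptor, in dimension order
    indices       -- the access indices of the load or store
    """
    dyn = iter(dynamic_sizes)
    sizes = [next(dyn) if s is None else s for s in shape]
    linear = 0
    for size, index in zip(sizes, indices):
        # Row-major accumulation: ((i0 * s1) + i1) * s2 + i2 ...
        linear = linear * size + index
    return linear

# memref<?x42x?xf32> with dynamic sizes 10 and 7, accessing [2, 3, 4]
offset = linearize((None, 42, None), (10, 7), (2, 3, 4))
```

For this access the offset is `(2 * 42 + 3) * 7 + 4`; the outermost size never enters the computation, which is why only the inner dynamic size is read from the descriptor here.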
River Riddle	759fd1c6a3	Add support for setting the location of an IROperandOwner. PiperOrigin-RevId: 222995814	2019-03-29 14:09:43 -07:00
Chris Lattner	721a30d6a0	Tidy up the replaceOp hooks in PatternMatch, generalizing them to support any number of result ops. Among other things, this results in shorter names PiperOrigin-RevId: 222685039	2019-03-29 14:09:28 -07:00
Chris Lattner	1427d0f01b	Minimal patch to allow patterns to rewrite multi-result instructions, related to b/119877155 PiperOrigin-RevId: 222597798	2019-03-29 14:09:14 -07:00
Alex Zinenko	68e9721aa8	Rename Deaffinator to LowerAffineApply and patch it. Several things were suggested in post-submission reviews. In particular, use pointers in function interfaces instead of references (still use references internally). Clarify the behavior of the pass in presence of MLFunctions. PiperOrigin-RevId: 222556851	2019-03-29 14:08:59 -07:00
Nicolas Vasilache	63bc6d2f6a	[MLIR] Fix opt build PiperOrigin-RevId: 222491353	2019-03-29 14:08:45 -07:00
Nicolas Vasilache	a5782f0d40	[MLIR][MaterializeVectors] Add a MaterializeVector pass via unrolling. This CL adds an MLIR-MLIR pass which materializes super-vectors to hardware-dependent sized vectors. While the physical vector size is target-dependent, the pass is written in a target-independent way: the target vector size is specified as a parameter to the pass. This pass is thus a partial lowering that opens the "greybox" that is the super-vector abstraction. This first CL adds a first materialization pass that iterates over vector_transfer_write operations and: 1. computes the program slice including the current vector_transfer_write; 2. computes the multi-dimensional ratio of super-vector shape to hardware vector shape; 3. for each possible multi-dimensional value within the bounds of ratio, a new slice is instantiated (i.e. cloned and rewritten) so that all operations in this instance operate on the hardware vector type. As a simple example, given: ```mlir mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> { %A = alloc (%M, %N) : memref<?x?xf32> %B = alloc (%M, %N) : memref<?x?xf32> %C = alloc (%M, %N) : memref<?x?xf32> for %i0 = 0 to %M { for %i1 = 0 to %N { %a1 = load %A[%i0, %i1] : memref<?x?xf32> %b1 = load %B[%i0, %i1] : memref<?x?xf32> %s1 = addf %a1, %b1 : f32 store %s1, %C[%i0, %i1] : memref<?x?xf32> } } return %C : memref<?x?xf32> } ``` and the following options: ``` -vectorize -virtual-vector-size 32 --test-fastest-varying=0 -materialize-vectors -vector-size=8 ``` materialization emits: ```mlir #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0, d1 + 8) #map2 = (d0, d1) -> (d0, d1 + 16) #map3 = (d0, d1) -> (d0, d1 + 24) mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> { %0 = alloc(%arg0, %arg1) : memref<?x?xf32> %1 = alloc(%arg0, %arg1) : memref<?x?xf32> %2 = alloc(%arg0, %arg1) : memref<?x?xf32> for %i0 = 0 to %arg0 { for %i1 = 0 to %arg1 step 32 { %3 = affine_apply #map0(%i0, %i1) %4 = "vector_transfer_read"(%0, 
%3#0, %3#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %5 = affine_apply #map1(%i0, %i1) %6 = "vector_transfer_read"(%0, %5#0, %5#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %7 = affine_apply #map2(%i0, %i1) %8 = "vector_transfer_read"(%0, %7#0, %7#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %9 = affine_apply #map3(%i0, %i1) %10 = "vector_transfer_read"(%0, %9#0, %9#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %11 = affine_apply #map0(%i0, %i1) %12 = "vector_transfer_read"(%1, %11#0, %11#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %13 = affine_apply #map1(%i0, %i1) %14 = "vector_transfer_read"(%1, %13#0, %13#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %15 = affine_apply #map2(%i0, %i1) %16 = "vector_transfer_read"(%1, %15#0, %15#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %17 = affine_apply #map3(%i0, %i1) %18 = "vector_transfer_read"(%1, %17#0, %17#1) : (memref<?x?xf32>, index, index) -> vector<8xf32> %19 = addf %4, %12 : vector<8xf32> %20 = addf %6, %14 : vector<8xf32> %21 = addf %8, %16 : vector<8xf32> %22 = addf %10, %18 : vector<8xf32> %23 = affine_apply #map0(%i0, %i1) "vector_transfer_write"(%19, %2, %23#0, %23#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %24 = affine_apply #map1(%i0, %i1) "vector_transfer_write"(%20, %2, %24#0, %24#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %25 = affine_apply #map2(%i0, %i1) "vector_transfer_write"(%21, %2, %25#0, %25#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> () %26 = affine_apply #map3(%i0, %i1) "vector_transfer_write"(%22, %2, %26#0, %26#1) : (vector<8xf32>, 
memref<?x?xf32>, index, index) -> () } } return %2 : memref<?x?xf32> } ``` PiperOrigin-RevId: 222455351	2019-03-29 14:08:31 -07:00
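Steps 2 and 3 of the materialization above (the shape ratio and one instance per point within it) can be sketched in Python. The helper `instance_offsets` is hypothetical, and padding a lower-rank hardware shape with leading 1s is an assumption of this sketch rather than something the commit message states.

```python
from itertools import product

def instance_offsets(super_shape, hw_shape):
    """Return the element offsets of each hardware-vector instance.

    The ratio super_shape / hw_shape is computed per dimension; one offset
    tuple is produced for every multi-dimensional point within that ratio.
    """
    pad = len(super_shape) - len(hw_shape)
    hw = (1,) * pad + tuple(hw_shape)  # assumed alignment on trailing dims
    if any(s % h for s, h in zip(super_shape, hw)):
        raise ValueError("super-vector shape must be a multiple of hw shape")
    ratio = tuple(s // h for s, h in zip(super_shape, hw))
    return [tuple(i * h for i, h in zip(point, hw))
            for point in product(*(range(r) for r in ratio))]

# -virtual-vector-size 32 with -vector-size=8, as in the example above
offsets = instance_offsets((32,), (8,))
```

The four offsets correspond to the `d1`, `d1 + 8`, `d1 + 16` and `d1 + 24` results of `#map0` through `#map3` in the emitted IR.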
Nicolas Vasilache	258dae5d73	[MLIR][Slicing] Apply cleanups This CL applies a few last cleanups from a previous CL that have been missed during the previous submit. PiperOrigin-RevId: 222454774	2019-03-29 14:08:17 -07:00
Nicolas Vasilache	5c16564bca	[MLIR][Slicing] Add utils for computing slices. This CL adds tooling for computing slices as an independent CL. The first consumer of this analysis will be super-vector materialization in a followup CL. In particular, this adds: 1. a getForwardStaticSlice function with documentation, example and a standalone unit test; 2. a getBackwardStaticSlice function with documentation, example and a standalone unit test; 3. a getStaticSlice function with documentation, example and a standalone unit test; 4. a topologicalSort function that is exercised through the getStaticSlice unit test. The getXXXStaticSlice functions take an additional root (resp. terminators) parameter which acts as a boundary that the transitive propagation algorithm is not allowed to cross. PiperOrigin-RevId: 222446208	2019-03-29 14:08:02 -07:00
MLIR Team	cff7789a49	Clean up parse_headers in mlir Not having self-contained headers in LLVM is a constant pain. Don't make the same mistake in mlir. The only interesting change here is moving setSuccessor to Instructions.cpp, which breaks the cycle between Instructions.h and BasicBlock.h. PiperOrigin-RevId: 222440816	2019-03-29 14:07:46 -07:00
Uday Bondhugula	2631b155a9	Fix bugs in DMA generation and FlatAffineConstraints; add more test cases. - fix bug in calculating index expressions for DMA buffers in certain cases (affected tiled loop nests); add more test cases for better coverage. - introduce an additional optional argument to replaceAllMemRefUsesWith; additional operands to the index remap AffineMap can now be supplied by the client. - FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound, ::composeMap - fix position bug. - Some clean up and more comments PiperOrigin-RevId: 222434628	2019-03-29 14:07:31 -07:00
Alex Zinenko	615c41c788	Introduce Deaffinator pass. This function pass replaces affine_apply operations in CFG functions with sequences of primitive arithmetic instructions that form the affine map. The actual replacement functionality is located in LoweringUtils as a standalone function operating on an individual affine_apply operation and inserting the result at the location of the original operation. It is expected to be useful for other, target-specific lowering passes that may start at MLFunction level that Deaffinator does not support. PiperOrigin-RevId: 222406692	2019-03-29 14:07:16 -07:00
Alex Zinenko	ac6bfa6780	Lower scalar parts of CFG functions to LLVM IR Initial restricted implementation of the MLIR to LLVM IR translation. Introduce a new flow into the mlir-translate tool taking an MLIR module containing CFG functions only and producing an LLVM IR module. The MLIR features supported by the translator are as follows: - primitive and function types; - integer constants; - cfg and ext functions with 0 or 1 return values; - calls to these functions; - basic block conversion translation of arguments to phi nodes; - conversion between arguments of the first basic block and function arguments; - (conditional) branches; - integer addition and comparison operations. Are NOT supported: - vector and tensor types and operations on them; - memrefs and operations on them; - allocations; - functions returning multiple values; - LLVM Module triple and data layout (index type is hardcoded to i64). Create a new MLIR library and place it under lib/Target/LLVMIR. The "Target" library group is similar to the one present in LLVM and is intended to contain all future public MLIR translation targets. The general flow of MLIR to LLVM IR conversion will include several lowering and simplification passes on the MLIR itself in order to make the translation as simple as possible. In particular, ML functions should be transformed to CFG functions by the recently introduced pass, operations on structured types will be converted to sequences of operations on primitive types, complex operations such as affine_apply will be converted into sequences of primitive operations, primitive operations themselves may eventually be converted to an LLVM dialect that uses LLVM-like operations. Introduce the first translation test so that further changes make sure the basic translation functionality is not broken. PiperOrigin-RevId: 222400112	2019-03-29 14:07:01 -07:00
Alex Zinenko	6e1a050f7e	Create the Support library. This has been a long-standing TODO in the build system. Now that we need to share the non-inlined implementation of file utilities for translators, create a separate library for support functionality. Move Support/* headers to the new library in the build system. PiperOrigin-RevId: 222398880	2019-03-29 14:06:47 -07:00
Alex Zinenko	6c5317eafa	Separate translators into "from MLIR" and "to MLIR". Translations performed by mlir-translate only have MLIR on one end. MLIR-to-MLIR conversions (including dialect changes) should be treated as passes and run by mlir-opt. Individual translations should not care about reading or writing MLIR and should work on in-memory representation of MLIR modules instead. Split the TranslateFunction interface and the translate registry into two parts: "from MLIR" and "to MLIR". Update mlir-translate to handle both registries together by wrapping translation functions into source-to-source conversions. Remove MLIR parsing and writing from individual translations and make them operate on Modules instead. This removes the need for individual translators to include tools/mlir-translate/mlir-translate.h, which can now be safely removed. Remove mlir-to-mlir translation that only existed as a registration example and use mlir-opt instead for tests. PiperOrigin-RevId: 222398707	2019-03-29 14:06:33 -07:00
Alex Zinenko	b5756fdaa1	Factor out translation registry. The mlir-translate tool is expected to discover individual translations at link time. These translations must register themselves and may need the utilities that are currently defined in mlir-translate.cpp for their entry point functions. Since mlir-translate is linking against individual translations, the translations cannot link against mlir-translate themselves. Extract out the utilities into a separate "Translation" library to avoid the potential dependency cycle. Individual translations link to that library to access TranslateRegistration. The mlir-translate tool links to individual translations and to the "Translation" library because it needs the utilities as well. The main header of the new library is located in include/mlir/Translation.h to make it easily accessible by translators. The rationale for putting it to include/mlir rather than to one of its subdirectories is that its purpose is similar to that of include/mlir/Pass.h so it makes sense to put them at the same level. PiperOrigin-RevId: 222398617	2019-03-29 14:06:19 -07:00
River Riddle	1cfe508316	Add verifier check for integer constants to check that the value can fit within the type bit width. PiperOrigin-RevId: 222335526	2019-03-29 14:05:48 -07:00
River Riddle	58cd315a68	Remove unnecessary include from StandardOps.cpp. PiperOrigin-RevId: 222316745	2019-03-29 14:05:34 -07:00
Uday Bondhugula	b6c03917ad	Remove allocations for memref's that become dead as a result of double buffering in the auto DMA overlap pass. This is done online in the pass. PiperOrigin-RevId: 222313640	2019-03-29 14:05:19 -07:00
Feng Liu	a9d3e5ee38	Adds ConstantFoldHook registry in MLIRContext This reverts the previous method which needs to create a new dialect with the constant fold hook from TensorFlow. This new method uses a function object in dialect to store the constant fold hook. Once a hook is registered to the dialect, this function object will be assigned when the dialect is added to the MLIRContext. For the operations which are not registered, a new method getRegisteredDialects is added to the MLIRContext to query the dialects which match their op name prefixes. PiperOrigin-RevId: 222310149	2019-03-29 14:04:34 -07:00
River Riddle	5041e13c96	Add functionality for erasing terminator successor operands and basic block arguments. PiperOrigin-RevId: 222303233	2019-03-29 14:04:19 -07:00
Nicolas Vasilache	87d46aaf4b	[MLIR][Vectorize] Refactor Vectorize use-def propagation. This CL refactors a few things in Vectorize.cpp: 1. a clear distinction is made between: a. the LoadOp are the roots of vectorization and must be vectorized eagerly and propagate their value; and b. the StoreOp which are the terminals of vectorization and must be vectorized late (i.e. they do not produce values that need to be propagated). 2. the StoreOp must be vectorized late because in general it can store a value that is not reachable from the subset of loads defined in the current pattern. One trivial such case is storing a constant defined at the top-level of the MLFunction and that needs to be turned into a splat. 3. a description of the algorithm is given; 4. the implementation matches the algorithm; 5. the last example is made parametric, in practice it will fully rely on the implementation of vector_transfer_read/write which will handle boundary conditions and padding. This will happen by lowering to a lower-level abstraction either: a. directly in MLIR (whether DMA or just loops or any async tasks in the future) (whiteboxing); b. in LLO/LLVM-IR/whatever blackbox library call/ search + swizzle inventor one may want to use; c. a partial mix of a. and b. (grey-boxing) 6. minor cleanups are applied; 7. mistakenly disabled unit tests are re-enabled (oopsie). 
With this CL, this MLIR snippet: ``` mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> { %A = alloc (%M, %N) : memref<?x?xf32> %B = alloc (%M, %N) : memref<?x?xf32> %C = alloc (%M, %N) : memref<?x?xf32> %f1 = constant 1.0 : f32 %f2 = constant 2.0 : f32 for %i0 = 0 to %M { for %i1 = 0 to %N { // non-scoped %f1 store %f1, %A[%i0, %i1] : memref<?x?xf32> } } for %i4 = 0 to %M { for %i5 = 0 to %N { %a5 = load %A[%i4, %i5] : memref<?x?xf32> %b5 = load %B[%i4, %i5] : memref<?x?xf32> %s5 = addf %a5, %b5 : f32 // non-scoped %f1 %s6 = addf %s5, %f1 : f32 store %s6, %C[%i4, %i5] : memref<?x?xf32> } } return %C : memref<?x?xf32> } ``` vectorized with these arguments: ``` -vectorize -virtual-vector-size 256 --test-fastest-varying=0 ``` vectorization produces this standard innermost-loop vectorized code: ``` mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> { %0 = alloc(%arg0, %arg1) : memref<?x?xf32> %1 = alloc(%arg0, %arg1) : memref<?x?xf32> %2 = alloc(%arg0, %arg1) : memref<?x?xf32> %cst = constant 1.000000e+00 : f32 %cst_0 = constant 2.000000e+00 : f32 for %i0 = 0 to %arg0 { for %i1 = 0 to %arg1 step 256 { %cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32> "vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> () } } for %i2 = 0 to %arg0 { for %i3 = 0 to %arg1 step 256 { %3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32> %4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32> %5 = addf %3, %4 : vector<256xf32> %cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32> %6 = addf %5, %cst_2 : vector<256xf32> "vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> () } } return %2 : memref<?x?xf32> } ``` Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for 
now. PiperOrigin-RevId: 222280209	2019-03-29 14:03:50 -07:00
Alex Zinenko	f986d5920b	ConvertToCFG: handle 1D affine loop bounds. In the general case, loop bounds can be expressed as affine maps of the outer loop iterators and function arguments. Relax the check for loop bounds to be known integer constants and also accept one-dimensional affine bounds in ConvertToCFG ForStmt lowering. Emit affine_apply operations for both the upper and the lower bound. The semantics of MLFunctions guarantees that both bounds can be computed before the loop starts iterating. Constant bounds are merely a short-hand notation for zero-dimensional affine maps and get supported transparently. Multidimensional affine bounds are not yet supported because the target IR dialect lacks min/max operations necessary to implement the corresponding semantics. PiperOrigin-RevId: 222275801	2019-03-29 14:03:20 -07:00
River Riddle	85f86ca203	Add support for getting the operand number from an IROperandImpl(InstOperand, BasicBlockOperand, StmtOperand). PiperOrigin-RevId: 222274598	2019-03-29 14:03:05 -07:00
Jacques Pienaar	d0590caa90	Add op stats pass to mlir-opt. op-stats pass currently returns the number of occurrences of different operations in a Module. Useful for verifying transformation properties (e.g., 3 ops of specific dialect, 0 of another), but probably not useful outside of that so keeping it local to mlir-opt. This does not consider op attributes when counting. PiperOrigin-RevId: 222259727	2019-03-29 14:02:46 -07:00
River Riddle	d63ab4b47a	Add support for Operation::moveBefore(Operation *). PiperOrigin-RevId: 222252521	2019-03-29 14:02:31 -07:00


804 Commits