llvm-project

Author	SHA1	Message	Date
MLIR Team	4eef795a1d	Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the computation slice. Updates LoopFusion pass to take these parameters as command line flags for experimentation. PiperOrigin-RevId: 226514297	2019-03-29 14:35:03 -07:00
River Riddle	1e0ebabf66	Unify type uniquing and construction. This allows us to decouple type uniquing/construction from MLIRContext and pave the way for dialect-specific types. To accomplish this we introduce two new classes, TypeUniquer and TypeStorageAllocator. * TypeUniquer is now responsible for all construction and uniquing of types. * TypeStorageAllocator is a utility used by derived type storage objects to allocate memory within an MLIRContext. This CL also standardizes what a derived type storage class needs to provide: - Define a type alias, KeyTy, to a type that uniquely identifies the instance of the type within its kind. * The key type must be constructible from the values passed into the detail::TypeUniquer::get call after the type kind. * The key type must have an llvm::DenseMapInfo specialization for hashing. - Provide a method, 'KeyTy getKey() const', to construct the key type from an existing storage instance. - Provide a construction method: 'DerivedStorage *construct(TypeStorageAllocator &, ...)' that builds a unique instance of the derived storage. The arguments after the TypeStorageAllocator must correspond with the values passed into the detail::TypeUniquer::get call after the type kind. PiperOrigin-RevId: 226507184	2019-03-29 14:34:46 -07:00
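The storage protocol described in this commit (a KeyTy alias, getKey(), and a construct() method, with uniquing keyed on KeyTy) can be sketched in plain C++. This is a minimal illustration under stated assumptions, not MLIR's code: IntegerTypeStorage and Uniquer are hypothetical names, a std::map stands in for the DenseMapInfo-based hashing, and a vector of unique_ptr stands in for TypeStorageAllocator.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical derived storage following the protocol described above.
struct IntegerTypeStorage {
  // The bit width alone identifies an integer type instance.
  using KeyTy = unsigned;

  explicit IntegerTypeStorage(unsigned width) : width(width) {}

  // Reconstructs the key from an existing storage instance.
  KeyTy getKey() const { return width; }

  // Builds a new instance; the vector stands in for TypeStorageAllocator
  // and owns the allocation.
  static IntegerTypeStorage *
  construct(std::vector<std::unique_ptr<IntegerTypeStorage>> &allocator,
            KeyTy key) {
    allocator.push_back(std::make_unique<IntegerTypeStorage>(key));
    return allocator.back().get();
  }

  unsigned width;
};

// Hypothetical uniquer: returns the existing instance for a key, or
// constructs and records a new one.
template <typename Storage>
class Uniquer {
public:
  template <typename... Args>
  Storage *get(Args &&...args) {
    // The key must be constructible from the arguments passed to get().
    typename Storage::KeyTy key(std::forward<Args>(args)...);
    auto it = instances.find(key);
    if (it != instances.end())
      return it->second;
    Storage *storage = Storage::construct(allocator, key);
    assert(storage->getKey() == key && "construct() must honor the key");
    instances.emplace(key, storage);
    return storage;
  }

private:
  std::vector<std::unique_ptr<Storage>> allocator;
  std::map<typename Storage::KeyTy, Storage *> instances;
};
```

Two get() calls with the same key return the same pointer, which is what lets types be compared by pointer identity.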
Jacques Pienaar	7e24010382	Expand rewriter gen to handle string attributes in output. * Extend to handle rewrite patterns with output attributes; - Constant attributes are defined with a value and a type; - The type of the value is mapped to the corresponding attribute type (string -> StringAttr); * Verify that the types of the operands in the resultant op match the defined op's operands; PiperOrigin-RevId: 226468908	2019-03-29 14:34:31 -07:00
Jacques Pienaar	592dbc8326	Add method to retrieve a pass's ID. Add passID member to Pass and enable querying it. PiperOrigin-RevId: 226445431	2019-03-29 14:34:17 -07:00
MLIR Team	bcb7c4742d	Do proper indexing for local variables when building access function equality constraints (working on test cases). PiperOrigin-RevId: 226399089	2019-03-29 14:34:02 -07:00
MLIR Team	4f5ef1619e	Pass loop depth 1 to memref dependence check when constructing dependence constraints used to calculate computation slice for loop fusion. This is done so that the dominance check between ancestors of op statements from src/dst memref accesses will be run. PiperOrigin-RevId: 226350443	2019-03-29 14:33:46 -07:00
Jacques Pienaar	df90f000a8	Change attribute to be input argument. Change operands to arguments in Op and use it for both operands and attributes. This unifies the way that operands and attributes are specified and the intended way that matching/creating ops with attributes will look. Both can now be represented using the same dag structure (which also makes the ordering more explicit). Derived attributes are not considered as part of the arguments (as they are inferred from the created op, not something needed to create it). * Generate named operand accessors; * Simplified the way of specifying Attr and use ElementAttr for TFL_Const instead. * Fix an incorrect generated assertion; The input parsing can be made more robust, I'll address that in a follow up. PiperOrigin-RevId: 226307424	2019-03-29 14:33:31 -07:00
MLIR Team	2570fb5bb7	Address some issues from memref dependence check bug (b/121216762), adds test cases. PiperOrigin-RevId: 226277453	2019-03-29 14:33:17 -07:00
MLIR Team	6892ffb896	Improve loop fusion algorithm by using a memref dependence graph. Fixed TODO for reduction fusion unit test. PiperOrigin-RevId: 226277226	2019-03-29 14:33:02 -07:00
Uday Bondhugula	14d2618f63	Simplify memref-dependence-check's metadata structures / drop duplication and reuse existing ones. - drop IterationDomainContext, redundant since FlatAffineConstraints has MLValue information associated with its dimensions. - refactor to use existing support - leads to a reduction in LOC - as a result of these changes, non-constant loop bounds get naturally supported for dep analysis. - update test cases to include a couple with non-constant loop bounds - rename addBoundsFromForStmt -> addForStmtDomain - complete TODO for getLoopIVs (handle 'if' statements) PiperOrigin-RevId: 226082008	2019-03-29 14:32:46 -07:00
Uday Bondhugula	1d72f2e47e	Update / complete a TODO for addBoundsForForStmt - when adding constraints from a 'for' stmt into FlatAffineConstraints, correctly add bound operands of the 'for' stmt as a dimensional identifier or a symbolic identifier depending on whether the bound operand is a valid MLFunction symbol - update test case to exercise this. PiperOrigin-RevId: 225988511	2019-03-29 14:32:31 -07:00
Alex Zinenko	49c81ebcb0	Densify storage for f16, f32 and support f16 semantics in FloatAttrs Existing implementation always uses 64 bits to store floating point values in DenseElementsAttr. This was due to FloatAttrs always using a `double` for storage independently of the actual type. Recent commits added support for FloatAttrs with the proper f32 type and floating semantics and changed the bitwidth reporting on FloatType. Use the existing infrastructure for densely storing 16 and 32-bit values in DenseElementsAttr storage to store f16 and f32 values. Move floating semantics definition to the FloatType level. Properly support f16 / IEEEhalf semantics at the FloatAttr level and in the builder. Note that bf16 is still stored as a 64-bit value with IEEEdouble semantics because APFloat does not have first-class support for bf16 types. PiperOrigin-RevId: 225981289	2019-03-29 14:32:14 -07:00
Uday Bondhugula	20531932f4	Refactor/update memref-dep-check's addMemRefAccessConstraints and addDomainConstraints; add support for mod/div for dependence testing. - add support for mod/div expressions in dependence analysis - refactor addMemRefAccessConstraints to use getFlattenedAffineExprs (instead of getFlattenedAffineExpr); update addDomainConstraints. - rename AffineExprFlattener::cst -> localVarCst PiperOrigin-RevId: 225933306	2019-03-29 14:31:58 -07:00
Alex Zinenko	4dbd94b543	Refactor LowerVectorTransfersPass using pattern rewriters This introduces a generic lowering pass for ML functions. The pass is parameterized by template arguments defining individual pattern rewriters. Concrete lowering passes define individual pattern rewriters and inherit from the generic class that takes care of allocating rewriters, traversing ML functions and performing the actual rewrite. While this is similar to the greedy pattern rewriter available in Transform/Utils, it requires adjustments due to the ML/CFG duality. In particular, ML function rewriters must be able to create statements, not only operations, and need access to an MLFuncBuilder. When we move to using the unified function type, the ML-specific rewriting will become unnecessary. Use LowerVectorTransfers as a testbed for the generic pass. PiperOrigin-RevId: 225887424	2019-03-29 14:31:43 -07:00
Alex Zinenko	699a2f5373	LLVM IR lowering: support vector_type_cast Introduce support for lowering vector_type_cast to LLVM IR. It consists of creating a new MemRef descriptor whose base pointer has the type that corresponds to the lowered element type of the target memref. Since `vector_type_cast` does not support dynamic shapes in the target type, no dynamic size conversion is necessary. This commit goes in the opposite direction of what is expected of LLVM IR lowering: it should not be aware of all the other dialects. Instead, we should have separate definitions for conversions in a global lowering framework. However, this requires the LLVM dialect to be implemented, which is currently blocked by the absence of user-defined types. Implement the lowering anyway to unblock end-to-end vectorization experiments. PiperOrigin-RevId: 225887368	2019-03-29 14:31:28 -07:00
Alex Zinenko	51c8a095a3	Materialize vector_type_cast operation in the SuperVector dialect This operation is produced and used by the super-vectorization passes and has been emitted as an abstract unregistered operation until now. For end-to-end testing purposes, it has to be eventually lowered to LLVM IR. Matching an abstract operation by name goes in the opposite direction of the generic lowering approach that is expected to be used for LLVM IR lowering in the future. Register vector_type_cast operation as a part of the SuperVector dialect. Arguably, this operation is a special case of the `view` operation from the Standard dialect. The semantics of `view` are not fully specified at this point so it is safer to rely on a custom operation. Additionally, using a custom operation may help to achieve clear dialect separation. PiperOrigin-RevId: 225887305	2019-03-29 14:31:13 -07:00
Uday Bondhugula	19b2ce23a5	Refactor / eliminate duplicate code in memref-dep-check / getIterationDomainContext PiperOrigin-RevId: 225857762	2019-03-29 14:30:58 -07:00
Alex Zinenko	df9bd857b1	Type system: replace Type::getBitWidth with getIntOrFloatBitWidth As MLIR moves towards dialect-specific types, a generic Type::getBitWidth does not make sense for all of them. Even with the current type system, the bit width is not defined (and causes the method in question to abort) for all TensorFlow types. This commit restricts the bit width definition to primitive standard types that have a number of bits appearing verbatim in their type, i.e., integers and floats. As a side effect, it delegates the decision on the bit width of the `index` type to the backends. Existing backends currently hardcode it to 64 bits. The Type::getBitWidth method is replaced by Type::getIntOrFloatBitWidth that only applies to integers and floats. The call sites are updated to use the new method, where applicable, or rewritten so as not to rely on it. Incidentally, this fixes a utility method that did not account for memrefs being allowed to have vectors as element types in the size computation. As an observation, several places in the code use Type in places where a more specific type could be used instead. Some of those are fixed by this commit. PiperOrigin-RevId: 225844792	2019-03-29 14:30:43 -07:00
Uday Bondhugula	4a3e4e8ea7	loop-unroll - add function callback argument for outside targets to provide unroll factors, and a cmd line argument to specify number of innermost loop unroll repetitions. - add function callback parameter for outside targets to provide unroll factors - add a cmd line parameter to repeatedly apply innermost loop unroll a certain number of times (to avoid using -loop-unroll -loop-unroll ...; instead -unroll-num-reps=2). - implement the callback for a target - update test cases / usage PiperOrigin-RevId: 225843191	2019-03-29 14:30:28 -07:00
MLIR Team	3b69230b3a	Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion. *) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality. *) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nest's IVs and symbols using the dependence polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop. *) Adds utility function 'insertMemRefComputationSlice' which computes and inserts a computation slice from the loop nest surrounding a source memref access into the loop nest surrounding the destination memref access. *) Adds FlatAffineConstraints::toAffineMap function which returns an AffineMap which represents an equality constraint where one dimension identifier is represented as a function of all others in the equality constraint. *) Adds multiple fusion unit tests. PiperOrigin-RevId: 225842944	2019-03-29 14:30:13 -07:00
Jacques Pienaar	49c4d2a630	Fix builder getFloatAttr of double to use F64 type and use fltSemantics in FloatAttr. Store FloatAttr using more appropriate fltSemantics (mostly fixing up F32/F64 storage, F16/BF16 pending). Previously F32 type was used incorrectly for double (the storage was double). Also add query method that returns fltSemantics for IEEE fp types and use that to verify that the APFloat given matches the type: * FloatAttr created using APFloat is verified to have APFloat semantics matching the semantics of its type; * FloatAttr created using double has the APFloat created to match the semantics of the type; Change parsing of tensor negative splat element to pass in the element type expected. Misc other changes to account for the storage type matching the attribute. PiperOrigin-RevId: 225821834	2019-03-29 14:29:58 -07:00
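A simplified sketch of the core step behind toAffineMap: solving one equality for a single identifier in terms of the others. The dense row layout [a_0, ..., a_{n-1}, c] and the name solveForIdentifier are assumptions for illustration; the sketch only handles a coefficient of +1 or -1 on the solved identifier, so the result stays an affine function with integer coefficients, and MLIR's actual function may cover more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical dense row for a_0*x_0 + ... + a_{n-1}*x_{n-1} + c == 0:
// identifier coefficients first, constant term last.
using Row = std::vector<int64_t>;

// Solves the equality for identifier `pos`. On success returns
// [b_0, ..., b_{n-1}, d] with b_pos == 0 such that x_pos == sum_i b_i*x_i + d.
// Only a coefficient of +1 or -1 on x_pos is handled, so the result keeps
// integer coefficients.
std::optional<Row> solveForIdentifier(const Row &eq, std::size_t pos) {
  assert(pos + 1 < eq.size() && "pos must name an identifier, not the constant");
  const int64_t a = eq[pos];
  if (a != 1 && a != -1)
    return std::nullopt;
  Row result(eq.size(), 0);
  for (std::size_t i = 0; i < eq.size(); ++i)
    if (i != pos)
      result[i] = -eq[i] / a; // Exact: dividing by +1 or -1.
  return result;
}
```

For example, the dependence equality i_src - i_dst - 1 == 0 solved for i_src gives i_src == i_dst + 1, which is the kind of function that would become a bound of the fused source loop.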
Lei Zhang	72159f5ede	Free the name symbol in TableGen Renamed the name field in Op to opName since it is the opcode's name. Renamed the name parameters in TFLite op templates to opSummary since they are meant as a summary of the op's functionality. We will use the name symbol later for the name given by users via TF. PiperOrigin-RevId: 225807135	2019-03-29 14:29:44 -07:00
Uday Bondhugula	dced746bd1	Remove duplicate code / reuse right utilities from memref-dep-check / loop-tile - use addBoundsForForStmt - getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the returned things aren't owned / part of the stmt on which it's being called. - other minor API cleanup PiperOrigin-RevId: 225774301	2019-03-29 14:29:28 -07:00
Uday Bondhugula	c41ee60647	'memref-bound-check': extend to store op's as well - extend memref-bound-check to store op's - make the bound check an analysis util and move to lib/Analysis/Utils.cpp (so that one doesn't need to always create a pass to use it) PiperOrigin-RevId: 225564830	2019-03-29 14:29:13 -07:00
Alex Zinenko	bc52a639f9	Extract vector_transfer_* Ops into a SuperVectorDialect. From the beginning, vector_transfer_read and vector_transfer_write operations were intended as a mid-level vectorization abstraction. In particular, they are lowered to the StandardOps dialect before further processing. As such, it does not make sense to keep them at the same level as StandardOps. Introduce the new SuperVectorOps dialect and move vector_transfer_* operations there. This will be used as a testbed for the generic lowering/legalization pass. PiperOrigin-RevId: 225554492	2019-03-29 14:28:58 -07:00
Jacques Pienaar	30a30d205b	Fix asan failures in mlir-op-gen. PiperOrigin-RevId: 225532488	2019-03-29 14:28:44 -07:00
Jacques Pienaar	7a62e35644	Use dag instead of list for operands to allow named operands. Named operands allow generating builders with more meaningful names and lay the groundwork for allowing the specification of attributes as part of the inputs pattern of an op (which allows the declarative pattern rewrite generator to define ops with attributes). This is a minimal change that just changes how input operands are represented; changes to attributes will follow in a follow-up, and returnTypes later. PiperOrigin-RevId: 225509805	2019-03-29 14:28:29 -07:00
Uday Bondhugula	45a0f52519	Expression flattening improvement - reuse local expressions. - if a local id was already added for a specific mod/div expression, just reuse it if the expression repeats (instead of adding a new one). - drastically reduces the number of local variables added during flattening for real use cases - since the same div and mod expressions often repeat. - add getFlattenedAffineExprs for AffineMap, IntegerSet based on the above As a natural result of the above: - FlatAffineConstraints(IntegerSet) ctor now deals with integer sets that have mod and div constraints as well, and these also get simplified by -simplify-affine-structures PiperOrigin-RevId: 225452174	2019-03-29 14:28:13 -07:00
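Reusing a local for a repeated mod/div amounts to memoizing locals by the (flattened dividend, divisor) pair. The sketch below uses a hypothetical LocalVarTable class, not MLIR's AffineExprFlattener; it relies on the identity e mod c == e - c * (e floordiv c), so a mod and a floordiv with the same dividend and divisor can share one local.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical table of local variables introduced while flattening.
// Each local q stands for q == dividend floordiv divisor, where `dividend`
// is the flattened coefficient vector of the dividend expression.
class LocalVarTable {
public:
  // Returns the id of the local for (dividend floordiv divisor), adding a
  // new local only the first time this pair is seen. Since
  // e mod c == e - c * (e floordiv c), a mod with the same operands can be
  // expressed through the same local.
  unsigned getOrAddFloorDiv(const std::vector<int64_t> &dividend,
                            int64_t divisor) {
    assert(divisor > 0 && "expected a positive constant divisor");
    auto key = std::make_pair(dividend, divisor);
    auto it = ids.find(key);
    if (it != ids.end())
      return it->second;
    const unsigned id = static_cast<unsigned>(ids.size());
    ids.emplace(std::move(key), id);
    return id;
  }

  unsigned numLocals() const { return static_cast<unsigned>(ids.size()); }

private:
  std::map<std::pair<std::vector<int64_t>, int64_t>, unsigned> ids;
};
```

With expressions like the repeated `(d0 mod 294912)` subterms quoted in a later commit, each distinct (dividend, divisor) pair would add a single local no matter how often it recurs.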
Feng Liu	b0c41e54ef	Convert tf.FakeQuantWithMinMaxArgs/Vars to tfl.FakeQuant - Define tf.FakeQuantWithMinMaxArgs and tf.FakeQuantWithMinMaxVars - Add the unit tests for valid and invalid IRs - Rewrite both to the tfl.FakeQuant op - Add the unit tests for the rewriting PiperOrigin-RevId: 225447109	2019-03-29 14:27:58 -07:00
Feng Liu	a138c12cb3	Define TFLite Dequantize and FakeQuant ops Besides the ops.td file changes to define both ops, this CL also changes the mlir-op-gen to allow more flexible traits definition for "optional" operation inputs. Unit tests are added. One TODO for the mlir-op-gen is to make attribute optional in the ops. PiperOrigin-RevId: 225408349	2019-03-29 14:27:43 -07:00
Uday Bondhugula	8365bdc17f	FlatAffineConstraints - complete TODOs: add method to remove duplicate / trivially redundant constraints. Update projectOut to eliminate identifiers in a more efficient order. Fix b/120801118. - add method to remove duplicate / trivially redundant constraints from FlatAffineConstraints (use a hashing-based approach with DenseSet) - update projectOut to eliminate identifiers in a more efficient order (A sequence of affine_apply's like this (from a real use case) finally exposed the lack of the above trivial/low-hanging simplifications). for %ii = 0 to 64 { for %jj = 0 to 9 { %a0 = affine_apply (d0, d1) -> (d0 * (9 * 1024) + d1 * 128) (%ii, %jj) %a1 = affine_apply (d0) -> (d0 floordiv (2 * 3 * 3 * 128 * 128), (d0 mod 294912) floordiv (3 * 3 * 128 * 128), (((d0 mod 294912) mod 147456) floordiv 1152) floordiv 8, (((d0 mod 294912) mod 147456) mod 1152) floordiv 384, ((((d0 mod 294912) mod 147456) mod 1152) mod 384) floordiv 128, (((((d0 mod 294912) mod 147456) mod 1152) mod 384) mod 128) floordiv 128) (%a0) %v0 = load %in[%a1#0, %a1#1, %a1#3, %a1#4, %a1#2, %a1#5] : memref<2x2x3x3x16x1xi32> } } - update FlatAffineConstraints::print to print number of constraints. PiperOrigin-RevId: 225397480	2019-03-29 14:27:29 -07:00
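The hashing-based deduplication mentioned in this commit can be sketched with std::unordered_set in place of llvm::DenseSet. This hypothetical removeDuplicateRows only drops exact duplicate rows; detecting the "trivially redundant" constraints the commit also mentions would need additional checks not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical dense constraint row: identifier coefficients, then the constant.
using Row = std::vector<int64_t>;

// Mixes the hash of every coefficient into one value (hash_combine style).
struct RowHash {
  std::size_t operator()(const Row &row) const {
    std::size_t h = row.size();
    for (int64_t v : row)
      h ^= std::hash<int64_t>{}(v) + 0x9e3779b9 + (h << 6) + (h >> 2);
    return h;
  }
};

// Removes exact duplicate rows, keeping the first occurrence of each and
// preserving the original order.
void removeDuplicateRows(std::vector<Row> &rows) {
  std::unordered_set<Row, RowHash> seen;
  std::vector<Row> kept;
  for (const Row &row : rows) {
    assert(row.size() == rows.front().size() && "rows must have equal width");
    if (seen.insert(row).second)
      kept.push_back(row);
  }
  rows = std::move(kept);
}
```

Hashing keeps the pass linear in the number of rows on average, rather than comparing every pair of constraints.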
River Riddle	5c4f1fdd42	Check if the operation is already in the worklist before adding it. PiperOrigin-RevId: 225379496	2019-03-29 14:27:14 -07:00
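One common way to get the membership check described here is to pair the worklist vector with a set of pending operations. The Worklist class and the Operation stand-in below are hypothetical; the commit does not say which data structure MLIR's rewriter uses.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an operation; only its address is used.
struct Operation {};

class Worklist {
public:
  // Adds `op` unless it is already pending; returns true if it was added.
  bool push(Operation *op) {
    assert(op && "expected a non-null operation");
    if (!pending.insert(op).second)
      return false;
    stack.push_back(op);
    return true;
  }

  // Removes and returns the most recently added pending operation, or
  // nullptr if the worklist is empty.
  Operation *pop() {
    if (stack.empty())
      return nullptr;
    Operation *op = stack.back();
    stack.pop_back();
    pending.erase(op);
    return op;
  }

  bool empty() const { return stack.empty(); }

private:
  std::vector<Operation *> stack;
  std::unordered_set<Operation *> pending;
};
```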
Uday Bondhugula	4860f0e8fd	Fix loop unrolling test cases - These test cases had to be updated after the switch to exclusive upper bounds; however, the test cases hadn't originally been written to check correctly; as a result, they didn't fail and weren't updated. Update test case and fix upper bound. PiperOrigin-RevId: 225194016	2019-03-29 14:26:56 -07:00
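With exclusive upper bounds, the trip count of `for %i = lb to ub step s` is ceildiv(ub - lb, s) when ub > lb, and zero otherwise; unrolling by a factor covers the largest multiple of that factor and leaves the rest to a cleanup loop. The helper names below are hypothetical and only illustrate that arithmetic, not MLIR's loop-unroll utilities.

```cpp
#include <cassert>
#include <cstdint>

// Trip count of `for %i = lb to ub step step` where `ub` is exclusive.
int64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "expected a positive step");
  if (ub <= lb)
    return 0;
  return (ub - lb + step - 1) / step; // ceildiv(ub - lb, step)
}

// First IV value not covered by a loop unrolled by `factor`: the unrolled
// loop runs the largest multiple of `factor` iterations, and any remaining
// iterations in [result, ub) would go to a cleanup loop.
int64_t cleanupLowerBound(int64_t lb, int64_t ub, int64_t step,
                          int64_t factor) {
  assert(factor > 0 && "expected a positive unroll factor");
  const int64_t tc = tripCount(lb, ub, step);
  return lb + (tc - tc % factor) * step;
}
```

An off-by-one between inclusive and exclusive bounds changes the trip count by one iteration, which is easy to miss if a test does not check the bounds of the generated loops.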
Alex Zinenko	359835eb27	LLVM IR lowering: support 1D vector operations Introduce initial support for 1D vector operations. LLVM does not support higher-dimensional vectors so the caller must make sure they don't appear in the input MLIR. Handle the presence of higher-dimensional vectors by failing gracefully. Introduce the type conversion for 1D vector types and hook it up with the rest of the type conversion system. Support "splat" constants for vector types. As a side effect, this refactors constant operation emission by separating out scalar integer constants into a separate case and by extracting out the helper function for scalar float construction. Existing binary operations apply to vectors transparently. PiperOrigin-RevId: 225172349	2019-03-29 14:26:37 -07:00
Alex Zinenko	97d2f3cd3d	ConvertToCFG: use affine_apply to implement loop steps Originally, loop steps were implemented using `addi` and `constant` operations because `affine_apply` was not handled in the first implementation. The support for `affine_apply` has been added, use it to implement the update of the loop induction variable. This is more consistent with the lower and upper bounds of the loop that are also implemented as `affine_apply`, removes the dependence of the converted function on the StandardOps dialect and makes it clear from the CFG function that all operations on the loop induction variable are purely affine. PiperOrigin-RevId: 225165337	2019-03-29 14:26:22 -07:00
Jacques Pienaar	a2222a9448	Add rudimentary pattern rewrite matching generation. * Start very basic (about as basic as possible) with the pattern rewrite generation by only - Matching single node dags, - Single output, single result, - No constraints on inputs/outputs. - No attributes (only operands) * The matcher generates C++ code akin to what is currently manually written. - This is very much not the final end state, and only intended for the short term; * Always generate the default builder method to make it easier to generate calls; - Also add additional builder method for TFL::Add as attributes are not yet supported; * Replace TF Add -> TFL Add matching using this generation; * Introduce a conceptual textual namespace in the op registry - Will allow importing multiple dialects' op registries - Avoids needing to do anything special with tablegen or define a custom DSL; = I really want to do a custom DSL but this urge could just be because it's fun :) So defer for now. From this structure we can dump out another structured form if needed; - Add a mapping from <namespace>_<op> in the op_gen and pattern rewrite gen = This allows placing ops in different namespaces from the same op registry which is convenient, esp. if we want to consider subnamespaces in future; * Update tfl namespace to TFL to match TF and XLA; PiperOrigin-RevId: 225155164	2019-03-29 14:26:07 -07:00
Uday Bondhugula	c86c414765	Remove dead code from FlatAffineConstraints - getDimensionBounds() was added initially for quick experimentation - no longer used (getConstantBoundOnDimSize is the more powerful/complete replacement). - FlatAffineConstraints::getConstantLower/UpperBound are incomplete, functionality/naming-wise misleading, and not used currently. Removing these; complete/fixed version will be added in an upcoming CL. PiperOrigin-RevId: 225075061	2019-03-29 14:25:52 -07:00
Lei Zhang	a9eb2e8ffc	Generate another op builder with aggregated parameters For each op, generate another builder with the following signature: static void build(Builder* builder, OperationState* result, ArrayRef<Type> resultTypes, ArrayRef<SSAValue*> args, ArrayRef<NamedAttribute> attributes); PiperOrigin-RevId: 225066007	2019-03-29 14:25:37 -07:00
Alex Zinenko	63261aa9a8	Disallow index types as elements of vector, memref and tensor types An extensive discussion demonstrated that it is difficult to support `index` types as elements of compound (vector, memref, tensor) types. In particular, their size is unknown until the target-specific lowering takes place. MLIR may need to store constants of the fixed-shape compound types (e.g., vector<4 x index>) internally and must know the size of the element type and data layout constraints. The same information is necessary for target-specific lowering and translation to reliably support compound types with `index` elements, but MLIR does not have a dedicated target description mechanism yet. The use cases for compound types with `index` elements, should they appear, can be handled via an `index_cast` operation that converts between `index` and fixed-size integer types at the SSA value level instead of the type level. PiperOrigin-RevId: 225064373	2019-03-29 14:25:22 -07:00
Uday Bondhugula	b9f53dc0bd	Update/Fix LoopUtils::stmtBodySkew to handle loop step. - loop step wasn't handled and there wasn't a TODO or an assertion; fix this. - rename 'delay' to shift for consistency/readability. - other readability changes. - remove duplicate attribute print for DmaStartOp; fix misplaced attribute print for DmaWaitOp - add build method for AddFOp (unrelated to this CL, but add it anyway) PiperOrigin-RevId: 224892958	2019-03-29 14:25:07 -07:00
Uday Bondhugula	d59a95a05c	Fix missing check for dependent DMAs in pipeline-data-transfer - adding a conservative check for now (TODO: use the dependence analysis pass once the latter is extended to deal with DMA ops). resolve an existing bug on a test case. - update test cases PiperOrigin-RevId: 224869526	2019-03-29 14:24:53 -07:00
Uday Bondhugula	6757fb151d	FlatAffineConstraints API cleanup; add normalizeConstraintsByGCD(). - add method normalizeConstraintsByGCD - call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of projectOut. - remove call to GCDTightenInequalities() from getMemRefRegion - change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each time an identifier is eliminated (to detect emptiness early). - make FourierMotzkinEliminate, gaussianEliminateId(s), GCDTightenInequalities() private - improve / update stale comments PiperOrigin-RevId: 224866741	2019-03-29 14:24:37 -07:00
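The GCD tightening that projectOut now runs can be shown on a single inequality over integer points: for sum_i a_i*x_i + c >= 0 with g = gcd(a_i), dividing the coefficients by g and replacing c with floor(c / g) keeps exactly the same integer solutions. The row layout and function names below are assumptions for illustration, not the FlatAffineConstraints implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <numeric>
#include <vector>

// Hypothetical dense row for a_0*x_0 + ... + a_{n-1}*x_{n-1} + c >= 0:
// identifier coefficients first, constant term last.
using Row = std::vector<int64_t>;

// Floor division for a positive divisor (C++ '/' truncates toward zero).
int64_t floorDiv(int64_t lhs, int64_t rhs) {
  assert(rhs > 0 && "divisor must be positive");
  const int64_t q = lhs / rhs;
  return (lhs % rhs != 0 && lhs < 0) ? q - 1 : q;
}

// Tightens an inequality over integer points: divides the identifier
// coefficients by their GCD g and replaces the constant c with floor(c / g).
void gcdTightenInequality(Row &row) {
  assert(!row.empty() && "expected at least the constant term");
  int64_t g = 0;
  for (std::size_t i = 0; i + 1 < row.size(); ++i)
    g = std::gcd(g, std::abs(row[i]));
  if (g <= 1)
    return; // Already tight, or all identifier coefficients are zero.
  for (std::size_t i = 0; i + 1 < row.size(); ++i)
    row[i] /= g;
  row.back() = floorDiv(row.back(), g);
}
```

For example, 2*x + 4*y - 3 >= 0 becomes x + 2*y - 2 >= 0: both admit exactly the integer points with x + 2*y >= 2, but the tightened form lets later elimination steps detect emptiness earlier.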
Uday Bondhugula	2ef57806ba	Update/fix -pipeline-data-transfer; fix b/120770946 - fix replaceAllMemRefUsesWith call to replace only inside loop body. - handle the case where DMA buffers are dynamic; extend doubleBuffer() method to handle dynamically shaped DMA buffers (pass the right operands to AllocOp) - place alloc's for DMA buffers at the depth at which pipelining is being done (instead of at top-level) - add more test cases PiperOrigin-RevId: 224852231	2019-03-29 14:24:22 -07:00
Alex Zinenko	073c3ad997	Properly namespace createLowerAffineApply This was missing from the original commit. The implementation of createLowerAffineApply was defined in the default namespace but declared in the `mlir` namespace, which could lead to linking errors when it was used. Put the definition in `mlir` namespace. PiperOrigin-RevId: 224830894	2019-03-29 14:24:04 -07:00
Nicolas Vasilache	c28aeef901	[MLIR] Drop bug-prone global map indexed by MLFunction* PiperOrigin-RevId: 224610805	2019-03-29 14:23:49 -07:00
Uday Bondhugula	2d6478fa92	Extend loop tiling utility to handle non-constant loop bounds and bounds that are a max/min of several expressions. - Extend loop tiling to handle non-constant loop bounds and bounds that are a max/min of several expressions, i.e., bounds using multi-result affine maps - also fix b/120630124 as a result (the IR was in an invalid state when tiled loop generation failed; SSA uses were created that weren't plugged into the IR). PiperOrigin-RevId: 224604460	2019-03-29 14:23:34 -07:00
Uday Bondhugula	dfc752e42b	Generate strided DMAs from -dma-generate - generate DMAs correctly now using strided DMAs where needed - add support for multi-level/nested strides; op still supports one level of stride for now. Other things - add test case for symbolic lower/upper bound; cases where the DMA buffer size can't be bounded by a known constant - add test case for dynamic shapes where the DMA buffers are however bounded by constants - refactor some of the '-dma-generate' code PiperOrigin-RevId: 224584529	2019-03-29 14:23:19 -07:00
Nicolas Vasilache	d9b6420fc9	[MLIR] Add LowerVectorTransfersPass This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp to a simple loop nest via local buffer allocations. This is an MLIR->MLIR lowering based on builders. A few TODOs are left to address, in particular: 1. invert the permutation map so the accesses to the remote memref are coalesced; 2. pad the alloc for bank conflicts in local memory (e.g. GPU shared memory); 3. support broadcast / avoid copies when permutation_map is not of full column rank; 4. add a proper "element_cast" op. One notable limitation is that this pass does not plan to support boundary conditions. It should be significantly easier to use pre-baked MLIR functions to handle such paddings. This is left for future consideration. Therefore the current CL only works properly for full-tile cases at the moment. This CL also adds 2 simple tests: ```mlir for %i0 = 0 to %M step 3 { for %i1 = 0 to %N step 4 { for %i2 = 0 to %O { for %i3 = 0 to %P step 5 { vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index ``` lowers into: ```mlir for %i0 = 0 to %arg0 step 3 { for %i1 = 0 to %arg1 step 4 { for %i2 = 0 to %arg2 { for %i3 = 0 to %arg3 step 5 { %1 = alloc() : memref<5x4x3xf32> %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>> store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>> for %i4 = 0 to 5 { %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4) for %i5 = 0 to 4 { %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5) for %i6 = 0 to 3 { %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6) %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32> store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32> dealloc %1 : memref<5x4x3xf32> ``` and ```mlir for %i0 = 0 to %M step 3 { for %i1 = 0 to %N { for %i2 = 0 to %O { for %i3 = 0 to %P step 5 { %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32> ``` lowers into: ```mlir for %i0 = 0 to %arg0 step 3 { for %i1 = 0 to %arg1 { for %i2 = 0 to %arg2 { for %i3 = 0 to %arg3 step 5 { %1 = alloc() : memref<5x4x3xf32> %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>> for %i4 = 0 to 5 { %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4) for %i5 = 0 to 4 { for %i6 = 0 to 3 { %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6) %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32> store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32> %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>> dealloc %1 : memref<5x4x3xf32> ``` PiperOrigin-RevId: 224552717	2019-03-29 14:23:05 -07:00
Nicolas Vasilache	879be718a0	[MLIR] Fix the name of the MaterializeVectorPass PiperOrigin-RevId: 224536381	2019-03-29 14:22:49 -07:00
Nicolas Vasilache	db1b9f7381	[MLIR] Add composeWithUnboundedMap This CL adds a finer grain composition function between AffineExpr and an unbounded map. This will be used in the next CL. Also cleans up some comments remaining from a previous CL. PiperOrigin-RevId: 224536314	2019-03-29 14:22:34 -07:00

550 Commits