Commit Graph

550 Commits

Author SHA1 Message Date
MLIR Team 4eef795a1d Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the compution slice.
Updates LoopFusion pass to take these parameters as command line flags for experimentation.

PiperOrigin-RevId: 226514297
2019-03-29 14:35:03 -07:00
River Riddle 1e0ebabf66 Unify type uniquing and construction.
This allows for us to decouple type uniquing/construction from MLIRContext and pave the way for dialect specific types.

To accomplish this we two new classes, TypeUniquer and TypeStorageAllocator.

* TypeUniquer is now responsible for all construction and uniquing of types.
* TypeStorageAllocator is a utility used by derived type storage objects to allocate memory within an MLIRContext.

This cl also standardizes what a derived type storage class needs to provide:
    - Define a type alias, KeyTy, to a type that uniquely identifies the
      instance of the type within its kind.
      * The key type must be constructible from the values passed into the
        detail::TypeUniquer::get call after the type kind.
      * The key type must have a llvm::DenseMapInfo specialization for
        hashing.

    - Provide a method, 'KeyTy getKey() const', to construct the key type
      from an existing storage instance.

    - Provide a construction method:
        'DerivedStorage *construct(TypeStorageAllocator &, ...)'
      that builds a unique instance of the derived storage. The arguments
      after the TypeStorageAllocator must correspond with the values passed
      into the detail::TypeUniquer::get call after the type kind.

PiperOrigin-RevId: 226507184
2019-03-29 14:34:46 -07:00
Jacques Pienaar 7e24010382 Expand rewriter gen to handle string attributes in output.
* Extend to handle rewrite patterns with output attributes;
  - Constant attributes are defined with a value and a type;
  - The type of the value is mapped to the corresponding attribute type (string -> StringAttr);
* Verifies the type of operands in the resultant matches the defined op's operands;

PiperOrigin-RevId: 226468908
2019-03-29 14:34:31 -07:00
Jacques Pienaar 592dbc8326 Add method to retrieve a pass's ID.
Add passID member to Pass and enable querying it.

PiperOrigin-RevId: 226445431
2019-03-29 14:34:17 -07:00
MLIR Team bcb7c4742d Do proper indexing for local variables when building access function equality constraints (working on test cases).
PiperOrigin-RevId: 226399089
2019-03-29 14:34:02 -07:00
MLIR Team 4f5ef1619e Pass loop depth 1 to memref dependence check when constructing dependence constraints used to calculate computation slice for loop fusion.
This done so that the dominance check between ancestors of op statements from src/dst memref accesses will be run.

PiperOrigin-RevId: 226350443
2019-03-29 14:33:46 -07:00
Jacques Pienaar df90f000a8 Change attribute to be input argument.
Change operands to arguments in Op and use it for both operands and arguments. This unifies the way that operands and attributes are specified and the intended way that matching/creating ops with attributes will look. Both can now be represented using the same dag structure (and also makes the ordering more explicit). Derived attributes are not considered as part of the arguments (as they are inferred from the created op, not something needed to created it).
* Generate named operand accessors;
* Simplified the way of specifying Attr and use ElementAttr for TFL_Const instead.
* Fix a incorrect assertion generated;

The input parsing can be made more robust, I'll address that in a follow up.

PiperOrigin-RevId: 226307424
2019-03-29 14:33:31 -07:00
MLIR Team 2570fb5bb7 Address some issues from memref dependence check bug (b/121216762), adds tests cases.
PiperOrigin-RevId: 226277453
2019-03-29 14:33:17 -07:00
MLIR Team 6892ffb896 Improve loop fusion algorithm by using a memref dependence graph.
Fixed TODO for reduction fusion unit test.

PiperOrigin-RevId: 226277226
2019-03-29 14:33:02 -07:00
Uday Bondhugula 14d2618f63 Simplify memref-dependence-check's meta data structures / drop duplication and
reuse existing ones.

- drop IterationDomainContext, redundant since FlatAffineConstraints has
  MLValue information associated with its dimensions.
- refactor to use existing support
- leads to a reduction in LOC
- as a result of these changes, non-constant loop bounds get naturally
  supported for dep analysis.
- update test cases to include a couple with non-constant loop bounds
- rename addBoundsFromForStmt -> addForStmtDomain
- complete TODO for getLoopIVs (handle 'if' statements)

PiperOrigin-RevId: 226082008
2019-03-29 14:32:46 -07:00
Uday Bondhugula 1d72f2e47e Update / complete a TODO for addBoundsForForStmt
- when adding constraints from a 'for' stmt into FlatAffineConstraints,
  correctly add bound operands of the 'for' stmt as a dimensional identifier or
  a symbolic identifier depending on whether the bound operand is a valid
  MLFunction symbol
- update test case to exercise this.

PiperOrigin-RevId: 225988511
2019-03-29 14:32:31 -07:00
Alex Zinenko 49c81ebcb0 Densify storage for f16, f32 and support f16 semantics in FloatAttrs
Existing implementation always uses 64 bits to store floating point values in
DenseElementsAttr.  This was due to FloatAttrs always a `double` for storage
independently of the actual type.  Recent commits added support for FloatAttrs
with the proper f32 type and floating semantics and changed the bitwidth
reporting on FloatType.

Use the existing infrastructure for densely storing 16 and 32-bit values in
DenseElementsAttr storage to store f16 and f32 values.  Move floating semantics
definition to the FloatType level.  Properly support f16 / IEEEhalf semantics
at the FloatAttr level and in the builder.

Note that bf16 is still stored as a 64-bit value with IEEEdouble semantics
because APFloat does not have first-class support for bf16 types.

PiperOrigin-RevId: 225981289
2019-03-29 14:32:14 -07:00
Uday Bondhugula 20531932f4 Refactor/update memref-dep-check's addMemRefAccessConstraints and
addDomainConstraints; add support for mod/div for dependence testing.

- add support for mod/div expressions in dependence analysis
- refactor addMemRefAccessConstraints to use getFlattenedAffineExprs (instead
  of getFlattenedAffineExpr); update addDomainConstraints.
- rename AffineExprFlattener::cst -> localVarCst

PiperOrigin-RevId: 225933306
2019-03-29 14:31:58 -07:00
Alex Zinenko 4dbd94b543 Refactor LowerVectorTransfersPass using pattern rewriters
This introduces a generic lowering pass for ML functions.  The pass is
parameterized by template arguments defining individual pattern rewriters.
Concrete lowering passes define individual pattern rewriters and inherit from
the generic class that takes care of allocating rewriters, traversing ML
functions and performing the actual rewrite.

While this is similar to the greedy pattern rewriter available in
Transform/Utils, it requires adjustments due to the ML/CFG duality.  In
particular, ML function rewriters must be able to create statements, not only
operations, and need access to an MLFuncBuilder.  When we move to using the
unified function type, the ML-specific rewriting will become unnecessary.

Use LowerVectorTransfers as a testbed for the generic pass.

PiperOrigin-RevId: 225887424
2019-03-29 14:31:43 -07:00
Alex Zinenko 699a2f5373 LLVM IR lowering: support vector_type_cast
Introduce support for lowering vector_type_cast to LLVM IR.  It consists in
creating a new MemRef descriptor with the base pointer with the type that
corresponds to the lowered element type of the target memref.  Since
`vector_type_cast` does not support dynamic shapes in the target type, no
dynamic size conversion is necessary.

This commit goes in the opposite direction of what is expected of LLVM IR
lowering: it should not be aware of all the other dialects.  Instead, we should
have separate definitions for conversions in a global lowering framework.
However, this requires LLVM dialect to be implemented, which is currently
blocked by the absence of user-defined types.  Implement the lowering anyway to
unblock end-to-end vectorization experiments.
PiperOrigin-RevId: 225887368
2019-03-29 14:31:28 -07:00
Alex Zinenko 51c8a095a3 Materialize vector_type_cast operation in the SuperVector dialect
This operation is produced and used by the super-vectorization passes and has
been emitted as an abstract unregistered operation until now.  For end-to-end
testing purposes, it has to be eventually lowered to LLVM IR.  Matching
abstract operation by name goes into the opposite direction of the generic
lowering approach that is expected to be used for LLVM IR lowering in the
future.  Register vector_type_cast operation as a part of the SuperVector
dialect.

Arguably, this operation is a special case of the `view` operation from the
Standard dialect.  The semantics of `view` is not fully specified at this point
so it is safer to rely on a custom operation.  Additionally, using a custom
operation may help to achieve clear dialect separation.

PiperOrigin-RevId: 225887305
2019-03-29 14:31:13 -07:00
Uday Bondhugula 19b2ce23a5 Refactor / eliminate duplicate code in
memref-dep-check / getIterationDomainContext

PiperOrigin-RevId: 225857762
2019-03-29 14:30:58 -07:00
Alex Zinenko df9bd857b1 Type system: replace Type::getBitWidth with getIntOrFloatBitWidth
As MLIR moves towards dialect-specific types, a generic Type::getBitWidth does
not make sense for all of them.  Even with the current type system, the bit
width is not defined (and causes the method in question to abort) for all
TensorFlow types.

This commit restricts the bit width definition to primitive standard types that
have a number of bits appearing verbatim in their type, i.e., integers and
floats.  As a side effect, it delegates the decision on the bit width of the
`index` to the backends.  Existing backends currently hardcode it to 64 bits.

The Type::getBitWidth method is replaced by Type::getIntOrFloatBitWidth that
only applies to integers and floats.  The call sites are updated to use the new
method, where applicable, or rewritten so as not rely on it.  Incidentally,
this fixes a utility method that did not account for memrefs being allowed to
have vectors as element types in the size computation.

As an observation, several places in the code use Type in places where a more
specific type could be used instead.  Some of those are fixed by this commit.

PiperOrigin-RevId: 225844792
2019-03-29 14:30:43 -07:00
Uday Bondhugula 4a3e4e8ea7 loop-unroll - add function callback argument for outside targets to
provide unroll factors, and a cmd line argument to specify number of
innermost loop unroll repetitions.

- add function callback parameter for outside targets to provide unroll factors
- add a cmd line parameter to repeatedly apply innermost loop unroll a certain
  number of times (to avoid using -loop-unroll -loop-unroll ...; instead
  -unroll-num-reps=2).
- implement the callback for a target
- update test cases / usage

PiperOrigin-RevId: 225843191
2019-03-29 14:30:28 -07:00
MLIR Team 3b69230b3a Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion.
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nests IVs and symbols using the dependece polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destingation memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns and AffineMap which represents an equality contraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.

PiperOrigin-RevId: 225842944
2019-03-29 14:30:13 -07:00
Jacques Pienaar 49c4d2a630 Fix builder getFloatAttr of double to use F64 type and use fltSemantics in FloatAttr.
Store FloatAttr using more appropriate fltSemantics (mostly fixing up F32/F64 storage, F16/BF16 pending). Previously F32 type was used incorrectly for double (the storage was double). Also add query method that returns fltSemantics for IEEE fp types and use that to verify that the APfloat given matches the type:
* FloatAttr created using APFloat is verified that the semantics of the type and APFloat matches;
* FloatAttr created using double has the APFloat created to match the semantics of the type;

Change parsing of tensor negative splat element to pass in the element type expected. Misc other changes to account for the storage type matching the attribute.

PiperOrigin-RevId: 225821834
2019-03-29 14:29:58 -07:00
Lei Zhang 72159f5ede Free the name symbol in TableGen
Renamed the name field in Op to opName since it is the opcode's name.

Renamed the name parameters in TFLite op templates to opSummary since
they are meant as a summary of the op's functionality.

We will use the name symbol later for the name given by users via TF.

PiperOrigin-RevId: 225807135
2019-03-29 14:29:44 -07:00
Uday Bondhugula dced746bd1 Remove duplicate code / reuse right utilities from memref-dep-check / loop-tile
- use addBoundsForForStmt
- getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the
  returned things aren't owned / part of the stmt on which it's being called.
- other minor API cleanup

PiperOrigin-RevId: 225774301
2019-03-29 14:29:28 -07:00
Uday Bondhugula c41ee60647 'memref-bound-check': extend to store op's as well
- extend memref-bound-check to store op's
- make the bound check an analysis util and move to lib/Analysis/Utils.cpp (so that
  one doesn't need to always create a pass to use it)

PiperOrigin-RevId: 225564830
2019-03-29 14:29:13 -07:00
Alex Zinenko bc52a639f9 Extract vector_transfer_* Ops into a SuperVectorDialect.
From the beginning, vector_transfer_read and vector_transfer_write opreations
were intended as a mid-level vectorization abstraction.  In particular, they
are lowered to the StandardOps dialect before further processing.  As such, it
does not make sense to keep them at the same level as StandardOps.  Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.

PiperOrigin-RevId: 225554492
2019-03-29 14:28:58 -07:00
Jacques Pienaar 30a30d205b Fix asan failures in mlir-op-gen.
PiperOrigin-RevId: 225532488
2019-03-29 14:28:44 -07:00
Jacques Pienaar 7a62e35644 Use dag instead of list for operands to allow named operands.
Named operands allow generating builders with more meaningful names + lay the groundwork for allowing the specification of attributes as part of the inputs pattern of an op (which allows the declarative pattern rewrite generator to define ops with attributs). This is a minimal change that just changes how input operands are represented, changes to attributes in follow up and returnTypes later.

PiperOrigin-RevId: 225509805
2019-03-29 14:28:29 -07:00
Uday Bondhugula 45a0f52519 Expression flattening improvement - reuse local expressions.
- if a local id was already for a specific mod/div expression, just reuse it if
  the expression repeats (instead of adding a new one).
- drastically reduces the number of local variables added during flattening for
  real use cases - since the same div's and mod expressions often repeat.
- add getFlattenedAffineExprs for AffineMap, IntegerSet based on the above

As a natural result of the above:

- FlatAffineConstraints(IntegerSet) ctor now deals with integer sets that have mod
  and div constraints as well, and these get simplified as well from -simplify-affine-structures

PiperOrigin-RevId: 225452174
2019-03-29 14:28:13 -07:00
Feng Liu b0c41e54ef Convert tf.FakeQuantWithMinMaxArgs/Vars to tfl.FakeQuant
- Define tf.FakeQuantWithMinMaxArgs and tf.FakeQuantWithMinMaxVars
- Add the unit tests for valid and invalid IRs
- Rewrite both to the tfl.FakeQuant op
- Add the unit tests for the rewriting

PiperOrigin-RevId: 225447109
2019-03-29 14:27:58 -07:00
Feng Liu a138c12cb3 Define TFLite Dequantize and FakeQuant ops
Besides the ops.td file changes to define both ops, this CL also changes the
mlir-op-gen to allow more flexible traits definition for "optional" operation
inputs.

Unit tests are added.

One TODO for the mlir-op-gen is to make attribute optional in the ops.

PiperOrigin-RevId: 225408349
2019-03-29 14:27:43 -07:00
Uday Bondhugula 8365bdc17f FlatAffineConstraints - complete TODOs: add method to remove duplicate /
trivially redundant constraints. Update projectOut to eliminate identifiers in
a more efficient order. Fix b/120801118.

- add method to remove duplicate / trivially redundant constraints from
  FlatAffineConstraints (use a hashing-based approach with DenseSet)
- update projectOut to eliminate identifiers in a more efficient order

(A sequence of affine_apply's like this (from a real use case) finally exposed
the lack of the above trivial/low hanging simplifications).

  for %ii = 0 to 64 {
    for %jj = 0 to 9 {
      %a0 = affine_apply (d0, d1) -> (d0 * (9 * 1024) + d1 * 128) (%ii, %jj)
      %a1 = affine_apply (d0) ->
        (d0 floordiv (2 * 3 * 3 * 128 * 128),
        (d0 mod 294912) floordiv (3 * 3 * 128 * 128),
        (((d0 mod 294912) mod 147456) floordiv 1152) floordiv 8,
        (((d0 mod 294912) mod 147456) mod 1152) floordiv 384,
        ((((d0 mod 294912) mod 147456) mod 1152) mod 384) floordiv 128,
        (((((d0 mod 294912) mod 147456) mod 1152) mod 384) mod 128)
          floordiv 128) (%a0)
      %v0 = load %in[%a1tensorflow/mlir#0, %a1tensorflow/mlir#1, %a1tensorflow/mlir#3, %a1tensorflow/mlir#4, %a1tensorflow/mlir#2, %a1tensorflow/mlir#5]
        : memref<2x2x3x3x16x1xi32>
    }
  }

- update FlatAffineConstraints::print to print number of constraints.

PiperOrigin-RevId: 225397480
2019-03-29 14:27:29 -07:00
River Riddle 5c4f1fdd42 Check if the operation is already in the worklist before adding it.
PiperOrigin-RevId: 225379496
2019-03-29 14:27:14 -07:00
Uday Bondhugula 4860f0e8fd Fix loop unrolling test cases
- These test cases had to be updated post the switch to exclusive upper bound;
  however, the test cases hadn't originally been written to check correctly; as
  a result, they didn't fail and weren't updated. Update test case and fix
  upper bound.

PiperOrigin-RevId: 225194016
2019-03-29 14:26:56 -07:00
Alex Zinenko 359835eb27 LLVM IR lowering: support 1D vector operations
Introduce initial support for 1D vector operations.  LLVM does not support
higher-dimensional vectors so the caller must make sure they don't appear in
the input MLIR.  Handle the presence of higher-dimensional vectors by failing
gracefully.

Introduce the type conversion for 1D vector types and hook it up with the rest
of the type convresion system.  Support "splat" constants for vector types.  As
a side effect, this refactors constant operation emission by separating out
scalar integer constants into a separate case and by extracting out the helper
function for scalar float construction.  Existing binary operations apply to
vectors transparently.

PiperOrigin-RevId: 225172349
2019-03-29 14:26:37 -07:00
Alex Zinenko 97d2f3cd3d ConvertToCFG: use affine_apply to implement loop steps
Originally, loop steps were implemented using `addi` and `constant` operations
because `affine_apply` was not handled in the first implementation.  The
support for `affine_apply` has been added, use it to implement the update of
the loop induction variable.  This is more consistent with the lower and upper
bounds of the loop that are also implemented as `affine_apply`, removes the
dependence of the converted function on the StandardOps dialect and makes it
clear from the CFG function that all operations on the loop induction variable
are purely affine.

PiperOrigin-RevId: 225165337
2019-03-29 14:26:22 -07:00
Jacques Pienaar a2222a9448 Add rudimentary pattern rewrite matching generation.
* Start very basic (about as basic as possible) with the pattern rewrite generation by only
  - Matching single node dags,
  - Single output, single result,
  - No constraints on inputs/outputs.
  - No attributes (only operands)
* The matcher generates C++ code akin to what is currently manually written.
  - This is very much not the final end state, and only intended for the short term;
* Always generate the default builder method to make it easier to generate calls;
  - Also add additional builder method for TFL::Add as attributes are not yet supported;
* Replace TF Add -> TFL Add matching using this generation;
* Introduce a conceptual textual namespace in the op registry
  - Will allow importing multiple dialect's op registry
  - Avoids needing to do anything special with tablegen or define a custom DSL;
    = I really want to do a custom DSL but this urge could just be as its fun :) So defer for now. From this structure we can dump out another structured form if needed;
  - Add a mapping from <namespace>_<op> in the op_gen and pattern rewrite gen
    = This allows placing ops in different namespaces from the same op registry which is convenient, esp. if we want to consider subnamespaces in future;
* Update tfl namespace to TFL to match TF and XLA;

PiperOrigin-RevId: 225155164
2019-03-29 14:26:07 -07:00
Uday Bondhugula c86c414765 Remove dead code from FlatAffineConstraints
- getDimensionBounds() was added initially for quick experimentation - no
  longer used (getConstantBoundOnDimSize is the more powerful/complete
  replacement).
- FlatAffineConstraints::getConstantLower/UpperBound are incomplete,
  functionality/naming-wise misleading, and not used currently. Removing these;
  complete/fixed version will be added in an upcoming CL.

PiperOrigin-RevId: 225075061
2019-03-29 14:25:52 -07:00
Lei Zhang a9eb2e8ffc Generate another op builder with aggregated parameters
For each op, generate another builder with the following signature:

  static void build(Builder* builder, OperationState* result,
                    ArrayRef<Type> resultTypes,
                    ArrayRef<SSAValue*> args,
                    ArrayRef<NamedAttribute> attributes);

PiperOrigin-RevId: 225066007
2019-03-29 14:25:37 -07:00
Alex Zinenko 63261aa9a8 Disallow index types as elements of vector, memref and tensor types
An extensive discussion demonstrated that it is difficult to support `index`
types as elements of compound (vector, memref, tensor) types.  In particular,
their size is unknown until the target-specific lowering takes place.  MLIR may
need to store constants of the fixed-shape compound types (e.g.,
vector<4 x index>) internally and must know the size of the element type and
data layout constraints.  The same information is necessary for target-specific
lowering and translation to reliably support compound types with `index`
elements, but MLIR does not have a dedicated target description mechanism yet.

The uses cases for compound types with `index` elements, should they appear,
can be handled via an `index_cast` operation that converts between `index` and
fixed-size integer types at the SSA value level instead of the type level.

PiperOrigin-RevId: 225064373
2019-03-29 14:25:22 -07:00
Uday Bondhugula b9f53dc0bd Update/Fix LoopUtils::stmtBodySkew to handle loop step.
- loop step wasn't handled and there wasn't a TODO or an assertion; fix this.
- rename 'delay' to shift for consistency/readability.
- other readability changes.
- remove duplicate attribute print for DmaStartOp; fix misplaced attribute
  print for DmaWaitOp
- add build method for AddFOp (unrelated to this CL, but add it anyway)

PiperOrigin-RevId: 224892958
2019-03-29 14:25:07 -07:00
Uday Bondhugula d59a95a05c Fix missing check for dependent DMAs in pipeline-data-transfer
- adding a conservative check for now (TODO: use the dependence analysis pass
  once the latter is extended to deal with DMA ops). resolve an existing bug on
  a test case.

- update test cases

PiperOrigin-RevId: 224869526
2019-03-29 14:24:53 -07:00
Uday Bondhugula 6757fb151d FlatAffineConstraints API cleanup; add normalizeConstraintsByGCD().
- add method normalizeConstraintsByGCD
- call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of
  projectOut.
- remove call to GCDTightenInequalities() from getMemRefRegion
- change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each
  time an identifier is eliminated (to detect emptiness early).
- make FourierMotzkinEliminate, gaussianEliminateId(s),
  GCDTightenInequalities() private
- improve / update stale comments

PiperOrigin-RevId: 224866741
2019-03-29 14:24:37 -07:00
Uday Bondhugula 2ef57806ba Update/fix -pipeline-data-transfer; fix b/120770946
- fix replaceAllMemRefUsesWith call to replace only inside loop body.
- handle the case where DMA buffers are dynamic; extend doubleBuffer() method
  to handle dynamically shaped DMA buffers (pass the right operands to AllocOp)
- place alloc's for DMA buffers at the depth at which pipelining is being done
  (instead of at top-level)
- add more test cases

PiperOrigin-RevId: 224852231
2019-03-29 14:24:22 -07:00
Alex Zinenko 073c3ad997 Properly namespace createLowerAffineApply
This was missing from the original commit.  The implementation of
createLowerAffineApply was defined in the default namespace but declared in the
`mlir` namespace, which could lead to linking errors when it was used.  Put the
definition in `mlir` namespace.

PiperOrigin-RevId: 224830894
2019-03-29 14:24:04 -07:00
Nicolas Vasilache c28aeef901 [MLIR] Drop bug-prone global map indexed by MLFunction*
PiperOrigin-RevId: 224610805
2019-03-29 14:23:49 -07:00
Uday Bondhugula 2d6478fa92 Extend loop tiling utility to handle non-constant loop bounds and bounds that
are a max/min of several expressions.

- Extend loop tiling to handle non-constant loop bounds and bounds that
  are a max/min of several expressions, i.e., bounds using multi-result affine
  maps

- also fix b/120630124 as a result (the IR was in an invalid state when tiled
  loop generation failed; SSA uses were created that weren't plugged into the IR).

PiperOrigin-RevId: 224604460
2019-03-29 14:23:34 -07:00
Uday Bondhugula dfc752e42b Generate strided DMAs from -dma-generate
- generate DMAs correctly now using strided DMAs where needed
- add support for multi-level/nested strides; op still supports one level of
  stride for now.

Other things
- add test case for  symbolic lower/upper bound; cases where the DMA buffer
  size can't be bounded by a known constant
- add test case for dynamic shapes where the DMA buffers are however bounded by
  constants
- refactor some of the '-dma-generate' code

PiperOrigin-RevId: 224584529
2019-03-29 14:23:19 -07:00
Nicolas Vasilache d9b6420fc9 [MLIR] Add LowerVectorTransfersPass
This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp
to a simple loop nest via local buffer allocations.

This is an MLIR->MLIR lowering based on builders.

A few TODOs are left to address in particular:
1. invert the permutation map so the accesses to the remote memref are coalesced;
2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory);
3. support broadcast / avoid copies when permutation_map is not of full column rank
4. add a proper "element_cast" op

One notable limitation is this does not plan on supporting boundary conditions.
It should be significantly easier to use pre-baked MLIR functions to handle such paddings.
This is left for future consideration.
Therefore the current CL only works properly for full-tile cases atm.

This CL also adds 2 simple tests:

```mlir
  for %i0 = 0 to %M step 3 {
    for %i1 = 0 to %N step 4 {
      for %i2 = 0 to %O {
        for %i3 = 0 to %P step 5 {
          vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index
```

lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 step 4 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5)
            for %i6 = 0 to 3 {
              %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32>
              store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32>
       dealloc %1 : memref<5x4x3xf32>
```

and
```mlir
  for %i0 = 0 to %M step 3 {
    for %i1 = 0 to %N {
      for %i2 = 0 to %O {
        for %i3 = 0 to %P step 5 {
          %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32>

```

lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            for %i6 = 0 to 3 {
              %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32>
              store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32>
        %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>>
        dealloc %1 : memref<5x4x3xf32>
```

PiperOrigin-RevId: 224552717
2019-03-29 14:23:05 -07:00
Nicolas Vasilache 879be718a0 [MLIR] Fix the name of the MaterializeVectorPass
PiperOrigin-RevId: 224536381
2019-03-29 14:22:49 -07:00
Nicolas Vasilache db1b9f7381 [MLIR] Add composeWithUnboundedMap
This CL adds a finer grain composition function between AffineExpr and an
unbounded map. This will be used in the next CL.
Also cleans up some comments remaining from a previous CL.

PiperOrigin-RevId: 224536314
2019-03-29 14:22:34 -07:00