686 Commits

Author SHA1 Message Date
River Riddle c4a5386e48 NFC: Replace usages of iterator_range<operand_iterator> with operand_range.
--

PiperOrigin-RevId: 242031201
2019-04-05 07:42:29 -07:00
MLIR Team 0cd589c337 Create a LoopUtil function to return perfectly nested loop set
--

PiperOrigin-RevId: 242019230
2019-04-05 07:42:01 -07:00
River Riddle a8f4b9eeeb Iterate on the operations to fold in TestConstantFold in reverse to remove the need for ConstantFoldHelper to have a flag for insertion at the head of the entry block. This also fixes an asan bug in TestConstantFold due to the iteration order of operations and ConstantFoldHelper's constant insertion placement.
Note: This now means that we cannot fold chains of operations, i.e. where constant foldable operations feed into each other. Given that this is a testing pass solely for constant folding, this isn't really something that we want anyways. Constant fold tests should be simple and direct, with more advanced folding/feeding being tested with the canonicalizer.

--

PiperOrigin-RevId: 242011744
2019-04-05 07:41:52 -07:00
River Riddle dca21299cb Fix a few warnings for missing parentheses around '||' and extra semicolons.
--

PiperOrigin-RevId: 241994767
2019-04-05 07:41:43 -07:00
Lei Zhang 4e40c83291 Deduplicate constant folding logic in ConstantFold and GreedyPatternRewriteDriver
There are two places containing constant folding logic right now: the ConstantFold
pass and the GreedyPatternRewriteDriver. The logic was not shared and started to
drift apart. We were testing constant folding logic using the ConstantFold pass,
but it lagged behind the GreedyPatternRewriteDriver, where we really want the constant
folding to happen.

This CL pulled the logic into utility functions and classes for sharing between
these two places. A new ConstantFoldHelper class is created to help with constant
folding and de-duplication.

Also, renamed the ConstantFold pass to TestConstantFold to make it clear that it is
intended for testing purposes.

--

PiperOrigin-RevId: 241971681
2019-04-05 07:41:32 -07:00
River Riddle 6fa3181329 Remove the non-postorder walk functions from Function/Block/Instruction and rename walkPostOrder to walk.
--

PiperOrigin-RevId: 241965239
2019-04-05 07:41:23 -07:00
Andy Davis d0d1b2a30d Fix bug in LoopTiling where creation of tile-space loop upper bound did not handle symbol operands correctly.
--

PiperOrigin-RevId: 241958502
2019-04-05 07:41:12 -07:00
Nicolas Vasilache f1b12f5a64 Fix test that fails on non-determinism in LowerVectorTransfers
This CL fixes the non-determinism across compilers in an edsc::select expression used in LowerVectorTransfers. This is achieved by factoring the expression out of the function call to ensure a deterministic order of evaluation.
Since the expression is now factored out, less IR is generated and the test is updated accordingly.
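For context, the C++ rule behind this fix is that the order in which function arguments are evaluated is unspecified, so two compilers may run side-effecting argument expressions in different orders. The sketch below illustrates the pattern only; `emit` and `combine` are hypothetical stand-ins for IR-building expressions and are not MLIR or EDSC APIs.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an expression that emits IR as a side effect.
// It records its name in `log` and returns a value to combine.
int emit(std::vector<std::string> &log, const std::string &name, int value) {
  log.push_back(name);
  return value;
}

int combine(int lhs, int rhs) { return lhs - rhs; }

// The two emit() calls are function arguments, so their relative order is
// unspecified and may differ between compilers.
int unorderedBuild(std::vector<std::string> &log) {
  return combine(emit(log, "a", 7), emit(log, "b", 3));
}

// Factoring the operands into named locals sequences them explicitly:
// "a" is always recorded before "b".
int orderedBuild(std::vector<std::string> &log) {
  int a = emit(log, "a", 7);
  int b = emit(log, "b", 3);
  return combine(a, b);
}
```

Both functions return the same value; only `orderedBuild` guarantees the order of the recorded side effects.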

--

PiperOrigin-RevId: 241679962
2019-04-03 01:09:13 -07:00
River Riddle 67a52c44b1 Rewrite the verify hooks on operations to use LogicalResult instead of bool. This also changes the return of Operation::emitError/emitOpError to LogicalResult as well.
--

PiperOrigin-RevId: 241588075
2019-04-02 13:40:47 -07:00
Andy Davis 7c1fc9e795 Enable producer-consumer fusion for liveout memrefs if consumer read region matches producer write region.
--

PiperOrigin-RevId: 241517207
2019-04-02 13:39:50 -07:00
River Riddle 084669e005 Remove MLPatternLoweringPass and rewrite LowerVectorTransfers to use RewritePattern instead.
--

PiperOrigin-RevId: 241455472
2019-04-02 13:39:17 -07:00
Mehdi Amini b3a407fa68 Fix MacOS build
This is making up for some differences in standard library and linker flags.
It also gets rid of the requirement to build with RTTI.

--

PiperOrigin-RevId: 241348845
2019-04-01 11:00:30 -07:00
Jacques Pienaar 1273af232c Add build files and update README.
* Add initial version of build files;
* Update README with instructions to download and build MLIR from github;

--

PiperOrigin-RevId: 241102092
2019-03-30 11:23:22 -07:00
Nicolas Vasilache c9d5f3418a Cleanup SuperVectorization dialect printing and parsing.
On the read side,
```
%3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: (d0, d1, d2)->(d2, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
```

becomes:

```
%3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: (d0, d1, d2)->(d2, d0)} : memref<?x?x?xf32>, vector<32x256xf32>
```

On the write side,

```
vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
```

becomes

```
vector_transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
```

Documentation will be cleaned up in a followup commit that also extracts a proper .md from the top of the file comments.

PiperOrigin-RevId: 241021879
2019-03-29 17:56:42 -07:00
Nicolas Vasilache f93a5be65f Make createMaterializeVectorsPass take a vectorSize parameter - NFC
This CL allows the programmatic control of the target hardware vector size when creating a MaterializeVectorsPass.
This is useful for registering passes for the tutorial.

PiperOrigin-RevId: 240996136
2019-03-29 17:56:12 -07:00
Nicolas Vasilache 094ca64ab0 Refactor vectorization patterns
This CL removes the reliance of the vectorize pass on the specification of a `fastestVaryingDim` parameter. This parameter is a restriction meant to more easily target a particular loop/memref combination for vectorization and is mainly used for testing.

This also had the side-effect of restricting vectorization patterns to only the ones in which all memrefs were contiguous along the same loop dimension. This simple restriction prevented matmul from vectorizing in 2-D.

This CL removes the restriction and adds the matmul test which vectorizes in 2-D along the parallel loops. Support for reduction loops is left for future work.

PiperOrigin-RevId: 240993827
2019-03-29 17:55:36 -07:00
MLIR Team 9d30b36aaf Enable input-reuse fusion to search function arguments for fusion candidates (takes care of a TODO, enables another tutorial test case).
PiperOrigin-RevId: 240979894
2019-03-29 17:54:36 -07:00
River Riddle 106dd08e99 Change the vectorizer test pass to output via diagnostics instead of llvm::outs. This allows for the output to be deterministic when multi-threading is enabled.
PiperOrigin-RevId: 240905858
2019-03-29 17:54:21 -07:00
Jacques Pienaar cd0b925dc2 Remove extra qualification
PiperOrigin-RevId: 240875432
2019-03-29 17:52:36 -07:00
Alex Zinenko 3173a63f3f Dialect Conversion: convert regions of operations when cloning them
Dialect conversion currently clones the operations that did not match any
pattern.  This includes cloning any regions that belong to these operations.
Instead, apply conversion recursively to the nested regions.

Note that if an operation matched one of the conversion patterns, it is up to
the pattern rewriter to fill in the regions of the converted operation.  This
may require calling back to the converter and is left for future work.

PiperOrigin-RevId: 240872410
2019-03-29 17:52:04 -07:00
MLIR Team 9d9675fc8f Remove overly conservative check in LoopFusion pass (enables fusion in tutorial example).
PiperOrigin-RevId: 240859227
2019-03-29 17:51:16 -07:00
River Riddle 213b8d4d3b Rename InstOperand to OpOperand.
PiperOrigin-RevId: 240814651
2019-03-29 17:50:41 -07:00
Nicolas Vasilache 4dc7af9da8 Make vectorization aware of loop semantics
Now that we have a dependence analysis, we can check that loops are indeed parallel and make vectorization correct.

PiperOrigin-RevId: 240682727
2019-03-29 17:49:30 -07:00
Nicolas Vasilache c3742d20b5 Give the Vectorize pass a virtualVectorSize argument.
This CL allows vectorization to be called and configured in other ways than just via command line arguments.
This allows triggering vectorization programmatically.

PiperOrigin-RevId: 240638208
2019-03-29 17:48:12 -07:00
River Riddle 99b87c9707 Replace usages of Instruction with Operation in the Transforms/ directory.
PiperOrigin-RevId: 240636130
2019-03-29 17:47:26 -07:00
Mehdi Amini 3518122e86 Simplify API uses of `getContext()` (NFC)
The Pass base class is providing a convenience getContext() accessor.

PiperOrigin-RevId: 240634961
2019-03-29 17:47:11 -07:00
Jacques Pienaar b0244b66a5 Fix include path in test pass.
PiperOrigin-RevId: 240628260
2019-03-29 17:46:41 -07:00
River Riddle 9c08540690 Replace usages of Instruction with Operation in the /Analysis directory.
PiperOrigin-RevId: 240569775
2019-03-29 17:44:56 -07:00
Alex Zinenko 5a5bba0279 Introduce affine terminator
Due to legacy reasons (ML/CFG function separation), regions in affine control
flow operations require contained blocks not to have terminators.  This is
inconsistent with the notion of the block and may complicate code motion
between regions of affine control operations and other regions.

Introduce `affine.terminator`, a special terminator operation that must be used
to terminate blocks inside affine operations and transfers the control back to
the region enclosing the affine operation.  For brevity and readability reasons,
allow `affine.for` and `affine.if` to omit the `affine.terminator` in their
regions when using custom printing and parsing format.  The custom parser
injects the `affine.terminator` if it is missing so as to always have it
present in constructed operations.

Update transformations to account for the presence of terminator.  In
particular, most code motion transformation between loops should leave the
terminator in place, and code motion between loops and non-affine blocks should
drop the terminator.

PiperOrigin-RevId: 240536998
2019-03-29 17:44:24 -07:00
River Riddle af45236c70 Add experimental support for multi-threading the pass manager. This adds support for running function pipelines on functions across multiple threads, and is guarded by an off-by-default flag 'experimental-mt-pm'. There are still quite a few things that need to be done before multi-threading is ready for general use (e.g. pass-timing), but this allows for those things to be tested in a multi-threaded environment.
PiperOrigin-RevId: 240489002
2019-03-29 17:44:08 -07:00
River Riddle f9d91531df Replace usages of Instruction with Operation in the /IR directory.
This is step 2/N to renaming Instruction to Operation.

PiperOrigin-RevId: 240459216
2019-03-29 17:43:37 -07:00
River Riddle 9ffdc930c0 Rename the Instruction class to Operation. This just renames the class, usages of Instruction will still refer to a typedef in the interim.
This is step 1/N to renaming Instruction to Operation.

PiperOrigin-RevId: 240431520
2019-03-29 17:42:50 -07:00
Alex Zinenko a7215a9032 Allow creating standalone Regions
Currently, regions can only be constructed by passing in a `Function` or an
`Instruction` pointer referencing the parent object, unlike `Function`s or
`Instruction`s themselves that can be created without a parent.  It leads to a
rather complex flow in operation construction where one has to create the
operation first before being able to work with its regions.  It may be
necessary to work with the regions before the operation is created.  In
particular, in `build` and `parse` functions that are executed _before_ the
operation is created in cases where boilerplate region manipulation is required
(for example, inserting the hypothetical default terminator in affine regions).
Allow creating standalone regions.  Such regions are meant to own a list of
blocks and transfer them to other regions on demand.

Each instruction stores a fixed number of regions as trailing objects and has
ownership of them.  This decreases the size of the Instruction object for the
common case of instructions without regions.  Keep this behavior intact.  To
allow some flexibility in construction, make OperationState store an owning
vector of regions.  When the Builder creates an Instruction from
OperationState, the bodies of the regions are transferred into the
instruction-owned regions to minimize copying.  Thus, it becomes possible to
fill standalone regions with blocks and move them to an operation when it is
constructed, or move blocks from a region to an operation region, e.g., for
inlining.
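
The ownership transfer described above can be pictured with a small sketch. All types here (`Block`, `Region`, `OperationSketch`, `takeBody`) are simplified, hypothetical stand-ins chosen for illustration; they are not the MLIR classes, and the real implementation stores regions as trailing objects rather than as a plain member.

```cpp
#include <list>
#include <string>

struct Block {
  std::string label;
};

// A region owns a list of blocks and can hand them over wholesale.
struct Region {
  std::list<Block> blocks;

  // Move every block of `other` to the end of this region. std::list::splice
  // relinks nodes, so no Block is copied.
  void takeBody(Region &other) { blocks.splice(blocks.end(), other.blocks); }
};

// An operation that is built from an already populated standalone region.
struct OperationSketch {
  Region body;

  explicit OperationSketch(Region &standalone) { body.takeBody(standalone); }
};
```

After construction the standalone region is empty and the operation's own region holds the blocks, which matches the "fill first, attach later" flow the message describes.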

PiperOrigin-RevId: 240368183
2019-03-29 17:40:59 -07:00
Chris Lattner 46ade282c8 Make FunctionPass::getFunction() return a reference to the function, instead of
a pointer.  This makes it consistent with all the other methods in
FunctionPass, as well as with ModulePass::getModule().  NFC.

PiperOrigin-RevId: 240257910
2019-03-29 17:40:44 -07:00
River Riddle 96ebde9cfd Replace usages of "Op::operator->" with ".".
This is step 2/N of removing the temporary operator-> method as part of the de-const transition.

PiperOrigin-RevId: 240200792
2019-03-29 17:40:09 -07:00
River Riddle 5de726f493 Refactor the Pattern framework to allow for combined match/rewrite patterns. This is done by adding a new 'matchAndRewrite' function to RewritePattern that performs the match and rewrite in one step. The default behavior simply calls into the existing 'match' and 'rewrite' functions. The 'PatternMatcher' class has now been specialized for RewritePatterns and has been rewritten to make use of the new matchAndRewrite functionality.
This combined match/rewrite functionality allows simplifying the majority of existing RewritePatterns, as they do not benefit from separate match and rewrite functions.

Some of the existing canonicalization patterns in StandardOps have been modified to take advantage of this functionality.
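
A rough sketch of the shape described above, with a default `matchAndRewrite` that forwards to `match` and `rewrite`. `Op`, `PatternSketch`, and the two example patterns are hypothetical simplifications, and the `bool` return type is an assumption made for brevity; the real RewritePattern signatures are not shown in this log.

```cpp
#include <string>

// Hypothetical, heavily simplified stand-in for an operation.
struct Op {
  std::string name;
};

struct PatternSketch {
  virtual ~PatternSketch() = default;

  // Separate hooks: decide whether the pattern applies, then transform.
  virtual bool match(const Op &) const { return false; }
  virtual void rewrite(Op &) const {}

  // Combined hook. The default forwards to match() followed by rewrite(),
  // so patterns written against the two-step API keep working.
  virtual bool matchAndRewrite(Op &op) const {
    if (!match(op))
      return false;
    rewrite(op);
    return true;
  }
};

// Two-step pattern: relies on the default matchAndRewrite.
struct FoldAddZero : PatternSketch {
  bool match(const Op &op) const override { return op.name == "add_zero"; }
  void rewrite(Op &op) const override { op.name = "identity"; }
};

// Combined pattern: overrides matchAndRewrite and nothing else.
struct MulTwoToShift : PatternSketch {
  bool matchAndRewrite(Op &op) const override {
    if (op.name != "mul_two")
      return false;
    op.name = "shift_left_one";
    return true;
  }
};
```

A driver only ever needs to call `matchAndRewrite`, regardless of which style a pattern uses.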

PiperOrigin-RevId: 240187856
2019-03-29 17:39:35 -07:00
River Riddle af1abcc80b Replace usages of "operator->" with "." for the AffineOps.
Note: The "operator->" method is a temporary helper for the de-const transition and is gradually being phased out.
PiperOrigin-RevId: 240179439
2019-03-29 17:39:19 -07:00
River Riddle 832567b379 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for' and set the namespace of the AffineOps dialect to 'affine'.
PiperOrigin-RevId: 240165792
2019-03-29 17:39:03 -07:00
Mehdi Amini bb621a5596 Using getContext() instead of getInstruction()->getContext() on Operation (NFC)
PiperOrigin-RevId: 240088209
2019-03-29 17:38:29 -07:00
Chris Lattner e510de0305 Various small cleanups to the code, mostly removing const_cast's.
PiperOrigin-RevId: 240083489
2019-03-29 17:37:58 -07:00
River Riddle 9c6e92360c NFC: Rename the 'if' operation in the AffineOps dialect to 'affine.if'.
PiperOrigin-RevId: 240071154
2019-03-29 17:36:53 -07:00
Chris Lattner d9b5bc8f55 Remove OpPointer, cleaning up a ton of code. This also moves Ops to using
inherited constructors, which is cleaner and means you can now use DimOp()
to get a null op, instead of having to use Instruction::getNull<DimOp>().

This removes another 200 lines of code.

PiperOrigin-RevId: 240068113
2019-03-29 17:36:21 -07:00
Chris Lattner dd2b2ec542 Push a bunch of 'consts' out of the *Op structure, in prep for removing
OpPointer.

PiperOrigin-RevId: 240044712
2019-03-29 17:35:35 -07:00
Nicolas Vasilache f26c7cd792 Cleanup ValueHandleArray
We just need a way to unpack ArrayRef<ValueHandle> to ArrayRef<Value*>.
No need to expose this to the user.

This reduces the cognitive overhead for the tutorial.

PiperOrigin-RevId: 240037425
2019-03-29 17:35:20 -07:00
Chris Lattner 986310a68f Remove const from Value, Instruction, Argument, and the various methods on the
*Op classes.  This is a net reduction by almost 400LOC.

PiperOrigin-RevId: 239972443
2019-03-29 17:34:33 -07:00
Chris Lattner 3d6c74fff5 Remove const from mlir::Block.
This also eliminates some incorrect reinterpret_cast logic working around it, and numerous const-incorrect issues (like block argument iteration).

PiperOrigin-RevId: 239712029
2019-03-29 17:30:30 -07:00
Chris Lattner 88e9f418f5 Continue pushing const out of the core IR types - in this case, remove const
from Function.

PiperOrigin-RevId: 239638635
2019-03-29 17:29:58 -07:00
Nicolas Vasilache fc5bbdd6c8 Improve comment for `augmentMapAndBounds`
Followup from a previous CL.

PiperOrigin-RevId: 239591775
2019-03-29 17:27:57 -07:00
Chris Lattner 589df37142 Move to new `const` model, part 1: remove ConstOpPointer.
This eliminates ConstOpPointer (but keeps OpPointer for now) by making OpPointer
implicitly launder const in a const incorrect way.  It will eventually go away
entirely, this is a progressive step towards the new const model.

PiperOrigin-RevId: 239512640
2019-03-29 17:26:56 -07:00
Nicolas Vasilache d6c650cfb5 Properly propagate induction variable in tiling
This CL fixes an issue where cloned loop induction variables were not properly
propagated and beefs up the corresponding test.

PiperOrigin-RevId: 239422961
2019-03-29 17:25:53 -07:00
Jacques Pienaar a8ed2ca8fd Cleanup for changes failing with std=c++11
The static constexpr members were failing with undefined references because they lacked definitions at namespace scope.
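
For readers unfamiliar with this pitfall: in C++11 and C++14 an in-class `static constexpr` member is only a declaration, and odr-using it (for example by binding it to a `const int &` parameter) requires a definition at namespace scope. A minimal sketch with a hypothetical `Limits` struct, not taken from the MLIR sources:

```cpp
#include <algorithm>

struct Limits {
  // In-class declaration with initializer.
  static constexpr int kMaxRank = 4;
};

// Out-of-class definition at namespace scope. Required in C++11/14 when the
// member is odr-used; redundant (but still accepted) from C++17 on, where
// static constexpr data members are implicitly inline.
constexpr int Limits::kMaxRank;

// std::min takes its arguments by const reference, which odr-uses kMaxRank.
int clampRank(int rank) { return std::min(rank, Limits::kMaxRank); }
```

Without the namespace-scope definition, a C++11 build of `clampRank` may fail at link time with an undefined reference to `Limits::kMaxRank`.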

PiperOrigin-RevId: 239241157
2019-03-29 17:25:24 -07:00
Jacques Pienaar 57270a9a99 Remove some statements that required >C++11, add includes and qualify names. NFC.
PiperOrigin-RevId: 239197784
2019-03-29 17:24:53 -07:00
Dimitrios Vytiniotis ee4cfefca8 Avoiding allocations during argument attribute conversion.
PiperOrigin-RevId: 239144675
2019-03-29 17:24:38 -07:00
Nicolas Vasilache c3b0c6a0dc Cleanups Vectorize and SliceAnalysis - NFC
This CL cleans up and refactors super-vectorization and slice analysis.

PiperOrigin-RevId: 238986866
2019-03-29 17:23:07 -07:00
Nicolas Vasilache a89d8c0a1a Port Tablegen'd reference implementation of Add to declarative builders.
PiperOrigin-RevId: 238977252
2019-03-29 17:22:36 -07:00
Nicolas Vasilache 3a12bc5041 Remove LOAD/STORE/RETURN boilerplate in declarative builders.
This CL introduces a ValueArrayHandle helper to manage the implicit conversion
of ArrayRef<ValueHandle> -> ArrayRef<Value*> by converting first to ValueArrayHandle.
Without this, boilerplate operations that take ArrayRef<Value*> cannot be removed easily.

This all seems to boil down to decoupling Value from Type.
Alternative solutions exist (e.g. MLIR using Value by value everywhere) but they would be very intrusive. This seems to be the lowest impedance change.

Intrinsics are also lowercased by popular demand.

PiperOrigin-RevId: 238974125
2019-03-29 17:22:20 -07:00
Nicolas Vasilache f43388e4ce Port LowerVectorTransfers from EDSC + AST to declarative builders
This CL removes the dependency of LowerVectorTransfers on the AST version of EDSCs which will be retired.

This exhibited a pretty fundamental staging difference in AST-based vs. declarative-based emission.

Since the delayed creation with an AST was staged, the loop order came into existence after the clipping expressions were computed.
This now changes as the loops first need to be created declaratively in fixed order and then the clipping expressions are created.
Also, due to lack of staging, coalescing cannot be done on the fly anymore and
needs to be done either as a pre-pass (current implementation) or as a local transformation on the generated IR (future work).

Tests are updated accordingly.

PiperOrigin-RevId: 238971631
2019-03-29 17:22:06 -07:00
River Riddle 27d1bb920e Cache the simplified attributes in SimplifyAffineStructures to avoid redundant simplifications, as well as unnecessary accesses to the MLIRContext.
PiperOrigin-RevId: 238654325
2019-03-29 17:20:46 -07:00
Alex Zinenko 276fae1b0d Rename BlockList into Region
NFC.  This is step 1/n to specifying regions as parts of any operation.

PiperOrigin-RevId: 238472370
2019-03-29 17:18:04 -07:00
Uday Bondhugula a228b7d477 Change getMemoryFootprintBytes emitError to a warning
- this is really not a hard error; emit a warning instead (for inability to compute
  footprint due to the union failing due to unimplemented cases)
- remove a misleading warning from LoopFusion.cpp

PiperOrigin-RevId: 238118711
2019-03-29 17:16:12 -07:00
Uday Bondhugula 9f2781e8dd Fix misc bugs / TODOs / other improvements to analysis utils
- fix for getConstantBoundOnDimSize: floordiv -> ceildiv for extent
- make getConstantBoundOnDimSize also return the identifier upper bound
- fix unionBoundingBox to correctly use the divisor and upper bound identified by
  getConstantBoundOnDimSize
- deal with loop step correctly in addAffineForOpDomain (covers most cases now)
- fully compose bound map / operands and simplify/canonicalize before adding
  dim/symbol to FlatAffineConstraints; fixes false positives in -memref-bound-check; add
  test case there
- expose mlir::isTopLevelSymbol from AffineOps
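
The floordiv-to-ceildiv fix and the loop-step handling are easy to motivate with plain integers: covering a span of `n` elements in pieces of size `d` takes ceildiv(n, d) pieces, and floordiv silently drops the partial one. The helpers below are hypothetical and assume a positive divisor; they are not the MLIR utilities.

```cpp
#include <cassert>
#include <cstdint>

// Floor division for a positive divisor.
int64_t floorDiv(int64_t n, int64_t d) {
  assert(d > 0);
  int64_t q = n / d;
  // C++ division truncates toward zero; adjust for negative numerators.
  return (n % d != 0 && n < 0) ? q - 1 : q;
}

// Ceiling division for a positive divisor.
int64_t ceilDiv(int64_t n, int64_t d) {
  assert(d > 0);
  int64_t q = n / d;
  // Truncation already rounds negative quotients up; adjust positive ones.
  return (n % d != 0 && n > 0) ? q + 1 : q;
}

// Number of iterations of `for i = lb to ub step s` (ub exclusive).
int64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  return ub <= lb ? 0 : ceilDiv(ub - lb, step);
}
```

For a loop from 0 to 10 with step 4, `tripCount` gives 3 (iterations 0, 4, 8), whereas a floordiv-based extent would give 2.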

PiperOrigin-RevId: 238050395
2019-03-29 17:15:27 -07:00
Uday Bondhugula 075090f891 Extend loop unrolling and unroll-jamming to non-matching bound operands and
multi-result upper bounds, complete TODOs, fix/improve test cases.

- complete TODOs for loop unroll/unroll-and-jam. Something as simple as
  "for %i = 0 to %N" wasn't being unrolled earlier (unless it had been written
  as "for %i = ()[s0] -> (0)()[%N] to %N"); addressed now.

- update/replace getTripCountExpr with buildTripCountMapAndOperands; makes it
  more powerful as it composes inputs into it

- getCleanupLowerBound and getUnrolledLoopUpperBound actually needed the same
  code; refactor and remove one.

- reorganize test cases, write previous ones better; most of these changes are
  "label replacements".

- fix wrongly labeled test cases in unroll-jam.mlir

PiperOrigin-RevId: 238014653
2019-03-29 17:14:12 -07:00
River Riddle 5e1f1d2cab Update the constantFold/fold API to use LogicalResult instead of bool.
PiperOrigin-RevId: 237719658
2019-03-29 17:10:50 -07:00
River Riddle 0310d49f46 Move the success/failure functions out of LogicalResult and into the mlir namespace.
PiperOrigin-RevId: 237712180
2019-03-29 17:10:21 -07:00
River Riddle 80d3568c0a Rename Status to LogicalResult to avoid conflicts with the Status in xla/tensorflow/etc.
PiperOrigin-RevId: 237537341
2019-03-29 17:08:50 -07:00
Uday Bondhugula ce7e59536c Add a basic model to set tile sizes + some cleanup
- compute tile sizes based on a simple model that looks at memory footprints
  (instead of using the hardcoded default value)
- adjust tile sizes to make them factors of trip counts based on an option
- update loop fusion CL options to allow setting maximal fusion at pass creation
- change an emitError to emitWarning (since it's not a hard error unless the client
  treats it that way, in which case, it can emit one)

$ mlir-opt -debug-only=loop-tile -loop-tile test/Transforms/loop-tiling.mlir

test/Transforms/loop-tiling.mlir:81:3: note: using tile sizes [4 4 5 ]

  for %i = 0 to 256 {

  for %i0 = 0 to 256 step 4 {
    for %i1 = 0 to 256 step 4 {
      for %i2 = 0 to 250 step 5 {
        for %i3 = #map4(%i0) to #map11(%i0) {
          for %i4 = #map4(%i1) to #map11(%i1) {
            for %i5 = #map4(%i2) to #map12(%i2) {
              %0 = load %arg0[%i3, %i5] : memref<8x8xvector<64xf32>>
              %1 = load %arg1[%i5, %i4] : memref<8x8xvector<64xf32>>
              %2 = load %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
              %3 = mulf %0, %1 : vector<64xf32>
              %4 = addf %2, %3 : vector<64xf32>
              store %4, %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
            }
          }
        }
      }
    }
  }
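
One way to picture the "factors of trip counts" adjustment is to shrink a candidate tile size until it divides the trip count. This is only a sketch under that assumption: `adjustToFactor` is a hypothetical helper, and the footprint model the pass uses to pick the candidate size is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: largest factor of tripCount that is <= target.
int64_t adjustToFactor(int64_t target, int64_t tripCount) {
  assert(target >= 1 && tripCount >= 1);
  int64_t size = target < tripCount ? target : tripCount;
  // Terminates at the latest when size reaches 1, which divides everything.
  while (tripCount % size != 0)
    --size;
  return size;
}
```

With an assumed candidate size of 5, this yields 4 for a trip count of 256 and 5 for a trip count of 250, which happens to agree with the sizes in the note above; the pass may arrive at them differently.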

PiperOrigin-RevId: 237461836
2019-03-29 17:06:51 -07:00
River Riddle 1e55ae19a0 Convert ambiguous bool returns in /Analysis to use Status instead.
PiperOrigin-RevId: 237390240
2019-03-29 17:06:21 -07:00
River Riddle 10ddae6d88 Use Status instead of bool in DialectConversion.
PiperOrigin-RevId: 237339277
2019-03-29 17:06:06 -07:00
River Riddle ba6fdc8b01 Move UtilResult into the Support directory and rename it to Status. Status provides an unambiguous way to specify success/failure results. These can be generated by 'Status::success()' and 'Status::failure()'. Status provides no implicit conversion to bool and should be consumed by one of the following utility functions:
* bool succeeded(Status)
  - Return true if the status corresponds to a success value.

* bool failed(Status)
  - Return true if the status corresponds to a failure value.
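
A minimal re-creation of the idea, assuming nothing beyond what the message states: two factory functions, no conversion to `bool`, and the two free functions as the only way to inspect the value. The class body and the `halve` example are hypothetical; the real class may differ in detail.

```cpp
// Sketch of a success/failure type with no implicit conversion to bool.
class Status {
public:
  static Status success() { return Status(true); }
  static Status failure() { return Status(false); }

  // The only ways to inspect a Status. Defined as friends so that callers
  // write succeeded(s) / failed(s) rather than testing the value directly.
  friend bool succeeded(Status s) { return s.ok; }
  friend bool failed(Status s) { return !s.ok; }

private:
  explicit Status(bool ok) : ok(ok) {}
  bool ok;
};

// Example producer: halves `value` in place, failing on odd input.
Status halve(int &value) {
  if (value % 2 != 0)
    return Status::failure();
  value /= 2;
  return Status::success();
}
```

Because there is no conversion operator, `if (halve(x))` does not compile; a caller has to write `if (failed(halve(x)))`, which removes the ambiguity of whether `true` means success or error.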

PiperOrigin-RevId: 237153884
2019-03-29 17:04:19 -07:00
River Riddle d43f630de8 NFC: Remove 'Result' from the analysis manager api to better reflect the implementation. There is no distinction between analysis computation and result.
PiperOrigin-RevId: 237093101
2019-03-29 17:02:12 -07:00
River Riddle 1d87b62afe Add support for preserving specific analyses in the analysis manager. Passes can now preserve specific analyses via 'markAnalysesPreserved'.
Example:

markAnalysesPreserved<DominanceInfo>();
markAnalysesPreserved<DominanceInfo, PostDominanceInfo>();
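
To illustrate what preserving specific analyses means for a cache, here is a toy sketch. Only the name `markAnalysesPreserved` comes from the message; `AnalysisCacheSketch`, `getAnalysis`, `invalidate`, and the stub `Function`, `DominanceInfo`, and `PostDominanceInfo` types are hypothetical and far simpler than the real infrastructure.

```cpp
#include <map>
#include <memory>
#include <set>
#include <typeindex>
#include <typeinfo>

// Toy IR unit and analyses; each analysis is constructed from the IR unit.
struct Function {
  int numBlocks;
};
struct DominanceInfo {
  explicit DominanceInfo(Function &f) : blocks(f.numBlocks) {}
  int blocks;
};
struct PostDominanceInfo {
  explicit PostDominanceInfo(Function &f) : blocks(f.numBlocks) {}
  int blocks;
};

class AnalysisCacheSketch {
public:
  explicit AnalysisCacheSketch(Function &f) : function(f) {}

  // Return the cached analysis, computing it on first request.
  template <typename AnalysisT> AnalysisT &getAnalysis() {
    auto &slot = cache[std::type_index(typeid(AnalysisT))];
    if (!slot) {
      slot = std::make_shared<AnalysisT>(function);
      ++numComputed;
    }
    return *static_cast<AnalysisT *>(slot.get());
  }

  // Record which analyses the current pass keeps valid.
  template <typename... AnalysesT> void markAnalysesPreserved() {
    // C++11-compatible pack expansion: one insert per listed analysis.
    int expand[] = {
        0, (preserved.insert(std::type_index(typeid(AnalysesT))), 0)...};
    (void)expand;
  }

  // Called after a pass: drop every cached analysis that was not preserved.
  void invalidate() {
    for (auto it = cache.begin(); it != cache.end();) {
      if (preserved.count(it->first))
        ++it;
      else
        it = cache.erase(it);
    }
    preserved.clear();
  }

  int numComputed = 0;

private:
  Function &function;
  std::map<std::type_index, std::shared_ptr<void>> cache;
  std::set<std::type_index> preserved;
};
```

After `markAnalysesPreserved<DominanceInfo>()` followed by `invalidate()`, a later `getAnalysis<DominanceInfo>()` is served from the cache while `getAnalysis<PostDominanceInfo>()` is recomputed.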

PiperOrigin-RevId: 237081454
2019-03-29 17:01:41 -07:00
MLIR Team c1ff9e866e Use FlatAffineConstraints::unionBoundingBox to perform slice bounds union for loop fusion pass (WIP).
Adds utility to convert slice bounds to a FlatAffineConstraints representation.
Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers.

PiperOrigin-RevId: 236973261
2019-03-29 16:59:21 -07:00
Uday Bondhugula 5836fae8a0 DMA generation CL flag update
- allow mem capacity to be overridden by command-line flag
- change default fast mem space to 2

PiperOrigin-RevId: 236951598
2019-03-29 16:59:05 -07:00
Uday Bondhugula 02af8c22df Change Pass::getFunction() to return pointer instead of ref - NFC
- change this for consistency - everything else similar takes/returns a
  Function pointer - the FuncBuilder ctor,
  Block/Value/Instruction::getFunction(), etc.
- saves a whole bunch of &s everywhere

PiperOrigin-RevId: 236928761
2019-03-29 16:58:35 -07:00
Nicolas Vasilache 069c818f40 Fix lower/upper bound mismatch in stripmineSink
Also beef up the corresponding test case.

PiperOrigin-RevId: 236878818
2019-03-29 16:57:21 -07:00
Dimitrios Vytiniotis a60ba7d908 Supporting conversion of argument attributes along their types.
This fixes a bug: previously, during conversion function argument
attributes were neither being passed through nor converted. This fix
extends DialectConversion to allow for simultaneous conversion of the
function type and the argument attributes.

This was important when lowering MLIR to LLVM where attribute
information (e.g. noalias) needs to be preserved in MLIR(LLVMDialect).

In the longer run, it seems reasonable that we want to convert both the
function attribute and its type and the argument attributes, but that
requires a small refactoring in Function.h to aggregate these three
fields in an inner struct, which will require some discussion.

PiperOrigin-RevId: 236709409
2019-03-29 16:55:51 -07:00
MLIR Team d42ef78a75 Handle MemRefRegion::compute return value in loop fusion pass (NFC).
PiperOrigin-RevId: 236685849
2019-03-29 16:55:20 -07:00
River Riddle 485746f524 Implement the initial AnalysisManagement infrastructure, with the introduction of the FunctionAnalysisManager and ModuleAnalysisManager classes. These classes provide analysis computation, caching, and invalidation for a specific IR unit. The invalidation is currently limited to either all or none, i.e. you cannot yet preserve specific analyses.
An analysis can be any class, but it must provide the following:
* A constructor for a given IR unit.

struct MyAnalysis {
  // Compute this analysis with the provided module.
  MyAnalysis(Module *module);
};

Analyses can be accessed from a Pass by calling either the 'getAnalysisResult<AnalysisT>' or 'getCachedAnalysisResult<AnalysisT>' methods. A FunctionPass may query for a cached analysis on the parent module with 'getCachedModuleAnalysisResult'. Similarly, a ModulePass may query an analysis on a child function with 'getFunctionAnalysisResult'; that analysis does not need to be cached.

By default, when running a pass all cached analyses are set to be invalidated. If no transformation was performed, a pass can use the method 'markAllAnalysesPreserved' to preserve all analysis results. As noted above, preserving specific analyses is not yet supported.

PiperOrigin-RevId: 236505642
2019-03-29 16:54:50 -07:00
Uday Bondhugula eee85361bb Remove hidden flag from fusion CL options
PiperOrigin-RevId: 236409185
2019-03-29 16:54:05 -07:00
River Riddle f37651c708 NFC. Move all of the remaining operations left in BuiltinOps to StandardOps. The only thing left in BuiltinOps are the core MLIR types. The standard types can't be moved because they are referenced within the IR directory, e.g. in things like Builder.
PiperOrigin-RevId: 236403665
2019-03-29 16:53:35 -07:00
Lei Zhang 85d9b6c8f7 Use consistent names for dialect op source files
This CL changes dialect op source files (.h, .cpp, .td) to follow the following
convention:

  <full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}

Builtin and standard dialects are specially treated, though. Both of them do
not have dialect namespace; the former is still named as BuiltinOps.* and the
latter is named as Ops.*.

Purely mechanical. NFC.

PiperOrigin-RevId: 236371358
2019-03-29 16:53:19 -07:00
MLIR Team d038e34735 Loop fusion for input reuse.
*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph:
- first pass fuses single-use producers into their unique consumer.
- second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
- third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
*) Adds a generic loop utility for finding all sequential loops in a loop nest.
*) Adds and updates unit tests.

PiperOrigin-RevId: 236350987
2019-03-29 16:52:35 -07:00
River Riddle ddc6788cc7 Provide a Builder::getNamedAttr and (Instruction|Function)::setAttr(StringRef, Attribute) to simplify attribute manipulation.
PiperOrigin-RevId: 236222504
2019-03-29 16:50:59 -07:00
River Riddle ed5fe2098b Remove PassResult and have the runOnFunction/runOnModule functions return void instead. To signal a pass failure, passes should now invoke the 'signalPassFailure' method. This provides the equivalent functionality when needed, but isn't an intrusive part of the API like PassResult.
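
The API shape described here can be sketched as follows. Only `runOnFunction` and `signalPassFailure` are names from the message; `FunctionPassSketch`, its `run` driver, the stub `Function`, and the `RequireNonEmpty` example are hypothetical.

```cpp
// Toy IR unit for the sketch.
struct Function {
  int numOps;
};

class FunctionPassSketch {
public:
  virtual ~FunctionPassSketch() = default;

  // Driver entry point: runs the pass and returns true if it did not fail.
  bool run(Function &f) {
    failed = false;
    runOnFunction(f);
    return !failed;
  }

protected:
  // Passes return void and only speak up when something goes wrong.
  virtual void runOnFunction(Function &f) = 0;
  void signalPassFailure() { failed = true; }

private:
  bool failed = false;
};

// Example pass: fails on functions that contain no operations.
struct RequireNonEmpty : FunctionPassSketch {
  void runOnFunction(Function &f) override {
    if (f.numOps == 0)
      signalPassFailure();
  }
};
```

The common case, a pass that cannot fail, no longer has to return anything, and the failure flag stays an implementation detail of the base class.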
PiperOrigin-RevId: 236202029
2019-03-29 16:50:44 -07:00
Uday Bondhugula 58889884a2 Change some of the debug messages to use emitError / emitWarning / emitNote - NFC
PiperOrigin-RevId: 236169676
2019-03-29 16:50:29 -07:00
River Riddle c6c534493d Port all of the existing passes over to the new pass manager infrastructure. This is largely NFC.
PiperOrigin-RevId: 235952357
2019-03-29 16:47:14 -07:00
Uday Bondhugula 7aa60a383f Temp change in FlatAffineConstraints::getSliceBounds() to deal with TODO in
LoopFusion

- getConstDifference in LoopFusion is pending a refactoring to handle bounds
  with min's and max's; it currently asserts on some useful test cases that we
  want to experiment with. This CL changes getSliceBounds to be more
  conservative so as to not trigger the assertion. Filed b/126426796 to track this.

PiperOrigin-RevId: 235826538
2019-03-29 16:45:23 -07:00
Uday Bondhugula d4b3ff1096 Loop fusion command line options cleanup
- clean up loop fusion CL options for promoting local buffers to fast memory
  space
- add parameters to loop fusion pass instantiation

PiperOrigin-RevId: 235813419
2019-03-29 16:44:38 -07:00
River Riddle cdbfd48471 Rewrite the dominance info classes to allow for operating on arbitrary control flow within operation regions. The CSE pass is also updated to properly handle nested dominance.
PiperOrigin-RevId: 235742627
2019-03-29 16:43:35 -07:00
Nicolas Vasilache 62c54a2ec4 Add a stripmineSink and imperfectly nested tiling primitives.
This CL adds a primitive to perform stripmining of a loop by a given factor and
sinking it under multiple target loops.
In turn this is used to implement imperfectly nested loop tiling (with interchange) by repeatedly calling the stripmineSink primitive.

The API returns the point loops and allows repeated invocations of tiling to achieve declarative, multi-level, imperfectly-nested tiling.

Note that this CL is only concerned with the mechanical aspects and does not worry about analysis and legality.

The API is demonstrated in an example which creates an EDSC block, emits the corresponding MLIR and applies imperfectly-nested tiling:

```cpp
    auto block = edsc::block({
      For(ArrayRef<edsc::Expr>{i, j}, {zero, zero}, {M, N}, {one, one}, {
        For(k1, zero, O, one, {
          C({i, j, k1}) = A({i, j, k1}) + B({i, j, k1})
        }),
        For(k2, zero, O, one, {
          C({i, j, k2}) = A({i, j, k2}) + B({i, j, k2})
        }),
      }),
    });
    // clang-format on
    emitter.emitStmts(block.getBody());

    auto l_i = emitter.getAffineForOp(i), l_j = emitter.getAffineForOp(j),
         l_k1 = emitter.getAffineForOp(k1), l_k2 = emitter.getAffineForOp(k2);
    auto indicesL1 = mlir::tile({l_i, l_j}, {512, 1024}, {l_k1, l_k2});
    auto l_ii1 = indicesL1[0][0], l_jj1 = indicesL1[1][0];
    mlir::tile({l_jj1, l_ii1}, {32, 16}, l_jj1);
```

The edsc::Exprs for the induction variables (i, j, k1, k2) provide the programmatic hooks from which tiling can be applied declaratively.

PiperOrigin-RevId: 235548228
2019-03-29 16:41:20 -07:00
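As a rough illustration of the stripmine-and-sink idea in 62c54a2ec4, the sketch below uses plain integer loops instead of AffineForOps. The function names and the two-loop nest are assumptions made for the example. Like the CL, it covers only the mechanical rewrite and does not check legality.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;

// Original nest: for i in [0, m), for j in [0, n).
std::vector<Point> originalNest(int m, int n) {
  std::vector<Point> visited;
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      visited.push_back({i, j});
  return visited;
}

// Stripmine loop i by `factor`, then sink the resulting point loop under j.
std::vector<Point> stripmineSinkNest(int m, int n, int factor) {
  assert(factor > 0 && "strip size must be positive");
  std::vector<Point> visited;
  for (int ii = 0; ii < m; ii += factor)                  // tile loop
    for (int j = 0; j < n; ++j)                           // target loop
      for (int i = ii; i < std::min(ii + factor, m); ++i) // point loop
        visited.push_back({i, j});
  return visited;
}
```

Both nests visit the same set of (i, j) points; only the visiting order changes. That is why a separate legality analysis is needed before applying such a rewrite to real code.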
Uday Bondhugula dfe07b7bf6 Refactor AffineExprFlattener and move FlatAffineConstraints out of IR into
Analysis - NFC

- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
  doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
  could be moved out of IR/; the simplification that the IR needs for
  AffineExpr's doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
  all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
  class

- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor

PiperOrigin-RevId: 235283610
2019-03-29 16:39:32 -07:00
River Riddle f48716146e NFC: Make DialectConversion not directly inherit from ModulePass. It is now just a utility class that performs dialect conversion on a provided module.
PiperOrigin-RevId: 235194067
2019-03-29 16:38:57 -07:00
River Riddle 5410dff790 Rewrite MLPatternLoweringPass to no longer inherit from FunctionPass and just provide a utility function that applies ML patterns.
PiperOrigin-RevId: 235194034
2019-03-29 16:38:41 -07:00
MLIR Team 8564b274db Internal change
PiperOrigin-RevId: 235191129
2019-03-29 16:38:24 -07:00
River Riddle 3e656599f1 Define a PassID class to use when defining a pass. This allows for the type used for the ID field to be self documenting. It also allows for the compiler to know the set alignment of the ID object, which is useful for storing pointer identifiers within llvm data structures.
PiperOrigin-RevId: 235107957
2019-03-29 16:37:12 -07:00
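A minimal sketch of the idea behind the PassID commit (3e656599f1), under stated assumptions: `ToyPassID`, the two toy pass structs, and the 8-byte alignment are invented for illustration, and the pointer-packing motivation is only a guess at what the message means by storing pointer identifiers in llvm data structures. Each pass owns one static ID object and is identified by that object's address.

```cpp
#include <cassert>
#include <cstdint>

// A dedicated, explicitly aligned type for pass identifiers.
struct alignas(8) ToyPassID {};
static_assert(alignof(ToyPassID) == 8, "ID objects are 8-byte aligned");

// Each toy pass owns one static ID object; its address is the identifier.
struct FirstToyPass { static ToyPassID passID; };
struct SecondToyPass { static ToyPassID passID; };
ToyPassID FirstToyPass::passID;
ToyPassID SecondToyPass::passID;

// With a known alignment of 8, the three low address bits are always zero,
// so a container that packs flags into pointers could reuse them.
bool lowBitsAreFree(const ToyPassID *id) {
  assert(id != nullptr && "expected a valid ID object");
  return (reinterpret_cast<std::uintptr_t>(id) & 0x7u) == 0;
}
```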
Uday Bondhugula 4d3af6be82 Print debug message better + switch a dma-generate cl opt to uint64_t
PiperOrigin-RevId: 234840316
2019-03-29 16:35:41 -07:00
Uday Bondhugula a1dad3a5d9 Extend/improve getSliceBounds() / complete TODO + update unionBoundingBox
- compute slices precisely where the destination iteration depends on multiple source
  iterations (instead of over-approximating to the whole source loop extent)
- update unionBoundingBox to deal with input with non-matching symbols
- reenable disabled backend test case

PiperOrigin-RevId: 234714069
2019-03-29 16:33:11 -07:00
River Riddle 48ccae2476 NFC: Refactor the files related to passes.
* PassRegistry is split into its own source file.
* Pass related files are moved to a new library 'Pass'.

PiperOrigin-RevId: 234705771
2019-03-29 16:32:56 -07:00
Uday Bondhugula 5021dc4fa0 DMA placement update - hoist loop-invariant DMAs
- hoist DMAs past all loops immediately surrounding the region that the latter
  is invariant on - do this at DMA generation time itself

PiperOrigin-RevId: 234628447
2019-03-29 16:32:41 -07:00
Uday Bondhugula 4ca6219099 Update pass documentation + improve/fix some comments
- add documentation for passes
- improve / fix outdated doc comments

PiperOrigin-RevId: 234627076
2019-03-29 16:32:11 -07:00
River Riddle da0ebe0670 Add a generic pattern matcher for matching constant values produced by an operation with zero operands and a single result.
PiperOrigin-RevId: 234616691
2019-03-29 16:31:56 -07:00
Alex Zinenko b4dba895a6 EDSC: make Expr typed and extensible
Expose the result types of edsc::Expr, which are now stored for all types of
Exprs and not only for the variadic ones.  Require return types when an Expr is
constructed, if it will ever have some.  An empty return type list is
interpreted as an Expr that does not create a value (e.g. `return` or `store`).

Conceptually, all edsc::Exprs are now typed, with the type being a (potentially
empty) tuple of return types.  Unbound expressions and Bindables must now be
constructed with a specific type they will take.  This makes EDSC less
evidently type-polymorphic, but we can still write generic code such as

    Expr sumOfSquares(Expr lhs, Expr rhs) { return lhs * lhs + rhs * rhs; }

and use it to construct different typed expressions as

    sumOfSquares(Bindable(IndexType::get(ctx)), Bindable(IndexType::get(ctx)));
    sumOfSquares(Bindable(FloatType::getF32(ctx)),
                 Bindable(FloatType::getF32(ctx)));

On the positive side, we get the following.
1. We can now perform type checking when constructing Exprs rather than during
   MLIR emission.  Nevertheless, this still duplicates the Op::verify()
   until we can factor out type checking from that.
2. MLIREmitter is significantly simplified.
3. ExprKind enum is only used for actual kinds of expressions.  Data structures
   are converging with AbstractOperation, and the users can now create a
   VariadicExpr("canonical_op_name", {types}, {exprs}) for any operation, even
   an unregistered one without having to extend the enum and make pervasive
   changes to EDSCs.

On the negative side, we get the following.
1. Typed bindables are more verbose, even in Python.
2. We lose the ability to do print debugging for higher-level EDSC abstractions
   that are implemented as multiple MLIR Ops, for example logical disjunction.

This is the step 2/n towards making EDSC extensible.

***

Move MLIR Op construction from MLIREmitter::emitExpr to Expr::build since Expr
now has sufficient information to build itself.

This is the step 3/n towards making EDSC extensible.

Both of these strive to minimize the amount of irrelevant changes.  In
particular, this introduces more complex pretty-printing for affine and binary
expression to make sure tests continue to pass.  It also relies on string
comparison to identify specific operations that an Expr produces.

PiperOrigin-RevId: 234609882
2019-03-29 16:31:26 -07:00
Alex Zinenko 0a4c940c1b EDSC: introduce support for blocks
EDSC currently implements a block as a statement that is itself a list of
statements.  This suffers from two modeling problems: (1) these blocks are not
addressable, i.e. one cannot create an instruction where the thus-constructed
block is a successor; (2) they support block nesting, which is not supported by
MLIR blocks.  Furthermore, emitting such a "compound statement" (misleadingly named
`Block` in Python bindings) does not actually produce a new Block in the IR.

Implement support for creating actual IR Blocks in EDSC.  In particular, define
a new StmtBlock EDSC class that is neither an Expr nor a Stmt but contains a
list of Stmts.  Additionally, StmtBlock may have (early-) typed arguments.
These arguments are Bindable expressions that can be used inside the block.
Provide two calls in the MLIREmitter, `emitBlock` that actually emits a new
block and `emitBlockBody` that only emits the instructions contained in the
block without creating a new block.  In the latter case, the instructions must
not use block arguments.

Update Python bindings to make it clear when instruction emission happens
without creating a new block.

PiperOrigin-RevId: 234556474
2019-03-29 16:30:56 -07:00
Uday Bondhugula f97c1c5b06 Misc. updates/fixes to analysis utils used for DMA generation; update DMA
generation pass to make it drop certain assumptions, complete TODOs.

- multiple fixes for getMemoryFootprintBytes
  - pass loopDepth correctly from getMemoryFootprintBytes()
  - use union while computing memory footprints

- bug fixes for addAffineForOpDomain
  - take into account loop step
  - add domains of other loop IVs in turn that might have been used in the bounds

- dma-generate: drop assumption of "non-unit stride loops being tile space loops
  and skipping those and recursing to inner depths"; DMA generation is now purely
  based on the available fast memory capacity and the calculated memory footprints

- handle memory region compute failures/bailouts correctly from dma-generate

- loop tiling cleanup/NFC

- update some debug and error messages to use emitNote/emitError in
  pipeline-data-transfer pass - NFC

PiperOrigin-RevId: 234245969
2019-03-29 16:30:26 -07:00
MLIR Team 58aa383e60 Support fusing producer loop nests which write to a memref which is live out, provided that the write region of the consumer loop nest to the same memref is a super set of the producer's write region.
PiperOrigin-RevId: 234240958
2019-03-29 16:30:11 -07:00
MLIR Team 8f5f2c765d LoopFusion: perform a series of loop interchanges to increase the loop depth at which slices of producer loop nests can be fused into consumer loop nests.
*) Adds utility to LoopUtils to perform loop interchange of two AffineForOps.
*) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges.
*) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential.
*) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them.
*) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation.
*) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest.
*) Adds and updates related unit tests.

PiperOrigin-RevId: 234158370
2019-03-29 16:29:26 -07:00
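The permutation described in 8f5f2c765d (sink sequential loops, raise parallel ones, keep relative order within each group) behaves like a stable partition. The sketch below is one way to picture it and is not the LoopFusion implementation. `computeSinkOrder` and `applyOrder` are hypothetical helpers, and the dependence check against the permutation that the message mentions is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Returns loop indices in their new outer-to-inner order: parallel loops
// first, sequential loops last, each group keeping its original order.
std::vector<unsigned> computeSinkOrder(const std::vector<bool> &isParallel) {
  std::vector<unsigned> order(isParallel.size());
  std::iota(order.begin(), order.end(), 0u);
  std::stable_partition(order.begin(), order.end(),
                        [&](unsigned loop) { return isParallel[loop]; });
  return order;
}

// Applies the order to a list of loop names given outermost first.
std::vector<std::string> applyOrder(const std::vector<std::string> &loops,
                                    const std::vector<unsigned> &order) {
  assert(loops.size() == order.size() && "one entry per loop expected");
  std::vector<std::string> result;
  result.reserve(loops.size());
  for (unsigned loop : order)
    result.push_back(loops[loop]);
  return result;
}
```

For a nest (i, j, k, l) where only j and l are parallel, this gives (j, l, i, k): the sequential loops i and k end up innermost and keep their original relative order.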
Alex Zinenko d7aa700ccb Dialect conversion: decouple function signature conversion from type conversion
Function types are built-in in MLIR and affect the validity of the IR itself.
However, advanced target dialects such as the LLVM IR dialect may include
custom function types.  Until now, dialect conversion was expecting function
types not to be converted to the custom type: although the signature was
allowed to change, the outer type must have been an mlir::FunctionType.  This
effectively prevented dialect conversion from creating instructions that
operate on values of the custom function type.

Dissociate function signature conversion from general type conversion.
Function signature conversion must still produce an mlir::FunctionType and is
used in places where built-in types are required to make IR valid.  General
type conversion is used for SSA values, including function and block arguments
and function results.

Exercise this behavior in the LLVM IR dialect conversion by converting function
types to LLVM IR function pointer types.  The pointer to a function is chosen
to provide consistent lowering of higher-order functions: while it is possible
to have a value of function type, it is not possible to create a function type
accepting or returning another function type.

PiperOrigin-RevId: 234124494
2019-03-29 16:28:41 -07:00
Uday Bondhugula 6b7a49dd6a Add -tile-sizes command line option for loop tiling; clean up cl options for
dma-generate, loop-unroll.

- add -tile-sizes command line option for loop tiling to specify different tile
  sizes for loops in a band

- clean up command line options for loop-unroll, dma-generate (remove
  cl::hidden)

PiperOrigin-RevId: 234006232
2019-03-29 16:28:10 -07:00
Uday Bondhugula 00860662a2 Generate dealloc's for alloc's of pipeline-data-transfer
- for the DMA transfers being pipelined through double buffering, generate
  deallocs for the double buffers being alloc'ed

This change is along the lines of cl/233502632. We initially wanted to experiment with
scoped allocation - so the deallocation's were usually not necessary; however, they are
needed even with scoped allocations in some situations - e.g. when the enclosing loop
gets unrolled. The dealloc serves as an end of lifetime marker.

PiperOrigin-RevId: 233653463
2019-03-29 16:25:53 -07:00
River Riddle 4755774d16 Make IndexType a standard type instead of a builtin. This also cleans up some unnecessary factory methods on the Type class.
PiperOrigin-RevId: 233640730
2019-03-29 16:25:38 -07:00
Alex Zinenko 0e59e5c49b EDSC: move Expr and Stmt construction operators to a namespace
In the current state, edsc::Expr and edsc::Stmt overload operators to construct
other Exprs and Stmts.  This includes some unconventional overloads of the
`operator==` to create a comparison expression and of the `operator!` to create
a negation expression.  This situation could lead to unpleasant surprises where
the code does not behave like expected.  Make all Expr and Stmt construction
operators free functions and move them to the `edsc::op` namespace.  Callers
willing to use these operators must explicitly include them with the `using`
declaration.  This can be done in some local scope.

Additionally, we currently emit signed comparisons for order-comparison
operators.  With namespaces, we can later introduce two sets of operators in
different namespace, e.g. `edsc::op::sign` and `edsc::op::unsign` to clearly
state which kind of comparison is implied.

PiperOrigin-RevId: 233578674
2019-03-29 16:25:08 -07:00
Uday Bondhugula 8b3f841daf Generate dealloc's for the alloc's of dma-generate.
- for the DMA buffers being allocated (and their tags), generate corresponding deallocs
- minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass

Code generation for DMA transfers was being done with the initial simplifying
assumption that the alloc's would map to scoped allocations, and so no
deallocations would be necessary. Drop this assumption to generalize. Note that
even with scoped allocations, unrolling loops that have scoped allocations
could create a series of allocations and exhaustion of fast memory. Having an
end-of-lifetime marker like a dealloc in fact allows creating new scopes if
necessary when lowering to a backend and still utilize scoped allocation.
DMA buffers created by -dma-generate are guaranteed to have either
non-overlapping lifetimes or nested lifetimes.

PiperOrigin-RevId: 233502632
2019-03-29 16:24:08 -07:00
River Riddle 366ebcf6aa Remove the restriction that only registered terminator operations may terminate a block and have block operands. This allows for any operation to hold block operands. It also introduces the notion that unregistered operations may terminate a block. As such, the 'isTerminator' api on Instruction has been split into 'isKnownTerminator' and 'isKnownNonTerminator'.
PiperOrigin-RevId: 233076831
2019-03-29 16:22:23 -07:00
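The split of 'isTerminator' into two queries in 366ebcf6aa can be read as a three-state answer: known terminator, known non-terminator, or unknown for unregistered operations. The sketch below models that with hypothetical types (`Registration`, `ToyOp`). The `blockEndIsAcceptable` rule is an assumption based on the statement that unregistered operations may terminate a block.

```cpp
#include <cassert>
#include <vector>

// What the toy registry knows about an operation.
enum class Registration { Unregistered, Terminator, NonTerminator };

struct ToyOp {
  Registration registration;

  // True only when the op is registered and declared a terminator.
  bool isKnownTerminator() const {
    return registration == Registration::Terminator;
  }
  // True only when the op is registered and declared a non-terminator.
  bool isKnownNonTerminator() const {
    return registration == Registration::NonTerminator;
  }
};

// A block may end in any op that is not known to be a non-terminator, so
// unregistered ops are accepted.
bool blockEndIsAcceptable(const std::vector<ToyOp> &block) {
  assert(!block.empty() && "a block needs at least one operation");
  return !block.back().isKnownNonTerminator();
}
```

For an unregistered op both queries return false, which is why a single boolean `isTerminator` could not express the new rule.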
Uday Bondhugula c419accea3 Automated rollback of changelist 232728977.
PiperOrigin-RevId: 232944889
2019-03-29 16:21:38 -07:00
River Riddle a886625813 Modify the canonicalizations of select and muli to use the fold hook.
This also extends the greedy pattern rewrite driver to add the operands of folded operations back to the worklist.

PiperOrigin-RevId: 232878959
2019-03-29 16:20:06 -07:00
Uday Bondhugula 4ba8c9147d Automated rollback of changelist 232717775.
PiperOrigin-RevId: 232807986
2019-03-29 16:19:33 -07:00
River Riddle 99fee0b181 When canonicalizing only erase the operation after calling the 'fold' hook if replacement results were supplied. This fixes a bug where the operation would always get erased, even if it was modified in place.
PiperOrigin-RevId: 232757964
2019-03-29 16:19:17 -07:00
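The bug fixed in 99fee0b181 comes down to distinguishing two outcomes of a successful fold. A hedged sketch of the decision follows. `FoldOutcome`, `DriverAction`, and `handleFold` are invented names, and integers stand in for replacement values.

```cpp
#include <cassert>
#include <vector>

// Outcome of a toy fold hook. A successful fold with no replacements means
// the operation was updated in place.
struct FoldOutcome {
  bool folded;
  std::vector<int> replacements; // stand-ins for replacement values
};

enum class DriverAction { Keep, KeepUpdated, ReplaceAndErase };

// Decides what the driver does with the operation after calling fold.
DriverAction handleFold(const FoldOutcome &outcome) {
  assert((outcome.folded || outcome.replacements.empty()) &&
         "replacements only make sense for a successful fold");
  if (!outcome.folded)
    return DriverAction::Keep;
  if (outcome.replacements.empty())
    return DriverAction::KeepUpdated; // modified in place: must not be erased
  return DriverAction::ReplaceAndErase;
}
```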
River Riddle fd2d7c857b Rename the 'if' operation in the AffineOps dialect to 'affine.if' and namespace
the AffineOps dialect with 'affine'.

PiperOrigin-RevId: 232728977
2019-03-29 16:18:59 -07:00
River Riddle 90d10b4e00 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. This is the second step to adding a namespace to the AffineOps dialect.
PiperOrigin-RevId: 232717775
2019-03-29 16:17:59 -07:00
River Riddle 3227dee15d NFC: Rename affine_apply to affine.apply. This is the first step to adding a namespace to the affine dialect.
PiperOrigin-RevId: 232707862
2019-03-29 16:17:29 -07:00
MLIR Team b9dde91ea6 Adds the ability to compute the MemRefRegion of a sliced loop nest. Utilizes this feature during loop fusion cost computation, to compute what the write region of a fusion candidate loop nest slice would be (without having to materialize the slice or change the IR).
*) Adds parameter to public API of MemRefRegion::compute for passing in the slice loop bounds to compute the memref region of the loop nest slice.
*) Exposes public method MemRefRegion::getRegionSize for computing the size of the memref region in bytes.

PiperOrigin-RevId: 232706165
2019-03-29 16:17:15 -07:00
River Riddle 0c65cf283c Move the AffineFor loop bound folding to a canonicalization pattern on the AffineForOp.
PiperOrigin-RevId: 232610715
2019-03-29 16:16:11 -07:00
River Riddle 10237de8eb Refactor the affine analysis by moving some functionality to IR and some to AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring:
* AffineStructures has moved to IR.

* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.

* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.

* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.

PiperOrigin-RevId: 232586468
2019-03-29 16:15:41 -07:00
MLIR Team a78edcda5b Loop fusion improvements:
*) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion.
*) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check.

PiperOrigin-RevId: 232477853
2019-03-29 16:13:50 -07:00
Uday Bondhugula ed27b40085 Remove stray debug output - NFC
PiperOrigin-RevId: 232390076
2019-03-29 16:13:17 -07:00
River Riddle bf9c381d1d Remove InstWalker and move all instruction walking to the api facilities on Function/Block/Instruction.
PiperOrigin-RevId: 232388113
2019-03-29 16:12:59 -07:00
River Riddle c9ad4621ce NFC: Move AffineApplyOp to the AffineOps dialect. This also moves the isValidDim/isValidSymbol methods from Value to the AffineOps dialect.
PiperOrigin-RevId: 232386632
2019-03-29 16:12:40 -07:00
Uday Bondhugula 0f50414fa4 Refactor common code getting memref access in getMemRefRegion - NFC
- use getAccessMap() instead of repeating it
- fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap
  allocation and unique_ptr where possible)

- change extractForInductionVars - MutableArrayRef -> ArrayRef for the
  arguments. Since the method is just returning copies of 'Value *', the client
  can't mutate the pointers themselves; it's fine to mutate the 'Value''s
  themselves, but that doesn't mutate the pointers to those.

- change the way extractForInductionVars returns (see b/123437690)

PiperOrigin-RevId: 232359277
2019-03-29 16:12:25 -07:00
River Riddle b499277fb6 Remove remaining usages of OperationInst in lib/Transforms.
PiperOrigin-RevId: 232323671
2019-03-29 16:10:53 -07:00
River Riddle a3d9ccaecb Replace the walkOps/visitOperationInst variants from the InstWalkers with the Instruction variants.
PiperOrigin-RevId: 232322030
2019-03-29 16:10:24 -07:00
Uday Bondhugula b26900dce5 Update dma-generate pass to (1) work on blocks of instructions (instead of just
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors

- change dma-generate pass to work on blocks of instructions (start/end
  iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
  straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
  in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
  generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
  capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
  option (besides command line flag)

- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
  replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods

Eg. output

$ mlir-opt  -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
            ^

$ mlir-opt -debug-only=dma-generate  -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {

PiperOrigin-RevId: 232297044
2019-03-29 16:09:52 -07:00
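The capacity check in b26900dce5 (move DMA generation to a deeper level until the footprint fits) can be sketched as a simple search. This is an illustration under assumptions, not the pass's code: `chooseDmaDepth` is hypothetical, footprints are given per candidate depth in bytes, and depth 0 is taken to be the outermost level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// footprintBytes[d] is the assumed total buffer size if DMAs are generated
// at loop depth d (0 = outermost). Returns the outermost depth whose
// footprint fits in fast memory, or -1 if no candidate depth fits.
int chooseDmaDepth(const std::vector<std::uint64_t> &footprintBytes,
                   std::uint64_t fastMemCapacityBytes) {
  assert(!footprintBytes.empty() && "expected at least one candidate depth");
  for (std::size_t depth = 0; depth < footprintBytes.size(); ++depth)
    if (footprintBytes[depth] <= fastMemCapacityBytes)
      return static_cast<int>(depth);
  return -1;
}
```

A result of -1 corresponds to the situation reported in the first example output above, where the buffers for the block exceed the fast memory capacity.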
River Riddle de2d0dfbca Fold the functionality of OperationInst into Instruction. OperationInst still exists as a forward declaration and will be removed incrementally in a set of followup cleanup patches.
PiperOrigin-RevId: 232198540
2019-03-29 16:09:19 -07:00
River Riddle 126ec14e2d Fix the handling of the resizable operands bit of OperationState in a few places.
PiperOrigin-RevId: 232163738
2019-03-29 16:08:28 -07:00
Uday Bondhugula 8be2627436 Promote local buffers created post fusion to higher memory space
- fusion already includes the necessary analysis to create small/local buffers
  post fusion; allocate these buffers in a higher memory space if the necessary
  pass parameters are provided (threshold size, memory space id)

- although there will be a separate utility at some point to directly detect
  and promote small local buffers to higher memory spaces, doing it while fusion
  when possible is much less expensive, comes free with fusion analysis, and covers
  a key common case.

PiperOrigin-RevId: 232063894
2019-03-29 16:07:23 -07:00
River Riddle 5052bd8582 Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
PiperOrigin-RevId: 232060516
2019-03-29 16:06:49 -07:00
Nicolas Vasilache 0353ef99eb Cleanup EDSCs and start a functional auto-generated library of custom Ops
This CL applies the following simplifications to EDSCs:
1. Rename Block to StmtList because an MLIR Block is a different, not yet
supported, notion;
2. Rework Bindable to drop specific storage and just use it as a simple wrapper
around Expr. The only value of Bindable is to force a static cast when used by
the user to bind into the emitter. For all intended purposes, Bindable is just
a lightweight check that an Expr is Unbound. This simplifies usage and reduces
the API footprint. After playing with it for some time, it wasn't worth the API
cognition overhead;
3. Replace makeExprs and makeBindables by makeNewExprs and copyExprs which is
more explicit and less easy to misuse;
4. Add generally useful functionality to MLIREmitter:
  a. expose zero and one for the ubiquitous common lower bounds and step;
  b. add support to create already bound Exprs for all function arguments as
  well as shapes and views for Exprs bound to memrefs.
5. Delete Stmt::operator= and replace by a `Stmt::set` method which is more
explicit.
6. Make Stmt::operator Expr() explicit.
7. Indexed.indices assertions are removed to pave the way for expressing slices
and views as well as to work with 0-D memrefs.

The CL plugs those simplifications with TableGen and allows emitting a full MLIR function for
pointwise add.

This "x.add" op is both type and rank-agnostic (by allowing ArrayRef of Expr
passed to For loops) and opens the door to spinning up a composable library of
existing and custom ops that should automate a lot of the tedious work in
TF/XLA -> MLIR.

Testing needs to be significantly improved but can be done in a separate CL.

PiperOrigin-RevId: 231982325
2019-03-29 16:05:23 -07:00
River Riddle 9f22a2391b Define a detail::OperandStorage class to handle managing instruction operands. This class stores operands in a similar way to SmallVector except for two key differences. The first is the inline storage, which is a trailing objects array. The second is that being able to dynamically resize the operand list is optional. This means that we can enable the cases where operations need to change the number of operands after construction without losing the spatial locality benefits of the common case (operation instructions / non-control flow instructions with a lifetime fixed number of operands).
PiperOrigin-RevId: 231910497
2019-03-29 16:05:08 -07:00
Nicolas Vasilache d4921f4a96 Address Performance issue in NestedMatcher
A performance issue was reported due to the usage of NestedMatcher in
ComposeAffineMaps. The main culprit was the ubiquitous copies that were
occurring when appending even a single element in `matchOne`.

This CL generally simplifies the implementation and removes one level of indirection by getting rid of
auxiliary storage as well as simplifying the API.
The users of the API are updated accordingly.

The implementation was tested on a heavily unrolled example with
ComposeAffineMaps and is now close in performance with an implementation based
on stateless InstWalker.

As a reminder, the whole ComposeAffineMaps pass is slated to disappear but the bug report was very useful as a stress test for NestedMatchers.

Lastly, the following cleanups reported by @aminim were addressed:
1. make NestedPatternContext scoped within runFunction rather than at the Pass level. This was caused by a previous misunderstanding of Pass lifetime;
2. use defensive assertions in the constructor of NestedPatternContext to make it clear a unique such locally scoped context is allowed to exist.

PiperOrigin-RevId: 231781279
2019-03-29 16:04:07 -07:00
MLIR Team 1e85191d07 Fix ASAN issue: snapshot edge list before loop which can modify this list.
PiperOrigin-RevId: 231686040
2019-03-29 16:03:38 -07:00
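The ASAN fix in 1e85191d07 follows a common pattern: copy a list before running a loop that may modify it. The toy graph below is an assumption made for illustration and is unrelated to the actual dependence graph code. It shows why the snapshot matters: each iteration removes an edge from the list being walked.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

struct ToyGraph {
  std::map<int, std::vector<int>> outEdges;

  // Removes every edge src -> dst from the graph.
  void removeEdge(int src, int dst) {
    std::vector<int> &edges = outEdges[src];
    edges.erase(std::remove(edges.begin(), edges.end(), dst), edges.end());
  }

  // Re-attaches every edge leaving `from` so that it leaves `to` instead.
  void moveOutEdges(int from, int to) {
    assert(from != to && "source and destination nodes must differ");
    // Snapshot first: removeEdge mutates outEdges[from], which would
    // invalidate iterators if the loop walked the live list.
    const std::vector<int> snapshot = outEdges[from];
    for (int dst : snapshot) {
      removeEdge(from, dst);
      outEdges[to].push_back(dst);
    }
  }
};
```

Iterating over `outEdges[from]` directly would erase from the vector during traversal, the kind of invalid access AddressSanitizer reports.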
MLIR Team d7c824451f LoopFusion: insert the source loop nest slice at a depth in the destination loop nest which preserves dependences (above any loop carried or other dependences). This is accomplished by updating the maximum destination loop depth based on dependence checks between source loop nest loads and stores which access the memref on which the source loop nest has a store op. In addition, prevent fusing in source loop nests which write to memrefs which escape or are live out.
PiperOrigin-RevId: 231684492
2019-03-29 16:03:23 -07:00
Uday Bondhugula 44064d5b3b 3000x speed improvement on compose-affine-maps by dropping NestedMatcher for
a trivial inst walker :-) (reduces pass time from several minutes non-terminating to 120ms) - (fixes b/123541184)

- use a simple 7-line inst walker to collect affine_apply op's instead of the nested
  matcher; -compose-affine-maps pass runs in 120ms now instead of 5 minutes + (non-
  terminating / out of memory) - on a realistic test case that is 20,000 lines 12-d
  loop nest

- this CL is also pushing for simple existing/standard patterns unless there
  is a real efficiency issue (OTOH, fixing nested matcher to address this issue requires
  cl/231400521)

- the improvement is from swapping out the nested walker as opposed to from a bug
  or anything else that this CL changes

- update stale comment

PiperOrigin-RevId: 231623619
2019-03-29 16:02:53 -07:00
River Riddle b6928c945c Standardize the spelling of debug info to "debuginfo" in opt flags.
PiperOrigin-RevId: 231610337
2019-03-29 16:02:38 -07:00
Uday Bondhugula c0e9e5eb07 Fix getFullMemRefAsRegion() and FlatAffineConstraints::reset
PiperOrigin-RevId: 231426734
2019-03-29 16:00:39 -07:00
MLIR Team a0f3db4024 Support fusing loop nests which require insertion into a new instruction Block position while preserving dependences, opening up additional fusion opportunities.
- Adds SSA Value edges to the data dependence graph used in the loop fusion pass.

PiperOrigin-RevId: 231417649
2019-03-29 16:00:04 -07:00
River Riddle 755538328b Recommit: Define an AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231342063
2019-03-29 15:59:30 -07:00
Nicolas Vasilache ae772b7965 Automated rollback of changelist 231318632.
PiperOrigin-RevId: 231327161
2019-03-29 15:42:38 -07:00
River Riddle 5ecef2b3f6 Define an AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231318632
2019-03-29 15:42:08 -07:00
Nicolas Vasilache 1a5287d594 Replace too obscure usage of functional::map by declare + reserve + loop.
Cleanup a usage of functional::map that is deemed too obscure in
`reindexAffineIndices`. Also fix a stale comment in `reindexAffineIndices`.

PiperOrigin-RevId: 231211184
2019-03-29 15:41:08 -07:00
Chris Lattner b42bea215a Change AffineApplyOp to produce a single result, simplifying the code that
works with it, and updating the g3docs.

PiperOrigin-RevId: 231120927
2019-03-29 15:40:38 -07:00
River Riddle 36babbd781 Change the ForInst induction variable to be a block argument of the body instead of the ForInst itself. This is a necessary step in converting ForInst into an operation.
PiperOrigin-RevId: 231064139
2019-03-29 15:40:23 -07:00
Nicolas Vasilache 0e7a8a9027 Drop AffineMap::Null and IntegerSet::Null
Addresses b/122486036

This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing
the Null method and cleaning up the constructors.

As the ::Null uses were tracked down, opportunities appeared to untangle some
of the Parsing logic and make it explicit where AffineMap/IntegerSet have
ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit
pointer values of AffineMap* and IntegerSet* that were passed as function
parameters. Depending on the values of those pointers, one of 3 behaviors could
occur.

This parsing logic convolution is one of the rare cases where I would advocate
for code duplication. The more proper fix would be to make the syntax
unambiguous or to allow some lookahead.

PiperOrigin-RevId: 231058512
2019-03-29 15:40:08 -07:00
Nicolas Vasilache 81c7f2e2f3 Cleanup resource management and rename recursive matchers
This CL follows up on a memory leak issue related to SmallVector growth that
escapes the BumpPtrAllocator.
The fix is to properly use ArrayRef and placement new to define away the
issue.

The following renaming is also applied:
1. MLFunctionMatcher -> NestedPattern
2. MLFunctionMatches -> NestedMatch

As a consequence all allocations are now guaranteed to live on the BumpPtrAllocator.

PiperOrigin-RevId: 231047766
2019-03-29 15:39:53 -07:00
River Riddle 75c21e1de0 Wrap cl::opt flags within passes in a category with the pass name. This improves the help output of tools like mlir-opt.
Example:

dma-generate options:

  -dma-fast-mem-capacity                 - Set fast memory space  ...
  -dma-fast-mem-space=<uint>             - Set fast memory space  ...

loop-fusion options:

  -fusion-compute-tolerance=<number>     - Fractional increase in  ...
  -fusion-maximal                        - Enables maximal loop fusion

loop-tile options:

  -tile-size=<uint>                      - Use this tile size for  ...

loop-unroll options:

  -unroll-factor=<uint>                  - Use this unroll factor  ...
  -unroll-full                           - Fully unroll loops
  -unroll-full-threshold=<uint>          - Unroll all loops with  ...
  -unroll-num-reps=<uint>                - Unroll innermost loops  ...

loop-unroll-jam options:

  -unroll-jam-factor=<uint>              - Use this unroll jam factor ...

PiperOrigin-RevId: 231019363
2019-03-29 15:39:38 -07:00
Uday Bondhugula b4a1443508 Update replaceAllMemRefUsesWith to generate single result affine_apply's for
index remapping
- generate a sequence of single result affine_apply's for the index remapping
  (instead of one multi result affine_apply)
- update dma-generate and loop-fusion test cases; while on this, change test cases
  to use single result affine apply ops
- some fusion comment fix/cleanup

PiperOrigin-RevId: 230985830
2019-03-29 15:38:23 -07:00
Uday Bondhugula b588d58c5f Update createAffineComputationSlice to generate single result affine maps
- Update createAffineComputationSlice to generate a sequence of single result
  affine apply ops instead of one multi-result affine apply
- update pipeline-data-transfer test case; while on this, also update the test
  case to use only single result affine maps, and make it more robust to
  change.

PiperOrigin-RevId: 230965478
2019-03-29 15:37:53 -07:00
River Riddle c3424c3c75 Allow operations to hold a blocklist and add support for parsing/printing a block list for verbose printing.
PiperOrigin-RevId: 230951462
2019-03-29 15:37:37 -07:00
Alex Zinenko 6d37a255e2 Generic dialect conversion pass exercised by LLVM IR lowering
This commit introduces a generic dialect conversion/lowering/legalization pass
and illustrates it on StandardOps->LLVMIR conversion.

It partially reuses the PatternRewriter infrastructure and adds the following
functionality:
- an actual pass;
- non-default pattern constructors;
- one-to-many rewrites;
- rewriting terminators with successors;
- not applying patterns iteratively (unlike the existing greedy rewrite driver);
- ability to change function signature;
- ability to change basic block argument types.

The latter two required, given the existing API, the creation of new functions
in the same module.  Eventually, this should converge with the rest of
PatternRewriter.  However, we may want to keep two pass versions: "heavy" with
function/block argument conversion and "light" that only touches operations.

This pass creates new functions within a module as a means to change function
signature, then creates new blocks with converted argument types in the new
function.  Then, it traverses the CFG in DFS-preorder to make sure defs are
converted before uses in the dominated blocks.  The generic pass has a minimal
interface with two hooks: one to fill in the set of patterns, and another one
to convert types for functions and blocks.  The patterns are defined as
separate classes that can be table-generated in the future.
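
As a rough illustration of the traversal order described above (not the pass's actual code), the sketch below walks a CFG in DFS preorder. Blocks are plain integers and `successors` is an adjacency list; both, along with the name `dfsPreorder`, are assumptions made for this example. Every path from the entry to a block passes through that block's dominators, so a preorder walk from the entry reaches a dominator before any block it dominates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the blocks reachable from `entry` in DFS
// preorder. successors[b] lists the successor blocks of block b.
std::vector<int> dfsPreorder(const std::vector<std::vector<int>> &successors,
                             int entry) {
  assert(entry >= 0 && static_cast<std::size_t>(entry) < successors.size());
  std::vector<bool> visited(successors.size(), false);
  std::vector<int> order;
  std::vector<int> stack{entry};
  while (!stack.empty()) {
    int block = stack.back();
    stack.pop_back();
    if (visited[block])
      continue;
    visited[block] = true;
    order.push_back(block);
    // Push successors in reverse so that the first successor is visited first.
    for (auto it = successors[block].rbegin(); it != successors[block].rend();
         ++it)
      if (!visited[*it])
        stack.push_back(*it);
  }
  return order;
}
```

For a diamond-shaped CFG with edges 0 -> {1, 2}, 1 -> {3} and 2 -> {3}, this visits the blocks in the order 0, 1, 3, 2: the entry block, which dominates all others, comes first.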

The LLVM IR lowering pass partially inherits from the existing LLVM IR
translator, in particular for type conversion.  It defines a conversion pattern
template, instantiated for different operations, and is a good candidate for
tablegen.  The lowering does not yet support loads and stores and is not
connected to the translator as it would have broken the existing flows.  Future
patches will add missing support before switching the translator in a single
patch.

PiperOrigin-RevId: 230951202
2019-03-29 15:37:23 -07:00
Uday Bondhugula 95f19d558c Fix return value logic / error reporting in -dma-generate
PiperOrigin-RevId: 230906158
2019-03-29 15:36:23 -07:00
MLIR Team 5c5739d42b Change the dependence check in the loop fusion pass to use the MLIR instruction list ordering (instead of the dependence graph node id ordering). This breaks the overloading of dependence graph node ids as both edge endpoints and instruction list position.
PiperOrigin-RevId: 230849232
2019-03-29 15:35:53 -07:00
Uday Bondhugula f94b15c247 Update dma-generate: update for multiple load/store op's per memref
- introduce a way to compute union using symbolic rectangular bounding boxes
- handle multiple load/store op's to the same memref by taking a union of the regions
- command-line argument to provide capacity of the fast memory space
- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the
  supplied index remap was identity
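
The union of rectangular regions mentioned above can be sketched as follows. This is a simplified illustration with constant, inclusive bounds, whereas the commit works with symbolic bounding boxes; the `Box` type and the function names are assumptions made for this example and are not part of the pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical region type: inclusive lower/upper bound per dimension.
struct Box {
  std::vector<int64_t> lower;
  std::vector<int64_t> upper;
};

// Smallest rectangular box containing both inputs (an over-approximation of
// the exact set union whenever the inputs do not tile a rectangle).
Box boundingBoxUnion(const Box &a, const Box &b) {
  assert(a.lower.size() == a.upper.size());
  assert(b.lower.size() == b.upper.size());
  assert(a.lower.size() == b.lower.size());
  Box result;
  for (std::size_t d = 0; d < a.lower.size(); ++d) {
    result.lower.push_back(std::min(a.lower[d], b.lower[d]));
    result.upper.push_back(std::max(a.upper[d], b.upper[d]));
  }
  return result;
}

// Number of elements covered by the box, e.g. to compare against a
// fast-memory capacity.
int64_t numElements(const Box &box) {
  int64_t count = 1;
  for (std::size_t d = 0; d < box.lower.size(); ++d)
    count *= box.upper[d] - box.lower[d] + 1;
  return count;
}
```

Two accesses to the same memref covering [0, 3] x [0, 3] and [2, 5] x [1, 2] would be merged into the single region [0, 5] x [0, 3], so one buffer can serve both.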

PiperOrigin-RevId: 230848185
2019-03-29 15:35:38 -07:00
Uday Bondhugula 06d21d9f64 loop-fusion: debug info cleanup
PiperOrigin-RevId: 230817383
2019-03-29 15:35:08 -07:00
Chris Lattner 934b6d125f Introduce a new operation hook point for implementing simple local
canonicalizations of operations.  The ultimate important user of this is
going to be a funcBuilder->foldOrCreate<YourOp>(...) API, but for now it
is just a more convenient way to write certain classes of canonicalizations
(see the change in StandardOps.cpp).

NFC.

PiperOrigin-RevId: 230770021
2019-03-29 15:34:35 -07:00
River Riddle 451869f394 Add cloning functionality to Block and Function, this also adds support for remapping successor block operands of terminator operations. We define a new BlockAndValueMapping class to simplify mapping between cloned values.
PiperOrigin-RevId: 230768759
2019-03-29 15:34:20 -07:00
Uday Bondhugula 72e5c7f428 Minor updates + cleanup to dma-generate
- switch some debug info to emitError
- use a single constant op for zero index to make it easier to write/update
  test cases; avoid creating new constant op's for common zero index cases
- test case cleanup

This is in preparation for an upcoming major update to this pass.

PiperOrigin-RevId: 230728379
2019-03-29 15:34:06 -07:00
River Riddle f319bbbd28 Add a function pass to strip debug info from functions and instructions.
PiperOrigin-RevId: 230654315
2019-03-29 15:33:50 -07:00
River Riddle 6859f33292 Migrate VectorOrTensorType/MemRefType shape api to use int64_t instead of int.
PiperOrigin-RevId: 230605756
2019-03-29 15:33:20 -07:00
MLIR Team b28009b681 Fix single producer check in loop fusion pass.
PiperOrigin-RevId: 230565482
2019-03-29 15:32:20 -07:00
Uday Bondhugula 864d9e02a1 Update fusion cost model + some additional infrastructure and debug information for -loop-fusion
- update fusion cost model to fuse while tolerating a certain amount of redundant
  computation; add cl option -fusion-compute-tolerance
  evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC

PiperOrigin-RevId: 230541857
2019-03-29 15:32:06 -07:00
Uday Bondhugula 92e9d9484c loop unroll update: unroll factor one for a single iteration loop
- unrolling a single iteration loop by a factor of one should promote its body
  into its parent; this makes it consistent with the behavior/expectation that
  unrolling a loop by a factor equal to its trip count makes the loop go away.

PiperOrigin-RevId: 230426499
2019-03-29 15:31:35 -07:00
Uday Bondhugula 1b735dfe27 Refactor -dma-generate walker - NFC
- ForInst::walkOps will also be used in an upcoming CL (cl/229438679); better to have
  this instead of deriving from the InstWalker

PiperOrigin-RevId: 230413820
2019-03-29 15:31:03 -07:00
Uday Bondhugula 94a03f864f Allocate private/local buffers for slices accurately during fusion
- the size of the private memref created for the slice should be based on
  the memref region accessed at the depth at which the slice is being
  materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
  to the region accessed based on the entire domain.

- leads to a significant contraction of the temporary / intermediate memref
  whenever the memref isn't reduced to a single scalar (through store fwd'ing).

Other changes

- update to promoteIfSingleIteration - avoid introducing unnecessary identity
  map affine_apply from IV; makes it much easier to write and read test cases
  and pass output for all passes that use promoteIfSingleIteration; loop-fusion
  test cases become much simpler

- fix replaceAllMemRefUsesWith bug that was exposed by the above update -
  'domInstFilter' could be one of the ops erased due to a memref replacement in
  it.

- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
  missing (the latter need not always be 1); add lbFloorDivisors output argument

- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape

PiperOrigin-RevId: 230405218
2019-03-29 15:30:31 -07:00
MLIR Team 71495d58a7 Handle escaping memrefs in loop fusion pass:
*) Do not remove loop nests which write to memrefs which escape the function.
*) Do not remove memrefs which escape the function (e.g. are used in the return instruction).

PiperOrigin-RevId: 230398630
2019-03-29 15:30:14 -07:00
Nicolas Vasilache 9f3f39d61a Cleanup EDSCs
This CL performs a bunch of cleanups related to EDSCs that are generally
useful in the context of using them with a simple wrapping C API (not in this
CL) and with simple language bindings to Python and Swift.

PiperOrigin-RevId: 230066505
2019-03-29 15:27:58 -07:00
Lei Zhang 1e484b5ef4 Mark (void)indexRemap to please compiler for unused variable check
PiperOrigin-RevId: 229957023
2019-03-29 15:26:59 -07:00
MLIR Team c4237ae990 LoopFusion: Creates private MemRefs which are used only by operations in the fused loop.
*) Enables reduction of private memref size based on MemRef region accessed by fused slice.
*) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence.
*) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice).

PiperOrigin-RevId: 229936698
2019-03-29 15:26:15 -07:00
Smit Hinsu 0eebe6ffd9 Update comment in the constant folding pass as constant folding is supported even when not all operands are constants
PiperOrigin-RevId: 229670189
2019-03-29 15:24:28 -07:00
Nicolas Vasilache 4573a8da9a Fix improperly indexed DimOp in LowerVectorTransfers.cpp
This CL fixes a misunderstanding in how to build DimOp which triggered
execution issues in the CPU path.

The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to
construct the dynamic dimensions should be:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and
`dim %arg, 4 : memref<?x4x?x8x?xf32>`

Before this CL, we would construct:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 1 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`

and expect the other dimensions to be constants.
This assumption seems consistent at first glance with the syntax of alloc:

```
    %tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32>
```

But this was actually incorrect.

This CL also makes the relevant functions available to EDSCs and removes
duplication of the incorrect function.
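
The indexing rule described above can be sketched in isolation: the i-th dynamic size operand of `alloc` corresponds to the i-th `?` in the shape, and `dim` must be queried at that dimension's position in the full shape. The code below is a standalone illustration, not the code from LowerVectorTransfers.cpp; representing `?` as -1 and the name `dynamicDimPositions` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption for this sketch: a dynamic ('?') extent is encoded as -1.
constexpr int64_t kDynamic = -1;

// Returns, for each dynamic dimension in order, its position in the shape.
// These positions are the indices to pass to `dim`.
std::vector<unsigned> dynamicDimPositions(const std::vector<int64_t> &shape) {
  std::vector<unsigned> positions;
  for (unsigned i = 0; i < shape.size(); ++i) {
    assert(shape[i] >= kDynamic && "unexpected negative extent");
    if (shape[i] == kDynamic)
      positions.push_back(i);
  }
  return positions;
}
```

For the shape `?x4x?x8x?` this yields the positions 0, 2 and 4, matching the corrected `dim` expressions, rather than the 0, 1, 2 produced by counting the dynamic sizes alone.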

PiperOrigin-RevId: 229622766
2019-03-29 15:24:13 -07:00
Uday Bondhugula c1ca23ef6e Some loop fusion code cleanup/simplification post cl/229575126
- enforce the assumptions better / in a simpler way

PiperOrigin-RevId: 229612424
2019-03-29 15:23:43 -07:00
MLIR Team 27d067e164 LoopFusion improvements:
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.

PiperOrigin-RevId: 229575126
2019-03-29 15:21:59 -07:00
Uday Bondhugula f99a44a7cd Address documentation/readability related comments from cl/227252907 on memref
store forwarding - NFC.

PiperOrigin-RevId: 229561933
2019-03-29 15:20:59 -07:00
Uday Bondhugula 03e15e1b9f Minor code cleanup - NFC.
- readability changes

PiperOrigin-RevId: 229443430
2019-03-29 15:19:41 -07:00
Nicolas Vasilache 424041ad58 Add EDSC sugar
This allows load, store and ForNest to be used with both Expr and Bindable.
This simplifies writing generic pieces of MLIR snippet.

For instance, a generic pointwise add can now be written:

```cpp
// Different Bindable ivs, one per loop in the loop nest.
auto ivs = makeBindables(shapeA.size());
Bindable zero, one;
// Same bindable, all equal to `zero`.
SmallVector<Bindable, 8> zeros(ivs.size(), zero);
// Same bindable, all equal to `one`.
SmallVector<Bindable, 8> ones(ivs.size(), one);
// clang-format off
Bindable A, B, C;
Stmt scalarA, scalarB, tmp;
Stmt block = edsc::Block({
  ForNest(ivs, zeros, shapeA, ones, {
    scalarA = load(A, ivs),
    scalarB = load(B, ivs),
    tmp = scalarA + scalarB,
    store(tmp, C, ivs)
  }),
});
// clang-format on
```

This CL also adds some extra support for pretty printing that will be used in
a future CL when we introduce standalone testing of EDSCs. At the moment we
are lacking the basic infrastructure to write such tests.

PiperOrigin-RevId: 229375850
2019-03-29 15:16:53 -07:00
Uday Bondhugula 6e4f3e40c7 Fix outdated comments
PiperOrigin-RevId: 229300301
2019-03-29 15:16:08 -07:00
Lei Zhang 61ec6c0992 Swap the type and attribute parameter in ConstantOp::build()
This is to keep consistent with other TableGen generated builders
so that we can also use this builder in TableGen rules.

PiperOrigin-RevId: 229244630
2019-03-29 15:14:52 -07:00
MLIR Team 38c2fe3158 LoopFusion: automate selection of source loop nest slice depth and destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time).
*) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations of src/dst loop depths, attempting to find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality).
*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (once fusion has calculated the min-cost src/dst loop depths).
*) Test: Adds multiple unit tests to test the new functionality.
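
One way to picture an "op instance count" for a loop nest is the number of operation executions obtained by multiplying trip counts down the nest. The sketch below is only an illustration under that reading; the `LoopNest` type, its fields, and `opInstanceCount` are hypothetical and do not reflect the data structures or the exact cost function used by the pass.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical loop nest model with constant trip counts.
struct LoopNest {
  int64_t tripCount;           // iterations of this loop
  int64_t opsInBody;           // non-loop operations directly in the body
  std::vector<LoopNest> inner; // loops nested directly in the body
};

// Total number of operation instances executed by the nest.
int64_t opInstanceCount(const LoopNest &loop) {
  assert(loop.tripCount >= 0 && loop.opsInBody >= 0);
  int64_t perIteration = loop.opsInBody;
  for (const LoopNest &child : loop.inner)
    perIteration += opInstanceCount(child);
  return loop.tripCount * perIteration;
}
```

Under this model, a fusion candidate could be judged by comparing the count of the fused nest against the sum of the counts of the two unfused nests, which is the kind of comparison the message describes.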

PiperOrigin-RevId: 229219757
2019-03-29 15:13:53 -07:00
Nicolas Vasilache d734c50c5f [MLIR] Clip all access dimensions during LowerVectorTransfers
This CL adds a short term remedy to an issue that was found during execution
tests.

Lowering of vector transfer ops uses the permutation map to determine which
ForInst have been super-vectorized. During materialization to HW vector sizes
however, some of those dimensions may be fully unrolled and do not appear in
the permutation map.
Such dimensions were then not clipped and may have accessed out of bounds.

This CL conservatively clips all dimensions to ensure no out of bounds access.
The longer term solution is still up for debate but will probably require
either passing more information between Materialization and lowering, or just
merging the 2 passes.

PiperOrigin-RevId: 228980787
2019-03-29 15:12:26 -07:00
Nicolas Vasilache 362557e11c Simplify compositions of AffineApply
This CL is the 6th and last on the path to simplifying AffineMap composition.
This removes `AffineValueMap::forwardSubstitutions` and replaces it by simple
calls to `fullyComposeAffineMapAndOperands`.

PiperOrigin-RevId: 228962580
2019-03-29 15:11:56 -07:00
Nicolas Vasilache cfa5831960 Uniformize composition of AffineApplyOp by construction
This CL is the 5th on the path to simplifying AffineMap composition.
This removes the distinction between normalized single-result AffineMap and
more general composed multi-result map.

One nice byproduct of making the implementation driven by single-result is
that the multi-result extension is a trivial change: the implementation is
still single-result and we just use:

```
unsigned idx = getIndexOf(...);
map.getResult(idx);
```

This CL also fixes an AffineNormalizer implementation issue related to symbols.
Namely it stops performing substitutions on symbols in AffineNormalizer and
instead concatenates them all to be consistent with the call to
`AffineMap::compose(AffineMap)`. This latter call to `compose` cannot perform
simplifications of symbols coming from different maps based on positions only:
i.e. dims are applied and renumbered but symbols must be concatenated.

The only way to determine whether symbols from different AffineApply are the
same is to look at the concrete values. The canonicalizeMapAndOperands is thus
extended with behavior to support replacing operands that appear multiple
times.

Lastly, this CL demonstrates that the implementation is correct by rewriting
ComposeAffineMaps using only `makeComposedAffineApply`. The implementation
uses a matcher because AffineApplyOp are introduced as composed operations on
the fly instead of iteratively forwardSubstituting. For this purpose, a walker
would revisit freshly introduced AffineApplyOp. Regardless, ComposeAffineMaps
is scheduled to disappear, this CL replaces the implementation based on
iterative `forwardSubstitute` by a composed-by-construction
`makeComposedAffineApply`.
Remaining calls to `forwardSubstitute` will be removed in the next CL.

PiperOrigin-RevId: 228830443
2019-03-29 15:08:40 -07:00
Alex Zinenko 9003490287 Implement branch-free single-division lowering of affine division/remainder
This implements the lowering of `floordiv`, `ceildiv` and `mod` operators from
affine expressions to the arithmetic primitive operations.  Integer division
rules in affine expressions explicitly require rounding towards either negative
or positive infinity unlike machine implementations that round towards zero.
In the general case, implementing `floordiv` and `ceildiv` using machine signed
division requires computing both the quotient and the remainder.  When the
divisor is positive, this can be simplified by adjusting the dividend and the
quotient by one and switching signs.

In the current use cases, we are unlikely to encounter affine expressions with
negative divisors (affine divisions appear in loop transformations such as
tiling that guarantee that divisors are positive by construction).  Therefore,
it is reasonable to use branch-free single-division implementation.  In case of
affine maps, divisors can only be literals so we can check the sign and
implement the case for negative divisors when the need arises.

The affine lowering pass can still fail when applied to semi-affine maps
(division or modulo by a symbol).
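
The arithmetic behind this lowering can be sketched in plain C++, where `/` and `%` truncate toward zero like the machine operations. This is an illustration of the technique for positive divisors only, not the code emitted by the pass; the function names are assumptions made for this example, and each ternary stands in for a `select`, so no control flow is needed.

```cpp
#include <cassert>
#include <cstdint>

// Floor division with a single truncating division, for rhs > 0.
// A negative dividend is mapped to a non-negative one, divided, and the
// quotient is mapped back.
int64_t floorDivPositive(int64_t lhs, int64_t rhs) {
  assert(rhs > 0 && "sketch only covers positive divisors");
  bool negative = lhs < 0;
  int64_t dividend = negative ? -(lhs + 1) : lhs;
  int64_t quotient = dividend / rhs;
  return negative ? -quotient - 1 : quotient;
}

// Ceiling division with a single truncating division, for rhs > 0.
// Note: lhs == INT64_MIN is not handled by this sketch (negation overflows).
int64_t ceilDivPositive(int64_t lhs, int64_t rhs) {
  assert(rhs > 0 && "sketch only covers positive divisors");
  bool nonPositive = lhs <= 0;
  int64_t dividend = nonPositive ? -lhs : lhs - 1;
  int64_t quotient = dividend / rhs;
  return nonPositive ? -quotient : quotient + 1;
}

// Affine `mod`: the result is always in [0, rhs) for rhs > 0.
int64_t modPositive(int64_t lhs, int64_t rhs) {
  assert(rhs > 0 && "sketch only covers positive divisors");
  int64_t remainder = lhs % rhs;
  return remainder < 0 ? remainder + rhs : remainder;
}
```

For example, `floorDivPositive(-7, 2)` is -4 and `ceilDivPositive(-7, 2)` is -3, whereas the truncating `-7 / 2` gives -3 in both roles.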

PiperOrigin-RevId: 228668181
2019-03-29 15:07:40 -07:00
Uday Bondhugula 742c37abc9 Fix DMA overlap pass buffer mapping
- the double buffer should be indexed (iv floordiv step) % 2 and NOT (iv % 2);
  step wasn't being accounted for.

- fix test cases, enable failing test cases
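
A small sketch shows why the step matters. It assumes a loop with a non-negative induction variable starting at 0, so that C++ `/` agrees with `floordiv`; the function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Buffer selection that accounts for the loop step: consecutive iterations
// alternate between buffer 0 and buffer 1.
int64_t doubleBufferIndex(int64_t iv, int64_t step) {
  assert(iv >= 0 && step > 0);
  return (iv / step) % 2;
}

// The previous form, which ignores the step.
int64_t naiveBufferIndex(int64_t iv) {
  assert(iv >= 0);
  return iv % 2;
}
```

With a step of 2 the induction variable takes the values 0, 2, 4, 6. The naive form maps all of them to buffer 0, so no overlap is possible, while the step-aware form yields 0, 1, 0, 1.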

PiperOrigin-RevId: 228635726
2019-03-29 15:07:10 -07:00
Nicolas Vasilache 1f78d63f05 [MLIR] Make SuperVectorization use normalized AffineApplyOp
Supervectorization does not plan on handling multi-result AffineMaps and
non-canonical chains of > 1 AffineApplyOp.
This CL uses the simpler single-result unbounded AffineApplyOp in the
MaterializeVectors pass.

PiperOrigin-RevId: 228469085
2019-03-29 15:05:55 -07:00
Nicolas Vasilache c6f798a976 Introduce AffineMap::compose(AffineMap)
This CL is the 2nd on the path to simplifying AffineMap composition.
This CL uses the now accepted `AffineExpr::compose(AffineMap)` to
implement `AffineMap::compose(AffineMap)`.

Implications of keeping the simplification function in
Analysis are documented where relevant.

PiperOrigin-RevId: 228276646
2019-03-29 15:04:20 -07:00
Uday Bondhugula 21baf86a2f Extend loop-fusion's slicing utility + other fixes / updates
- refactor toAffineFromEq and the code surrounding it; refactor code into
  FlatAffineConstraints::getSliceBounds
- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other
  identifiers
- add FlatAffineConstraints::getConstantLower/UpperBound
- Address b/122118218 (don't assert on invalid fusion depths cmdline flags -
  instead, don't do anything; change cmdline flags
  src-loop-depth -> fusion-src-loop-depth)
- AffineExpr/Map print method update: don't fail on null instances (since we have
  a wrapper around a pointer, it's avoidable); rationale: dump/print methods should
  never fail if possible.
- Update memref-dataflow-opt to add an optimization to avoid an unnecessary call to
  IsRangeOneToOne when it's trivially going to be true.
- Add additional test cases to exercise the new support
- update a few existing test cases since the maps are now generated uniformly with
  all destination loop operands appearing for the backward slice
- Fix projectOut - fix wrong range for getBestElimCandidate.
- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since
  we didn't have any non-hyperrectangular ones.

PiperOrigin-RevId: 228265152
2019-03-29 15:03:20 -07:00
Uday Bondhugula b934d75b8f Convert expr - c * (expr floordiv c) to expr mod c in AffineExpr
- Detect 'mod' to replace the combination of floordiv, mul, and subtract when
  possible at construction time; when 'c' is a power of two, this reduces the number of
  operations; also more compact and readable. Update simplifyAdd for this.

  On a side note:
  - with the affine expr flattening we have, a mod expression like d0 mod c
    would be flattened into d0 - c * q,  c * q <= d0 <= c*q + c - 1, with 'q'
    being added as the local variable (q = d0 floordiv c); as a result, a mod
    was turned into a floordiv whenever the expression was reconstructed back,
    i.e., as  d0 - c * (d0 floordiv c); as a result of this change, we recover
    the mod back.

- rename SimplifyAffineExpr -> SimplifyAffineStructures (pass had been renamed but
  the file hadn't been).
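
The identity behind this rewrite, and the constraints from the side note, can be checked numerically. The sketch below is a standalone illustration for a positive constant `c`, not the AffineExpr simplification code; `floorDiv`, `affineMod` and `rewriteIsSound` are names assumed for this example.

```cpp
#include <cassert>
#include <cstdint>

// Floor division for c > 0 (rounds toward negative infinity).
int64_t floorDiv(int64_t x, int64_t c) {
  assert(c > 0);
  int64_t q = x / c;
  return (x % c < 0) ? q - 1 : q;
}

// Affine mod for c > 0: the result lies in [0, c).
int64_t affineMod(int64_t x, int64_t c) {
  assert(c > 0);
  int64_t r = x % c;
  return r < 0 ? r + c : r;
}

// Checks, for every x in [lo, hi], that
//   x - c * (x floordiv c) == x mod c
// and that q = x floordiv c satisfies c*q <= x <= c*q + c - 1.
bool rewriteIsSound(int64_t lo, int64_t hi, int64_t c) {
  for (int64_t x = lo; x <= hi; ++x) {
    int64_t q = floorDiv(x, c);
    if (x - c * q != affineMod(x, c))
      return false;
    if (!(c * q <= x && x <= c * q + c - 1))
      return false;
  }
  return true;
}
```

Calling `rewriteIsSound(-50, 50, 4)` exercises both negative and positive values of the expression for a power-of-two constant.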

PiperOrigin-RevId: 228258120
2019-03-29 15:02:56 -07:00
Uday Bondhugula 56b3640b94 Misc readability and doc / code comment related improvements - NFC
- when SSAValue/MLValue existed, code at several places was forced to create additional
  aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
  rid of such redundant code

- use filling ctors instead of explicit loops

- for smallvectors, change insert(list.end(), ...) -> append(...

- improve comments at various places

- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
  getMemRefAccess. In the next CL, provide getAccess() accessors for load,
  store, DMA op's to return a MemRefAccess.

PiperOrigin-RevId: 228243638
2019-03-29 15:02:41 -07:00
Nicolas Vasilache 618c6a74c6 [MLIR] Introduce normalized single-result unbounded AffineApplyOp
Supervectorization does not plan on handling multi-result AffineMaps and
non-canonical chains of > 1 AffineApplyOp.
This CL introduces a simpler abstraction and composition of single-result
unbounded AffineApplyOp by using the existing unbound AffineMap composition.

This CL adds a simple API call and relevant tests:

```c++
OpPointer<AffineApplyOp> makeNormalizedAffineApply(
  FuncBuilder *b, Location loc, AffineMap map, ArrayRef<Value*> operands);
```

which creates a single-result unbounded AffineApplyOp.
The operands of AffineApplyOp are not themselves results of AffineApplyOp by
construction.

This represents the simplest possible interface to complement the composition
of (mathematical) AffineMap, for the cases when we are interested in applying
it to Value*.

In this CL the composed AffineMap is not compressed (i.e. there exist operands
that are not part of the result). A followup commit will compress to normal
form.

The single-result unbounded AffineApplyOp abstraction will be used in a
followup CL to support the MaterializeVectors pass.

PiperOrigin-RevId: 227879021
2019-03-29 14:56:37 -07:00
Nicolas Vasilache 0ebc0ba72e [MLIR] More graceful failure in MaterializeVectors
Even though it is unexpected except in pathological cases, a nullptr clone may
be returned. This CL handles the nullptr return gracefully.

PiperOrigin-RevId: 227764615
2019-03-29 14:55:05 -07:00
Nicolas Vasilache 5b87a5ef4b [MLIR] Drop strict super-vector requirement in MaterializeVector
The strict requirement (i.e. at least 2 HW vectors in a super-vector) was a
premature optimization to avoid interfering with other vector code potentially
introduced via other means.

This CL avoids this premature optimization and the spurious errors it causes
when super-vector size == HW vector size (which is a possible corner case).

This may be revisited in the future.

PiperOrigin-RevId: 227763966
2019-03-29 14:54:49 -07:00
Nicolas Vasilache 947e5f4a68 [MLIR] Handle corner case in MaterializeVectors
This corner case was found when stress testing with a functional end-to-end CPU
path. In the case where the hardware vector size is 1x...x1 the `keep` vector
is empty and would result in a crash.

While there is no reason to expect a 1x...x1 HW vector in practice, this case
can just gracefully degrade to scalar, which is what this CL allows.

PiperOrigin-RevId: 227761097
2019-03-29 14:54:22 -07:00
River Riddle 54948a4380 Split the standard types from builtin types and move them into separate source files(StandardTypes.cpp/h). After this cl only FunctionType and IndexType are builtin types, but IndexType will likely become a standard type when the ml/cfgfunc merger is done. Mechanical NFC.
PiperOrigin-RevId: 227750918
2019-03-29 14:54:07 -07:00
Alex Zinenko 0c4ee54198 Merge LowerAffineApplyPass into LowerIfAndForPass, rename to LowerAffinePass
This change is mechanical and merges the LowerAffineApplyPass and
LowerIfAndForPass into a single LowerAffinePass.  It makes a step towards
defining an "affine dialect" that would contain all polyhedral-related
constructs.  The motivation for merging these two passes is based on retiring
MLFunctions and, eventually, transforming If and For statements into regular
operations.  After that happens, LowerAffinePass becomes yet another
legalization.

PiperOrigin-RevId: 227566113
2019-03-29 14:52:52 -07:00
Alex Zinenko fa710c17f4 LowerForAndIf: expand affine_apply's inplace
Existing implementation was created before ML/CFG unification refactoring and
did not concern itself with further lowering to separate concerns.  As a
result, it emitted `affine_apply` instructions to implement `for` loop bounds
and `if` conditions and required a follow-up function pass to lower those
`affine_apply` to arithmetic primitives.  In the unified function world,
LowerForAndIf is mostly a lowering pass with low complexity.  As we move
towards a dialect for affine operations (including `for` and `if`), it makes
sense to lower `for` and `if` conditions directly to arithmetic primitives
instead of relying on `affine_apply`.

Expose `expandAffineExpr` function in LoweringUtils.  Use this function
together with `expandAffineMaps` to emit primitives that implement loop and
branch conditions directly.

Also remove tests that become unnecessary after transforming LowerForAndIf into
a function pass.

PiperOrigin-RevId: 227563608
2019-03-29 14:52:22 -07:00
Alex Zinenko d64db86f20 Refactor LowerAffineApply
In LoweringUtils, extract out `expandAffineMap`.  This function takes an affine
map and a list of values the map should be applied to and emits a sequence of
arithmetic instructions that implement the affine map.  It is independent of
the AffineApplyOp and can be used in places where we need to insert an
evaluation of an affine map without relying on a (temporary) `affine_apply`
instruction.  This prepares for a merge between LowerAffineApply and
LowerForAndIf passes.

Move the `expandAffineApply` function to the LowerAffineApply pass since it is
the only place that must be aware of the `affine_apply` instructions.

PiperOrigin-RevId: 227563439
2019-03-29 14:52:07 -07:00
Chris Lattner bbf362b784 Eliminate extfunc/cfgfunc/mlfunc as a concept, and just use 'func' instead.
The entire compiler now looks at structural properties of the function (e.g.
does it have one block, does it contain an if/for stmt, etc) so the only thing
holding up this difference is round tripping through the parser/printer syntax.
Removing this shrinks the compiler by ~140LOC.

This is step 31/n towards merging instructions and statements.  The last step
is updating the docs, which I will do as a separate patch in order to split it
from this mostly mechanical patch.

PiperOrigin-RevId: 227540453
2019-03-29 14:51:37 -07:00
Nicolas Vasilache 73f5c9c380 [MLIR] Sketch a simple set of EDSCs to declaratively write MLIR
This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs)
in MLIR components:
1. a `Type` system of shell classes that closely matches the MLIR type system. These
types are subdivided into `Bindable` leaf expressions and non-bindable `Expr`
expressions;
2. an `MLIREmitter` class whose purpose is to:
  a. maintain a map of `Bindable` leaf expressions to concrete SSAValue*;
  b. provide helper functionality to specify bindings of `Bindable` classes to
     SSAValue* while verifying conformable types;
  c. traverse the `Expr` and emit the MLIR.

This is used on a concrete example to implement MemRef load/store with clipping in the
LowerVectorTransfer pass. More specifically, the following pseudo-C++ code:
```c++
MLFuncBuilder *b = ...;
Location location = ...;
Bindable zero, one, expr, size;
// EDSL expression
auto access = select(expr < zero, zero, select(expr < size, expr, size - one));
auto ssaValue = MLIREmitter(b)
    .bind(zero, ...)
    .bind(one, ...)
    .bind(expr, ...)
    .bind(size, ...)
    .emit(location, access);
```
is used to emit all the MLIR for a clipped MemRef access.

This simple EDSL can easily be extended to more powerful patterns and should
serve as the counterpart to pattern matchers (and could potentially be unified
once we get enough experience).

In the future, most of this code should be TableGen'd but for now it has
concrete valuable uses: make MLIR programmable in a declarative fashion.

This CL also adds Stmt, proper supporting free functions and rewrites
VectorTransferLowering fully using EDSCs.

The code for creating the EDSCs emitting a VectorTransferReadOp as loops
with clipped loads is:

```c++
  Stmt block = Block({
    tmpAlloc = alloc(tmpMemRefType),
    vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
    ForNest(ivs, lbs, ubs, steps, {
      scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs),
      store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs),
    }),
    vectorValue = load(vectorView, zero),
    tmpDealloc = dealloc(tmpAlloc.getLHS())});
  emitter.emitStmt(block);
```

where `accessInfo.clippedScalarAccessExprs` is created with:

```c++
select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one));
```
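
Stripped of the EDSC machinery, the nested `select` clamps an index to the
range `[0, N - 1]`. A minimal scalar sketch of that arithmetic is given below;
`clipIndex` is a hypothetical name used only for illustration and is not part
of this CL:

```c++
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the clipped access expression:
//   select(idx < 0, 0, select(idx < size, idx, size - 1))
// `idx` plays the role of `i + ii` and `size` the role of `N`.
int64_t clipIndex(int64_t idx, int64_t size) {
  assert(size > 0 && "clipping needs a non-empty dimension");
  if (idx < 0)
    return 0;                          // outer select: below the lower edge
  return idx < size ? idx : size - 1;  // inner select: in range or clamp high
}
```

For example, with `size == 5` the indices -3, 2 and 9 map to 0, 2 and 4.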

The generated MLIR resembles:

```mlir
    %1 = dim %0, 0 : memref<?x?x?x?xf32>
    %2 = dim %0, 1 : memref<?x?x?x?xf32>
    %3 = dim %0, 2 : memref<?x?x?x?xf32>
    %4 = dim %0, 3 : memref<?x?x?x?xf32>
    %5 = alloc() : memref<5x4x3xf32>
    %6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
    for %i4 = 0 to 3 {
      for %i5 = 0 to 4 {
        for %i6 = 0 to 5 {
          %7 = affine_apply #map0(%i0, %i4)
          %8 = cmpi "slt", %7, %c0 : index
          %9 = affine_apply #map0(%i0, %i4)
          %10 = cmpi "slt", %9, %1 : index
          %11 = affine_apply #map0(%i0, %i4)
          %12 = affine_apply #map1(%1, %c1)
          %13 = select %10, %11, %12 : index
          %14 = select %8, %c0, %13 : index
          %15 = affine_apply #map0(%i3, %i6)
          %16 = cmpi "slt", %15, %c0 : index
          %17 = affine_apply #map0(%i3, %i6)
          %18 = cmpi "slt", %17, %4 : index
          %19 = affine_apply #map0(%i3, %i6)
          %20 = affine_apply #map1(%4, %c1)
          %21 = select %18, %19, %20 : index
          %22 = select %16, %c0, %21 : index
          %23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32>
          store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32>
        }
      }
    }
    %24 = load %6[%c0] : memref<1xvector<5x4x3xf32>>
    dealloc %5 : memref<5x4x3xf32>
```

In particular notice that only 3 out of the 4-d accesses are clipped: this
corresponds indeed to the number of dimensions in the super-vector.

This CL also addresses the cleanups resulting from the review of the previous
CL and performs some refactoring to simplify the abstraction.

PiperOrigin-RevId: 227367414
2019-03-29 14:50:23 -07:00
Chris Lattner a250643ec8 Merge together the CFG/ML function paths in the CSE pass. I did a first pass
on this to merge together the classes, but there may be other simplification
possible.  I'll leave that to riverriddle@ as future work.

This is step 29/n towards merging instructions and statements.

PiperOrigin-RevId: 227328680
2019-03-29 14:50:08 -07:00
Chris Lattner 7974889f54 Update and generalize various passes to work on both CFG and ML functions,
simplifying them in minor ways.  The only significant cleanup here
is the constant folding pass.  All the other changes are simple and easy,
but this is still enough to shrink the compiler by 45LOC.

The one pass left to merge is the CSE pass, which will be more involved, so I'm
splitting it out to its own patch (which I'll tackle right after this).

This is step 28/n towards merging instructions and statements.

PiperOrigin-RevId: 227328115
2019-03-29 14:49:52 -07:00
Chris Lattner 3c8fc797de Simplify the remapFunctionAttrs logic, merging CFG/ML function handling.
Remove an unnecessary restriction in forward substitution.  Slightly
simplify LLVM IR lowering, which previously would crash if given an ML
function, it should now produce a clean error if given a function with an
if/for instruction in it, just like it does any other unsupported op.

This is step 27/n towards merging instructions and statements.

PiperOrigin-RevId: 227324542
2019-03-29 14:49:35 -07:00
Chris Lattner 4bd9f93606 Simplify GreedyPatternRewriteDriver now that functions are merged into one
representation, shrinking by 70LOC.  The PatternRewriter class can probably
also be simplified as well, but one step at a time.

This is step 26/n towards merging instructions and statements.  NFC.

PiperOrigin-RevId: 227324218
2019-03-29 14:49:20 -07:00
Uday Bondhugula f12182157e Introduce PostDominanceInfo, fix properlyDominates() for Instructions
- introduce PostDominanceInfo in the right/complete way and use that for post
  dominance check in store-load forwarding
- replace all uses of Analysis/Utils::dominates/properlyDominates with
  DominanceInfo::dominates/properlyDominates
- drop all redundant copies of dominance methods in Analysis/Utils/
- in pipeline-data-transfer, replace dominates call with a much less expensive
  check; similarly, substitute dominates() in checkMemRefAccessDependence with
  a simpler check suitable for that context
- fix a bug in properlyDominates
- improve doc for 'for' instruction 'body'

PiperOrigin-RevId: 227320507
2019-03-29 14:48:44 -07:00
Chris Lattner ae618428f6 Greatly simplify the ConvertToCFG pass, converting it from a module pass to a
function pass, and eliminating the need to copy over code and do
interprocedural updates.  While here, also improve it to make fewer empty
blocks, and rename it to "LowerIfAndFor" since that is what it does.  This is
a net reduction of ~170 lines of code.

As drive-bys, change the splitBlock method to *not* insert an unconditional
branch, since that behavior is annoying for all clients.  Also improve the
AsmPrinter to not crash when a block is referenced that isn't linked into a
function.

PiperOrigin-RevId: 227308856
2019-03-29 14:48:13 -07:00
Uday Bondhugula 545f3ce430 Fix ASAN failure in memref-dataflow-opt
- memrefsToErase had duplicates inserted into it; switch to SmallPtrSet.
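
A minimal sketch of why a set avoids the double erase, assuming the erase loop
simply walks the container. `std::unordered_set` stands in for
`llvm::SmallPtrSet` to keep the example self-contained, and `MemRefAlloc` and
`eraseAll` are hypothetical names:

```c++
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the instruction defining a memref.
struct MemRefAlloc {
  bool erased = false;
};

// Collect the candidates in a set so that a memref reached through several
// paths is erased only once. Returns the number of erased memrefs.
int eraseAll(const std::vector<MemRefAlloc *> &candidates) {
  std::unordered_set<MemRefAlloc *> memrefsToErase(candidates.begin(),
                                                   candidates.end());
  int numErased = 0;
  for (MemRefAlloc *memref : memrefsToErase) {
    assert(!memref->erased && "memref erased twice");
    memref->erased = true;  // models erasing the defining instruction
    ++numErased;
  }
  return numErased;
}
```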

PiperOrigin-RevId: 227299306
2019-03-29 14:47:58 -07:00
Uday Bondhugula b9fe6be6d4 Introduce memref store to load forwarding - a simple memref dataflow analysis
- the load/store forwarding relies on memref dependence routines as well as
  SSA/dominance to identify the memref store instance uniquely supplying a value
  to a memref load, and replaces the result of that load with the value being
  stored. The memref is also deleted when possible if only stores remain.

- add methods for post dominance for MLFunction blocks.

- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
  getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
  earlier static)

- add a helper method in FlatAffineConstraints - isRangeOneToOne.

PiperOrigin-RevId: 227252907
2019-03-29 14:47:28 -07:00
Chris Lattner dffc589ad2 Extend InstVisitor and Walker to handle arbitrary CFG functions, expand the
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops.  Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.

This is step 25/n towards merging instructions and statements.

PiperOrigin-RevId: 227243966
2019-03-29 14:46:58 -07:00
Chris Lattner 5b9c3f7cdb Tidy up references to "basic blocks" that should refer to blocks now. NFC.
PiperOrigin-RevId: 227196077
2019-03-29 14:44:59 -07:00
Chris Lattner 456ad6a8e0 Standardize naming of statements -> instructions, revisiting the code base to be
consistent and moving the using declarations over.  Hopefully this is the last
truly massive patch in this refactoring.

This is step 21/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227178245
2019-03-29 14:44:30 -07:00
Chris Lattner 315a466aed Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed ones can be fixed up on demand.
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.

This is step 19/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227163082
2019-03-29 14:43:58 -07:00
Chris Lattner 69d9e990fa Eliminate the using decls for MLFunction and CFGFunction standardizing on
Function.

This is step 18/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227139399
2019-03-29 14:43:13 -07:00
Chris Lattner d798f9bad5 Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(),
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.

This is step 17/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227121537
2019-03-29 14:42:40 -07:00
Chris Lattner 5187cfcf03 Merge Operation into OperationInst and standardize nomenclature around
OperationInst.  This is a big mechanical patch.

This is step 16/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227093712
2019-03-29 14:42:23 -07:00
Chris Lattner 471c976413 Rework inheritance hierarchy: Operation now derives from Statement, and
OperationInst derives from it.  This allows eliminating some forwarding
functions, other complex code handling multiple paths, and the 'isStatement'
bit tracked by Operation.

This is the last patch I think I can make before the big mechanical change
merging Operation into OperationInst, coming next.

This is step 15/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227077411
2019-03-29 14:41:49 -07:00
Chris Lattner 4fbcd1ac52 Minor renamings: Trim the "Stmt" prefix off
StmtSuccessorIterator/StmtSuccessorIterator, and rename and move the
CFGFunctionViewGraph pass to ViewFunctionGraph.

This is step 13/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227069438
2019-03-29 14:40:51 -07:00
Chris Lattner 4c05f8cac6 Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new
FuncBuilder class.  Also rename SSAValue.cpp to Value.cpp

This is step 12/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227067644
2019-03-29 14:40:22 -07:00
Chris Lattner 3f190312f8 Merge SSAValue, CFGValue, and MLValue together into a single Value class, which
is the new base of the SSA value hierarchy.  This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate.  This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.

This is step 11/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227064624
2019-03-29 14:40:06 -07:00
Chris Lattner 776b035646 Eliminate the Instruction, BasicBlock, CFGFunction, MLFunction, and ExtFunction classes, using the Statement/StmtBlock hierarchy and Function instead.
This *only* changes the internal data structures, it does not affect the user visible syntax or structure of MLIR code.  Function gets new "isCFG()" sorts of predicates as a transitional measure.

This patch is gross in a number of ways, largely in an effort to reduce the amount of mechanical churn in one go.  It introduces a bunch of using decls to keep the old names alive for now, and a bunch of stuff needs to be renamed.

This is step 10/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227044402
2019-03-29 14:39:49 -07:00
Chris Lattner abf72a8bb1 Rename findFunction from the ML side of the house to be named getFunction(),
making it more similar to the CFG side of things.  It is true that in a deeply
nested case that this is not a guaranteed O(1) time operation, and that 'get'
could lead compiler hackers to think this is cheap, but we need to merge these
and we can look into solutions for this in the future if it becomes a problem
in practice.

This is step 9/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226983931
2019-03-29 14:38:49 -07:00
Chris Lattner 036f87b15f Rename CFGFunctionGraphTraits.h -> FunctionGraphTraits.h and add
graph specializations for doing CFG traversals of ML Functions, making the two
sorts of functions have the same capabilities.

This is step 8/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226968502
2019-03-29 14:38:19 -07:00
Alex Zinenko eb0f9f37af SuperVectorization: fix 'isa' assertion
Supervectorization uses null pointers to SSA values as a means of communicating
the failure to vectorize.  In operation vectorization, all operations producing
the values of operation arguments must be vectorized for the given operation to
be vectorized.  The existing check verified if any of the value "def"
statements was vectorized instead, sometimes leading to assertions inside `isa`
called on a null pointer.  Fix this to check that all "def" statements were
vectorized.
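
The difference between the old and the new check can be sketched as follows.
This is an illustration only: `Value`, `inspect` and the two predicates are
hypothetical names, and a null pointer models a "def" statement that failed to
vectorize:

```c++
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a vectorized SSA value.
struct Value {};

// Models `isa`-style code that asserts when handed a null pointer.
void inspect(const Value *value) {
  assert(value && "isa<> called on a null pointer");
}

// Old check: succeeds as soon as one operand definition was vectorized.
bool anyDefVectorized(const std::vector<const Value *> &defs) {
  return std::any_of(defs.begin(), defs.end(),
                     [](const Value *v) { return v != nullptr; });
}

// Fixed check: every operand definition must have been vectorized.
bool allDefsVectorized(const std::vector<const Value *> &defs) {
  return std::all_of(defs.begin(), defs.end(),
                     [](const Value *v) { return v != nullptr; });
}
```

With operands `{&a, nullptr}` the old predicate passes, so later code may call
`inspect` on the null entry; the fixed predicate rejects the operation first.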

PiperOrigin-RevId: 226941552
2019-03-29 14:37:20 -07:00
Jacques Pienaar 58d50a6325 Rename convenience methods to make type explicit.
PiperOrigin-RevId: 226939383
2019-03-29 14:36:50 -07:00
Chris Lattner d613f5ab65 Refactor MLFunction to contain a StmtBlock for its body instead of inheriting
from it.  This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future.  This also removes IfClause and ForStmtBody.

This is step 5/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226936541
2019-03-29 14:36:35 -07:00
Chris Lattner 9a4060d3f5 Eliminate the ability to add operands to an instruction, used in a narrow case
for SSA values in terminators, but easily worked around.  At the same time,
move the StmtOperand list in a OperationStmt to the end of its trailing
objects list so we can *reduce* the number of operands, without affecting
offsets to the other stuff in the allocation.

This is important because we want OperationStmts to be consecutive, including
their operands - we don't want to use an std::vector of operands like
Instructions have.

This is patch 4/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226865727
2019-03-29 14:36:20 -07:00
Chris Lattner 87ce4cc501 Per review on the previous CL, drop MLFuncBuilder::createOperation, changing
clients to use OperationState instead.  This makes MLFuncBuilder more similar
to CFGFuncBuilder.  This whole area will get tidied up more when cfg and ml
worlds get unified.  This patch is just gardening, NFC.

PiperOrigin-RevId: 226701959
2019-03-29 14:35:49 -07:00
Chris Lattner 1301f907a1 Refactor ForStmt: having it contain a StmtBlock instead of subclassing
StmtBlock.  This is more consistent with IfStmt and also conceptually makes
more sense - a forstmt "isn't" its body, it contains its body.

This is step 1/N towards merging BasicBlock and StmtBlock.  This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.

This is a mechanical patch, NFC.

PiperOrigin-RevId: 226684158
2019-03-29 14:35:19 -07:00
MLIR Team 4eef795a1d Computation slice update: adds parameters to insertBackwardComputationSlice which specify the source loop nest depth at which to perform iteration space slicing, and the destination loop nest depth at which to insert the computation slice.
Updates LoopFusion pass to take these parameters as command line flags for experimentation.

PiperOrigin-RevId: 226514297
2019-03-29 14:35:03 -07:00
MLIR Team 6892ffb896 Improve loop fusion algorithm by using a memref dependence graph.
Fixed TODO for reduction fusion unit test.

PiperOrigin-RevId: 226277226
2019-03-29 14:33:02 -07:00
Uday Bondhugula 14d2618f63 Simplify memref-dependence-check's meta data structures / drop duplication and
reuse existing ones.

- drop IterationDomainContext, redundant since FlatAffineConstraints has
  MLValue information associated with its dimensions.
- refactor to use existing support
- leads to a reduction in LOC
- as a result of these changes, non-constant loop bounds get naturally
  supported for dep analysis.
- update test cases to include a couple with non-constant loop bounds
- rename addBoundsFromForStmt -> addForStmtDomain
- complete TODO for getLoopIVs (handle 'if' statements)

PiperOrigin-RevId: 226082008
2019-03-29 14:32:46 -07:00
Alex Zinenko 4dbd94b543 Refactor LowerVectorTransfersPass using pattern rewriters
This introduces a generic lowering pass for ML functions.  The pass is
parameterized by template arguments defining individual pattern rewriters.
Concrete lowering passes define individual pattern rewriters and inherit from
the generic class that takes care of allocating rewriters, traversing ML
functions and performing the actual rewrite.

While this is similar to the greedy pattern rewriter available in
Transform/Utils, it requires adjustments due to the ML/CFG duality.  In
particular, ML function rewriters must be able to create statements, not only
operations, and need access to an MLFuncBuilder.  When we move to using the
unified function type, the ML-specific rewriting will become unnecessary.

Use LowerVectorTransfers as a testbed for the generic pass.

PiperOrigin-RevId: 225887424
2019-03-29 14:31:43 -07:00
Alex Zinenko 51c8a095a3 Materialize vector_type_cast operation in the SuperVector dialect
This operation is produced and used by the super-vectorization passes and has
been emitted as an abstract unregistered operation until now.  For end-to-end
testing purposes, it has to be eventually lowered to LLVM IR.  Matching
abstract operation by name goes into the opposite direction of the generic
lowering approach that is expected to be used for LLVM IR lowering in the
future.  Register vector_type_cast operation as a part of the SuperVector
dialect.

Arguably, this operation is a special case of the `view` operation from the
Standard dialect.  The semantics of `view` is not fully specified at this point
so it is safer to rely on a custom operation.  Additionally, using a custom
operation may help to achieve clear dialect separation.

PiperOrigin-RevId: 225887305
2019-03-29 14:31:13 -07:00
Uday Bondhugula 4a3e4e8ea7 loop-unroll - add function callback argument for outside targets to
provide unroll factors, and a cmd line argument to specify number of
innermost loop unroll repetitions.

- add function callback parameter for outside targets to provide unroll factors
- add a cmd line parameter to repeatedly apply innermost loop unroll a certain
  number of times (to avoid using -loop-unroll -loop-unroll ...; instead
  -unroll-num-reps=2).
- implement the callback for a target
- update test cases / usage
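
A rough sketch of the callback idea, under the assumption that the hook maps a
loop to an unroll factor and that the pass falls back to a default when no
hook is given. `Loop`, `UnrollFactorFn`, `chooseUnrollFactor` and the default
of 4 are all hypothetical and chosen only for illustration:

```c++
#include <cassert>
#include <functional>

// Hypothetical stand-in for a loop; only the trip count matters here.
struct Loop {
  unsigned tripCount;
};

// Assumed shape of the target-provided hook.
using UnrollFactorFn = std::function<unsigned(const Loop &)>;

// Ask the target for a factor when a callback is present, otherwise use the
// default supplied by the caller (4 is an arbitrary illustrative value).
unsigned chooseUnrollFactor(const Loop &loop, const UnrollFactorFn &getFactor,
                            unsigned defaultFactor = 4) {
  unsigned factor = getFactor ? getFactor(loop) : defaultFactor;
  assert(factor >= 1 && "an unroll factor of 0 is meaningless");
  return factor;
}
```

A target could then pass a lambda that, for instance, returns 8 for loops with
at least 32 iterations and 2 otherwise.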

PiperOrigin-RevId: 225843191
2019-03-29 14:30:28 -07:00
MLIR Team 3b69230b3a Loop Fusion pass update: introduce utilities to perform generalized loop fusion based on slicing; encompasses standard loop fusion.
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nest's IVs and symbols using the dependence polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destination memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns an AffineMap which represents an equality constraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.

PiperOrigin-RevId: 225842944
2019-03-29 14:30:13 -07:00
Uday Bondhugula dced746bd1 Remove duplicate code / reuse right utilities from memref-dep-check / loop-tile
- use addBoundsForForStmt
- getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the
  returned things aren't owned / part of the stmt on which it's being called.
- other minor API cleanup

PiperOrigin-RevId: 225774301
2019-03-29 14:29:28 -07:00
Alex Zinenko bc52a639f9 Extract vector_transfer_* Ops into a SuperVectorDialect.
From the beginning, vector_transfer_read and vector_transfer_write operations
were intended as a mid-level vectorization abstraction.  In particular, they
are lowered to the StandardOps dialect before further processing.  As such, it
does not make sense to keep them at the same level as StandardOps.  Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.

PiperOrigin-RevId: 225554492
2019-03-29 14:28:58 -07:00
River Riddle 5c4f1fdd42 Check if the operation is already in the worklist before adding it.
PiperOrigin-RevId: 225379496
2019-03-29 14:27:14 -07:00
Alex Zinenko 97d2f3cd3d ConvertToCFG: use affine_apply to implement loop steps
Originally, loop steps were implemented using `addi` and `constant` operations
because `affine_apply` was not handled in the first implementation.  The
support for `affine_apply` has been added, use it to implement the update of
the loop induction variable.  This is more consistent with the lower and upper
bounds of the loop that are also implemented as `affine_apply`, removes the
dependence of the converted function on the StandardOps dialect and makes it
clear from the CFG function that all operations on the loop induction variable
are purely affine.

PiperOrigin-RevId: 225165337
2019-03-29 14:26:22 -07:00
Uday Bondhugula b9f53dc0bd Update/Fix LoopUtils::stmtBodySkew to handle loop step.
- loop step wasn't handled and there wasn't a TODO or an assertion; fix this.
- rename 'delay' to shift for consistency/readability.
- other readability changes.
- remove duplicate attribute print for DmaStartOp; fix misplaced attribute
  print for DmaWaitOp
- add build method for AddFOp (unrelated to this CL, but add it anyway)

PiperOrigin-RevId: 224892958
2019-03-29 14:25:07 -07:00
Uday Bondhugula d59a95a05c Fix missing check for dependent DMAs in pipeline-data-transfer
- adding a conservative check for now (TODO: use the dependence analysis pass
  once the latter is extended to deal with DMA ops). resolve an existing bug on
  a test case.

- update test cases

PiperOrigin-RevId: 224869526
2019-03-29 14:24:53 -07:00
Uday Bondhugula 6757fb151d FlatAffineConstraints API cleanup; add normalizeConstraintsByGCD().
- add method normalizeConstraintsByGCD
- call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of
  projectOut.
- remove call to GCDTightenInequalities() from getMemRefRegion
- change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each
  time an identifier is eliminated (to detect emptiness early).
- make FourierMotzkinEliminate, gaussianEliminateId(s),
  GCDTightenInequalities() private
- improve / update stale comments
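
As an illustration of GCD normalization, the sketch below assumes that each
constraint row (coefficients followed by the constant term) is divided by the
GCD of all its entries. The actual method may differ in detail, and `gcdOf`
and `normalizeRowByGCD` are hypothetical names:

```c++
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Euclid's algorithm on absolute values; gcdOf(0, x) == |x|.
int64_t gcdOf(int64_t a, int64_t b) {
  a = std::llabs(a);
  b = std::llabs(b);
  while (b != 0) {
    int64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Divide one constraint row by the GCD of all of its entries. Rows that are
// all zero or already normalized are left untouched.
void normalizeRowByGCD(std::vector<int64_t> &row) {
  assert(!row.empty() && "a row holds at least the constant term");
  int64_t g = 0;
  for (int64_t c : row)
    g = gcdOf(g, c);
  if (g <= 1)
    return;
  for (int64_t &c : row)
    c /= g;
}
```

For example, the row `4 -6 8` becomes `2 -3 4`.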

PiperOrigin-RevId: 224866741
2019-03-29 14:24:37 -07:00
Uday Bondhugula 2ef57806ba Update/fix -pipeline-data-transfer; fix b/120770946
- fix replaceAllMemRefUsesWith call to replace only inside loop body.
- handle the case where DMA buffers are dynamic; extend doubleBuffer() method
  to handle dynamically shaped DMA buffers (pass the right operands to AllocOp)
- place alloc's for DMA buffers at the depth at which pipelining is being done
  (instead of at top-level)
- add more test cases

PiperOrigin-RevId: 224852231
2019-03-29 14:24:22 -07:00
Alex Zinenko 073c3ad997 Properly namespace createLowerAffineApply
This was missing from the original commit.  The implementation of
createLowerAffineApply was defined in the default namespace but declared in the
`mlir` namespace, which could lead to linking errors when it was used.  Put the
definition in `mlir` namespace.

PiperOrigin-RevId: 224830894
2019-03-29 14:24:04 -07:00
Nicolas Vasilache c28aeef901 [MLIR] Drop bug-prone global map indexed by MLFunction*
PiperOrigin-RevId: 224610805
2019-03-29 14:23:49 -07:00
Uday Bondhugula 2d6478fa92 Extend loop tiling utility to handle non-constant loop bounds and bounds that
are a max/min of several expressions.

- Extend loop tiling to handle non-constant loop bounds and bounds that
  are a max/min of several expressions, i.e., bounds using multi-result affine
  maps

- also fix b/120630124 as a result (the IR was in an invalid state when tiled
  loop generation failed; SSA uses were created that weren't plugged into the IR).

PiperOrigin-RevId: 224604460
2019-03-29 14:23:34 -07:00
Uday Bondhugula dfc752e42b Generate strided DMAs from -dma-generate
- generate DMAs correctly now using strided DMAs where needed
- add support for multi-level/nested strides; op still supports one level of
  stride for now.

Other things
- add test case for  symbolic lower/upper bound; cases where the DMA buffer
  size can't be bounded by a known constant
- add test case for dynamic shapes where the DMA buffers are however bounded by
  constants
- refactor some of the '-dma-generate' code

PiperOrigin-RevId: 224584529
2019-03-29 14:23:19 -07:00
Nicolas Vasilache d9b6420fc9 [MLIR] Add LowerVectorTransfersPass
This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp
to a simple loop nest via local buffer allocations.

This is an MLIR->MLIR lowering based on builders.

A few TODOs are left to address in particular:
1. invert the permutation map so the accesses to the remote memref are coalesced;
2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory);
3. support broadcast / avoid copies when permutation_map is not of full column rank
4. add a proper "element_cast" op

One notable limitation is this does not plan on supporting boundary conditions.
It should be significantly easier to use pre-baked MLIR functions to handle such paddings.
This is left for future consideration.
Therefore the current CL only works properly for full-tile cases atm.

This CL also adds 2 simple tests:

```mlir
  for %i0 = 0 to %M step 3 {
    for %i1 = 0 to %N step 4 {
      for %i2 = 0 to %O {
        for %i3 = 0 to %P step 5 {
          vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index
```

lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 step 4 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5)
            for %i6 = 0 to 3 {
              %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32>
              store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32>
       dealloc %1 : memref<5x4x3xf32>
```

and
```mlir
  for %i0 = 0 to %M step 3 {
    for %i1 = 0 to %N {
      for %i2 = 0 to %O {
        for %i3 = 0 to %P step 5 {
          %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32>

```

lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            for %i6 = 0 to 3 {
              %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32>
              store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32>
        %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>>
        dealloc %1 : memref<5x4x3xf32>
```

PiperOrigin-RevId: 224552717
2019-03-29 14:23:05 -07:00
Nicolas Vasilache 879be718a0 [MLIR] Fix the name of the MaterializeVectorPass
PiperOrigin-RevId: 224536381
2019-03-29 14:22:49 -07:00
Smit Hinsu adca59e4f7 Return bool from all emitError methods similar to Operation::emitOpError
This simplifies call-sites returning true after emitting an error. After the
conversion, dropped braces around single statement blocks as that seems more
common.
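
The call-site simplification looks roughly like the sketch below. `Diagnostics`
and `verifyStep` are hypothetical stand-ins, and `true` means failure, as in
the convention described above:

```c++
#include <cassert>
#include <string>
#include <vector>

// Hypothetical diagnostics sink that records messages.
struct Diagnostics {
  std::vector<std::string> errors;

  // Returns true so that callers can write `return emitError(...)`.
  bool emitError(const std::string &message) {
    assert(!message.empty() && "diagnostics need a message");
    errors.push_back(message);
    return true;
  }
};

// Before: if (step <= 0) { diag.emitError("..."); return true; }
// After:  a single statement, so the braces can go.
bool verifyStep(int step, Diagnostics &diag) {
  if (step <= 0)
    return diag.emitError("loop step must be positive");
  return false;
}
```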

Also, switched to emitError method instead of emitting Error kind using the
emitDiagnostic method.

TESTED with existing unit tests

PiperOrigin-RevId: 224527868
2019-03-29 14:22:06 -07:00
Nicolas Vasilache 13bc77045e [MLIR] Drop assert for NYI in Vectorize.cpp
This CL adds proper error emission, removes NYI assertions and documents
assumptions that are required in the relevant functions.

PiperOrigin-RevId: 224377207
2019-03-29 14:21:37 -07:00
Nicolas Vasilache 5b610630b2 [MLIR] Error handling in MaterializeVectors
This removes assertions as a means to capture NYI behavior and propagates
errors up.

PiperOrigin-RevId: 224376935
2019-03-29 14:20:37 -07:00
Nicolas Vasilache 4adc169bd0 [MLIR] Add AffineMap composition and use it in Materialization
This CL adds the following free functions:
```
/// Returns the AffineExpr e o m.
AffineExpr compose(AffineExpr e, AffineMap m);
/// Returns the AffineMap f o g.
AffineMap compose(AffineMap f, AffineMap g);
```

This addresses the issue that AffineMap composition is only available at a
distance via AffineValueMap and is thus unusable on Attributes.
This CL thus implements AffineMap composition in a more modular and composable
way.
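
To make "f o g" concrete, here is a small sketch restricted to purely linear
maps, where composition reduces to a matrix product. `LinearMap` is a
hypothetical model and not the AffineMap class; symbols, constant terms,
floordiv and mod are deliberately left out:

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a linear map with `numDims` inputs. Row r holds the
// coefficients of result r, one coefficient per input dimension.
struct LinearMap {
  unsigned numDims;
  std::vector<std::vector<int64_t>> rows;
};

// Returns f o g, i.e. the map x -> f(g(x)). The number of inputs of f must
// equal the number of results of g.
LinearMap compose(const LinearMap &f, const LinearMap &g) {
  assert(f.numDims == g.rows.size() && "f's inputs must match g's results");
  LinearMap result{g.numDims, {}};
  for (const std::vector<int64_t> &fRow : f.rows) {
    std::vector<int64_t> row(g.numDims, 0);
    for (unsigned k = 0; k < f.numDims; ++k)
      for (unsigned d = 0; d < g.numDims; ++d)
        row[d] += fRow[k] * g.rows[k][d];
    result.rows.push_back(row);
  }
  return result;
}
```

With f = `(d0, d1, d2) -> (d2, d1)` and g = `(d0, d1) -> (d0, d1, d0 + d1)`,
the composition is `(d0, d1) -> (d0 + d1, d1)`.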

This CL does not claim that it can be a good replacement for the
implementation in AffineValueMap, in particular it does not support bounded
maps atm.

Standalone tests are added that replicate some of the logic of the AffineMap
composition pass.

Lastly, affine map composition is used properly inside MaterializeVectors and
a standalone test is added that requires permutation_map composition with a
projection map.

PiperOrigin-RevId: 224376870
2019-03-29 14:20:22 -07:00
Nicolas Vasilache df0a25efee [MLIR] Add support for permutation_map
This CL hooks up and uses permutation_map in vector_transfer ops.
In particular, when going into the nuts and bolts of the implementation, it
became clear that cases arose that required supporting broadcast semantics.
Broadcast semantics are thus added to the general permutation_map.
The verify methods and tests are updated accordingly.

Examples of interest include:

Example 1:
The following MLIR snippet:
```mlir
   for %i3 = 0 to %M {
     for %i4 = 0 to %N {
       for %i5 = 0 to %P {
         %a5 = load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
   }}}
```
may vectorize with {permutation_map: (d0, d1, d2) -> (d2, d1)} into:
```mlir
   for %i3 = 0 to %0 step 32 {
     for %i4 = 0 to %1 {
       for %i5 = 0 to %2 step 256 {
         %4 = vector_transfer_read %arg0, %i4, %i5, %i3
              {permutation_map: (d0, d1, d2) -> (d2, d1)} :
              (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
   }}}
```
Meaning that vector_transfer_read will be responsible for reading the 2-D slice:
`%arg0[%i4, %i5:%i5+256, %i3:%i3+32]` into vector<32x256xf32>. This will
require a transposition when vector_transfer_read is further lowered.

Example 2:
The following MLIR snippet:
```mlir
   %cst0 = constant 0 : index
   for %i0 = 0 to %M {
     %a0 = load %A[%cst0, %cst0] : memref<?x?xf32>
   }
```
may vectorize with {permutation_map: (d0) -> (0)} into:
```mlir
   for %i0 = 0 to %0 step 128 {
     %3 = vector_transfer_read %arg0, %c0_0, %c0_0
          {permutation_map: (d0, d1) -> (0)} :
          (memref<?x?xf32>, index, index) -> vector<128xf32>
   }
```
Meaning that vector_transfer_read will be responsible for reading the 0-D slice
`%arg0[%c0, %c0]` into vector<128xf32>. This will require a 1-D vector
broadcast when vector_transfer_read is further lowered.

Additionally, some minor cleanups and refactorings are performed.

One notable thing missing here is the composition with a projection map during
materialization. This is because I could not find an AffineMap composition
that operates on AffineMap directly: everything related to composition seems
to require going through SSAValue and only operates on AffineMap at a distance
via AffineValueMap. I have raised this concern a bunch of times already, the
followup CL will actually do something about it.

In the meantime, the projection is hacked at a minimum to pass verification
and materialization tests are temporarily incorrect.

PiperOrigin-RevId: 224376828
2019-03-29 14:20:07 -07:00
Alex Zinenko 7c89a225cf ConvertToCFG: support min/max in loop bounds.
The recently introduced `select` operation enables ConvertToCFG to support
min(max) in loop bounds.  Individual min(max) is implemented as
`cmpi "lt"`(`cmpi "gt"`) followed by a `select` between the compared values.
Multiple results of an `affine_apply` operation extracted from the loop bounds
are reduced using min(max) in a sequential manner.  While this may decrease the
potential for instruction-level parallelism, it is easier to recognize for the
following passes, in particular for the vectorizer.
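
The sequential reduction can be pictured with a small Python sketch. It is
only a model of the compare-then-select scheme described above; `select` and
`reduce_bound` are hypothetical names, and plain integers stand in for SSA
values.
```python
def select(cond, a, b):
    """Model of the `select` operation: pick `a` if `cond` holds, else `b`."""
    return a if cond else b


def reduce_bound(values, predicate):
    """Fold `values` left to right; "lt" yields a min, "gt" yields a max."""
    acc = values[0]
    for value in values[1:]:
        cond = acc < value if predicate == "lt" else acc > value
        acc = select(cond, acc, value)
    return acc


# A lower bound is the max of its candidates, an upper bound the min.
lower = reduce_bound([3, 7, 5], "gt")
upper = reduce_bound([10, 8, 12], "lt")
```
Each step depends on the previous accumulator, which is the loss of
instruction-level parallelism mentioned above.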

PiperOrigin-RevId: 224376233
2019-03-29 14:19:52 -07:00
Alex Zinenko 513d6d896c OpPointer: replace conversion operator to Operation* with one to OpType*.
The implementation of OpPointer<OpType> provides an implicit conversion to
Operation *, but not to the underlying OpType *.  This has led to
awkward-looking code when an OpPointer needs to be passed to a function
accepting an OpType *.  For example,

    if (auto someOp = genericOp.dyn_cast<OpType>())
      someFunction(&*someOp);

where "&*" makes it harder to read.  Arguably, one does not want to spell out
OpPointer<OpType> in the line with dyn_cast.  More generally, OpPointer is now
being used as an owning pointer to OpType rather than to operation.

Replace the implicit conversion to Operation* with the conversion to OpType*
taking into account const-ness of the type.  An Operation* can be obtained from
an OpType with a simple call.  Since an instance of OpPointer owns the OpType
value, the pointer to it is never null.  However, the OpType value may not be
associated with any Operation*.  In this case, return nullptr when conversion
is attempted to maintain consistency with the existing null checks.

PiperOrigin-RevId: 224368103
2019-03-29 14:19:37 -07:00
Uday Bondhugula 73fc0223e4 Fix cases where unsigned / signed arithmetic was being mixed (following up on
cl/224246657); eliminate repeated evaluation of exprs in loop upper bounds.

- while on this, sweep through and fix potential repeated evaluation of
  expressions in loop upper bounds

PiperOrigin-RevId: 224268918
2019-03-29 14:19:22 -07:00
Uday Bondhugula a92130880e Complete multiple unhandled cases for DmaGeneration / getMemRefRegion;
update/improve/clean up API.

- update FlatAffineConstraints::getConstBoundDifference; return constant
  differences between symbolic affine expressions, look at equalities as well.
- fix buffer size computation when generating DMAs symbolic in outer loops,
  correctly handle symbols at various places (affine access maps, loop bounds,
  loop IVs outer to the depth at which DMA generation is being done)
- bug fixes / complete some TODOs for getMemRefRegion
- refactor common code b/w memref dependence check and getMemRefRegion
- FlatAffineConstraints API update; added methods employ trivial checks /
  detection - sufficient to handle hyper-rectangular cases in a precise way
  while being fast / low complexity. Hyper-rectangular cases fall out as
  trivial cases for these methods while other cases still do not cause failure
  (either return conservative or return failure that is handled by the caller).

PiperOrigin-RevId: 224229879
2019-03-29 14:18:22 -07:00
Alex Zinenko 7868abd9d8 ConvertToCFG: convert "if" statements.
The condition of the "if" statement is an integer set, defined as a conjunction
of affine constraints.  An affine constraint consists of an affine expression
and a flag indicating whether the expression is strictly equal to zero or is
also allowed to be greater than zero.  Affine maps, accepted by `affine_apply`
are also formed from affine expressions.  Leverage this fact to implement the
checking of "if" conditions.  Each affine expression from the integer set is
converted into an affine map.  This map is applied to the arguments of the "if"
statement.  The result of the application is compared with zero given the
equality flag to obtain the final boolean value.  The conjunction of conditions
is tested sequentially with short-circuit branching to the "else" branch if any
of the conditions evaluates to false.
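
A minimal Python sketch of this evaluation scheme follows. The encoding of a
constraint as `(coefficients, constant, is_equality)` and the function names
are assumptions made for illustration; the pass itself emits `affine_apply`
and comparison operations rather than evaluating anything.
```python
def eval_affine(coeffs, constant, args):
    """Evaluate a linear expression sum(c_i * arg_i) + constant."""
    return sum(c * a for c, a in zip(coeffs, args)) + constant


def if_condition(constraints, args):
    """Conjunction of constraints with short-circuit on the first failure."""
    for coeffs, constant, is_equality in constraints:
        value = eval_affine(coeffs, constant, args)
        holds = (value == 0) if is_equality else (value >= 0)
        if not holds:
            return False  # corresponds to branching to the "else" block
    return True


# Integer set: d0 - 1 >= 0 and d0 - d1 == 0.
constraints = [([1, 0], -1, False), ([1, -1], 0, True)]
taken = if_condition(constraints, [3, 3])
```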

Create an SESE region for the if statement (including its "then" and optional
"else" statement blocks) and append it to the end of the current region.  The
conditional region consists of a sequence of condition-checking blocks that
implement the short-circuit scheme, followed by a "then" SESE region and an
"else" SESE region, and the continuation block that post-dominates all blocks
of the "if" statement.  The flow of blocks that correspond to the "then" and
"else" clauses are constructed recursively, enabling easy nesting of "if"
statements and if-then-else-if chains.

Note that MLIR semantics neither requires nor prohibits short-circuit
evaluation.  Since affine expressions do not have side effects, there is no
observable difference in the program behavior.  We may trade off extra
operations for operation-level parallelism opportunity by first performing all
`affine_apply` and comparison operations independently, and then performing a
tree pattern reduction of the resulting boolean values with the `muli i1`
operations (in absence of the dedicated bit operations).  The pros and cons are
not clear, and since MLIR does not include parallel semantics, we prefer to
minimize the number of sequentially executed operations.

PiperOrigin-RevId: 223970248
2019-03-29 14:16:10 -07:00
Nicolas Vasilache b39d1f0bdb [MLIR] Add VectorTransferOps
This CL implements and uses VectorTransferOps in lieu of the former custom
call op. Tests are updated accordingly.

VectorTransferOps come in 2 flavors: VectorTransferReadOp and
VectorTransferWriteOp.

VectorTransferOps can be thought of as a backend-independent
pseudo op/library call that needs to be legalized to MLIR (whiteboxed) before
it can be lowered to backend-dependent IR.

Note that the current implementation does not yet support a real permutation
map. Proper support will come in a followup CL.

VectorTransferReadOp
====================
VectorTransferReadOp performs a blocking read from a scalar memref
location into a super-vector of the same elemental type. This operation is
called 'read' by opposition to 'load' because the super-vector granularity
is generally not representable with a single hardware register. As a
consequence, memory transfers will generally be required when lowering
VectorTransferReadOp. A VectorTransferReadOp is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile
only code.

A vector transfer read has semantics similar to a vector load, with additional
support for:
  1. an optional value of the elemental type of the MemRef. This value
     supports non-effecting padding and is inserted in places where the
     vector read exceeds the MemRef bounds. If the value is not specified,
     the access is statically guaranteed to be within bounds;
  2. an attribute of type AffineMap to specify a slice of the original
     MemRef access and its transposition into the super-vector shape. The
     permutation_map is an unbounded AffineMap that must represent a
     permutation from the MemRef dim space projected onto the vector dim
     space.

Example:
```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
  ...
  %val = `ssa-value` : f32
  // let %i, %j, %k, %l be ssa-values of type index
  %v0 = vector_transfer_read %src, %i, %j, %k, %l
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          (memref<?x?x?x?xf32>, index, index, index, index) ->
            vector<16x32x64xf32>
  %v1 = vector_transfer_read %src, %i, %j, %k, %l, %val
        {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          (memref<?x?x?x?xf32>, index, index, index, index, f32) ->
            vector<16x32x64xf32>
```

VectorTransferWriteOp
=====================
VectorTransferWriteOp performs a blocking write from a super-vector to
a scalar memref of the same elemental type. This operation is
called 'write' by opposition to 'store' because the super-vector
granularity is generally not representable with a single hardware register. As
a consequence, memory transfers will generally be required when lowering
VectorTransferWriteOp. A VectorTransferWriteOp is thus a mid-level
abstraction that supports super-vectorization with non-effecting padding
for full-tile only code.
A vector transfer write has semantics similar to a vector store, with
additional support for handling out-of-bounds situations.

Example:
```mlir
  %A = alloc(%size1, %size2, %size3, %size4) : memref<?x?x?x?xf32>
  %val = `ssa-value` : vector<16x32x64xf32>
  // let %i, %j, %k, %l be ssa-values of type index
  vector_transfer_write %val, %src, %i, %j, %k, %l
    {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
  (vector<16x32x64xf32>, memref<?x?x?x?xf32>, index, index, index, index)
```
PiperOrigin-RevId: 223873234
2019-03-29 14:15:25 -07:00
Uday Bondhugula 5f76245cfe Minor fix for replaceAllMemRefUsesWith.
The check for whether the memref was used in a non-dereferencing context had to
be done inside, i.e., only for the op stmts that the replacement was specified
to be performed on (by the domStmtFilter arg if provided). As such, it is
completely fine, for example, for a function to return a memref while the replacement
is being performed only on a specific loop's body (as in the case of DMA
generation).

PiperOrigin-RevId: 223827753
2019-03-29 14:14:43 -07:00
River Riddle 7669a259c4 Add a simple common sub expression elimination pass.
The algorithm collects defining operations within a scoped hash table. The scopes within the hash table correspond to nodes within the dominance tree for a function. This CL only adds support for simple operations, i.e. non-side-effecting ones. Side-effecting operations, e.g. load/store/call, will be handled in later patches.
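
The idea can be sketched in Python as follows. This is an illustrative model
under stated assumptions, not the pass itself: operations are
`(result, name, operands)` tuples, the dominance tree is a plain dict, and the
scoped hash table is a stack of dicts; `cse` is a hypothetical name.
```python
def cse(dom_tree, root, ops_in_block):
    """Return a map from redundant results to the value that replaces them."""
    replacements = {}
    scopes = []  # one dict per dominance tree node on the current path

    def lookup(key):
        for scope in reversed(scopes):
            if key in scope:
                return scope[key]
        return None

    def visit(block):
        scopes.append({})
        for result, name, operands in ops_in_block[block]:
            # Use already-replaced operands so chains of duplicates match.
            operands = tuple(replacements.get(o, o) for o in operands)
            key = (name, operands)
            existing = lookup(key)
            if existing is not None:
                replacements[result] = existing
            else:
                scopes[-1][key] = result
        for child in dom_tree.get(block, []):
            visit(child)
        scopes.pop()  # definitions here do not dominate sibling blocks

    visit(root)
    return replacements


dom_tree = {"entry": ["then", "else"]}
ops = {
    "entry": [("%0", "addi", ("%a", "%b"))],
    "then": [("%1", "addi", ("%a", "%b")), ("%2", "muli", ("%1", "%c"))],
    "else": [("%3", "muli", ("%0", "%c"))],
}
replaced = cse(dom_tree, "entry", ops)
```
Here `%1` is replaced by `%0`, but `%3` is kept: the equivalent `%2` lives in
a sibling block whose scope has already been popped.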

PiperOrigin-RevId: 223811328
2019-03-29 14:14:28 -07:00
Uday Bondhugula a619b5c295 Debug output / logging memref sizes in DMA generation + related changes
- Add method to get a memref's size in bytes
- clean up a loop tiling pass helper (NFC)

PiperOrigin-RevId: 223422077
2019-03-29 14:12:56 -07:00
Chris Lattner 3f2530cdf5 Split "rewrite" functionality out of Pattern into a new RewritePattern derived
class.  This change is NFC, but allows for new kinds of patterns, specifically
LegalizationPatterns which will be allowed to change the types of things they
rewrite.

PiperOrigin-RevId: 223243783
2019-03-29 14:12:07 -07:00
Alex Zinenko 68e9721aa8 Rename Deaffinator to LowerAffineApply and patch it.
Several things were suggested in post-submission reviews.  In particular, use
pointers in function interfaces instead of references (still use references
internally).  Clarify the behavior of the pass in presence of MLFunctions.

PiperOrigin-RevId: 222556851
2019-03-29 14:08:59 -07:00
Nicolas Vasilache 63bc6d2f6a [MLIR] Fix opt build
PiperOrigin-RevId: 222491353
2019-03-29 14:08:45 -07:00
Nicolas Vasilache a5782f0d40 [MLIR][MaterializeVectors] Add a MaterializeVector pass via unrolling.
This CL adds an MLIR-MLIR pass which materializes super-vectors to
hardware-dependent sized vectors.

While the physical vector size is target-dependent, the pass is written in
a target-independent way: the target vector size is specified as a parameter
to the pass. This pass is thus a partial lowering that opens the "greybox"
that is the super-vector abstraction.

This first CL adds a first materialization pass that iterates over vector_transfer_write operations and:
1. computes the program slice including the current vector_transfer_write;
2. computes the multi-dimensional ratio of super-vector shape to hardware
vector shape;
3. for each possible multi-dimensional value within the bounds of ratio, a new slice is
instantiated (i.e. cloned and rewritten) so that all operations in this instance operate on
the hardware vector type.

As a simple example, given:
```mlir
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
  %A = alloc (%M, %N) : memref<?x?xf32>
  %B = alloc (%M, %N) : memref<?x?xf32>
  %C = alloc (%M, %N) : memref<?x?xf32>
  for %i0 = 0 to %M {
    for %i1 = 0 to %N {
      %a1 = load %A[%i0, %i1] : memref<?x?xf32>
      %b1 = load %B[%i0, %i1] : memref<?x?xf32>
      %s1 = addf %a1, %b1 : f32
      store %s1, %C[%i0, %i1] : memref<?x?xf32>
    }
  }
  return %C : memref<?x?xf32>
}
```

and the following options:
```
-vectorize -virtual-vector-size 32 --test-fastest-varying=0 -materialize-vectors -vector-size=8
```

materialization emits:
```mlir
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0, d1 + 8)
#map2 = (d0, d1) -> (d0, d1 + 16)
#map3 = (d0, d1) -> (d0, d1 + 24)
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = alloc(%arg0, %arg1) : memref<?x?xf32>
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg1 step 32 {
      %3 = affine_apply #map0(%i0, %i1)
      %4 = "vector_transfer_read"(%0, %3#0, %3#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %5 = affine_apply #map1(%i0, %i1)
      %6 = "vector_transfer_read"(%0, %5#0, %5#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %7 = affine_apply #map2(%i0, %i1)
      %8 = "vector_transfer_read"(%0, %7#0, %7#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %9 = affine_apply #map3(%i0, %i1)
      %10 = "vector_transfer_read"(%0, %9#0, %9#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %11 = affine_apply #map0(%i0, %i1)
      %12 = "vector_transfer_read"(%1, %11#0, %11#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %13 = affine_apply #map1(%i0, %i1)
      %14 = "vector_transfer_read"(%1, %13#0, %13#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %15 = affine_apply #map2(%i0, %i1)
      %16 = "vector_transfer_read"(%1, %15#0, %15#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %17 = affine_apply #map3(%i0, %i1)
      %18 = "vector_transfer_read"(%1, %17#0, %17#1) : (memref<?x?xf32>, index, index) -> vector<8xf32>
      %19 = addf %4, %12 : vector<8xf32>
      %20 = addf %6, %14 : vector<8xf32>
      %21 = addf %8, %16 : vector<8xf32>
      %22 = addf %10, %18 : vector<8xf32>
      %23 = affine_apply #map0(%i0, %i1)
      "vector_transfer_write"(%19, %2, %23#0, %23#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %24 = affine_apply #map1(%i0, %i1)
      "vector_transfer_write"(%20, %2, %24#0, %24#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %25 = affine_apply #map2(%i0, %i1)
      "vector_transfer_write"(%21, %2, %25#0, %25#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
      %26 = affine_apply #map3(%i0, %i1)
      "vector_transfer_write"(%22, %2, %26#0, %26#1) : (vector<8xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  return %2 : memref<?x?xf32>
}
```
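
The offsets 0, 8, 16 and 24 in #map0 to #map3 follow from the ratio of the
super-vector shape to the hardware vector shape. The Python sketch below
enumerates them; `materialization_offsets` is a hypothetical helper, and it
assumes both shapes have the same rank and divide evenly.
```python
from itertools import product


def materialization_offsets(super_shape, hw_shape):
    """Element offsets of every hardware-vector instance in a super-vector."""
    assert len(super_shape) == len(hw_shape)
    assert all(s % h == 0 for s, h in zip(super_shape, hw_shape))
    # Multi-dimensional ratio of super-vector shape to hardware vector shape.
    ratio = [s // h for s, h in zip(super_shape, hw_shape)]
    # One instance per multi-dimensional point within the bounds of ratio.
    return [
        tuple(i * h for i, h in zip(point, hw_shape))
        for point in product(*(range(r) for r in ratio))
    ]


offsets_1d = materialization_offsets([32], [8])
offsets_2d = materialization_offsets([4, 16], [2, 8])
```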

PiperOrigin-RevId: 222455351
2019-03-29 14:08:31 -07:00
Nicolas Vasilache 258dae5d73 [MLIR][Slicing] Apply cleanups
This CL applies a few last cleanups from a previous CL that have been
missed during the previous submit.

PiperOrigin-RevId: 222454774
2019-03-29 14:08:17 -07:00
Nicolas Vasilache 5c16564bca [MLIR][Slicing] Add utils for computing slices.
This CL adds tooling for computing slices as an independent CL.
The first consumer of this analysis will be super-vector materialization in a
followup CL.

In particular, this adds:
1. a getForwardStaticSlice function with documentation, example and a
standalone unit test;
2. a getBackwardStaticSlice function with documentation, example and a
standalone unit test;
3. a getStaticSlice function with documentation, example and a standalone unit
test;
4. a topologicalSort function that is exercised through the getStaticSlice
unit test.

The getXXXStaticSlice functions take an additional root (resp. terminators)
parameter which acts as a boundary that the transitive propagation algorithm
is not allowed to cross.
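
A simplified Python model of the transitive propagation is given below. It is
a sketch under assumptions: the use-def graph is a dict of adjacency lists,
`transitive_slice` is a hypothetical name, and the boundary is modeled as a
set of operations that are neither included nor traversed, which may differ
from the exact semantics of the root/terminators parameters.
```python
def transitive_slice(op, edges, boundary=frozenset()):
    """Operations reachable from `op` along `edges`, stopping at `boundary`."""
    result, worklist = set(), [op]
    while worklist:
        current = worklist.pop()
        for neighbor in edges.get(current, ()):
            if neighbor in result or neighbor in boundary:
                continue
            result.add(neighbor)
            worklist.append(neighbor)
    return result


# load_a, load_b -> add -> mul -> store
users = {"load_a": ["add"], "load_b": ["add"], "add": ["mul"], "mul": ["store"]}
defs = {"add": ["load_a", "load_b"], "mul": ["add"], "store": ["mul"]}

forward = transitive_slice("add", users)            # follow uses
bounded = transitive_slice("add", users, {"store"})  # stop before the store
backward = transitive_slice("mul", defs)            # follow definitions
```
The same traversal serves both directions; only the edge map changes.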

PiperOrigin-RevId: 222446208
2019-03-29 14:08:02 -07:00
Uday Bondhugula 2631b155a9 Fix bugs in DMA generation and FlatAffineConstraints; add more test
cases.

- fix bug in calculating index expressions for DMA buffers in certain cases
  (affected tiled loop nests); add more test cases for better coverage.
- introduce an additional optional argument to replaceAllMemRefUsesWith;
  additional operands to the index remap AffineMap can now be supplied by the
  client.
- FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound,
  ::composeMap - fix position bug.
- Some clean up and more comments

PiperOrigin-RevId: 222434628
2019-03-29 14:07:31 -07:00
Alex Zinenko 615c41c788 Introduce Deaffinator pass.
This function pass replaces affine_apply operations in CFG functions with
sequences of primitive arithmetic instructions that form the affine map.

The actual replacement functionality is located in LoweringUtils as a
standalone function operating on an individual affine_apply operation and
inserting the result at the location of the original operation.  It is expected
to be useful for other, target-specific lowering passes that may start at
MLFunction level that Deaffinator does not support.
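
To make the expansion concrete, here is a small Python sketch that turns an
affine expression tree into a sequence of primitive arithmetic operations. The
tuple encoding, the function `lower_affine_expr` and the printed operation
syntax are illustrative assumptions, not the LoweringUtils interface.
```python
def lower_affine_expr(expr, dims, ops):
    """Append primitive ops for `expr` to `ops`; return the resulting value."""
    kind = expr[0]
    if kind == "dim":
        return dims[expr[1]]
    if kind == "const":
        name = f"%c{len(ops)}"
        ops.append(f"{name} = constant {expr[1]} : index")
        return name
    # Binary case: lower both operands first, then emit the arithmetic op.
    lhs = lower_affine_expr(expr[1], dims, ops)
    rhs = lower_affine_expr(expr[2], dims, ops)
    name = f"%{len(ops)}"
    op_name = {"add": "addi", "mul": "muli"}[kind]
    ops.append(f"{name} = {op_name} {lhs}, {rhs} : index")
    return name


# (d0, d1) -> (d0 * 2 + d1) applied to (%i, %j).
expr = ("add", ("mul", ("dim", 0), ("const", 2)), ("dim", 1))
ops = []
result = lower_affine_expr(expr, ["%i", "%j"], ops)
```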

PiperOrigin-RevId: 222406692
2019-03-29 14:07:16 -07:00
Uday Bondhugula b6c03917ad Remove allocations for memref's that become dead as a result of double
buffering in the auto DMA overlap pass.

This is done online in the pass.

PiperOrigin-RevId: 222313640
2019-03-29 14:05:19 -07:00
Nicolas Vasilache 87d46aaf4b [MLIR][Vectorize] Refactor Vectorize use-def propagation.
This CL refactors a few things in Vectorize.cpp:
1. a clear distinction is made between:
  a. the LoadOp are the roots of vectorization and must be vectorized
  eagerly and propagate their value; and
  b. the StoreOp which are the terminals of vectorization and must be
  vectorized late (i.e. they do not produce values that need to be
  propagated).
2. the StoreOp must be vectorized late because in general it can store a value
that is not reachable from the subset of loads defined in the
current pattern. One trivial such case is storing a constant defined at the
top-level of the MLFunction and that needs to be turned into a splat.
3. a description of the algorithm is given;
4. the implementation matches the algorithm;
5. the last example is made parametric, in practice it will fully rely on the
implementation of vector_transfer_read/write which will handle boundary
conditions and padding. This will happen by lowering to a lower-level
abstraction either:
  a. directly in MLIR (whether DMA or just loops or any async tasks in the
     future) (whiteboxing);
  b. in LLO/LLVM-IR/whatever blackbox library call/ search + swizzle inventor
  one may want to use;
  c. a partial mix of a. and b. (grey-boxing)
6. minor cleanups are applied;
7. mistakenly disabled unit tests are re-enabled (oopsie).

With this CL, this MLIR snippet:
```
mlfunc @vector_add_2d(%M : index, %N : index) -> memref<?x?xf32> {
  %A = alloc (%M, %N) : memref<?x?xf32>
  %B = alloc (%M, %N) : memref<?x?xf32>
  %C = alloc (%M, %N) : memref<?x?xf32>
  %f1 = constant 1.0 : f32
  %f2 = constant 2.0 : f32
  for %i0 = 0 to %M {
    for %i1 = 0 to %N {
      // non-scoped %f1
      store %f1, %A[%i0, %i1] : memref<?x?xf32>
    }
  }
  for %i4 = 0 to %M {
    for %i5 = 0 to %N {
      %a5 = load %A[%i4, %i5] : memref<?x?xf32>
      %b5 = load %B[%i4, %i5] : memref<?x?xf32>
      %s5 = addf %a5, %b5 : f32
      // non-scoped %f1
      %s6 = addf %s5, %f1 : f32
      store %s6, %C[%i4, %i5] : memref<?x?xf32>
    }
  }
  return %C : memref<?x?xf32>
}
```

vectorized with these arguments:
```
-vectorize -virtual-vector-size 256 --test-fastest-varying=0
```

vectorization produces this standard innermost-loop vectorized code:
```
mlfunc @vector_add_2d(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %2 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %cst = constant 1.000000e+00 : f32
  %cst_0 = constant 2.000000e+00 : f32
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg1 step 256 {
      %cst_1 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      "vector_transfer_write"(%cst_1, %0, %i0, %i1) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  for %i2 = 0 to %arg0 {
    for %i3 = 0 to %arg1 step 256 {
      %3 = "vector_transfer_read"(%0, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %4 = "vector_transfer_read"(%1, %i2, %i3) : (memref<?x?xf32>, index, index) -> vector<256xf32>
      %5 = addf %3, %4 : vector<256xf32>
      %cst_2 = constant splat<vector<256xf32>, 1.000000e+00> : vector<256xf32>
      %6 = addf %5, %cst_2 : vector<256xf32>
      "vector_transfer_write"(%6, %2, %i2, %i3) : (vector<256xf32>, memref<?x?xf32>, index, index) -> ()
    }
  }
  return %2 : memref<?x?xf32>
}
```

Of course, much more intricate n-D imperfectly-nested patterns can be emitted too in a fully declarative fashion, but this is enough for now.

PiperOrigin-RevId: 222280209
2019-03-29 14:03:50 -07:00
Alex Zinenko f986d5920b ConvertToCFG: handle 1D affine loop bounds.
In the general case, loop bounds can be expressed as affine maps of the outer
loop iterators and function arguments.  Relax the check for loop bounds to be
known integer constants and also accept one-dimensional affine bounds in
ConvertToCFG ForStmt lowering.  Emit affine_apply operations for both the upper
and the lower bound.  The semantics of MLFunctions guarantees that both bounds
can be computed before the loop starts iterating.  Constant bounds are merely a
short-hand notation for zero-dimensional affine maps and get supported
transparently.

Multidimensional affine bounds are not yet supported because the target IR
dialect lacks min/max operations necessary to implement the corresponding
semantics.

PiperOrigin-RevId: 222275801
2019-03-29 14:03:20 -07:00
Jacques Pienaar d0590caa90 Add op stats pass to mlir-opt.
op-stats pass currently returns the number of occurrences of different operations in a Module. Useful for verifying transformation properties (e.g., 3 ops of specific dialect, 0 of another), but probably not useful outside of that so keeping it local to mlir-opt. This does not consider op attributes when counting.

PiperOrigin-RevId: 222259727
2019-03-29 14:02:46 -07:00
Nicolas Vasilache 89d9913a20 [MLIR][VectorAnalysis] Add a VectorAnalysis and standalone tests
This CL adds some vector support in anticipation of the upcoming vector
materialization pass. In particular this CL adds 2 functions to:
1. compute the multiplicity of a subvector shape in a supervector shape;
2. help match operations on strict super-vectors. This is defined for a given
subvector shape as an operation that manipulates a vector type that is an
integral multiple of the subtype, with multiplicity at least 2.
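
The two functions can be approximated in Python as follows. This is a sketch
of one reading of the description: it assumes equal-rank shapes and takes the
multiplicity to be the product of the per-dimension ratios; `shape_ratio` and
`is_strict_super_vector` are names chosen for this example.
```python
from math import prod


def shape_ratio(super_shape, sub_shape):
    """Per-dimension multiplicity, or None if not an integral multiple."""
    if len(super_shape) != len(sub_shape):
        return None
    if any(s % t != 0 for s, t in zip(super_shape, sub_shape)):
        return None
    return [s // t for s, t in zip(super_shape, sub_shape)]


def is_strict_super_vector(super_shape, sub_shape):
    """True when the shape is an integral multiple with multiplicity >= 2."""
    ratio = shape_ratio(super_shape, sub_shape)
    return ratio is not None and prod(ratio) >= 2


ratio = shape_ratio([32, 64], [8, 16])
```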

This CL also adds a TestUtil pass where we can dump arbitrary testing of
functions and analysis that operate at a much smaller granularity than a pass
(e.g. an analysis for which it is convenient to write a bit of artificial MLIR
and write some custom test). This is in order to keep using FileCheck for
things that essentially look and feel like C++ unit tests.

PiperOrigin-RevId: 222250910
2019-03-29 14:02:17 -07:00
Uday Bondhugula fff1efbaf5 Updates to transformation/analysis passes/utilities. Update DMA generation pass
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.

- add support for getMemRefRegion symbolic in outer loops - hence support for
  DMAs symbolic in outer surrounding loops.

- add DMA generation support for outgoing DMAs (store op's to lower memory
  space); extend getMemoryRegion to store op's. -memref-bound-check now works
  with store op's as well.

- fix dma-generate (references to the old memref in the dma_start op were also
  being replaced with the new buffer); we need replace all memref uses to work
  only on a subset of the uses - add a new optional argument for
  replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional
  'operation' argument to serve as a filter - if provided, only those uses that
  are dominated by the filter are replaced.

- Add missing print for attributes for dma_start, dma_wait op's.

- update the FlatAffineConstraints API

PiperOrigin-RevId: 221889223
2019-03-29 14:00:51 -07:00
River Riddle d34fcce2a7 [MLIR] Rename OperationInst to Instruction.
PiperOrigin-RevId: 221795407
2019-03-29 14:00:09 -07:00
River Riddle 503caf0722 Replace TerminatorInst with builtin terminator operations.
Note: Terminators will be merged into the operations list in a follow up patch.
PiperOrigin-RevId: 221670037
2019-03-29 13:58:55 -07:00
Alex Zinenko d030433443 ConvertToCFG: properly remap nested function attributes.
Array attributes can be nested and function attributes can appear anywhere at that
level.  They should be remapped to point to the generated CFGFunction after
ML-to-CFG conversion, similarly to plain function attributes.  Extract the
nested attribute remapping functionality from the Parser to Utils.  Extract out
the remapping function for individual Functions from the module remapping
function.  Use these new functions in the ML-to-CFG conversion pass and in the
parser.

PiperOrigin-RevId: 221510997
2019-03-29 13:57:58 -07:00
Alex Zinenko cb40633969 Move definitions of loopUnroll* functions to LoopUtils.cpp.
These functions are declared in Transforms/LoopUtils.h (included in the
Transforms/Utils library) but were defined in the loop unrolling pass in
Transforms/LoopUnroll.cpp.  As a result, targets depending only on
TransformUtils library but not on Transforms could get link errors.  Move the
definitions to Transforms/Utils/LoopUtils.cpp where they should actually live.
This does not modify any code.

PiperOrigin-RevId: 221508882
2019-03-29 13:57:44 -07:00
Nicolas Vasilache fefbf91314 [MLIR] Support for vectorizing operations.
This CL adds support for and a vectorization test to perform scalar 2-D addf.

The support extension notably comprises:
1. extend vectorizable test to exclude vector_transfer operations and
expose them to LoopAnalysis where they are needed. This is a temporary
solution until a concrete MLIR Op exists;
2. add some more functional sugar mapKeys, apply and ScopeGuard (which became
relevant again);
3. fix improper shifting during coarsening;
4. rename unaligned load/store to vector_transfer_read/write and simplify the
design removing the unnecessary AllocOp that were introduced prematurely:
vector_transfer_read currently has the form:
  (memref<?x?x?xf32>, index, index, index) -> vector<32x64x256xf32>
vector_transfer_write currently has the form:
  (vector<32x64x256xf32>, memref<?x?x?xf32>, index, index, index) -> ()
5. add vectorizeOperations which traverses the operations in a ForStmt and
rewrites them to their vector form;
6. add support for vector splat from a constant.

The relevant tests are also updated.

PiperOrigin-RevId: 221421426
2019-03-29 13:56:47 -07:00
Alex Zinenko 5a0d3d0204 Basic conversion of MLFunctions to CFGFunctions.
Implement a pass converting a subset of MLFunctions to CFGFunctions.  Currently
supports arbitrarily complex imperfect loop nests with statically constant
(i.e., not affine map) bounds filled with operations.  Does NOT support
branches and non-constant loop bounds.

Conversion is performed per-function and the function names are preserved to
avoid breaking any external references to the current module.  In-memory IR is
updated to point to the right functions in direct calls and constant loads.
This behavior is tested via a really hidden flag that enables function
renaming.

Inside each function, the control flow conversion is based on single-entry
single-exit regions, i.e. subgraphs of the CFG that have exactly one incoming
and exactly one outgoing edge.  Since an MLFunction must have a single "return"
statement as per MLIR spec, it constitutes an SESE region.  Individual
operations are appended to this region.  Control flow statements are
recursively converted into such regions that are concatenated with the current
region.  Bodies of the compound statement also form SESE regions, which allows
control flow statements to be nested easily.  Note that SESE regions are not
materialized in the code.  It is sufficient to keep track of the end of the
region as the current instruction insertion point as long as all recursive
calls update the insertion point in the end.
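
The insertion-point discipline can be illustrated with a short Python sketch.
Everything in it is a simplification made for this example: statements are
tuples, blocks hold pseudo-instruction strings, the induction variable is
written as `iv`, and `lower_to_cfg` is a hypothetical name.
```python
def lower_to_cfg(stmts):
    """Lower ("op", name) and ("for", lb, ub, body) statements to blocks."""
    blocks = {"entry": []}
    state = {"current": "entry", "count": 0}  # current = insertion point

    def new_block(prefix):
        name = f"{prefix}{state['count']}"
        state["count"] += 1
        blocks[name] = []
        return name

    def emit(text):
        blocks[state["current"]].append(text)

    def lower(stmt):
        if stmt[0] == "op":
            emit(stmt[1])
            return
        _, lower_bound, upper_bound, body = stmt
        cond, body_block, end = (new_block(p) for p in ("cond", "body", "end"))
        emit(f"br {cond}({lower_bound})")
        state["current"] = cond
        emit(f"cond_br iv < {upper_bound}, {body_block}, {end}")
        state["current"] = body_block
        for inner in body:
            lower(inner)  # may move the insertion point to a nested end block
        emit(f"br {cond}(iv + 1)")
        state["current"] = end  # end of this SESE region

    for stmt in stmts:
        lower(stmt)
    emit("return")
    return blocks


nest = [("for", 0, 4, [("op", "a"), ("for", 0, 8, [("op", "b")]), ("op", "c")])]
cfg = lower_to_cfg(nest)
```
Note how `c` and the outer back-edge land in the inner loop's end block: the
recursive call left the insertion point there, so no explicit region object
is needed.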

The converter maintains a mapping between SSA values in ML functions and their
CFG counterparts.  The mapping is used to find the operands for each operation
and is updated to contain the results of each operation as the conversion
continues.

PiperOrigin-RevId: 221162602
2019-03-29 13:55:22 -07:00
Jacques Pienaar 25e6b541cd Switch IntegerAttr to use APInt.
Change the storage type to APInt from int64_t for IntegerAttr (following the change to APFloat storage in FloatAttr). Effectively a direct change from int64_t to 64-bit APInt throughout (the bitwidth hardcoded). This change also adds a getInt convenience method to IntegerAttr and replaces previous getValue calls with getInt calls.

While this change updates the storage type, it does not update all constant folding calls.

PiperOrigin-RevId: 221082788
2019-03-29 13:55:08 -07:00
MLIR Team b5424dd0cb Adds support for returning the direction of the dependence between memref accesses (distance/direction vectors).
Updates MemRefDependenceCheck to check and report on all memref access pairs at all loop nest depths.
Updates old and adds new memref dependence check tests.
Resolves multiple TODOs.

PiperOrigin-RevId: 220816515
2019-03-29 13:53:28 -07:00
Uday Bondhugula e0623d4b86 Automatic DMA generation for simple cases.
- constant bounded memory regions, static shapes, no handling of
  overlapping/duplicate regions (through union) for now; also, only load memory
  ops.
- add build methods for DmaStartOp, DmaWaitOp.
- move getMemoryRegion() into Analysis/Utils and expose it.
- fix addIndexSet, getMemoryRegion() post switch to exclusive upper bounds;
  update test cases for memref-bound-check and memref-dependence-check for
  exclusive bounds (missed in a previous CL)

PiperOrigin-RevId: 220729810
2019-03-29 13:53:14 -07:00
River Riddle 2fa4bc9fc8 Implement value type abstraction for locations.
Value type abstraction for locations differs from others in that a Location can NOT be null. NOTE: dyn_cast returns an Optional<T>.

PiperOrigin-RevId: 220682078
2019-03-29 13:52:31 -07:00
Uday Bondhugula 23ddd577ef Complete migration to exclusive upper bound
cl/220448963 had missed a part of the updates.

- while on this, clean up some of the test cases to use ops' custom forms.

PiperOrigin-RevId: 220675303
2019-03-29 13:52:17 -07:00
Jacques Pienaar cc9a6ed09d Initialize Pass with PassID.
The passID is not currently stored in Pass but this avoids the unused variable warning. The passID is used to uniquely identify passes, currently this is only stored/used in PassInfo.

PiperOrigin-RevId: 220485662
2019-03-29 13:50:34 -07:00
Nicolas Vasilache cde8248753 [MLIR] Make upper bound implementation exclusive
This CL implements exclusive upper bound behavior as per b/116854378.
A followup CL will update the semantics of the for loop.

PiperOrigin-RevId: 220448963
2019-03-29 13:49:49 -07:00
Jacques Pienaar 6f0fb22723 Add static pass registration
Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage.

Change build targets to alwayslink to enforce registration.

PiperOrigin-RevId: 220390178
2019-03-29 13:49:34 -07:00
Uday Bondhugula 6cd5d5c544 Introduce loop tiling code generation (hyper-rectangular case)
- simple perfectly nested band tiling with fixed tile sizes.
- only the hyper-rectangular case is handled, with other limitations of
  getIndexSet applying (constant loop bounds, etc.);  once
  the latter utility is extended, tiled code generation should become more
  general.
- Add FlatAffineConstraints::isHyperRectangular()
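
A minimal sketch of fixed-size tiling over a hyper-rectangular iteration space, written in Python rather than against the MLIR API. The function name `tiled_points`, the half-open `[lb, ub)` convention, and the `min()` clamp for partial tiles are assumptions made for illustration and are not taken from this CL:

```python
from itertools import product


def tiled_points(lbs, ubs, tile_sizes):
    """Yield the points of the box [lbs, ubs) in tiled execution order."""
    # Inter-tile loops: one starting coordinate per tile along each dimension.
    tile_starts = [range(lb, ub, ts) for lb, ub, ts in zip(lbs, ubs, tile_sizes)]
    for starts in product(*tile_starts):
        # Intra-tile loops: clamp with min() so partial tiles stay in bounds.
        intra = [
            range(s, min(s + ts, ub))
            for s, ub, ts in zip(starts, ubs, tile_sizes)
        ]
        yield from product(*intra)


points = list(tiled_points([0, 0], [5, 4], [2, 3]))
```

Tiling only reorders the iterations: sorting `points` gives back the original 5 x 4 iteration set, which is the property a tiling transformation has to preserve.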

PiperOrigin-RevId: 220324933
2019-03-29 13:49:05 -07:00
MLIR Team f28e4df666 Adds a dependence check to test whether two accesses to the same memref access the same element.
- Builds access functions and iterations domains for each access.
- Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds.
- Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses.
- Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops.
- The LoopFusion pass will be extended in subsequent CLs.

PiperOrigin-RevId: 219630898
2019-03-29 13:47:13 -07:00
Nicolas Vasilache 21638dcda9 [MLIR] Extend vectorization to 2+-D patterns
This CL adds support for vectorization using more interesting 2-D and 3-D
patterns. Note in particular the fact that we match some pretty complex
imperfectly nested 2-D patterns with a quite minimal change to the
implementation: we just add a bit of recursion to traverse the matched
patterns and actually vectorize the loops.

For instance, vectorizing the following loop by 128:
```
for %i3 = 0 to %0 {
  %7 = affine_apply (d0) -> (d0)(%i3)
  %8 = load %arg0[%c0_0, %7] : memref<?x?xf32>
}
```

Currently generates:
```
#map0 = ()[s0] -> (s0 + 127)
#map1 = (d0) -> (d0)
for %i3 = 0 to #map0()[%0] step 128 {
  %9 = affine_apply #map1(%i3)
  %10 = alloc() : memref<1xvector<128xf32>>
  %11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) :
    (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) ->
    (memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index)
  %12 = load %10[%c0] : memref<1xvector<128xf32>>
}
```

The above is subject to evolution.

PiperOrigin-RevId: 219629745
2019-03-29 13:46:58 -07:00
Jacques Pienaar e1f9e65b9a Enable constructing a FuncBuilder using an Operation*.
FuncBuilder is useful to build an operation to replace an existing operation, so change the constructor to allow constructing it with an existing operation. Change FuncBuilder to contain (effectively) a tagged union of CFGFuncBuilder and MLFuncBuilder (as these should be cheap to copy and avoid allocation/deletion when created via an operation).

PiperOrigin-RevId: 219532952
2019-03-29 13:46:22 -07:00
Uday Bondhugula 8201e19e3d Introduce memref bound checking.
Introduce analysis to check memref accesses (in MLFunctions) for out of bound
ones. It works as follows:

$ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir

/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension #1
      %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension #1
      %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension #2
      %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension #2
      %x = load %A[%idx#0, %idx#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension #1
      %y = load %B[%idy] : memref<128 x i32>
           ^
/tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension #1
      %y = load %B[%idy] : memref<128 x i32>
           ^
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0 * 128 - d1)
mlfunc @test() {
  %0 = alloc() : memref<9x9xi32>
  %1 = alloc() : memref<128xi32>
  for %i0 = -1 to 9 {
    for %i1 = -1 to 9 {
      %2 = affine_apply #map0(%i0, %i1)
      %3 = load %0[%2#0, %2#1] : memref<9x9xi32>
      %4 = affine_apply #map1(%i0, %i1)
      %5 = load %1[%4] : memref<128xi32>
    }
  }
  return
}
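
The check on the function above can be approximated with plain interval arithmetic. This is a hedged sketch, not the pass itself: the real analysis goes through FlatAffineConstraints, whereas the code below only bounds each affine index expression over constant loop ranges. The helper names are hypothetical, and treating the loop bounds `-1 to 9` as inclusive is an assumption, chosen because it is consistent with both the upper and lower bound diagnostics shown above:

```python
def affine_range(coeffs, const, bounds):
    """Min and max of sum(c * x) + const, each x in an inclusive [lo, hi]."""
    lo = hi = const
    for c, (l, h) in zip(coeffs, bounds):
        lo += min(c * l, c * h)
        hi += max(c * l, c * h)
    return lo, hi


def check_access(index_exprs, shape, bounds):
    """Return (dimension, "upper" | "lower") for every violated bound."""
    errors = []
    for dim, ((coeffs, const), size) in enumerate(zip(index_exprs, shape), start=1):
        lo, hi = affine_range(coeffs, const, bounds)
        if hi > size - 1:
            errors.append((dim, "upper"))
        if lo < 0:
            errors.append((dim, "lower"))
    return errors


loop_bounds = [(-1, 9), (-1, 9)]  # %i0 and %i1, assumed inclusive
# #map0 = (d0, d1) -> (d0, d1) indexing memref<9x9xi32>
errors_a = check_access([([1, 0], 0), ([0, 1], 0)], (9, 9), loop_bounds)
# #map1 = (d0, d1) -> (d0 * 128 - d1) indexing memref<128xi32>
errors_b = check_access([([128, -1], 0)], (128,), loop_bounds)
```

Under these assumptions both accesses are flagged on every dimension in both directions, matching the six diagnostics listed above.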

- Improves productivity while manually / semi-automatically developing MLIR for
  testing / prototyping; also provides an indirect way to catch errors in
  transformations.

- This pass is an easy way to test the underlying affine analysis
  machinery including low level routines.

Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256.

While on this:

- create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/

- fix a bug in AffineAnalysis.cpp::toAffineExpr

TODO: extend to non-constant loop bounds (straightforward). Will transparently
work for all accesses once floordiv, mod, ceildiv are supported in the
AffineMap -> FlatAffineConstraints conversion.
PiperOrigin-RevId: 219397961
2019-03-29 13:46:08 -07:00
River Riddle 4c465a181d Implement value type abstraction for types.
This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.

PiperOrigin-RevId: 219372163
2019-03-29 13:45:54 -07:00
Nicolas Vasilache af7f56fdf8 [MLIR] Implement 1-D vectorization for fastest varying load/stores
This CL is a first in a series that implements early vectorization of
increasingly complex patterns. In particular, early vectorization will support
arbitrary loop nesting patterns (both perfectly and imperfectly nested), at
arbitrary depths in the loop tree.

This first CL builds the minimal support for applying 1-D patterns.
It relies on an unaligned load/store op abstraction that can be implemented
differently on different HW.
Future CLs will support higher dimensional patterns, but 1-D patterns already
exhibit interesting properties.
In particular, we want to separate pattern matching (i.e. legality both
structural and dependency analysis based), from profitability analysis, from
application of the transformation.
As a consequence patterns may intersect and we need to verify that a pattern
can still apply by the time we get to applying it.

A non-greedy analysis on profitability that takes into account pattern
intersection is left for future work.

Additionally the CL makes the following cleanups:
1. the matches method now returns a value, not a reference;
2. added comments about the MLFunctionMatcher and MLFunctionMatches usage by
value;
3. added size and empty methods to matches;
4. added a negative vectorization test with a conditional, this exhibited a
bug in the iterators. Iterators now return nullptr if the underlying storage
is nullptr.

PiperOrigin-RevId: 219299489
2019-03-29 13:44:26 -07:00
Chris Lattner 085b687fbd Add support for walking the use list of an SSAValue and converting owners to
Operation*'s, simplifying some code in GreedyPatternRewriteDriver.cpp.

Also add print/dump methods on Operation.

PiperOrigin-RevId: 219045764
2019-03-29 13:43:01 -07:00
Chris Lattner 967d934180 Fix two issues:
1) We incorrectly reassociated non-reassociative operations like subi, causing
   miscompilations.
2) When constant folding, we didn't add users of the new constant back to the
   worklist for reprocessing, causing us to miss some cases (pointed out by
   Uday).

The code for #2 is gross, but I'll add the new APIs in a followup patch.

PiperOrigin-RevId: 218803984
2019-03-29 13:40:35 -07:00
Chris Lattner adbba70d82 Simplify FunctionPass to eliminate the CFGFunctionPass/MLFunctionPass
distinction.  FunctionPasses can now choose to get called on all functions, or
have the driver split CFG/ML Functions up for them.  NFC.

PiperOrigin-RevId: 218775885
2019-03-29 13:40:05 -07:00
Chris Lattner 7de0da9594 Refactor all of the canonicalization patterns out of the Canonicalize pass, and
make operations provide a list of canonicalizations that can be applied to
them.  This allows canonicalization to be general to any IR definition.

As part of this, sink PatternMatch.h/cpp down to the IR library to fix a
layering problem.

PiperOrigin-RevId: 218773981
2019-03-29 13:39:49 -07:00
River Riddle 792d1c25e4 Implement value type abstraction for attributes.
This is done by changing Attribute to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.

PiperOrigin-RevId: 218764173
2019-03-29 13:39:19 -07:00
Chris Lattner 64d52014bd Move transform utilities out to their own TransformUtils library, instead of
just having the pattern matcher in its own library.  At this point,
lib/Transforms/*.cpp are all actually passes themselves (and will probably
eventually themselves be moved to a new subdirectory as we accrete more).

PiperOrigin-RevId: 218745193
2019-03-29 13:39:06 -07:00
Chris Lattner 92285814e2 Refactor the bulk of the worklist driver out of the canonicalizer into its own
helper function, in preparation for it being used by other passes.

There is still a lot of room for improvement in its design; this patch is
intended as an NFC refactoring, and the improvements will continue after this
lands.

PiperOrigin-RevId: 218737116
2019-03-29 13:38:52 -07:00
Uday Bondhugula 80610c2f49 Introduce Fourier-Motzkin variable elimination + other cleanup/support
- Introduce Fourier-Motzkin variable elimination to eliminate a dimension from
  a system of linear equalities/inequalities. Update isEmpty to use this.
  Since FM is only exact on rational/real spaces, an emptiness check based on
  this is guaranteed to be exact whenever it says the underlying set is empty;
  if it says it's not empty, there may still be no integer points in it.
  Also, supports a version that computes "dark shadows".

- Test this by checking for "always false" conditionals in if statements.

- Unique IntegerSet's that are small (few constraints, few variables). This
  basically means the canonical empty set and other small sets that are
  likely commonly used get uniqued; allows checking for the canonical empty set
  by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint size
  for uniquing.

- rename simplify-affine-expr -> simplify-affine-structures

Other cleanup

- IntegerSet::numConstraints, AffineMap::numResults are no longer needed;
  remove them.
- add copy assignment operators for AffineMap, IntegerSet.
- rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet
- Misc cleanup for FlatAffineConstraints API
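
The Fourier-Motzkin step described in the first bullet can be sketched as follows. This is an illustrative Python version, not the FlatAffineConstraints implementation: the row layout (`[a_0, ..., a_{n-1}, c]` meaning `a_0*x_0 + ... + c >= 0`) and the function names are assumptions, equalities would have to be passed as two opposing inequalities, and the "dark shadow" variant is not covered:

```python
def eliminate(rows, k):
    """Project variable k out of a system of inequalities row . (x, 1) >= 0."""
    pos = [r for r in rows if r[k] > 0]
    neg = [r for r in rows if r[k] < 0]
    # Constraints that do not mention x_k carry over unchanged.
    out = [r[:k] + r[k + 1:] for r in rows if r[k] == 0]
    for p in pos:
        for n in neg:
            # Both multipliers are positive, so the sum is still implied
            # by the system, and the coefficient of x_k cancels to zero.
            combo = [(-n[k]) * pi + p[k] * ni for pi, ni in zip(p, n)]
            out.append(combo[:k] + combo[k + 1:])
    return out


def is_rationally_empty(rows, num_vars):
    """True if the system has no rational solution."""
    for _ in range(num_vars):
        rows = eliminate(rows, 0)
    # Only constant rows remain; each one states c >= 0.
    return any(r[0] < 0 for r in rows)
```

For example, `x >= 5` together with `x <= 3` is reported empty, while `2x >= 1` together with `2x <= 1` is not, even though its only solution `x = 1/2` is not an integer. That is the one-sided guarantee noted in the bullet.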

PiperOrigin-RevId: 218690456
2019-03-29 13:38:24 -07:00
MLIR Team 5413239350 Adds Gaussian Elimination to FlatAffineConstraints.
- Adds FlatAffineConstraints::isEmpty method to test if there are no solutions to the system.
- Adds GCD test to check if equality constraints have no solution.
- Adds unit test cases.
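
For reference, the GCD test on a single equality rests on a standard fact: `a_1*x_1 + ... + a_n*x_n + c = 0` has an integer solution exactly when the gcd of the coefficients divides `c`. A small Python sketch, where the function name is hypothetical and not the FlatAffineConstraints API:

```python
from functools import reduce
from math import gcd


def equality_has_integer_solution(coeffs, const):
    """GCD test for sum(coeffs[i] * x_i) + const == 0 over the integers."""
    g = reduce(gcd, (abs(a) for a in coeffs), 0)
    if g == 0:
        # No variables with a nonzero coefficient: solvable only if 0 == 0.
        return const == 0
    return const % g == 0
```

For instance, `2x + 4y - 3 = 0` fails the test because gcd(2, 4) = 2 does not divide 3, so a system containing it has no integer solution.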

PiperOrigin-RevId: 218546319
2019-03-29 13:38:10 -07:00
Lei Zhang 52a0e58bdb Change typedef to using to be consistent across the codebase
Google C++ style guide also prefers using to typedef.

PiperOrigin-RevId: 218541849
2019-03-29 13:37:55 -07:00
Chris Lattner bd01f9541f Teach canonicalize pass to unique and hoist constants to the entry block. This
is a straight-forward change, but required adding missing moveBefore() methods
on operations (requiring moving some traits around to make C++ happy).  This
also fixes a constness issue with the getBlock/getFunction() methods on
Instruction, and adds a missing getFunction() method on MLFuncBuilder.

PiperOrigin-RevId: 218523905
2019-03-29 13:36:59 -07:00
Chris Lattner 301f83f906 Implement shape folding in the canonicalization pass:
- Add a few canonicalization patterns to fold memref_cast into
   load/store/dealloc.
 - Canonicalize alloc(constant) into an alloc with a constant shape followed by
   a cast.
 - Add a new PatternRewriter::updatedRootInPlace API to make this more convenient.

SimplifyAllocConst and the testcase are heavily based on Uday's implementation work, just
in a different framework.

PiperOrigin-RevId: 218361237
2019-03-29 13:36:31 -07:00
Uday Bondhugula ccfe593715 PassResult return cleanup.
- return success as long as IR is in a valid state.

PiperOrigin-RevId: 218225317
2019-03-29 13:35:47 -07:00
Chris Lattner a03051b9c4 Add a pattern (x+0) -> x, generalize Canonicalize to CFGFunc's, address a few TODOs,
and add some casting support to Operation.

PiperOrigin-RevId: 218219340
2019-03-29 13:35:33 -07:00
Chris Lattner 7850258c49 Introduce a new Operation::erase helper to generalize some code in
the pattern matcher / canonicalizer, and rename existing eraseFromBlock methods
to align with it.

PiperOrigin-RevId: 218104455
2019-03-29 13:34:51 -07:00
Chris Lattner 73a802741e Introduce a new PatternRewriter class to help keep the worklist in
PatternMatcher clients up to date and provide a funnel point for newly added
operations.  This is also progress towards the canonicalizer supporting
CFGFunctions.

This paves the way for more complex patterns, but by itself doesn't do much
useful, so no testcase.

PiperOrigin-RevId: 218101737
2019-03-29 13:34:23 -07:00
Uday Bondhugula 2f1103bd93 Loop bound constant folding: follow-up / address comments from cl/215997346
- create a single function to fold both bounds
- move bound constant folding into transforms

PiperOrigin-RevId: 217954701
2019-03-29 13:33:55 -07:00
Feng Liu 34927e2474 Rename Operation::getAs to Operation::dyn_cast
Also rename Operation::is to Operation::isa
Introduce Operation::cast

All of these are for consistency with global dyn_cast/cast/isa operators.

PiperOrigin-RevId: 217878786
2019-03-29 13:33:41 -07:00
Uday Bondhugula 18e666702c Generalize / improve DMA transfer overlap; nested and multiple DMA support; resolve
multiple TODOs.

- replace the fake test pass (that worked on just the first loop in the
  MLFunction) to perform DMA pipelining on all suitable loops.
- nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops)
- fix bugs / assumptions: correctly copy memory space and elemental type of source
  memref for double buffering.
- correctly identify matching start/finish statements, handle multiple DMAs per
  loop.
- introduce dominates/properlyDominates utilities for MLFunction statements.
- move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it
  getShiftValidity
- refactor getContainingStmtPos -> findAncestorStmtInBlock - move into
  Analysis/Utils.h; has two users.
- other improvements / cleanup for related API/utilities
- add size argument to dma_wait - for nested DMAs or in general, it makes it
  easy to obtain the size to use when lowering the dma_wait since we wouldn't
  want to identify the matching dma_start, and more importantly, in general/in the
  future, there may not always be a dma_start dominating the dma_wait.
- add debug information in the pass

PiperOrigin-RevId: 217734892
2019-03-29 13:32:28 -07:00
Nicolas Vasilache 3013dadb7c [MLIR] Basic infrastructure for vectorization test
This CL implements a very simple loop vectorization **test** and the basic
infrastructure to support it.

The test simply consists in:
1. matching the loops in the MLFunction and all the Load/Store operations
nested under the loop;
2. testing whether all the Load/Store are contiguous along the innermost
memory dimension along that particular loop. If any reference is
non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then
the loop is not-vectorizable.

The simple test above can gradually be extended with more interesting
behaviors to account for the fact that a layout permutation may exist that
enables contiguity etc. All these will come in due time but it is worthwhile
noting that the test already supports detection of outer-vectorizable loops.

In implementing this test, I also added a recursive MLFunctionMatcher and some
sugar that can capture patterns
such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating
on the matched IR structures. For now it just uses in order traversal but
post-order DFS will be useful in the future once IR rewrites start occurring.

One may note that the memory management design decision follows a different
pattern from MLIR. After evaluating different designs and how they quickly
increase cognitive overhead, I decided to opt for the simplest solution in my
view: a class-wide (threadsafe) RAII context.

This way, a pass that needs MLFunctionMatcher can just have its own locally
scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed.
If passes are expected to have a longer lifetime, then the contexts can easily
be scoped inside the runOnMLFunction call and storage lifetime reduced.
Lastly, whatever the scope of threading (module, function, pass), this is
expected to also be future-proof wrt concurrency (but this is a detail atm).

PiperOrigin-RevId: 217622889
2019-03-29 13:32:13 -07:00
Jacques Pienaar 47e7cd333e Use FuncBuilder instead of MLFuncBuilder in pattern matcher.
Use the general function builder wrapper instead of the CFG/ML specific one.

PiperOrigin-RevId: 217335607
2019-03-29 13:31:59 -07:00
Chris Lattner 80e884a9f8 Add constant folding and binary operator reassociation to the canonicalize
pass, build up the worklist infra in anticipation of improving the pattern
matcher to match more than one node.

PiperOrigin-RevId: 217330579
2019-03-29 13:31:44 -07:00
Feng Liu 0faf563383 Move Pattern and related classes to a different file
So we can use it as a library.

PiperOrigin-RevId: 217267049
2019-03-29 13:31:03 -07:00
MLIR Team 0114e232d8 Adds method to AffineApplyOp which forward substitutes its results into any of its users which are also AffineApplyOps.
Updates ComposeAffineMaps test pass to use this method.
Updates affine map composition test cases to handle the new pass, which can be reused when this method is used in a future instruction combine pass.

PiperOrigin-RevId: 217163351
2019-03-29 13:30:49 -07:00
Chris Lattner 7e7157fd1d Various improvements to pattern matching and other infra:
- Make it so OpPointer implicitly converts to SSAValue* when the underlying op
   has a single value.  This eliminates a lot more ->getResult() calls and makes
   the behavior more LLVM-like
 - Fill out PatternBenefit to be typed instead of just a typedef for int with
   magic numbers.
 - Simplify various code due to these changes.

PiperOrigin-RevId: 217020717
2019-03-29 13:29:49 -07:00
Uday Bondhugula 86eac4618c Create private exclusive / single use affine computation slice for an op stmt.
- add util to create a private / exclusive / single use affine
  computation slice for an op stmt (see method doc comment); a single
  multi-result affine_apply op is prepended to the op stmt to provide all
  results needed for its operands as a function of loop iterators and symbols.
- use it for DMA pipelining (to create private slices for DMA start stmt's);
  resolve TODOs/feature request (b/117159533)
- move createComposedAffineApplyOp to Transforms/Utils; free it from taking a
  memref as input / generalize it.

PiperOrigin-RevId: 216926818
2019-03-29 13:29:21 -07:00
Chris Lattner 9e3b928e32 Implement a super sketched out pattern match/rewrite framework and a sketched
out canonicalization pass to drive it, and a simple (x-x) === 0 pattern match
as a test case.

There is a tremendous number of improvements that need to land, and the
matcher/rewriter and patterns will be split out of this file, but this is a
starting point.

PiperOrigin-RevId: 216788604
2019-03-29 13:29:07 -07:00
Chris Lattner 8dda701a9c Add MLFunction::walk/walkPostOrder methods for doing a simple traversal of
operations.  This is a simplified form for the existing walker API.

PiperOrigin-RevId: 216754991
2019-03-29 13:28:26 -07:00
Jacques Pienaar 764fd035b0 Split BuiltinOps out of StandardOps.
* Move Return, Constant and AffineApply out into BuiltinOps;
* BuiltinOps are always registered, while StandardOps follow the same dynamic registration;
* Kept isValidX in MLValue as we don't have a verify on AffineMap so need to keep it callable from Parser (I wanted to move it to be called in verify instead);

PiperOrigin-RevId: 216592527
2019-03-29 13:28:12 -07:00
Nicolas Vasilache 1d3e7e2616 [MLIR] AffineMap value type
This CL applies the same pattern as AffineExpr to AffineMap: a simple struct
that acts as the storage is allocated in the bump pointer. The AffineMap is
immutable and accessed everywhere by value.

PiperOrigin-RevId: 216445930
2019-03-29 13:26:24 -07:00
Uday Bondhugula 82e55750d2 Add target independent standard DMA ops: dma.start, dma.wait
Add target independent standard DMA ops: dma.start, dma.wait. Update pipeline
data transfer to use these to detect DMA ops.

While on this
- return failure from mlir-opt::performActions if a pass generates invalid output
- improve error message for verify 'n' operand traits

PiperOrigin-RevId: 216429885
2019-03-29 13:26:10 -07:00
MLIR Team c386143834 Address comments from previous CL/216216446
PiperOrigin-RevId: 216298139
2019-03-29 13:25:28 -07:00
Nicolas Vasilache 6707c7bea1 [MLIR] AffineExpr final cleanups
This CL:
1. performs the global codemod AffineXExpr->AffineXExprClass and
AffineXExprRef -> AffineXExpr;
2. simplifies function calls by removing the redundant MLIRContext parameter;
3. adds missing binary operator versions of scalar op AffineExpr where it
makes sense.

PiperOrigin-RevId: 216242674
2019-03-29 13:25:14 -07:00
MLIR Team fe490043b0 Affine map composition.
*) Implements AffineValueMap forward substitution for AffineApplyOps.
*) Adds ComposeAffineMaps transformation pass, which composes affine maps for all loads/stores in an MLFunction.
*) Adds multiple affine map composition tests.
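
As a rough illustration of what composing two affine maps means, the sketch below represents each map result as a row `[c_0, ..., c_{n-1}, const]` and substitutes the inner map's results into the outer map. The representation and the name `compose` are assumptions made for this example. It handles only purely linear-plus-constant expressions (no floordiv, ceildiv, mod, or symbols), it expects `inner` to have at least one result, and it is not the AffineValueMap code from this CL:

```python
def compose(outer, inner):
    """Return outer(inner(x)) for affine maps given as coefficient rows."""
    num_inputs = len(inner[0]) - 1
    result = []
    for row in outer:
        *coeffs, const = row
        # Start from the outer constant, then add each scaled inner result.
        new_row = [0] * num_inputs + [const]
        for c, inner_row in zip(coeffs, inner):
            for j, value in enumerate(inner_row):
                new_row[j] += c * value
        result.append(new_row)
    return result


# inner: (d0) -> (d0 * 2 + 1), outer: (d0) -> (d0 + 3)
composed = compose([[1, 3]], [[2, 1]])  # (d0) -> (d0 * 2 + 4)
```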

PiperOrigin-RevId: 216216446
2019-03-29 13:24:59 -07:00
Nicolas Vasilache ce2edea135 [MLIR] Cleanup AffineExpr
This CL introduces a series of cleanups for AffineExpr value types:
1. to make it clear that the value types should be used, the pointer
AffineExpr types are put in the detail namespace. Unfortunately, since the
value type operator-> only forwards to the underlying pointer type, we still
need to expose this in the include file for now;
2. AffineExprKind is ok to use, it thus comes out of detail and thus of
AffineExpr
3. getAffineDimExpr, getAffineSymbolExpr, getAffineConstantExpr are similarly
extracted as free functions and their naming is made consistent across
Builder, MLContext and AffineExpr
4. AffineBinaryOpEx::simplify functions are made into static free functions.
In particular it is moved away from AffineMap.cpp where it does not belong
5. operator AffineExprType is made explicit
6. uses the binary operators everywhere possible
7. drops the pointer usage everywhere outside of AffineExpr.cpp,
MLIRContext.cpp and AsmPrinter.cpp

PiperOrigin-RevId: 216207212
2019-03-29 13:24:45 -07:00
Nicolas Vasilache 4911978f7e [MLIR] Value types for AffineXXXExpr
This CL makes AffineExprRef into a value type.

Notably:
1. drops llvm isa, cast, dyn_cast on pointer type and uses member functions on
the value type. It may be possible to still use classof  (in a followup CL)
2. AffineBaseExprRef aggressively casts constness away: if we mean the type is
immutable then let's jump in with both feet;
3. Drop implicit casts to the underlying pointer type because that always
results in surprising behavior and is not needed in practice once enough
cleanup has been applied.

The remaining negative I see is that we still need to mix operator. and
operator->. There is an ugly solution that forwards the methods but that ends
up duplicating the class hierarchy which I tried to avoid as much as
possible. But maybe it's not that bad anymore since AffineExpr.h would still
contain a single class hierarchy (the duplication would be impl detail in .cpp)

PiperOrigin-RevId: 216188003
2019-03-29 13:24:31 -07:00
Chris Lattner d2d89cbc19 Rename affineint type to index type. The name 'index' may not be perfect, but is better than the old name. Here is some justification:
1) affineint (as it is named) is not a type suitable for general computation (e.g. the multiply/adds in an integer matmul).  It has undefined width and is undefined on overflow.  They are used as the indices for forstmt because they are intended to be used as indexes inside the loop.

2) It can be used in both cfg and ml functions, and in cfg functions.  As you mention, “symbols” are not affine, and we use affineint values for symbols.

3) Integers aren’t affine, the algorithms applied to them can be. :)

4) The only suitable use for affineint in MLIR is for indexes and dimension sizes (i.e. the bounds of those indexes).

PiperOrigin-RevId: 216057974
2019-03-29 13:24:16 -07:00
Uday Bondhugula d18ae9e2c7 Constant folding for loop bounds.
- Fold the lower/upper bound of a loop to a constant whenever the result of the
  application of the bound's affine map on the operand list yields a constant.

- Update/complete 'for' stmt's API to set lower/upper bounds with operands.
  Resolve TODOs for ForStmt::set{Lower,Upper}Bound.

- Moved AffineExprConstantFolder into AffineMap.cpp and added
  AffineMap::constantFold to be used by both AffineApplyOp and
  ForStmt::constantFoldBound.

PiperOrigin-RevId: 215997346
2019-03-29 13:24:01 -07:00
Uday Bondhugula f069d796f3 Fix opt build breakage - lib/Transforms/Utils.cpp
PiperOrigin-RevId: 215924308
2019-03-29 13:23:46 -07:00
Chris Lattner 6822c4e29c Implement support for constant folding operations even when their operands are
not all constant.  Implement support for folding dim, x*0, and affine_apply.
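
A hedged sketch of folding with partially constant operands. Operands are modelled as Python ints when constant and `None` when unknown, and memref shapes use `None` for a dynamic (`?`) dimension. Both function names are hypothetical and only mirror the `x*0` and `dim` cases named above, not the actual hook signatures:

```python
def fold_muli(lhs, rhs):
    """Fold an integer multiply; returns the constant result or None."""
    if lhs == 0 or rhs == 0:
        return 0  # x * 0 folds even when x itself is unknown
    if lhs is not None and rhs is not None:
        return lhs * rhs
    return None


def fold_dim(shape, index):
    """Fold 'dim' when the queried dimension is static, even though the
    memref value itself is not a constant."""
    return shape[index]
```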

PiperOrigin-RevId: 215917432
2019-03-29 13:23:32 -07:00
Uday Bondhugula 6cfdb756b1 Introduce memref replacement/rewrite support: to replace an existing memref
with a new one (of a potentially different rank/shape) with an optional index
remapping.

- introduce Utils::replaceAllMemRefUsesWith
- use this for DMA double buffering

(This CL also adds a few temporary utilities / code that will be done away with
once:
1) abstract DMA op's are added
2) memref dereferencing side-effect / trait is available on op's
3) b/117159533 is resolved (memref index computation slices).)
PiperOrigin-RevId: 215831373
2019-03-29 13:23:19 -07:00
Nicolas Vasilache b55b407601 [RFC][MLIR] Use AffineExprRef in place of AffineExpr* in IR
This CL starts by replacing AffineExpr* with value-type AffineExprRef in a few
places in the IR. By a domino effect that is pretty telling of the
inconsistencies in the codebase, const is removed where it makes sense.

The rationale is that the decision was consciously made that unique'd types
have pointer semantics without const specifier. This is fine but we should be
consistent. In the end, the only logical invariant is that there should never
be such a thing as a const AffineExpr*, const AffineMap* or const IntegerSet*
in our codebase.

This CL takes a number of shortcuts to killing const with fire, in particular
forcing const AffineExprRef to return the underlying non-const
AffineExpr*. This will be removed once AffineExpr* has disappeared in
containers but for now such shortcuts allow a bit of sanity in this long quest
for cleanups.

The **only** places where const AffineExpr*, const AffineMap* or const
IntegerSet* may still appear is by transitive needs from containers,
comparison operators etc.

There is still one major thing remaining here: figure out why cast/dyn_cast
return me a const AffineXXX*, which in turn requires a bunch of ugly
const_casts. I suspect this is due to the classof
taking const AffineXXXExpr*. I wonder whether this is a side effect of 1., if
it is coming from llvm itself (I'd doubt it) or something else (clattner@?)

In light of this, the whole discussion about const makes total sense to me now
and I would systematically apply the rule that in the end, we should never
have any const XXX in our codebase for unique'd types (assuming we can remove
them all in containers and no additional constness constraint is added on us
from the outside world).

PiperOrigin-RevId: 215811554
2019-03-29 13:23:05 -07:00
Nicolas Vasilache 5b8017db18 [MLIR] Templated AffineExprBaseRef
This CL implements AffineExprBaseRef as a templated type to allow LLVM-style
casts to work properly. This also allows making AffineExprBaseRef::expr
private.

To achieve this, it is necessary to use llvm::simplify_type and make
AffineConstExpr derive from both AffineExpr and llvm::simplify<AffineExprRef>.
Note that llvm::simplify_type is just an interface to enable the proper
template resolution of isa/cast/dyn_cast but it otherwise holds no value.

Lastly note that certain dyn_cast operations wanted the const AffineExpr* form
of AffineExprBaseRef so I made the implicit constructor take that by default
and documented the immutable behavior. I think this is consistent with the
decision to make unique'd type immutable by convention and never use const on
them.

PiperOrigin-RevId: 215642247
2019-03-29 13:22:49 -07:00
Nicolas Vasilache 544f5e7a9b [MLIR] Remove uses of AffineExpr* outside of IR
This CL uniformizes the uses of AffineExprWrap outside of IR.
The public API of AffineExpr builder is modified to only use AffineExprWrap.
A few places access AffineExprWrap.expr, this is only while the API is in
transition to easily keep track (i.e. make expr private and let the compiler
track the errors).

Parser.cpp exhibits patterns that are dependent on nullptr values so
converting it is left for another CL.

PiperOrigin-RevId: 215642005
2019-03-29 13:22:35 -07:00
Nicolas Vasilache 9ef87c4b6b [MLIR] AffineExpr lightweight value type for operators
This CL proposes adding MLIRContext* to AffineExpr as discussed previously.
This allows the value class to not require the context in its constructor and
makes it a POD that it makes sense to pass by value everywhere.
A list of other RFC CLs will build on this. The RFC CLs are small incremental
pushes of the API which would be a pretty big change otherwise.

Pushing the thinking a little bit more it seems reasonable to use implicit
cast/constructor to/from AffineExpr*.
As this thing evolves, it looks to me like IR (and
probably Parser, for not so good reasons) want to operate on AffineExpr* and
the rest of the code wants to operate on the value type.

For this reason I think AffineExprImpl*/AffineExpr may also make sense but I
do not have a particular naming preference.
The jury is still out for naming decision between the above and
AffineExprBase*/AffineExpr or AffineExpr*/AffineExprRef.

PiperOrigin-RevId: 215641596
2019-03-29 13:22:21 -07:00
Nicolas Vasilache 4805e629c5 [MLIR] Use chainable lightweight wrapper for AffineExpr
This CL argues that the builder API for AffineExpr should be used
with a lightweight wrapper that supports operators chaining.
This CL takes the ill-named AffineExprWrap and proposes a simple
set of operators with builtin constant simplifications.

This allows:
1. removing the getAddMulPureAffineExpr function;
2. avoiding concerns about constant vs non-constant simplifications
at **every call site**;
3. writing the mathematical expressions we want to write without unnecessary
obfuscations.

The points above represent pure technical debt that we don't want to carry on.
It is important to realize that this is not a mere convenience or "just sugar"
but reduction in cognitive overhead.

This thinking can be pushed significantly further. I have added some comments
with some basic ideas, but we could make AffineMap, AffineApply and other
objects that use map applications more functional and value-based.

I am putting this out to get a first batch of reviews and see what people
think.

I think in my preferred design I would have the Builder directly return such
AffineExprPtr objects by value everywhere and avoid the boilerplate explicit
creations that I am doing by hand at this point.

Yes this AffineExprPtr would implicitly convert to AffineExpr* because that is
what it is.

PiperOrigin-RevId: 215641317
2019-03-29 13:22:07 -07:00
Uday Bondhugula 041817a45e Introduce loop body skewing / loop pipelining / loop shifting utility.
- loopBodySkew shifts statements of a loop body by stmt-wise delays, and is
  typically meant to be used to:
  - allow overlap of non-blocking start/wait until completion operations with
    other computation
  - allow shifting of statements (for better register
    reuse/locality/parallelism)
  - software pipelining (when applied to the innermost loop)
- an additional argument specifies whether to unroll the prologue and epilogue.
- add method to check SSA dominance preservation.
- add a fake loop pipeline pass to test this utility.
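
As a rough sketch of the scheduling idea only (Python, not the loopBodySkew
implementation), the model below assumes inclusive loop bounds, as the samples
suggest, and assumes that a statement with shift d runs at loop index iv on
behalf of original iteration iv - d. The names skew_schedule and split_phases
are hypothetical.

```python
def skew_schedule(lb, ub, shifts):
    """List (iv, [(stmt_index, original_iv), ...]) for inclusive bounds lb..ub.

    shifts[s] is the delay, in iterations, applied to statement s.
    """
    max_shift = max(shifts)
    schedule = []
    for iv in range(lb, ub + max_shift + 1):
        # Statement s is active at iv if the iteration it stands for is in range.
        active = [(s, iv - d) for s, d in enumerate(shifts) if lb <= iv - d <= ub]
        schedule.append((iv, active))
    return schedule


def split_phases(lb, ub, shifts):
    """Split the schedule into prologue, steady-state loop and epilogue."""
    steady_lb = lb + max(shifts)
    schedule = skew_schedule(lb, ub, shifts)
    prologue = [entry for entry in schedule if entry[0] < steady_lb]
    steady = [entry for entry in schedule if steady_lb <= entry[0] <= ub]
    epilogue = [entry for entry in schedule
                if entry not in prologue and entry not in steady]
    return prologue, steady, epilogue


# Three statements shifted by 0, 1 and 2 over the loop 0 to 7.
prologue, steady, epilogue = split_phases(0, 7, [0, 1, 2])
```

In this model the prologue and epilogue are the index ranges where only some
of the statements are active; the steady-state loop runs all of them.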

Sample input/output are below. While at it, fix/add the following:

- fix minor bug in getAddMulPureAffineExpr
- add additional builder methods for common affine map cases
- fix const_operand_iterators for ForStmt, etc. Since there is no such thing
  as 'const MLValue', the iterators shouldn't return const MLValues.
  Returning MLValue is const-correct.

Sample input/output examples:

1) Simplest case: shift second statement by one.

Input:

for %i = 0 to 7 {
  %y = "foo"(%i) : (affineint) -> affineint
  %x = "bar"(%i) : (affineint) -> affineint
}

Output:

#map0 = (d0) -> (d0 - 1)
mlfunc @loop_nest_simple1() {
  %c8 = constant 8 : affineint
  %c0 = constant 0 : affineint
  %0 = "foo"(%c0) : (affineint) -> affineint
  for %i0 = 1 to 7 {
    %1 = "foo"(%i0) : (affineint) -> affineint
    %2 = affine_apply #map0(%i0)
    %3 = "bar"(%2) : (affineint) -> affineint
  }
  %4 = affine_apply #map0(%c8)
  %5 = "bar"(%4) : (affineint) -> affineint
  return
}

2) DMA overlap: shift dma.wait and compute by one.

Input:

  for %i = 0 to 7 {
    %pingpong = affine_apply (d0) -> (d0 mod 2) (%i)
    "dma.enqueue"(%pingpong) : (affineint) -> affineint
    %pongping = affine_apply (d0) -> (d0 mod 2) (%i)
    "dma.wait"(%pongping) : (affineint) -> affineint
    "compute1"(%pongping) : (affineint) -> affineint
  }

Output:

#map0 = (d0) -> (d0 mod 2)
#map1 = (d0) -> (d0 - 1)
#map2 = ()[s0] -> (s0 + 7)
mlfunc @loop_nest_dma() {
  %c8 = constant 8 : affineint
  %c0 = constant 0 : affineint
  %0 = affine_apply #map0(%c0)
  %1 = "dma.enqueue"(%0) : (affineint) -> affineint
  for %i0 = 1 to 7 {
    %2 = affine_apply #map0(%i0)
    %3 = "dma.enqueue"(%2) : (affineint) -> affineint
    %4 = affine_apply #map1(%i0)
    %5 = affine_apply #map0(%4)
    %6 = "dma.wait"(%5) : (affineint) -> affineint
    %7 = "compute1"(%5) : (affineint) -> affineint
  }
  %8 = affine_apply #map1(%c8)
  %9 = affine_apply #map0(%8)
  %10 = "dma.wait"(%9) : (affineint) -> affineint
  %11 = "compute1"(%9) : (affineint) -> affineint
  return
}

3) With arbitrary affine bound maps:

Shift last two statements by two.

Input:

  for %i = %N to ()[s0] -> (s0 + 7)()[%N] {
    %y = "foo"(%i) : (affineint) -> affineint
    %x = "bar"(%i) : (affineint) -> affineint
    %z = "foo_bar"(%i) : (affineint) -> (affineint)
    "bar_foo"(%i) : (affineint) -> (affineint)
  }

Output:

#map0 = ()[s0] -> (s0 + 1)
#map1 = ()[s0] -> (s0 + 2)
#map2 = ()[s0] -> (s0 + 7)
#map3 = (d0) -> (d0 - 2)
#map4 = ()[s0] -> (s0 + 8)
#map5 = ()[s0] -> (s0 + 9)

  for %i0 = %arg0 to #map0()[%arg0] {
    %0 = "foo"(%i0) : (affineint) -> affineint
    %1 = "bar"(%i0) : (affineint) -> affineint
  }
  for %i1 = #map1()[%arg0] to #map2()[%arg0] {
    %2 = "foo"(%i1) : (affineint) -> affineint
    %3 = "bar"(%i1) : (affineint) -> affineint
    %4 = affine_apply #map3(%i1)
    %5 = "foo_bar"(%4) : (affineint) -> affineint
    %6 = "bar_foo"(%4) : (affineint) -> affineint
  }
  for %i2 = #map4()[%arg0] to #map5()[%arg0] {
    %7 = affine_apply #map3(%i2)
    %8 = "foo_bar"(%7) : (affineint) -> affineint
    %9 = "bar_foo"(%7) : (affineint) -> affineint
  }

4) Shift first statement by zero, second by one, third by two.

Input:

  for %i = 0 to 7 {
    %y = "foo"(%i) : (affineint) -> affineint
    %x = "bar"(%i) : (affineint) -> affineint
    %z = "foobar"(%i) : (affineint) -> affineint
  }

Output:

#map0 = (d0) -> (d0 - 1)
#map1 = (d0) -> (d0 - 2)
#map2 = ()[s0] -> (s0 + 7)

  %c9 = constant 9 : affineint
  %c8 = constant 8 : affineint
  %c1 = constant 1 : affineint
  %c0 = constant 0 : affineint
  %0 = "foo"(%c0) : (affineint) -> affineint
  %1 = "foo"(%c1) : (affineint) -> affineint
  %2 = affine_apply #map0(%c1)
  %3 = "bar"(%2) : (affineint) -> affineint
  for %i0 = 2 to 7 {
    %4 = "foo"(%i0) : (affineint) -> affineint
    %5 = affine_apply #map0(%i0)
    %6 = "bar"(%5) : (affineint) -> affineint
    %7 = affine_apply #map1(%i0)
    %8 = "foobar"(%7) : (affineint) -> affineint
  }
  %9 = affine_apply #map0(%c8)
  %10 = "bar"(%9) : (affineint) -> affineint
  %11 = affine_apply #map1(%c8)
  %12 = "foobar"(%11) : (affineint) -> affineint
  %13 = affine_apply #map1(%c9)
  %14 = "foobar"(%13) : (affineint) -> affineint

5) SSA dominance would be violated: no shifting is performed if a shift is
specified for the second statement, since it uses the result of the first.

  for %i = 0 to 7 {
    %x = "foo"(%i) : (affineint) -> affineint
    "bar"(%x) : (affineint) -> affineint
  }
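
A minimal Python sketch of such a validity check follows. It rests on an
assumption that this message does not spell out: a set of shifts is taken to
be valid only when every statement has the same shift as each body statement
whose result it uses. The function name and the `uses` encoding are
hypothetical.

```python
def shifts_preserve_dominance(uses, shifts):
    """Return True if no use is separated from its definition by the shifts.

    uses[i] lists the indices of body statements whose results statement i
    reads. ASSUMPTION: a use must carry the same shift as its definition.
    """
    return all(shifts[user] == shifts[definition]
               for user, definitions in enumerate(uses)
               for definition in definitions)


# The loop above: statement 1 ("bar") reads %x, defined by statement 0 ("foo").
uses = [[], [0]]
ok_unshifted = shifts_preserve_dominance(uses, [0, 0])   # True
ok_shifted = shifts_preserve_dominance(uses, [0, 1])     # False
```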

PiperOrigin-RevId: 214975731
2019-03-29 13:21:26 -07:00