Commit Graph

369 Commits

Author SHA1 Message Date
River Riddle 66ed7d6d83 Update the OperationFolder to find a valid insertion point when materializing constants.
The OperationFolder currently just inserts into the entry block of a Function, but regions may be isolated above, i.e. explicit capture only, and blindly inserting constants may break the invariants of these regions.

PiperOrigin-RevId: 254987796
2019-06-25 09:43:21 -07:00
Nicolas Vasilache dac75ae5ff Split test-specific passes out of mlir-opt
Instead put their impl in test/lib and link them into mlir-test-opt

PiperOrigin-RevId: 254837439
2019-06-24 17:47:12 -07:00
River Riddle b67cab4c44 Update CSE to respect nested regions that are isolated from above. This cl also removes the unused 'NthRegionIsIsolatedFromAbove' trait as it was replaced with a more general 'IsIsolatedFromAbove'.
PiperOrigin-RevId: 254709704
2019-06-24 13:44:53 -07:00
River Riddle 704a7fb13e Add support for 1->N type mappings in the dialect conversion infrastructure. To support these mappings a hook must be overridden on the type converter: 'materializeConversion' :to generate a cast operation from the new types to the old type. This operation is automatically erased if all uses are removed, otherwise it remains in the IR for the user to handle.
PiperOrigin-RevId: 254411383
2019-06-22 09:16:06 -07:00
River Riddle 9764ae3f24 Refactor the TypeConverter to support more robust type conversions:
* Support for 1->0 type mappings, i.e. when the argument is being removed.
* Reordering types when converting a type signature.
* Adding new inputs when converting a type signature.

This cl also lays down the initial foundation for supporting 1->N type mappings, but full support will come in a followup.

Moving forward, function signature changes will be driven by populating a SignatureConversion instance. This class contains all of the necessary information for adding/removing/remapping function signatures; e.g. addInputs, addResults, remapInputs, etc.

PiperOrigin-RevId: 254064665
2019-06-19 23:08:33 -07:00
Geoffrey Martin-Noble fd99b6ce97 Remove unnecessary -verify-diagnostics
These were likely added in error because of confusion about the flag when it was just called "-verify". The extra flag doesn't cause much harm, but it does make mlir-opt do more work and clutter the RUN line

PiperOrigin-RevId: 254037016
2019-06-19 23:08:13 -07:00
Geoffrey Martin-Noble d7d69569e7 Rename -verify mlir-opt flag to -verify-expected-diagnostics
This name has caused some confusion because it suggests that it's running op verification (and that this verification isn't getting run by default).

PiperOrigin-RevId: 254035268
2019-06-19 23:08:03 -07:00
Andy Davis 898cf0e968 LoopFusion: adds support for computing forward computation slices, which will enable fusion of consumer loop nests into their producers in subsequent CLs.
PiperOrigin-RevId: 253601994
2019-06-19 23:03:42 -07:00
River Riddle 6a0555a875 Refactor SplatElementsAttr to inherit from DenseElementsAttr as opposed to being a separate Attribute type. DenseElementsAttr provides a better internal representation for splat values as well as better API for accessing elements.
PiperOrigin-RevId: 253138287
2019-06-19 23:01:52 -07:00
River Riddle 5da741f671 Add basic cost modeling to the dialect conversion infrastructure. This initial cost model favors specific patterns based upon two criteria:
1) Lowest minimum pattern stack depth when legalizing.
  - This leads the system to favor patterns that have lower legalization stacks, i.e. represent a more direct mapping to the target.

2)  Pattern benefit.
  - When considering multiple patterns with the same legalization depth, this favors patterns with a larger specified benefit.

PiperOrigin-RevId: 252713470
2019-06-19 22:59:06 -07:00
Amit Sabne 7a43da6060 Loop invariant code motion - remove reliance on getForwardSlice. Add more tests.
--

PiperOrigin-RevId: 250950703
2019-06-01 20:13:30 -07:00
Rasmus Munk Larsen 861c55e150 Add a rank op to MLIR. Example:
%1 = rank %0 : index

--

PiperOrigin-RevId: 250505411
2019-06-01 20:06:51 -07:00
Andy Davis a560f2c646 Affine Loop Fusion Utility Module (1/n).
*) Adds LoopFusionUtils which will expose a set of loop fusion utilities (e.g. dependence checks, fusion cost/storage reduction, loop fusion transformation) for use by loop fusion algorithms. Support for checking block-level fusion-preventing dependences is added in this CL (additional loop fusion utilities will be added in subsequent CLs).
    *) Adds TestLoopFusion test pass for testing LoopFusionUtils at a fine granularity.
    *) Adds unit test for testing dependence check for block-level fusion-preventing dependences.

--

PiperOrigin-RevId: 249861071
2019-06-01 20:00:23 -07:00
River Riddle 1a100849c4 Add support for saving and restoring the insertion point of a FuncBuilder. This also updates the edsc::ScopedContext to use a single builder that saves/restores insertion points. This is necessary for using edscs within RewritePatterns.
--

PiperOrigin-RevId: 248812645
2019-05-20 13:46:35 -07:00
Andy Davis 12e31761ce Fixes a small bug in computing dependence direction vectors, where equality constraint can be created on the wrong loop IVs when source/sink of the dependence are at different loop depths. Adds unit tests for cases where source/sink of the dependence are at varying loop depths.
--

PiperOrigin-RevId: 248627490
2019-05-20 13:45:00 -07:00
Tamas Berghammer 9cc5747a7b Add test for affine-loop-tile pass with a loop of trip count 1
--

PiperOrigin-RevId: 247950156
2019-05-20 13:38:52 -07:00
Chris Lattner 81e478adca rename -memref-dependence-check to -test-memref-dependence-check since it
generates remarks for testing, it isn't itself a transformation.

    While there, upgrade its diagnostic emission to use the streaming interface.

    Prune some unnecessary #includes.

--

PiperOrigin-RevId: 247768062
2019-05-20 13:36:38 -07:00
Andy Davis 0412bf6f09 Add memref dimension bounds as upper/lower bounds on MemRefRegion constraints, to guard against potential over-approximation from projection.
--

PiperOrigin-RevId: 247431201
2019-05-10 19:25:53 -07:00
Andy Davis 6254a42d58 Fix bug in DmaGenerate pass where MemRefRegion union was not propagated to read region.
Also cleaned up dma-generate.mlir a bit.

--

PiperOrigin-RevId: 247417358
2019-05-10 19:25:44 -07:00
River Riddle e088f93f0d Simplify the parser/printer of ConstantOp now that all attributes have types. This has the added benefit of removing type redundancy from the pretty form. As a consequence, IntegerAttr/FloatAttr will now always print the type even if it is i64/f64.
--

PiperOrigin-RevId: 247295828
2019-05-10 19:24:30 -07:00
Jacques Pienaar a1b24a0e08 Verify that attribute type and constant op return type matches.
--

PiperOrigin-RevId: 247263129
2019-05-10 19:24:14 -07:00
Geoffrey Martin-Noble c34386e3e5 CmpFOp. Add float comparison op
This closely mirrors the llvm fcmp instruction, defining 16 different predicates

    Constant folding is unsupported for NaN and Inf because there's no way to represent those as constants at the moment

--

PiperOrigin-RevId: 246932358
2019-05-10 19:22:58 -07:00
Geoffrey Martin-Noble c4891378e2 Add split-input-file to constant fold test
Better to keep tests as separate as possible

--

PiperOrigin-RevId: 246900564
2019-05-10 19:22:50 -07:00
Alex Zinenko d3380a504f Change syntax of regions in the generic form of operations
The generic form of operations currently supports optional regions to be
    located after the operation type.  As we are going to add a type to each
    region in a leading position in the region syntax, similarly to functions, it
    becomes ambiguous to have regions immediately after the operation type.  Put
    regions between operands the optional list of successors in the generic
    operation syntax and wrap them in parentheses.  The effect on the exisitng IR
    syntax is minimal since only three operations (`affine.for`, `affine.if` and
    `gpu.kernel`) currently use regions.

--

PiperOrigin-RevId: 246787087
2019-05-06 08:29:48 -07:00
Nicolas Vasilache 258e8d9ce2 Prepend an "affine-" prefix to Affine pass option names - NFC
Trying to activate both LLVM and MLIR passes in mlir-cpu-runner showed name collisions when registering pass names.
    One possible way of disambiguating that should also work across dialects is to prepend the dialect name to the passes that specifically operate on that dialect.

    With this CL, mlir-cpu-runner tests still run when both LLVM and MLIR passes are registered

--

PiperOrigin-RevId: 246539917
2019-05-06 08:26:44 -07:00
River Riddle b14c4b4ca8 Add support for basic remark diagnostics. This is the minimal functionality needed to separate notes from remarks. It also provides a starting point to start building out better remark infrastructure.
--

PiperOrigin-RevId: 246175216
2019-05-06 08:24:02 -07:00
Smit Hinsu c9b0540b9c Make identity cast operations with the same operand and result types legal
Instead, fold such operations. This way callers don't need to conditionally create cast operations depending on if a value already has the target type.

    Also, introduce areCastCompatible to allow cast users to verify that the generated op will be valid before creating the operation.

    TESTED with unit tests

--

PiperOrigin-RevId: 245606133
2019-05-06 08:19:37 -07:00
Feng Liu 5c757087c7 Apply patterns repeatly if the function is modified
During the pattern rewrite, if the function is changed, i.e. ops created,
    deleted or swapped, the pattern rewriter needs to re-scan the function entirely
    and apply the patterns again, so the patterns whose root ops have been popped
    out from the working list nor an immediate users of the changed ops can be
    reconsidered.

    A command line flag is added to set the max number of iterations rescanning the
    function for pattern match. If the rewrite doesn' converge after this number,
    this compiling will continue and the result can be sub-optimal.

    One unit test is updated because this change fixed the missing optimization opportunities.

--

PiperOrigin-RevId: 244754190
2019-04-23 22:02:16 -07:00
Amit Sabne 7905da656e Loop invariant code motion.
--

PiperOrigin-RevId: 244043679
2019-04-18 11:49:31 -07:00
Smit Hinsu 074cb4292f Fix CHECK-EMPTY directives without trailing colon
There are no empty lines in output for three of these directives so removed
them and replaced the remaining one with 'CHECK-NOT:' as otherwise it is
failing with the following error.

error: found 'CHECK-EMPTY' without previous 'CHECK: line

TESTED = n/a
PiperOrigin-RevId: 243288605
2019-04-18 11:48:01 -07:00
Stephan Herhut af016ba7a4 Add xor bitwise operation to StandardOps.
This adds parsing, printing and some folding/canonicalization.

    Also extends rewriting of subi %0, %0 to handle vectors and tensors.

--

PiperOrigin-RevId: 242448164
2019-04-08 19:17:56 -07:00
Stephan Herhut a8a5c06961 Add and and or bitwise operations to StandardOps.
This adds parsing, printing and some folding/canonicalization.

--

PiperOrigin-RevId: 242409840
2019-04-08 19:17:50 -07:00
River Riddle a8f4b9eeeb Iterate on the operations to fold in TestConstantFold in reverse to remove the need for ConstantFoldHelper to have a flag for insertion at the head of the entry block. This also fixes an asan bug in TestConstantFold due to the iteration order of operations and ConstantFoldHelper's constant insertion placement.
Note: This now means that we cannot fold chains of operations, i.e. where constant foldable operations feed into each other. Given that this is a testing pass solely for constant folding, this isn't really something that we want anyways. Constant fold tests should be simple and direct, with more advanced folding/feeding being tested with the canonicalizer.

--

PiperOrigin-RevId: 242011744
2019-04-05 07:41:52 -07:00
Lei Zhang 4e40c83291 Deduplicate constant folding logic in ConstantFold and GreedyPatternRewriteDriver
There are two places containing constant folding logic right now: the ConstantFold
    pass and the GreedyPatternRewriteDriver. The logic was not shared and started to
    drift apart. We were testing constant folding logic using the ConstantFold pass,
    but lagged behind the GreedyPatternRewriteDriver, where we really want the constant
    folding to happen.

    This CL pulled the logic into utility functions and classes for sharing between
    these two places. A new ConstantFoldHelper class is created to help constant fold
    and de-duplication.

    Also, renamed the ConstantFold pass to TestConstantFold to make it clear that it is
    intended for testing purpose.

--

PiperOrigin-RevId: 241971681
2019-04-05 07:41:32 -07:00
River Riddle 6fa3181329 Remove the non-postorder walk functions from Function/Block/Instruction and rename walkPostOrder to walk.
--

PiperOrigin-RevId: 241965239
2019-04-05 07:41:23 -07:00
Andy Davis d0d1b2a30d Fix bug in LoopTiling where creation of tile-space loop upper bound did not handle symbol operands correctly.
--

PiperOrigin-RevId: 241958502
2019-04-05 07:41:12 -07:00
Andy Davis 55014813e3 Adds dependence analysis support for iteration domains with local variables (enables dependence analysis of loops with non-unit step).
--

PiperOrigin-RevId: 241957023
2019-04-05 07:41:02 -07:00
Nicolas Vasilache f1b12f5a64 Fix test that fails on non-determinism in LowerVectorTransfers
This CL fixes the non-determinism across compilers in an edsc::select expression used in LowerVectorTransfers. This is achieved by factoring the expression out of the function call to ensure a deterministic order of evaluation.
    Since the expression is now factored out, fewer IR is generated and the test is updated accordingly.

--

PiperOrigin-RevId: 241679962
2019-04-03 01:09:13 -07:00
Andy Davis 7c1fc9e795 Enable producer-consumer fusion for liveout memrefs if consumer read region matches producer write region.
--

PiperOrigin-RevId: 241517207
2019-04-02 13:39:50 -07:00
River Riddle 084669e005 Remove MLPatternLoweringPass and rewrite LowerVectorTransfers to use RewritePattern instead.
--

PiperOrigin-RevId: 241455472
2019-04-02 13:39:17 -07:00
Nicolas Vasilache c9d5f3418a Cleanup SuperVectorization dialect printing and parsing.
On the read side,
```
%3 = vector_transfer_read %arg0, %i2, %i1, %i0 {permutation_map: (d0, d1, d2)->(d2, d0)} : (memref<?x?x?xf32>, index, index, index) -> vector<32x256xf32>
```

becomes:

```
%3 = vector_transfer_read %arg0[%i2, %i1, %i0] {permutation_map: (d0, d1, d2)->(d2, d0)} : memref<?x?x?xf32>, vector<32x256xf32>
```

On the write side,

```
vector_transfer_write %0, %arg0, %c3, %c3 {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>, index, index
```

becomes

```
vector_transfer_write %0, %arg0[%c3, %c3] {permutation_map: (d0, d1)->(d0)} : vector<128xf32>, memref<?x?xf32>
```

Documentation will be cleaned up in a followup commit that also extracts a proper .md from the top of the file comments.

PiperOrigin-RevId: 241021879
2019-03-29 17:56:42 -07:00
Nicolas Vasilache 094ca64ab0 Refactor vectorization patterns
This CL removes the reliance of the vectorize pass on the specification of a `fastestVaryingDim` parameter. This parameter is a restriction meant to more easily target a particular loop/memref combination for vectorization and is mainly used for testing.

This also had the side-effect of restricting vectorization patterns to only the ones in which all memrefs were contiguous along the same loop dimension. This simple restriction prevented matmul to vectorize in 2-D.

this CL removes the restriction and adds the matmul test which vectorizes in 2-D along the parallel loops. Support for reduction loops is left for future work.

PiperOrigin-RevId: 240993827
2019-03-29 17:55:36 -07:00
MLIR Team 9d30b36aaf Enable input-reuse fusion to search function arguments for fusion candidates (takes care of a TODO, enables another tutorial test case).
PiperOrigin-RevId: 240979894
2019-03-29 17:54:36 -07:00
River Riddle 106dd08e99 Change the vectorizer test pass to output via diagnostics instead of llvm::outs. This allows for the output to be deterministic when multi-threading is enabled.
PiperOrigin-RevId: 240905858
2019-03-29 17:54:21 -07:00
River Riddle 01140bd137 Change the muli-return syntax for operations. The name of the operation result now contains the number of results that it refers to if the number of results is greater than 1.
Example:
    %call:2 = call @multi_return() : () -> (f32, i32)
    use(%calltensorflow/mlir#0, %calltensorflow/mlir#1)

This cl also adds parser support for uniquely named result values. This means that a test writer can now write something like:
    %foo, %bar = call @multi_return() : () -> (f32, i32)
    use(%foo, %bar)

Note: The printer will still print the collapsed form.
PiperOrigin-RevId: 240860058
2019-03-29 17:51:32 -07:00
MLIR Team 9d9675fc8f Remove overly conservative check in LoopFusion pass (enables fusion in tutorial example).
PiperOrigin-RevId: 240859227
2019-03-29 17:51:16 -07:00
River Riddle af9760fe18 Replace remaining usages of the Instruction class with Operation.
PiperOrigin-RevId: 240777521
2019-03-29 17:50:04 -07:00
Nicolas Vasilache 31442a66ef Cleanup vectorize_1d.mlir test - NFC
This CL splits a large monolithic test function into smaller ones that are each CHECK-LABEL'd

PiperOrigin-RevId: 240684979
2019-03-29 17:49:45 -07:00
Nicolas Vasilache 4dc7af9da8 Make vectorization aware of loop semantics
Now that we have a dependence analysis, we can check that loops are indeed parallel and make vectorization correct.

PiperOrigin-RevId: 240682727
2019-03-29 17:49:30 -07:00
River Riddle 832567b379 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for' and set the namespace of the AffineOps dialect to 'affine'.
PiperOrigin-RevId: 240165792
2019-03-29 17:39:03 -07:00
River Riddle 9c6e92360c NFC: Rename the 'if' operation in the AffineOps dialect to 'affine.if'.
PiperOrigin-RevId: 240071154
2019-03-29 17:36:53 -07:00
Nicolas Vasilache 071ca8da91 Support composition of symbols in AffineApplyOp
This CL revisits the composition of AffineApplyOp for the special case where a symbol
itself comes from an AffineApplyOp.
This is achieved by rewriting such symbols into dims to allow composition to occur mathematically.
The implementation is also refactored to improve readability.

Rationale for locally rewriting symbols as dims:
================================================
The mathematical composition of AffineMap must always concatenate symbols
because it does not have enough information to do otherwise. For example,
composing `(d0)[s0] -> (d0 + s0)` with itself must produce
`(d0)[s0, s1] -> (d0 + s0 + s1)`.

The result is only equivalent to `(d0)[s0] -> (d0 + 2 * s0)` when
applied to the same mlir::Value* for both s0 and s1.
As a consequence mathematical composition of AffineMap always concatenates
symbols.

When AffineMaps are used in AffineApplyOp however, they may specify
composition via symbols, which is ambiguous mathematically. This corner case
is handled by locally rewriting such symbols that come from AffineApplyOp
into dims and composing through dims.

PiperOrigin-RevId: 239791597
2019-03-29 17:30:59 -07:00
Nicolas Vasilache f43388e4ce Port LowerVectorTransfers from EDSC + AST to declarative builders
This CL removes the dependency of LowerVectorTransfers on the AST version of EDSCs which will be retired.

This exhibited a pretty fundamental staging difference in AST-based vs declarative based emission.

Since the delayed creation with an AST was staged, the loop order came into existence after the clipping expressions were computed.
This now changes as the loops first need to be created declaratively in fixed order and then the clipping expressions are created.
Also, due to lack of staging, coalescing cannot be done on the fly anymore and
needs to be done either as a pre-pass (current implementation) or as a local transformation on the generated IR (future work).

Tests are updated accordingly.

PiperOrigin-RevId: 238971631
2019-03-29 17:22:06 -07:00
Uday Bondhugula e1e455f7dd Change parallelism detection test pass to emit a note
- emit a note on the loop being parallel instead of setting a loop attribute
- rename the pass -test-detect-parallel (from -detect-parallel)

PiperOrigin-RevId: 238122847
2019-03-29 17:16:27 -07:00
Uday Bondhugula 9f2781e8dd Fix misc bugs / TODOs / other improvements to analysis utils
- fix for getConstantBoundOnDimSize: floordiv -> ceildiv for extent
- make getConstantBoundOnDimSize also return the identifier upper bound
- fix unionBoundingBox to correctly use the divisor and upper bound identified by
  getConstantBoundOnDimSize
- deal with loop step correctly in addAffineForOpDomain (covers most cases now)
- fully compose bound map / operands and simplify/canonicalize before adding
  dim/symbol to FlatAffineConstraints; fixes false positives in -memref-bound-check; add
  test case there
- expose mlir::isTopLevelSymbol from AffineOps

PiperOrigin-RevId: 238050395
2019-03-29 17:15:27 -07:00
Uday Bondhugula 075090f891 Extend loop unrolling and unroll-jamming to non-matching bound operands and
multi-result upper bounds, complete TODOs, fix/improve test cases.

- complete TODOs for loop unroll/unroll-and-jam. Something as simple as
  "for %i = 0 to %N" wasn't being unrolled earlier (unless it had been written
  as "for %i = ()[s0] -> (0)()[%N] to %N"; addressed now.

- update/replace getTripCountExpr with buildTripCountMapAndOperands; makes it
  more powerful as it composes inputs into it

- getCleanupLowerBound and getUnrolledLoopUpperBound actually needed the same
  code; refactor and remove one.

- reorganize test cases, write previous ones better; most of these changes are
  "label replacements".

- fix wrongly labeled test cases in unroll-jam.mlir

PiperOrigin-RevId: 238014653
2019-03-29 17:14:12 -07:00
MLIR Team 8d62a6092f Clean up some stray mlfunc/cfgfunc leftovers.
PiperOrigin-RevId: 237936610
2019-03-29 17:13:26 -07:00
Uday Bondhugula ce7e59536c Add a basic model to set tile sizes + some cleanup
- compute tile sizes based on a simple model that looks at memory footprints
  (instead of using the hardcoded default value)
- adjust tile sizes to make them factors of trip counts based on an option
- update loop fusion CL options to allow setting maximal fusion at pass creation
- change an emitError to emitWarning (since it's not a hard error unless the client
  treats it that way, in which case, it can emit one)

$ mlir-opt -debug-only=loop-tile -loop-tile test/Transforms/loop-tiling.mlir

test/Transforms/loop-tiling.mlir:81:3: note: using tile sizes [4 4 5 ]

  for %i = 0 to 256 {

for %i0 = 0 to 256 step 4 {
    for %i1 = 0 to 256 step 4 {
      for %i2 = 0 to 250 step 5 {
        for %i3 = #map4(%i0) to #map11(%i0) {
          for %i4 = #map4(%i1) to #map11(%i1) {
            for %i5 = #map4(%i2) to #map12(%i2) {
              %0 = load %arg0[%i3, %i5] : memref<8x8xvector<64xf32>>
              %1 = load %arg1[%i5, %i4] : memref<8x8xvector<64xf32>>
              %2 = load %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
              %3 = mulf %0, %1 : vector<64xf32>
              %4 = addf %2, %3 : vector<64xf32>
              store %4, %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
            }
          }
        }
      }
    }
  }

PiperOrigin-RevId: 237461836
2019-03-29 17:06:51 -07:00
MLIR Team c1ff9e866e Use FlatAffineConstraints::unionBoundingBox to perform slice bounds union for loop fusion pass (WIP).
Adds utility to convert slice bounds to a FlatAffineConstraints representation.
Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers.

PiperOrigin-RevId: 236973261
2019-03-29 16:59:21 -07:00
Uday Bondhugula 5836fae8a0 DMA generation CL flag update
- allow mem capacity to be overridden by command-line flag
- change default fast mem space to 2

PiperOrigin-RevId: 236951598
2019-03-29 16:59:05 -07:00
Uday Bondhugula 7e288e7c19 Add missing run command to fusion test cases - follow up to cl/236882988
PiperOrigin-RevId: 236947383
2019-03-29 16:58:50 -07:00
Uday Bondhugula b34f8d3c83 Fix and improve detectAsMod
- fix for the mod detection
- simplify/avoid the mod at construction (if the dividend is already known to be less
  than the divisor), since the information is available at hand there

PiperOrigin-RevId: 236882988
2019-03-29 16:57:36 -07:00
Uday Bondhugula a77734e185 Make sure that fusion test cases don't have out of bounds accesses
- fix out of bounds test case
- -memref-bound-check on the test/Transforms/loop-fusion.mlir no longer reports any
  errors, before or after -loop-fusion is run

PiperOrigin-RevId: 236757658
2019-03-29 16:56:35 -07:00
MLIR Team 39a1ddeb1c Adds loop attribute as a temporary work around to prevent slice fusion of loop nests containing instructions with side effects (the proper solution will be do use memref read/write regions in the future).
PiperOrigin-RevId: 236733739
2019-03-29 16:56:20 -07:00
Uday Bondhugula 12b9dece8d Bug fix for getConstantBoundOnDimSize
- this was detected when memref-bound-check was run on the output of the
  loop-fusion pass
- the addition (to represent ceildiv as a floordiv) had to be performed only
  for the constant term of the constraint
- update test cases
- memref-bound-check no longer returns an error on the output of this test case

PiperOrigin-RevId: 236731137
2019-03-29 16:56:06 -07:00
River Riddle eeeef090ef Set the namespace of the StandardOps dialect to "std", but add a special case to the parser to allow parsing standard operations without the "std" prefix. This will now allow for the standard dialect to be looked up dynamically by name.
PiperOrigin-RevId: 236493865
2019-03-29 16:54:20 -07:00
Uday Bondhugula 8254aabd4a A simple pass to detect and mark all parallel loops
- detect all parallel loops based on dep information and mark them with a
  "parallel" attribute
- add mlir::isLoopParallel(OpPointer<AffineForOp> ...), and refactor an existing method
  to use that (reuse some code from @andydavis (cl/236007073) for this)
- a simple/meaningful way to test memref dep test as well

Ex:

$ mlir-opt -detect-parallel test/Transforms/parallelism-detection.mlir
#map1 = ()[s0] -> (s0)
func @foo(%arg0: index) {
  %0 = alloc() : memref<1024x1024xvector<64xf32>>
  %1 = alloc() : memref<1024x1024xvector<64xf32>>
  %2 = alloc() : memref<1024x1024xvector<64xf32>>
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg0 {
      for %i2 = 0 to %arg0 {
        %3 = load %0[%i0, %i2] : memref<1024x1024xvector<64xf32>>
        %4 = load %1[%i2, %i1] : memref<1024x1024xvector<64xf32>>
        %5 = load %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
        %6 = mulf %3, %4 : vector<64xf32>
        %7 = addf %5, %6 : vector<64xf32>
        store %7, %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
      } {parallel: false}
    } {parallel: true}
  } {parallel: true}
  return
}

PiperOrigin-RevId: 236367368
2019-03-29 16:53:03 -07:00
MLIR Team d038e34735 Loop fusion for input reuse.
*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph:
- first pass fuses single-use producers into their unique consumer.
- second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
- third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
*) Adds a generic loop utilitiy for finding all sequential loops in a loop nest.
*) Adds and updates unit tests.

PiperOrigin-RevId: 236350987
2019-03-29 16:52:35 -07:00
Uday Bondhugula 932e4fb29f Analysis support for floordiv/mod's in loop bounds/
- handle floordiv/mod's in loop bounds for all analysis purposes
- allows fusion slicing to be more powerful
- add simple test cases based on -memref-bound-check
- fusion based test cases in follow up CLs

PiperOrigin-RevId: 236328551
2019-03-29 16:52:04 -07:00
Uday Bondhugula 58889884a2 Change some of the debug messages to use emitError / emitWarning / emitNote - NFC
PiperOrigin-RevId: 236169676
2019-03-29 16:50:29 -07:00
Uday Bondhugula a003179367 Detect more trivially redundant constraints better
- detect more trivially redundant constraints in
  FlatAffineConstraints::removeTrivialRedundantConstraints. Redundancy due to
  constraints that only differ in the constant part (eg., 32i + 64j - 3 >= 0, 32 +
  64j - 8 >= 0) is now detected.  The method is still linear-time and does
  a single scan over the FlatAffineConstraints buffer. This detection is useful
  and needed to eliminate redundant constraints generated after FM elimination.

- update GCDTightenInequalities so that we also normalize by the GCD while at
  it. This way more constraints will show up as redundant (232i - 203 >= 0
  becomes i - 1 >= 0 instead of 232i - 232 >= 0) without having to call
  normalizeConstraintsByGCD.

- In FourierMotzkinEliminate, call GCDTightenInequalities and
  normalizeConstraintsByGCD before calling removeTrivialRedundantConstraints()
  - so that more redundant constraints are detected. As a result, redundancy
    due to constraints like i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >=
    0 is now detected (here only i >= 7 is non-redundant).

As a result of these, a -memref-bound-check on the added test case runs in 16ms
instead of 1.35s (opt build) and no longer returns a conservative result.

PiperOrigin-RevId: 235983550
2019-03-29 16:47:59 -07:00
MLIR Team c2766f3760 Fix bug in memref region computation with slice loop bounds. Adds loop IV values to ComputationSliceState which are used in FlatAffineConstraints::addSliceBounds, to ensure that constraints are only added for loop IV values which are present in the constraint system.
PiperOrigin-RevId: 235952912
2019-03-29 16:47:29 -07:00
River Riddle cdbfd48471 Rewrite the dominance info classes to allow for operating on arbitrary control flow within operation regions. The CSE pass is also updated to properly handle nested dominance.
PiperOrigin-RevId: 235742627
2019-03-29 16:43:35 -07:00
Uday Bondhugula a1dad3a5d9 Extend/improve getSliceBounds() / complete TODO + update unionBoundingBox
- compute slices precisely where the destination iteration depends on multiple source
  iterations (instead of over-approximating to the whole source loop extent)
- update unionBoundingBox to deal with input with non-matching symbols
- reenable disabled backend test case

PiperOrigin-RevId: 234714069
2019-03-29 16:33:11 -07:00
Uday Bondhugula 5021dc4fa0 DMA placement update - hoist loops invariant DMAs
- hoist DMAs past all loops immediately surrounding the region that the latter
  is invariant on - do this at DMA generation time itself

PiperOrigin-RevId: 234628447
2019-03-29 16:32:41 -07:00
Uday Bondhugula f97c1c5b06 Misc. updates/fixes to analysis utils used for DMA generation; update DMA
generation pass to make it drop certain assumptions, complete TODOs.

- multiple fixes for getMemoryFootprintBytes
  - pass loopDepth correctly from getMemoryFootprintBytes()
  - use union while computing memory footprints

- bug fixes for addAffineForOpDomain
  - take into account loop step
  - add domains of other loop IVs in turn that might have been used in the bounds

- dma-generate: drop assumption of "non-unit stride loops being tile space loops
  and skipping those and recursing to inner depths"; DMA generation is now purely
  based on available fast mem capacity and memory footprint's calculated

- handle memory region compute failures/bailouts correctly from dma-generate

- loop tiling cleanup/NFC

- update some debug and error messages to use emitNote/emitError in
  pipeline-data-transfer pass - NFC

PiperOrigin-RevId: 234245969
2019-03-29 16:30:26 -07:00
MLIR Team 58aa383e60 Support fusing producer loop nests which write to a memref which is live out, provided that the write region of the consumer loop nest to the same memref is a super set of the producer's write region.
PiperOrigin-RevId: 234240958
2019-03-29 16:30:11 -07:00
MLIR Team 8f5f2c765d LoopFusion: perform a series of loop interchanges to increase the loop depth at which slices of producer loop nests can be fused into constumer loop nests.
*) Adds utility to LoopUtils to perform loop interchange of two AffineForOps.
*) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges.
*) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential.
*) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them.
*) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation.
*) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest.
*) Adds and updates related unit tests.

PiperOrigin-RevId: 234158370
2019-03-29 16:29:26 -07:00
MLIR Team affb2193cc Update direction vector computation to use FlatAffineConstraints::getLower/UpperBounds.
Update FlatAffineConstraints::getLower/UpperBounds to project to the identifier for which bounds are being computed. This change enables computing bounds on an identifier which were previously dependent on the bounds of another identifier.

PiperOrigin-RevId: 234017514
2019-03-29 16:28:25 -07:00
Uday Bondhugula 00860662a2 Generate dealloc's for alloc's of pipeline-data-transfer
- for the DMA transfers being pipelined through double buffering, generate
  deallocs for the double buffers being alloc'ed

This change is along the lines of cl/233502632. We initially wanted to experiment with
scoped allocation - so the deallocation's were usually not necessary; however, they are
needed even with scoped allocations in some situations - for eg. when the enclosing loop
gets unrolled. The dealloc serves as an end of lifetime marker.

PiperOrigin-RevId: 233653463
2019-03-29 16:25:53 -07:00
Uday Bondhugula 8b3f841daf Generate dealloc's for the alloc's of dma-generate.
- for the DMA buffers being allocated (and their tags), generate corresponding deallocs
- minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass

Code generation for DMA transfers was being done with the initial simplifying
assumption that the alloc's would map to scoped allocations, and so no
deallocations would be necessary. Drop this assumption to generalize. Note that
even with scoped allocations, unrolling loops that have scoped allocations
could create a series of allocations and exhaustion of fast memory. Having a
end of lifetime marker like a dealloc in fact allows creating new scopes if
necessary when lowering to a backend and still utilize scoped allocation.
DMA buffers created by -dma-generate are guaranteed to have either
non-overlapping lifetimes or nested lifetimes.

PiperOrigin-RevId: 233502632
2019-03-29 16:24:08 -07:00
Uday Bondhugula f5eed89df0 Fix + cleanup for getMemRefRegion()
- determine symbols for the memref region correctly

- this wasn't exposed earlier since we didn't have any test cases where the
  portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of
  one IV  depending on other IVs within that part)

PiperOrigin-RevId: 233493872
2019-03-29 16:23:53 -07:00
Uday Bondhugula c419accea3 Automated rollback of changelist 232728977.
PiperOrigin-RevId: 232944889
2019-03-29 16:21:38 -07:00
River Riddle 13a45c7194 Add verification for AffineApply/AffineFor/AffineIf dimension and symbol operands. This also allows a DimOp to be a valid dimension identifier if its operand is a valid dimension identifier.
PiperOrigin-RevId: 232923468
2019-03-29 16:21:08 -07:00
River Riddle a886625813 Modify the canonicalizations of select and muli to use the fold hook.
This also extends the greedy pattern rewrite driver to add the operands of folded operations back to the worklist.

PiperOrigin-RevId: 232878959
2019-03-29 16:20:06 -07:00
Uday Bondhugula 4ba8c9147d Automated rollback of changelist 232717775.
PiperOrigin-RevId: 232807986
2019-03-29 16:19:33 -07:00
River Riddle fd2d7c857b Rename the 'if' operation in the AffineOps dialect to 'affine.if' and namespace
the AffineOps dialect with 'affine'.

PiperOrigin-RevId: 232728977
2019-03-29 16:18:59 -07:00
River Riddle 90d10b4e00 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. The is the second step to adding a namespace to the AffineOps dialect.
PiperOrigin-RevId: 232717775
2019-03-29 16:17:59 -07:00
River Riddle 3227dee15d NFC: Rename affine_apply to affine.apply. This is the first step to adding a namespace to the affine dialect.
PiperOrigin-RevId: 232707862
2019-03-29 16:17:29 -07:00
River Riddle 0c65cf283c Move the AffineFor loop bound folding to a canonicalization pattern on the AffineForOp.
PiperOrigin-RevId: 232610715
2019-03-29 16:16:11 -07:00
River Riddle 10237de8eb Refactor the affine analysis by moving some functionality to IR and some to AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring:
* AffineStructures has moved to IR.

* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.

* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.

* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.

PiperOrigin-RevId: 232586468
2019-03-29 16:15:41 -07:00
MLIR Team a78edcda5b Loop fusion improvements:
*) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion.
*) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check.

PiperOrigin-RevId: 232477853
2019-03-29 16:13:50 -07:00
Uday Bondhugula b26900dce5 Update dma-generate pass to (1) work on blocks of instructions (instead of just
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors

- change dma-generate pass to work on blocks of instructions (start/end
  iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
  straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
  in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
  generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
  capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
  option (besides command line flag)

- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
  replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods

Eg. output

$ mlir-opt  -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
            ^

$ mlir-opt -debug-only=dma-generate  -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {

PiperOrigin-RevId: 232297044
2019-03-29 16:09:52 -07:00
River Riddle 5052bd8582 Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
PiperOrigin-RevId: 232060516
2019-03-29 16:06:49 -07:00
MLIR Team d7c824451f LoopFusion: insert the source loop nest slice at a depth in the destination loop nest which preserves dependences (above any loop carried or other dependences). This is accomplished by updating the maximum destination loop depth based on dependence checks between source loop nest loads and stores which access the memref on which the source loop nest has a store op. In addition, prevent fusing in source loop nests which write to memrefs which escape or are live out.
PiperOrigin-RevId: 231684492
2019-03-29 16:03:23 -07:00
River Riddle a642bb1779 Update tests using affine maps to not rely on specific map numbers in the output IR. This is necessary to remove the dependency on ForInst not numbering the AffineMap bounds it has custom formatting for.
PiperOrigin-RevId: 231634812
2019-03-29 16:03:08 -07:00
River Riddle b6928c945c Standardize the spelling of debug info to "debuginfo" in opt flags.
PiperOrigin-RevId: 231610337
2019-03-29 16:02:38 -07:00
River Riddle 994111238b Fold CallIndirectOp to CallOp when the callee operand is a known constant function.
PiperOrigin-RevId: 231511697
2019-03-29 16:01:23 -07:00
MLIR Team a0f3db4024 Support fusing loop nests which require insertion into a new instruction Block position while preserving dependences, opening up additional fusion opportunities.
- Adds SSA Value edges to the data dependence graph used in the loop fusion pass.

PiperOrigin-RevId: 231417649
2019-03-29 16:00:04 -07:00
River Riddle 755538328b Recommit: Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231342063
2019-03-29 15:59:30 -07:00
Nicolas Vasilache ae772b7965 Automated rollback of changelist 231318632.
PiperOrigin-RevId: 231327161
2019-03-29 15:42:38 -07:00
River Riddle 5ecef2b3f6 Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231318632
2019-03-29 15:42:08 -07:00
Uday Bondhugula fb679fc2b5 Drop unused result from affine map in test case - NFC
PiperOrigin-RevId: 231008044
2019-03-29 15:38:53 -07:00
Chris Lattner 607d1c2ca7 More updates of tests to move towards single result affine maps.
PiperOrigin-RevId: 230991929
2019-03-29 15:38:38 -07:00
Uday Bondhugula b4a1443508 Update replaceAllMemRefUsesWith to generate single result affine_apply's for
index remapping
- generate a sequence of single result affine_apply's for the index remapping
  (instead of one multi result affine_apply)
- update dma-generate and loop-fusion test cases; while on this, change test cases
  to use single result affine apply ops
- some fusion comment fix/cleanup

PiperOrigin-RevId: 230985830
2019-03-29 15:38:23 -07:00
Uday Bondhugula b588d58c5f Update createAffineComputationSlice to generate single result affine maps
- Update createAffineComputationSlice to generate a sequence of single result
  affine apply ops instead of one multi-result affine apply
- update pipeline-data-transfer test case; while on this, also update the test
  case to use only single result affine maps, and make it more robust to
  change.

PiperOrigin-RevId: 230965478
2019-03-29 15:37:53 -07:00
Uday Bondhugula f94b15c247 Update dma-generate: update for multiple load/store op's per memref
- introduce a way to compute union using symbolic rectangular bounding boxes
- handle multiple load/store op's to the same memref by taking a union of the regions
- command-line argument to provide capacity of the fast memory space
- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the
  supplied index remap was identity

PiperOrigin-RevId: 230848185
2019-03-29 15:35:38 -07:00
Chris Lattner f60a0ba61c Incremental progress to move the testsuite towards single-result affine_apply
instructions.

PiperOrigin-RevId: 230775607
2019-03-29 15:34:53 -07:00
Uday Bondhugula 72e5c7f428 Minor updates + cleanup to dma-generate
- switch some debug info to emitError
- use a single constant op for zero index to make it easier to write/update
  test cases; avoid creating new constant op's for common zero index cases
- test case cleanup

This is in preparation for an upcoming major update to this pass.

PiperOrigin-RevId: 230728379
2019-03-29 15:34:06 -07:00
River Riddle f319bbbd28 Add a function pass to strip debug info from functions and instructions.
PiperOrigin-RevId: 230654315
2019-03-29 15:33:50 -07:00
MLIR Team b28009b681 Fix single producer check in loop fusion pass.
PiperOrigin-RevId: 230565482
2019-03-29 15:32:20 -07:00
Uday Bondhugula 864d9e02a1 Update fusion cost model + some additional infrastructure and debug information for -loop-fusion
- update fusion cost model to fuse while tolerating a certain amount of redundant
  computation; add cl option -fusion-compute-tolerance
  evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC

PiperOrigin-RevId: 230541857
2019-03-29 15:32:06 -07:00
Uday Bondhugula 92e9d9484c loop unroll update: unroll factor one for a single iteration loop
- unrolling a single iteration loop by a factor of one should promote its body
  into its parent; this makes it consistent with the behavior/expectation that
  unrolling a loop by a factor equal to its trip count makes the loop go away.

PiperOrigin-RevId: 230426499
2019-03-29 15:31:35 -07:00
Uday Bondhugula 94a03f864f Allocate private/local buffers for slices accurately during fusion
- the size of the private memref created for the slice should be based on
  the memref region accessed at the depth at which the slice is being
  materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
  to the region accessed based on the entire domain.

- leads to a significant contraction of the temporary / intermediate memref
  whenever the memref isn't reduced to a single scalar (through store fwd'ing).

Other changes

- update to promoteIfSingleIteration - avoid introducing unnecessary identity
  map affine_apply from IV; makes it much easier to write and read test cases
  and pass output for all passes that use promoteIfSingleIteration; loop-fusion
  test cases become much simpler

- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
  'domInstFilter' could be one of the ops erased due to a memref replacement in
  it.

- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
  missing (the latter need not always be 1); add lbFloorDivisors output argument

- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape

PiperOrigin-RevId: 230405218
2019-03-29 15:30:31 -07:00
MLIR Team 71495d58a7 Handle escaping memrefs in loop fusion pass:
*) Do not remove loop nests which write to memrefs which escape the function.
*) Do not remove memrefs which escape the function (e.g. are used in the return instruction).

PiperOrigin-RevId: 230398630
2019-03-29 15:30:14 -07:00
Uday Bondhugula c1880a857d AffineExpr pretty print - add missing handling to print expr * - 1 as -expr
- print multiplication by -1 as unary negate; expressions like s0 * -1, d0 * -1
  + d1 will now appear as -s0, -d0 + d1 resp.
- a minor cleanup while on printAffineExprInternal

PiperOrigin-RevId: 230222151
2019-03-29 15:28:44 -07:00
River Riddle 512d87cefc Add a constant folding hook to ExtractElementOp to fold extracting the element of a constant. This also adds a 'getValue' function to DenseElementsAttr and SparseElementsAttr to get the element at a constant index.
PiperOrigin-RevId: 230098938
2019-03-29 15:28:28 -07:00
Uday Bondhugula d7522eb264 Fix test cases that were accessing out of bounds to start with (b/123072438)
- detected with memref-bound-check

- fixes b/123072438; while on this, fix another test case which was reported
  out of bounds

PiperOrigin-RevId: 229978187
2019-03-29 15:27:29 -07:00
MLIR Team c4237ae990 LoopFusion: Creates private MemRefs which are used only by operations in the fused loop.
*) Enables reduction of private memref size based on MemRef region accessed by fused slice.
*) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence.
*) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice).

PiperOrigin-RevId: 229936698
2019-03-29 15:26:15 -07:00
Nicolas Vasilache 24e5a72dac Fix AffineApply corner case
This CL adds a test reported by andydavis@ and fixes the corner case that
appears when operands do not come from an AffineApply and no Dim composition
is needed.

In such cases, we would need to create an empty map which is disallowed.
The composition in such cases becomes trivial: there is no composition.

This CL also updates the name AffineNormalizer to AffineApplyNormalizer.

PiperOrigin-RevId: 229819234
2019-03-29 15:25:59 -07:00
Uday Bondhugula 40f7535571 Update stale / target-specific information in comments - NFC
PiperOrigin-RevId: 229800834
2019-03-29 15:25:29 -07:00
Nicolas Vasilache 4573a8da9a Fix improperly indexed DimOp in LowerVectorTransfers.cpp
This CL fixes a misunderstanding in how to build DimOp which triggered
execution issues in the CPU path.

The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to
construct the dynamic dimensions should be:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and
`dim %arg, 4 : memref<?x4x?x8x?xf32>`

Before this CL, we wold construct:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 1 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`

and expect the other dimensions to be constants.
This assumption seems consistent at first glance with the syntax of alloc:

```
    %tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32>
```

But this was actuallyincorrect.

This CL also makes the relevant functions available to EDSCs and removes
duplication of the incorrect function.

PiperOrigin-RevId: 229622766
2019-03-29 15:24:13 -07:00
River Riddle 5843e5a7c0 Add a canonicalization pattern to remove Dealloc operations if the memref is an AllocOp that is only used by Dealloc operations.
PiperOrigin-RevId: 229606558
2019-03-29 15:23:13 -07:00
River Riddle ada685f352 Add canonicalization to remove AllocOps if there are no uses. AllocOp has side effects on the heap, but can still be deleted if it has zero uses.
PiperOrigin-RevId: 229596556
2019-03-29 15:22:28 -07:00
MLIR Team 27d067e164 LoopFusion improvements:
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.

PiperOrigin-RevId: 229575126
2019-03-29 15:21:59 -07:00
River Riddle ed26dd0421 Add a canonicalization pattern for conditional branch to fold constant branch conditions.
PiperOrigin-RevId: 229242007
2019-03-29 15:14:37 -07:00
MLIR Team 38c2fe3158 LoopFusion: automate selection of source loop nest slice depth and destination loop nest insertion depth based on a simple cost model (cost model can be extended/replaced at a later time).
*) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest, with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations for src/dst loop depths attempting find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality).
*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (ones fusion has calculated the min-cost src/dst loop depths).
*) Test: Adds multiple unit tests to test the new functionality.

PiperOrigin-RevId: 229219757
2019-03-29 15:13:53 -07:00
Nicolas Vasilache 0ab81776aa Fix typo in lower_vector_transfers.mlir
PiperOrigin-RevId: 229010160
2019-03-29 15:12:40 -07:00
Nicolas Vasilache d734c50c5f [MLIR] Clip all access dimensions during LowerVectorTransfers
This CL adds a short term remedy to an issue that was found during execution
tests.

Lowering of vector transfer ops uses the permutation map to determine which
ForInst have been super-vectorized. During materialization to HW vector sizes
however, some of those dimensions may be fully unrolled and do not appear in
the permutation map.
Such dimensions were then not clipped and may have accessed out of bounds.

This CL conservatively clips all dimensions to ensure no out of bounds access.
The longer term solution is still up for debate but will probably require
either passing more information between Materialization and lowering, or just
merging the 2 passes.

PiperOrigin-RevId: 228980787
2019-03-29 15:12:26 -07:00
Uday Bondhugula c35d6b4f2d Drop -canonicalize from -dma-generate test case cmd
- should be testing on the output of -dma-generate and not '-dma-generate
  -canonicalize'; save trouble for those updating -canonicalize in the future!

PiperOrigin-RevId: 228915192
2019-03-29 15:11:26 -07:00
Lei Zhang 311af4abf3 Const fold splat vectors/tensors in standard add, sub, and mul ops
The const folding logic is structurally similar, so use a template
to abstract the common part.

Moved mul(x, 0) to a legalization pattern to be consistent with
mul(x, 1).

Also promoted getZeroAttr() to be a method on Builder since it is
expected to be frequently used.

PiperOrigin-RevId: 228891989
2019-03-29 15:09:55 -07:00
Nicolas Vasilache cfa5831960 Uniformize composition of AffineApplyOp by construction
This CL is the 5th on the path to simplifying AffineMap composition.
This removes the distinction between normalized single-result AffineMap and
more general composed multi-result map.

One nice byproduct of making the implementation driven by single-result is
that the multi-result extension is a trivial change: the implementation is
still single-result and we just use:

```
unsigned idx = getIndexOf(...);
map.getResult(idx);
```

This CL also fixes an AffineNormalizer implementation issue related to symbols.
Namely it stops performing substitutions on symbols in AffineNormalizer and
instead concatenates them all to be consistent with the call to
`AffineMap::compose(AffineMap)`. This latter call to `compose` cannot perform
simplifications of symbols coming from different maps based on positions only:
i.e. dims are applied and renumbered but symbols must be concatenated.

The only way to determine whether symbols from different AffineApply are the
same is to look at the concrete values. The canonicalizeMapAndOperands is thus
extended with behavior to support replacing operands that appear multiple
times.

Lastly, this CL demonstrates that the implementation is correct by rewriting
ComposeAffineMaps using only `makeComposedAffineApply`. The implementation
uses a matcher because AffineApplyOp are introduced as composed operations on
the fly instead of iteratively forwardSubstituting. For this purpose, a walker
would revisit freshly introduced AffineApplyOp. Regardless, ComposeAffineMaps
is scheduled to disappear, this CL replaces the implementation based on
iterative `forwardSubstitute` by a composed-by-construction
`makeComposedAffineApply`.
Remaining calls to `forwardSubstitute` will be removed in the next CL.

PiperOrigin-RevId: 228830443
2019-03-29 15:08:40 -07:00
Uday Bondhugula 2370c601ba Add safeguard against FM explosion
- FM has a worst case exponential complexity. For our purposes, this worst case
  is rarely expected, but could still appear due to improperly constructed
  constraints (a logical/memory error in other methods for eg.) or artificially
  created arbitrarily complex integer sets (adversarial / fuzz tests).

  Add a check to detect such an explosion in the number of constraints and
  conservatively return false from isEmpty() (instead of running out of memory
  or running for too long).

- Add an artifical virus test case.

PiperOrigin-RevId: 228753496
2019-03-29 15:07:55 -07:00
Alex Zinenko 9003490287 Implement branch-free single-division lowering of affine division/remainder
This implements the lowering of `floordiv`, `ceildiv` and `mod` operators from
affine expressions to the arithmetic primitive operations.  Integer division
rules in affine expressions explicitly require rounding towards either negative
or positive infinity unlike machine implementations that round towards zero.
In the general case, implementing `floordiv` and `ceildiv` using machine signed
division requires computing both the quotient and the remainder.  When the
divisor is positive, this can be simplified by adjusting the dividend and the
quotient by one and switching signs.

In the current use cases, we are unlikely to encounter affine expressions with
negative divisors (affine divisions appear in loop transformations such as
tiling that guarantee that divisors are positive by construction).  Therefore,
it is reasonable to use branch-free single-division implementation.  In case of
affine maps, divisors can only be literals so we can check the sign and
implement the case for negative divisors when the need arises.

The affine lowering pass can still fail when applied to semi-affine maps
(division or modulo by a symbol).

PiperOrigin-RevId: 228668181
2019-03-29 15:07:40 -07:00
Uday Bondhugula 742c37abc9 Fix DMA overlap pass buffer mapping
- the double buffer should be indexed (iv floordiv step) % 2 and NOT (iv % 2);
  step wasn't being accounted for.

- fix test cases, enable failing test cases

PiperOrigin-RevId: 228635726
2019-03-29 15:07:10 -07:00
Uday Bondhugula 303c09299f Fix affine expr flattener bug + improve simplification in a particular scenario
- fix visitDivExpr: constraints constructed for localVarCst used the original
  divisor instead of the simplified divisor; fix this. Add a simple test case
  in memref-bound-check that reproduces this bug - although this was encountered in the
  context of slicing for fusion.

- improve mod expr flattening: when flattening mod expressions,
  cancel out the GCD of the numerator and denominator so that we can get a
  simpler flattened form along with a simpler floordiv local var for it

PiperOrigin-RevId: 228539928
2019-03-29 15:06:11 -07:00
Nicolas Vasilache 1f78d63f05 [MLIR] Make SuperVectorization use normalized AffineApplyOp
Supervectorization does not plan on handling multi-result AffineMaps and
non-canonical chains of > 1 AffineApplyOp.
This CL uses the simpler single-result unbounded AffineApplyOp in the
MaterializeVectors pass.

PiperOrigin-RevId: 228469085
2019-03-29 15:05:55 -07:00
Nicolas Vasilache c6f798a976 Introduce AffineMap::compose(AffineMap)
This CL is the 2nd on the path to simplifying AffineMap composition.
This CL uses the now accepted `AffineExpr::compose(AffineMap)` to
implement `AffineMap::compose(AffineMap)`.

Implications of keeping the simplification function in
Analysis are documented where relevant.

PiperOrigin-RevId: 228276646
2019-03-29 15:04:20 -07:00
Uday Bondhugula e94ba6815a Fix 0-d memref corner case for getMemRefRegion()
- fix crash on test/Transforms/canonicalize.mlir with
  -memref-bound-check

PiperOrigin-RevId: 228268486
2019-03-29 15:03:50 -07:00
Nicolas Vasilache c449e46ceb Introduce AffineExpr::compose(AffineMap)
This CL is the 1st on the path to simplifying AffineMap composition.
This CL uses the now accepted AffineExpr.replaceDimsAndSymbols to
implement `AffineExpr::compose(AffineMap)`.

Arguably, `simplifyAffineExpr` should be part of IR and not Analysis but
this CL does not yet pull the trigger on that.

PiperOrigin-RevId: 228265845
2019-03-29 15:03:36 -07:00
Uday Bondhugula 21baf86a2f Extend loop-fusion's slicing utility + other fixes / updates
- refactor toAffineFromEq and the code surrounding it; refactor code into
  FlatAffineConstraints::getSliceBounds
- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other
  identifiers
- add FlatAffineConstraints::getConstantLower/UpperBound
- Address b/122118218 (don't assert on invalid fusion depths cmdline flags -
  instead, don't do anything; change cmdline flags
  src-loop-depth -> fusion-src-loop-depth
- AffineExpr/Map print method update: don't fail on null instances (since we have
  a wrapper around a pointer, it's avoidable); rationale: dump/print methods should
  never fail if possible.
- Update memref-dataflow-opt to add an optimization to avoid a unnecessary call to
  IsRangeOneToOne when it's trivially going to be true.
- Add additional test cases to exercise the new support
- update a few existing test cases since the maps are now generated uniformly with
  all destination loop operands appearing for the backward slice
- Fix projectOut - fix wrong range for getBestElimCandidate.
- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since
  we didn't have any non-hyperrectangular ones.

PiperOrigin-RevId: 228265152
2019-03-29 15:03:20 -07:00
Uday Bondhugula b934d75b8f Convert expr - c * (expr floordiv c) to expr mod c in AffineExpr
- Detect 'mod' to replace the combination of floordiv, mul, and subtract when
  possible at construction time; when 'c' is a power of two, this reduces the number of
  operations; also more compact and readable. Update simplifyAdd for this.

  On a side note:
  - with the affine expr flattening we have, a mod expression like d0 mod c
    would be flattened into d0 - c * q,  c * q <= d0 <= c*q + c - 1, with 'q'
    being added as the local variable (q = d0 floordiv c); as a result, a mod
    was turned into a floordiv whenever the expression was reconstructed back,
    i.e., as  d0 - c * (d0 floordiv c); as a result of this change, we recover
    the mod back.

- rename SimplifyAffineExpr -> SimplifyAffineStructures (pass had been renamed but
  the file hadn't been).

PiperOrigin-RevId: 228258120
2019-03-29 15:02:56 -07:00
Nicolas Vasilache 7c0bbe0939 Iterate on vector rather than DenseMap during AffineMap normalization
This CL removes a flakyness associated to a spurious iteration on DenseMap
iterators when normalizing AffineMap.

PiperOrigin-RevId: 228160074
2019-03-29 14:59:37 -07:00
Alex Zinenko c47ed53211 Add simple constant folding hook for CmpIOp
Integer comparisons can be constant folded if both of their arguments are known
constants, which we can compare in the compiler.  This requires implementing
all comparison predicates, but thanks to consistency between LLVM and MLIR
comparison predicates, we have a one-to-one correspondence between predicates
and llvm::APInt comparison functions.  Constant folding of comparsions with
maximum/minimum values of the integer type are left for future work.

This will be used to test the lowering of mod/floordiv/ceildiv in affine
expressions at compile time.

PiperOrigin-RevId: 228077580
2019-03-29 14:59:22 -07:00
Alex Zinenko bc04556cf8 Introduce integer division and remainder operations
This adds signed/unsigned integer division and remainder operations to the
StandardOps dialect.  Two versions are required because MLIR integers are
signless, but the meaning of the leading bit is important in division and
affects the results.  LLVM IR made a similar choice.  Define the operations in
the tablegen file and add simple constant folding hooks in the C++
implementation.  Handle signed division overflow and division by zero errors in
constant folding.  Canonicalization is left for future work.

These operations are necessary to lower affine_apply's down to LLVM IR.

PiperOrigin-RevId: 228077549
2019-03-29 14:58:52 -07:00
Uday Bondhugula 8496f2c30b Complete TODOs / cleanup for loop-fusion utility
- this is CL 1/2 that does a clean up and gets rid of one limitation in an
  underlying method - as a result, fusion works for more cases.
- fix bugs/incomplete impl. in toAffineMapFromEq
- fusing across rank changing reshapes for example now just works

  For eg. given a rank 1 memref to rank 2 memref reshape (64 -> 8 x 8) like this,
  -loop-fusion -memref-dataflow-opt now completely fuses and inlines/store-forward
  to get rid of the temporary:

INPUT

  // Rank 1 -> Rank 2 reshape
  for %i0 = 0 to 64 {
     %v = load %A[%i0]
     store %v, %B[%i0 floordiv 8, i0 mod 8]
  }

  for %i1 = 0 to 8
    for %i2 = 0 to 8
      %w = load %B[%i1, i2]
      "foo"(%w) : (f32) -> ()

OUTPUT

$ mlir-opt -loop-fusion -memref-dataflow-opt fuse_reshape.mlir

#map0 = (d0, d1) -> (d0 * 8 + d1)
mlfunc @fuse_reshape(%arg0: memref<64xf32>) {
  for %i0 = 0 to 8 {
    for %i1 = 0 to 8 {
      %0 = affine_apply #map0(%i0, %i1)
      %1 = load %arg0[%0] : memref<64xf32>
      "foo"(%1) : (f32) -> ()
    }
  }
}

AFAIK, there is no polyhedral tool / compiler that can perform such fusion -
because it's not really standard loop fusion, but possible through a
generalized slicing-based approach such as ours.

PiperOrigin-RevId: 227918338
2019-03-29 14:57:22 -07:00
Nicolas Vasilache 618c6a74c6 [MLIR] Introduce normalized single-result unbounded AffineApplyOp
Supervectorization does not plan on handling multi-result AffineMaps and
non-canonical chains of > 1 AffineApplyOp.
This CL introduces a simpler abstraction and composition of single-result
unbounded AffineApplyOp by using the existing unbound AffineMap composition.

This CL adds a simple API call and relevant tests:

```c++
OpPointer<AffineApplyOp> makeNormalizedAffineApply(
  FuncBuilder *b, Location loc, AffineMap map, ArrayRef<Value*> operands);
```

which creates a single-result unbounded AffineApplyOp.
The operands of AffineApplyOp are not themselves results of AffineApplyOp by
consrtuction.

This represent the simplest possible interface to complement the composition
of (mathematical) AffineMap, for the cases when we are interested in applying
it to Value*.

In this CL the composed AffineMap is not compressed (i.e. there exist operands
that are not part of the result). A followup commit will compress to normal
form.

The single-result unbounded AffineApplyOp abstraction will be used in a
followup CL to support the MaterializeVectors pass.

PiperOrigin-RevId: 227879021
2019-03-29 14:56:37 -07:00
Chris Lattner 7983bbc251 Introduce a simple canonicalization of affine_apply that drops unused dims and
symbols.

Included with this is some other infra:
 - Testcases for other canonicalizations that I will implement next.
 - Some helpers in AffineMap/Expr for doing simple walks without defining whole
   visitor classes.
 - A 'replaceDimsAndSymbols' facility that I'll be using to simplify maps and
   exprs, e.g. to fold one constant into a mapping and to drop/renumber unused dims.
 - Allow index (and everything else) to work in memref's, as we previously
   discussed, to make the testcase easier to write.
 - A "getAffineBinaryExpr" helper to produce a binop when you know the kind as
   an enum.

This line of work will eventually subsume the ComposeAffineApply pass, but it is no where close to that yet :-)

PiperOrigin-RevId: 227852951
2019-03-29 14:56:07 -07:00
Alex Zinenko 0c4ee54198 Merge LowerAffineApplyPass into LowerIfAndForPass, rename to LowerAffinePass
This change is mechanical and merges the LowerAffineApplyPass and
LowerIfAndForPass into a single LowerAffinePass.  It makes a step towards
defining an "affine dialect" that would contain all polyhedral-related
constructs.  The motivation for merging these two passes is based on retiring
MLFunctions and, eventually, transforming If and For statements into regular
operations.  After that happens, LowerAffinePass becomes yet another
legalization.

PiperOrigin-RevId: 227566113
2019-03-29 14:52:52 -07:00
Alex Zinenko fa710c17f4 LowerForAndIf: expand affine_apply's inplace
Existing implementation was created before ML/CFG unification refactoring and
did not concern itself with further lowering to separate concerns.  As a
result, it emitted `affine_apply` instructions to implement `for` loop bounds
and `if` conditions and required a follow-up function pass to lower those
`affine_apply` to arithmetic primitives.  In the unified function world,
LowerForAndIf is mostly a lowering pass with low complexity.  As we move
towards a dialect for affine operations (including `for` and `if`), it makes
sense to lower `for` and `if` conditions directly to arithmetic primitives
instead of relying on `affine_apply`.

Expose `expandAffineExpr` function in LoweringUtils.  Use this function
together with `expandAffineMaps` to emit primitives that implement loop and
branch conditions directly.

Also remove tests that become unnecessary after transforming LowerForAndIf into
a function pass.

PiperOrigin-RevId: 227563608
2019-03-29 14:52:22 -07:00