Commit Graph

218 Commits

Author SHA1 Message Date
Nicolas Vasilache 071ca8da91 Support composition of symbols in AffineApplyOp
This CL revisits the composition of AffineApplyOp for the special case where a symbol
itself comes from an AffineApplyOp.
This is achieved by rewriting such symbols into dims to allow composition to occur mathematically.
The implementation is also refactored to improve readability.

Rationale for locally rewriting symbols as dims:
================================================
The mathematical composition of AffineMap must always concatenate symbols
because it does not have enough information to do otherwise. For example,
composing `(d0)[s0] -> (d0 + s0)` with itself must produce
`(d0)[s0, s1] -> (d0 + s0 + s1)`.

The result is only equivalent to `(d0)[s0] -> (d0 + 2 * s0)` when
applied to the same mlir::Value* for both s0 and s1.
As a consequence mathematical composition of AffineMap always concatenates
symbols.

When AffineMaps are used in AffineApplyOp however, they may specify
composition via symbols, which is ambiguous mathematically. This corner case
is handled by locally rewriting such symbols that come from AffineApplyOp
into dims and composing through dims.

PiperOrigin-RevId: 239791597
2019-03-29 17:30:59 -07:00
Nicolas Vasilache f43388e4ce Port LowerVectorTransfers from EDSC + AST to declarative builders
This CL removes the dependency of LowerVectorTransfers on the AST version of EDSCs which will be retired.

This exhibited a pretty fundamental staging difference in AST-based vs declarative based emission.

Since the delayed creation with an AST was staged, the loop order came into existence after the clipping expressions were computed.
This now changes as the loops first need to be created declaratively in fixed order and then the clipping expressions are created.
Also, due to lack of staging, coalescing cannot be done on the fly anymore and
needs to be done either as a pre-pass (current implementation) or as a local transformation on the generated IR (future work).

Tests are updated accordingly.

PiperOrigin-RevId: 238971631
2019-03-29 17:22:06 -07:00
Uday Bondhugula e1e455f7dd Change parallelism detection test pass to emit a note
- emit a note on the loop being parallel instead of setting a loop attribute
- rename the pass -test-detect-parallel (from -detect-parallel)

PiperOrigin-RevId: 238122847
2019-03-29 17:16:27 -07:00
Uday Bondhugula 9f2781e8dd Fix misc bugs / TODOs / other improvements to analysis utils
- fix for getConstantBoundOnDimSize: floordiv -> ceildiv for extent
- make getConstantBoundOnDimSize also return the identifier upper bound
- fix unionBoundingBox to correctly use the divisor and upper bound identified by
  getConstantBoundOnDimSize
- deal with loop step correctly in addAffineForOpDomain (covers most cases now)
- fully compose bound map / operands and simplify/canonicalize before adding
  dim/symbol to FlatAffineConstraints; fixes false positives in -memref-bound-check; add
  test case there
- expose mlir::isTopLevelSymbol from AffineOps

PiperOrigin-RevId: 238050395
2019-03-29 17:15:27 -07:00
Uday Bondhugula 075090f891 Extend loop unrolling and unroll-jamming to non-matching bound operands and
multi-result upper bounds, complete TODOs, fix/improve test cases.

- complete TODOs for loop unroll/unroll-and-jam. Something as simple as
  "for %i = 0 to %N" wasn't being unrolled earlier (unless it had been written
  as "for %i = ()[s0] -> (0)()[%N] to %N"; addressed now.

- update/replace getTripCountExpr with buildTripCountMapAndOperands; makes it
  more powerful as it composes inputs into it

- getCleanupLowerBound and getUnrolledLoopUpperBound actually needed the same
  code; refactor and remove one.

- reorganize test cases, write previous ones better; most of these changes are
  "label replacements".

- fix wrongly labeled test cases in unroll-jam.mlir

PiperOrigin-RevId: 238014653
2019-03-29 17:14:12 -07:00
MLIR Team 8d62a6092f Clean up some stray mlfunc/cfgfunc leftovers.
PiperOrigin-RevId: 237936610
2019-03-29 17:13:26 -07:00
Uday Bondhugula ce7e59536c Add a basic model to set tile sizes + some cleanup
- compute tile sizes based on a simple model that looks at memory footprints
  (instead of using the hardcoded default value)
- adjust tile sizes to make them factors of trip counts based on an option
- update loop fusion CL options to allow setting maximal fusion at pass creation
- change an emitError to emitWarning (since it's not a hard error unless the client
  treats it that way, in which case, it can emit one)

$ mlir-opt -debug-only=loop-tile -loop-tile test/Transforms/loop-tiling.mlir

test/Transforms/loop-tiling.mlir:81:3: note: using tile sizes [4 4 5 ]

  for %i = 0 to 256 {

for %i0 = 0 to 256 step 4 {
    for %i1 = 0 to 256 step 4 {
      for %i2 = 0 to 250 step 5 {
        for %i3 = #map4(%i0) to #map11(%i0) {
          for %i4 = #map4(%i1) to #map11(%i1) {
            for %i5 = #map4(%i2) to #map12(%i2) {
              %0 = load %arg0[%i3, %i5] : memref<8x8xvector<64xf32>>
              %1 = load %arg1[%i5, %i4] : memref<8x8xvector<64xf32>>
              %2 = load %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
              %3 = mulf %0, %1 : vector<64xf32>
              %4 = addf %2, %3 : vector<64xf32>
              store %4, %arg2[%i3, %i4] : memref<8x8xvector<64xf32>>
            }
          }
        }
      }
    }
  }

PiperOrigin-RevId: 237461836
2019-03-29 17:06:51 -07:00
MLIR Team c1ff9e866e Use FlatAffineConstraints::unionBoundingBox to perform slice bounds union for loop fusion pass (WIP).
Adds utility to convert slice bounds to a FlatAffineConstraints representation.
Adds utility to FlatAffineConstraints to promote loop IV symbol identifiers to dim identifiers.

PiperOrigin-RevId: 236973261
2019-03-29 16:59:21 -07:00
Uday Bondhugula 5836fae8a0 DMA generation CL flag update
- allow mem capacity to be overridden by command-line flag
- change default fast mem space to 2

PiperOrigin-RevId: 236951598
2019-03-29 16:59:05 -07:00
Uday Bondhugula 7e288e7c19 Add missing run command to fusion test cases - follow up to cl/236882988
PiperOrigin-RevId: 236947383
2019-03-29 16:58:50 -07:00
Uday Bondhugula b34f8d3c83 Fix and improve detectAsMod
- fix for the mod detection
- simplify/avoid the mod at construction (if the dividend is already known to be less
  than the divisor), since the information is available at hand there

PiperOrigin-RevId: 236882988
2019-03-29 16:57:36 -07:00
Uday Bondhugula a77734e185 Make sure that fusion test cases don't have out of bounds accesses
- fix out of bounds test case
- -memref-bound-check on the test/Transforms/loop-fusion.mlir no longer reports any
  errors, before or after -loop-fusion is run

PiperOrigin-RevId: 236757658
2019-03-29 16:56:35 -07:00
MLIR Team 39a1ddeb1c Adds loop attribute as a temporary work around to prevent slice fusion of loop nests containing instructions with side effects (the proper solution will be do use memref read/write regions in the future).
PiperOrigin-RevId: 236733739
2019-03-29 16:56:20 -07:00
Uday Bondhugula 12b9dece8d Bug fix for getConstantBoundOnDimSize
- this was detected when memref-bound-check was run on the output of the
  loop-fusion pass
- the addition (to represent ceildiv as a floordiv) had to be performed only
  for the constant term of the constraint
- update test cases
- memref-bound-check no longer returns an error on the output of this test case

PiperOrigin-RevId: 236731137
2019-03-29 16:56:06 -07:00
River Riddle eeeef090ef Set the namespace of the StandardOps dialect to "std", but add a special case to the parser to allow parsing standard operations without the "std" prefix. This will now allow for the standard dialect to be looked up dynamically by name.
PiperOrigin-RevId: 236493865
2019-03-29 16:54:20 -07:00
Uday Bondhugula 8254aabd4a A simple pass to detect and mark all parallel loops
- detect all parallel loops based on dep information and mark them with a
  "parallel" attribute
- add mlir::isLoopParallel(OpPointer<AffineForOp> ...), and refactor an existing method
  to use that (reuse some code from @andydavis (cl/236007073) for this)
- a simple/meaningful way to test memref dep test as well

Ex:

$ mlir-opt -detect-parallel test/Transforms/parallelism-detection.mlir
#map1 = ()[s0] -> (s0)
func @foo(%arg0: index) {
  %0 = alloc() : memref<1024x1024xvector<64xf32>>
  %1 = alloc() : memref<1024x1024xvector<64xf32>>
  %2 = alloc() : memref<1024x1024xvector<64xf32>>
  for %i0 = 0 to %arg0 {
    for %i1 = 0 to %arg0 {
      for %i2 = 0 to %arg0 {
        %3 = load %0[%i0, %i2] : memref<1024x1024xvector<64xf32>>
        %4 = load %1[%i2, %i1] : memref<1024x1024xvector<64xf32>>
        %5 = load %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
        %6 = mulf %3, %4 : vector<64xf32>
        %7 = addf %5, %6 : vector<64xf32>
        store %7, %2[%i0, %i1] : memref<1024x1024xvector<64xf32>>
      } {parallel: false}
    } {parallel: true}
  } {parallel: true}
  return
}

PiperOrigin-RevId: 236367368
2019-03-29 16:53:03 -07:00
MLIR Team d038e34735 Loop fusion for input reuse.
*) Breaks fusion pass into multiple sub passes over nodes in data dependence graph:
- first pass fuses single-use producers into their unique consumer.
- second pass enables fusing for input-reuse by fusing sibling nodes which read from the same memref, but which do not share dependence edges.
- third pass fuses remaining producers into their consumers (Note that the sibling fusion pass may have transformed a producer with multiple uses into a single-use producer).
*) Fusion for input reuse is enabled by computing a sibling node slice using the load/load accesses to the same memref, and fusion safety is guaranteed by checking that the sibling node memref write region (to a different memref) is preserved.
*) Enables output vector and output matrix computations from KFAC patches-second-moment operation to fuse into a single loop nest and reuse input from the image patches operation.
*) Adds a generic loop utilitiy for finding all sequential loops in a loop nest.
*) Adds and updates unit tests.

PiperOrigin-RevId: 236350987
2019-03-29 16:52:35 -07:00
Uday Bondhugula 932e4fb29f Analysis support for floordiv/mod's in loop bounds/
- handle floordiv/mod's in loop bounds for all analysis purposes
- allows fusion slicing to be more powerful
- add simple test cases based on -memref-bound-check
- fusion based test cases in follow up CLs

PiperOrigin-RevId: 236328551
2019-03-29 16:52:04 -07:00
Uday Bondhugula 58889884a2 Change some of the debug messages to use emitError / emitWarning / emitNote - NFC
PiperOrigin-RevId: 236169676
2019-03-29 16:50:29 -07:00
Uday Bondhugula a003179367 Detect more trivially redundant constraints better
- detect more trivially redundant constraints in
  FlatAffineConstraints::removeTrivialRedundantConstraints. Redundancy due to
  constraints that only differ in the constant part (eg., 32i + 64j - 3 >= 0, 32 +
  64j - 8 >= 0) is now detected.  The method is still linear-time and does
  a single scan over the FlatAffineConstraints buffer. This detection is useful
  and needed to eliminate redundant constraints generated after FM elimination.

- update GCDTightenInequalities so that we also normalize by the GCD while at
  it. This way more constraints will show up as redundant (232i - 203 >= 0
  becomes i - 1 >= 0 instead of 232i - 232 >= 0) without having to call
  normalizeConstraintsByGCD.

- In FourierMotzkinEliminate, call GCDTightenInequalities and
  normalizeConstraintsByGCD before calling removeTrivialRedundantConstraints()
  - so that more redundant constraints are detected. As a result, redundancy
    due to constraints like i - 5 >= 0, i - 7 >= 0, 2i - 5 >= 0, 232i - 203 >=
    0 is now detected (here only i >= 7 is non-redundant).

As a result of these, a -memref-bound-check on the added test case runs in 16ms
instead of 1.35s (opt build) and no longer returns a conservative result.

PiperOrigin-RevId: 235983550
2019-03-29 16:47:59 -07:00
MLIR Team c2766f3760 Fix bug in memref region computation with slice loop bounds. Adds loop IV values to ComputationSliceState which are used in FlatAffineConstraints::addSliceBounds, to ensure that constraints are only added for loop IV values which are present in the constraint system.
PiperOrigin-RevId: 235952912
2019-03-29 16:47:29 -07:00
River Riddle cdbfd48471 Rewrite the dominance info classes to allow for operating on arbitrary control flow within operation regions. The CSE pass is also updated to properly handle nested dominance.
PiperOrigin-RevId: 235742627
2019-03-29 16:43:35 -07:00
Uday Bondhugula a1dad3a5d9 Extend/improve getSliceBounds() / complete TODO + update unionBoundingBox
- compute slices precisely where the destination iteration depends on multiple source
  iterations (instead of over-approximating to the whole source loop extent)
- update unionBoundingBox to deal with input with non-matching symbols
- reenable disabled backend test case

PiperOrigin-RevId: 234714069
2019-03-29 16:33:11 -07:00
Uday Bondhugula 5021dc4fa0 DMA placement update - hoist loops invariant DMAs
- hoist DMAs past all loops immediately surrounding the region that the latter
  is invariant on - do this at DMA generation time itself

PiperOrigin-RevId: 234628447
2019-03-29 16:32:41 -07:00
Uday Bondhugula f97c1c5b06 Misc. updates/fixes to analysis utils used for DMA generation; update DMA
generation pass to make it drop certain assumptions, complete TODOs.

- multiple fixes for getMemoryFootprintBytes
  - pass loopDepth correctly from getMemoryFootprintBytes()
  - use union while computing memory footprints

- bug fixes for addAffineForOpDomain
  - take into account loop step
  - add domains of other loop IVs in turn that might have been used in the bounds

- dma-generate: drop assumption of "non-unit stride loops being tile space loops
  and skipping those and recursing to inner depths"; DMA generation is now purely
  based on available fast mem capacity and memory footprint's calculated

- handle memory region compute failures/bailouts correctly from dma-generate

- loop tiling cleanup/NFC

- update some debug and error messages to use emitNote/emitError in
  pipeline-data-transfer pass - NFC

PiperOrigin-RevId: 234245969
2019-03-29 16:30:26 -07:00
MLIR Team 58aa383e60 Support fusing producer loop nests which write to a memref which is live out, provided that the write region of the consumer loop nest to the same memref is a super set of the producer's write region.
PiperOrigin-RevId: 234240958
2019-03-29 16:30:11 -07:00
MLIR Team 8f5f2c765d LoopFusion: perform a series of loop interchanges to increase the loop depth at which slices of producer loop nests can be fused into constumer loop nests.
*) Adds utility to LoopUtils to perform loop interchange of two AffineForOps.
*) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges.
*) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential.
*) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them.
*) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation.
*) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest.
*) Adds and updates related unit tests.

PiperOrigin-RevId: 234158370
2019-03-29 16:29:26 -07:00
MLIR Team affb2193cc Update direction vector computation to use FlatAffineConstraints::getLower/UpperBounds.
Update FlatAffineConstraints::getLower/UpperBounds to project to the identifier for which bounds are being computed. This change enables computing bounds on an identifier which were previously dependent on the bounds of another identifier.

PiperOrigin-RevId: 234017514
2019-03-29 16:28:25 -07:00
Uday Bondhugula 00860662a2 Generate dealloc's for alloc's of pipeline-data-transfer
- for the DMA transfers being pipelined through double buffering, generate
  deallocs for the double buffers being alloc'ed

This change is along the lines of cl/233502632. We initially wanted to experiment with
scoped allocation - so the deallocation's were usually not necessary; however, they are
needed even with scoped allocations in some situations - for eg. when the enclosing loop
gets unrolled. The dealloc serves as an end of lifetime marker.

PiperOrigin-RevId: 233653463
2019-03-29 16:25:53 -07:00
Uday Bondhugula 8b3f841daf Generate dealloc's for the alloc's of dma-generate.
- for the DMA buffers being allocated (and their tags), generate corresponding deallocs
- minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass

Code generation for DMA transfers was being done with the initial simplifying
assumption that the alloc's would map to scoped allocations, and so no
deallocations would be necessary. Drop this assumption to generalize. Note that
even with scoped allocations, unrolling loops that have scoped allocations
could create a series of allocations and exhaustion of fast memory. Having a
end of lifetime marker like a dealloc in fact allows creating new scopes if
necessary when lowering to a backend and still utilize scoped allocation.
DMA buffers created by -dma-generate are guaranteed to have either
non-overlapping lifetimes or nested lifetimes.

PiperOrigin-RevId: 233502632
2019-03-29 16:24:08 -07:00
Uday Bondhugula f5eed89df0 Fix + cleanup for getMemRefRegion()
- determine symbols for the memref region correctly

- this wasn't exposed earlier since we didn't have any test cases where the
  portion of the nest being DMAed for was non-hyperrectangular (i.e., bounds of
  one IV  depending on other IVs within that part)

PiperOrigin-RevId: 233493872
2019-03-29 16:23:53 -07:00
Uday Bondhugula c419accea3 Automated rollback of changelist 232728977.
PiperOrigin-RevId: 232944889
2019-03-29 16:21:38 -07:00
River Riddle 13a45c7194 Add verification for AffineApply/AffineFor/AffineIf dimension and symbol operands. This also allows a DimOp to be a valid dimension identifier if its operand is a valid dimension identifier.
PiperOrigin-RevId: 232923468
2019-03-29 16:21:08 -07:00
River Riddle a886625813 Modify the canonicalizations of select and muli to use the fold hook.
This also extends the greedy pattern rewrite driver to add the operands of folded operations back to the worklist.

PiperOrigin-RevId: 232878959
2019-03-29 16:20:06 -07:00
Uday Bondhugula 4ba8c9147d Automated rollback of changelist 232717775.
PiperOrigin-RevId: 232807986
2019-03-29 16:19:33 -07:00
River Riddle fd2d7c857b Rename the 'if' operation in the AffineOps dialect to 'affine.if' and namespace
the AffineOps dialect with 'affine'.

PiperOrigin-RevId: 232728977
2019-03-29 16:18:59 -07:00
River Riddle 90d10b4e00 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. The is the second step to adding a namespace to the AffineOps dialect.
PiperOrigin-RevId: 232717775
2019-03-29 16:17:59 -07:00
River Riddle 3227dee15d NFC: Rename affine_apply to affine.apply. This is the first step to adding a namespace to the affine dialect.
PiperOrigin-RevId: 232707862
2019-03-29 16:17:29 -07:00
River Riddle 0c65cf283c Move the AffineFor loop bound folding to a canonicalization pattern on the AffineForOp.
PiperOrigin-RevId: 232610715
2019-03-29 16:16:11 -07:00
River Riddle 10237de8eb Refactor the affine analysis by moving some functionality to IR and some to AffineOps. This is important for allowing the affine dialect to define canonicalizations directly on the operations instead of relying on transformation passes, e.g. ComposeAffineMaps. A summary of the refactoring:
* AffineStructures has moved to IR.

* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.

* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.

* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.

PiperOrigin-RevId: 232586468
2019-03-29 16:15:41 -07:00
MLIR Team a78edcda5b Loop fusion improvements:
*) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion.
*) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check.

PiperOrigin-RevId: 232477853
2019-03-29 16:13:50 -07:00
Uday Bondhugula b26900dce5 Update dma-generate pass to (1) work on blocks of instructions (instead of just
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors

- change dma-generate pass to work on blocks of instructions (start/end
  iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
  straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
  in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
  generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
  capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
  option (besides command line flag)

- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
  replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods

Eg. output

$ mlir-opt  -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
            ^

$ mlir-opt -debug-only=dma-generate  -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block

        for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {

PiperOrigin-RevId: 232297044
2019-03-29 16:09:52 -07:00
River Riddle 5052bd8582 Define the AffineForOp and replace ForInst with it. This patch is largely mechanical, i.e. changing usages of ForInst to OpPointer<AffineForOp>. An important difference is that upon construction an AffineForOp no longer automatically creates the body and induction variable. To generate the body/iv, 'createBody' can be called on an AffineForOp with no body.
PiperOrigin-RevId: 232060516
2019-03-29 16:06:49 -07:00
MLIR Team d7c824451f LoopFusion: insert the source loop nest slice at a depth in the destination loop nest which preserves dependences (above any loop carried or other dependences). This is accomplished by updating the maximum destination loop depth based on dependence checks between source loop nest loads and stores which access the memref on which the source loop nest has a store op. In addition, prevent fusing in source loop nests which write to memrefs which escape or are live out.
PiperOrigin-RevId: 231684492
2019-03-29 16:03:23 -07:00
River Riddle a642bb1779 Update tests using affine maps to not rely on specific map numbers in the output IR. This is necessary to remove the dependency on ForInst not numbering the AffineMap bounds it has custom formatting for.
PiperOrigin-RevId: 231634812
2019-03-29 16:03:08 -07:00
River Riddle b6928c945c Standardize the spelling of debug info to "debuginfo" in opt flags.
PiperOrigin-RevId: 231610337
2019-03-29 16:02:38 -07:00
River Riddle 994111238b Fold CallIndirectOp to CallOp when the callee operand is a known constant function.
PiperOrigin-RevId: 231511697
2019-03-29 16:01:23 -07:00
MLIR Team a0f3db4024 Support fusing loop nests which require insertion into a new instruction Block position while preserving dependences, opening up additional fusion opportunities.
- Adds SSA Value edges to the data dependence graph used in the loop fusion pass.

PiperOrigin-RevId: 231417649
2019-03-29 16:00:04 -07:00
River Riddle 755538328b Recommit: Define a AffineOps dialect as well as an AffineIfOp operation. Replace all instances of IfInst with AffineIfOp and delete IfInst.
PiperOrigin-RevId: 231342063
2019-03-29 15:59:30 -07:00
Nicolas Vasilache ae772b7965 Automated rollback of changelist 231318632.
PiperOrigin-RevId: 231327161
2019-03-29 15:42:38 -07:00