Commit Graph

21 Commits

Author SHA1 Message Date
Uday Bondhugula b9f53dc0bd Update/Fix LoopUtils::stmtBodySkew to handle loop step.
- loop step wasn't handled and there wasn't a TODO or an assertion; fix this.
- rename 'delay' to shift for consistency/readability.
- other readability changes.
- remove duplicate attribute print for DmaStartOp; fix misplaced attribute
  print for DmaWaitOp
- add build method for AddFOp (unrelated to this CL, but add it anyway)

PiperOrigin-RevId: 224892958
2019-03-29 14:25:07 -07:00
Uday Bondhugula d59a95a05c Fix missing check for dependent DMAs in pipeline-data-transfer
- adding a conservative check for now (TODO: use the dependence analysis pass
  once the latter is extended to deal with DMA ops). resolve an existing bug on
  a test case.

- update test cases

PiperOrigin-RevId: 224869526
2019-03-29 14:24:53 -07:00
Uday Bondhugula 2ef57806ba Update/fix -pipeline-data-transfer; fix b/120770946
- fix replaceAllMemRefUsesWith call to replace only inside loop body.
- handle the case where DMA buffers are dynamic; extend doubleBuffer() method
  to handle dynamically shaped DMA buffers (pass the right operands to AllocOp)
- place alloc's for DMA buffers at the depth at which pipelining is being done
  (instead of at top-level)
- add more test cases

PiperOrigin-RevId: 224852231
2019-03-29 14:24:22 -07:00
Uday Bondhugula b6c03917ad Remove allocations for memref's that become dead as a result of double
buffering in the auto DMA overlap pass.

This is done online in the pass.

PiperOrigin-RevId: 222313640
2019-03-29 14:05:19 -07:00
Uday Bondhugula fff1efbaf5 Updates to transformation/analysis passes/utilities. Update DMA generation pass
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.

- add support for getMemRefRegion symbolic in outer loops - hence support for
  DMAs symbolic in outer surrounding loops.

- add DMA generation support for outgoing DMAs (store op's to lower memory
  space); extend getMemoryRegion to store op's. -memref-bound-check now works
  with store op's as well.

- fix dma-generate (references to the old memref in the dma_start op were also
  being replaced with the new buffer); we need replace all memref uses to work
  only on a subset of the uses - add a new optional argument for
  replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional
  'operation' argument to serve as a filter - if provided, only those uses that
  are dominated by the filter are replaced.

- Add missing print for attributes for dma_start, dma_wait op's.

- update the FlatAffineConstraints API

PiperOrigin-RevId: 221889223
2019-03-29 14:00:51 -07:00
Jacques Pienaar cc9a6ed09d Initialize Pass with PassID.
The passID is not currently stored in Pass but this avoids the unused variable warning. The passID is used to uniquely identify passes, currently this is only stored/used in PassInfo.

PiperOrigin-RevId: 220485662
2019-03-29 13:50:34 -07:00
Jacques Pienaar 6f0fb22723 Add static pass registration
Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage.

Change build targets to alwayslink to enforce registration.

PiperOrigin-RevId: 220390178
2019-03-29 13:49:34 -07:00
Uday Bondhugula 8201e19e3d Introduce memref bound checking.
Introduce analysis to check memref accesses (in MLFunctions) for out of bound
ones. It works as follows:

$ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir

/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
      %y = load %B[%idy] : memref<128 x i32>
           ^
/tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
      %y = load %B[%idy] : memref<128 x i32>
           ^
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0 * 128 - d1)
mlfunc @test() {
  %0 = alloc() : memref<9x9xi32>
  %1 = alloc() : memref<128xi32>
  for %i0 = -1 to 9 {
    for %i1 = -1 to 9 {
      %2 = affine_apply #map0(%i0, %i1)
      %3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32>
      %4 = affine_apply #map1(%i0, %i1)
      %5 = load %1[%4] : memref<128xi32>
    }
  }
  return
}

- Improves productivity while manually / semi-automatically developing MLIR for
  testing / prototyping; also provides an indirect way to catch errors in
  transformations.

- This pass is an easy way to test the underlying affine analysis
  machinery including low level routines.

Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256.

While on this:

- create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/

- fix a bug in AffineAnalysis.cpp::toAffineExpr

TODO: extend to non-constant loop bounds (straightforward). Will transparently
work for all accesses once floordiv, mod, ceildiv are supported in the
AffineMap -> FlatAffineConstraints conversion.
PiperOrigin-RevId: 219397961
2019-03-29 13:46:08 -07:00
River Riddle 4c465a181d Implement value type abstraction for types.
This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.

PiperOrigin-RevId: 219372163
2019-03-29 13:45:54 -07:00
Chris Lattner adbba70d82 Simplify FunctionPass to eliminate the CFGFunctionPass/MLFunctionPass
distinction.  FunctionPasses can now choose to get called on all functions, or
have the driver split CFG/ML Functions up for them.  NFC.

PiperOrigin-RevId: 218775885
2019-03-29 13:40:05 -07:00
Uday Bondhugula ccfe593715 PassResult return cleanup.
- return success as long as IR is in a valid state.

PiperOrigin-RevId: 218225317
2019-03-29 13:35:47 -07:00
Chris Lattner 7850258c49 Introduce a new Operation::erase helper to generalize some code in
the pattern matcher / canonicalizer, and rename existing eraseFromBlock methods
to align with it.

PiperOrigin-RevId: 218104455
2019-03-29 13:34:51 -07:00
Feng Liu 34927e2474 Rename Operation::getAs to Operation::dyn_cast
Also rename Operation::is to Operation::isa
Introduce Operation::cast

All of these are for consistency with global dyn_cast/cast/isa operators.

PiperOrigin-RevId: 217878786
2019-03-29 13:33:41 -07:00
Uday Bondhugula 18e666702c Generalize / improve DMA transfer overlap; nested and multiple DMA support; resolve
multiple TODOs.

- replace the fake test pass (that worked on just the first loop in the
  MLFunction) to perform DMA pipelining on all suitable loops.
- nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops)
- fix bugs / assumptions: correctly copy memory space and elemental type of source
  memref for double buffering.
- correctly identify matching start/finish statements, handle multiple DMAs per
  loop.
- introduce dominates/properlyDominates utitilies for MLFunction statements.
- move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it
  getShiftValidity
- refactor getContainingStmtPos -> findAncestorStmtInBlock - move into
  Analysis/Utils.h; has two users.
- other improvements / cleanup for related API/utilities
- add size argument to dma_wait - for nested DMAs or in general, it makes it
  easy to obtain the size to use when lowering the dma_wait since we wouldn't
  want to identify the matching dma_start, and more importantly, in general/in the
  future, there may not always be a dma_start dominating the dma_wait.
- add debug information in the pass

PiperOrigin-RevId: 217734892
2019-03-29 13:32:28 -07:00
Uday Bondhugula 86eac4618c Create private exclusive / single use affine computation slice for an op stmt.
- add util to create a private / exclusive / single use affine
  computation slice for an op stmt (see method doc comment); a single
  multi-result affine_apply op is prepended to the op stmt to provide all
  results needed for its operands as a function of loop iterators and symbols.
- use it for DMA pipelining (to create private slices for DMA start stmt's);
  resolve TODOs/feature request (b/117159533)
- move createComposedAffineApplyOp to Transforms/Utils; free it from taking a
  memref as input / generalize it.

PiperOrigin-RevId: 216926818
2019-03-29 13:29:21 -07:00
Jacques Pienaar 764fd035b0 Split BuiltinOps out of StandardOps.
* Move Return, Constant and AffineApply out into BuiltinOps;
* BuiltinOps are always registered, while StandardOps follow the same dynamic registration;
* Kept isValidX in MLValue as we don't have a verify on AffineMap so need to keep it callable from Parser (I wanted to move it to be called in verify instead);

PiperOrigin-RevId: 216592527
2019-03-29 13:28:12 -07:00
Nicolas Vasilache 1d3e7e2616 [MLIR] AffineMap value type
This CL applies the same pattern as AffineExpr to AffineMap: a simple struct
that acts as the storage is allocated in the bump pointer. The AffineMap is
immutable and accessed everywhere by value.

PiperOrigin-RevId: 216445930
2019-03-29 13:26:24 -07:00
Uday Bondhugula 82e55750d2 Add target independent standard DMA ops: dma.start, dma.wait
Add target independent standard DMA ops: dma.start, dma.wait. Update pipeline
data transfer to use these to detect DMA ops.

While on this
- return failure from mlir-opt::performActions if a pass generates invalid output
- improve error message for verify 'n' operand traits

PiperOrigin-RevId: 216429885
2019-03-29 13:26:10 -07:00
Nicolas Vasilache ce2edea135 [MLIR] Cleanup AffineExpr
This CL introduces a series of cleanups for AffineExpr value types:
1. to make it clear that the value types should be used, the pointer
AffineExpr types are put in the detail namespace. Unfortunately, since the
value type operator-> only forwards to the underlying pointer type, we
still
need to expose this in the include file for now;
2. AffineExprKind is ok to use, it thus comes out of detail and thus of
AffineExpr
3. getAffineDimExpr, getAffineSymbolExpr, getAffineConstantExpr are
similarly
extracted as free functions and their naming is mande consistent across
Builder, MLContext and AffineExpr
4. AffineBinaryOpEx::simplify functions are made into static free
functions.
In particular it is moved away from AffineMap.cpp where it does not belong
5. operator AffineExprType is made explicit
6. uses the binary operators everywhere possible
7. drops the pointer usage everywhere outside of AffineExpr.cpp,
MLIRContext.cpp and AsmPrinter.cpp

PiperOrigin-RevId: 216207212
2019-03-29 13:24:45 -07:00
Uday Bondhugula 6cfdb756b1 Introduce memref replacement/rewrite support: to replace an existing memref
with a new one (of a potentially different rank/shape) with an optional index
remapping.

- introduce Utils::replaceAllMemRefUsesWith
- use this for DMA double buffering

(This CL also adds a few temporary utilities / code that will be done away with
once:
1) abstract DMA op's are added
2) memref deferencing side-effect / trait is available on op's
3) b/117159533 is resolved (memref index computation slices).
PiperOrigin-RevId: 215831373
2019-03-29 13:23:19 -07:00
Uday Bondhugula 041817a45e Introduce loop body skewing / loop pipelining / loop shifting utility.
- loopBodySkew shifts statements of a loop body by stmt-wise delays, and is
  typically meant to be used to:
  - allow overlap of non-blocking start/wait until completion operations with
    other computation
  - allow shifting of statements (for better register
    reuse/locality/parallelism)
  - software pipelining (when applied to the innermost loop)
- an additional argument specifies whether to unroll the prologue and epilogue.
- add method to check SSA dominance preservation.
- add a fake loop pipeline pass to test this utility.

Sample input/output are below. While on this, fix/add following:

- fix minor bug in getAddMulPureAffineExpr
- add additional builder methods for common affine map cases
- fix const_operand_iterator's for ForStmt, etc. When there is no such thing
  as 'const MLValue', the iterator shouldn't be returning const MLValue's.
  Returning MLValue is const correct.

Sample input/output examples:

1) Simplest case: shift second statement by one.

Input:

for %i = 0 to 7 {
  %y = "foo"(%i) : (affineint) -> affineint
  %x = "bar"(%i) : (affineint) -> affineint
}

Output:

#map0 = (d0) -> (d0 - 1)
mlfunc @loop_nest_simple1() {
  %c8 = constant 8 : affineint
  %c0 = constant 0 : affineint
  %0 = "foo"(%c0) : (affineint) -> affineint
  for %i0 = 1 to 7 {
    %1 = "foo"(%i0) : (affineint) -> affineint
    %2 = affine_apply #map0(%i0)
    %3 = "bar"(%2) : (affineint) -> affineint
  }
  %4 = affine_apply #map0(%c8)
  %5 = "bar"(%4) : (affineint) -> affineint
  return
}

2) DMA overlap: shift dma.wait and compute by one.

Input
  for %i = 0 to 7 {
    %pingpong = affine_apply (d0) -> (d0 mod 2) (%i)
    "dma.enqueue"(%pingpong) : (affineint) -> affineint
    %pongping = affine_apply (d0) -> (d0 mod 2) (%i)
    "dma.wait"(%pongping) : (affineint) -> affineint
    "compute1"(%pongping) : (affineint) -> affineint
  }

Output

#map0 = (d0) -> (d0 mod 2)
#map1 = (d0) -> (d0 - 1)
#map2 = ()[s0] -> (s0 + 7)
mlfunc @loop_nest_dma() {
  %c8 = constant 8 : affineint
  %c0 = constant 0 : affineint
  %0 = affine_apply #map0(%c0)
  %1 = "dma.enqueue"(%0) : (affineint) -> affineint
  for %i0 = 1 to 7 {
    %2 = affine_apply #map0(%i0)
    %3 = "dma.enqueue"(%2) : (affineint) -> affineint
    %4 = affine_apply #map1(%i0)
    %5 = affine_apply #map0(%4)
    %6 = "dma.wait"(%5) : (affineint) -> affineint
    %7 = "compute1"(%5) : (affineint) -> affineint
  }
  %8 = affine_apply #map1(%c8)
  %9 = affine_apply #map0(%8)
  %10 = "dma.wait"(%9) : (affineint) -> affineint
  %11 = "compute1"(%9) : (affineint) -> affineint
  return
}

3) With arbitrary affine bound maps:

Shift last two statements by two.

Input:

  for %i = %N to ()[s0] -> (s0 + 7)()[%N] {
    %y = "foo"(%i) : (affineint) -> affineint
    %x = "bar"(%i) : (affineint) -> affineint
    %z = "foo_bar"(%i) : (affineint) -> (affineint)
    "bar_foo"(%i) : (affineint) -> (affineint)
  }

Output

#map0 = ()[s0] -> (s0 + 1)
#map1 = ()[s0] -> (s0 + 2)
#map2 = ()[s0] -> (s0 + 7)
#map3 = (d0) -> (d0 - 2)
#map4 = ()[s0] -> (s0 + 8)
#map5 = ()[s0] -> (s0 + 9)

  for %i0 = %arg0 to #map0()[%arg0] {
    %0 = "foo"(%i0) : (affineint) -> affineint
    %1 = "bar"(%i0) : (affineint) -> affineint
  }
  for %i1 = #map1()[%arg0] to #map2()[%arg0] {
    %2 = "foo"(%i1) : (affineint) -> affineint
    %3 = "bar"(%i1) : (affineint) -> affineint
    %4 = affine_apply #map3(%i1)
    %5 = "foo_bar"(%4) : (affineint) -> affineint
    %6 = "bar_foo"(%4) : (affineint) -> affineint
  }
  for %i2 = #map4()[%arg0] to #map5()[%arg0] {
    %7 = affine_apply #map3(%i2)
    %8 = "foo_bar"(%7) : (affineint) -> affineint
    %9 = "bar_foo"(%7) : (affineint) -> affineint
  }

4) Shift one by zero, second by one, third by two

  for %i = 0 to 7 {
    %y = "foo"(%i) : (affineint) -> affineint
    %x = "bar"(%i) : (affineint) -> affineint
    %z = "foobar"(%i) : (affineint) -> affineint
  }

#map0 = (d0) -> (d0 - 1)
#map1 = (d0) -> (d0 - 2)
#map2 = ()[s0] -> (s0 + 7)

  %c9 = constant 9 : affineint
  %c8 = constant 8 : affineint
  %c1 = constant 1 : affineint
  %c0 = constant 0 : affineint
  %0 = "foo"(%c0) : (affineint) -> affineint
  %1 = "foo"(%c1) : (affineint) -> affineint
  %2 = affine_apply #map0(%c1)
  %3 = "bar"(%2) : (affineint) -> affineint
  for %i0 = 2 to 7 {
    %4 = "foo"(%i0) : (affineint) -> affineint
    %5 = affine_apply #map0(%i0)
    %6 = "bar"(%5) : (affineint) -> affineint
    %7 = affine_apply #map1(%i0)
    %8 = "foobar"(%7) : (affineint) -> affineint
  }
  %9 = affine_apply #map0(%c8)
  %10 = "bar"(%9) : (affineint) -> affineint
  %11 = affine_apply #map1(%c8)
  %12 = "foobar"(%11) : (affineint) -> affineint
  %13 = affine_apply #map1(%c9)
  %14 = "foobar"(%13) : (affineint) -> affineint

5) SSA dominance violated; no shifting if a shift is specified for the second
statement.

  for %i = 0 to 7 {
    %x = "foo"(%i) : (affineint) -> affineint
    "bar"(%x) : (affineint) -> affineint
  }

PiperOrigin-RevId: 214975731
2019-03-29 13:21:26 -07:00