Commit Graph

377 Commits

Author SHA1 Message Date
MLIR Team fe490043b0 Affine map composition.
*) Implements AffineValueMap forward substitution for AffineApplyOps.
*) Adds ComposeAffineMaps transformation pass, which composes affine maps for all loads/stores in an MLFunction.
*) Adds multiple affine map composition tests.

PiperOrigin-RevId: 216216446
2019-03-29 13:24:59 -07:00
Chris Lattner d2d89cbc19 Rename affineint type to index type. The name 'index' may not be perfect, but is better than the old name. Here is some justification:
1) affineint (as it is named) is not a type suitable for general computation (e.g. the multiply/adds in an integer matmul).  It has undefined width and is undefined on overflow.  They are used as the indices for forstmt because they are intended to be used as indexes inside the loop.

2) It can be used in both cfg and ml functions, and in cfg functions.  As you mention, “symbols” are not affine, and we use affineint values for symbols.

3) Integers aren’t affine, the algorithms applied to them can be. :)

4) The only suitable use for affineint in MLIR is for indexes and dimension sizes (i.e. the bounds of those indexes).

PiperOrigin-RevId: 216057974
2019-03-29 13:24:16 -07:00
Uday Bondhugula d18ae9e2c7 Constant folding for loop bounds.
- Fold the lower/upper bound of a loop to a constant whenever the result of the
  application of the bound's affine map on the operand list yields a constant.

- Update/complete 'for' stmt's API to set lower/upper bounds with operands.
  Resolve TODOs for ForStmt::set{Lower,Upper}Bound.

- Moved AffineExprConstantFolder into AffineMap.cpp and added
  AffineMap::constantFold to be used by both AffineApplyOp and
  ForStmt::constantFoldBound.

PiperOrigin-RevId: 215997346
2019-03-29 13:24:01 -07:00
Chris Lattner 6822c4e29c Implement support for constant folding operations even when their operands are
not all constant.  Implement support for folding dim, x*0, and affine_apply.

PiperOrigin-RevId: 215917432
2019-03-29 13:23:32 -07:00
Uday Bondhugula 6cfdb756b1 Introduce memref replacement/rewrite support: to replace an existing memref
with a new one (of a potentially different rank/shape) with an optional index
remapping.

- introduce Utils::replaceAllMemRefUsesWith
- use this for DMA double buffering

(This CL also adds a few temporary utilities / code that will be done away with
once:
1) abstract DMA op's are added
2) memref deferencing side-effect / trait is available on op's
3) b/117159533 is resolved (memref index computation slices).
PiperOrigin-RevId: 215831373
2019-03-29 13:23:19 -07:00
Feng Liu 7d016fd352 Add support to Add, Sub, Mul for both Integer and Float types.
The new operations are registered and also the const folding of them are implemented.

PiperOrigin-RevId: 215575999
2019-03-29 13:21:40 -07:00
Uday Bondhugula 041817a45e Introduce loop body skewing / loop pipelining / loop shifting utility.
- loopBodySkew shifts statements of a loop body by stmt-wise delays, and is
  typically meant to be used to:
  - allow overlap of non-blocking start/wait until completion operations with
    other computation
  - allow shifting of statements (for better register
    reuse/locality/parallelism)
  - software pipelining (when applied to the innermost loop)
- an additional argument specifies whether to unroll the prologue and epilogue.
- add method to check SSA dominance preservation.
- add a fake loop pipeline pass to test this utility.

Sample input/output are below. While on this, fix/add following:

- fix minor bug in getAddMulPureAffineExpr
- add additional builder methods for common affine map cases
- fix const_operand_iterator's for ForStmt, etc. When there is no such thing
  as 'const MLValue', the iterator shouldn't be returning const MLValue's.
  Returning MLValue is const correct.

Sample input/output examples:

1) Simplest case: shift second statement by one.

Input:

for %i = 0 to 7 {
  %y = "foo"(%i) : (affineint) -> affineint
  %x = "bar"(%i) : (affineint) -> affineint
}

Output:

#map0 = (d0) -> (d0 - 1)
mlfunc @loop_nest_simple1() {
  %c8 = constant 8 : affineint
  %c0 = constant 0 : affineint
  %0 = "foo"(%c0) : (affineint) -> affineint
  for %i0 = 1 to 7 {
    %1 = "foo"(%i0) : (affineint) -> affineint
    %2 = affine_apply #map0(%i0)
    %3 = "bar"(%2) : (affineint) -> affineint
  }
  %4 = affine_apply #map0(%c8)
  %5 = "bar"(%4) : (affineint) -> affineint
  return
}

2) DMA overlap: shift dma.wait and compute by one.

Input
  for %i = 0 to 7 {
    %pingpong = affine_apply (d0) -> (d0 mod 2) (%i)
    "dma.enqueue"(%pingpong) : (affineint) -> affineint
    %pongping = affine_apply (d0) -> (d0 mod 2) (%i)
    "dma.wait"(%pongping) : (affineint) -> affineint
    "compute1"(%pongping) : (affineint) -> affineint
  }

Output

#map0 = (d0) -> (d0 mod 2)
#map1 = (d0) -> (d0 - 1)
#map2 = ()[s0] -> (s0 + 7)
mlfunc @loop_nest_dma() {
  %c8 = constant 8 : affineint
  %c0 = constant 0 : affineint
  %0 = affine_apply #map0(%c0)
  %1 = "dma.enqueue"(%0) : (affineint) -> affineint
  for %i0 = 1 to 7 {
    %2 = affine_apply #map0(%i0)
    %3 = "dma.enqueue"(%2) : (affineint) -> affineint
    %4 = affine_apply #map1(%i0)
    %5 = affine_apply #map0(%4)
    %6 = "dma.wait"(%5) : (affineint) -> affineint
    %7 = "compute1"(%5) : (affineint) -> affineint
  }
  %8 = affine_apply #map1(%c8)
  %9 = affine_apply #map0(%8)
  %10 = "dma.wait"(%9) : (affineint) -> affineint
  %11 = "compute1"(%9) : (affineint) -> affineint
  return
}

3) With arbitrary affine bound maps:

Shift last two statements by two.

Input:

  for %i = %N to ()[s0] -> (s0 + 7)()[%N] {
    %y = "foo"(%i) : (affineint) -> affineint
    %x = "bar"(%i) : (affineint) -> affineint
    %z = "foo_bar"(%i) : (affineint) -> (affineint)
    "bar_foo"(%i) : (affineint) -> (affineint)
  }

Output

#map0 = ()[s0] -> (s0 + 1)
#map1 = ()[s0] -> (s0 + 2)
#map2 = ()[s0] -> (s0 + 7)
#map3 = (d0) -> (d0 - 2)
#map4 = ()[s0] -> (s0 + 8)
#map5 = ()[s0] -> (s0 + 9)

  for %i0 = %arg0 to #map0()[%arg0] {
    %0 = "foo"(%i0) : (affineint) -> affineint
    %1 = "bar"(%i0) : (affineint) -> affineint
  }
  for %i1 = #map1()[%arg0] to #map2()[%arg0] {
    %2 = "foo"(%i1) : (affineint) -> affineint
    %3 = "bar"(%i1) : (affineint) -> affineint
    %4 = affine_apply #map3(%i1)
    %5 = "foo_bar"(%4) : (affineint) -> affineint
    %6 = "bar_foo"(%4) : (affineint) -> affineint
  }
  for %i2 = #map4()[%arg0] to #map5()[%arg0] {
    %7 = affine_apply #map3(%i2)
    %8 = "foo_bar"(%7) : (affineint) -> affineint
    %9 = "bar_foo"(%7) : (affineint) -> affineint
  }

4) Shift one by zero, second by one, third by two

  for %i = 0 to 7 {
    %y = "foo"(%i) : (affineint) -> affineint
    %x = "bar"(%i) : (affineint) -> affineint
    %z = "foobar"(%i) : (affineint) -> affineint
  }

#map0 = (d0) -> (d0 - 1)
#map1 = (d0) -> (d0 - 2)
#map2 = ()[s0] -> (s0 + 7)

  %c9 = constant 9 : affineint
  %c8 = constant 8 : affineint
  %c1 = constant 1 : affineint
  %c0 = constant 0 : affineint
  %0 = "foo"(%c0) : (affineint) -> affineint
  %1 = "foo"(%c1) : (affineint) -> affineint
  %2 = affine_apply #map0(%c1)
  %3 = "bar"(%2) : (affineint) -> affineint
  for %i0 = 2 to 7 {
    %4 = "foo"(%i0) : (affineint) -> affineint
    %5 = affine_apply #map0(%i0)
    %6 = "bar"(%5) : (affineint) -> affineint
    %7 = affine_apply #map1(%i0)
    %8 = "foobar"(%7) : (affineint) -> affineint
  }
  %9 = affine_apply #map0(%c8)
  %10 = "bar"(%9) : (affineint) -> affineint
  %11 = affine_apply #map1(%c8)
  %12 = "foobar"(%11) : (affineint) -> affineint
  %13 = affine_apply #map1(%c9)
  %14 = "foobar"(%13) : (affineint) -> affineint

5) SSA dominance violated; no shifting if a shift is specified for the second
statement.

  for %i = 0 to 7 {
    %x = "foo"(%i) : (affineint) -> affineint
    "bar"(%x) : (affineint) -> affineint
  }

PiperOrigin-RevId: 214975731
2019-03-29 13:21:26 -07:00
Uday Bondhugula 591fa9698e Change behavior of loopUnrollFull with unroll factor 1
Using loopUnrollFull with unroll factor 1 should promote the loop body as
opposed to doing nothing.

PiperOrigin-RevId: 214812126
2019-03-29 13:20:59 -07:00
Nicolas Vasilache 54e5b4b4c0 [MLIR] Fix AsmPrinter for short-hand bound notation
This CL retricts shorthand notation printing to only the bounds that can
be roundtripped unambiguously; i.e.:
1. ()[]->(%some_cst) ()[]
2. ()[s0]->(s0) ()[%some_symbol]

Upon inspection it turns out that the constant case was lossy so this CL also
updates it.

Note however that fixing this issue exhibits a potential issues in unroll.mlir.
L488 exhibits a map ()[s0] -> (1)()[%arg0] which could be simplified down to
()[]->(1)()[].
This does not seem like a bug but maybe an undesired complexity in the maps
generated by unrolling.
bondhugula@, care to take a look?

PiperOrigin-RevId: 214531410
2019-03-29 13:19:04 -07:00
MLIR Team 99188b9d98 Adds constant folding hook for AffineApplyOp.
PiperOrigin-RevId: 214287780
2019-03-29 13:18:19 -07:00
Chris Lattner 82eb284a53 Implement support for constant folding operations and a simple constant folding
optimization pass:

 - Give the ability for operations to implement a constantFold hook (a simple
   one for single-result ops as well as general support for multi-result ops).
 - Implement folding support for constant and addf.
 - Implement support in AbstractOperation and Operation to make this usable by
   clients.
 - Implement a very simple constant folding pass that does top down folding on
   CFG and ML functions, with a testcase that exercises all the above stuff.

Random cleanups:
 - Improve the build APIs for ConstantOp.
 - Stop passing "-o -" to mlir-opt in the testsuite, since that is the default.

PiperOrigin-RevId: 213749809
2019-03-29 13:16:33 -07:00
Uday Bondhugula ab4797229c Extend loop unroll/unroll-and-jam to affine bounds + refactor related code.
- extend loop unroll-jam similar to loop unroll for affine bounds
- extend both loop unroll/unroll-jam to deal with cleanup loop for non multiple
  of unroll factor.
- extend promotion of single iteration loops to work with affine bounds
- fix typo bugs in loop unroll
- refactor common code b/w loop unroll and loop unroll-jam
- move prototypes of non-pass transforms to LoopUtils.h
- add additional builder methods.
- introduce loopUnrollUpTo(factor) to unroll by either factor or trip count,
  whichever is less.
- remove Statement::isInnermost (not used for now - will come back at the right
  place/in right form later)

PiperOrigin-RevId: 213471227
2019-03-29 13:15:06 -07:00
Uday Bondhugula 64812a56c7 Extend getConstantTripCount to deal with a larger subset of loop bounds; make loop
unroll/unroll-and-jam more powerful; add additional affine expr builder methods

- use previously added analysis/simplification to infer multiple of unroll
  factor trip counts, making loop unroll/unroll-and-jam more general.

- for loop unroll, support bounds that are single result affine map's with the
  same set of operands. For unknown loop bounds, loop unroll will now work as
  long as trip count can be determined to be a multiple of unroll factor.

- extend getConstantTripCount to deal with single result affine map's with the
  same operands. move it to mlir/Analysis/LoopAnalysis.cpp

- add additional builder utility methods for affine expr arithmetic
  (difference, mod/floordiv/ceildiv w.r.t postitive constant). simplify code to
  use the utility methods.

- move affine analysis routines to AffineAnalysis.cpp/.h from
  AffineStructures.cpp/.h.

- Rename LoopUnrollJam to LoopUnrollAndJam to match class name.

- add an additional simplification for simplifyFloorDiv, simplifyCeilDiv

- Rename AffineMap::getNumOperands() getNumInputs: an affine map by itself does
  not have operands. Operands are passed to it through affine_apply, from loop
  bounds/if condition's, etc., operands are stored in the latter.

This should be sufficiently powerful for now as far as unroll/unroll-and-jam go for TPU
code generation, and can move to other analyses/transformations.

Loop nests like these are now unrolled without any cleanup loop being generated.

  for %i = 1 to 100 {
    // unroll factor 4: no cleanup loop will be generated.
    for %j = (d0) -> (d0) (%i) to (d0) -> (5*d0 + 3) (%i) {
      %x = "foo"(%j) : (affineint) -> i32
    }
  }

  for %i = 1 to 100 {
    // unroll factor 4: no cleanup loop will be generated.
    for %j = (d0) -> (d0) (%i) to (d0) -> (d0 - d mod 4 - 1) (%i) {
      %y = "foo"(%j) : (affineint) -> i32
    }
  }

  for %i = 1 to 100 {
    for %j = (d0) -> (d0) (%i) to (d0) -> (d0 + 128) (%i) {
      %x = "foo"() : () -> i32
    }
  }

TODO(bondhugula): extend this to LoopUnrollAndJam as well in the next CL (with minor
changes).

PiperOrigin-RevId: 212661212
2019-03-29 13:13:00 -07:00
Uday Bondhugula 3bae041e5d Add utility to promote single iteration loops. Add methods for getting constant
loop counts. Improve / refactor loop unroll / loop unroll and jam.

- add utility to remove single iteration loops.
- use this utility to promote single iteration loops after unroll/unroll-and-jam
- use loopUnrollByFactor for loopUnrollFull and remove most of the latter.
- add methods for getting constant loop trip count

PiperOrigin-RevId: 212039569
2019-03-29 13:11:21 -07:00
Uday Bondhugula d5416f299e Complete AffineExprFlattener based simplification for floordiv/ceildiv.
- handle floordiv/ceildiv in AffineExprFlattener; update the simplification to
  work even if mod/floordiv/ceildiv expressions appearing in the tree can't be eliminated.
- refactor the flattening / analysis to move it out of lib/Transforms/
- fix MutableAffineMap::isMultipleOf
- add AffineBinaryOpExpr:getAdd/getMul/... utility methods

PiperOrigin-RevId: 211540536
2019-03-29 13:09:18 -07:00
Uday Bondhugula 0122a99cbb Affine expression analysis and simplification.
Outside of IR/
- simplify a MutableAffineMap by flattening the affine expressions
- add a simplify affine expression pass that uses this analysis
- update the FlatAffineConstraints API (to be used in the next CL)

In IR:
- add isMultipleOf and getKnownGCD for AffineExpr, and make the in-IR
  simplication of simplifyMod simpler and more powerful.
- rename the AffineExpr visitor methods to distinguish b/w visiting and
  walking, and to simplify API names based on context.

The next CL will use some of these for the loop unrolling/unroll-jam to make
the detection for the need of cleanup loop powerful/non-trivial.

A future CL will finally move this simplification to FlatAffineConstraints to
make it more powerful. For eg., currently, even if a mod expr appearing in a
part of the expression tree can't be simplified, the whole thing won't be
simplified.

PiperOrigin-RevId: 211012256
2019-03-29 13:07:44 -07:00
Uday Bondhugula e9fb4b492d Introduce loop unroll jam transformation.
- for test purposes, the unroll-jam pass unroll jams the first outermost loop.

While on this:
- fix StmtVisitor to allow overriding of function to iterate walk over children
  of a stmt.

PiperOrigin-RevId: 210644813
2019-03-29 13:07:30 -07:00
Uday Bondhugula 00bed4bd99 Extend loop unrolling to unroll by a given factor; add builder for affine
apply op.

- add builder for AffineApplyOp (first one for an operation that has
  non-zero operands)
- add support for loop unrolling by a given factor; uses the affine apply op
  builder.

While on this, change 'step' of ForStmt to be 'unsigned' instead of
AffineConstantExpr *. Add setters for ForStmt lb, ub, step.

Sample Input:

// CHECK-LABEL: mlfunc @loop_nest_unroll_cleanup() {
mlfunc @loop_nest_unroll_cleanup() {
  for %i = 1 to 100 {
    for %j = 0 to 17 {
      %x = "addi32"(%j, %j) : (affineint, affineint) -> i32
      %y = "addi32"(%x, %x) : (i32, i32) -> i32
    }
  }
  return
}

Output:

$ mlir-opt -loop-unroll -unroll-factor=4 /tmp/single2.mlir
#map0 = (d0) -> (d0 + 1)
#map1 = (d0) -> (d0 + 2)
#map2 = (d0) -> (d0 + 3)
mlfunc @loop_nest_unroll_cleanup() {
  for %i0 = 1 to 100 {
    for %i1 = 0 to 17 step 4 {
      %0 = "addi32"(%i1, %i1) : (affineint, affineint) -> i32
      %1 = "addi32"(%0, %0) : (i32, i32) -> i32
      %2 = affine_apply #map0(%i1)
      %3 = "addi32"(%2, %2) : (affineint, affineint) -> i32
      %4 = affine_apply #map1(%i1)
      %5 = "addi32"(%4, %4) : (affineint, affineint) -> i32
      %6 = affine_apply #map2(%i1)
      %7 = "addi32"(%6, %6) : (affineint, affineint) -> i32
    }
    for %i2 = 16 to 17 {
      %8 = "addi32"(%i2, %i2) : (affineint, affineint) -> i32
      %9 = "addi32"(%8, %8) : (i32, i32) -> i32
    }
  }
  return
}

PiperOrigin-RevId: 209676220
2019-03-29 13:03:38 -07:00
Uday Bondhugula 98a24881d3 ShortLoopUnroll - bug fix.
Collect loops through a post order walk instead of a pre-order so that loops
are collected from inner loops are collected before outer surrounding ones.

Add a complex test case.

PiperOrigin-RevId: 209041057
2019-03-29 13:01:22 -07:00
Chris Lattner d6c4c748d7 Escape and unescape strings in the parser and printer so they can roundtrip,
print floating point in a structured form that we know can round trip,
enumerate attributes in the visitor so we print affine mapping attributes
symbolically (the majority of the testcase updates).

We still have an issue where the hexadecimal floating point syntax is reparsed
as an integer, but that can evolve in subsequent patches.

PiperOrigin-RevId: 208828876
2019-03-29 13:00:05 -07:00
Uday Bondhugula d8490d8d4f Loop unrolling pass update
- fix/complete forStmt cloning for unrolling to work for outer loops
- create IV const's only when needed
- test outer loop unrolling by creating a short trip count unroll pass for
  loops with trip counts <= <parameter>
- add unrolling test cases for multiple op results, outer loop unrolling
- fix/clean up StmtWalker class while on this
- switch unroll loop iterator values from i32 to affineint

PiperOrigin-RevId: 207645967
2019-03-29 12:56:16 -07:00
Tatiana Shpeisman a0a6414ca2 Implement ML function arguments. Add representation for argument list in ML Function using TrailingObjects template. Implement argument iterators, parsing and printing.
Unrelated minor change - remove OperationStmt::dropReferences(). Since MLFunction does not have cyclic operand references (it's an AST) destruction can be safely done w/o a special pass to drop references.

PiperOrigin-RevId: 207583024
2019-03-29 12:55:47 -07:00
Uday Bondhugula 65b6e73245 Loop unrolling update.
- deal with non-operation stmt's (if/for stmt's) in loops being unrolled
  (unrolling of non-innermost loops works).
- update uses in unrolled bodies to use results of new operations that may be
  introduced in the unrolled bodies.

Unrolling now works for all kinds of loop nests - perfect nests, imperfect
nests, loops at any depth, and with any kind of operation in the body. (IfStmt
support not done, hence untested there).

Added missing dump/print method for StmtBlock.

TODO: add test case for outer loop unrolling.
PiperOrigin-RevId: 207314286
2019-03-29 12:55:19 -07:00
Uday Bondhugula 2a003256ae MLStmt cloning and IV replacement for loop unrolling, add constant pool to
MLFunctions.

- MLStmt cloning and IV replacement
- While at this, fix the innermostLoopGatherer to actually gather all the
  innermost loops (it was stopping its walk at the first innermost loop it
  found)
- Improve comments for MLFunction statement classes, fix inheritance order.

- Fixed StmtBlock destructor.

PiperOrigin-RevId: 207049173
2019-03-29 12:53:02 -07:00
Tatiana Shpeisman c8b0273f19 Implement induction variables. Pretty print induction variable operands as %i<ssa value number>. Add support for future pretty printing of ML function arguments as %arg<ssa value number>.
Induction variables are implemented by inheriting ForStmt from MLValue. ForStmt provides APIs that make this design decision invisible to the ForStmt users.

This CL in combination with cl/206253643 resolves  http://b/111769060.

PiperOrigin-RevId: 206655937
2019-03-29 12:49:36 -07:00
Uday Bondhugula a0abd666a7 Sketch out loop unrolling transformation.
- Implement a full loop unroll for innermost loops.
- Use it to implement a pass that unroll all the innermost loops of all
  mlfunction's in a module. ForStmt's parsed currently have constant trip
  counts (and constant loop bounds).
- Implement StmtVisitor based (Visitor pattern)

Loop IVs aren't currently parsed and represented as SSA values. Replacing uses
of loop IVs in unrolled bodies is thus a TODO. Class comments are sparse at some places - will add them after one round of comments.

A cmd-line flag triggers this for now.

Original:

mlfunc @loops() {
  for x = 1 to 100 step 2 {
    for x = 1 to 4 {
      "Const"(){value: 1} : () -> ()
    }
  }
  return
}

After unrolling:

mlfunc @loops() {
  for x = 1 to 100 step 2 {
    "Const"(){value: 1} : () -> ()
    "Const"(){value: 1} : () -> ()
    "Const"(){value: 1} : () -> ()
    "Const"(){value: 1} : () -> ()
  }
  return
}

PiperOrigin-RevId: 205933235
2019-03-29 12:43:01 -07:00
Tatiana Shpeisman 1b24c48b91 Scaffolding for convertToCFG pass that replaces all instances of ML functions with equivalent CFG functions. Traverses module MLIR, generates CFG functions (empty for now) and removes ML functions. Adds Transforms library and tests.
PiperOrigin-RevId: 205848367
2019-03-29 12:41:15 -07:00