Commit Graph

33 Commits

Author SHA1 Message Date
Uday Bondhugula 89c41fdca1 FlatAffineConstraints::composeMap: return failure instead of asserting on semi-affine maps
FlatAffineConstraints::composeMap: should return false instead of asserting on
a semi-affine map. Make getMemRefRegion just propagate false when encountering
semi-affine maps (instead of crashing!)
PiperOrigin-RevId: 223828743
2019-03-29 14:14:56 -07:00
Uday Bondhugula 2631b155a9 Fix bugs in DMA generation and FlatAffineConstraints; add more test
cases.

- fix bug in calculating index expressions for DMA buffers in certain cases
  (affected tiled loop nests); add more test cases for better coverage.
- introduce an additional optional argument to replaceAllMemRefUsesWith;
  additional operands to the index remap AffineMap can now be supplied by the
  client.
- FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound,
  ::composeMap - fix position bug.
- Some clean up and more comments

PiperOrigin-RevId: 222434628
2019-03-29 14:07:31 -07:00
Uday Bondhugula fff1efbaf5 Updates to transformation/analysis passes/utilities. Update DMA generation pass
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.

- add support for getMemRefRegion symbolic in outer loops - hence support for
  DMAs symbolic in outer surrounding loops.

- add DMA generation support for outgoing DMAs (store op's to lower memory
  space); extend getMemoryRegion to store op's. -memref-bound-check now works
  with store op's as well.

- fix dma-generate (references to the old memref in the dma_start op were also
  being replaced with the new buffer); we need replace all memref uses to work
  only on a subset of the uses - add a new optional argument for
  replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional
  'operation' argument to serve as a filter - if provided, only those uses that
  are dominated by the filter are replaced.

- Add missing print for attributes for dma_start, dma_wait op's.

- update the FlatAffineConstraints API

PiperOrigin-RevId: 221889223
2019-03-29 14:00:51 -07:00
MLIR Team cd051dc634 Bug fixes in FlatAffineConstraints. Tests cases that discovered these in follow up CL on memref dependence checks.
PiperOrigin-RevId: 220632386
2019-03-29 13:51:47 -07:00
Uday Bondhugula 6cd5d5c544 Introduce loop tiling code generation (hyper-rectangular case)
- simple perfectly nested band tiling with fixed tile sizes.
- only the hyper-rectangular case is handled, with other limitations of
  getIndexSet applying (constant loop bounds, etc.);  once
  the latter utility is extended, tiled code generation should become more
  general.
- Add FlatAffineConstraints::isHyperRectangular()

PiperOrigin-RevId: 220324933
2019-03-29 13:49:05 -07:00
Uday Bondhugula 4269a01863 Clean up memref dep check utilities; update FlatAffineConstraints API, add
simple utility methods.

- clean up some of the analysis utilities used by memref dep checking
- add additional asserts / comments at places in analysis utilities
- add additional simple methods to the FlatAffineConstraints API.

PiperOrigin-RevId: 220124523
2019-03-29 13:48:35 -07:00
Uday Bondhugula 74c62c8ce0 Complete memref bound checker for arbitrary affine expressions. Handle local
variables from mod's and div's when converting to flat form.

- propagate mod, floordiv, ceildiv / local variables constraint information
  when flattening affine expressions and converting them into flat affine
  constraints; resolve multiple TODOs.
- enables memref bound checker to work with arbitrary affine expressions
- update FlatAffineConstraints API with several new methods
- test/exercise functionality mostly through -memref-bound-check
- other analyses such as dependence tests, etc. should now be able to work in the
  presence of any affine composition of add, mul, floor, ceil, mod.

PiperOrigin-RevId: 219711806
2019-03-29 13:47:29 -07:00
MLIR Team f28e4df666 Adds a dependence check to test whether two accesses to the same memref access the same element.
- Builds access functions and iterations domains for each access.
- Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds.
- Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses.
- Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops.
- The LoopFusion pass will be extended in subsequent CLs.

PiperOrigin-RevId: 219630898
2019-03-29 13:47:13 -07:00
Uday Bondhugula 8201e19e3d Introduce memref bound checking.
Introduce analysis to check memref accesses (in MLFunctions) for out of bound
ones. It works as follows:

$ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir

/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2
      %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
           ^
/tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
      %y = load %B[%idy] : memref<128 x i32>
           ^
/tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
      %y = load %B[%idy] : memref<128 x i32>
           ^
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0 * 128 - d1)
mlfunc @test() {
  %0 = alloc() : memref<9x9xi32>
  %1 = alloc() : memref<128xi32>
  for %i0 = -1 to 9 {
    for %i1 = -1 to 9 {
      %2 = affine_apply #map0(%i0, %i1)
      %3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32>
      %4 = affine_apply #map1(%i0, %i1)
      %5 = load %1[%4] : memref<128xi32>
    }
  }
  return
}

- Improves productivity while manually / semi-automatically developing MLIR for
  testing / prototyping; also provides an indirect way to catch errors in
  transformations.

- This pass is an easy way to test the underlying affine analysis
  machinery including low level routines.

Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256.

While on this:

- create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/

- fix a bug in AffineAnalysis.cpp::toAffineExpr

TODO: extend to non-constant loop bounds (straightforward). Will transparently
work for all accesses once floordiv, mod, ceildiv are supported in the
AffineMap -> FlatAffineConstraints conversion.
PiperOrigin-RevId: 219397961
2019-03-29 13:46:08 -07:00
Uday Bondhugula c5128e152a FlatAffineConstraints API update - additional methods
- add methods addConstantLowerBound, addConstantUpperBound, setIdToConstant,
  addDimsForMap
- update coefficient storage to use numReservedCols * rows instead of numCols *
  rows (makes the code simpler/natural; reduces movement of data when new
  columns are added, eliminates movement of data when columns are added to the
  end).

(addDimsForMap is tested in the child CL on memref bound checking: cl/219000460)

PiperOrigin-RevId: 219358376
2019-03-29 13:45:14 -07:00
Uday Bondhugula 1ec77cecf2 FourierMotzkinEliminate trivial bug fix
PiperOrigin-RevId: 219148982
2019-03-29 13:43:30 -07:00
River Riddle 6e6e40ae79 Move AffineMap.h/IntegerSet.h from Attributes.h to AttributeDetail.h where they belong.
PiperOrigin-RevId: 218806426
2019-03-29 13:41:05 -07:00
MLIR Team 13f6cc0187 Run GCD test before elimination. Adds test case with rational solutions, but no integer solutions.
PiperOrigin-RevId: 218772332
2019-03-29 13:39:34 -07:00
Uday Bondhugula 80610c2f49 Introduce Fourier-Motzkin variable elimination + other cleanup/support
- Introduce Fourier-Motzkin variable elimination to eliminate a dimension from
  a system of linear equalities/inequalities. Update isEmpty to use this.
  Since FM is only exact on rational/real spaces, an emptiness check based on
  this is guaranteed to be exact whenever it says the underlying set is empty;
  if it says, it's not empty, there may still be no integer points in it.
  Also, supports a version that computes "dark shadows".

- Test this by checking for "always false" conditionals in if statements.

- Unique IntegerSet's that are small (few constraints, few variables). This
  basically means the canonical empty set and other small sets that are
  likely commonly used get uniqued; allows checking for the canonical empty set
  by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint size
  for uniqui'ing.

- rename simplify-affine-expr -> simplify-affine-structures

Other cleanup

- IntegerSet::numConstraints, AffineMap::numResults are no longer needed;
  remove them.
- add copy assignment operators for AffineMap, IntegerSet.
- rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet
- Misc cleanup for FlatAffineConstraints API

PiperOrigin-RevId: 218690456
2019-03-29 13:38:24 -07:00
MLIR Team 5413239350 Adds Gaussian Elimination to FlatAffineConstraints.
- Adds FlatAffineConstraints::isEmpty method to test if there are no solutions to the system.
- Adds GCD test check if equality constraints have no solution.
- Adds unit test cases.

PiperOrigin-RevId: 218546319
2019-03-29 13:38:10 -07:00
Nicolas Vasilache 3013dadb7c [MLIR] Basic infrastructure for vectorization test
This CL implements a very simple loop vectorization **test** and the basic
infrastructure to support it.

The test simply consists in:
1. matching the loops in the MLFunction and all the Load/Store operations
nested under the loop;
2. testing whether all the Load/Store are contiguous along the innermost
memory dimension along that particular loop. If any reference is
non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then
the loop is not-vectorizable.

The simple test above can gradually be extended with more interesting
behaviors to account for the fact that a layout permutation may exist that
enables contiguity etc. All these will come in due time but it is worthwhile
noting that the test already supports detection of outer-vetorizable loops.

In implementing this test, I also added a recursive MLFunctionMatcher and some
sugar that can capture patterns
such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating
on the matched IR structures. For now it just uses in order traversal but
post-order DFS will be useful in the future once IR rewrites start occuring.

One may note that the memory management design decision follows a different
pattern from MLIR. After evaluating different designs and how they quickly
increase cognitive overhead, I decided to opt for the simplest solution in my
view: a class-wide (threadsafe) RAII context.

This way, a pass that needs MLFunctionMatcher can just have its own locally
scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed.
If passes are expected to have a longer lifetime, then the contexts can easily
be scoped inside the runOnMLFunction call and storage lifetime reduced.
Lastly, whatever the scope of threading (module, function, pass), this is
expected to also be future-proof wrt concurrency (but this is a detail atm).

PiperOrigin-RevId: 217622889
2019-03-29 13:32:13 -07:00
MLIR Team 0114e232d8 Adds method to AffineApplyOp which forward substitutes its results into any of its users which are also AffineApplyOps.
Updates ComposeAffineMaps test pass to use this method.
Updates affine map composition test cases to handle the new pass, which can be reused when this method is used in a future instruction combine pass.

PiperOrigin-RevId: 217163351
2019-03-29 13:30:49 -07:00
Jacques Pienaar 764fd035b0 Split BuiltinOps out of StandardOps.
* Move Return, Constant and AffineApply out into BuiltinOps;
* BuiltinOps are always registered, while StandardOps follow the same dynamic registration;
* Kept isValidX in MLValue as we don't have a verify on AffineMap so need to keep it callable from Parser (I wanted to move it to be called in verify instead);

PiperOrigin-RevId: 216592527
2019-03-29 13:28:12 -07:00
Nicolas Vasilache b04f881dcb [MLIR] IntegerSet value type
This CL applies the same pattern as AffineMap to IntegerSet: a simple struct
that acts as the storage is allocated in the bump pointer. The IntegerSet is
immutable and accessed everywhere by value.

Note that unlike AffineMap, it is not possible to remove the MLIRContext
parameter when constructing an IntegerSet for now. One possible way to achieve
this would be to add an enum to distinguish between the mathematically empty
set, the universe set and other sets.

This is left for future discussion.

PiperOrigin-RevId: 216545361
2019-03-29 13:27:19 -07:00
Nicolas Vasilache 1d3e7e2616 [MLIR] AffineMap value type
This CL applies the same pattern as AffineExpr to AffineMap: a simple struct
that acts as the storage is allocated in the bump pointer. The AffineMap is
immutable and accessed everywhere by value.

PiperOrigin-RevId: 216445930
2019-03-29 13:26:24 -07:00
Nicolas Vasilache 8ebb6ff171 [MLIR] Sketch AffineExpr value type
This CL sketches what it takes for AffineExpr to fully have by-value semantics
and not be a not-so-smart pointer anymore.

This essentially makes the underyling class a simple storage struct and
implements the operations on the value type directly. Since there is no
forwarding of operations anymore, we can full isolate the storage class and
make a hard visibility barrier by moving detail::AffineExpr into
AffineExprDetail.h.

AffineExprDetail.h is only included where storage-related information is
needed.

PiperOrigin-RevId: 216385459
2019-03-29 13:25:42 -07:00
MLIR Team c386143834 Address comments from previous CL/216216446
PiperOrigin-RevId: 216298139
2019-03-29 13:25:28 -07:00
Nicolas Vasilache 6707c7bea1 [MLIR] AffineExpr final cleanups
This CL:
1. performs the global codemod AffineXExpr->AffineXExprClass and
AffineXExprRef -> AffineXExpr;
2. simplifies function calls by removing the redundant MLIRContext parameter;
3. adds missing binary operator versions of scalar op AffineExpr where it
makes sense.

PiperOrigin-RevId: 216242674
2019-03-29 13:25:14 -07:00
MLIR Team fe490043b0 Affine map composition.
*) Implements AffineValueMap forward substitution for AffineApplyOps.
*) Adds ComposeAffineMaps transformation pass, which composes affine maps for all loads/stores in an MLFunction.
*) Adds multiple affine map composition tests.

PiperOrigin-RevId: 216216446
2019-03-29 13:24:59 -07:00
Nicolas Vasilache ce2edea135 [MLIR] Cleanup AffineExpr
This CL introduces a series of cleanups for AffineExpr value types:
1. to make it clear that the value types should be used, the pointer
AffineExpr types are put in the detail namespace. Unfortunately, since the
value type operator-> only forwards to the underlying pointer type, we
still
need to expose this in the include file for now;
2. AffineExprKind is ok to use, it thus comes out of detail and thus of
AffineExpr
3. getAffineDimExpr, getAffineSymbolExpr, getAffineConstantExpr are
similarly
extracted as free functions and their naming is mande consistent across
Builder, MLContext and AffineExpr
4. AffineBinaryOpEx::simplify functions are made into static free
functions.
In particular it is moved away from AffineMap.cpp where it does not belong
5. operator AffineExprType is made explicit
6. uses the binary operators everywhere possible
7. drops the pointer usage everywhere outside of AffineExpr.cpp,
MLIRContext.cpp and AsmPrinter.cpp

PiperOrigin-RevId: 216207212
2019-03-29 13:24:45 -07:00
Nicolas Vasilache b55b407601 [RFC][MLIR] Use AffineExprRef in place of AffineExpr* in IR
This CL starts by replacing AffineExpr* with value-type AffineExprRef in a few
places in the IR. By a domino effect that is pretty telling of the
inconsistencies in the codebase, const is removed where it makes sense.

The rationale is that the decision was concisously made that unique'd types
have pointer semantics without const specifier. This is fine but we should be
consistent. In the end, the only logical invariant is that there should never
be such a thing as a const AffineExpr*, const AffineMap* or const IntegerSet*
in our codebase.

This CL takes a number of shortcuts to killing const with fire, in particular
forcing const AffineExprRef to return the underlying non-const
AffineExpr*. This will be removed once AffineExpr* has disappeared in
containers but for now such shortcuts allow a bit of sanity in this long quest
for cleanups.

The **only** places where const AffineExpr*, const AffineMap* or const
IntegerSet* may still appear is by transitive needs from containers,
comparison operators etc.

There is still one major thing remaining here: figure out why cast/dyn_cast
return me a const AffineXXX*, which in turn requires a bunch of ugly
const_casts. I suspect this is due to the classof
taking const AffineXXXExpr*. I wonder whether this is a side effect of 1., if
it is coming from llvm itself (I'd doubt it) or something else (clattner@?)

In light of this, the whole discussion about const makes total sense to me now
and I would systematically apply the rule that in the end, we should never
have any const XXX in our codebase for unique'd types (assuming we can remove
them all in containers and no additional constness constraint is added on us
from the outside world).

PiperOrigin-RevId: 215811554
2019-03-29 13:23:05 -07:00
Nicolas Vasilache 544f5e7a9b [MLIR] Remove uses of AffineExpr* outside of IR
This CL uniformizes the uses of AffineExprWrap outside of IR.
The public API of AffineExpr builder is modified to only use AffineExprWrap.
A few places access AffineExprWrap.expr, this is only while the API is in
transition to easily keep track (i.e. make expr private and let the compiler
track the errors).

Parser.cpp exhibits patterns that are dependent on nullptr values so
converting it is left for another CL.

PiperOrigin-RevId: 215642005
2019-03-29 13:22:35 -07:00
Uday Bondhugula 64812a56c7 Extend getConstantTripCount to deal with a larger subset of loop bounds; make loop
unroll/unroll-and-jam more powerful; add additional affine expr builder methods

- use previously added analysis/simplification to infer multiple of unroll
  factor trip counts, making loop unroll/unroll-and-jam more general.

- for loop unroll, support bounds that are single result affine map's with the
  same set of operands. For unknown loop bounds, loop unroll will now work as
  long as trip count can be determined to be a multiple of unroll factor.

- extend getConstantTripCount to deal with single result affine map's with the
  same operands. move it to mlir/Analysis/LoopAnalysis.cpp

- add additional builder utility methods for affine expr arithmetic
  (difference, mod/floordiv/ceildiv w.r.t postitive constant). simplify code to
  use the utility methods.

- move affine analysis routines to AffineAnalysis.cpp/.h from
  AffineStructures.cpp/.h.

- Rename LoopUnrollJam to LoopUnrollAndJam to match class name.

- add an additional simplification for simplifyFloorDiv, simplifyCeilDiv

- Rename AffineMap::getNumOperands() getNumInputs: an affine map by itself does
  not have operands. Operands are passed to it through affine_apply, from loop
  bounds/if condition's, etc., operands are stored in the latter.

This should be sufficiently powerful for now as far as unroll/unroll-and-jam go for TPU
code generation, and can move to other analyses/transformations.

Loop nests like these are now unrolled without any cleanup loop being generated.

  for %i = 1 to 100 {
    // unroll factor 4: no cleanup loop will be generated.
    for %j = (d0) -> (d0) (%i) to (d0) -> (5*d0 + 3) (%i) {
      %x = "foo"(%j) : (affineint) -> i32
    }
  }

  for %i = 1 to 100 {
    // unroll factor 4: no cleanup loop will be generated.
    for %j = (d0) -> (d0) (%i) to (d0) -> (d0 - d mod 4 - 1) (%i) {
      %y = "foo"(%j) : (affineint) -> i32
    }
  }

  for %i = 1 to 100 {
    for %j = (d0) -> (d0) (%i) to (d0) -> (d0 + 128) (%i) {
      %x = "foo"() : () -> i32
    }
  }

TODO(bondhugula): extend this to LoopUnrollAndJam as well in the next CL (with minor
changes).

PiperOrigin-RevId: 212661212
2019-03-29 13:13:00 -07:00
MLIR Team f884e8da82 Fix opt build where compiled out assert leaves unused local variable.
PiperOrigin-RevId: 211631896
2019-03-29 13:09:44 -07:00
Uday Bondhugula d5416f299e Complete AffineExprFlattener based simplification for floordiv/ceildiv.
- handle floordiv/ceildiv in AffineExprFlattener; update the simplification to
  work even if mod/floordiv/ceildiv expressions appearing in the tree can't be eliminated.
- refactor the flattening / analysis to move it out of lib/Transforms/
- fix MutableAffineMap::isMultipleOf
- add AffineBinaryOpExpr:getAdd/getMul/... utility methods

PiperOrigin-RevId: 211540536
2019-03-29 13:09:18 -07:00
Uday Bondhugula 0122a99cbb Affine expression analysis and simplification.
Outside of IR/
- simplify a MutableAffineMap by flattening the affine expressions
- add a simplify affine expression pass that uses this analysis
- update the FlatAffineConstraints API (to be used in the next CL)

In IR:
- add isMultipleOf and getKnownGCD for AffineExpr, and make the in-IR
  simplication of simplifyMod simpler and more powerful.
- rename the AffineExpr visitor methods to distinguish b/w visiting and
  walking, and to simplify API names based on context.

The next CL will use some of these for the loop unrolling/unroll-jam to make
the detection for the need of cleanup loop powerful/non-trivial.

A future CL will finally move this simplification to FlatAffineConstraints to
make it more powerful. For eg., currently, even if a mod expr appearing in a
part of the expression tree can't be simplified, the whole thing won't be
simplified.

PiperOrigin-RevId: 211012256
2019-03-29 13:07:44 -07:00
Uday Bondhugula 851353687f Introduce hyper-rectangular sets for analysis.
- introduce hyper-rectangular set representation and API sketch for
  analysis/code generation
- implement the 'intersect' and 'project out' operations.

The represention is lighter weight, and operations and other queries on it are
much faster on such domains when compared to general polyhedral domains.

PiperOrigin-RevId: 210245882
2019-03-29 13:05:29 -07:00
Uday Bondhugula 6911c24e97 Sketch out affine analysis structures: AffineValueMap, IntegerValueSet,
FlatAffineConstraints, and MutableAffineMap.

All four classes introduced reside in lib/Analysis and are not meant to be
used in the IR (from lib/IR or lib/Parser/). They are all mutable, alloc'ed,
dealloc'ed - although with their fields pointing to immutable affine
expressions (AffineExpr *).

While on this, update simplifyMod to fold mod to a zero when possible.

PiperOrigin-RevId: 209618437
2019-03-29 13:03:24 -07:00