llvm-project

Commit Graph

Author	SHA1	Message	Date
Uday Bondhugula	b9f53dc0bd	Update/Fix LoopUtils::stmtBodySkew to handle loop step. - loop step wasn't handled and there wasn't a TODO or an assertion; fix this. - rename 'delay' to shift for consistency/readability. - other readability changes. - remove duplicate attribute print for DmaStartOp; fix misplaced attribute print for DmaWaitOp - add build method for AddFOp (unrelated to this CL, but add it anyway) PiperOrigin-RevId: 224892958	2019-03-29 14:25:07 -07:00
Uday Bondhugula	d59a95a05c	Fix missing check for dependent DMAs in pipeline-data-transfer - adding a conservative check for now (TODO: use the dependence analysis pass once the latter is extended to deal with DMA ops). resolve an existing bug on a test case. - update test cases PiperOrigin-RevId: 224869526	2019-03-29 14:24:53 -07:00
Uday Bondhugula	2ef57806ba	Update/fix -pipeline-data-transfer; fix b/120770946 - fix replaceAllMemRefUsesWith call to replace only inside loop body. - handle the case where DMA buffers are dynamic; extend doubleBuffer() method to handle dynamically shaped DMA buffers (pass the right operands to AllocOp) - place alloc's for DMA buffers at the depth at which pipelining is being done (instead of at top-level) - add more test cases PiperOrigin-RevId: 224852231	2019-03-29 14:24:22 -07:00
Uday Bondhugula	b6c03917ad	Remove allocations for memref's that become dead as a result of double buffering in the auto DMA overlap pass. This is done online in the pass. PiperOrigin-RevId: 222313640	2019-03-29 14:05:19 -07:00
Uday Bondhugula	fff1efbaf5	Updates to transformation/analysis passes/utilities. Update DMA generation pass and getMemRefRegion() to work with specified loop depths; add support for outgoing DMAs, store op's. - add support for getMemRefRegion symbolic in outer loops - hence support for DMAs symbolic in outer surrounding loops. - add DMA generation support for outgoing DMAs (store op's to lower memory space); extend getMemoryRegion to store op's. -memref-bound-check now works with store op's as well. - fix dma-generate (references to the old memref in the dma_start op were also being replaced with the new buffer); we need replace all memref uses to work only on a subset of the uses - add a new optional argument for replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional 'operation' argument to serve as a filter - if provided, only those uses that are dominated by the filter are replaced. - Add missing print for attributes for dma_start, dma_wait op's. - update the FlatAffineConstraints API PiperOrigin-RevId: 221889223	2019-03-29 14:00:51 -07:00
Jacques Pienaar	cc9a6ed09d	Initialize Pass with PassID. The passID is not currently stored in Pass but this avoids the unused variable warning. The passID is used to uniquely identify passes, currently this is only stored/used in PassInfo. PiperOrigin-RevId: 220485662	2019-03-29 13:50:34 -07:00
Jacques Pienaar	6f0fb22723	Add static pass registration Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage. Change build targets to alwayslink to enforce registration. PiperOrigin-RevId: 220390178	2019-03-29 13:49:34 -07:00
Uday Bondhugula	8201e19e3d	Introduce memref bound checking. Introduce analysis to check memref accesses (in MLFunctions) for out of bound ones. It works as follows: $ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2 %x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1 %y = load %B[%idy] : memref<128 x i32> ^ /tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1 %y = load %B[%idy] : memref<128 x i32> ^ #map0 = (d0, d1) -> (d0, d1) #map1 = (d0, d1) -> (d0 * 128 - d1) mlfunc @test() { %0 = alloc() : memref<9x9xi32> %1 = alloc() : memref<128xi32> for %i0 = -1 to 9 { for %i1 = -1 to 9 { %2 = affine_apply #map0(%i0, %i1) %3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32> %4 = affine_apply #map1(%i0, %i1) %5 = load %1[%4] : memref<128xi32> } } return } - Improves productivity while manually / semi-automatically developing MLIR for testing / prototyping; also provides an indirect way to catch errors in transformations. - This pass is an easy way to test the underlying affine analysis machinery including low level routines. Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256. While on this: - create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/ - fix a bug in AffineAnalysis.cpp::toAffineExpr TODO: extend to non-constant loop bounds (straightforward). Will transparently work for all accesses once floordiv, mod, ceildiv are supported in the AffineMap -> FlatAffineConstraints conversion. PiperOrigin-RevId: 219397961	2019-03-29 13:46:08 -07:00
River Riddle	4c465a181d	Implement value type abstraction for types. This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast. PiperOrigin-RevId: 219372163	2019-03-29 13:45:54 -07:00
Chris Lattner	adbba70d82	Simplify FunctionPass to eliminate the CFGFunctionPass/MLFunctionPass distinction. FunctionPasses can now choose to get called on all functions, or have the driver split CFG/ML Functions up for them. NFC. PiperOrigin-RevId: 218775885	2019-03-29 13:40:05 -07:00
Uday Bondhugula	ccfe593715	PassResult return cleanup. - return success as long as IR is in a valid state. PiperOrigin-RevId: 218225317	2019-03-29 13:35:47 -07:00
Chris Lattner	7850258c49	Introduce a new Operation::erase helper to generalize some code in the pattern matcher / canonicalizer, and rename existing eraseFromBlock methods to align with it. PiperOrigin-RevId: 218104455	2019-03-29 13:34:51 -07:00
Feng Liu	34927e2474	Rename Operation::getAs to Operation::dyn_cast Also rename Operation::is to Operation::isa Introduce Operation::cast All of these are for consistency with global dyn_cast/cast/isa operators. PiperOrigin-RevId: 217878786	2019-03-29 13:33:41 -07:00
Uday Bondhugula	18e666702c	Generalize / improve DMA transfer overlap; nested and multiple DMA support; resolve multiple TODOs. - replace the fake test pass (that worked on just the first loop in the MLFunction) to perform DMA pipelining on all suitable loops. - nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops) - fix bugs / assumptions: correctly copy memory space and elemental type of source memref for double buffering. - correctly identify matching start/finish statements, handle multiple DMAs per loop. - introduce dominates/properlyDominates utitilies for MLFunction statements. - move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it getShiftValidity - refactor getContainingStmtPos -> findAncestorStmtInBlock - move into Analysis/Utils.h; has two users. - other improvements / cleanup for related API/utilities - add size argument to dma_wait - for nested DMAs or in general, it makes it easy to obtain the size to use when lowering the dma_wait since we wouldn't want to identify the matching dma_start, and more importantly, in general/in the future, there may not always be a dma_start dominating the dma_wait. - add debug information in the pass PiperOrigin-RevId: 217734892	2019-03-29 13:32:28 -07:00
Uday Bondhugula	86eac4618c	Create private exclusive / single use affine computation slice for an op stmt. - add util to create a private / exclusive / single use affine computation slice for an op stmt (see method doc comment); a single multi-result affine_apply op is prepended to the op stmt to provide all results needed for its operands as a function of loop iterators and symbols. - use it for DMA pipelining (to create private slices for DMA start stmt's); resolve TODOs/feature request (b/117159533) - move createComposedAffineApplyOp to Transforms/Utils; free it from taking a memref as input / generalize it. PiperOrigin-RevId: 216926818	2019-03-29 13:29:21 -07:00
Jacques Pienaar	764fd035b0	Split BuiltinOps out of StandardOps. * Move Return, Constant and AffineApply out into BuiltinOps; * BuiltinOps are always registered, while StandardOps follow the same dynamic registration; * Kept isValidX in MLValue as we don't have a verify on AffineMap so need to keep it callable from Parser (I wanted to move it to be called in verify instead); PiperOrigin-RevId: 216592527	2019-03-29 13:28:12 -07:00
Nicolas Vasilache	1d3e7e2616	[MLIR] AffineMap value type This CL applies the same pattern as AffineExpr to AffineMap: a simple struct that acts as the storage is allocated in the bump pointer. The AffineMap is immutable and accessed everywhere by value. PiperOrigin-RevId: 216445930	2019-03-29 13:26:24 -07:00
Uday Bondhugula	82e55750d2	Add target independent standard DMA ops: dma.start, dma.wait Add target independent standard DMA ops: dma.start, dma.wait. Update pipeline data transfer to use these to detect DMA ops. While on this - return failure from mlir-opt::performActions if a pass generates invalid output - improve error message for verify 'n' operand traits PiperOrigin-RevId: 216429885	2019-03-29 13:26:10 -07:00
Nicolas Vasilache	ce2edea135	[MLIR] Cleanup AffineExpr This CL introduces a series of cleanups for AffineExpr value types: 1. to make it clear that the value types should be used, the pointer AffineExpr types are put in the detail namespace. Unfortunately, since the value type operator-> only forwards to the underlying pointer type, we still need to expose this in the include file for now; 2. AffineExprKind is ok to use, it thus comes out of detail and thus of AffineExpr 3. getAffineDimExpr, getAffineSymbolExpr, getAffineConstantExpr are similarly extracted as free functions and their naming is mande consistent across Builder, MLContext and AffineExpr 4. AffineBinaryOpEx::simplify functions are made into static free functions. In particular it is moved away from AffineMap.cpp where it does not belong 5. operator AffineExprType is made explicit 6. uses the binary operators everywhere possible 7. drops the pointer usage everywhere outside of AffineExpr.cpp, MLIRContext.cpp and AsmPrinter.cpp PiperOrigin-RevId: 216207212	2019-03-29 13:24:45 -07:00
Uday Bondhugula	6cfdb756b1	Introduce memref replacement/rewrite support: to replace an existing memref with a new one (of a potentially different rank/shape) with an optional index remapping. - introduce Utils::replaceAllMemRefUsesWith - use this for DMA double buffering (This CL also adds a few temporary utilities / code that will be done away with once: 1) abstract DMA op's are added 2) memref deferencing side-effect / trait is available on op's 3) b/117159533 is resolved (memref index computation slices). PiperOrigin-RevId: 215831373	2019-03-29 13:23:19 -07:00
Uday Bondhugula	041817a45e	Introduce loop body skewing / loop pipelining / loop shifting utility. - loopBodySkew shifts statements of a loop body by stmt-wise delays, and is typically meant to be used to: - allow overlap of non-blocking start/wait until completion operations with other computation - allow shifting of statements (for better register reuse/locality/parallelism) - software pipelining (when applied to the innermost loop) - an additional argument specifies whether to unroll the prologue and epilogue. - add method to check SSA dominance preservation. - add a fake loop pipeline pass to test this utility. Sample input/output are below. While on this, fix/add following: - fix minor bug in getAddMulPureAffineExpr - add additional builder methods for common affine map cases - fix const_operand_iterator's for ForStmt, etc. When there is no such thing as 'const MLValue', the iterator shouldn't be returning const MLValue's. Returning MLValue is const correct. Sample input/output examples: 1) Simplest case: shift second statement by one. Input: for %i = 0 to 7 { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint } Output: #map0 = (d0) -> (d0 - 1) mlfunc @loop_nest_simple1() { %c8 = constant 8 : affineint %c0 = constant 0 : affineint %0 = "foo"(%c0) : (affineint) -> affineint for %i0 = 1 to 7 { %1 = "foo"(%i0) : (affineint) -> affineint %2 = affine_apply #map0(%i0) %3 = "bar"(%2) : (affineint) -> affineint } %4 = affine_apply #map0(%c8) %5 = "bar"(%4) : (affineint) -> affineint return } 2) DMA overlap: shift dma.wait and compute by one. Input for %i = 0 to 7 { %pingpong = affine_apply (d0) -> (d0 mod 2) (%i) "dma.enqueue"(%pingpong) : (affineint) -> affineint %pongping = affine_apply (d0) -> (d0 mod 2) (%i) "dma.wait"(%pongping) : (affineint) -> affineint "compute1"(%pongping) : (affineint) -> affineint } Output #map0 = (d0) -> (d0 mod 2) #map1 = (d0) -> (d0 - 1) #map2 = ()[s0] -> (s0 + 7) mlfunc @loop_nest_dma() { %c8 = constant 8 : affineint %c0 = constant 0 : affineint %0 = affine_apply #map0(%c0) %1 = "dma.enqueue"(%0) : (affineint) -> affineint for %i0 = 1 to 7 { %2 = affine_apply #map0(%i0) %3 = "dma.enqueue"(%2) : (affineint) -> affineint %4 = affine_apply #map1(%i0) %5 = affine_apply #map0(%4) %6 = "dma.wait"(%5) : (affineint) -> affineint %7 = "compute1"(%5) : (affineint) -> affineint } %8 = affine_apply #map1(%c8) %9 = affine_apply #map0(%8) %10 = "dma.wait"(%9) : (affineint) -> affineint %11 = "compute1"(%9) : (affineint) -> affineint return } 3) With arbitrary affine bound maps: Shift last two statements by two. Input: for %i = %N to ()[s0] -> (s0 + 7)()[%N] { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint %z = "foo_bar"(%i) : (affineint) -> (affineint) "bar_foo"(%i) : (affineint) -> (affineint) } Output #map0 = ()[s0] -> (s0 + 1) #map1 = ()[s0] -> (s0 + 2) #map2 = ()[s0] -> (s0 + 7) #map3 = (d0) -> (d0 - 2) #map4 = ()[s0] -> (s0 + 8) #map5 = ()[s0] -> (s0 + 9) for %i0 = %arg0 to #map0()[%arg0] { %0 = "foo"(%i0) : (affineint) -> affineint %1 = "bar"(%i0) : (affineint) -> affineint } for %i1 = #map1()[%arg0] to #map2()[%arg0] { %2 = "foo"(%i1) : (affineint) -> affineint %3 = "bar"(%i1) : (affineint) -> affineint %4 = affine_apply #map3(%i1) %5 = "foo_bar"(%4) : (affineint) -> affineint %6 = "bar_foo"(%4) : (affineint) -> affineint } for %i2 = #map4()[%arg0] to #map5()[%arg0] { %7 = affine_apply #map3(%i2) %8 = "foo_bar"(%7) : (affineint) -> affineint %9 = "bar_foo"(%7) : (affineint) -> affineint } 4) Shift one by zero, second by one, third by two for %i = 0 to 7 { %y = "foo"(%i) : (affineint) -> affineint %x = "bar"(%i) : (affineint) -> affineint %z = "foobar"(%i) : (affineint) -> affineint } #map0 = (d0) -> (d0 - 1) #map1 = (d0) -> (d0 - 2) #map2 = ()[s0] -> (s0 + 7) %c9 = constant 9 : affineint %c8 = constant 8 : affineint %c1 = constant 1 : affineint %c0 = constant 0 : affineint %0 = "foo"(%c0) : (affineint) -> affineint %1 = "foo"(%c1) : (affineint) -> affineint %2 = affine_apply #map0(%c1) %3 = "bar"(%2) : (affineint) -> affineint for %i0 = 2 to 7 { %4 = "foo"(%i0) : (affineint) -> affineint %5 = affine_apply #map0(%i0) %6 = "bar"(%5) : (affineint) -> affineint %7 = affine_apply #map1(%i0) %8 = "foobar"(%7) : (affineint) -> affineint } %9 = affine_apply #map0(%c8) %10 = "bar"(%9) : (affineint) -> affineint %11 = affine_apply #map1(%c8) %12 = "foobar"(%11) : (affineint) -> affineint %13 = affine_apply #map1(%c9) %14 = "foobar"(%13) : (affineint) -> affineint 5) SSA dominance violated; no shifting if a shift is specified for the second statement. for %i = 0 to 7 { %x = "foo"(%i) : (affineint) -> affineint "bar"(%x) : (affineint) -> affineint } PiperOrigin-RevId: 214975731	2019-03-29 13:21:26 -07:00

21 Commits