* AffineStructures has moved to IR.
* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.
* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.
* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.
PiperOrigin-RevId: 232586468
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors
- change dma-generate pass to work on blocks of instructions (start/end
iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
option (besides command line flag)
- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods
Eg. output
$ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
^
$ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
PiperOrigin-RevId: 232297044
Addresses b/122486036
This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing
the Null method and cleaning up the constructors.
As the ::Null uses were tracked down, opportunities appeared to untangle some
of the Parsing logic and make it explicit where AffineMap/IntegerSet have
ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit
pointer values of AffineMap* and IntegerSet* that were passed as function
parameters. Depending the values of those pointers one of 3 behaviors could
occur.
This parsing logic convolution is one of the rare cases where I would advocate
for code duplication. The more proper fix would be to make the syntax
unambiguous or to allow some lookahead.
PiperOrigin-RevId: 231058512
index remapping
- generate a sequence of single result affine_apply's for the index remapping
(instead of one multi result affine_apply)
- update dma-generate and loop-fusion test cases; while on this, change test cases
to use single result affine apply ops
- some fusion comment fix/cleanup
PiperOrigin-RevId: 230985830
- Update createAffineComputationSlice to generate a sequence of single result
affine apply ops instead of one multi-result affine apply
- update pipeline-data-transfer test case; while on this, also update the test
case to use only single result affine maps, and make it more robust to
change.
PiperOrigin-RevId: 230965478
- introduce a way to compute union using symbolic rectangular bounding boxes
- handle multiple load/store op's to the same memref by taking a union of the regions
- command-line argument to provide capacity of the fast memory space
- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the
supplied index remap was identity
PiperOrigin-RevId: 230848185
canonicalizations of operations. The ultimate important user of this is
going to be a funcBuilder->foldOrCreate<YourOp>(...) API, but for now it
is just a more convenient way to write certain classes of canonicalizations
(see the change in StandardOps.cpp).
NFC.
PiperOrigin-RevId: 230770021
- unrolling a single iteration loop by a factor of one should promote its body
into its parent; this makes it consistent with the behavior/expectation that
unrolling a loop by a factor equal to its trip count makes the loop go away.
PiperOrigin-RevId: 230426499
- the size of the private memref created for the slice should be based on
the memref region accessed at the depth at which the slice is being
materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
to the region accessed based on the entire domain.
- leads to a significant contraction of the temporary / intermediate memref
whenever the memref isn't reduced to a single scalar (through store fwd'ing).
Other changes
- update to promoteIfSingleIteration - avoid introducing unnecessary identity
map affine_apply from IV; makes it much easier to write and read test cases
and pass output for all passes that use promoteIfSingleIteration; loop-fusion
test cases become much simpler
- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
'domInstFilter' could be one of the ops erased due to a memref replacement in
it.
- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
missing (the latter need not always be 1); add lbFloorDivisors output argument
- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape
PiperOrigin-RevId: 230405218
This CL is the 6th and last on the path to simplifying AffineMap composition.
This removes `AffineValueMap::forwardSubstitutions` and replaces it by simple
calls to `fullyComposeAffineMapAndOperands`.
PiperOrigin-RevId: 228962580
- when SSAValue/MLValue existed, code at several places was forced to create additional
aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
rid of such redundant code
- use filling ctors instead of explicit loops
- for smallvectors, change insert(list.end(), ...) -> append(...
- improve comments at various places
- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
getMemRefAccess. In the next CL, provide getAccess() accessors for load,
store, DMA op's to return a MemRefAccess.
PiperOrigin-RevId: 228243638
This change is mechanical and merges the LowerAffineApplyPass and
LowerIfAndForPass into a single LowerAffinePass. It makes a step towards
defining an "affine dialect" that would contain all polyhedral-related
constructs. The motivation for merging these two passes is based on retiring
MLFunctions and, eventually, transforming If and For statements into regular
operations. After that happens, LowerAffinePass becomes yet another
legalization.
PiperOrigin-RevId: 227566113
Existing implementation was created before ML/CFG unification refactoring and
did not concern itself with further lowering to separate concerns. As a
result, it emitted `affine_apply` instructions to implement `for` loop bounds
and `if` conditions and required a follow-up function pass to lower those
`affine_apply` to arithmetic primitives. In the unified function world,
LowerForAndIf is mostly a lowering pass with low complexity. As we move
towards a dialect for affine operations (including `for` and `if`), it makes
sense to lower `for` and `if` conditions directly to arithmetic primitives
instead of relying on `affine_apply`.
Expose `expandAffineExpr` function in LoweringUtils. Use this function
together with `expandAffineMaps` to emit primitives that implement loop and
branch conditions directly.
Also remove tests that become unnecessary after transforming LowerForAndIf into
a function pass.
PiperOrigin-RevId: 227563608
In LoweringUtils, extract out `expandAffineMap`. This function takes an affine
map and a list of values the map should be applied to and emits a sequence of
arithmetic instructions that implement the affine map. It is independent of
the AffineApplyOp and can be used in places where we need to insert an
evaluation of an affine map without relying on a (temporary) `affine_apply`
instruction. This prepares for a merge between LowerAffineApply and
LowerForAndIf passes.
Move the `expandAffineApply` function to the LowerAffineApply pass since it is
the only place that must be aware of the `affine_apply` instructions.
PiperOrigin-RevId: 227563439
simplifying them in minor ways. The only significant cleanup here
is the constant folding pass. All the other changes are simple and easy,
but this is still enough to shrink the compiler by 45LOC.
The one pass left to merge is the CSE pass, which will be move involved, so I'm
splitting it out to its own patch (which I'll tackle right after this).
This is step 28/n towards merging instructions and statements.
PiperOrigin-RevId: 227328115
Remove an unnecessary restriction in forward substitution. Slightly
simplify LLVM IR lowering, which previously would crash if given an ML
function, it should now produce a clean error if given a function with an
if/for instruction in it, just like it does any other unsupported op.
This is step 27/n towards merging instructions and statements.
PiperOrigin-RevId: 227324542
representation, shrinking by 70LOC. The PatternRewriter class can probably
also be simplified as well, but one step at a time.
This is step 26/n towards merging instructions and statements. NFC.
PiperOrigin-RevId: 227324218
- introduce PostDominanceInfo in the right/complete way and use that for post
dominance check in store-load forwarding
- replace all uses of Analysis/Utils::dominates/properlyDominates with
DominanceInfo::dominates/properlyDominates
- drop all redundant copies of dominance methods in Analysis/Utils/
- in pipeline-data-transfer, replace dominates call with a much less expensive
check; similarly, substitute dominates() in checkMemRefAccessDependence with
a simpler check suitable for that context
- fix a bug in properlyDominates
- improve doc for 'for' instruction 'body'
PiperOrigin-RevId: 227320507
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops. Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.
This is step 25/n towards merging instructions and statements.
PiperOrigin-RevId: 227243966
consistent and moving the using declarations over. Hopefully this is the last
truly massive patch in this refactoring.
This is step 21/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227178245
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.
This is step 19/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227163082
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.
This is step 17/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227121537
OperationInst derives from it. This allows eliminating some forwarding
functions, other complex code handling multiple paths, and the 'isStatement'
bit tracked by Operation.
This is the last patch I think I can make before the big mechanical change
merging Operation into OperationInst, coming next.
This is step 15/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227077411
StmtSuccessorIterator/StmtSuccessorIterator, and rename and move the
CFGFunctionViewGraph pass to ViewFunctionGraph.
This is step 13/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227069438
FuncBuilder class. Also rename SSAValue.cpp to Value.cpp
This is step 12/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227067644
is the new base of the SSA value hierarchy. This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.
This is step 11/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227064624
This *only* changes the internal data structures, it does not affect the user visible syntax or structure of MLIR code. Function gets new "isCFG()" sorts of predicates as a transitional measure.
This patch is gross in a number of ways, largely in an effort to reduce the amount of mechanical churn in one go. It introduces a bunch of using decls to keep the old names alive for now, and a bunch of stuff needs to be renamed.
This is step 10/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227044402
making it more similar to the CFG side of things. It is true that in a deeply
nested case that this is not a guaranteed O(1) time operation, and that 'get'
could lead compiler hackers to think this is cheap, but we need to merge these
and we can look into solutions for this in the future if it becomes a problem
in practice.
This is step 9/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226983931
from it. This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future. This also removes IfClause and ForStmtBody.
This is step 5/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226936541
StmtBlock. This is more consistent with IfStmt and also conceptually makes
more sense - a forstmt "isn't" its body, it contains its body.
This is step 1/N towards merging BasicBlock and StmtBlock. This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.
This is a mechanical patch, NFC.
PiperOrigin-RevId: 226684158
- loop step wasn't handled and there wasn't a TODO or an assertion; fix this.
- rename 'delay' to shift for consistency/readability.
- other readability changes.
- remove duplicate attribute print for DmaStartOp; fix misplaced attribute
print for DmaWaitOp
- add build method for AddFOp (unrelated to this CL, but add it anyway)
PiperOrigin-RevId: 224892958
- fix replaceAllMemRefUsesWith call to replace only inside loop body.
- handle the case where DMA buffers are dynamic; extend doubleBuffer() method
to handle dynamically shaped DMA buffers (pass the right operands to AllocOp)
- place alloc's for DMA buffers at the depth at which pipelining is being done
(instead of at top-level)
- add more test cases
PiperOrigin-RevId: 224852231
cl/224246657); eliminate repeated evaluation of exprs in loop upper bounds.
- while on this, sweep through and fix potential repeated evaluation of
expressions in loop upper bounds
PiperOrigin-RevId: 224268918
The check for whether the memref was used in a non-derefencing context had to
be done inside, i.e., only for the op stmt's that the replacement was specified
to be performed on (by the domStmtFilter arg if provided). As such, it is
completely fine for example for a function to return a memref while the replacement
is being performed only a specific loop's body (as in the case of DMA
generation).
PiperOrigin-RevId: 223827753
class. This change is NFC, but allows for new kinds of patterns, specifically
LegalizationPatterns which will be allowed to change the types of things they
rewrite.
PiperOrigin-RevId: 223243783
Several things were suggested in post-submission reviews. In particular, use
pointers in function interfaces instead of references (still use references
internally). Clarify the behavior of the pass in presence of MLFunctions.
PiperOrigin-RevId: 222556851
cases.
- fix bug in calculating index expressions for DMA buffers in certain cases
(affected tiled loop nests); add more test cases for better coverage.
- introduce an additional optional argument to replaceAllMemRefUsesWith;
additional operands to the index remap AffineMap can now be supplied by the
client.
- FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound,
::composeMap - fix position bug.
- Some clean up and more comments
PiperOrigin-RevId: 222434628
This function pass replaces affine_apply operations in CFG functions with
sequences of primitive arithmetic instructions that form the affine map.
The actual replacement functionality is located in LoweringUtils as a
standalone function operating on an individual affine_apply operation and
inserting the result at the location of the original operation. It is expected
to be useful for other, target-specific lowering passes that may start at
MLFunction level that Deaffinator does not support.
PiperOrigin-RevId: 222406692
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.
- add support for getMemRefRegion symbolic in outer loops - hence support for
DMAs symbolic in outer surrounding loops.
- add DMA generation support for outgoing DMAs (store op's to lower memory
space); extend getMemoryRegion to store op's. -memref-bound-check now works
with store op's as well.
- fix dma-generate (references to the old memref in the dma_start op were also
being replaced with the new buffer); we need replace all memref uses to work
only on a subset of the uses - add a new optional argument for
replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional
'operation' argument to serve as a filter - if provided, only those uses that
are dominated by the filter are replaced.
- Add missing print for attributes for dma_start, dma_wait op's.
- update the FlatAffineConstraints API
PiperOrigin-RevId: 221889223
Array attributes can nested and function attributes can appear anywhere at that
level. They should be remapped to point to the generated CFGFunction after
ML-to-CFG conversion, similarly to plain function attributes. Extract the
nested attribute remapping functionality from the Parser to Utils. Extract out
the remapping function for individual Functions from the module remapping
function. Use these new functions in the ML-to-CFG conversion pass and in the
parser.
PiperOrigin-RevId: 221510997
These functions are declared in Transforms/LoopUtils.h (included to the
Transforms/Utils library) but were defined in the loop unrolling pass in
Transforms/LoopUnroll.cpp. As a result, targets depending only on
TransformUtils library but not on Transforms could get link errors. Move the
definitions to Transforms/Utils/LoopUtils.cpp where they should actually live.
This does not modify any code.
PiperOrigin-RevId: 221508882
Change the storage type to APInt from int64_t for IntegerAttr (following the change to APFloat storage in FloatAttr). Effectively a direct change from int64_t to 64-bit APInt throughout (the bitwidth hardcoded). This change also adds a getInt convenience method to IntegerAttr and replaces previous getValue calls with getInt calls.
While this changes updates the storage type, it does not update all constant folding calls.
PiperOrigin-RevId: 221082788
Value type abstraction for locations differ from others in that a Location can NOT be null. NOTE: dyn_cast returns an Optional<T>.
PiperOrigin-RevId: 220682078
This CL implement exclusive upper bound behavior as per b/116854378.
A followup CL will update the semantics of the for loop.
PiperOrigin-RevId: 220448963
FuncBuilder is useful to build a operation to replace an existing operation, so change the constructor to allow constructing it with an existing operation. Change FuncBuilder to contain (effectively) a tagged union of CFGFuncBuilder and MLFuncBuilder (as these should be cheap to copy and avoid allocating/deletion when created via a operation).
PiperOrigin-RevId: 219532952
Introduce analysis to check memref accesses (in MLFunctions) for out of bound
ones. It works as follows:
$ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
%y = load %B[%idy] : memref<128 x i32>
^
/tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
%y = load %B[%idy] : memref<128 x i32>
^
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0 * 128 - d1)
mlfunc @test() {
%0 = alloc() : memref<9x9xi32>
%1 = alloc() : memref<128xi32>
for %i0 = -1 to 9 {
for %i1 = -1 to 9 {
%2 = affine_apply #map0(%i0, %i1)
%3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32>
%4 = affine_apply #map1(%i0, %i1)
%5 = load %1[%4] : memref<128xi32>
}
}
return
}
- Improves productivity while manually / semi-automatically developing MLIR for
testing / prototyping; also provides an indirect way to catch errors in
transformations.
- This pass is an easy way to test the underlying affine analysis
machinery including low level routines.
Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256.
While on this:
- create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/
- fix a bug in AffineAnalysis.cpp::toAffineExpr
TODO: extend to non-constant loop bounds (straightforward). Will transparently
work for all accesses once floordiv, mod, ceildiv are supported in the
AffineMap -> FlatAffineConstraints conversion.
PiperOrigin-RevId: 219397961
This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.
PiperOrigin-RevId: 219372163
1) We incorrectly reassociated non-reassociative operations like subi, causing
miscompilations.
2) When constant folding, we didn't add users of the new constant back to the
worklist for reprocessing, causing us to miss some cases (pointed out by
Uday).
The code for tensorflow/mlir#2 is gross, but I'll add the new APIs in a followup patch.
PiperOrigin-RevId: 218803984
distinction. FunctionPasses can now choose to get called on all functions, or
have the driver split CFG/ML Functions up for them. NFC.
PiperOrigin-RevId: 218775885
make operations provide a list of canonicalizations that can be applied to
them. This allows canonicalization to be general to any IR definition.
As part of this, sink PatternMatch.h/cpp down to the IR library to fix a
layering problem.
PiperOrigin-RevId: 218773981
This is done by changing Attribute to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.
PiperOrigin-RevId: 218764173
just having the pattern matcher in its own library. At this point,
lib/Transforms/*.cpp are all actually passes themselves (and will probably
eventually be themselves move to a new subdirectory as we accrete more).
PiperOrigin-RevId: 218745193