*) Adds parameter to public API of MemRefRegion::compute for passing in the slice loop bounds to compute the memref region of the loop nest slice.
*) Exposes public method MemRefRegion::getRegionSize for computing the size of the memref region in bytes.
PiperOrigin-RevId: 232706165
* AffineStructures has moved to IR.
* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.
* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.
* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.
PiperOrigin-RevId: 232586468
*) After a private memref buffer is created for a fused loop nest, dependences on the old memref are reduced, which can open up fusion opportunities. In these cases, users of the old memref are added back to the worklist to be reconsidered for fusion.
*) Fixed a bug in fusion insertion point dependence check where the memref being privatized was being skipped from the check.
PiperOrigin-RevId: 232477853
- use getAccessMap() instead of repeating it
- fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap
allocation and unique_ptr where possible)
- change extractForInductionVars - MutableArrayRef -> ArrayRef for the
arguments. Since the method is just returning copies of 'Value *', the client
can't mutate the pointers themselves; it's fine to mutate the 'Value''s
themselves, but that doesn't mutate the pointers to those.
- change the way extractForInductionVars returns (see b/123437690)
PiperOrigin-RevId: 232359277
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors
- change dma-generate pass to work on blocks of instructions (start/end
iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
option (besides command line flag)
- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods
Eg. output
$ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
^
$ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
PiperOrigin-RevId: 232297044
- fusion already includes the necessary analysis to create small/local buffers
post fusion; allocate these buffers in a higher memory space if the necessary
pass parameters are provided (threshold size, memory space id)
- although there will be a separate utility at some point to directly detect
and promote small local buffers to higher memory spaces, doing it while fusion
when possible is much less expensive, comes free with fusion analysis, and covers
a key common case.
PiperOrigin-RevId: 232063894
Addresses b/122486036
This CL addresses some leftover crumbs in AffineMap and IntegerSet by removing
the Null method and cleaning up the constructors.
As the ::Null uses were tracked down, opportunities appeared to untangle some
of the Parsing logic and make it explicit where AffineMap/IntegerSet have
ambiguous syntax. Previously, ambiguous cases were hidden behind the implicit
pointer values of AffineMap* and IntegerSet* that were passed as function
parameters. Depending the values of those pointers one of 3 behaviors could
occur.
This parsing logic convolution is one of the rare cases where I would advocate
for code duplication. The more proper fix would be to make the syntax
unambiguous or to allow some lookahead.
PiperOrigin-RevId: 231058512
Example:
dma-generate options:
-dma-fast-mem-capacity - Set fast memory space ...
-dma-fast-mem-space=<uint> - Set fast memory space ...
loop-fusion options:
-fusion-compute-tolerance=<number> - Fractional increase in ...
-fusion-maximal - Enables maximal loop fusion
loop-tile options:
-tile-size=<uint> - Use this tile size for ...
loop-unroll options:
-unroll-factor=<uint> - Use this unroll factor ...
-unroll-full - Fully unroll loops
-unroll-full-threshold=<uint> - Unroll all loops with ...
-unroll-num-reps=<uint> - Unroll innermost loops ...
loop-unroll-jam options:
-unroll-jam-factor=<uint> - Use this unroll jam factor ...
PiperOrigin-RevId: 231019363
index remapping
- generate a sequence of single result affine_apply's for the index remapping
(instead of one multi result affine_apply)
- update dma-generate and loop-fusion test cases; while on this, change test cases
to use single result affine apply ops
- some fusion comment fix/cleanup
PiperOrigin-RevId: 230985830
- update fusion cost model to fuse while tolerating a certain amount of redundant
computation; add cl option -fusion-compute-tolerance
evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC
PiperOrigin-RevId: 230541857
- the size of the private memref created for the slice should be based on
the memref region accessed at the depth at which the slice is being
materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
to the region accessed based on the entire domain.
- leads to a significant contraction of the temporary / intermediate memref
whenever the memref isn't reduced to a single scalar (through store fwd'ing).
Other changes
- update to promoteIfSingleIteration - avoid introducing unnecessary identity
map affine_apply from IV; makes it much easier to write and read test cases
and pass output for all passes that use promoteIfSingleIteration; loop-fusion
test cases become much simpler
- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
'domInstFilter' could be one of the ops erased due to a memref replacement in
it.
- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
missing (the latter need not always be 1); add lbFloorDivisors output argument
- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape
PiperOrigin-RevId: 230405218
*) Do not remove loop nests which write to memrefs which escape the function.
*) Do not remove memrefs which escape the function (e.g. are used in the return instruction).
PiperOrigin-RevId: 230398630
*) Enables reduction of private memref size based on MemRef region accessed by fused slice.
*) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence.
*) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice).
PiperOrigin-RevId: 229936698
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.
PiperOrigin-RevId: 229575126
*) LoopFusion: Adds fusion cost function which compares the cost of the fused loop nest, with the cost of the two unfused loop nests to determine if it is profitable to fuse the candidate loop nests. The fusion cost function is run for various combinations for src/dst loop depths attempting find the minimum cost setting for src/dst loop depths which does not increase the computational cost when the loop nests are fused. Combinations of src/dst loop depth are evaluated attempting to maximize loop depth (i.e. take a bigger computation slice from the source loop nest, and insert it deeper in the destination loop nest for better locality).
*) LoopFusion: Adds utility to compute op instance count for loop nests, sliced loop nests, and to compute the cost of a loop nest fused with another sliced loop nest.
*) LoopFusion: canonicalizes slice bound AffineMaps (and updates related tests).
*) Analysis::Utils: Splits getBackwardComputationSlice into two functions: one which calculates and returns the slice loop bounds for analysis by LoopFusion, and the other for insertion of the computation slice (ones fusion has calculated the min-cost src/dst loop depths).
*) Test: Adds multiple unit tests to test the new functionality.
PiperOrigin-RevId: 229219757
- refactor toAffineFromEq and the code surrounding it; refactor code into
FlatAffineConstraints::getSliceBounds
- add FlatAffineConstraints methods to detect identifiers as mod's and div's of other
identifiers
- add FlatAffineConstraints::getConstantLower/UpperBound
- Address b/122118218 (don't assert on invalid fusion depths cmdline flags -
instead, don't do anything; change cmdline flags
src-loop-depth -> fusion-src-loop-depth
- AffineExpr/Map print method update: don't fail on null instances (since we have
a wrapper around a pointer, it's avoidable); rationale: dump/print methods should
never fail if possible.
- Update memref-dataflow-opt to add an optimization to avoid a unnecessary call to
IsRangeOneToOne when it's trivially going to be true.
- Add additional test cases to exercise the new support
- update a few existing test cases since the maps are now generated uniformly with
all destination loop operands appearing for the backward slice
- Fix projectOut - fix wrong range for getBestElimCandidate.
- Fix for getConstantBoundOnDimSize() - didn't show up in any test cases since
we didn't have any non-hyperrectangular ones.
PiperOrigin-RevId: 228265152
- when SSAValue/MLValue existed, code at several places was forced to create additional
aggregate temporaries of SmallVector<SSAValue/MLValue> to handle the conversion; get
rid of such redundant code
- use filling ctors instead of explicit loops
- for smallvectors, change insert(list.end(), ...) -> append(...
- improve comments at various places
- turn getMemRefAccess into MemRefAccess ctor and drop duplicated
getMemRefAccess. In the next CL, provide getAccess() accessors for load,
store, DMA op's to return a MemRefAccess.
PiperOrigin-RevId: 228243638
simplifying them in minor ways. The only significant cleanup here
is the constant folding pass. All the other changes are simple and easy,
but this is still enough to shrink the compiler by 45LOC.
The one pass left to merge is the CSE pass, which will be move involved, so I'm
splitting it out to its own patch (which I'll tackle right after this).
This is step 28/n towards merging instructions and statements.
PiperOrigin-RevId: 227328115
- the load/store forwarding relies on memref dependence routines as well as
SSA/dominance to identify the memref store instance uniquely supplying a value
to a memref load, and replaces the result of that load with the value being
stored. The memref is also deleted when possible if only stores remain.
- add methods for post dominance for MLFunction blocks.
- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
earlier static)
- add a helper method in FlatAffineConstraints - isRangeOneToOne.
PiperOrigin-RevId: 227252907
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops. Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.
This is step 25/n towards merging instructions and statements.
PiperOrigin-RevId: 227243966
consistent and moving the using declarations over. Hopefully this is the last
truly massive patch in this refactoring.
This is step 21/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227178245
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.
This is step 19/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227163082
is the new base of the SSA value hierarchy. This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.
This is step 11/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227064624
from it. This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future. This also removes IfClause and ForStmtBody.
This is step 5/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226936541
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nests IVs and symbols using the dependece polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destingation memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns and AffineMap which represents an equality contraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.
PiperOrigin-RevId: 225842944
Updates MemRefDependenceCheck to check and report on all memref access pairs at all loop nest depths.
Updates old and adds new memref dependence check tests.
Resolves multiple TODOs.
PiperOrigin-RevId: 220816515