This is done by providing a walk callback that returns a WalkResult. This result is either `advance` or `interrupt`. `advance` means that the walk should continue, whereas `interrupt` signals that the walk should stop immediately. An example is shown below:
auto result = op->walk([](Operation *op) {
if (some_invariant)
return WalkResult::interrupt();
return WalkResult::advance();
});
if (result.wasInterrupted())
...;
PiperOrigin-RevId: 266436700
This change refactors and cleans up the implementation of the operation walk methods. After this refactoring is that the explicit template parameter for the operation type is no longer needed for the explicit op walks. For example:
op->walk<AffineForOp>([](AffineForOp op) { ... });
is now accomplished via:
op->walk([](AffineForOp op) { ... });
PiperOrigin-RevId: 266209552
Refactor replaceAllMemRefUsesWith to split it into two methods: the new
method does the replacement on a single op, and is used by the existing
one.
- make the methods return LogicalResult instead of bool
- Earlier, when replacement failed (due to non-deferencing uses of the
memref), the set of ops that had already been processed would have
been replaced leaving the IR in an inconsistent state. Now, a
pass is made over all ops to first check for non-deferencing
uses, and then replacement is performed. No test cases were affected
because all clients of this method were first checking for
non-deferencing uses before calling this method (for other reasons).
This isn't true for a use case in another upcoming PR (scalar
replacement); clients can now bail out with consistent IR on failure
of replaceAllMemRefUsesWith. Add test case.
- multiple deferencing uses of the same memref in a single op is
possible (we have no such use cases/scenarios), and this has always
remained unsupported. Add an assertion for this.
- minor fix to another test pipeline-data-transfer case.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#87
PiperOrigin-RevId: 265808183
Switch to C++14 standard method as llvm::make_unique has been removed (
https://reviews.llvm.org/D66259). Also mark some targets as c++14 to ease next
integrates.
PiperOrigin-RevId: 263953918
There are currently several different terms used to refer to a parent IR unit in 'get' methods: getParent/getEnclosing/getContaining. This cl standardizes all of these methods to use 'getParent*'.
PiperOrigin-RevId: 262680287
This will allow for reusing the same pattern list, which may be costly to continually reconstruct, on multiple invocations.
PiperOrigin-RevId: 262664599
These methods will allow replacing the uses of results with an existing operation, with the same number of results, or a range of values. This removes a number of hand-rolled result replacement loops and simplifies replacement for operations with multiple results.
PiperOrigin-RevId: 262206600
This allows for proper forward declaration, as opposed to leaking the internal implementation via a using directive. This also allows for all pattern building to go through 'insert' methods on the OwningRewritePatternList, replacing uses of 'push_back' and 'RewriteListBuilder'.
PiperOrigin-RevId: 261816316
This CL introduces a simple loop utility function which rewrites the bounds and step of a loop so as to become mappable on a regular grid of processors whose identifiers are given by SSA values.
A corresponding unit test is added.
For example, using CUDA terminology, and assuming a 2-d grid with processorIds = [blockIdx.x, threadIdx.x] and numProcessors = [gridDim.x, blockDim.x], the loop:
```
loop.for %i = %lb to %ub step %step {
...
}
```
is rewritten into a version resembling the following pseudo-IR:
```
loop.for %i = %lb + threadIdx.x + blockIdx.x * blockDim.x to %ub
step %gridDim.x * blockDim.x {
...
}
```
PiperOrigin-RevId: 258945942
This CL adapts the recently introduced parametric tiling to have an API matching the tiling
of AffineForOp. The transformation using stripmineSink is more general and produces imperfectly nested loops.
Perfect nesting invariants of the tiled version are obtained by selectively applying hoisting of ops to isolate perfectly nested bands. Such hoisting may fail to produce a perfect loop nest in cases where ForOp transitively depend on enclosing induction variables. In such cases, the API provides a LogicalResult return but the SimpleParametricLoopTilingPass does not currently use this result.
A new unit test is added with a triangular loop for which the perfect nesting property does not hold. For this example, the old behavior was to produce IR that did not verify (some use was not dominated by its def).
PiperOrigin-RevId: 258928309
Multiple (perfectly) nested loops with independent bounds can be combined into
a single loop and than subdivided into blocks of arbitrary size for load
balancing or more efficient parallelism exploitation. However, MLIR wants to
preserve the multi-dimensional multi-loop structure at higher levels of
abstraction. Introduce a transformation that coalesces nested loops with
independent bounds so that they can be further subdivided by tiling.
PiperOrigin-RevId: 258151016
These ops should not belong to the std dialect.
This CL extracts them in their own dialect and updates the corresponding conversions and tests.
PiperOrigin-RevId: 258123853
This field wasn't updated as the insertion point changed, making it potentially dangerous given the multi-level of MLIR(e.g. 'createBlock' would always insert the new block in 'region'). This also allows for building an OpBuilder with just a context.
PiperOrigin-RevId: 257829135
The GreedyPatternRewriteDriver currently does not notify the OperationFolder when constants are removed as part of a pattern match. This materializes in a nasty bug where a different operation may be allocated to the same address. This causes an assertion in the OperationFolder when it gets notified of the new operations removal.
PiperOrigin-RevId: 257817627
This CL splits the lowering of affine to LLVM into 2 parts:
1. affine -> std
2. std -> LLVM
The conversions mostly consists of splitting concerns between the affine and non-affine worlds from existing conversions.
Short-circuiting of affine `if` conditions was never tested or exercised and is removed in the process, it can be reintroduced later if needed.
LoopParametricTiling.cpp is updated to reflect the newly added ForOp::build.
PiperOrigin-RevId: 257794436
This allows for the attribute to hold symbolic references to other operations than FuncOp. This also allows for removing the dependence on FuncOp from the base Builder.
PiperOrigin-RevId: 257650017
Parametric tiling can be used to extract outer loops with fixed number of
iterations. This in turn enables mapping to GPU kernels on a fixed grid
independently of the range of the original loops, which may be unknown
statically, making the kernel adaptable to different sizes. Provide a utility
function that also computes the parametric tile size given the range of the
loop. Exercise the utility function through a simple pass that applies it to
all top-level loop nests. Permutability or parallelism checks must be
performed before calling this utility function in actual passes.
Note that parametric tiling cannot be implemented in a purely affine way,
although it can be encoded using semi-affine maps. The choice to implement it
on standard loops is guided by them being the common representation between
Affine loops, Linalg and GPU kernels.
PiperOrigin-RevId: 257180251
These methods assume that a function is a valid builtin top-level operation, and removing these methods allows for decoupling FuncOp and IR/. Utility "getParentOfType" methods have been added to Operation/OpState to allow for querying the first parent operation of a given type.
PiperOrigin-RevId: 257018913
In most places, this is just a name change (with the exception of affine.dma_start swapping the operand positions of its tag memref and num_elements operands).
Significant code changes occur here:
*) Vectorization: LoopAnalysis.cpp, Vectorize.cpp
*) Affine Transforms: Transforms/Utils/Utils.cpp
PiperOrigin-RevId: 256395088
Move the data members out of Function and into a new impl storage class 'FunctionStorage'. This allows for Function to become value typed, which will greatly simplify the transition of Function to FuncOp(given that FuncOp is also value typed).
PiperOrigin-RevId: 255983022
The OperationFolder currently just inserts into the entry block of a Function, but regions may be isolated above, i.e. explicit capture only, and blindly inserting constants may break the invariants of these regions.
PiperOrigin-RevId: 254987796
Now that Locations are Attributes they contain a direct reference to the MLIRContext, i.e. the context can be directly accessed from the given location instead of being explicitly passed in.
PiperOrigin-RevId: 254568329
Arguably, this function is only useful for transformations and should not
pollute the main IR. Also make sure it accepts a the resulting container
by-reference instead of returning it.
PiperOrigin-RevId: 253622981
This converts entire loops into threads/blocks. No check on the size of the
block or grid, or on the validity of parallelization is performed, it is under
the responsibility of the caller to strip-mine the loops and to perform the
dependence analysis before calling the conversion.
PiperOrigin-RevId: 253189268
*) Factors slice union computation out of LoopFusion into Analysis/Utils (where other iteration slice utilities exist).
*) Generalizes slice union computation to take the union of slices computed on all loads/stores pairs between source and destination loop nests.
*) Fixes a bug in FlatAffineConstraints::addSliceBounds where redundant constraints were added.
*) Takes care of a TODO to expose FlatAffineConstraints::mergeAndAlignIds as a public method.
--
PiperOrigin-RevId: 250561529
*) Adds LoopFusionUtils which will expose a set of loop fusion utilities (e.g. dependence checks, fusion cost/storage reduction, loop fusion transformation) for use by loop fusion algorithms. Support for checking block-level fusion-preventing dependences is added in this CL (additional loop fusion utilities will be added in subsequent CLs).
*) Adds TestLoopFusion test pass for testing LoopFusionUtils at a fine granularity.
*) Adds unit test for testing dependence check for block-level fusion-preventing dependences.
--
PiperOrigin-RevId: 249861071
* There is no longer a need to explicitly remap function attrs.
- This removes a potentially expensive call from the destructor of Function.
- This will enable some interprocedural transformations to now run intraprocedurally.
- This wasn't scalable and forces dialect defined attributes to override
a virtual function.
* Replacing a function is now a trivial operation.
* This is a necessary first step to representing functions as operations.
--
PiperOrigin-RevId: 249510802
During the pattern rewrite, if the function is changed, i.e. ops created,
deleted or swapped, the pattern rewriter needs to re-scan the function entirely
and apply the patterns again, so the patterns whose root ops have been popped
out from the working list nor an immediate users of the changed ops can be
reconsidered.
A command line flag is added to set the max number of iterations rescanning the
function for pattern match. If the rewrite doesn' converge after this number,
this compiling will continue and the result can be sub-optimal.
One unit test is updated because this change fixed the missing optimization opportunities.
--
PiperOrigin-RevId: 244754190
Note: This now means that we cannot fold chains of operations, i.e. where constant foldable operations feed into each other. Given that this is a testing pass solely for constant folding, this isn't really something that we want anyways. Constant fold tests should be simple and direct, with more advanced folding/feeding being tested with the canonicalizer.
--
PiperOrigin-RevId: 242011744
There are two places containing constant folding logic right now: the ConstantFold
pass and the GreedyPatternRewriteDriver. The logic was not shared and started to
drift apart. We were testing constant folding logic using the ConstantFold pass,
but lagged behind the GreedyPatternRewriteDriver, where we really want the constant
folding to happen.
This CL pulled the logic into utility functions and classes for sharing between
these two places. A new ConstantFoldHelper class is created to help constant fold
and de-duplication.
Also, renamed the ConstantFold pass to TestConstantFold to make it clear that it is
intended for testing purpose.
--
PiperOrigin-RevId: 241971681
Due to legacy reasons (ML/CFG function separation), regions in affine control
flow operations require contained blocks not to have terminators. This is
inconsistent with the notion of the block and may complicate code motion
between regions of affine control operations and other regions.
Introduce `affine.terminator`, a special terminator operation that must be used
to terminate blocks inside affine operations and transfers the control back to
he region enclosing the affine operation. For brevity and readability reasons,
allow `affine.for` and `affine.if` to omit the `affine.terminator` in their
regions when using custom printing and parsing format. The custom parser
injects the `affine.terminator` if it is missing so as to always have it
present in constructed operations.
Update transformations to account for the presence of terminator. In
particular, most code motion transformation between loops should leave the
terminator in place, and code motion between loops and non-affine blocks should
drop the terminator.
PiperOrigin-RevId: 240536998
a pointer. This makes it consistent with all the other methods in
FunctionPass, as well as with ModulePass::getModule(). NFC.
PiperOrigin-RevId: 240257910
This combined match/rewrite functionality allows simplifying the majority of existing RewritePatterns, as they do not benefit from separate match and rewrite functions.
Some of the existing canonicalization patterns in StandardOps have been modified to take advantage of this functionality.
PiperOrigin-RevId: 240187856
inherited constructors, which is cleaner and means you can now use DimOp()
to get a null op, instead of having to use Instruction::getNull<DimOp>().
This removes another 200 lines of code.
PiperOrigin-RevId: 240068113
This eliminate ConstOpPointer (but keeps OpPointer for now) by making OpPointer
implicitly launder const in a const incorrect way. It will eventually go away
entirely, this is a progressive step towards the new const model.
PiperOrigin-RevId: 239512640
This CL fixes an issue where cloned loop induction variables were not properly
propagated and beefs up the corresponding test.
PiperOrigin-RevId: 239422961
multi-result upper bounds, complete TODOs, fix/improve test cases.
- complete TODOs for loop unroll/unroll-and-jam. Something as simple as
"for %i = 0 to %N" wasn't being unrolled earlier (unless it had been written
as "for %i = ()[s0] -> (0)()[%N] to %N"; addressed now.
- update/replace getTripCountExpr with buildTripCountMapAndOperands; makes it
more powerful as it composes inputs into it
- getCleanupLowerBound and getUnrolledLoopUpperBound actually needed the same
code; refactor and remove one.
- reorganize test cases, write previous ones better; most of these changes are
"label replacements".
- fix wrongly labeled test cases in unroll-jam.mlir
PiperOrigin-RevId: 238014653
* bool succeeded(Status)
- Return if the status corresponds to a success value.
* bool failed(Status)
- Return if the status corresponds to a failure value.
PiperOrigin-RevId: 237153884
This CL changes dialect op source files (.h, .cpp, .td) to follow the following
convention:
<full-dialect-name>/<dialect-namespace>Ops.{h|cpp|td}
Builtin and standard dialects are specially treated, though. Both of them do
not have dialect namespace; the former is still named as BuiltinOps.* and the
latter is named as Ops.*.
Purely mechanical. NFC.
PiperOrigin-RevId: 236371358
This CL adds a primitive to perform stripmining of a loop by a given factor and
sinking it under multiple target loops.
In turn this is used to implement imperfectly nested loop tiling (with interchange) by repeatedly calling the stripmineSink primitive.
The API returns the point loops and allows repeated invocations of tiling to achieve declarative, multi-level, imperfectly-nested tiling.
Note that this CL is only concerned with the mechanical aspects and does not worry about analysis and legality.
The API is demonstrated in an example which creates an EDSC block, emits the corresponding MLIR and applies imperfectly-nested tiling:
```cpp
auto block = edsc::block({
For(ArrayRef<edsc::Expr>{i, j}, {zero, zero}, {M, N}, {one, one}, {
For(k1, zero, O, one, {
C({i, j, k1}) = A({i, j, k1}) + B({i, j, k1})
}),
For(k2, zero, O, one, {
C({i, j, k2}) = A({i, j, k2}) + B({i, j, k2})
}),
}),
});
// clang-format on
emitter.emitStmts(block.getBody());
auto l_i = emitter.getAffineForOp(i), l_j = emitter.getAffineForOp(j),
l_k1 = emitter.getAffineForOp(k1), l_k2 = emitter.getAffineForOp(k2);
auto indicesL1 = mlir::tile({l_i, l_j}, {512, 1024}, {l_k1, l_k2});
auto l_ii1 = indicesL1[0][0], l_jj1 = indicesL1[1][0];
mlir::tile({l_jj1, l_ii1}, {32, 16}, l_jj1);
```
The edsc::Expr for the induction variables (i, j, k_1, k_2) provide the programmatic hooks from which tiling can be applied declaratively.
PiperOrigin-RevId: 235548228
Analysis - NFC
- refactor AffineExprFlattener (-> SimpleAffineExprFlattener) so that it
doesn't depend on FlatAffineConstraints, and so that FlatAffineConstraints
could be moved out of IR/; the simplification that the IR needs for
AffineExpr's doesn't depend on FlatAffineConstraints
- have AffineExprFlattener derive from SimpleAffineExprFlattener to use for
all Analysis/Transforms purposes; override addLocalFloorDivId in the derived
class
- turn addAffineForOpDomain into a method on FlatAffineConstraints
- turn AffineForOp::getAsValueMap into an AffineValueMap ctor
PiperOrigin-RevId: 235283610
*) Adds utility to LoopUtils to perform loop interchange of two AffineForOps.
*) Adds utility to LoopUtils to sink a loop to a specified depth within a loop nest, using a series of loop interchanges.
*) Computes dependences between all loads and stores in the loop nest, and classifies each loop as parallel or sequential.
*) Computes loop interchange permutation required to sink sequential loops (and raise parallel loop nests) while preserving relative order among them.
*) Checks each dependence against the permutation to make sure that dependences would not be violated by the loop interchange transformation.
*) Calls loop interchange in LoopFusion pass on consumer loop nests before fusing in producers, sinking loops with loop carried dependences deeper into the consumer loop nest.
*) Adds and updates related unit tests.
PiperOrigin-RevId: 234158370
- for the DMA buffers being allocated (and their tags), generate corresponding deallocs
- minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass
Code generation for DMA transfers was being done with the initial simplifying
assumption that the alloc's would map to scoped allocations, and so no
deallocations would be necessary. Drop this assumption to generalize. Note that
even with scoped allocations, unrolling loops that have scoped allocations
could create a series of allocations and exhaustion of fast memory. Having a
end of lifetime marker like a dealloc in fact allows creating new scopes if
necessary when lowering to a backend and still utilize scoped allocation.
DMA buffers created by -dma-generate are guaranteed to have either
non-overlapping lifetimes or nested lifetimes.
PiperOrigin-RevId: 233502632