Flags some potential cases where splitting isn't happening and so could result
in confusing results. Also update some test files where there were near misses
in splitting that seemed unintentional.
Differential Revision: https://reviews.llvm.org/D109636
This revision reintroduces tensor.insert_slice verification which seems
to have vanished over time: a verifier was initially introduced in cf9503c1b7
but for some reason the invalid.mlir was not properly updated; as time passed the verifier was not called anymore and later the code was deleted.
As a consequence, a non-negligible portion of tests has run astray using invalid
tensor.insert_slice semantics and needed to be fixed.
Also, extract isRankReducedType from TensorOps for better reuse
Originally, this facility was used by both tensor and memref forms but
it got copied around as dialects were split.
Differential Revision: https://reviews.llvm.org/D114715
* Implement `FlatAffineConstraints::getConstantBound(EQ)`.
* Inject a simpler constraint for loops that have at most 1 iteration.
* Taking into account constant EQ bounds of FlatAffineConstraint dims/symbols during canonicalization of the resulting affine map in `canonicalizeMinMaxOp`.
Differential Revision: https://reviews.llvm.org/D114138
For the semi affine expressions, whenever rhs of a floordiv, ceildiv, mod
or product expression is a symbolic expression, we introduce a local variable
representing the result, and store the floordiv/ceildiv, mod or product
affine expression in LocalExprs. In this way the expression is flattened,
and trivial addition and subtraction related simplifications are performed.
Also rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.
Differential Revision: https://reviews.llvm.org/D112808
The current implementation invokes materializations
whenever an input operand does not have a mapping for the
desired type, i.e. it requires materialization at the earliest possible
point. This conflicts with goal of dialect conversion (and also the
current documentation) which states that a materialization is only
required if the materialization is supposed to persist after the
conversion process has finished.
This revision refactors this such that whenever a target
materialization "might" be necessary, we insert an
unrealized_conversion_cast to act as a temporary materialization.
This allows for deferring the invocation of the user
materialization hooks until the end of the conversion process,
where we actually have a better sense if it's actually
necessary. This has several benefits:
* In some cases a target materialization hook is no longer
necessary
When performing a full conversion, there are some situations
where a temporary materialization is necessary. Moving forward,
these users won't need to provide any target materializations,
as the temporary materializations do not require the user to
provide materialization hooks.
* getRemappedValue can now handle values that haven't been
converted yet
Before this commit, it wasn't well supported to get the remapped
value of a value that hadn't been converted yet (making it
difficult/impossible to convert multiple operations in many
situations). This commit updates getRemappedValue to properly
handle this case by inserting temporary materializations when
necessary.
Another code-health related benefit is that with this change we
can move a majority of the complexity related to materializations
to the end of the conversion process, instead of handling adhoc
while conversion is happening.
Differential Revision: https://reviews.llvm.org/D111620
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and indctuion variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
Differential Revision: https://reviews.llvm.org/D108454
This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and indctuion variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
Differential Revision: https://reviews.llvm.org/D108454
Generate an scf.for instead of an scf.if for the partial iteration. This is for consistency reasons: The peeling of linalg.tiled_loop also uses another loop for the partial iteration.
Note: Canonicalizations patterns may rewrite partial iterations to scf.if afterwards.
Differential Revision: https://reviews.llvm.org/D109568
Fold dim ops of scf.for results to dim ops of the respective iter args if the loop is shape preserving.
Differential Revision: https://reviews.llvm.org/D109430
The limitation on iter_args introduced with D108806 is too restricting. Changes of the runtime type should be allowed.
Extends the dim op canonicalization with a simple analysis to determine when it is safe to canonicalize.
Differential Revision: https://reviews.llvm.org/D109125
* Add `DimOfIterArgFolder`.
* Move existing cross-dialect canonicalization patterns to `LoopCanonicalization.cpp`.
* Rename `SCFAffineOpCanonicalization` pass to `SCFForLoopCanonicalization`.
* Expand documentaton of scf.for: The type of loop-carried variables may not change with iterations. (Not even the dynamic type.)
Differential Revision: https://reviews.llvm.org/D108806
* Add support for affine.max ops to SCF loop peeling pattern.
* Add support for affine.max ops to `AffineMinSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalizationPattern` to `AffineOpSCFCanonicalizationPattern`.
* Rename `AffineMinSCFCanonicalization` pass to `SCFAffineOpCanonicalization`.
Differential Revision: https://reviews.llvm.org/D108009
This canonicalization simplifies affine.min operations inside "for loop"-like operations (e.g., scf.for and scf.parallel) based on two invariants:
* iv >= lb
* iv < lb + step * ((ub - lb - 1) floorDiv step) + 1
This commit adds a new pass `canonicalize-scf-affine-min` (instead of being a canonicalization pattern) to avoid dependencies between the Affine dialect and the SCF dialect.
Differential Revision: https://reviews.llvm.org/D107731
If additional static type information can be deduced from a insert_slice's size operands, insert an explicit cast of the op's source operand.
This enables other canonicalization patterns that are matching for tensor_cast ops such as `ForOpTensorCastFolder` in SCF.
Differential Revision: https://reviews.llvm.org/D108617
Do not apply loop peeling to loops that are contained in the partial iteration of an already peeled loop. This is to avoid code explosion when dealing with large loop nests. Can be controlled with a new pass option `skip-partial`.
Differential Revision: https://reviews.llvm.org/D108542
Previously, ExecuteRegionOps with multiple return values would fail a round-trip test due to missing parenthesis around the types.
Differential Revision: https://reviews.llvm.org/D108402
Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.
affine.min ops such as:
```
map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #affine.min #map(%iv)[%step, %ub]
```
are rewritten them into (in the case the peeled loop):
```
%r = %step
```
To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
Differential Revision: https://reviews.llvm.org/D107222
Expand ParallelLoopTilingPass with an inbound_check mode.
In default mode, the upper bound of the inner loop is from the min op; in
inbound_check mode, the upper bound of the inner loop is the step of the outer
loop and an additional inbound check will be emitted inside of the inner loop.
This was 'FIXME' in the original codes and a typical usage is for GPU backends,
thus the outer loop and inner loop can be mapped to blocks/threads in seperate.
Differential Revision: https://reviews.llvm.org/D105455
This can be useful when one needs to know which unrolled iteration an Op belongs to, for example, conveying noalias information among memory-affecting ops in parallel-access loops.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D107789
Add ForLoopBoundSpecialization pass, which specializes scf.for loops into a "main loop" where `step` divides the iteration space evenly and into an scf.if that handles the last iteration.
This transformation is useful for vectorization and loop tiling. E.g., when vectorizing loads/stores, programs will spend most of their time in the main loop, in which only unmasked loads/stores are used. Only the in the last iteration (scf.if), slower masked loads/stores are used.
Subsequent commits will apply this transformation in the SparseDialect and in Linalg's loop tiling.
Differential Revision: https://reviews.llvm.org/D105804
Historically the builtin dialect has had an empty namespace. This has unfortunately created a very awkward situation, where many utilities either have to special case the empty namespace, or just don't work at all right now. This revision adds a namespace to the builtin dialect, and starts to cleanup some of the utilities to no longer handle empty namespaces. For now, the assembly form of builtin operations does not require the `builtin.` prefix. (This should likely be re-evaluated though)
Differential Revision: https://reviews.llvm.org/D105149
This CL adds a new RegionBranchTerminatorOpInterface to query information about operands that can be
passed to successor regions. Similar to the BranchOpInterface, it allows to freely define the
involved operands. However, in contrast to the BranchOpInterface, it expects an additional region
number to distinguish between various use cases which might require different operands passed to
different regions.
Moreover, we added new utility functions (namely getMutableRegionBranchSuccessorOperands and
getRegionBranchSuccessorOperands) to query (mutable) operand ranges for operations equiped with the
ReturnLike trait and/or implementing the newly added interface. This simplifies reasoning about
terminators in the scope of the nested regions.
We also adjusted the SCF.ConditionOp to benefit from the newly added capabilities.
Differential Revision: https://reviews.llvm.org/D105018
This is the first step to support software pipeline for scf.for loops.
This is only the transformation to create pipelined kernel and
prologue/epilogue.
The scheduling needs to be given by user as many different algorithm
and heuristic could be applied.
This currently doesn't handle loop arguments, this will be added in a
follow up patch.
Differential Revision: https://reviews.llvm.org/D105868
The executeregionop is used to allow multiple blocks within SCF constructs. If the container allows multiple blocks, inline the region
Differential Revision: https://reviews.llvm.org/D104960
The executeregionop is used to allow multiple blocks within SCF constructs. If the container allows multiple blocks, inline the region
Differential Revision: https://reviews.llvm.org/D104960
In cases where arithmetic (addi/muli) ops are performed on an scf.for loops induction variable with a single use, we can fold those ops directly into the scf.for loop.
For example, in the following code:
```
scf.for %i = %c0 to %arg1 step %c1 {
%0 = addi %arg2, %i : index
%1 = muli %0, %c4 : index
%2 = memref.load %arg0[%1] : memref<?xi32>
%3 = muli %2, %2 : i32
memref.store %3, %arg0[%1] : memref<?xi32>
}
```
we can lift `%0` up into the scf.for loop range, as it is the only user of %i:
```
%lb = addi %arg2, %c0 : index
%ub = addi %arg2, %i : index
scf.for %i = %lb to %ub step %c1 {
%1 = muli %0, %c4 : index
%2 = memref.load %arg0[%1] : memref<?xi32>
%3 = muli %2, %2 : i32
memref.store %3, %arg0[%1] : memref<?xi32>
}
```
Reviewed By: mehdi_amini, ftynse, Anthony
Differential Revision: https://reviews.llvm.org/D104289
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename SubTensorOp -> tensor.extract_slice, SubTensorInsertOp -> tensor.insert_slice.
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Note: This is a fixed version of https://reviews.llvm.org/D104499, which was reverted due to a missing update to two CMakeFile.txt.
Differential Revision: https://reviews.llvm.org/D104676
The main goal of this commit is to remove the dependency of Standard dialect on the Tensor dialect.
* Rename ops: SubTensorOp --> ExtractTensorOp, SubTensorInsertOp --> InsertTensorOp
* Some helper functions are (already) duplicated between the Tensor dialect and the MemRef dialect. To keep this commit smaller, this will be cleaned up in a separate commit.
* Additional dialect dependencies: Shape --> Tensor, Tensor --> Standard
* Remove dialect dependencies: Standard --> Tensor
* Move canonicalization test cases to correct dialect (Tensor/MemRef).
Differential Revision: https://reviews.llvm.org/D104499
Introduce the execute_region op that is able to hold a region which it
executes exactly once. The op encapsulates a CFG within itself while
isolating it from the surrounding control flow. Proposal discussed here:
https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
execute_region enables one to inline a function without lowering out all
other higher level control flow constructs (affine.for/if, scf.for/if)
to the flat list of blocks / CFG form. It thus allows the benefit of
transforms on higher level control flow ops available in the presence of
the inlined calls. The inlined calls continue to benefit from
propagation of SSA values across their top boundary. Functions won’t
have to remain outlined until later than desired. Abstractions like
affine execute_regions, lambdas with implicit captures could be lowered
to this without first lowering out structured loops/ifs or outlining.
But two potential early use cases are of: (1) an early inliner (which
can inline functions by introducing execute_region ops), (2) lowering of
an affine.execute_region, which cleanly maps to an scf.execute_region
when going from the affine dialect to the scf dialect.
Differential Revision: https://reviews.llvm.org/D75837
This provides a sizable compile time improvement by seeding
the worklist in an order that leads to less iterations of the
worklist.
This patch only changes the behavior of the Canonicalize pass
itself, it does not affect other passes that use the
GreedyPatternRewrite driver
Differential Revision: https://reviews.llvm.org/D103053
Add two canoncalizations for scf.if.
1) A canonicalization that allows users of a condition within an if to assume the condition
is true if in the true region, etc.
2) A canonicalization that removes yielded statements that are equivalent to the condition
or its negation
Differential Revision: https://reviews.llvm.org/D101012
* Do we need a threshold on maximum number of Yeild arguments processed (maximum number of SelectOp's to be generated)?
* Had to modify some old IfOp tests to not get optimized by this pattern
Differential Revision: https://reviews.llvm.org/D98592