1) We incorrectly reassociated non-reassociative operations like subi, causing
miscompilations.
2) When constant folding, we didn't add users of the new constant back to the
worklist for reprocessing, causing us to miss some cases (pointed out by
Uday).
The code for tensorflow/mlir#2 is gross, but I'll add the new APIs in a followup patch.
PiperOrigin-RevId: 218803984
- Introduce Fourier-Motzkin variable elimination to eliminate a dimension from
a system of linear equalities/inequalities. Update isEmpty to use this.
Since FM is only exact on rational/real spaces, an emptiness check based on
this is guaranteed to be exact whenever it says the underlying set is empty;
if it says, it's not empty, there may still be no integer points in it.
Also, supports a version that computes "dark shadows".
- Test this by checking for "always false" conditionals in if statements.
- Unique IntegerSet's that are small (few constraints, few variables). This
basically means the canonical empty set and other small sets that are
likely commonly used get uniqued; allows checking for the canonical empty set
by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint size
for uniqui'ing.
- rename simplify-affine-expr -> simplify-affine-structures
Other cleanup
- IntegerSet::numConstraints, AffineMap::numResults are no longer needed;
remove them.
- add copy assignment operators for AffineMap, IntegerSet.
- rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet
- Misc cleanup for FlatAffineConstraints API
PiperOrigin-RevId: 218690456
- Adds FlatAffineConstraints::isEmpty method to test if there are no solutions to the system.
- Adds GCD test check if equality constraints have no solution.
- Adds unit test cases.
PiperOrigin-RevId: 218546319
"shape_cast" only applies to tensors, and there are other operations that
actually affect shape, for example "reshape". Rename "shape_cast" to
"tensor_cast" in both the code and the documentation.
PiperOrigin-RevId: 218528122
is a straight-forward change, but required adding missing moveBefore() methods
on operations (requiring moving some traits around to make C++ happy). This
also fixes a constness issue with the getBlock/getFunction() methods on
Instruction, and adds a missing getFunction() method on MLFuncBuilder.
PiperOrigin-RevId: 218523905
For some of the constant vector / tesor, if the compiler doesn't need to
interpret their elements content, they can be stored in this class to save the
serialize / deserialize cost.
syntax:
`opaque<` tensor-type `,` opaque-string `>`
opaque-string ::= `0x` [0-9a-fA-F]*
PiperOrigin-RevId: 218399426
- Add a few canonicalization patterns to fold memref_cast into
load/store/dealloc.
- Canonicalize alloc(constant) into an alloc with a constant shape followed by
a cast.
- Add a new PatternRewriter::updatedRootInPlace API to make this more convenient.
SimplifyAllocConst and the testcase is heavily based on Uday's implementation work, just
in a different framework.
PiperOrigin-RevId: 218361237
This was left as a TODO in the code. Move the type verification from
MLFuncVerifier::verifyReturn to ReturnOp::verify. Since the return operation
can only appear as the last statement of an MLFunction, i.e. where the
surrounding block is the function itself, it is easy to access the function
descriptor (ReturnOp::verify already relies on this). From the function
descriptor, one can easily access the type information. Note that this
slightly modifies the error message due to the use of emitOpError instead of a
plain emitError.
Drop the obsolete TODO comment in MLFunction::verify about checking that
"return" only appears as the last operation of an MLFunction since
ReturnOp::verify explicitly checks for that.
PiperOrigin-RevId: 218347843
This was left as a TODO in the code. Note that the spec does not explicitly
prohibit the first basic block from having a predecessor, and may be worth
updating.
The error is reported at the location of the cfgfunc to which the basic block
belongs since the location information of the block label is not propagated
beyond the IR parser. Arguably, pointing to a function that starts with an
ill-formed block is better than pointing to the first operation in that block
as it makes easier to follow the code down until the first block label.
PiperOrigin-RevId: 218343654
The SparseElementsAttr uses (COO) Coordinate List encoding to represents a
sparse tensor / vector. Specifically, the coordinates and values are stored as
two dense elements attributes. The first dense elements attribute is a 2-D
attribute with shape [N, ndims], which contains the indices of the elements
with nonzero values in the constant vector/tensor. The second elements
attribute is a 1-D attribute list with shape [N], which supplies the values for
each element in the first elements attribute. ndims is the rank of the
vector/tensor and N is the total nonzero elements.
The syntax is:
`sparse<` (tensor-type | vector-type)`, ` indices-attribute-list, values-attribute-list `>`
Example: a sparse tensor
sparse<vector<3x4xi32>, [[0, 0], [1, 2]], [1, 2]> represents the dense tensor
[[1, 0, 0, 0]
[0, 0, 2, 0]
[0, 0, 0, 0]]
PiperOrigin-RevId: 217764319
The syntax of dense vecor/tensor attribute value is
`dense<` (tensor-type | vector-type)`,` attribute-list`>`
and
attribute-list ::= `[` attribute-list (`, ` attribute-list)* `]`.
The construction of the dense vector/tensor attribute takes a vector/tensor
type and a character array as arguments. The size of the input array should be
larger than the size specified by the type argument. It also assumes the
elements of the vector or tensor have been trunked to the data type sizes in
the input character array, so it extends the trunked data to 64 bits when it is
retrieved.
PiperOrigin-RevId: 217762811
multiple TODOs.
- replace the fake test pass (that worked on just the first loop in the
MLFunction) to perform DMA pipelining on all suitable loops.
- nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops)
- fix bugs / assumptions: correctly copy memory space and elemental type of source
memref for double buffering.
- correctly identify matching start/finish statements, handle multiple DMAs per
loop.
- introduce dominates/properlyDominates utitilies for MLFunction statements.
- move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it
getShiftValidity
- refactor getContainingStmtPos -> findAncestorStmtInBlock - move into
Analysis/Utils.h; has two users.
- other improvements / cleanup for related API/utilities
- add size argument to dma_wait - for nested DMAs or in general, it makes it
easy to obtain the size to use when lowering the dma_wait since we wouldn't
want to identify the matching dma_start, and more importantly, in general/in the
future, there may not always be a dma_start dominating the dma_wait.
- add debug information in the pass
PiperOrigin-RevId: 217734892
This CL implements a very simple loop vectorization **test** and the basic
infrastructure to support it.
The test simply consists in:
1. matching the loops in the MLFunction and all the Load/Store operations
nested under the loop;
2. testing whether all the Load/Store are contiguous along the innermost
memory dimension along that particular loop. If any reference is
non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then
the loop is not-vectorizable.
The simple test above can gradually be extended with more interesting
behaviors to account for the fact that a layout permutation may exist that
enables contiguity etc. All these will come in due time but it is worthwhile
noting that the test already supports detection of outer-vetorizable loops.
In implementing this test, I also added a recursive MLFunctionMatcher and some
sugar that can capture patterns
such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating
on the matched IR structures. For now it just uses in order traversal but
post-order DFS will be useful in the future once IR rewrites start occuring.
One may note that the memory management design decision follows a different
pattern from MLIR. After evaluating different designs and how they quickly
increase cognitive overhead, I decided to opt for the simplest solution in my
view: a class-wide (threadsafe) RAII context.
This way, a pass that needs MLFunctionMatcher can just have its own locally
scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed.
If passes are expected to have a longer lifetime, then the contexts can easily
be scoped inside the runOnMLFunction call and storage lifetime reduced.
Lastly, whatever the scope of threading (module, function, pass), this is
expected to also be future-proof wrt concurrency (but this is a detail atm).
PiperOrigin-RevId: 217622889
Updates ComposeAffineMaps test pass to use this method.
Updates affine map composition test cases to handle the new pass, which can be reused when this method is used in a future instruction combine pass.
PiperOrigin-RevId: 217163351
Associate BasicBlocks with the function being parsed to avoid leaks in the case of parse failures. Associating with the function means that we can no longer determine if defined/fwd declared simply by considering if a BasicBlock has an associated function, so track forward declared block references explicitly (this should also allow flagging multiple undeclared fwd references). Split out getting the named block from defining it, in the case of definition move the block to the end of the function.
Also destroy all forward reference placeholders in FunctionParser.
Return parse failure in parseAttributeDict if there is no left brace instead of
asserting.
PiperOrigin-RevId: 217049507
- add util to create a private / exclusive / single use affine
computation slice for an op stmt (see method doc comment); a single
multi-result affine_apply op is prepended to the op stmt to provide all
results needed for its operands as a function of loop iterators and symbols.
- use it for DMA pipelining (to create private slices for DMA start stmt's);
resolve TODOs/feature request (b/117159533)
- move createComposedAffineApplyOp to Transforms/Utils; free it from taking a
memref as input / generalize it.
PiperOrigin-RevId: 216926818
out canonicalization pass to drive it, and a simple (x-x) === 0 pattern match
as a test case.
There is a tremendous number of improvements that need to land, and the
matcher/rewriter and patterns will be split out of this file, but this is a
starting point.
PiperOrigin-RevId: 216788604
This attribute represents a reference to a splat vector or tensor, where all
the elements have the same value. The syntax of the attribute is:
`splat<` (tensor-type | vector-type)`,` attribute-value `>`
PiperOrigin-RevId: 216537997
Add target independent standard DMA ops: dma.start, dma.wait. Update pipeline
data transfer to use these to detect DMA ops.
While on this
- return failure from mlir-opt::performActions if a pass generates invalid output
- improve error message for verify 'n' operand traits
PiperOrigin-RevId: 216429885
1) affineint (as it is named) is not a type suitable for general computation (e.g. the multiply/adds in an integer matmul). It has undefined width and is undefined on overflow. They are used as the indices for forstmt because they are intended to be used as indexes inside the loop.
2) It can be used in both cfg and ml functions, and in cfg functions. As you mention, “symbols” are not affine, and we use affineint values for symbols.
3) Integers aren’t affine, the algorithms applied to them can be. :)
4) The only suitable use for affineint in MLIR is for indexes and dimension sizes (i.e. the bounds of those indexes).
PiperOrigin-RevId: 216057974
- Fold the lower/upper bound of a loop to a constant whenever the result of the
application of the bound's affine map on the operand list yields a constant.
- Update/complete 'for' stmt's API to set lower/upper bounds with operands.
Resolve TODOs for ForStmt::set{Lower,Upper}Bound.
- Moved AffineExprConstantFolder into AffineMap.cpp and added
AffineMap::constantFold to be used by both AffineApplyOp and
ForStmt::constantFoldBound.
PiperOrigin-RevId: 215997346
with a new one (of a potentially different rank/shape) with an optional index
remapping.
- introduce Utils::replaceAllMemRefUsesWith
- use this for DMA double buffering
(This CL also adds a few temporary utilities / code that will be done away with
once:
1) abstract DMA op's are added
2) memref deferencing side-effect / trait is available on op's
3) b/117159533 is resolved (memref index computation slices).
PiperOrigin-RevId: 215831373
- introduce mlir::{floorDiv, ceilDiv, mod} for constant inputs in
mlir/Support/MathExtras.h
- consistently use these everywhere in IR, Analysis, and Transforms.
PiperOrigin-RevId: 215580677
mode. We even diagnose mistakes nicely (aside from the a/an vowel confusion
which isn't worth worrying about):
test/IR/invalid.mlir split at line tensorflow/mlir#399:8:34: error: 'note' diagnostic emitted when expecting a 'error'
%x = "bar"() : () -> i32 // expected-error {{operand defined here}}
^
PiperOrigin-RevId: 214773208
This CL retricts shorthand notation printing to only the bounds that can
be roundtripped unambiguously; i.e.:
1. ()[]->(%some_cst) ()[]
2. ()[s0]->(s0) ()[%some_symbol]
Upon inspection it turns out that the constant case was lossy so this CL also
updates it.
Note however that fixing this issue exhibits a potential issues in unroll.mlir.
L488 exhibits a map ()[s0] -> (1)()[%arg0] which could be simplified down to
()[]->(1)()[].
This does not seem like a bug but maybe an undesired complexity in the maps
generated by unrolling.
bondhugula@, care to take a look?
PiperOrigin-RevId: 214531410
This CL adds support for `mulf` which is necessary to write/emit a simple scalar
matmul in MLIR. This CL does not consider automation of generation of ops but
mulf is important and useful enough to be added on its own atm.
PiperOrigin-RevId: 214496098
The AsmPrinter wrongly assumes that all single ssa-id AffineMap
are the identity map for the purpose of printing.
This CL adds the missing level of indirection as well as a test.
This bug was originally shaken off by the experimental TC->MLIR path.
Before this CL, the test would print:
```
mlfunc @mlfuncsimplemap(%arg0 : affineint, %arg1 : affineint, %arg2 : affineint) {
for %i0 = 0 to %arg0 {
for %i1 = 0 to %i0 {
~~~ should be %arg1
%c42_i32 = constant 42 : i32
}
}
return
}
```
PiperOrigin-RevId: 214120817
verifier. We get most of this infrastructure directly from LLVM, we just
need to adapt it to our CFG abstraction.
This has a few unrelated changes engangled in it:
- getFunction() in various classes was const incorrect, fix it.
- This moves Verifier.cpp to the analysis library, since Verifier depends on
dominance and these are both really analyses.
- IndexedAccessorIterator::reference was defined wrong, leading to really
exciting template errors that were fun to diagnose.
- This flips the boolean sense of the foldOperation() function in constant
folding pass in response to previous patch feedback.
PiperOrigin-RevId: 214046593
optimization pass:
- Give the ability for operations to implement a constantFold hook (a simple
one for single-result ops as well as general support for multi-result ops).
- Implement folding support for constant and addf.
- Implement support in AbstractOperation and Operation to make this usable by
clients.
- Implement a very simple constant folding pass that does top down folding on
CFG and ML functions, with a testcase that exercises all the above stuff.
Random cleanups:
- Improve the build APIs for ConstantOp.
- Stop passing "-o -" to mlir-opt in the testsuite, since that is the default.
PiperOrigin-RevId: 213749809