This allows for us to decouple type uniquing/construction from MLIRContext and pave the way for dialect specific types.
To accomplish this we two new classes, TypeUniquer and TypeStorageAllocator.
* TypeUniquer is now responsible for all construction and uniquing of types.
* TypeStorageAllocator is a utility used by derived type storage objects to allocate memory within an MLIRContext.
This cl also standardizes what a derived type storage class needs to provide:
- Define a type alias, KeyTy, to a type that uniquely identifies the
instance of the type within its kind.
* The key type must be constructible from the values passed into the
detail::TypeUniquer::get call after the type kind.
* The key type must have a llvm::DenseMapInfo specialization for
hashing.
- Provide a method, 'KeyTy getKey() const', to construct the key type
from an existing storage instance.
- Provide a construction method:
'DerivedStorage *construct(TypeStorageAllocator &, ...)'
that builds a unique instance of the derived storage. The arguments
after the TypeStorageAllocator must correspond with the values passed
into the detail::TypeUniquer::get call after the type kind.
PiperOrigin-RevId: 226507184
* Extend to handle rewrite patterns with output attributes;
- Constant attributes are defined with a value and a type;
- The type of the value is mapped to the corresponding attribute type (string -> StringAttr);
* Verifies the type of operands in the resultant matches the defined op's operands;
PiperOrigin-RevId: 226468908
Change operands to arguments in Op and use it for both operands and arguments. This unifies the way that operands and attributes are specified and the intended way that matching/creating ops with attributes will look. Both can now be represented using the same dag structure (and also makes the ordering more explicit). Derived attributes are not considered as part of the arguments (as they are inferred from the created op, not something needed to created it).
* Generate named operand accessors;
* Simplified the way of specifying Attr and use ElementAttr for TFL_Const instead.
* Fix a incorrect assertion generated;
The input parsing can be made more robust, I'll address that in a follow up.
PiperOrigin-RevId: 226307424
reuse existing ones.
- drop IterationDomainContext, redundant since FlatAffineConstraints has
MLValue information associated with its dimensions.
- refactor to use existing support
- leads to a reduction in LOC
- as a result of these changes, non-constant loop bounds get naturally
supported for dep analysis.
- update test cases to include a couple with non-constant loop bounds
- rename addBoundsFromForStmt -> addForStmtDomain
- complete TODO for getLoopIVs (handle 'if' statements)
PiperOrigin-RevId: 226082008
- when adding constraints from a 'for' stmt into FlatAffineConstraints,
correctly add bound operands of the 'for' stmt as a dimensional identifier or
a symbolic identifier depending on whether the bound operand is a valid
MLFunction symbol
- update test case to exercise this.
PiperOrigin-RevId: 225988511
Existing implementation always uses 64 bits to store floating point values in
DenseElementsAttr. This was due to FloatAttrs always a `double` for storage
independently of the actual type. Recent commits added support for FloatAttrs
with the proper f32 type and floating semantics and changed the bitwidth
reporting on FloatType.
Use the existing infrastructure for densely storing 16 and 32-bit values in
DenseElementsAttr storage to store f16 and f32 values. Move floating semantics
definition to the FloatType level. Properly support f16 / IEEEhalf semantics
at the FloatAttr level and in the builder.
Note that bf16 is still stored as a 64-bit value with IEEEdouble semantics
because APFloat does not have first-class support for bf16 types.
PiperOrigin-RevId: 225981289
addDomainConstraints; add support for mod/div for dependence testing.
- add support for mod/div expressions in dependence analysis
- refactor addMemRefAccessConstraints to use getFlattenedAffineExprs (instead
of getFlattenedAffineExpr); update addDomainConstraints.
- rename AffineExprFlattener::cst -> localVarCst
PiperOrigin-RevId: 225933306
This introduces a generic lowering pass for ML functions. The pass is
parameterized by template arguments defining individual pattern rewriters.
Concrete lowering passes define individual pattern rewriters and inherit from
the generic class that takes care of allocating rewriters, traversing ML
functions and performing the actual rewrite.
While this is similar to the greedy pattern rewriter available in
Transform/Utils, it requires adjustments due to the ML/CFG duality. In
particular, ML function rewriters must be able to create statements, not only
operations, and need access to an MLFuncBuilder. When we move to using the
unified function type, the ML-specific rewriting will become unnecessary.
Use LowerVectorTransfers as a testbed for the generic pass.
PiperOrigin-RevId: 225887424
Introduce support for lowering vector_type_cast to LLVM IR. It consists in
creating a new MemRef descriptor with the base pointer with the type that
corresponds to the lowered element type of the target memref. Since
`vector_type_cast` does not support dynamic shapes in the target type, no
dynamic size conversion is necessary.
This commit goes in the opposite direction of what is expected of LLVM IR
lowering: it should not be aware of all the other dialects. Instead, we should
have separate definitions for conversions in a global lowering framework.
However, this requires LLVM dialect to be implemented, which is currently
blocked by the absence of user-defined types. Implement the lowering anyway to
unblock end-to-end vectorization experiments.
PiperOrigin-RevId: 225887368
This operation is produced and used by the super-vectorization passes and has
been emitted as an abstract unregistered operation until now. For end-to-end
testing purposes, it has to be eventually lowered to LLVM IR. Matching
abstract operation by name goes into the opposite direction of the generic
lowering approach that is expected to be used for LLVM IR lowering in the
future. Register vector_type_cast operation as a part of the SuperVector
dialect.
Arguably, this operation is a special case of the `view` operation from the
Standard dialect. The semantics of `view` is not fully specified at this point
so it is safer to rely on a custom operation. Additionally, using a custom
operation may help to achieve clear dialect separation.
PiperOrigin-RevId: 225887305
As MLIR moves towards dialect-specific types, a generic Type::getBitWidth does
not make sense for all of them. Even with the current type system, the bit
width is not defined (and causes the method in question to abort) for all
TensorFlow types.
This commit restricts the bit width definition to primitive standard types that
have a number of bits appearing verbatim in their type, i.e., integers and
floats. As a side effect, it delegates the decision on the bit width of the
`index` to the backends. Existing backends currently hardcode it to 64 bits.
The Type::getBitWidth method is replaced by Type::getIntOrFloatBitWidth that
only applies to integers and floats. The call sites are updated to use the new
method, where applicable, or rewritten so as not rely on it. Incidentally,
this fixes a utility method that did not account for memrefs being allowed to
have vectors as element types in the size computation.
As an observation, several places in the code use Type in places where a more
specific type could be used instead. Some of those are fixed by this commit.
PiperOrigin-RevId: 225844792
provide unroll factors, and a cmd line argument to specify number of
innermost loop unroll repetitions.
- add function callback parameter for outside targets to provide unroll factors
- add a cmd line parameter to repeatedly apply innermost loop unroll a certain
number of times (to avoid using -loop-unroll -loop-unroll ...; instead
-unroll-num-reps=2).
- implement the callback for a target
- update test cases / usage
PiperOrigin-RevId: 225843191
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nests IVs and symbols using the dependece polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destingation memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns and AffineMap which represents an equality contraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.
PiperOrigin-RevId: 225842944
Store FloatAttr using more appropriate fltSemantics (mostly fixing up F32/F64 storage, F16/BF16 pending). Previously F32 type was used incorrectly for double (the storage was double). Also add query method that returns fltSemantics for IEEE fp types and use that to verify that the APfloat given matches the type:
* FloatAttr created using APFloat is verified that the semantics of the type and APFloat matches;
* FloatAttr created using double has the APFloat created to match the semantics of the type;
Change parsing of tensor negative splat element to pass in the element type expected. Misc other changes to account for the storage type matching the attribute.
PiperOrigin-RevId: 225821834
Renamed the name field in Op to opName since it is the opcode's name.
Renamed the name parameters in TFLite op templates to opSummary since
they are meant as a summary of the op's functionality.
We will use the name symbol later for the name given by users via TF.
PiperOrigin-RevId: 225807135
- use addBoundsForForStmt
- getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the
returned things aren't owned / part of the stmt on which it's being called.
- other minor API cleanup
PiperOrigin-RevId: 225774301
- extend memref-bound-check to store op's
- make the bound check an analysis util and move to lib/Analysis/Utils.cpp (so that
one doesn't need to always create a pass to use it)
PiperOrigin-RevId: 225564830
From the beginning, vector_transfer_read and vector_transfer_write opreations
were intended as a mid-level vectorization abstraction. In particular, they
are lowered to the StandardOps dialect before further processing. As such, it
does not make sense to keep them at the same level as StandardOps. Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.
PiperOrigin-RevId: 225554492
Named operands allow generating builders with more meaningful names + lay the groundwork for allowing the specification of attributes as part of the inputs pattern of an op (which allows the declarative pattern rewrite generator to define ops with attributs). This is a minimal change that just changes how input operands are represented, changes to attributes in follow up and returnTypes later.
PiperOrigin-RevId: 225509805
- if a local id was already for a specific mod/div expression, just reuse it if
the expression repeats (instead of adding a new one).
- drastically reduces the number of local variables added during flattening for
real use cases - since the same div's and mod expressions often repeat.
- add getFlattenedAffineExprs for AffineMap, IntegerSet based on the above
As a natural result of the above:
- FlatAffineConstraints(IntegerSet) ctor now deals with integer sets that have mod
and div constraints as well, and these get simplified as well from -simplify-affine-structures
PiperOrigin-RevId: 225452174
- Define tf.FakeQuantWithMinMaxArgs and tf.FakeQuantWithMinMaxVars
- Add the unit tests for valid and invalid IRs
- Rewrite both to the tfl.FakeQuant op
- Add the unit tests for the rewriting
PiperOrigin-RevId: 225447109
Besides the ops.td file changes to define both ops, this CL also changes the
mlir-op-gen to allow more flexible traits definition for "optional" operation
inputs.
Unit tests are added.
One TODO for the mlir-op-gen is to make attribute optional in the ops.
PiperOrigin-RevId: 225408349
trivially redundant constraints. Update projectOut to eliminate identifiers in
a more efficient order. Fix b/120801118.
- add method to remove duplicate / trivially redundant constraints from
FlatAffineConstraints (use a hashing-based approach with DenseSet)
- update projectOut to eliminate identifiers in a more efficient order
(A sequence of affine_apply's like this (from a real use case) finally exposed
the lack of the above trivial/low hanging simplifications).
for %ii = 0 to 64 {
for %jj = 0 to 9 {
%a0 = affine_apply (d0, d1) -> (d0 * (9 * 1024) + d1 * 128) (%ii, %jj)
%a1 = affine_apply (d0) ->
(d0 floordiv (2 * 3 * 3 * 128 * 128),
(d0 mod 294912) floordiv (3 * 3 * 128 * 128),
(((d0 mod 294912) mod 147456) floordiv 1152) floordiv 8,
(((d0 mod 294912) mod 147456) mod 1152) floordiv 384,
((((d0 mod 294912) mod 147456) mod 1152) mod 384) floordiv 128,
(((((d0 mod 294912) mod 147456) mod 1152) mod 384) mod 128)
floordiv 128) (%a0)
%v0 = load %in[%a1tensorflow/mlir#0, %a1tensorflow/mlir#1, %a1tensorflow/mlir#3, %a1tensorflow/mlir#4, %a1tensorflow/mlir#2, %a1tensorflow/mlir#5]
: memref<2x2x3x3x16x1xi32>
}
}
- update FlatAffineConstraints::print to print number of constraints.
PiperOrigin-RevId: 225397480
- These test cases had to be updated post the switch to exclusive upper bound;
however, the test cases hadn't originally been written to check correctly; as
a result, they didn't fail and weren't updated. Update test case and fix
upper bound.
PiperOrigin-RevId: 225194016
Introduce initial support for 1D vector operations. LLVM does not support
higher-dimensional vectors so the caller must make sure they don't appear in
the input MLIR. Handle the presence of higher-dimensional vectors by failing
gracefully.
Introduce the type conversion for 1D vector types and hook it up with the rest
of the type convresion system. Support "splat" constants for vector types. As
a side effect, this refactors constant operation emission by separating out
scalar integer constants into a separate case and by extracting out the helper
function for scalar float construction. Existing binary operations apply to
vectors transparently.
PiperOrigin-RevId: 225172349
Originally, loop steps were implemented using `addi` and `constant` operations
because `affine_apply` was not handled in the first implementation. The
support for `affine_apply` has been added, use it to implement the update of
the loop induction variable. This is more consistent with the lower and upper
bounds of the loop that are also implemented as `affine_apply`, removes the
dependence of the converted function on the StandardOps dialect and makes it
clear from the CFG function that all operations on the loop induction variable
are purely affine.
PiperOrigin-RevId: 225165337
* Start very basic (about as basic as possible) with the pattern rewrite generation by only
- Matching single node dags,
- Single output, single result,
- No constraints on inputs/outputs.
- No attributes (only operands)
* The matcher generates C++ code akin to what is currently manually written.
- This is very much not the final end state, and only intended for the short term;
* Always generate the default builder method to make it easier to generate calls;
- Also add additional builder method for TFL::Add as attributes are not yet supported;
* Replace TF Add -> TFL Add matching using this generation;
* Introduce a conceptual textual namespace in the op registry
- Will allow importing multiple dialect's op registry
- Avoids needing to do anything special with tablegen or define a custom DSL;
= I really want to do a custom DSL but this urge could just be as its fun :) So defer for now. From this structure we can dump out another structured form if needed;
- Add a mapping from <namespace>_<op> in the op_gen and pattern rewrite gen
= This allows placing ops in different namespaces from the same op registry which is convenient, esp. if we want to consider subnamespaces in future;
* Update tfl namespace to TFL to match TF and XLA;
PiperOrigin-RevId: 225155164
- getDimensionBounds() was added initially for quick experimentation - no
longer used (getConstantBoundOnDimSize is the more powerful/complete
replacement).
- FlatAffineConstraints::getConstantLower/UpperBound are incomplete,
functionality/naming-wise misleading, and not used currently. Removing these;
complete/fixed version will be added in an upcoming CL.
PiperOrigin-RevId: 225075061
For each op, generate another builder with the following signature:
static void build(Builder* builder, OperationState* result,
ArrayRef<Type> resultTypes,
ArrayRef<SSAValue*> args,
ArrayRef<NamedAttribute> attributes);
PiperOrigin-RevId: 225066007
An extensive discussion demonstrated that it is difficult to support `index`
types as elements of compound (vector, memref, tensor) types. In particular,
their size is unknown until the target-specific lowering takes place. MLIR may
need to store constants of the fixed-shape compound types (e.g.,
vector<4 x index>) internally and must know the size of the element type and
data layout constraints. The same information is necessary for target-specific
lowering and translation to reliably support compound types with `index`
elements, but MLIR does not have a dedicated target description mechanism yet.
The uses cases for compound types with `index` elements, should they appear,
can be handled via an `index_cast` operation that converts between `index` and
fixed-size integer types at the SSA value level instead of the type level.
PiperOrigin-RevId: 225064373
- loop step wasn't handled and there wasn't a TODO or an assertion; fix this.
- rename 'delay' to shift for consistency/readability.
- other readability changes.
- remove duplicate attribute print for DmaStartOp; fix misplaced attribute
print for DmaWaitOp
- add build method for AddFOp (unrelated to this CL, but add it anyway)
PiperOrigin-RevId: 224892958
- adding a conservative check for now (TODO: use the dependence analysis pass
once the latter is extended to deal with DMA ops). resolve an existing bug on
a test case.
- update test cases
PiperOrigin-RevId: 224869526
- add method normalizeConstraintsByGCD
- call normalizeConstraintsByGCD() and GCDTightenInequalities() at the end of
projectOut.
- remove call to GCDTightenInequalities() from getMemRefRegion
- change isEmpty() to check isEmptyByGCDTest() / hasInvalidConstraint() each
time an identifier is eliminated (to detect emptiness early).
- make FourierMotzkinEliminate, gaussianEliminateId(s),
GCDTightenInequalities() private
- improve / update stale comments
PiperOrigin-RevId: 224866741
- fix replaceAllMemRefUsesWith call to replace only inside loop body.
- handle the case where DMA buffers are dynamic; extend doubleBuffer() method
to handle dynamically shaped DMA buffers (pass the right operands to AllocOp)
- place alloc's for DMA buffers at the depth at which pipelining is being done
(instead of at top-level)
- add more test cases
PiperOrigin-RevId: 224852231
This was missing from the original commit. The implementation of
createLowerAffineApply was defined in the default namespace but declared in the
`mlir` namespace, which could lead to linking errors when it was used. Put the
definition in `mlir` namespace.
PiperOrigin-RevId: 224830894
are a max/min of several expressions.
- Extend loop tiling to handle non-constant loop bounds and bounds that
are a max/min of several expressions, i.e., bounds using multi-result affine
maps
- also fix b/120630124 as a result (the IR was in an invalid state when tiled
loop generation failed; SSA uses were created that weren't plugged into the IR).
PiperOrigin-RevId: 224604460
- generate DMAs correctly now using strided DMAs where needed
- add support for multi-level/nested strides; op still supports one level of
stride for now.
Other things
- add test case for symbolic lower/upper bound; cases where the DMA buffer
size can't be bounded by a known constant
- add test case for dynamic shapes where the DMA buffers are however bounded by
constants
- refactor some of the '-dma-generate' code
PiperOrigin-RevId: 224584529
This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp
to a simple loop nest via local buffer allocations.
This is an MLIR->MLIR lowering based on builders.
A few TODOs are left to address in particular:
1. invert the permutation map so the accesses to the remote memref are coalesced;
2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory);
3. support broadcast / avoid copies when permutation_map is not of full column rank
4. add a proper "element_cast" op
One notable limitation is this does not plan on supporting boundary conditions.
It should be significantly easier to use pre-baked MLIR functions to handle such paddings.
This is left for future consideration.
Therefore the current CL only works properly for full-tile cases atm.
This CL also adds 2 simple tests:
```mlir
for %i0 = 0 to %M step 3 {
for %i1 = 0 to %N step 4 {
for %i2 = 0 to %O {
for %i3 = 0 to %P step 5 {
vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index
```
lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
for %i1 = 0 to %arg1 step 4 {
for %i2 = 0 to %arg2 {
for %i3 = 0 to %arg3 step 5 {
%1 = alloc() : memref<5x4x3xf32>
%2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>>
for %i4 = 0 to 5 {
%3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
for %i5 = 0 to 4 {
%4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5)
for %i6 = 0 to 3 {
%5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
%6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32>
store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32>
dealloc %1 : memref<5x4x3xf32>
```
and
```mlir
for %i0 = 0 to %M step 3 {
for %i1 = 0 to %N {
for %i2 = 0 to %O {
for %i3 = 0 to %P step 5 {
%f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32>
```
lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
for %i1 = 0 to %arg1 {
for %i2 = 0 to %arg2 {
for %i3 = 0 to %arg3 step 5 {
%1 = alloc() : memref<5x4x3xf32>
%2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
for %i4 = 0 to 5 {
%3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
for %i5 = 0 to 4 {
for %i6 = 0 to 3 {
%4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
%5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32>
store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32>
%6 = load %2[%c0] : memref<1xvector<5x4x3xf32>>
dealloc %1 : memref<5x4x3xf32>
```
PiperOrigin-RevId: 224552717
This CL adds a finer grain composition function between AffineExpr and an
unbounded map. This will be used in the next CL.
Also cleans up some comments remaining from a previous CL.
PiperOrigin-RevId: 224536314