Existing implementation of isContiguousAccess asserts that one of the
function arguments is within certain range, depending on another parameter.
However, the value of this argument may come from outside, in particular in the
loop vectorization pass it may come from command line arguments. This leads
to 'mlir-opt' crashing on an assertion depending on flags. Handle the error
gracefully by reporting error returning a negative result instead. This
negative result prevents any further transformation by the vectorizer so the IR
remains valid.
PiperOrigin-RevId: 227029496
Move PrintOpStatsPass out of tools and to other passes (moved to Analysis as it
doesn't modify the program but it is different than the other analysis passes
as it is only consumer at present is the user).
PiperOrigin-RevId: 227018996
making it more similar to the CFG side of things. It is true that in a deeply
nested case that this is not a guaranteed O(1) time operation, and that 'get'
could lead compiler hackers to think this is cheap, but we need to merge these
and we can look into solutions for this in the future if it becomes a problem
in practice.
This is step 9/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226983931
For performance/memory saving purpose, having the Instruction holding a
std::vector for the operands isn't a really good tradeoff. The only reason for
this was to support adding/removing easily BasicBlock arguments to Terminator.
Since this isn't the most common operation, we instead force a pre-allocated
list of operands on Instructions at creation time.
PiperOrigin-RevId: 226981227
graph specializations for doing CFG traversals of ML Functions, making the two
sorts of functions have the same capabilities.
This is step 8/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226968502
BlockArgument arguments of the entry block instead. This makes MLFunctions and
CFGFunctions work more similarly.
This is step 7/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226966975
MLFunction, IfStmt, ForStmt even though they currently only contain exactly one
block in that list.
This is step 6/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226960278
The NameLoc can be used to represent a variable, node or method. The
CallSiteLoc has two fields, one represents the concrete location and another
one represents the caller's location. Multiple CallSiteLocs can be chained as
a call stack.
For example, the following call stack
```
AAA
at file1:1
at file2:135
at file3:34
```
can be formed by call0:
```
auto name = NameLoc::get("AAA");
auto file1 = FileLineColLoc::get("file1", 1);
auto file2 = FileLineColLoc::get("file2", 135);
auto file3 = FileLineColLoc::get("file3", 34);
auto call2 = CallSiteLoc::get(file2, file3);
auto call1 = CallSiteLoc::get(file1, call2);
auto call0 = CallSiteLoc::get(name, call1);
```
PiperOrigin-RevId: 226941797
Supervectorization uses null pointers to SSA values as a means of communicating
the failure to vectorize. In operation vectorization, all operations producing
the values of operation arguments must be vectorized for the given operation to
be vectorized. The existing check verified if any of the value "def"
statements was vectorized instead, sometimes leading to assertions inside `isa`
called on a null pointer. Fix this to check that all "def" statements were
vectorized.
PiperOrigin-RevId: 226941552
The binary subtraction operations were not supported by the lowering because
they were not essential for the testing flow. Add support for these
operations.
PiperOrigin-RevId: 226941463
from it. This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future. This also removes IfClause and ForStmtBody.
This is step 5/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226936541
for SSA values in terminators, but easily worked around. At the same time,
move the StmtOperand list in a OperationStmt to the end of its trailing
objects list so we can *reduce* the number of operands, without affecting
offsets to the other stuff in the allocation.
This is important because we want OperationStmts to be consequtive, including
their operands - we don't want to use an std::vector of operands like
Instructions have.
This is patch 4/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226865727
clients to use OperationState instead. This makes MLFuncBuilder more similiar
to CFGFuncBuilder. This whole area will get tidied up more when cfg and ml
worlds get unified. This patch is just gardening, NFC.
PiperOrigin-RevId: 226701959
optional successor operands when they are terminator operations.
This isn't used yet, but is part 2/n towards merging BasicBlock into StmtBlock
and Instruction into OperationStmt.
PiperOrigin-RevId: 226684636
StmtBlock. This is more consistent with IfStmt and also conceptually makes
more sense - a forstmt "isn't" its body, it contains its body.
This is step 1/N towards merging BasicBlock and StmtBlock. This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.
This is a mechanical patch, NFC.
PiperOrigin-RevId: 226684158
This allows for us to decouple type uniquing/construction from MLIRContext and pave the way for dialect specific types.
To accomplish this we two new classes, TypeUniquer and TypeStorageAllocator.
* TypeUniquer is now responsible for all construction and uniquing of types.
* TypeStorageAllocator is a utility used by derived type storage objects to allocate memory within an MLIRContext.
This cl also standardizes what a derived type storage class needs to provide:
- Define a type alias, KeyTy, to a type that uniquely identifies the
instance of the type within its kind.
* The key type must be constructible from the values passed into the
detail::TypeUniquer::get call after the type kind.
* The key type must have a llvm::DenseMapInfo specialization for
hashing.
- Provide a method, 'KeyTy getKey() const', to construct the key type
from an existing storage instance.
- Provide a construction method:
'DerivedStorage *construct(TypeStorageAllocator &, ...)'
that builds a unique instance of the derived storage. The arguments
after the TypeStorageAllocator must correspond with the values passed
into the detail::TypeUniquer::get call after the type kind.
PiperOrigin-RevId: 226507184
* Extend to handle rewrite patterns with output attributes;
- Constant attributes are defined with a value and a type;
- The type of the value is mapped to the corresponding attribute type (string -> StringAttr);
* Verifies the type of operands in the resultant matches the defined op's operands;
PiperOrigin-RevId: 226468908
Change operands to arguments in Op and use it for both operands and arguments. This unifies the way that operands and attributes are specified and the intended way that matching/creating ops with attributes will look. Both can now be represented using the same dag structure (and also makes the ordering more explicit). Derived attributes are not considered as part of the arguments (as they are inferred from the created op, not something needed to created it).
* Generate named operand accessors;
* Simplified the way of specifying Attr and use ElementAttr for TFL_Const instead.
* Fix a incorrect assertion generated;
The input parsing can be made more robust, I'll address that in a follow up.
PiperOrigin-RevId: 226307424
reuse existing ones.
- drop IterationDomainContext, redundant since FlatAffineConstraints has
MLValue information associated with its dimensions.
- refactor to use existing support
- leads to a reduction in LOC
- as a result of these changes, non-constant loop bounds get naturally
supported for dep analysis.
- update test cases to include a couple with non-constant loop bounds
- rename addBoundsFromForStmt -> addForStmtDomain
- complete TODO for getLoopIVs (handle 'if' statements)
PiperOrigin-RevId: 226082008
- when adding constraints from a 'for' stmt into FlatAffineConstraints,
correctly add bound operands of the 'for' stmt as a dimensional identifier or
a symbolic identifier depending on whether the bound operand is a valid
MLFunction symbol
- update test case to exercise this.
PiperOrigin-RevId: 225988511
Existing implementation always uses 64 bits to store floating point values in
DenseElementsAttr. This was due to FloatAttrs always a `double` for storage
independently of the actual type. Recent commits added support for FloatAttrs
with the proper f32 type and floating semantics and changed the bitwidth
reporting on FloatType.
Use the existing infrastructure for densely storing 16 and 32-bit values in
DenseElementsAttr storage to store f16 and f32 values. Move floating semantics
definition to the FloatType level. Properly support f16 / IEEEhalf semantics
at the FloatAttr level and in the builder.
Note that bf16 is still stored as a 64-bit value with IEEEdouble semantics
because APFloat does not have first-class support for bf16 types.
PiperOrigin-RevId: 225981289
addDomainConstraints; add support for mod/div for dependence testing.
- add support for mod/div expressions in dependence analysis
- refactor addMemRefAccessConstraints to use getFlattenedAffineExprs (instead
of getFlattenedAffineExpr); update addDomainConstraints.
- rename AffineExprFlattener::cst -> localVarCst
PiperOrigin-RevId: 225933306
This introduces a generic lowering pass for ML functions. The pass is
parameterized by template arguments defining individual pattern rewriters.
Concrete lowering passes define individual pattern rewriters and inherit from
the generic class that takes care of allocating rewriters, traversing ML
functions and performing the actual rewrite.
While this is similar to the greedy pattern rewriter available in
Transform/Utils, it requires adjustments due to the ML/CFG duality. In
particular, ML function rewriters must be able to create statements, not only
operations, and need access to an MLFuncBuilder. When we move to using the
unified function type, the ML-specific rewriting will become unnecessary.
Use LowerVectorTransfers as a testbed for the generic pass.
PiperOrigin-RevId: 225887424
Introduce support for lowering vector_type_cast to LLVM IR. It consists in
creating a new MemRef descriptor with the base pointer with the type that
corresponds to the lowered element type of the target memref. Since
`vector_type_cast` does not support dynamic shapes in the target type, no
dynamic size conversion is necessary.
This commit goes in the opposite direction of what is expected of LLVM IR
lowering: it should not be aware of all the other dialects. Instead, we should
have separate definitions for conversions in a global lowering framework.
However, this requires LLVM dialect to be implemented, which is currently
blocked by the absence of user-defined types. Implement the lowering anyway to
unblock end-to-end vectorization experiments.
PiperOrigin-RevId: 225887368
This operation is produced and used by the super-vectorization passes and has
been emitted as an abstract unregistered operation until now. For end-to-end
testing purposes, it has to be eventually lowered to LLVM IR. Matching
abstract operation by name goes into the opposite direction of the generic
lowering approach that is expected to be used for LLVM IR lowering in the
future. Register vector_type_cast operation as a part of the SuperVector
dialect.
Arguably, this operation is a special case of the `view` operation from the
Standard dialect. The semantics of `view` is not fully specified at this point
so it is safer to rely on a custom operation. Additionally, using a custom
operation may help to achieve clear dialect separation.
PiperOrigin-RevId: 225887305
As MLIR moves towards dialect-specific types, a generic Type::getBitWidth does
not make sense for all of them. Even with the current type system, the bit
width is not defined (and causes the method in question to abort) for all
TensorFlow types.
This commit restricts the bit width definition to primitive standard types that
have a number of bits appearing verbatim in their type, i.e., integers and
floats. As a side effect, it delegates the decision on the bit width of the
`index` to the backends. Existing backends currently hardcode it to 64 bits.
The Type::getBitWidth method is replaced by Type::getIntOrFloatBitWidth that
only applies to integers and floats. The call sites are updated to use the new
method, where applicable, or rewritten so as not rely on it. Incidentally,
this fixes a utility method that did not account for memrefs being allowed to
have vectors as element types in the size computation.
As an observation, several places in the code use Type in places where a more
specific type could be used instead. Some of those are fixed by this commit.
PiperOrigin-RevId: 225844792
provide unroll factors, and a cmd line argument to specify number of
innermost loop unroll repetitions.
- add function callback parameter for outside targets to provide unroll factors
- add a cmd line parameter to repeatedly apply innermost loop unroll a certain
number of times (to avoid using -loop-unroll -loop-unroll ...; instead
-unroll-num-reps=2).
- implement the callback for a target
- update test cases / usage
PiperOrigin-RevId: 225843191
*) Adds simple greedy fusion algorithm to drive experimentation. This algorithm greedily fuses loop nests with single-writer/single-reader memref dependences to improve locality.
*) Adds support for fusing slices of a loop nest computation: fusing one loop nest into another by adjusting the source loop nest's iteration bounds (after it is fused into the destination loop nest). This is accomplished by solving for the source loop nest's IVs in terms of the destination loop nests IVs and symbols using the dependece polyhedron, then creating AffineMaps of these functions for the loop bounds of the fused source loop.
*) Adds utility function 'insertMemRefComputationSlice' which computes and inserts computation slice from loop nest surrounding a source memref access into the loop nest surrounding the destingation memref access.
*) Adds FlatAffineConstraints::toAffineMap function which returns and AffineMap which represents an equality contraint where one dimension identifier is represented as a function of all others in the equality constraint.
*) Adds multiple fusion unit tests.
PiperOrigin-RevId: 225842944
Store FloatAttr using more appropriate fltSemantics (mostly fixing up F32/F64 storage, F16/BF16 pending). Previously F32 type was used incorrectly for double (the storage was double). Also add query method that returns fltSemantics for IEEE fp types and use that to verify that the APfloat given matches the type:
* FloatAttr created using APFloat is verified that the semantics of the type and APFloat matches;
* FloatAttr created using double has the APFloat created to match the semantics of the type;
Change parsing of tensor negative splat element to pass in the element type expected. Misc other changes to account for the storage type matching the attribute.
PiperOrigin-RevId: 225821834
Renamed the name field in Op to opName since it is the opcode's name.
Renamed the name parameters in TFLite op templates to opSummary since
they are meant as a summary of the op's functionality.
We will use the name symbol later for the name given by users via TF.
PiperOrigin-RevId: 225807135
- use addBoundsForForStmt
- getLoopIVs can return a vector of ForStmt * instead of const ForStmt *; the
returned things aren't owned / part of the stmt on which it's being called.
- other minor API cleanup
PiperOrigin-RevId: 225774301
- extend memref-bound-check to store op's
- make the bound check an analysis util and move to lib/Analysis/Utils.cpp (so that
one doesn't need to always create a pass to use it)
PiperOrigin-RevId: 225564830
From the beginning, vector_transfer_read and vector_transfer_write opreations
were intended as a mid-level vectorization abstraction. In particular, they
are lowered to the StandardOps dialect before further processing. As such, it
does not make sense to keep them at the same level as StandardOps. Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.
PiperOrigin-RevId: 225554492
Named operands allow generating builders with more meaningful names + lay the groundwork for allowing the specification of attributes as part of the inputs pattern of an op (which allows the declarative pattern rewrite generator to define ops with attributs). This is a minimal change that just changes how input operands are represented, changes to attributes in follow up and returnTypes later.
PiperOrigin-RevId: 225509805
- if a local id was already for a specific mod/div expression, just reuse it if
the expression repeats (instead of adding a new one).
- drastically reduces the number of local variables added during flattening for
real use cases - since the same div's and mod expressions often repeat.
- add getFlattenedAffineExprs for AffineMap, IntegerSet based on the above
As a natural result of the above:
- FlatAffineConstraints(IntegerSet) ctor now deals with integer sets that have mod
and div constraints as well, and these get simplified as well from -simplify-affine-structures
PiperOrigin-RevId: 225452174
- Define tf.FakeQuantWithMinMaxArgs and tf.FakeQuantWithMinMaxVars
- Add the unit tests for valid and invalid IRs
- Rewrite both to the tfl.FakeQuant op
- Add the unit tests for the rewriting
PiperOrigin-RevId: 225447109
Besides the ops.td file changes to define both ops, this CL also changes the
mlir-op-gen to allow more flexible traits definition for "optional" operation
inputs.
Unit tests are added.
One TODO for the mlir-op-gen is to make attribute optional in the ops.
PiperOrigin-RevId: 225408349
trivially redundant constraints. Update projectOut to eliminate identifiers in
a more efficient order. Fix b/120801118.
- add method to remove duplicate / trivially redundant constraints from
FlatAffineConstraints (use a hashing-based approach with DenseSet)
- update projectOut to eliminate identifiers in a more efficient order
(A sequence of affine_apply's like this (from a real use case) finally exposed
the lack of the above trivial/low hanging simplifications).
for %ii = 0 to 64 {
for %jj = 0 to 9 {
%a0 = affine_apply (d0, d1) -> (d0 * (9 * 1024) + d1 * 128) (%ii, %jj)
%a1 = affine_apply (d0) ->
(d0 floordiv (2 * 3 * 3 * 128 * 128),
(d0 mod 294912) floordiv (3 * 3 * 128 * 128),
(((d0 mod 294912) mod 147456) floordiv 1152) floordiv 8,
(((d0 mod 294912) mod 147456) mod 1152) floordiv 384,
((((d0 mod 294912) mod 147456) mod 1152) mod 384) floordiv 128,
(((((d0 mod 294912) mod 147456) mod 1152) mod 384) mod 128)
floordiv 128) (%a0)
%v0 = load %in[%a1tensorflow/mlir#0, %a1tensorflow/mlir#1, %a1tensorflow/mlir#3, %a1tensorflow/mlir#4, %a1tensorflow/mlir#2, %a1tensorflow/mlir#5]
: memref<2x2x3x3x16x1xi32>
}
}
- update FlatAffineConstraints::print to print number of constraints.
PiperOrigin-RevId: 225397480