- update fusion cost model to fuse while tolerating a certain amount of redundant
computation; add cl option -fusion-compute-tolerance
evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC
PiperOrigin-RevId: 230541857
This CL adds the Return op to EDSCs types and emitter.
This allows generating full function bodies that can be compiled all the way
down to LLVMIR and executed on CPU.
At this point, the MLIR lacks the testing infrastructure to exercise this.
End-to-end testing of full functions written in EDSCs is left for a future CL.
PiperOrigin-RevId: 230527530
- unrolling a single iteration loop by a factor of one should promote its body
into its parent; this makes it consistent with the behavior/expectation that
unrolling a loop by a factor equal to its trip count makes the loop go away.
PiperOrigin-RevId: 230426499
- ForInst::walkOps will also be used in an upcoming CL (cl/229438679); better to have
this instead of deriving from the InstWalker
PiperOrigin-RevId: 230413820
- improve/fix doc comments for affine apply composition related methods.
- drop makeSingleValueComposedAffineApply - really redundant and out of line in
a public API; it's just returning the first result of the composed affine
apply op, and not making a single result affine map or an affine_apply op.
PiperOrigin-RevId: 230406169
- the size of the private memref created for the slice should be based on
the memref region accessed at the depth at which the slice is being
materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
to the region accessed based on the entire domain.
- leads to a significant contraction of the temporary / intermediate memref
whenever the memref isn't reduced to a single scalar (through store fwd'ing).
Other changes
- update to promoteIfSingleIteration - avoid introducing unnecessary identity
map affine_apply from IV; makes it much easier to write and read test cases
and pass output for all passes that use promoteIfSingleIteration; loop-fusion
test cases become much simpler
- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
'domInstFilter' could be one of the ops erased due to a memref replacement in
it.
- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
missing (the latter need not always be 1); add lbFloorDivisors output argument
- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape
PiperOrigin-RevId: 230405218
*) Do not remove loop nests which write to memrefs which escape the function.
*) Do not remove memrefs which escape the function (e.g. are used in the return instruction).
PiperOrigin-RevId: 230398630
Add default values to attributes, to allow attribute being left unspecified. The attr getter will always return an attribute so callers need not check for it, if the attribute is not set then the default will be returned (at present the default will be constructed upon query but this will be changed).
Add op definition for tf.AvgPool in ops.td, rewrite matcher using pattern using attribute matching & transforms. Adding some helper functions to make it simpler.
Handle attributes with dialect prefix and map them to getter without dialect prefix.
Note: VerifyAvgPoolOp could probably be autogenerated by know given the predicate specification on attributes, but deferring that to a follow up.
PiperOrigin-RevId: 230364857
Start doc generation pass that generates simple markdown output. The output is formatted simply[1] in markdown, but this allows seeing what info we have, where we can refine the op description (e.g., the inputs is probably redundant), what info is missing (e.g., the attributes could probably have a description).
The formatting of the description is still left up to whatever was in the op definition (which luckily, due to the uniformity in the .td file, turned out well but relying on the indentation there is fragile). The mechanism to autogenerate these post changes has not been added yet either. The output file could be run through a markdown formatter too to remove extra spaces.
[1]. This is not proposal for final style :) There could also be a discussion around single doc vs multiple (per dialect, per op), whether we want a TOC, whether operands/attributes should be headings or just formatted differently ...
PiperOrigin-RevId: 230354538
This is needed to allow binding to more constant types.
Tests that exercise this behavior will come in a followup CL.
In the meantime this does not breaks things.
PiperOrigin-RevId: 230320621
1) Fix FloatAttr type inconsistency in conversion from tf.FusedBatchNorm to TFLite ops
We used to compose the splat tensor out of the scalar epsilon attribute by using the
type of the variance operand. However, the epsilon attribute may have a different
bitwidth than the one in the variance operand. So it ends up we were creating
inconsistent types within the FloatAttr itself.
2) Fix SplatElementsAttr type inconsistency in AnnotateInputArrays
We need to create the zero-valued attribute according to the type provided as the
command-line arguments.
3) Concretize the result type of tf.Shape constant folding test case
Currently the resultant constant is created by the constant folding harness, using
the result type of the original op as the constant's result type. That can be
a different type than the constant's internal DenseElementsAttr.
PiperOrigin-RevId: 230244665
- print multiplication by -1 as unary negate; expressions like s0 * -1, d0 * -1
+ d1 will now appear as -s0, -d0 + d1 resp.
- a minor cleanup while on printAffineExprInternal
PiperOrigin-RevId: 230222151
This CL also makes ScopedEDSCContexts to reset the Bindable numbering when
creating a new context.
This is useful to write minimal tests that don't use FileCheck pattern
captures for now.
PiperOrigin-RevId: 230079997
This CL performs a bunch of cleanups related to EDSCs that are generally
useful in the context of using them with a simple wrapping C API (not in this
CL) and with simple language bindings to Python and Swift.
PiperOrigin-RevId: 230066505
- detected with memref-bound-check
- fixes b/123072438; while on this, fix another test case which was reported
out of bounds
PiperOrigin-RevId: 229978187
*) Enables reduction of private memref size based on MemRef region accessed by fused slice.
*) Enables maximal fusion by creating a private memref to break a fusion-preventing dependence.
*) Adds maximal fusion flag to enable fusing as much as possible (though it still fuses the minimum cost computation slice).
PiperOrigin-RevId: 229936698
This CL adds a test reported by andydavis@ and fixes the corner case that
appears when operands do not come from an AffineApply and no Dim composition
is needed.
In such cases, we would need to create an empty map which is disallowed.
The composition in such cases becomes trivial: there is no composition.
This CL also updates the name AffineNormalizer to AffineApplyNormalizer.
PiperOrigin-RevId: 229819234
Change MinMaxAttr to match hasValidMinMaxAttribute behavior. Post rewriting the other users of that function it could be removed too. The currently generated error message is:
error: 'tfl.fake_quant' op attribute 'minmax' failed to satisfy constraint of MinMaxAttr
PiperOrigin-RevId: 229775631
This CL fixes a misunderstanding in how to build DimOp which triggered
execution issues in the CPU path.
The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to
construct the dynamic dimensions should be:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and
`dim %arg, 4 : memref<?x4x?x8x?xf32>`
Before this CL, we wold construct:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 1 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and expect the other dimensions to be constants.
This assumption seems consistent at first glance with the syntax of alloc:
```
%tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32>
```
But this was actuallyincorrect.
This CL also makes the relevant functions available to EDSCs and removes
duplication of the incorrect function.
PiperOrigin-RevId: 229622766
The operand and result types of binary ops are not necessarily the
same. For those binary ops, we cannot print in the short-form assembly.
Enhance impl:::printBinaryOp to consider operand and result types
to select which assembly form to use.
PiperOrigin-RevId: 229608142
A recent change in TableGen definitions allowed arbitrary AND/OR predicate
compositions at the cost of removing known-true predicate simplification.
Introduce a more advanced simplification mechanism instead.
In particular, instead of folding predicate C++ expressions directly in
TableGen, keep them as is and build a predicate tree in TableGen C++ library.
The predicate expression-substitution mechanism, necessary to implement complex
predicates for nested classes such as `ContainerType`, is replaced by a
dedicated predicate. This predicate appears in the predicate tree and can be
used for tree matching and separation. More specifically, subtrees defined
below such predicate may be subject to different transformations than those
that appear above. For example, a subtree known to be true above the
substitution predicate is not necessarily true below it.
Use the predicate tree structure to eliminate known-true and known-false
predicates before code emission, as well as to collapse AND and OR predicates
if their value can be deduced based on the value of one child.
PiperOrigin-RevId: 229605997
Start simple with single predicate match & transform rules for attributes.
* Its unclear whether modelling Attr predicates will be needed so start with allowing matching attributes with a single predicate.
* The input and output attr type often differs and so add ability to specify a transform between the input and output format.
PiperOrigin-RevId: 229580879
*) Adds support for fusing into consumer loop nests with multiple loads from the same memref.
*) Adds support for reducing slice loop trip count by projecting out destination loop IVs greater than destination loop depth.
*) Removes dependence on src loop depth and simplifies cost model computation.
PiperOrigin-RevId: 229575126
This is mostly plumbing to start allowing testing EDSC lowering. Prototype specifying reference implementation using verbose format without any generation/binding support. Add test pass that dumps the constructed EDSC (of which there can only be one). The idea is to enable iterating from multiple sides, this is wrong on many dimensions at the moment.
PiperOrigin-RevId: 229570535
In TableGen definitions, the "Type" class has been used for types of things
that can be stored in Attributes, but not necessarily present in the MLIR type
system. As a consequence, records like "String" or "DerviedAttrBody" were of
class "Type", which can be confusing. Furthermore, the "builderCall" field of
the "Type" class serves only for attribute construction. Some TableGen "Type"
subclasses that correspond to MLIR kinds of types do not have a canonical way
of construction only from the data available in TableGen, e.g. MemRefType would
require the list of affine maps. This leads to a conclusion that the entities
that describe types of objects appearing in Attributes should be independent of
"Type": they have some properties "Type"s don't and vice versa.
Do not parameterize Tablegen "Attr" class by an instance of "Type". Instead,
provide a "constBuilderCall" field that can be used to build an attribute from
a constant value stored in TableGen instead of indirectly going through
Attribute.Type.builderCall. Some attributes still don't have a
"constBuilderCall" because they used to depend on types without a
"builderCall".
Drop definitions of class "Type" that don't correspond to MLIR Types. Provide
infrastructure to define type-dependent attributes and string-backed attributes
for convenience.
PiperOrigin-RevId: 229570087
We also need the broadcast logic in the TensorFlow dialect. Move it to a
Dialect/ directory for a broader scope. This Dialect/ directory is intended
for code not in core IR, but can potentially be shared by multiple dialects.
Apart from fixing TensorFlow op TableGen to use this trait, this CL only
contains mechanical code shuffling.
PiperOrigin-RevId: 229563911